Questions tagged [extract]
Questions related to retrieving specific information from a (typically minimally structured) data source, such as a web site, media file, source code collection or compressed archive (in which case the desired information is one or more original, uncompressed files). When using this tag, please include additional tags to clarify which specific environment/language/scenario your question refers to.
7,896
questions
1143
votes
13
answers
1.3m
views
How to get first N number of elements from an array
I am working with JavaScript (ES6) and Facebook React, and I am trying to get the first 3 elements of an array that varies in size. I would like to do the equivalent of LINQ's Take(n).
In my JSX file I have the ...
676
votes
11
answers
334k
views
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
R provides two different methods for accessing the elements of a list or data.frame: [] and [[]].
What is the difference between the two, and when should I use one over the other?
325
votes
20
answers
1.1m
views
How do you extract a column from a multi-dimensional array?
Does anybody know how to extract a column from a multi-dimensional array in Python?
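A minimal sketch of one common approach, assuming the array is a plain list of equal-length rows. The names `matrix` and `column` are illustrative and do not come from the question:

```python
def column(matrix, index):
    """Return the value at position `index` from every row."""
    return [row[index] for row in matrix]


# Hypothetical sample data: three rows, three columns.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
second = column(matrix, 1)  # the middle column
```

A list comprehension keeps this dependency-free; with an array library the same result is usually expressed as a slice.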
267
votes
15
answers
683k
views
How to extract all values from a dictionary in Python?
I have a dictionary d = {1:-0.3246, 2:-0.9185, 3:-3985, ...}.
How do I extract all of the values of d into a list l?
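One way this is usually done, sketched with a shortened sample dictionary (the question's full dictionary is elided, so the two entries below are only a stand-in):

```python
# Shortened sample; only the first two entries from the question are used.
d = {1: -0.3246, 2: -0.9185}

# dict.values() returns a view; list() copies it into a real list.
l = list(d.values())
```

In Python 3.7 and later the values come out in insertion order.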
232
votes
2
answers
89k
views
How do I move a Git branch out into its own repository?
I have a branch that I'd like to move into a separate Git repository, and ideally keep that branch's history in the process. So far I've been looking at git filter-branch, but I can't make out ...
209
votes
7
answers
327k
views
How can I extract the folder path from file path in Python?
I would like to get just the folder path from the full path to a file.
For example T:\Data\DBDesign\DBDesign_93_v141b.mdb and I would like to get just T:\Data\DBDesign (excluding the \...
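A possible sketch using the standard library. It assumes Windows-style paths like the one in the question and uses `PureWindowsPath` so the result is the same on any operating system; `folder_of` is an illustrative name:

```python
from pathlib import PureWindowsPath


def folder_of(path):
    """Return the parent folder of a Windows-style file path."""
    return str(PureWindowsPath(path).parent)


folder = folder_of(r"T:\Data\DBDesign\DBDesign_93_v141b.mdb")
```

When the code runs on the same platform the paths come from, `os.path.dirname` is a common alternative.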
207
votes
7
answers
234k
views
Accessing last x characters of a string in Bash
I found out that with ${string:0:3} one can access the first 3 characters of a string. Is there an equivalently easy method to access the last three characters?
188
votes
15
answers
307k
views
How to extract text from a PDF? [closed]
Can anyone recommend a library/API for extracting the text and images from a PDF?
We need to be able to get at text that is contained in pre-known regions of the document, so the API will need to give ...
187
votes
9
answers
388k
views
Extract first item of each sublist in Python
I'm wondering what is the best way to extract the first item of each sublist in a list of lists and append it to a new list. So if I have:
lst = [[a,b,c], [1,2,3], [x,y,z]]
And, I want to pull out a, ...
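A short sketch of the list-comprehension approach. It assumes every sublist is non-empty, and it uses string literals in place of the undefined names `a`, `b`, `c`, `x`, `y`, `z` from the question:

```python
# Strings stand in for the question's placeholder names.
lst = [["a", "b", "c"], [1, 2, 3], ["x", "y", "z"]]

# Take element 0 of each sublist and collect the results in a new list.
firsts = [sub[0] for sub in lst]
```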
183
votes
17
answers
257k
views
How to get the first word of a sentence in PHP?
I want to extract the first word of a variable from a string. For example, take this input:
<?php $myvalue = 'Test me more'; ?>
The resultant output should be Test, which is the first word of ...
173
votes
17
answers
260k
views
How to extract one column of a csv file
If I have a csv file, is there a quick bash way to print out only the contents of a single column? It is safe to assume that each row has the same number of columns, but each column's content would ...
166
votes
15
answers
413k
views
Javascript - How to extract filename from a file input control
When a user selects a file in a web page I want to be able to extract just the filename.
I tried the str.search function, but it seems to fail when the file name is something like this: c:\uploads\ilike....
161
votes
7
answers
363k
views
How to extract a floating number from a string [duplicate]
I have a number of strings similar to Current Level: 13.4 db. and I would like to extract just the floating point number. I say floating and not decimal as it's sometimes whole. Can RegEx do this or ...
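A regular expression can handle both the fractional and the whole-number case. The sketch below is in Python (the question does not name a language), and the pattern and the helper name `first_number` are assumptions; it does not cover exponents or numbers written without a leading digit:

```python
import re

# Optional sign, digits, then an optional fractional part.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_number(text):
    """Return the first number found in `text` as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group(0)) if match else None


value = first_number("Current Level: 13.4 db.")
whole = first_number("Current Level: 13 db.")
```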
141
votes
5
answers
295k
views
Get string after character [duplicate]
I have a string that looks like this:
GenFiltEff=7.092200e-01
Using bash, I would like to just get the number after the = character. Is there a way to do this?
118
votes
23
answers
198k
views
Extract images from PDF without resampling, in python?
How might one extract all images from a pdf document, at native resolution and format? (Meaning extract tiff as tiff, jpeg as jpeg, etc. and without resampling). Layout is unimportant, I don't care ...
106
votes
4
answers
29k
views
What algorithm does Readability use for extracting text from URLs?
For a while, I've been trying to find a way of intelligently extracting the "relevant" text from a URL by eliminating the text related to ads and all the other clutter. After several months of ...
94
votes
6
answers
232k
views
Read Content from Files which are inside Zip file
I am trying to create a simple Java program which reads and extracts the content from the file(s) inside a zip file. The zip file contains 3 files (txt, pdf, docx). I need to read the contents of all these ...
92
votes
8
answers
159k
views
Extract and delete all .gz in a directory- Linux [closed]
I have a directory. It has about 500K .gz files.
How can I extract all .gz in that directory and delete the .gz files?
88
votes
9
answers
527k
views
How to extract only the year from the date in SQL Server 2008?
In SQL Server 2008, how do I extract only the year from a date?
In the DB I have a date column, and I need to extract the year from it.
Is there any function for that?
87
votes
5
answers
220k
views
pandas extract year from datetime: df['year'] = df['date'].year is not working
I import a dataframe via read_csv, but for some reason can't extract the year or month from the series df['date'], trying that gives AttributeError: 'Series' object has no attribute 'year':
date ...
84
votes
8
answers
125k
views
How to parse the Manifest.mbdb file in an iOS 4.0 iTunes Backup
In iOS 4.0 Apple has redesigned the backup process.
iTunes used to store a list of filenames associated with backup files in the Manifest.plist file, but in iOS 4.0 it has moved this information to ...
83
votes
7
answers
49k
views
Extracting an information from web page by machine learning
I would like to extract a specific type of information from web pages in Python. Let's say postal address. It has thousands of forms, but still, it is somehow recognizable. As there is a large number ...
75
votes
5
answers
77k
views
Extract files from zip without keeping the structure using python ZipFile?
I am trying to extract all files from a .zip that contains subfolders into a single folder. I want all the files from the subfolders extracted into that one folder without keeping the original structure. At the moment, I ...
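One way to flatten an archive is to copy each member by hand under its base name instead of calling `extractall`. The sketch below assumes that approach; `extract_flat` and the demo archive are illustrative, and files that share a base name overwrite one another:

```python
import os
import shutil
import tempfile
import zipfile


def extract_flat(zip_path, target_dir):
    """Extract every file in the archive directly into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = os.path.basename(member.filename)
            if member.is_dir() or not name:
                continue  # skip directory entries
            destination = os.path.join(target_dir, name)
            with archive.open(member) as src, open(destination, "wb") as dst:
                shutil.copyfileobj(src, dst)


# Demo with a throwaway archive holding files in two subfolders.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("a/b/one.txt", "1")
        archive.writestr("c/two.txt", "2")
    out_dir = os.path.join(tmp, "out")
    extract_flat(zip_path, out_dir)
    extracted = sorted(os.listdir(out_dir))
```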
71
votes
8
answers
94k
views
How to extract top-level domain name (TLD) from URL
how would you extract the domain name from a URL, excluding any subdomains?
My initial simplistic attempt was:
'.'.join(urlparse.urlparse(url).netloc.split('.')[-2:])
This works for http://www.foo....
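The attempt above, ported to Python 3 (`urllib.parse` replaces the Python 2 `urlparse` module), shows where keeping the last two labels breaks down. The example hosts are hypothetical:

```python
from urllib.parse import urlparse


def naive_domain(url):
    """Keep only the last two dot-separated labels of the host name."""
    host = urlparse(url).netloc
    return ".".join(host.split(".")[-2:])


simple = naive_domain("http://www.example.com/page")
multi = naive_domain("http://www.example.co.uk/page")  # multi-part suffix
```

For the second URL the function returns only the suffix `co.uk`, not the registrable domain. Handling such cases generally requires a list of known public suffixes, because the structure of the name alone does not reveal where the suffix ends.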
70
votes
11
answers
180k
views
Extract the text out of HTML string using JavaScript
I am trying to get the inner text of an HTML string using a JS function (the string is passed as an argument). Here is the code:
function extractContent(value) {
var content_holder = "";
for ...
67
votes
8
answers
293k
views
Extract MSI from EXE
I want to extract the MSI of an EXE setup to publish over a network.
For example, using Universal Extractor, but it doesn't work for Java Runtime Environment.
61
votes
6
answers
89k
views
How to extract just plain text from .doc & .docx files? [closed]
Anyone know of anything they can recommend in order to extract just the plain text from a .doc or .docx?
I've found this - wondered if there were any other suggestions?
56
votes
4
answers
214k
views
Java: export to an .jar file in eclipse
I'm trying to export a program in Eclipse to a jar file.
In my project I have added some pictures and PDFs. When I'm exporting to a jar file, it seems that only the main has been compiled and ...
55
votes
6
answers
15k
views
How to extract one file with commit history from a Git repo with index-filter & co?
I have a Git repo converted from SVN to Mercurial to Git, and I wanted to extract just one source file. I also had weird characters like aÌ (an encoding mismatch corrupted Unicode ä) and spaces in the ...
51
votes
9
answers
151k
views
Get min and max value in PHP Array
I have an array like this:
array (0 =>
array (
'id' => '20110209172713',
'Date' => '2011-02-09',
'Weight' => '200',
),
1 =>
array (
'id' => '20110209172747',
'Date' => '...
47
votes
17
answers
31k
views
What is so wrong with extract()?
I was recently reading this thread, on some of the worst PHP practices.
In the second answer there is a mini discussion on the use of extract(), and I'm just wondering what all the huff is about.
I ...
45
votes
3
answers
128k
views
Extract string before "|" [duplicate]
I have a data set wherein a column looks like this:
ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL
.... and so on
I need to extract the ...
42
votes
3
answers
105k
views
Extract string from between quotations
I want to extract information from user-inputted text. Imagine I input the following:
SetVariables "a" "b" "c"
How would I extract information between the first set of quotations? Then the second? ...
42
votes
6
answers
44k
views
Extract .xip file into a folder from command line?
Apple occasionally uses a proprietary XIP file format, particularly when distributing versions of Xcode. It is an analog to zip, but is signed, allowing it to be verified on the receiving system. When a ...
41
votes
12
answers
29k
views
Extracting information from PDFs of research papers [closed]
I need a mechanism for extracting bibliographic metadata from PDF documents, to save people entering it by hand or cut-and-pasting it.
At the very least, the title and abstract. The list of authors ...
41
votes
7
answers
110k
views
How to extract data from a PDF file while keeping track of its structure?
My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to identify headings and ...
40
votes
3
answers
53k
views
C# regex pattern to extract urls from given string - not full html urls but bare links as well
I need a regex which will do the following
Extract all strings which start with http://
Extract all strings which start with www.
So I need to extract these 2.
For example there is this given ...
39
votes
3
answers
71k
views
JAR - extracting specific files
I have .class and .java files in JAR archive. Is there any way to extract only .java files from it?
I've tried this command but it doesn't work:
jar xf jar-file.jar *.java
38
votes
7
answers
83k
views
Extract digits from string - Google spreadsheet
In Google spreadsheets, I need a formula to extract all digits (0 to 9) contained in an arbitrary string, which might contain any possible character, and put them into a single cell.
Examples (Input -...
38
votes
3
answers
180k
views
Use binwalk to extract all files
I have a file music.mp3. After using binwalk, I get the result:
pexea12@DESMICE:~/Downloads$ binwalk music.mp3
DECIMAL HEXADECIMAL DESCRIPTION
----------------------------------------------...
37
votes
6
answers
75k
views
How do you extract a url from a string using python?
For example:
string = "This is a link http://www.google.com"
How could I extract 'http://www.google.com' ?
(Each link will be of the same format, i.e. 'http://')
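A minimal sketch with the `re` module, assuming (as the question states) that every link starts with `http://` and that a link ends at the next whitespace character:

```python
import re

string = "This is a link http://www.google.com"

# Match "http://" followed by everything up to the next whitespace.
match = re.search(r"http://\S+", string)
url = match.group(0) if match else None
```

Trailing punctuation such as a full stop directly after the link would be captured as well, so real text may need a stricter pattern.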
37
votes
5
answers
96k
views
How to extract metadata from an image using python?
How can I extract metadata from an image using Python?
36
votes
3
answers
53k
views
Extract a ZIP file programmatically by DotNetZip library?
I have a function that takes a ZIP file and extracts it to a directory
(I use DotNetZip library.)
public void ExtractFileToDirectory(string zipFileName, string outputDirectory)
{
ZipFile zip = ...
36
votes
2
answers
12k
views
Extract part of a git repository?
Assume my git repository has the following structure:
/.git
/Project
/Project/SubProject-0
/Project/SubProject-1
/Project/SubProject-2
and the repository has quite some commits. Now one of the ...
35
votes
3
answers
70k
views
How can I untar a tar.bz file in unix?
I've found tons of pages saying how to untar tar.bz2 files, but how would one untar a tar.bz file?
34
votes
3
answers
25k
views
Java library for keywords extraction from input text [closed]
I'm looking for a Java library to extract keywords from a block of text.
The process should be as follows:
stop word cleaning -> stemming -> searching for keywords based on English linguistics ...
33
votes
6
answers
88k
views
Extracting text from PDFs in C# [closed]
Pretty simply, I need to rip text out of multiple PDFs (quite a lot actually) in order to analyse the contents before sticking it in an SQL database.
I've found some pretty sketchy free C# libraries ...
32
votes
11
answers
61k
views
How can I extract multiple 7z files in folder at once in Ubuntu?
How can I extract about 900 7z files which are all located in the same folder (all have only one file inside) without doing it one by one?
I am using Ubuntu 10.10. All files are located in /home/...
31
votes
3
answers
54k
views
Extract first word from a column and insert into new column [duplicate]
I have a dataframe below and want to extract the first word and insert it into a new column
Dataframe1:
COL1
Nick K Jones
Dave G Barros
Matt H Smith
Convert it to this:
Dataframe2:
COL1 ...
30
votes
4
answers
60k
views
Regular expressions C# - is it possible to extract matches while matching?
Say, I have a string that I need to verify the correct format of; e.g. RR1234566-001 (2 letters, 7 digits, dash, 1 or more digits). I use something like:
Regex regex = new Regex(patternString)...