PDF Text Extract
Extract text from PDFs that contain searchable text. The module is a wrapper that calls the pdftotext
command to perform the actual extraction.
Installation
npm install --save pdf-text-extract
You will need the pdftotext binary available on your PATH. Packages are available for many different operating systems.
See https://github.com/nisaacson/pdf-extract#osx for how to install the pdftotext command.
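Before calling the module, it can be useful to confirm that the binary is actually reachable. The sketch below is not part of pdf-text-extract: isCommandAvailable is a hypothetical helper, and it assumes that running pdftotext with its -v (version) flag is a harmless way to probe for the binary.

```javascript
var spawnSync = require('child_process').spawnSync

// Hypothetical helper, not part of pdf-text-extract.
// Returns true if the command could be started at all. spawnSync sets
// result.error (for example ENOENT) when the binary cannot be found.
function isCommandAvailable(command, args) {
  var result = spawnSync(command, args || ['-v'], { stdio: 'ignore' })
  return !result.error
}

if (!isCommandAvailable('pdftotext')) {
  console.error('pdftotext was not found on the PATH')
}
```

The helper only checks that the process can be spawned, not its exit code, since the exit status of a version query may vary between pdftotext builds.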
Usage
As a module
extract(filePath, [options], [pdftotextcommand], callback)
Options and pdftotextcommand are not required.
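var path = require('path')
// path to a PDF that contains searchable text
var filePath = path.join(__dirname, 'test/data/multipage.pdf')
var extract = require('pdf-text-extract')

extract(filePath, function (err, pages) {
  if (err) {
    console.dir(err)
    return
  }
  // pages is an array with one string per page
  console.dir(pages)
})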
The output will be an array in which each entry is one page of text. If you want a single string containing all pages, set the option splitPages: false.
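var path = require('path')
var filePath = path.join(__dirname, 'test/data/multipage.pdf')
var extract = require('pdf-text-extract')

extract(filePath, { splitPages: false }, function (err, text) {
  if (err) {
    console.dir(err)
    return
  }
  // text is a single string containing all pages
  console.dir(text)
})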
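For context on where page boundaries come from: pdftotext typically separates pages in its output with a form feed character (\f), unless it is run with -nopgbrk. The following is a minimal sketch of how raw output could be turned into an array of pages. splitIntoPages is a hypothetical helper written for illustration, and the module's own splitting logic may differ.

```javascript
// Hypothetical helper; not taken from the pdf-text-extract source.
function splitIntoPages(rawText) {
  var pages = rawText.split('\f')
  // A form feed after the last page leaves an empty trailing entry
  // after the split, so drop it.
  if (pages.length > 1 && pages[pages.length - 1] === '') {
    pages.pop()
  }
  return pages
}

// Made-up sample standing in for pdftotext output of a two-page file
var raw = 'first page\n\fsecond page\n\f'
console.log(splitIntoPages(raw))
```

Joining the resulting array back together gives a single string, which is roughly the shape of result you would expect with splitPages: false.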
You can set the following options:
firstPage: First page to extract
lastPage: Last page to extract
resolution: Resolution in dpi, as specified by pdftotext -r
crop: Should be an object { x:x, y:y, w:w, h:h }
layout: Should be either layout, raw or htmlmeta. Default: layout
encoding: Should be either UCS-2, ASCII7, Latin1, UTF-8, ZapfDingbats or Symbol. Default: UTF-8
eol: End of line convention. One of either: unix, dos or mac
ownerPassword: Owner password (for encrypted files)
userPassword: User password (for encrypted files)
splitPages: If true, the result will be an array of pages. Default: true.
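To make the options above more concrete, here is a rough sketch of how they could map onto pdftotext command line flags (-f, -l, -r, -x, -y, -W, -H, -layout, -raw, -htmlmeta, -enc, -eol, -opw, -upw). buildPdftotextArgs is a hypothetical function written for illustration, and the defaults are taken from the list above. It is not the module's actual implementation, which may build its arguments differently.

```javascript
// Hypothetical sketch: maps the documented options to pdftotext flags.
function buildPdftotextArgs(filePath, options) {
  options = options || {}
  var args = []
  if (options.firstPage) args.push('-f', String(options.firstPage))
  if (options.lastPage) args.push('-l', String(options.lastPage))
  if (options.resolution) args.push('-r', String(options.resolution))
  if (options.crop) {
    // crop is { x, y, w, h }; pdftotext uses -x -y -W -H
    args.push('-x', String(options.crop.x), '-y', String(options.crop.y))
    args.push('-W', String(options.crop.w), '-H', String(options.crop.h))
  }
  // layout is one of 'layout', 'raw' or 'htmlmeta'; each is a bare flag
  args.push('-' + (options.layout || 'layout'))
  args.push('-enc', options.encoding || 'UTF-8')
  if (options.eol) args.push('-eol', options.eol)
  if (options.ownerPassword) args.push('-opw', options.ownerPassword)
  if (options.userPassword) args.push('-upw', options.userPassword)
  // Input file first, then '-' so pdftotext writes the text to stdout
  args.push(filePath, '-')
  return args
}

console.log(buildPdftotextArgs('./test/data/multipage.pdf', { firstPage: 2, lastPage: 3 }))
```

Running pdftotext by hand with the printed arguments is a convenient way to see what a given combination of options does to the extracted text.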
If needed, you can pass an optional options argument to the extract function. It will be passed on to the child_process.spawn
call.
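var path = require('path')
var filePath = path.join(__dirname, 'test/data/multipage.pdf')
var extract = require('pdf-text-extract')
var options = {
  cwd: "./"
}

extract(filePath, options, function (err, pages) {
  if (err) {
    console.dir(err)
    return
  }
  console.dir(pages)
})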
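To illustrate what a spawn option such as cwd does, the sketch below starts a small Node script as a stand-in for pdftotext and has it report the working directory it was launched in. reportCwd is a hypothetical helper used only for this demonstration. It uses the synchronous spawnSync variant for brevity, which accepts the same cwd option as child_process.spawn.

```javascript
var spawnSync = require('child_process').spawnSync
var os = require('os')

// Hypothetical demo helper: runs a one-line Node script in a child
// process and returns the working directory the child observed.
function reportCwd(spawnOptions) {
  var script = 'process.stdout.write(process.cwd())'
  var result = spawnSync(process.execPath, ['-e', script], spawnOptions)
  return result.stdout.toString()
}

// The child runs in the temporary directory rather than in the
// directory of the parent process.
console.log(reportCwd({ cwd: os.tmpdir() }))
```

Setting cwd this way changes how a child process resolves relative paths.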
You can also override the pdftotext command if it is installed in a location that is not on the PATH
environment variable.
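var path = require('path')
var filePath = path.join(__dirname, 'test/data/multipage.pdf')
var pdfToTextCommand = '/opt/bin/pdftotext'
var extract = require('pdf-text-extract')
var options = {
  cwd: "./"
}

extract(filePath, options, pdfToTextCommand, function (err, pages) {
  if (err) {
    console.dir(err)
    return
  }
  console.dir(pages)
})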
As a command line tool
npm install -g pdf-text-extract
Execute with the filePath as an argument. The output will be a JSON-formatted array of pages.
pdf-text-extract ./test/data/multipage.pdf
# outputs
# ['<page 1 content...>', '<page 2 content...>']
Test
# install dev dependencies
npm install
# run tests
npm test