xstruct
A set of tools for extracting structured data from the web.
Installation
npm i xstruct --save
Example
An example of how easy it is to extract, for instance, comments from the dou.ua forum.
var $ = ; return $ ;
Description
getHtml(url[, qs][, encoding])
Returns a promise that resolves to the downloaded HTML wrapped with cheerio. If encoding is specified, the document is converted from that encoding before it is passed to cheerio. If qs (a query string object) is specified, it is appended to url as a query string.
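The query-string behaviour can be illustrated with a small standalone helper. The name `appendQs` is hypothetical and the sketch uses Node's built-in `URL` class; it is not xstruct's implementation, which may instead delegate this step to request.js.

```javascript
// Hypothetical helper (not part of xstruct): appends the entries of a
// query-string object to a URL, keeping any query string already present.
function appendQs(url, qs) {
  if (!qs) return url; // no qs given: the URL is used unchanged
  const u = new URL(url);
  for (const [key, value] of Object.entries(qs)) {
    u.searchParams.append(key, String(value));
  }
  return u.toString();
}

console.log(appendQs("https://example.com/forum", { page: 2, sort: "new" }));
// prints: https://example.com/forum?page=2&sort=new
```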
getJson(url[, qs])
Returns a promise that resolves to the downloaded and parsed JSON. If qs (a query string object) is specified, it is appended to url as a query string.
postForm(url, form)
Returns a promise that resolves to the result of posting form to url. Calling it activates cookie persistence.
request(options)
Promisified version of the root request function from request.js.
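To show what "promisified" means here, the sketch below wraps a function with request.js's callback shape, `(err, response, body)`, so that it returns a promise. `fakeRequest` is a hypothetical stand-in that avoids network access, and resolving to `{ response, body }` is an assumption of this sketch, not necessarily what xstruct's `request` resolves to. bluebird, listed among the building blocks, provides promisify helpers; a plain `Promise` is used here to stay dependency-free.

```javascript
// Hypothetical stand-in with the request.js callback shape: (err, response, body).
function fakeRequest(options, callback) {
  setImmediate(() => callback(null, { statusCode: 200 }, "body of " + options.url));
}

// Wraps an (options, callback) function so that it returns a promise instead.
// Resolving to { response, body } is an assumption made for this sketch.
function promisifyRequest(fn) {
  return (options) =>
    new Promise((resolve, reject) => {
      fn(options, (err, response, body) => (err ? reject(err) : resolve({ response, body })));
    });
}

const requestAsync = promisifyRequest(fakeRequest);
requestAsync({ url: "https://example.com" }).then(({ response, body }) => {
  console.log(response.statusCode, body); // prints: 200 body of https://example.com
});
```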
wrapHtml(cheerioElement)
Calls cheerio(cheerioElement) and returns the result synchronously.
format
Alias for util.format.
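Since format is described as an alias for util.format, a call such as the one below should behave the same way. The example calls Node's `util.format` directly; the URL and topic name are illustrative only.

```javascript
const util = require("util");

// %s inserts a string and %d a number, as documented for util.format.
const url = util.format("https://example.com/forum/%s/?page=%d", "topic-42", 3);
console.log(url); // prints: https://example.com/forum/topic-42/?page=3
```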
cleanText(obj, path[, options])
Takes the text found at path in obj and cleans it: removes leading and trailing whitespace, collapses repeated spaces and periods, converts the text to a single line if options.singleline is specified, and removes any characters listed in options.remove (if specified). Returns null if the result is an empty string or there is no value at path.
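A minimal standalone sketch of the cleaning rules listed above, assuming `path` is a dot-separated property path (lodash-style `a.b.c`) and `options.remove` is a string of characters to delete. `cleanTextSketch` and `getByPath` are hypothetical names; the real function's order of steps and regular expressions may differ.

```javascript
// Hypothetical helper: follows a dot-separated path, returning undefined if any step is missing.
function getByPath(obj, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Escapes characters that are special inside a RegExp character class.
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]^-]/g, "\\$&");
}

// Hypothetical re-implementation of the described cleanText rules (not xstruct's source).
function cleanTextSketch(obj, path, options = {}) {
  const value = getByPath(obj, path);
  if (value == null) return null; // nothing at path
  let text = String(value);
  if (options.singleline) text = text.replace(/[\r\n]+/g, " "); // join lines
  if (options.remove) {
    // assumption: options.remove is a string of characters to strip
    text = text.replace(new RegExp("[" + escapeForCharClass(options.remove) + "]", "g"), "");
  }
  text = text
    .replace(/[ \t]{2,}/g, " ") // collapse repeated spaces
    .replace(/\.{2,}/g, ".")    // collapse repeated periods
    .trim();                    // drop leading/trailing whitespace
  return text === "" ? null : text;
}

const post = { author: { name: "  Jane   Doe.. " }, body: "Line one\nLine two" };
cleanTextSketch(post, "author.name");              // returns "Jane Doe."
cleanTextSketch(post, "body", { singleline: true }); // returns "Line one Line two"
cleanTextSketch(post, "missing");                  // returns null
```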
cleanNumber(obj, path)
Acts like cleanText, but casts the result to a number at the end. If the result is not a number (NaN), returns null.
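The sketch below shows the described "clean, then cast, then null on NaN" flow. It is hypothetical: the cleaning step is reduced to trimming, and it casts with `Number()`, which rejects trailing text such as `"42 votes"`; the real cleanNumber may cast differently (for example with `parseFloat`).

```javascript
// Hypothetical sketch of cleanNumber's described behaviour (not xstruct's source).
// Assumes a dot-separated path and a simplified cleaner that only trims.
function cleanNumberSketch(obj, path) {
  const raw = path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
  if (raw == null) return null;
  const text = String(raw).trim();
  if (text === "") return null; // empty text counts as "no value", as in cleanText
  const n = Number(text);
  return Number.isNaN(n) ? null : n; // not-a-number becomes null
}

cleanNumberSketch({ votes: " 42 " }, "votes"); // returns 42
cleanNumberSketch({ votes: "n/a" }, "votes");  // returns null
```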
cleanDateTime(obj, path[, options])
Acts like cleanText, but casts the result to a date at the end (using moment.js). If the result is not a valid date, returns null. You can optionally specify the date-time format via options.format.
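The validity check can be sketched without moment.js. This hypothetical version swaps in the built-in `Date` constructor, so it does not support options.format and only reliably parses ISO 8601 strings; it returns a `Date`, whereas the real function may return a moment object.

```javascript
// Hypothetical sketch of cleanDateTime's "valid date or null" behaviour,
// using the built-in Date instead of moment.js (options.format is NOT supported).
function cleanDateTimeSketch(obj, path) {
  const raw = path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
  if (raw == null) return null;
  const text = String(raw).trim();
  if (text === "") return null;
  const date = new Date(text);
  return Number.isNaN(date.getTime()) ? null : date; // invalid dates become null
}

cleanDateTimeSketch({ posted: " 2016-03-01T10:30:00Z " }, "posted"); // a Date for 2016-03-01T10:30:00.000Z
cleanDateTimeSketch({ posted: "not a date" }, "posted");             // returns null
```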
cleanObject(obj)
Returns the object as is, or null if none of its properties has a value.
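A minimal sketch of this rule, assuming that "no value" means `null` or `undefined` (which is what the cleaners above return for empty input). `cleanObjectSketch` is a hypothetical name, and the real function may treat other values (such as empty strings) as empty too.

```javascript
// Hypothetical sketch: keep obj unless every property is null or undefined.
function cleanObjectSketch(obj) {
  const hasValue = Object.values(obj).some((v) => v !== null && v !== undefined);
  return hasValue ? obj : null;
}

cleanObjectSketch({ author: null, text: undefined }); // returns null
cleanObjectSketch({ author: null, votes: 0 });        // returns the object (0 is a value)
```

Combined with the clean* functions, this lets a scraped record whose fields all came back empty collapse to a single null.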
_.*
Exposes all functions from lodash.
limit(requests, period)
Limits the library to at most requests HTTP requests per period milliseconds.
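One way such a limit can work is a sliding window: remember when recent calls started and make new calls wait until fewer than `requests` started within the last `period` milliseconds. The sketch below is hypothetical (`createLimiter` is not part of xstruct) and wraps arbitrary functions rather than HTTP requests.

```javascript
// Hypothetical sliding-window limiter (not xstruct's implementation):
// at most `requests` calls may start within any `period` milliseconds.
function createLimiter(requests, period) {
  const starts = [];             // start timestamps of recent calls
  let chain = Promise.resolve(); // serialises slot reservation so call order is kept

  function reserveSlot() {
    return new Promise((resolve) => {
      const tryNow = () => {
        const now = Date.now();
        while (starts.length && now - starts[0] >= period) starts.shift(); // forget expired calls
        if (starts.length < requests) {
          starts.push(now);
          resolve();
        } else {
          setTimeout(tryNow, period - (now - starts[0])); // wait for the oldest call to expire
        }
      };
      tryNow();
    });
  }

  // Runs fn once a slot is free; returns a promise of fn's result.
  return function limited(fn) {
    const slot = chain.then(reserveSlot);
    chain = slot;
    return slot.then(() => fn());
  };
}

const limited = createLimiter(2, 100);
const t0 = Date.now();
const times = [];
const done = Promise.all(
  [1, 2, 3].map((i) => limited(() => { times.push(Date.now() - t0); return i; }))
);
// The first two calls start immediately; the third waits about 100 ms.
```

Serialising reservations through a single promise chain keeps calls in the order they were made; a token bucket would be an alternative that allows short bursts.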
Building blocks
This library makes heavy use of request, cheerio, lodash and bluebird. It also uses iconv-lite, moment and util as additional utilities.
License
MIT