This package has been deprecated. Author message: "Package no longer supported. Contact Support at https://www.npmjs.com/support for more info."

redis-web-crawler

1.0.0 • Public • Published

Web Crawler with Redis Graph


Read the blog post.

A web crawler built with Node.js. It fetches site data from a given URL and recursively follows links across the web.

Crawl sites with either breadth-first search or depth-first search.

Every URL is saved to a graph (as an adjacency list), and the graph is stored in Redis.
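To illustrate the adjacency-list idea (the package's actual Redis key schema is not documented here, so the layout below is a hypothetical sketch): each crawled URL maps to the set of URLs it links to. In Redis this could be stored with SADD/SMEMBERS; a plain Map of Sets models the same structure in memory.

```javascript
// In-memory model of the adjacency list. A real Redis-backed version
// would replace the Map with SADD/SMEMBERS calls (hypothetical schema).
const graph = new Map();

function addEdge(fromUrl, toUrl) {
  // Redis equivalent (assumed): SADD fromUrl toUrl
  if (!graph.has(fromUrl)) graph.set(fromUrl, new Set());
  graph.get(fromUrl).add(toUrl);
}

function neighbors(url) {
  // Redis equivalent (assumed): SMEMBERS url
  return [...(graph.get(url) || [])];
}

addEdge('https://example.com', 'https://example.com/about');
addEdge('https://example.com', 'https://example.com/blog');
console.log(neighbors('https://example.com'));
```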

Installation

npm install --save redis-web-crawler

Usage

Run a local redis server to store output: $ redis-server

Create a new crawler instance and pass in a configuration object. Call the run method to begin crawling.

  import WebCrawler from 'redis-web-crawler';

  const crawlerSettings = {
    startUrl: 'https://en.wikipedia.org/wiki/Main_Page',
    followInternalLinks: false,
    searchDepthLimit: null,
    searchAlgorithm: 'breadthFirstSearch',
  };

  const crawler = new WebCrawler(crawlerSettings);
  crawler.run();

Configuration Properties

Name                 Type     Description
startUrl             string   A valid URL of a page with links
followInternalLinks  boolean  Toggle whether internal site links are followed
searchDepthLimit     integer  Limit on the recursive URL requests (null for no limit)
searchAlgorithm      string   "breadthFirstSearch" or "depthFirstSearch"
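The two search algorithms visit the same link graph in different orders. The sketch below is illustrative, not the package's internal code: BFS takes URLs from the front of the frontier (a queue), DFS from the back (a stack), and searchDepthLimit caps how far from startUrl the crawl may go.

```javascript
// Toy link graph (hypothetical URLs) used to compare crawl orders.
const links = {
  '/a': ['/b', '/c'],
  '/b': ['/d'],
  '/c': [],
  '/d': [],
};

function crawlOrder(startUrl, algorithm, depthLimit = Infinity) {
  const frontier = [{ url: startUrl, depth: 0 }];
  const visited = new Set();
  const order = [];
  while (frontier.length > 0) {
    // breadthFirstSearch: dequeue from the front; depthFirstSearch: pop from the back.
    const { url, depth } =
      algorithm === 'breadthFirstSearch' ? frontier.shift() : frontier.pop();
    if (visited.has(url) || depth > depthLimit) continue;
    visited.add(url);
    order.push(url);
    for (const next of links[url] || []) {
      frontier.push({ url: next, depth: depth + 1 });
    }
  }
  return order;
}

console.log(crawlOrder('/a', 'breadthFirstSearch')); // ['/a', '/b', '/c', '/d']
console.log(crawlOrder('/a', 'depthFirstSearch'));   // ['/a', '/c', '/b', '/d']
```

With `searchDepthLimit` set (e.g. `crawlOrder('/a', 'breadthFirstSearch', 0)`), only the start page is visited, since every linked page sits one level deeper.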

Exporting the Redis Graph

  • clone the Redis Dump Repo
  • run commands to install gem dependencies (refer to redis-dump/README)
  • with redis server up and running:
    • note the host and port of the redis-server (e.g. 6371)
    • in project root folder, run ./bin/redis-dump -u 127.0.0.1:6371 > db_full.json
    • view the Redis export in db_full.json

spencerlepine.com  ·  GitHub @spencerlepine  ·  Twitter @spencerlepine

License: MIT