electron-dataminer

1.3.1 • Public • Published

The aim of electron-dataminer is to extract data from specific websites using reusable modules and to centralize the configuration and the source code as much as possible.

With Electron there are three levels of scripting:

  1. main process: The node-electron javascript application

  2. renderer process: The web page loaded in the electron browser window (index.html)

  3. webview process: The webview(s) added to index.html according to the configuration file (config.js)

Each script can communicate with the others using IPC events.

You can define event handlers at those three levels, either directly in the configuration file or (preferred) in a separate module.

You specify the module used to navigate a given website with the config.webviews[name].pageClass property (see config.js below).

You specify the module used for data extraction with the config.webviews[name].api property (see config.js below).
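As a minimal sketch of where these two properties sit in the configuration object, the snippet below looks up both module names for one webview. The helper name modulesForWebview and the sampleConfig object are hypothetical and are not part of electron-dataminer:

```javascript
// Hypothetical helper, not part of electron-dataminer: it only illustrates
// where the two module names live in the configuration object.
function modulesForWebview(config, name) {
  var webview = config.webviews[name];
  if (!webview) {
    throw new Error('Unknown webview: ' + name);
  }
  return {
    pageClass: webview.pageClass, // module used to navigate the website
    api: webview.api              // module used for data extraction
  };
}

// Sample configuration reduced to the properties discussed above.
var sampleConfig = {
  webviews: {
    webview1: { pageClass: 'my-page', api: 'my-api', url: 'http://fsf.org' }
  }
};

var modules = modulesForWebview(sampleConfig, 'webview1');
console.log(modules.pageClass, modules.api);
```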

Example:

config.js:

var config={

  // Show chrome developer tools for main window
  devTools: true,

  // Configure the main browserWindow
  // You can hide the main browser window with options show: false,
  // but to run "headless" you need something like Xvfb or Xvnc
  browserWindow: {
//  show: false,
    width: '100%',
    height: '100%'
  },

  url: `file://${__dirname}/index.html`,

  // Configure webviews
  webviews: {

    // The webview DOM element id
    webview1:  {

      // you can set optional webview attributes below eg:
      attr: {
        partition: 'persist:webview1'
      },

      // The pageClass module name for this webview
      // (will be stored in config.pageClass['my-page'])
      pageClass: 'my-page',

      /*
       For example if you declare:
         pageClass: 'mypage',
       then electron-dataminer will try to load:
         1. A module named __dirname/page/mypage.js (see electron-dataminer/test/page/my-page.js)
         2. A module named electron-dataminer-mypage (see npm package electron-dataminer-mypage in electron-dataminer/test/page/)
         3. A module named mypage (figure it out)
       the module should export a function returning the module exports (see page/my-page.js below)

       The same rules apply for the "my-api" module declared below
      */

      // The api module name for this webview
      // (will be stored in config.api['my-api'])
      api: 'my-api',

      // The url to load in the webview
      // (Can be overridden by the pageClass or api module with the value
      // returned by optional function <module>.renderer.loadURL())
      url: 'http://fsf.org',
      /*
       When the url above is loaded in the webview, the webview process will send
       a 'processPage' event to the renderer process which can be
       handled in the pageClass and/or api module (module.exports.ipcEvents.processPage)
       Code specific to a page class may override code specific to the type of data to mine,
       so the event handler for the page is called first.
      */

      devTools: true,

      // webcontents.loadURL_options
      loadURL_options: {
        // see https://github.com/electron/electron/blob/master/docs/api/web-contents.md#webcontentsloadurlurl-options
      }

      // You can add here any other option specific to this webview to be used
      // by the pageClass or the api modules

    }
  },

  api: {
    // api modules specified in the webview options will be stored here.
    // You could require (but should not define) apis here
    // eg with:
    //  'my-api': require('electron-dataminer-test')(global.electron,config,'my-api','api')

  },

  pageClass: {
    // pageClass modules specified in the webview options will be stored here.
    // You could require (but should not define) page classes here
  }

}

module.exports=config;
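The lookup order described in the pageClass comment above can be sketched roughly as follows. This is an illustration only, not the package's actual loader: the helper names candidateModules and loadFirst are hypothetical, and placing api modules in an 'api' subdirectory is an assumption based on the api/my-api.js path used in this README.

```javascript
var path = require('path');

// Hypothetical helper: lists the module identifiers that would be tried,
// in order, for a given module name.
// kind is the subdirectory: 'page' for pageClass modules, 'api' for api
// modules (the latter is an assumption).
function candidateModules(baseDir, kind, name) {
  return [
    path.join(baseDir, kind, name + '.js'), // 1. local file
    'electron-dataminer-' + name,           // 2. prefixed npm package
    name                                    // 3. plain module name
  ];
}

// Try each candidate with the supplied require function and return the
// first one that loads.
function loadFirst(candidates, requireFn) {
  for (var i = 0; i < candidates.length; i++) {
    try {
      return { id: candidates[i], exports: requireFn(candidates[i]) };
    } catch (e) {
      // Assumption: any load failure means "try the next candidate".
    }
  }
  throw new Error('Module not found: ' + candidates.join(', '));
}

console.log(candidateModules('/project', 'page', 'mypage'));
```

Passing the require function as a parameter keeps the sketch independent of the directory it runs from.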

page/my-page.js:

// This module should be specific to the webpage(s) you want to mine
module.exports=function(electron){

  // Shared between main.init and the event handlers below
  var someVariable;

  return {

    // electron.app related configuration (optional)
    app: {

      // electron.app event handlers
      events: {
        'browser-window-created': function(e,window){
          window.setMenu(null);
        }
      }

    },

    // main process related configuration (optional)
    main: {

      // will be run from main.js (main process) at init time
      init: function myPage_main_init(config){
        someVariable='initialized';
      },

      // electron.ipcMain event handlers for the main process
      // that you will probably trigger from the renderer process
      // using electron.ipcRenderer
      ipcEvents: {

        ping: function ping(event,options){
          console.log('received: ping');

          // electron and config are passed to require('section.js') in main.js
          // and available here
          var electron=options.electron;
          var config=options.config;

          // Arguments passed from the renderer in the
          // electron.ipcRenderer.send('ping',...) call below
          // are received in options.args

          if (options.args[0]=='hello') {
            console.log('hello');
          }
          console.log(someVariable);
          setTimeout(function(){
            global.mainWindow.webContents.send('nextPage');
          },5000);
        }
      }
    },

    // renderer process related configuration (optional)
    renderer: {

      // will be run from renderer.js (renderer process) at init time
      init: function myPage_renderer_init(config){
      },

      // ipc event handlers for renderer process
      ipcEvents: {

        // processPage is emitted by renderer.js when it receives
        // 'document-ready' (emitted from webview.js on jQuery document ready)
        'processPage': function renderer_processPage(){
          // trigger a 'ping' event for the main process
          console.log('send: ping');
          electron.ipcRenderer.send('ping','hello');
        },

        'nextPage': function renderer_nextPage(){
          console.log('nextPage');
        }
      }
    },

    // webview process related configuration (optional)
    webview: {
      // same format as "renderer" and "main" above
    }
  };
}
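
The ping / nextPage round trip in the module above can be traced without Electron using the sketch below. It is a simulation only: plain Node.js EventEmitter objects stand in for the IPC channels (real code uses electron.ipcRenderer.send and webContents.send), and the register function is a hypothetical stand-in for whatever electron-dataminer does to wire up the ipcEvents handlers.

```javascript
var EventEmitter = require('events');

// Stand-ins for the two IPC directions (assumption: these replace
// electron.ipcRenderer.send and mainWindow.webContents.send).
var toMain = new EventEmitter();
var toRenderer = new EventEmitter();
var log = [];

// A trimmed-down module in the same shape as page/my-page.js.
var myPage = {
  main: {
    ipcEvents: {
      ping: function (event, options) {
        log.push('main received: ping ' + options.args[0]);
        toRenderer.emit('nextPage');
      }
    }
  },
  renderer: {
    ipcEvents: {
      processPage: function () {
        log.push('renderer send: ping');
        toMain.emit('ping', {}, 'hello');
      },
      nextPage: function () {
        log.push('renderer received: nextPage');
      }
    }
  }
};

// Hypothetical registration step: wrap each handler so that the extra
// arguments of the message arrive in options.args.
function register(emitter, handlers) {
  Object.keys(handlers).forEach(function (name) {
    emitter.on(name, function (event) {
      var args = Array.prototype.slice.call(arguments, 1);
      handlers[name](event, { args: args });
    });
  });
}

register(toMain, myPage.main.ipcEvents);
register(toRenderer, myPage.renderer.ipcEvents);

// Simulate the page being ready in the webview.
toRenderer.emit('processPage');
console.log(log.join('\n'));
```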


api/my-api.js:

// This module should be specific to the data you want to mine
// (could be from one or many page classes / webviews / electron instances)
// The format is the same as for the pageClass module above.
module.exports=function(electron){
  return {

  };
}
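
The comments in config.js state that, for a given event, the pageClass handler is called before the api handler. A rough sketch of that ordering is shown below; the dispatch function is hypothetical, and how electron-dataminer lets one handler override the other is not covered here.

```javascript
// Two trimmed-down modules in the format shown above.
var pageModule = {
  renderer: {
    ipcEvents: {
      processPage: function () { return 'page handler'; }
    }
  }
};
var apiModule = {
  renderer: {
    ipcEvents: {
      processPage: function () { return 'api handler'; }
    }
  }
};

// Hypothetical dispatcher: calls the handler of each module in the order
// given and skips modules that do not define one for this level or event.
function dispatch(level, eventName, modules) {
  var results = [];
  modules.forEach(function (mod) {
    var handlers = mod && mod[level] && mod[level].ipcEvents;
    if (handlers && typeof handlers[eventName] === 'function') {
      results.push(handlers[eventName]());
    }
  });
  return results;
}

// The page module comes first, then the api module.
var order = dispatch('renderer', 'processPage', [pageModule, apiModule]);
console.log(order);
```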

To Use

To clone and run this repository you'll need the following installed on your computer:

  • Git

  • Node.js (which comes with npm)

    I recommend using nvm to install/upgrade nodejs along with installed npm packages without hassle and so that "global" packages can be installed at the user level without administrator privileges.

    Also, after installing Node.js, install npm v3 with npm install -g npm for deeper and better (but slower) dependency checking (put progress=false in ~/.npmrc to make it less slow)

  • electron-prebuilt

    Install electron globally with npm install -g electron-prebuilt.

    IMPORTANT: never use sudo to install npm packages since package installation scripts (and their dependencies) can run any code with the permissions of the current user. Paranoid users should even use a secondary user account or a virtual machine for working with node and npm.

Test

For development or testing purposes you can clone the electron-dataminer project and add a subdirectory for your configuration, eg ./test. In this directory you can put your config.js (see test/config.js or example/config.js) and add your pageClass or api modules (scripts) in the folders ./test/api/ and ./test/page/

Then you can run your code from the project root folder with eg: npm start ./test/config.js

To run the test from the command line:

# Clone this repository 
git clone https://github.com/alsenet-labs/electron-dataminer
# Go into the repository 
cd electron-dataminer
# Install dependencies and run the test app 
cd test && npm install && bower install && cd ..
npm i && bower i && npm start test/config.js

You can also use the test directory structure for a standalone project if you add electron-dataminer as a dependency in package.json with npm install --save electron-dataminer

Example

Look at the example package structure if you intend to develop or share reusable pageClass or api modules for electron-dataminer.

Instead of storing pageClass and api modules in the ./api and ./page directories as in the test directory, you have to create a separate npm package for each pageClass or api module (eg: electron-dataminer-example-page or electron-dataminer-example-api)

At this time you can specify only one module of each class per webview, but a module can require other modules.

cd example
npm i && bower i && npm start

Quickstart

You can either use the example or test directories as starting point or begin a new project with eg:

mkdir newProject
cd newProject
npm init
npm install --save electron-dataminer
echo "var edm=require('electron-dataminer');" > index.js

Then create an index.html containing a div with the id "webviews": <div id="webviews"></div>

And require jquery and electron-dataminer/renderer.js, eg:

<script>
var path=require('path');
window.jQuery=window.$=require('./bower_components/jquery/dist/jquery.js');
require(path.join(process.cwd(),'node_modules','electron-dataminer','renderer.js'));
</script>

Finally, write your configuration file (eg config.js), then test your application with npm start [<path to config.js>]

Learn more about electron-dataminer usage in test/config.js and test/page/my-page.js

Learn more about Electron and its API in the documentation.

License AGPLv3

Collaborators

  • bugdanov