Raivo Laanemets. Software consultant.

Dirty database for PhantomJS


Some of my freelance projects include web scraping with PhantomJS. The scraped data has to be stored somewhere, and in many cases there is not enough of it to justify building a dedicated application that pipes it into an SQLite, MySQL or MongoDB database.

For NodeJS there is a simple key-value database implementation called Dirty. It stores data in memory and logs modifications into an append-only file as single-line JSON strings in the following format:

{"key": "a-key", "val": ... }
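
Because every line is a self-contained JSON object, the current state of a database can be rebuilt by replaying the file from top to bottom. Later lines override earlier ones for the same key. The Node.js sketch below does that. It is not Dirty's own loader, the `replayLog` name is hypothetical, and it assumes that a line without a `val` field marks a removed key.

```javascript
// Minimal sketch: rebuild the key-value state from a Dirty-format log.
// Assumption: a row whose "val" is missing marks a deleted key.
function replayLog(text) {
  const data = new Map();
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines, e.g. after the final newline
    const row = JSON.parse(line);
    if (row.val === undefined) {
      data.delete(row.key); // assumed deletion marker
    } else {
      data.set(row.key, row.val); // later writes override earlier ones
    }
  }
  return data;
}

// Usage with a file on disk (Node.js only):
// const fs = require('fs');
// const db = replayLog(fs.readFileSync('test.dirty', 'utf8'));

const sample = [
  '{"key":"a-key","val":{"a":1}}',
  '{"key":"a-key","val":{"a":2}}',
  '{"key":"b-key","val":"x"}',
  '{"key":"b-key"}',
  ''
].join('\n');

const state = replayLog(sample);
console.log(JSON.stringify([...state])); // [["a-key",{"a":2}]]
```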

Unfortunately, Dirty cannot be used from PhantomJS, as PhantomJS has no support for Node-style asynchronous IO. Instead, PhantomJS supports CommonJS/Filesystem, which only provides synchronous IO. Therefore I built a package that mimics the Dirty API and saves data in a compatible format. Thanks to the compatible file format, the database can be read directly with Dirty to process the data on NodeJS, or simply loaded into anything that supports files and JSON.
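
To illustrate the idea, here is a minimal sketch of a store that produces a Dirty-compatible log using only synchronous file calls. It is not the source of phantomjs-dirty: the `openSync` name and the injected `fsLike` object are hypothetical. PhantomJS's `fs` module provides `exists(path)`, `read(path)` and `write(path, content, mode)`, so `require('fs')` could be passed in there. To keep the example runnable under Node.js, it uses a small in-memory stand-in instead. It sticks to `var` and function expressions because PhantomJS's JavaScript engine predates most ES2015 syntax.

```javascript
// Minimal sketch (hypothetical, not phantomjs-dirty's source) of a Dirty-style
// store that only needs synchronous file calls.
// fsLike must provide exists(path), read(path) and write(path, content, mode).
function openSync(fsLike, path) {
  var data = {};

  // Replay any existing log so the in-memory state matches the file.
  if (fsLike.exists(path)) {
    fsLike.read(path).split('\n').forEach(function (line) {
      if (line.trim() === '') return;
      var row = JSON.parse(line);
      if (row.val === undefined) {
        delete data[row.key]; // assumed deletion marker
      } else {
        data[row.key] = row.val;
      }
    });
  }

  return {
    set: function (key, val) {
      data[key] = val;
      // Append one JSON object per line, in the {"key": ..., "val": ...} format.
      fsLike.write(path, JSON.stringify({ key: key, val: val }) + '\n', 'a');
    },
    get: function (key) {
      return data[key];
    },
    forEach: function (fn) {
      Object.keys(data).forEach(function (key) {
        fn(key, data[key]);
      });
    }
  };
}

// In-memory stand-in for PhantomJS's fs module, so the sketch runs under Node.js.
var files = {};
var memFs = {
  exists: function (p) { return Object.prototype.hasOwnProperty.call(files, p); },
  read: function (p) { return files[p]; },
  write: function (p, content, mode) {
    files[p] = (mode === 'a' && files[p] !== undefined ? files[p] : '') + content;
  }
};

var db = openSync(memFs, 'test.dirty');
db.set('a-key', { a: 1 });
db.set('a-key', { a: 2 });

// Reopening replays the log; the last write for a key wins.
var reopened = openSync(memFs, 'test.dirty');
console.log(JSON.stringify(reopened.get('a-key'))); // {"a":2}
```

Appending instead of rewriting the file keeps each `set` cheap. The trade-off is that the file grows with every update, even when the same key is written repeatedly.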

The PhantomJS-dirty package can be installed from NPM:

npm install phantomjs-dirty

The following example demonstrates the API methods:

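var dirty = require('./node_modules/phantomjs-dirty');

var db = dirty.open('test.dirty');

db.set('a-key', { a: 1 });

// Stringify values so that objects are printed readably.
console.log(JSON.stringify(db.get('a-key')));

db.forEach(function(key, val) {
    console.log(key + ': ' + JSON.stringify(val));
});

// A PhantomJS script keeps running until phantom.exit() is called.
phantom.exit();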

PhantomJS does not resolve modules from the node_modules directory, so the directory must be included explicitly in the path passed to require.

P.S. In many cases it makes sense to use URLs as keys.

Warning: a single database file cannot be shared by multiple processes!

Note: I have no idea why commonjs.org is down at the moment (2015-04-21). I have tried to contact people about it.

