Building an OpenNLP application with Node Express

The objective of this post is to demonstrate how to integrate OpenNLP with Node Express. We're going to build an Express application called HashCat that categorizes hashtags. The source code for HashCat can be found on GitHub.

Getting Started

Generate a simple Express application. I used the Yeoman express-generator to generate mine, with EJS as the view engine.

express hashcat --ejs  

The next step is to install the npm modules that are going to be used.

npm install java --save  
npm install opennlp --save  

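If everything was done correctly, start the app (for an express-generator project, run npm install and then npm start) and navigate to localhost:port (I'm using port 3000). You should see the default Express welcome page in your browser.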

Training

We're going to need a trained model that categorizes our hashtags.

Download the categorized hashtag dataset from my publicly shared datasets on GitHub and rename it to hashcat.tsv. Then execute the following OpenNLP CLI command to train a model from it.

opennlp DoccatTrainer -model hashcat.bin -lang en -data hashcat.tsv -encoding UTF-8  

After 1-2 minutes the training should be complete and a hashcat.bin file should appear in the directory.
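
The OpenNLP document categorizer trainer expects one sample per line: the category first, then whitespace, then the text. If training fails or produces odd results, it can help to check the file before running the trainer. The following is a minimal sketch of such a check, not part of OpenNLP or the opennlp npm module. The function names and the sample rows are made up for illustration, and it assumes hashcat.tsv follows the category-then-text layout.

```javascript
// Hypothetical helper (not an OpenNLP API).
// Splits one training line into its category and its text.
function parseTrainingLine(line) {
    var match = /^(\S+)\s+(.+)$/.exec(line.trim());
    if (!match) return null; // blank line, or a category with no text
    return { category: match[1], text: match[2] };
}

// Returns the 1-based numbers of lines that do not look like
// "category<whitespace>text". Empty lines are ignored.
function findMalformedLines(contents) {
    var bad = [];
    contents.split(/\r?\n/).forEach(function (line, index) {
        if (line.trim() === '') return;
        if (!parseTrainingLine(line)) bad.push(index + 1);
    });
    return bad;
}

// Made-up sample rows, not taken from the real dataset.
var sample = 'technology\tsamsung\nfood\tbacon\nbroken';
console.log(findMalformedLines(sample)); // [ 3 ]
```

To run it against the real file, read hashcat.tsv with fs.readFileSync('hashcat.tsv', 'utf8') and pass the contents to findMalformedLines.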

API

Add one API route to the Express app.js. This route is going to handle the categorization requests. For simplicity's sake I have named my route api.

/*
app.js  
*/
var api = require('./routes/api');  
app.use('/api', api);  

In the routes folder, create the file api.js and insert the following code:

/*
routes/api.js  
 */
var express = require('express');  
var router = express.Router();  
var OpenNLP = require('opennlp');

/*
Point the doccat model to our trained NLP model.  
 */
var doccat = new OpenNLP({  
    models: {
        doccat: 'hashcat.bin'
    }
}).doccat


/*
Hashtag will be our api parameter  
 */

router.get('/:hashtag', function(req, res, next) {
    // Route parameters live on req.params (req.param() is deprecated).
    var hashtag = req.params.hashtag;
    // A missing parameter is a client error, so answer with 400.
    if (!hashtag) return res.status(400).json({
        error: "missing hashtag parameter"
    });

    /*
    Run categorization
     */
    doccat.categorize(hashtag, function(err, list) {
        if (err) return res.status(500).json({
            error: JSON.stringify(err)
        });

        /*
        Choose best category
         */
        doccat.getBestCategory(list, function(err, category) {
            if (err) return res.status(500).json({
                error: JSON.stringify(err)
            });
            return res.status(200).json({
                category: category
            });
        });
    });
});

module.exports = router;  
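
The route works in two steps: categorize scores the hashtag against every category, and getBestCategory picks the winner. Conceptually, the second step selects the highest-scoring category. The sketch below illustrates that idea in plain JavaScript. It is not the opennlp module's implementation, and the shape of the scores object is an assumption made for the example; the actual shape of the list returned by doccat.categorize is not shown in this post.

```javascript
// Hypothetical stand-in for the "choose best category" step.
// Takes an object mapping category name to score and returns the
// name with the highest score, or null when the object is empty.
function pickBestCategory(scores) {
    var best = null;
    Object.keys(scores).forEach(function (name) {
        if (best === null || scores[name] > scores[best]) best = name;
    });
    return best;
}

// Made-up scores for illustration only.
var scores = { technology: 0.81, food: 0.12, sports: 0.07 };
console.log(pickBestCategory(scores)); // technology
```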

Now you have an awesome API that can categorize hashtags. Try it out with the following calls. http://localhost:3000/api/samsung should be categorized as technology and http://localhost:3000/api/bacon should be categorized as food.
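
The same calls can be made from a script instead of the browser. Here is a minimal sketch of a client that uses only Node's built-in http module. The helper names buildCategoryUrl and fetchCategory are made up for this example, and it assumes the app is running locally on port 3000 and answers with the { category: ... } JSON shown in the route above. A leading # is stripped because everything after # in a URL is treated as a fragment and never reaches the server.

```javascript
var http = require('http');

// Hypothetical helper: builds the API URL for a hashtag.
// Strips a leading '#' and URL-encodes the remainder.
function buildCategoryUrl(baseUrl, hashtag) {
    var tag = hashtag.replace(/^#/, '');
    return baseUrl.replace(/\/+$/, '') + '/api/' + encodeURIComponent(tag);
}

// Hypothetical helper: calls the API and passes the category to the callback.
function fetchCategory(baseUrl, hashtag, callback) {
    http.get(buildCategoryUrl(baseUrl, hashtag), function (res) {
        var body = '';
        res.setEncoding('utf8');
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var parsed;
            try {
                parsed = JSON.parse(body);
            } catch (e) {
                return callback(e); // response was not valid JSON
            }
            callback(null, parsed.category);
        });
    }).on('error', callback);
}

// Usage (requires the HashCat app to be running locally):
// fetchCategory('http://localhost:3000', '#bacon', function (err, category) {
//     if (err) return console.error(err);
//     console.log(category);
// });
```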


You can view a live HashCat demo on Heroku.

https://hashcatdemo.herokuapp.com


HashCat on GitHub: https://github.com/mbejda/HashCat

Blog: www.mbejda.com

Twitter: @notmilobejda