samedi 19 juillet 2014

snapJob - Part VII : Client side javascript dependencies using Bower


Part VI of this project ("Usage of Apache Kafka to send logs to logstash and propagate application configuration changes") can be found here.


Do you know "npm" ? If you are familiar with node.js, you know it. npm is a package manager that eases the pain of finding, downloading, and updating all of your project's dependencies... All of your dependencies ? answer is : almost.
npm takes care of libraries you use in you server application. In other words, it doesn't take care of javascript dependencies you may have in the actual web pages.

Enter Bower !

It works pretty much the same way, so this tutorial will be (for once) pretty short.

Installation :

The first thing you have to do is install it. Bower is installed via npm. We will use the -g command line option to install bower :

sudo npm install -g bower


-g means we will install bower in the global repository of node.js. It means that this package will be available to all of our applications and from anywhere through the command line.
Why sudo ? Because npm will try to install into its own directory, which is (at least on my computer) "/usr/local/lib/node_modules". And writing to this directory requires root rights, so... sudo !

Usage :

A file named bower.json must be created in the root folder of your application.
In the case of snapJob, we chose to really separate the html files from the RESTFul API. In other words, we could make node.js serve the html files, but to serve html files, there are better options than node.js. We will serve files with nginx as described in the article "http://codeisanart.blogspot.fr/2014/07/snapjob-part-v-serving-files-with-nginx.html".

So we will just create an empty folder called snapjob with only one file in it, "bower.json". Just for the taste of it, we will add two dependencies : angular and jquery :
{
    "name": "snapJob",
    "version": "0.0.1",
    "dependencies": {
        "angular": "*",
        "jquery": "*"
    }
}

Now let's open a terminal, go to the application directory, and simply type this :
bower install
As a response, you should see something like this :
bower angular#*             not-cached git://github.com/angular/bower-angular.git#*
bower angular#*                resolve git://github.com/angular/bower-angular.git#*
bower jquery#*              not-cached git://github.com/jquery/jquery.git#*
bower jquery#*                 resolve git://github.com/jquery/jquery.git#*
bower angular#*               download https://github.com/angular/bower-angular/archive/v1.2.20.tar.gz
bower jquery#*                download https://github.com/jquery/jquery/archive/2.1.1.tar.gz
bower angular#*                extract archive.tar.gz
bower angular#*           invalid-meta angular is missing "ignore" entry in bower.json
bower angular#*               resolved git://github.com/angular/bower-angular.git#1.2.20
bower jquery#*                 extract archive.tar.gz
bower jquery#*                resolved git://github.com/jquery/jquery.git#2.1.1
bower angular#*                install angular#1.2.20
bower jquery#*                 install jquery#2.1.1

What does it mean ? It simply means that bower has created a "bower_components" folder, containing all of the javascript files you will include in your html file. In the next article, we will simply create an index.html file that will reference these javascript files as needed.

From now on, each time you want the latest angular or jquery libraries, you just have to run bower install the same way we did before.
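If, on the contrary, you want to lock the dependencies to known versions rather than always fetching the latest ones, bower.json can pin them with semver ranges. A minimal sketch (the version numbers below simply match the ones resolved in the output above) :
{
    "name": "snapJob",
    "version": "0.0.1",
    "dependencies": {
        "angular": "~1.2.20",
        "jquery": "~2.1.1"
    }
}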

Simple, easy, efficient...

Pretty nice, huh ? That wasn't so hard after all...

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

jeudi 17 juillet 2014

snapJob - Part VI : Usage of Apache Kafka to send logs to logstash and propagate application configuration changes


Part V of this project ("Serving files with nginx") can be found here.

What is Kafka ? I'll just quote their web site :
"Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
What does all that mean?
First let's review some basic messaging terminology:
  • Kafka maintains feeds of messages in categories called topics.
  • We'll call processes that publish messages to a Kafka topic producers.
  • We'll call processes that subscribe to topics and process the feed of published messages consumers.
  • Kafka is run as a cluster comprised of one or more servers each of which is called a broker.
So, at a high level, producers send messages over the network to the Kafka cluster which in turn serves them up to consumers like this:
Communication between the clients and the servers is done with a simple, high-performance, language agnostic TCP protocol. We provide a Java client for Kafka, but clients are available in many languages."

How will we use Kafka in snapJob :

  • First, to transport logs. The node.js app will act as a producer of messages, and we will have to set the input of logstash as a kafka receiver.
  • Second, to propagate configuration changes. In part III of this "saga" ("Storing and loading data with Apache CouchDB and node.js", see the section "the swagger ugly validator function"), we saw that we had a problem with configuration changes across application instances. With Kafka, if one application instance within our node.js cluster decides that the configuration needs to be updated, it will do so by sending a message through Apache Kafka. Every instance will act as a message consumer, and apply the configuration change when this message gets received.

Install and run :
Installation is pretty easy. First, download it, and then copy the files into /opt. This way, you should have a "/opt/kafka_2.9.2-0.8.1.1" directory.

Kafka uses Zookeeper to maintain consumer offsets. By that, I mean the position at which the consumer or consumer group currently is. To make an analogy, it's pretty much as if you borrowed a book at the library and returned it each time you stopped reading. Next time you feel like reading again, you borrow the book one more time, and ask the librarian (our zookeeper) at which page you stopped last time.

So we need to start ZooKeeper (don't worry, we don't need to install it, as it has been bundled with Kafka for a few versions now) :
cd /opt/kafka_2.9.2-0.8.1.1
sudo bin/zookeeper-server-start.sh config/zookeeper.properties

Next, we need to start kafka in another terminal :
cd /opt/kafka_2.9.2-0.8.1.1
sudo bin/kafka-server-start.sh config/server.properties

After the second command, you should see the first console display stuff and go crazy for a second. This is a good sign, it shows that Kafka successfully connected to zookeeper.

Let's create a topic called "logs" :
cd /opt/kafka_2.9.2-0.8.1.1
sudo ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 2 --replication-factor 1 --topic logs

And another one called "conf" :
cd /opt/kafka_2.9.2-0.8.1.1
sudo ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic conf

Note that you can see the list of your topics using the following command :
cd /opt/kafka_2.9.2-0.8.1.1
sudo ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe

And if you do, you will see something like this :
Topic:conf      PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: conf     Partition: 0    Leader: 0       Replicas: 0     Isr: 0
Topic:logs      PartitionCount:2        ReplicationFactor:1     Configs:
        Topic: logs     Partition: 0    Leader: 0       Replicas: 0     Isr: 0
        Topic: logs     Partition: 1    Leader: 0       Replicas: 0     Isr: 0

As you can see, we created two topics, one with two partitions, and another one with only one. This is because we expect the "logs" topic to be used far more often than the "conf" one.

To test our installation, we could start a consumer that listens to the "conf" topic :
cd /opt/kafka_2.9.2-0.8.1.1
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic conf --from-beginning

... and, start a producer in another terminal :
cd /opt/kafka_2.9.2-0.8.1.1
sudo bin/kafka-console-producer.sh --broker-list localhost:9092 --topic conf

If you type something in the producer console, you should see the same thing appear in the consumer console.

Producing log messages from node.js :
In our application, we will update util/log.js to produce messages to the Kafka server. We will use the kafka0.8 npm module.

So let's add the dependency to our package.json application :
{
    "name": "snapJobAPI",
    "version": "0.0.1",
    "dependencies": {
        "express": "*",
        "swagger-node-express": "*",
        "minimist":"*",
        "node-uuid":"*",
        "nodemailer":"*",
        "cradle":"*",
        "async":"*",
        "kafka0.8":"*"
    }
}

We will also add a "messageBroker.js" file in the "util" directory (the zookeeper connection string - localhost:2181 - will probably be removed from here to be part of a configuration file, but for now, we will keep it "as is"). The important stuff in this file is here :
var Kafka = require('kafka0.8')

// [...]

// Sends a simple message to kafka
this.sendMessage = function(message, topic, partition, callback) {
    var zookeeperClient = new Kafka.Zookeeper('localhost:2181');
    var serializer = new Kafka.Serializer.Json();
    var payloads = [
        {
            topic: topic,
            partition: partition,
            serializer: serializer,
            messages: [ message ]
        }
    ];
    var producer = new Kafka.Producer({
        zkClient: zookeeperClient,
        requiredAcks: 1
    }, function () {
        producer.produce(payloads, callback);
    });
};

It says we need to connect to ZooKeeper on localhost:2181, use a Json serializer, and produce a message on a specified topic.
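As a quick usage sketch (assuming the file exports a singleton the same way the other "util" classes do, and an err-first callback), sending a configuration object to the "conf" topic would look like this :
var messageBroker = require('./util/messageBroker');

// Produce a Json message on partition 0 of the "conf" topic
messageBroker.sendMessage({ maxResults: 50 }, 'conf', 0, function (err) {
    if (err) console.log('message could not be produced :', err);
    else console.log('message produced');
});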

Next, we need to update the "/util/log.js" source file. Instead of sending to logstash as we previously did in the article "part II : Managing application logs using LogStash", we now can do this :
messageBroker.sendMessage(
    {
        message: message,
        request: cleanedRequest,
        level: level,
        workerId: cluster.worker ? cluster.worker.id : undefined,
        hostname: os.hostname()
    },
    'logs',
    cluster.worker ? cluster.worker.id % 2 : 0,
    callback);

The first argument is the Json message itself, and we will publish to the topic "logs", on partition 0 if the application is launched using "node snapJob.js", and on partition 0 or 1 if launched from "node app.js" (see part IV : Scaling node.js, and ensuring high availability using "cluster"). Kafka guarantees message order within a single partition. If we use 0, or the cluster worker id modulo 2 when it is available, then we ensure that all logs will be properly ordered for each application thread. This is a nice trick.

Consuming logs from Kafka to LogStash :
  1. First, see "part II : Managing application logs using LogStash" to know what logstash is, and to know how to launch it.

  2. Logstash does not know how to read kafka message streams by itself. First, we need to give it a ruby gem, a plugin, to make it listen to kafka. The plugin and its documentation can be found here.

    But as I know you are as lazy as I am, here are the few command lines I used to install the plugin :
    wget https://github.com/joekiller/logstash-kafka/archive/v0.5.2.tar.gz
    
    tar -zxvf v0.5.2.tar.gz
    
    cd logstash-kafka-0.5.2
    
    sudo mkdir /opt/logstash-1.4.2/vendor/jar/kafka_2.8.0-0.8.1
    
    sudo cp -r /opt/kafka_2.9.2-0.8.1.1/libs /opt/logstash-1.4.2/vendor/jar/kafka_2.8.0-0.8.1
    
    sudo cp -r ./lib/* /opt/logstash-1.4.2/lib
    
    GEM_HOME=/opt/logstash-1.4.2/vendor/bundle/jruby/1.9 GEM_PATH= java -jar /opt/logstash-1.4.2/vendor/jar/jruby-complete-1.7.11.jar --1.9 ~/logstash-kafka-0.5.2/gembag.rb ~/logstash-kafka-0.5.2/logstash-kafka.gemspec
    

  3. Our logstash configuration file, "/opt/logstash-1.4.2/snapJobLogs.conf", currently looks like this :
    input {
     tcp { port => 28777 type=>"log" }
    }
    output {
     elasticsearch { host => localhost }
    }
     
    filter {
     json {
      source => "message"
     }
    }
    

    We will update it to something like this :
    input {
     kafka {
      zk_connect => "localhost:2181"
      group_id => "logstash"
      topic_id => "logs"
      reset_beginning => false
      decorate_events => false
     }
    }
    output {
     stdout { codec => rubydebug }
     elasticsearch {
      host => localhost
      port => 9300
     }
    }
    
    The stdout output is optional, but it will let us see the messages streaming to logstash in the console.

  4. Now let's run logstash :
    bin/logstash -f snapJobLogs.conf web
    

    The result will be the same as before, except this time, we will be able to filter messages by cluster worker id, by hostnames, ...

Consuming Kafka messages from node.js :
The important part is in "/util/messageBroker.js" :
var _this = this;

var zookeeperClient = new Kafka.Zookeeper('localhost:2181');

var kTransport = new Kafka.Transport({
    zkClient: zookeeperClient
});
var serializer = new Kafka.Serializer.Json();

var worker = cluster.worker ? 'consumer' + cluster.worker.id + '-' + os.hostname() : 'defaultconsumer-' + os.hostname();
var consumer = new Kafka.Consumer({
    clientId: worker,
    group: worker,
    store: new Kafka.Store.Zookeeper(kTransport),
    payloads: [
        {                                        /* see 'Payloads' section for more advanced usages */
            topic: 'conf',
            partition: [0],
            serializer: serializer            /* we will parse json, see 'Serializer' section */
        }
    ],
    transport: kTransport
}, do_consume);

function do_consume() {
    consumer.consume(
        function(msg, meta, next) {
            _this.emit(meta.topic, msg);
            /* commit offset to offset store and get next message */
            next();
        },
        function() {
        },
        function(err) {
            setTimeout(do_consume, 1000);
        }
    )
}

Initialization, from the zooKeeper client to the serializer, is pretty much the same as what we did for the producer.
As you can see, there are 4 key points following the initialization :

  1. When creating the consumer, we give it a payload, which says we want to consume partition 0 of the topic "conf" (remember ? we created only one partition for this topic). But most of all, we also pass the function do_consume to the constructor of the consumer.
  2. Then comes the do_consume function.
  3. We then "read" the topic by calling consumer.consume, which takes 3 functions in the arguments list :
    • eachCallback(msg, meta, next) : executed for each message consumed. msg is the deserialized message, meta is { topic, offset, partition }, and next commits the offset and fetches the next message in the message set (YOU HAVE TO CALL IT)
    • doneCallback(): executed at the end of message set
    • endCallback(err): executed when everything has been consumed or if fetch request timed out
  4. The last key point is in the endCallback function, where we call the do_consume function again, after a certain amount of time.
For those who carefully read the previous lines of code, you can see that there is this line :
_this.emit(meta.topic, msg);

This is the last bullet in my gun for the current article : the messageBroker is an event emitter. This means it inherits from EventEmitter, and doing this is pretty easy in node.js :
var EventEmitter = require('events').EventEmitter;
// [...]
util.inherits(MessageBroker, EventEmitter);
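
As a minimal, self-contained sketch of the pattern (the names here are illustrative, not the exact snapJob sources) :
var util = require('util');
var EventEmitter = require('events').EventEmitter;

function MessageBroker() {
    EventEmitter.call(this); // initialize the EventEmitter internals
}
util.inherits(MessageBroker, EventEmitter);

var broker = new MessageBroker();
broker.on('conf', function (data) {
    console.log('configuration event received :', data);
});
broker.emit('conf', { updated: true });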

To test this, I updated "/util/globals.js" to add these lines of code :
this.pushConfigurationToBroker = function(callback){
    messageBroker.sendMessage(this.configuration, 'conf', 0, callback);
};

var _this = this;
messageBroker.on('conf', function(data){
    _this.configuration = data;
    var worker = cluster.worker ? ' ' + cluster.worker.id : '';
    console.log('conf updated' + worker);
});

I also updated the 'models/test.js' source code file to call the pushConfigurationToBroker method :
var globals = require("./../util/globals");

// [...]

globals.pushConfigurationToBroker();

And now, when I click the test button in my swagger interface, I can see this :
conf updated 5
conf updated 8
conf updated 7
conf updated 1
conf updated 2
conf updated 3
conf updated 6
conf updated 4
This is the proof that configuration changes have been propagated to all of our node.js application instances.
Important note: To test the source code, you need to run the command "node app.js --masterAPIKey=somekey --port=8080" after you have deployed the swagger files from the "nginx files" folder to nginx (see "part V : Serving files with nginx"), browsed to http://localhost/api, and set the api_key to "somekey".


Pretty nice, huh ? That wasn't so hard after all...

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

Next part : snapJob - Part VII : Client side javascript dependencies using Bower

mardi 15 juillet 2014

snapJob Part V : Serving files with nginx


Part IV of this project ("Scaling node.js, and ensuring high availability using "cluster"") can be found here.

The architecture behind the web site we intend to create requires a front end application, which will be a single webpage using bootstrap, angular.js, and other stuff. This page will call a web service, so we are not in a regular php architecture, where a web page is processed and then returned to the browser. Here, we serve a static web page that does all the work of binding data to the view on the client side.
It means that we only need to serve static files, such as html files, javascript files, images, ...

node.js can serve static pages... but there is a better way to do it : use a real file server, that does caching and all other stuff : nginx.

Our current RESTFul API uses swagger to provide a nice and clean web interface for testing our service. But the html file provided by swagger, the javascript files, and other images can be moved to a file server, leaving only the real processing job to node.js.

So let's install nginx (on Ubuntu) :
sudo apt-get install nginx

The root folder for websites served by nginx is, by default, located in /usr/share/nginx/html. We will simply add a subfolder called api here, and copy the following files into it with admin rights :

  • css/*
  • images/*
  • lib/*
  • index.html
  • o2c.html

As we will serve static files on a different http port than the node.js server, we need to change one thing in the index.html file. Replace :
    $(function () {
      window.swaggerUi = new SwaggerUi({
      url: "/api-docs",
      dom_id: "swagger-ui-container",
      supportedSubmitMethods: ['get', 'post', 'put', 'delete'],
      onComplete: function(swaggerApi, swaggerUi){
with :
    $(function () {
      window.swaggerUi = new SwaggerUi({
      url: "http://localhost:8080/api-docs",
      dom_id: "swagger-ui-container",
      supportedSubmitMethods: ['get', 'post', 'put', 'delete'],
      onComplete: function(swaggerApi, swaggerUi){

Then restart nginx :
service nginx restart

Now if we browse http://localhost/api/, we can see this :
"Can't read from server. It may not have the appropriate access-control-origin settings."

It means we have to allow external applications, such as the swagger ui html file, to access our RESTFul API in the node.js application.

To do this, we have to add these lines in snapJob.js :
swagger.setHeaders = function setHeaders(res) {
    res.header("Access-Control-Allow-Origin", "*");
    res.header("Content-Type", "application/json; charset=utf-8");
};

The first header is the important one; the second is less so, but as we serve json content, it's a good idea to keep it.

Also, these lines were removed, as our node.js application will not serve any static files anymore :
app.use('/lib', express.static(path.join(__dirname, 'lib')));
app.use('/css', express.static(path.join(__dirname, 'css')));
app.use('/images', express.static(path.join(__dirname, 'images')));

and this one too :
app.get('/', function (req, res) {
    res.sendfile(__dirname + '/index.html');
});

Now if we hit http://localhost/api, we have exactly the same behaviour as before, except that static files are served by nginx, which is better because nginx just rocks :)

Pretty nice, huh ? That wasn't so hard after all...

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

Next part : snapJob - Part VI : Usage of Apache Kafka to send logs to logstash and propagate application configuration changes

lundi 14 juillet 2014

snapJob Part IV : Scaling node.js, and ensuring high availability using "cluster"



Part III of this project ("Storing and loading data with Apache CouchDB and node.js") can be found here.

By nature, node.js is single threaded. Whether you have a 2, 4, 8, 16 (or more) core architecture, you will not take full advantage of it.

Here comes cluster !

This is a node.js api well described on other websites, such as here, and there.

Cluster allows an existing application to be scaled on a single machine, across every processor core it has. So this is "mono-machine scaling".
It acts as a load-balancer, allowing multiple instances of our application to share the same port for handling web requests.

implementing "cluster" doesn't even require any refactoring at all. In fact, the only thing I did in the snnapJob RESTFul API was to rename app.js as snapJob.js, and recreate the app.js file with the following content :

var cluster = require('cluster');

if (cluster.isMaster) {
    var cpuCount = require('os').cpus().length;

    // Create a worker for each CPU
    for (var i = 0; i < cpuCount; i += 1) {
        cluster.fork();
    }
    cluster.on('exit', function (worker) {
        console.log('Worker ' + worker.id + ' died !');
        cluster.fork();
    });
} else {
    require('./snapJob');
}

Note : I also added a reference to the new dependency "cluster" in the package.json file and ran the "npm install" command.

Now, if I run this :
/usr/bin/node app.js

The node.js output shows (I have 8 cores on my machine) :
snapJob API running on http://localhost:8080 on worker 1
snapJob API running on http://localhost:8080 on worker 8
snapJob API running on http://localhost:8080 on worker 6
snapJob API running on http://localhost:8080 on worker 7
snapJob API running on http://localhost:8080 on worker 2
snapJob API running on http://localhost:8080 on worker 4
snapJob API running on http://localhost:8080 on worker 3
snapJob API running on http://localhost:8080 on worker 5


Cluster first runs a master process. It gets the number of CPUs on the machine, and creates a sub process for each CPU. If one of the processes exits, it is recreated. If we didn't do that, we might end up with a master process that load balances requests to dead child processes, and the application would not work anymore.

Obviously, cluster.isMaster is true only once, for the master process. For every child, I just call require('./snapJob'), which will simply reload the application.

Now if I run this :
/usr/bin/node snapJob.js

Then I just run the application the way it was before, on a single thread, and the output is :
snapJob API running on http://localhost:3000


Testing cluster
To be sure that requests are correctly load-balanced across all processes, I just added something new in the /util/log.js file :

var cluster = require('cluster');


and
    this.cleanRequest = function(req, callback){
        if(callback) callback(
            req === undefined ?
                undefined :
                {ip: req.ip,
                url: req.url,
                workerId : cluster.worker ? cluster.worker.id : undefined,
                api_key: req.headers["api_key"]});
    };

In other words, if we are in a context where we are clustered (so if we launched the application using the "node app.js" command line and not the "node snapJob.js" command line), then we save on which worker the log has been invoked.

Just run the application as described in the previous articles, and go to Kibana to watch the logs :



As you can see, workers 6, 8, 4, 5, and 1 were called, proving that load balancing has been correctly done.

Another test you can see in the application sources consists of explicitly exiting the application when the test method located in models/test.js is called :
exports.dummyTestMethod = {
    'spec': {
        description : "Test method",
        path : "/test/",
        method: "GET",
        summary : "This is a dummy test method",
        type : "void",
        nickname : "dummyTestMethod",
        produces : ["application/json"]
    },
    'action': function (req, res) {
        logger.logInfo('dummyTestMethod called', req);
        // Kill the current worker on purpose : the goal is to check that the master respawns it
        process.exit(1);
        res.send(JSON.stringify("test is ok"));
    }
};

The goal here is to see if the child process is correctly recreated when it dies. Calling the dummy test method via the swagger interface (http://localhost:8080/#!/test, remember to run the app with "node app.js --masterAPIKey=06426e19-d807-4921-a668-4708287d8878", put the masterAPIKey in the text field on the top right of the swagger interface, and click explore before you try to call the test method) produces the following output in the node.js command prompt :
snapJob API running on http://localhost:8080 on worker 5
snapJob API running on http://localhost:8080 on worker 2
snapJob API running on http://localhost:8080 on worker 4
snapJob API running on http://localhost:8080 on worker 6
snapJob API running on http://localhost:8080 on worker 1
snapJob API running on http://localhost:8080 on worker 3
snapJob API running on http://localhost:8080 on worker 8
snapJob API running on http://localhost:8080 on worker 7
Worker 3 died !
Worker 8 died !
Worker 7 died !
snapJob API running on http://localhost:8080 on worker 10
snapJob API running on http://localhost:8080 on worker 9
snapJob API running on http://localhost:8080 on worker 11
Worker 11 died !
snapJob API running on http://localhost:8080 on worker 12

Which means dying processes are correctly re-instantiated :)

Pretty nice, huh ? That wasn't so hard after all...

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

Next part (snapJob Part V : Serving files with nginx) can be found here

samedi 12 juillet 2014

snapJob Part III : Storing and loading data with Apache CouchDB and node.js

Part II of this project ("Managing application logs using LogStash") can be found here.

How cool and relaxed do you think this guy is, lazing on its couch ?

In this article, we will try to access our CouchDB NoSQL database from our node.js application with a really simple example.

Why is CouchDB good in our case ? Because it keeps track of revisions on a document ! I have the idea that being able to view previous versions of an applicant's curriculum vitae on the web site may be a really nice feature to have. On top of this, this is Apache... this is open source !

But first of all, let's install it ! As I am on Ubuntu, let's just type the following command in a terminal :
sudo apt-get install couchdb

Now let's open our favorite browser and go to http://localhost:5984/


Fantastic ! It works !
Even better, browse http://localhost:5984/_utils/ :


Let's not create any databases, to see what happens next.

In our node.js application, we will use cradle, a node.js API that wraps CouchDB calls.

The very first thing we'll do is manage the API Keys that allow access to our RESTFul service. Each time an application tries to access the RESTFul web service, an API key must be provided. This will allow us to keep track of which application tries to access which data, and also to block access to unauthorized applications. The API Key (a guid) must be provided via http request headers each time any application tries to access the service.

The only two functions that will not require any API Key will be the one for requesting a key, and the one to activate a key.

So the logic will be the following :

  • A user calls the requestAPIKey function, providing an email and an application name;
  • The system generates a new key, and a confirmation key, and stores the key with the confirmation status as "pending";
  • An email is sent to the developer, with an html link to the method that activates the key. This method takes two arguments, the API key itself and the confirmation key. This way, we can be sure that the email provided while requesting the key is valid;
  • Once the developer opens this mail and clicks the link, the key is activated, and he can start playing with the api.

During that time, when the developer accesses the API via swagger, he provides no key. So only the two functions (request key and activate key) are available.
If he provides the key without having it activated, the same behavior applies.
If the key has been properly activated, then he can access any of the RESTFul API functions.
On top of this, we want to be able to run the node.js application with a --masterAPIKey parameter. This master key will have access to any functions we want from the RESTFul web service. This may be handy later on, and will not be that difficult to implement.
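
A minimal sketch of how such a parameter can be read with minimist (the actual variable names used in globals.js may differ) :
// node app.js --masterAPIKey=somekey --port=8080
var argv = require('minimist')(process.argv.slice(2));
var masterAPIKey = argv.masterAPIKey;
if (!masterAPIKey) console.log('No --masterAPIKey=xxx specified, master key access is disabled.');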

... But first, let's do a brief refactoring...

We have a new folder, called "api". This folder will contain singleton classes for each "application domain". Here, we have a very first one : API Keys management. So this class will contain our two methods : requestAPIKey and confirmAPIKey. The methods in this kind of class are not intended to actually send a response to the browser. They just do the job of storing data and performing other logical operations.

In the "models" folder, we expose only the swagger methods and here we do the job of sending back the response to the browser or any other application that calls the RESTFul api.

We also have a few new files in the "util" folder :
- database.js : a class that will wrap all of our database calls
- globals.js : a class that will contain miscellaneous application parameters such as the application domain and port, ...
- mail.js : a class to wrap email creation and sending (this part does not work yet, but we'll see that later on)

The database.js file
This is the main focus for this article. So take a look at it, it's not that hard to understand :)

var cradle = require('cradle');
var logger = require('./log');
var async = require('async');

var Database = function Database() {
    var _this = this;
    // Connect to the apikeys database...
    this.ConnectApiKeysDb = function(callback) {
        _this.apiKeysDb = new (cradle.Connection)('127.0.0.1', 5984).database('apikeys');
        if(callback) callback();
    };

    this.ConnectAllDb = function(callback){
        async.parallel([
            _this.ConnectApiKeysDb
        ], callback)
    };

    this.ensureApiKeysDbExists = function(callback) {
        // if the apikeys database does not exist, let's create it...
        _this.apiKeysDb.exists(
            function (err, exists) {
                if (err) {
                    logger.logError(err);
                    if(callback) callback();
                } else if (!exists) {
                    logger.logWarning('database apikeys does not exist and will be created.');
                    _this.apiKeysDb.create();
                    // ... and create a view to request all API keys, and confirmed ones...
                    _this.apiKeysDb.save('_design/apikeys', {
                        all: {
                            map: function (doc) {
                                if (doc.name) emit(doc.apiKey, doc);
                            }
                        },
                        confirmed: {
                            map: function (doc) {
                                if (doc.apiKey && doc.confirmationStatus == 'confirmed') {
                                    emit(doc.apiKey, doc);
                                }
                            }
                        }
                    }, callback);
                } else {
                    // the database already exists, nothing to create here
                    if(callback) callback();
                }
            }
        );
    };

    this.ensureAllDatabasesExists = function(callback){
        async.parallel([
            _this.ensureApiKeysDbExists
        ], callback)
    };

    // Saves an entry in the apikeys database.
    this.saveToApiKeys = function(key, data, callback) {
        _this.apiKeysDb.save(key, data,
            function (err, res) {
                if (err) logger.logError(err);
                if (callback) callback(err, res);
            }
        );
    };

    // Gets an entry from it's key from the apikeys database.
    this.getFromApiKeys = function(key, callback){
        _this.apiKeysDb.get(key, function (err, doc) {
            if (err) logger.logError(err);
            if(callback) callback(err, doc);
        });
    };

    // Gets entries from a view in the apikeys database.
    this.viewFromApiKeys = function(view, callback){
        _this.apiKeysDb.view(view, function (err, res) {
            if (err) logger.logError(err);
            if(callback) callback(err, res);
        });
    };
};

Database.instance = null;

/**
 * Singleton getInstance definition
 * @return singleton class
 */
Database.getInstance = function(){
    if(this.instance === null)
        this.instance = new Database();
    return this.instance;
};

module.exports = Database.getInstance();

As you can see, a new reference to the "cradle" api is used, which means we need to update our package.json file, and run the "npm install" command to download this new dependency.
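
As a quick usage sketch of this wrapper (the key and data below are just placeholders), storing a document and reading it back would look like this :
var database = require('./util/database');

database.ConnectApiKeysDb(function () {
    database.saveToApiKeys('demo-key', { applicationName: 'demo', confirmationStatus: 'pending' }, function (err) {
        if (err) return;
        database.getFromApiKeys('demo-key', function (err, doc) {
            console.log(doc);
        });
    });
});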

In the app.js file, I just call what is needed to ensure that the databases exist, are connected, and even load the api keys using the view we created, by calling the refreshConfirmedApiKeys function :
database.ConnectAllDb(function(){
    database.ensureAllDatabasesExists(function(){
        apikey.refreshConfirmedApiKeys();
    });
});

So if you run the app, you can see this :

You can even see that we already have one document... but what is this document ?!?

Let's dig a little bit :

It's the view we've created !

Let's have a closer view :

Do you recognize that code ? It's the code we have in the database.js file, where we created the view !
So views are considered as documents !

So now, let's use our database.js file...

The api/apikey.js file
var globals = require("./../util/globals")
    , uuid = require('node-uuid')
    , database = require("./../util/database")
    , mailer = require("./../util/mail");

var ApiKey = function ApiKey() {
    // Sends a request for a new API key.
    this.requestAPIKey = function(applicationName, email, callback) {
        var data = {
            apiKey: uuid.v1(),
            confirmationKey: uuid.v4(),
            confirmationStatus: 'pending',
            applicationName: applicationName,
            email: email
        };

        var confirmationUrl = globals.applicationUrl + '/apikey/' + data.apiKey + '/' + data.confirmationKey;
        console.log(confirmationUrl);

        mailer.sendMail(
            'noreply@snapjob.com',
            data.email,
            'snapJob - Your API Key is here !',
            'Dear developer, as requested, here is your api key. This will not be valid until you activate it at the following address : ' + confirmationUrl,
            'Dear developer, as requested, here is your api key. This will not be valid until you activate it at the following address : <a href="' + confirmationUrl + '">' + confirmationUrl + '</a>'
        );

        database.saveToApiKeys(data.apiKey, data,
            function(err, res) {
                if(err) {
                    if (callback) callback(503, JSON.stringify(err));
                } else {
                    if (callback) callback(200, JSON.stringify('ApiKey for application "' + data.applicationName + '" is "' + data.apiKey + '". A confirmation email has been sent to "' + data.email + '".'));
                }
            }
        );
    };

    // Confirms an API key
    this.confirmAPIKey = function(apiKey, confirmationKey, callback) {
        var _this = this;
        database.getFromApiKeys(apiKey,
            function(err, doc) {
                if (err) {
                    if (callback) callback(503, err);
                } else {
                    if (doc == undefined) {
                        if (callback) callback(404, 'api-key not found');
                    } else {
                        if (doc.confirmationKey !== confirmationKey) {
                            if (callback) callback(403, 'Confirmation key is not correct');
                        }else {
                            switch (doc.confirmationStatus) {
                                case 'pending':
                                    doc.confirmationStatus = 'confirmed';
                                    database.saveToApiKeys(apiKey, doc,
                                        function (err) {
                                            if (err) {
                                                if (callback) callback(503, err);
                                            }else {
                                                if (callback) callback(200, 'API key is now active');
                                                _this.refreshConfirmedApiKeys();
                                            }
                                        }
                                    );
                                    break;
                                case 'banned':
                                    if (callback) callback(403, 'API key has been banned and cannot be reused');
                                    break;
                                case 'confirmed':
                                    if (callback) callback(403, 'API key has already been confirmed');
                                    break;
                            }
                        }
                    }
                }
            }
        );
    };

    this.confirmedApiKeys = [];

    // Refreshes all confirmed API keys and puts them in the confirmedApiKeys array.
    this.refreshConfirmedApiKeys = function(){
        var _this = this;
        var newResult = [];
        database.viewFromApiKeys('apikeys/confirmed',
            function(err, res) {
                if(res !== undefined) {
                    res.forEach(
                        function (row) {
                            newResult.push(row.apiKey);
                        }
                    );
                    _this.confirmedApiKeys = newResult;
                }
            }
        );
    }
};

ApiKey.instance = null;

/**
 * Singleton getInstance definition
 * @return singleton class
 */
ApiKey.getInstance = function(){
    if(this.instance === null)
        this.instance = new ApiKey();
    return this.instance;
};

module.exports = ApiKey.getInstance();

There you can see that we have 3 functions :
- One that saves an API key request;
- Another one that confirms an API key (called after clicking on the link in the mail sent to the developer);
- And the last one takes advantage of the couchDB view we created to load all confirmed API keys into memory.

app.js, the swagger ugly validator function

Swagger has been described in the previous article. It has a feature that is as interesting as it is ugly : The validator function.

Let's take a look at this function :

// Adding an API validator to ensure the calling application has been properly registered
swagger.addValidator(
    function validate(req, path, httpMethod) {
        // refresh all confirmed api keys...
        apikey.refreshConfirmedApiKeys();

        // requests on the /apikey/ path are always allowed (key request and key confirmation)
        if(path.match('/apikey/*')){
            logger.logInfo('API call for the /apikey/ path allowed', req);
            return true;
        }

        var apiKey = req.headers["api_key"];
        if (!apiKey) {
            apiKey = url.parse(req.url,true).query["api_key"];
        }

        if(!apiKey) return false;

        if (apiKey === globals.masterAPIKey) {
            logger.logInfo('API call allowed by master key', req);
            req.clientName = 'master api-Key';
            return true;
        }

        return apikey.confirmedApiKeys.indexOf(apiKey) > -1;
    }
);

What I find ugly here is that you provide a synchronous method that returns true or false. If you try to access a database, or any other function that takes a callback as a parameter, the asynchronous philosophy of node.js is broken. Here, for example, if I try to query my database, the result of the call will only come back after we have exited the validation function, which means we cannot use it to return true or false. This is why, here, I just look at the pre-loaded confirmed API keys.
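
To illustrate the problem, here is what happens if you naively try to hit the database from inside the validator (illustrative code only, not something to ship) :
swagger.addValidator(
    function validate(req, path, httpMethod) {
        var allowed = false;
        database.viewFromApiKeys('apikeys/confirmed', function (err, res) {
            // this callback fires AFTER validate() has already returned...
            allowed = true;
        });
        // ... so at this point "allowed" is still false, whatever the database says
        return allowed;
    }
);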

But if you think about it, this is a RESTFul web service that is intended to be load balanced across several servers. Even if confirmAPIKey in the api/apikey.js file updates the array, you have no guarantee that the next call, using the freshly confirmed api key, will be processed on the same server, which means the confirmedApiKeys array in that file may not be up to date everywhere... This is why, to contain this problem, I just reload the confirmed API keys asynchronously at every call. So in theory, the problem should only occur a very few times, but I have no other solution at the moment.

Another thing you can see in this method is that it returns true for every path that starts with "/apikey/" (yes, it's a regular expression).

If the master key (provided via the command line application parameters) is supplied, we always return true.

If the key provided is in the array containing the confirmed keys, then we also return true.

Testing

Let's start the node.js application this way :
node app.js --masterAPIKey=06426e19-d807-4921-a668-4708287d8878

If you browse to http://localhost:8080/, you can see that you can expand the apikey methods, but not the test method.

If you put your master key in the field at the top right of the window and click "explore", then you can see all functions.

Now let's request a new API key. Call the POST method and provide a name and an email :

Click "Try it out!".

Your node.js console should provide you a link. For me, it is http://localhost:8080/apikey/12440de0-079c-11e4-81ff-db35b05bd397/0762362b-a7dd-4309-9e4e-e52f89ab4ec4, but it will be something else for you, as it includes generated keys.

Also, in your database, you should see your pending key request :

Now if you follow the link that appeared in your node.js console, it should confirm the API key, and your browser should say : API key is now active

Yeay ! We did it ! That wasn't so bad, was it ? :)

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

Next part : snapJob Part IV : Scaling node.js, and ensuring high availability using "cluster"

vendredi 11 juillet 2014

snapJob Part II : Managing application logs using LogStash


Part I of this project ("Creating a simple RESTFull API that rocks! REST in peace!") can be found here.

How does logstash work ?!?

Logstash takes one or several inputs as streams of incoming messages, and produces an output to display or store the messages.

We could specify the stdin as an input, which means that stuff typed in our console will be caught by logstash. But this ain't the best solution.

We could specify Apache Kafka (a messaging system) to be our input, which could be very efficient, but a little bit more complex to configure and implement.

So, as a first try, we will specify a tcp input on port 28777, and just use an npm API, winston-logstash.

The output should eventually be elasticsearch, but for a first try, let's output to the console.

In this example, we will use logstash 1.4.2, installed in the directory "/opt".

  1. Configuring LogStash
    We will create a simple configuration file called snapJobLogs.conf in the "/opt/logstash-1.4.2" directory, containing the following text :
    input {
     tcp { port => 28777 type=>"log" }
    }
    output {
     stdout { codec => rubydebug }
    }
    
    filter {
     json {
      source => "message"
     }
    }
    


  2. Running LogStash
    Really simple. Just run these commands :
    cd /opt/logstash-1.4.2
    bin/logstash -f snapJobLogs.conf
    


  3. Modifying our node.js application
    Let's start by adding a new dependency to our project, in the package.json file :
    "winston-logstash":"*"
    

    Next, create a new "log.js" file, located in a subdirectory in our application called "util" :
    var winston = require('winston');
    
    // Requiring `winston-logstash` will expose `winston.transports.Logstash`
    require('winston-logstash');
    
    var Logger = function Logger() {
        this.logger = new (winston.Logger)({
            transports: [
                new (winston.transports.Logstash)({
                    port: 28777,
                    node_name: 'snapJob',
                    localhost: 'localhost',
                    pid: 12345 ,
                    ssl_enable: false,
                    ca: undefined
                })
            ]
        });
    
        this.cleanRequest = function(req, callback){
            if(callback) callback(
                req === undefined ?
                    undefined :
                    {ip: req.ip, url: req.url});
        }
    
        // Logs an info message
        this.logInfo = function(message, req, callback) {
            var _this = this;
            this.cleanRequest(req, function(cleanedRequest, callback){
                _this.logger.log('info', {message: message, req: cleanedRequest}, {stream: 'log'}, callback);
            });
        }
    };
    
    Logger.instance = null;
    
    /**
     * Singleton getInstance definition
     * @return singleton class
     */
    Logger.getInstance = function(){
        if(this.instance === null)
            this.instance = new Logger();
        return this.instance;
    };
    
    module.exports = Logger.getInstance();
    


    This will create a singleton class that will be used to handle logging.
    It is a good idea to wrap this in a new class, keeping the dependency on "winston" in a single file : if later on we want to produce messages to kafka and set the logstash input to a kafka topic, we will only have one file to update.


  4. Produce a new log from our application
    To do that, we just have to add a few lines in our app.js file :

    var express = require("express")
        , swagger = require("swagger-node-express")
        , path = require('path')
        , argv = require('minimist')(process.argv.slice(2))
        , test = require("./models/test")
        , models = require("./models/models")
        , logger = require("./util/log")
        , app = express();
    
    
    // [...]
    // File truncated for reading purposes
    // [...]
    
    // Log application start
    logger.logInfo('snapJob API running on ' + applicationUrl);
    
    // Start the web server
    app.listen(port);
    


    Let's also update our models/test.js to add informational logs :
    var logger = require("./../util/log");
    
    exports.dummyTestMethod = {
        'spec': {
            description : "Test method",
            path : "/test/",
            method: "GET",
            summary : "This is a dummy test method",
            type : "void",
            nickname : "dummyTestMethod",
            produces : ["application/json"]
        },
        'action': function (req, res) {
            logger.logInfo('dummyTestMethod called', req);
            res.send(JSON.stringify("test is ok"));
        }
    };
    
  5. Run the application
    First, we need to update our npm dependencies because we added a new dependency to our node.js app :
    npm install
    

    Then we need to run the application :
    node app.js
    
  6. Application outputs
    At launch, in the console where you previously launched logstash, you should see something like this (logstash should have been launched before you started the node.js application) :
    yoann@LYnux:/opt/logstash-1.4.2$ bin/logstash -f snapJobLogs.conf 
    Using milestone 2 input plugin 'tcp'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
    Using milestone 2 filter plugin 'json'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
    {
           "message" => "{ message: 'snapJob API running on http://localhost:8080',\n  req: undefined } { stream: 'log' } undefined",
          "@version" => "1",
        "@timestamp" => "2014-07-08T21:39:00.160Z",
              "host" => "127.0.0.1:34602",
              "type" => "log",
             "level" => "info"
    }
    


    Now, if you click on the "Try it out!" button, located here (see previous article) http://localhost:8080/#!/test/dummyTestMethod, you should see a new log line in the logstash console :
    {
           "message" => "{ message: 'dummyTestMethod called',\n  req: { ip: '127.0.0.1', url: '/test/' } } { stream: 'log' } undefined",
          "@version" => "1",
        "@timestamp" => "2014-07-08T21:41:14.092Z",
              "host" => "127.0.0.1:34602",
              "type" => "log",
             "level" => "info"
    }
    


    Yeay ! \o/
    We did it !
    Next step ? Output to elasticsearch instead of the console !


  7. Install elasticsearch
    On Ubuntu, pretty easy step : just download and install the debian package !
    If you're lucky, you should be able to test your installation by browsing the following url : http://localhost:9200/
    ...and get the following json string displayed in your browser :
    {
      "ok" : true,
      "status" : 200,
      "name" : "Slyde",
      "version" : {
        "number" : "0.90.10",
        "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
        "build_timestamp" : "2014-01-10T10:18:37Z",
        "build_snapshot" : false,
        "lucene_version" : "4.6"
      },
      "tagline" : "You Know, for Search"
    }


  8. Reconfigure logstash
    Let's update our /opt/logstash-1.4.2/snapJobLogs.conf file to something like this:
    input {
     tcp { port => 28777 type=>"log" }
    }
    output {
     elasticsearch { host => localhost }
    }
    
    filter {
     json {
      source => "message"
     }
    }
    
  9. View logs with Kibana
    Kibana is embedded with logstash. To run it, just change the way you launch logstash :
    bin/logstash -f snapJobLogs.conf web
    

    Now browse http://localhost:9292 to see this :

Pretty nice, huh ? That wasn't so hard after all...

Presentation of the project can be found here.
Source code for this application can be downloaded from here.

Next part : snapJob Part III : Storing and loading data with Apache CouchDB and node.js

snapJob Part I : Creating a simple RESTFull API that rocks! REST in peace!



Creating a RESTFul API is pretty easy in many programming languages, but we want more !
We want a RESTFul API with a user interface that self-describes the methods, options, and operations, and which allows us to test the various methods we've created without having to actually build the user interface at the same time. This way, we will only focus on what the API should do, without taking care of anything else.

For those who may ask, building an API is good, because this will allow us to build any front end application we want, whether it is an html web application, an Android app, a Microsoft Windows app, or an iPhone application.

To do that, we will use the following technologies :
- node.js to host, serve, and perform logical operations on data
- swagger, a node.js npm library from Wordnik that will be used to expose the API through Express (another npm library) and present the methods in a fashionable way so we can test them.

For now, we will just create a simple rest service with only one dummy method. Later on, we will add data persistence, log management, data processing, ...

  1. Setting up the RESTFul API

    Setting up the RESTFul API is pretty easy.

    1. First, we need to copy the files that are in the "dist" directory of the swagger-ui GitHub project into our application folder.
    2. Then we need to customize the index.html file a little bit, to replace the "title" tag and the title in the html content of the page, but most of all, we need to replace the API url (the petstore url shown below) :
      $(function () {
      
      window.swaggerUi = new SwaggerUi({
      url: "http://petstore.swagger.wordnik.com/api/api-docs",
      dom_id: "swagger-ui-container",
      supportedSubmitMethods: ['get', 'post', 'put', 'delete'],
      onComplete: function(swaggerApi, swaggerUi){
      [...]
      
    3. I also moved the "swagger-ui.js" file into the "lib" directory, and updated one line in "index.html" :
      <script src="lib/swagger-ui.js" type="text/javascript"></script>

  • In the root folder of our application, we now need to put a few files :
    • The package.json file, which will describe the dependencies of our application :
      {
          "name": "snapJobAPI",
          "version": "0.0.1",
          "dependencies": {
              "express": "*",
              "swagger-node-express": "*",
              "minimist":"*"
          }
      }
      

      express is used to serve files, swagger-node-express to handle the RESTFul api requests, and minimist to parse application parameters.
    • The app.js file, which will be our application's main file :
      var express = require("express")
          , swagger = require("swagger-node-express")
          , path = require('path')
          , argv = require('minimist')(process.argv.slice(2))
          , test = require("./models/test")
          , models = require("./models/models")
          , app = express();
      
      
      app.use('/js', express.static(path.join(__dirname, 'js')));
      app.use('/lib', express.static(path.join(__dirname, 'lib')));
      app.use('/css', express.static(path.join(__dirname, 'css')));
      app.use('/images', express.static(path.join(__dirname, 'images')));
      
      // Set the main handler in swagger to the express app
      swagger.setAppHandler(app);
      
      // Adding models and methods to our RESTFul service
      swagger.addModels(models)
          .addGet(test.dummyTestMethod);
      
      // set api info
      swagger.setApiInfo({
          title: "snapJob API",
          description: "API to manage job applications, job offers, profiles...",
          termsOfServiceUrl: "",
          contact: "yoann.diguet@snapjob.com",
          license: "",
          licenseUrl: ""
      });
      
      app.get('/', function (req, res) {
          res.sendfile(__dirname + '/index.html');
      });
      
      // Set api-doc path
      swagger.configureSwaggerPaths('', 'api-docs', '');
      
      // Configure the API domain
      var domain = 'localhost';
      if(argv.domain !== undefined)
          domain = argv.domain;
      else
          console.log('No --domain=xxx specified, taking default hostname "localhost".')
      
      // Configure the API port
      var port = 8080;
      if(argv.port !== undefined)
          port = argv.port;
      else
          console.log('No --port=xxx specified, taking default port ' + port + '.')
      
      // Set and display the application URL
      var applicationUrl = 'http://' + domain + ':' + port;
      console.log('snapJob API running on ' + applicationUrl);
      
      swagger.configure(applicationUrl, '1.0.0');
      
      // Start the web server
      app.listen(port);
      

    • The models/models.js file that will describe the application data output models (if we want to return user information, for example, the "user" data structure will be described here) :
      exports.models = {
          "void": {
              "id": "void",
              "properties": {
              }
          }
      }
      

    • The models/test.js file that will describe a dummy test method and its body :
      exports.dummyTestMethod = {
          'spec': {
              description : "Test method",
              path : "/test/",
              method: "GET",
              summary : "This is a dummy test method",
              type : "void",
              nickname : "dummyTestMethod",
              produces : ["application/json"]
          },
          'action': function (req, res) {
              res.send(JSON.stringify("test is ok"));
          }
      };
      

  • Run the application
    To run the application, we need to run the following commands within the application directory (on Windows, the node.js command prompt is required) :
    • npm install
      
      This will install the dependencies described in the package.json file.
    • node app.js
      This will run the application.
  • Application outputs
    The application's command line output should be something like this :
    /usr/bin/node app.js
    No --domain=xxx specified, taking default hostname "localhost".
    No --port=xxx specified, taking default port 8080.
    snapJob API running on http://localhost:8080

    And in the web browser, you should see something like this :


    If you click on the "Try it out" button, you should have something like this :



  • Quick note : As you can see from the code, you can launch the application with two optional parameters :
    • The first one, --port=xxx, specifies the port on which the server should listen for web requests (default is 8080).
    • The second one, --domain=xxx, specifies the hostname (default is "localhost").
    Pretty nice, huh ? That wasn't so hard after all...

    Presentation of the project can be found here.
    Source code for this application can be downloaded from here.

    Next part : snapJob Part II : Managing application logs using LogStash...

    snapJob - A web site for job seekers

    Looking for a job may not be an easy task. In fact, looking for a job IS a job. Many web sites exist to do that, but that's not our point here. We are on a technical blog, so the goal is to see how we could build such a website the best way we can, using new technologies.

    The web site will use several technologies :
    - node.js
    - express
    - swagger
    - angular
    - sockets
    - bootstrap
    - elasticsearch
    - couchdb
    - logstash
    - kibana
    - ...

    The website will also be built like this :
    - A RESTFul web service made with swagger to expose data
    - A frontend node.js application, which will mostly be a single-page application that calls a RESTFul API to grab data
    - A couchdb database to store information
    - An elasticsearch database to be able to perform searches
    - A logstash/kibana database and viewer to present logs in a readable way.

    All of this will use open-source technologies. The only "non-free" tool used will be WebStorm, the IDE I use to code, as it is a very good IDE.

    Part I : Creating a simple RESTFull API that rocks !
    Part II : Managing application logs using LogStash
    Part III : Storing and loading data with Apache CouchDB and node.js
    Part IV : Scaling node.js, and ensuring high availability using "cluster"
    Part V : Serving files with nginx
    Part VI : Usage of Apache Kafka to send logs to logstash and propagate application configuration changes
    Part VII : Client side javascript dependencies using Bower

    ... + more to come.