Mood Bot – a Serverless Slack Integration

Pull me on GitHub!


So it’s been a tradition in my office to use Slack to gauge the team’s mood once a week. Previously our PM would post a message asking for feedback and people would add a reaction to show how they were feeling. This worked fine, though there were a couple of issues: firstly it was pretty hard to interpret the weird collection of party parrots and doges, and secondly people tend to follow the herd when they can see how others have reacted.

Here’s my idea for the new, automated workflow…

[Diagram: proposed Mood Bot workflow]

By far the cheapest, and maybe the simplest, way to host all of the code to do this is to “go serverless”, using AWS to host the code, database and API on a pay-per-use basis.  Here’s the technical architecture…


Based on the above there are three broad areas for development: sending the webhook to Slack; dealing with responses when users click the buttons; and serving up a chart showing the results for the week.

Sending the Web Hook

Slack allows you to post Interactive Messages using an Incoming Webhook. In order to do this you’ll need to add a new Slack bot integration using their very friendly web UI. I called mine “MoodBot”. Once you have a bot set up, you need to enable “Incoming Webhooks” and add the target URL to an environment variable (see here or here for more details).

The format of the message you send needs to be something like the following.  Note that the “interactive” part of the message is included as an attachment.

const message = {
    "text": ":thermometer: @channel *Time for a Team Temp Check!* @channel :thermometer: \n _Click as many times as you like, only your last vote will be counted._",
    "channel": "@laurence.hubbard",
    "attachments": [
        {
            "text": "How are you feeling this week?",
            "fallback": "I am unable to understand your feelings. Too deep maybe?",
            "callback_id": "mood_survey",
            "color": "#3AA3E3",
            "actions": [
                { "name": "mood", "text": "Good :+1:", "type": "button", "value": "good" },
                { "name": "mood", "text": "Meh :neutral_face:", "type": "button", "value": "meh" },
                { "name": "mood", "text": "Bad :-1:", "type": "button", "value": "bad" },
                { "name": "mood", "text": "Terrible :rage:", "type": "button", "value": "terrible" },
                { "name": "mood", "text": "AWESOME!!!   :doge:", "type": "button", "value": "awesome" }
            ]
        }
    ]
};

This gives you a Slack message looking like this:

The webhook is sent by a Lambda function, which is triggered crontab-style by a CloudWatch event rule.  The Lambda looks like this:

const AWS = require('aws-sdk');
const url = require('url');
const https = require('https');

const kmsEncryptedHookUrl = process.env.kmsEncryptedHookUrl;
const slackChannel = process.env.slackChannel; // channel to post to
let hookUrl;

function postMessage(message, callback) {
    const body = JSON.stringify(message);
    const options = url.parse(hookUrl);
    options.method = 'POST';
    options.headers = {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
    };

    const postReq = https.request(options, (res) => {
        const chunks = [];
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => {
            if (callback) {
                callback({
                    body: chunks.join(''),
                    statusCode: res.statusCode,
                    statusMessage: res.statusMessage,
                });
            }
        });
        return res;
    });

    postReq.write(body);
    postReq.end();
}

function processEvent(slackMessage, callback) { = slackChannel;
    postMessage(slackMessage, (response) => {
        if (response.statusCode < 400) {
  'Message posted successfully');
            callback(null);
        } else if (response.statusCode < 500) {
            console.error(`Error posting message to Slack API: ${response.statusCode} - ${response.statusMessage}`);
            callback(null); // Don't retry because the error is due to a problem with the request
        } else {
            // Let Lambda retry
            callback(`Server error when processing message: ${response.statusCode} - ${response.statusMessage}`);
        }
    });
}

exports.handler = (event, context, callback) => {
    // The event is the message to post – e.g. the interactive message shown
    // above, supplied as the CloudWatch rule's constant JSON input.
    console.log("Sending a temp check request");
    if (hookUrl) {
        // Container reuse, simply process with the key in memory
        processEvent(event, callback);
    } else if (kmsEncryptedHookUrl && kmsEncryptedHookUrl !== '<kmsEncryptedHookUrl>') {
        const encryptedBuf = new Buffer(kmsEncryptedHookUrl, 'base64');
        const cipherText = { CiphertextBlob: encryptedBuf };

        const kms = new AWS.KMS();
        kms.decrypt(cipherText, (err, data) => {
            if (err) {
                console.log('Decrypt error:', err);
                return callback(err);
            }
            hookUrl = `https://${data.Plaintext.toString('ascii')}`;
            processEvent(event, callback);
        });
    } else {
        callback('Hook URL has not been set.');
    }
};

Setting up the rule to trigger the event is pretty simple. Log into the AWS console, select CloudWatch and choose Events -> Rules from the menu on the left. You can specify when the rule will run using a crontab line.  I used…

0 09 ? * WED *

Which will run at 9am (GMT) every Wednesday.  All this is set up via a reasonably clunky web interface!
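
If you’d rather script that step than click through the console, a rough equivalent with the Node SDK looks something like this – note the rule name, target id and function ARN here are placeholders, not the ones I actually used:

// Sketch: schedule the sender Lambda with a CloudWatch Events rule via the SDK.
const AWS = require('aws-sdk');
const events = new AWS.CloudWatchEvents();

events.putRule({
    Name: 'weekly-temp-check',
    ScheduleExpression: 'cron(0 9 ? * WED *)',
    State: 'ENABLED'
}, (err) => {
    if (err) return console.error(err);
    // Point the rule at the Lambda. You also need to grant CloudWatch Events
    // permission to invoke the function (lambda.addPermission).
    events.putTargets({
        Rule: 'weekly-temp-check',
        Targets: [{ Id: 'mood-bot-sender', Arn: 'arn:aws:lambda:eu-west-1:123456789012:function:moodBotSender' }]
    }, (targetErr) => targetErr && console.error(targetErr));
});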

Collating Responses

This is the most complicated bit (and there’s an extra tricky bit to deal with too). To handle the responses when users click buttons on the interactive Slack message you need four things: 1. A lambda function to handle the POST request and push data to a database, 2. an API Gateway resource to provide an HTTP end-point, translate the request and forward it to the Lambda function, 3. a database to store the data and finally 4. a config setting in Slack to tell it where to send the POST.

Here’s the code for my Lambda function. It’s simple enough – it just takes the JSON in the incoming request, grabs the bits it wants and adds a few dates and times to create another JSON object to post to DynamoDB. The response sent back to Slack is a replacement message, which will overwrite the one already in the channel. Here I add a list of users who have clicked so far (a better man would have pulled this list from the DB!).

var AWS = require('aws-sdk');

var dynamo = new AWS.DynamoDB.DocumentClient();
const table = "MoodResponses";

function updateVoters(original, voter) {
    var updated = original;
    var msg = "\nVoted so far: ";
    var comma = true;
    if(!updated.includes(msg)) {
        updated = updated + msg;
        comma = false;
    }
    if(!updated.includes(voter)) {
        if(comma) {
            updated = updated + ", ";
        }
        updated = updated + "<@" + voter + ">";
    }
    return updated;
}

Date.prototype.getWeek = function() {
    var onejan = new Date(this.getFullYear(), 0, 1);
    return Math.ceil((((this - onejan) / 86400000) + onejan.getDay() + 1) / 7);
};

exports.handler = function(event, context, callback) {
    console.log('Received Slack Message: ', JSON.stringify(event, null, 2));
    var mood = event.actions[0].value;
    var date = new Date(Number(event.message_ts) * 1000);
    var key = + "@" + date.getFullYear() + "-" + date.getWeek();
    var record = {
        TableName: table,
        Item: {
            key: key,
            user_id:, // also the range key of the week-user_id-index used later
            message_ts: Number(event.message_ts),
            mood: mood,
            date_string: date.toISOString(),
            day: date.getDate(),
            month: (date.getMonth() + 1),
            week: date.getWeek(),
            year: date.getFullYear()
        }
    };
    console.log("Created mood record: " + JSON.stringify(record, null, 2));

    dynamo.put(record, function(err, data) {
        if (err) {
            console.error("Unable to add item. Error JSON:", JSON.stringify(err, null, 2));
            callback(null, {
                  text: "An error occurred inserting to DynamoDB. Error attached.",
                  attachments: [{text: JSON.stringify(err, null, 2)}],
                  replace_original: false
            });
        } else {
            console.log("Added item:", JSON.stringify(record, null, 2));
            callback(null, {
                  text: updateVoters(event.original_message.text,,
                  attachments: event.original_message.attachments,
                  replace_original: true
            });
        }
    });
};

Setting up the API Gateway (The Extra Tricky Bit)

Setting up the API Gateway should be simple enough – you add a new API then a new resource then a new POST method. Then configure the method to forward requests to the Lambda function you just created. However, there are a couple of issues.

Firstly, you need to enable cross-origin resource sharing (CORS), which is easy enough – you just select “Enable CORS” from the “Actions” dropdown. This opens your method up to calls from other sites.

The second and far more infuriating issue is that Slack’s Interactive Buttons send the data in a funky way, form-encoding the JSON payload in the message body rather than just posting raw JSON as all the other calls do.  After a couple of days of intermittent head-scratching I finally found this Gist, which contains the code to fix the problem.

This code needs to be placed into a Body Mapping Template for your POST method within the AWS API Gateway UI. The following screenshot hopefully gives you enough of a clue on how to set this up.  Now, when Slack sends the malformed (IMHO) POST, the API Gateway will reformat it and pass it through to your Lambda function as if it were a normal JSON payload.

[Screenshot: Body Mapping Template configuration for the POST method in the API Gateway console]
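
If you’re curious what that template actually has to achieve: Slack posts a single form-encoded payload field containing URL-encoded JSON, so the transformation boils down to something like the following sketch (illustrative only – in this setup API Gateway does the work before the Lambda ever runs):

// Sketch: what the body mapping achieves, expressed in Node.
// Slack sends "payload=<url-encoded JSON>" as application/x-www-form-urlencoded.
const querystring = require('querystring');

function slackBodyToJson(rawBody) {
    const payload = querystring.parse(rawBody).payload; // the URL-encoded JSON string
    return JSON.parse(payload);
}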

Database Setup

I decided to use DynamoDB – Amazon’s “Document Database as a Service” (DDaaS?). I’m not sure it’s the perfect choice for this work, since querying is pretty limited, but it is very cheap and incredibly simple to use.

For this step, just use the web UI to create a new table called “MoodResponses”, with a single string field as the partition key. The Lambda builds this key by concatenating the user ID and the current week, which means you automatically limit each user to a single vote per week – exactly the functionality I was looking for, more or less for free!
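
I created the table through the console, but for reference the equivalent Node SDK call is roughly the following – the throughput numbers are arbitrary and the key attribute name matches the Lambda code above:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

dynamodb.createTable({
    TableName: 'MoodResponses',
    AttributeDefinitions: [{ AttributeName: 'key', AttributeType: 'S' }],
    KeySchema: [{ AttributeName: 'key', KeyType: 'HASH' }],
    ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
}, (err, data) => {
    if (err) console.error(err);
    else console.log('Created table', data.TableDescription.TableName);
});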

Slack Request URL

The final step is very simple – use the Slack admin UI for your bot to add the address of your API resource as the target for interactive message callbacks.  Go to the admin page, select Features -> Interactive Messages from the panel on the left and paste in the URL of your API Gateway method.

Displaying Results

Though this part has more boxes on the architecture diagram, it is actually the easiest step by far. We serve up a simple D3.js “single page app” as static content, straight from S3. This SPA calls a GET method on the REST service we created above, which in turn calls a Lambda function. The Lambda hits the database, pulls out the results and sends them back as a JSON payload.

There’s not much more to explain, so I’ll just link to a Fiddle which includes the code for my front end – this one actually hits my production database, so you’ll be able to see how my team feel!
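
The data call itself is tiny – something along these lines is all the front end needs before handing the counts to D3 (the URL is a placeholder for your own API Gateway endpoint):

// Sketch: fetch this week's mood counts from the GET endpoint.
fetch('https://<api-id>.execute-api.<region>')
    .then((response) => response.json())
    .then((results) => {
        // results looks like { week: 19, moods: { good: 4, meh: 2, awesome: 1 } }
        console.log('Week ' + results.week, results.moods);
    });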

Serving this code up as a static HTML file is very easy: Create an index.html document and add the javascript, HTML and CSS from the fiddle; create a new S3 bucket and, in the properties for the bucket, enable “Static Website Hosting”; upload your index.html file to the bucket, select it and select “Make Public” from the “Actions” dropdown.

Here’s the code for the Lambda function which is servicing the GET request:

var AWS = require("aws-sdk");

var docClient = new AWS.DynamoDB.DocumentClient();

Array.prototype.countBy = function(key) {
  return this.reduce(function(rv, x) {
    rv[x[key]] = (rv[x[key]] || 0) + 1;
    return rv;
  }, {});
};

Date.prototype.getWeek = function() {
    var onejan = new Date(this.getFullYear(), 0, 1);
    return Math.ceil((((this - onejan) / 86400000) + onejan.getDay() + 1) / 7);
};

Date.prototype.previousWeek = function() {
    return new Date(this.getTime() - (7 * 24 * 60 * 60 * 1000));
};

function forWeek(week) {
    return {
        TableName: "MoodResponses",
        IndexName: 'week-user_id-index',
        KeyConditionExpression: "#wk = :week",
        ExpressionAttributeNames: {
            "#wk": "week"
        },
        ExpressionAttributeValues: {
            ":week": week
        }
    };
}

function handleError(err, callback) {
    console.error("Unable to query. Error:", JSON.stringify(err, null, 2));
    callback(null, {"error": JSON.stringify(err, null, 2)});
}

function handleData(data, callback, week) {
    console.log("Query succeeded.");
    var results = {
        week: week,
        moods: data.Items.countBy("mood")
    };
    callback(null, results);
}

function runFor(date, tries, callback) {
    var week = date.getWeek();
    console.log('Fetching mood results for week: ' + week);
    docClient.query(forWeek(week), function(err, data) {
        if (err) {
            handleError(err, callback);
        } else if(data.Items.length > 0 || tries <= 0) {
            handleData(data, callback, week);
        } else {
            runFor(date.previousWeek(), tries - 1, callback);
        }
    });
}

exports.handler = function(event, context, callback) {
    runFor(new Date(), 1, callback);
};

One Last Thing!

Dynamo can only query against fields which are part of an index. Here we need to query by week number, so I added a new index to my Dynamo table by week. This took 5 minutes to update (even though the table only had 5 records in it at the time!) but was simple enough to do.  If you look at the code above, you can see where I specify the index in the query parameters.
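
Again, I added the index through the console, but the equivalent SDK call looks roughly like this – the range key is inferred from the index name used in the query above, and the capacities are arbitrary:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

dynamodb.updateTable({
    TableName: 'MoodResponses',
    AttributeDefinitions: [
        { AttributeName: 'week', AttributeType: 'N' },
        { AttributeName: 'user_id', AttributeType: 'S' }
    ],
    GlobalSecondaryIndexUpdates: [{
        Create: {
            IndexName: 'week-user_id-index',
            KeySchema: [
                { AttributeName: 'week', KeyType: 'HASH' },
                { AttributeName: 'user_id', KeyType: 'RANGE' }
            ],
            Projection: { ProjectionType: 'ALL' },
            ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
        }
    }]
}, (err) => err ? console.error(err) : console.log('Index creation started'));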

Wrap Up

So this all works really well.  There’s lots left to do, with making the results look prettier and sorting out how the source code is deployed and managed being two things at the top of my list.

Slack is the industry standard tool for team collaboration these days, and bots and integrations amp up the power and productivity at your disposal. Build status, Jira tickets, team morale, coffee orders and whatever else you fancy can all be brought together with conversational APIs, simplifying just about everything.

On the AWS side, there’s still a lot of googling required to build this sort of thing, and sometimes information is scarce. Those who enjoy building “proper applications” using IDEs like IntelliJ or Visual Studio are going to find this frustrating – the pieces feel disjoint and uncontrolled sometimes.  However, all in all it’s pretty cool what you can do without a single server in the mix.

It’s hard to deny that this development model is going to become the de facto standard within the next couple of years, as it’s just so damned quick and simple. So get out and get serverless!

Amazon Athena – First Look

Amazon recently launched Athena – their answer to Google’s Big Query. It’s basically an SQL interpreter which runs over files in S3.  It reminds me of Apache Drill, but people round the office say it looks more like Hive.

AWS Athena is in no way associated with the ancient goddess of wisdom. Any similarity is purely coincidental.

The barrier to entry is very low. Upload the data files (CSV, Parquet and JSON are supported, amongst others), define a table, run a query. All this is done using a simple query editor.

Quick “Hello World”

To test Athena I uploaded some Parquet files, containing data from the open house price dataset to an S3 bucket (I had wanted to load the CSV files “as is” but due to limitations in the CSV reader I couldn’t). I then declared a table like so:

CREATE EXTERNAL TABLE IF NOT EXISTS house_prices.price_paid (
  `id` string,
  `price` int,
  `date` string,
  `postcode` string,
  `property_type` string,
  `old_or_new` string,
  `tenure_duration` string,
  `address1` string,
  `address2` string,
  `street` string,
  `locality` string,
  `town` string,
  `district` string,
  `county` string,
  `ppd_category` string,
  `record_status` string,
  `month` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://'

And a few seconds later we’re ready to go:

select town, avg(price) as price
from house_prices.price_paid
group by town
order by price desc

1   GATWICK             2683329.6666666665
2   THORNHILL           985000.0
3   VIRGINIA WATER      741140.2347652348
4   CHALFONT ST GILES   731333.515394913
5   COBHAM              610556.8430019713
6   BEACONSFIELD        587652.6552173913
7   KESTON              584417.7181571815
8   ESHER               551595.5002180074
9   GERRARDS CROSS      513740.5765843979
10  ASCOT               461468.9531164819

Good Stuff

The ease of setup in simple cases makes this technology very lightweight. If you already have data in S3, you can just start using Athena straight away. It’s perfect for ad-hoc querying, sanity checking and QA/test activities.

Athena uses a “serverless” model – you pay for the data you scan, with no cluster to set up. At the time of writing it’s something like $5 per 1TB of data scanned, so a query that scans a 10GB dataset costs around five cents. As with everything on AWS, this is bearable but not exactly cheap.

Not Good Stuff

At the time of writing, Athena is very new. There are many missing features at the moment, which I hope Amazon will be adding in future.

Firstly, CSV read is limited to pure comma-separated data. Quotes are not supported. This is painfully annoying, as almost all CSV data has quotes around string fields. If you have to transform existing CSV data to remove quotes, the cost is going to outweigh any benefit you might have got from doing the direct queries.

The other annoyance to me is the lack of options for saving data back to S3. select into and create as select style statements are not (yet) supported. This breaks a key use-case for me: the ability to do one-off transforms of legacy or 3rd party data to new file formats. Wouldn’t it be nice to take a CSV file, uploaded by a 3rd party, change a few field names, transform to parquet (or JSON or whatever) and save back into your data warehouse? Yes it would. But you can’t. Sorry.


Athena is pretty good if you want a simple tool for doing basic ad-hoc querying over data stored in S3 – provided that data is in a compatible format.

Sadly though, Athena is just not ready for the big time, as yet. With the addition of support for more data formats and the ability to save data back to S3, it could be an incredibly useful tool, but right now I could count the number of use-cases on one hand.

One to watch!

Evolutionary Algorithm: Playable Demo

Here I’m combining a bit of visualisation with my other favourite subject – the Evolutionary Algorithm (or Genetic Algorithm if you prefer).  I’m not going to write anything about the properties of the algorithm – you can just play with the controls below the chart and see how the different settings affect its ability to find a good solution, adapt to changes and explore the problem space.

The problem: Find a value of x which maximises the value of y. The function is a set of sinusoidal waves of varying frequency and amplitude. The blue line shows the “fitness” for each value of x.

Basically, a population of different solutions is maintained – in this case, each solution is simply a value for x. Every individual has a fitness which can be calculated based on its value. Each iteration (100ms here) a solution is removed from the population – killed by selective pressure. Fitter individuals have a greater chance of surviving; less fit individuals have less of a chance.

A replacement solution is “bred” each iteration, to replace the solution killed-off by selective pressure. This new individual is generated by combining the “genetic material” of one or more parents. In this case, just by taking the x value of a single parent. Importantly, a mutation is applied to the new solution – this is key to exploring the problem space effectively.

And that’s all there is to an Evolutionary Algorithm – it’s just a way of finding the right combination of input variables to maximise some arbitrarily complex fitness function. It does this through a guided random search.
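
To make that concrete, here’s a stripped-down sketch of the loop described above – the fitness function, population size and mutation rate are invented for illustration, and the real demo also redraws the chart on every tick:

// Minimal evolutionary algorithm over a single variable x.
const fitness = (x) => Math.sin(x) + 0.5 * Math.sin(3 * x) + 0.2 * Math.sin(7 * x);

let population = Array.from({ length: 30 }, () => Math.random() * 10);
const randomIndex = () => Math.floor(Math.random() * population.length);

function tick() {
    // Selective pressure: pick two individuals at random and kill the less fit one.
    const a = randomIndex(), b = randomIndex();
    population.splice(fitness(population[a]) < fitness(population[b]) ? a : b, 1);

    // Breeding: copy a surviving parent and apply a small mutation.
    const parent = population[randomIndex()];
    population.push(parent + (Math.random() - 0.5) * 0.5);
}

setInterval(tick, 100); // one generation every 100ms, as in the demo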


Well travelled or just plain old?

A friend of mine has always said that young cars with high mileage are better than old cars with low mileage. The theory being that company cars, which have spent their time cruising on the motorways, have had a much easier life than their stay-at-home cousins who’ve done short hops around town and sat on their driveways seizing up.

So I pointed some very simple Spark queries at the UK government’s open MOT data to see what I could find (you can read about the last time I did this here). First factoid to note is that both mileage and age are relevant when it comes to predicting pass rates. The following two charts show pass rate vs mileage and age.

Pass Rate vs Mileage

Pass Rate vs Age

To look at all three variables together I created the following chart, which shows age on the x axis and mileage on the y. Pass rate is a colour scale, with red being the worst and green the best. Green squares show combinations of mileage and age at which vehicles are more likely to pass their MOT on the first attempt; red squares show combinations where a first-try failure is likely.

Pass Rate vs Mileage and Age

There is some truth to my mate’s theory – at least if this chart is to be believed – the pass rate for 3-5 year old cars looks pretty good even at very high mileages. Looking horizontally at very-low-mileage cars of increasing age, there seems to be something quite odd going on for vehicles on less than 20k miles. For the 20k-40k range there does seem to be a green stripe across the ages, but it is not as apparent as its vertical counterpart.

So should we all be buying a four-year-old car with 180k miles on the clock? Well, no. At least not if we want to keep it for more than a year or two. Cars with high mileages on the clock go into the red much earlier than those with low mileage (based on the fact that vehicles can only move right and up through the chart as they get older and drive further).

Pass Rate vs Mileage and Age… to the MAX

That last chart shows the same heat-matrix view, but to the full extent of the data. There are some interesting facts hidden in that chart… but I’ll leave them as an exercise for the reader!

UPDATE: Proper Stats:

So it turns out that calculating correlation and covariance with Spark is pretty easy. Here’s the results and the code:

For cars < 20 years and < 250,000 miles
cov(testMileage, pass) = -3615.011
corr(testMileage, pass) = -0.195
cov(age, pass) = -0.401
corr(age, pass) = -0.235
For all data
cov(testMileage, pass) = -3680.0456
corr(testMileage, pass) = -0.177
cov(age, pass) = -0.383
corr(age, pass) = -0.152

Looking at cars in the “normal” range (i.e. less than 20 years old and less than 250k miles) there’s a stronger correlation between age and pass rate than between mileage and pass rate. Interestingly, looking over the full range of the data this relationship is inverted, with mileage being very slightly better.  There’s little to separate the two as a predictor for pass or fail – not least because age and mileage are largely dependent on each other (with a correlation of 0.277 across all data).

Basic statistical functions are available under DataFrame.stat. See the calls hidden in the println lines below:

  it should "calculate covariance and correlation for normal cars" in {
    val motTests =

    val df = motTests
      .filter("testClass like '4%'") // Cars, not buses, bikes etc
      .filter("testType = 'N'") // only interested in the first test
      .filter("age &amp;lt;= 20")
      .filter("testMileage &amp;lt;= 250000")
      .withColumn("pass", passCodeToInt(col("testResult")))

    println("For cars &amp;lt; 20 years and &amp;lt; 250,000 miles")
    println(s"cov(testMileage, pass) = ${df.stat.cov("testMileage", "pass")}")
    println(s"corr(testMileage, pass) = ${df.stat.corr("testMileage", "pass")}")

    println(s"cov(age, pass) = ${df.stat.cov("age", "pass")}")
    println(s"corr(age, pass) = ${df.stat.corr("age", "pass")}")

Moving data around with Apache NiFi

I’ve been playing around with Apache NiFi in my spare time (on the train) for the last few days. I’m rather impressed so far so I thought I’d document some of my findings here.

NiFi is a tool for collecting, transforming and moving data. It’s basically an ETL with a graphical interface and a number of pre-made processing elements. Stuffy corporate architects might call it a “mediation platform” but for me it’s more like ETL coding with Lego Mindstorms.

This is not a new concept – Talend have been around for a while doing the same thing. Something just never worked with Talend though; perhaps they abstracted at the wrong level or perhaps they tried to be too general. Either way, the difference between Talend and NiFi is like night and day!

[Screenshot: the NiFi flow canvas]

Garmin Track Data

So I don’t have access to a huge amount of “big data” on my laptop, and I’ve done articles on MOT and National Rail data recently, so I decided to use a couple of gigs of Garmin Track data to test NiFi. The track data is a good test as it’s XML: exactly the sort of data you don’t want going into your big data system and therefore exactly the right use-case for NiFi.

<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase xsi:schemaLocation="blah blah blah">
    <Activity Sport="Biking">
      <Lap StartTime="2015-04-06T13:26:53.000Z">
        <Track>
          <Trackpoint>
            <Position>
              <LatitudeDegrees>51.39</LatitudeDegrees>
              <LongitudeDegrees>-1.24</LongitudeDegrees>
            </Position>
          </Trackpoint>
          <!-- ... more Trackpoints, plus calories, heart rate and so on ... -->
        </Track>
      </Lap>
    </Activity>


The only data in the file I’m particularly interested in is “where I went”. The calorie counts and suchlike are great on the day, but don’t tell us much after the fact. So, the plan is to extract the Latitude and Longitude fields from the Track element. Everything else is just noise.

Working with NiFi

NiFi uses files as the fundamental unit of work. Files are collected, processed and output by a flow of processors. Files can be transformed, split or combined into more files as needed. The links between processors act as buffers, queuing files between processing stages.

[Screenshot: the first part of the flow – gathering and splitting the XML]

The first part of the flow gathers the XML files from their location on disk (since Garmin charge an obscene amount for access to your own data via their API), splits the XML into multiple files, then uses a simple XPath expression to extract the Latitude and Longitude.

A GetFile processor reads the whole XML file. Next, a SplitXml processor splits the XML in each file at a specified depth (in this case 5), making a set of new files, one per TrackPoint element. Following that, an EvaluateXPath processor extracts the Lat and Long and stores them as attributes on each individual file.

[Screenshot: GetFile, SplitXml and EvaluateXPath processors]

The rather naive XML split will return all elements at the specified level within the document tree. XPath is fine with that – it will either match a Lat and Long or it won’t. The issue is that we’ll end up with a large number of files where no location was found. The RouteOnAttribute processor can be used to discard all these empty files. Settings shown below:

[Screenshot: RouteOnAttribute processor settings]

So, now we have a stream of files (actually empty files!), each of which is decorated with attributes for Latitude and Longitude. The last part of the flow is all about saving these to a file.

[Screenshot: the output side of the flow – converting attributes to JSON and merging]

The first processor in this part of the flow takes the attributes of each file and converts them to JSON, dropping the JSON string into the file body. We could just save the file at this stage, but that would be a lot of files. The second block takes a large number of single-record JSON files and joins them together to create a single line-delimited JSON file which could be read by something like Storm or Spark. I had all sorts of trouble escaping a carriage return within the MergeContent block, so in the end I stored a carriage return character in a file called “~/newLine.txt” and referenced that in the processor settings. Not pretty, but it works. The last block in the flow saves files – not much more to say about that!
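
The merged output ends up as one small JSON object per line, something along these lines (the attribute names depend on what you called them in EvaluateXPath):

{"latitude": "51.3933", "longitude": "-1.2417"}
{"latitude": "51.3935", "longitude": "-1.2409"}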

Drawbacks and/or Benefits

It took a little over one train journey to get this workflow set up and working, and most of that was using Google! Compared to using Talend for the same job it was an absolute dream!

Perhaps the only shortcoming of the system is that it can’t do things like aggregations – so I can’t convert the stream of locations to a “binned” map with counts per 50x50m square, for example. De-duplication doesn’t seem possible either… but if you think about how these operations would have to be implemented, you realise how complicated and resource-hungry they would make the system. If you want to do aggregations, de-duplication and all that jazz, you can plug NiFi into Spark Streaming.

Most data integration jobs I’ve seen are pretty simple: moving data from a database table to HDFS, pulling records from a REST API, downloading things from a dropzone… and for all of these jobs, NiFi is pretty much perfect. It has the added benefit that it can be configured and maintained by non-technical people, which makes it cheaper to integrate into a workflow.

I like it!

Thatcham Trains

This is the final article in my brief series on the National Rail API. As usual, the code can be found on github:

The Idea

There are a million and one different websites and apps which will tell you the next direct train from London Paddington to Thatcham (or between any other two railway stations) but all those apps are very general. You have to struggle through the crowds on the Circle Line while selecting the stations from drop-downs and clicking “Submit”, for example. Wouldn’t it be good if there was a simple way to see the information you need without any user input? Even better, what if you could get notifications when the direct trains are delayed or cancelled?

Enter stage left, the Twitter API. This article is all about a simple mash-up of the National Rail and twitter APIs to show information on direct trains between London and Thatcham. You can use it for other stations too – it’s all in the command line parameters.

People who live in Thatcham can use my twitter feed @ThatchamTrains or you can set up your own feed and run the python script to populate it with the stations you’re interested in.

The script also sends direct messages if the trains are more than 15 minutes late or cancelled.

Using the script

I host my instance of the script on my Raspberry Pi, which is small, cheap, quiet and can be left on 24×7 without much hassle. These instructions are therefore specific to setup on the Pi, but the script will also run on Windows and on other versions of Linux.

1. Install the python libraries you need. You may already have these installed.

$ sudo easy_install argparse
$ sudo easy_install requests
$ sudo easy_install xmltodict
$ sudo easy_install flask

2. Get a twitter account and a set of API keys by following the steps on the Twitter developers page. You’ll need four magic strings in total, which you pass to the script as command line parameters.

3. Get a national rail API key from their website. You just need one key for this API, which is nice!

4. Clone the source and run the script using the three commands below… simples!

$ git clone
$ cd national-rail
$ python --rail-key YOUR_NATIONAL_RAIL_KEY --consumer-key YOUR_CUST_KEY --consumer-secret YOUR_CUST_SECRET --access-token YOUR_ACCESS_TOKEN --access-token-secret YOUR_ACCESS_TOKEN_SECRET --users YourTwitterName --forever

When run with the --forever option, the script will query the NR API and post to Twitter every 5 minutes. Note that there are some basic checks to prevent annoying behaviour and duplicate messages. You can specify one or more usernames who you’d like to receive direct messages when there are delays and cancellations; note that only users who follow you can receive DMs on Twitter.

You can use other stations by specifying the three character station codes (CRS) for “home” and “work” on the command line. Here are the command line options:

$ python --help

usage: [-h] [--home HOME] [--work WORK] [--users USERS]
                      [--forever] --rail-key RAIL_KEY --consumer-key
                      CONSUMER_KEY --consumer-secret CONSUMER_SECRET
                      --access-token ACCESS_TOKEN --access-token-secret

Tweeting about railways

optional arguments:
  -h, --help            show this help message and exit
  --home HOME           Home station CRS (default "THA")
  --work WORK           Work station CRS (default "PAD")
  --users USERS         Users to DM (comma separated)
  --forever             Use this switch to run the script forever (once every 5 mins)
  --rail-key RAIL_KEY   API Key for National Rail
  --consumer-key CONSUMER_KEY
                        Consumer Key for Twitter
  --consumer-secret CONSUMER_SECRET
                        Consumer Secret for Twitter
  --access-token ACCESS_TOKEN
                        Access Token for Twitter
  --access-token-secret ACCESS_TOKEN_SECRET
                        Access Token Secret for Twitter

The Code

There’s not much to say about the code, since I’ve covered the National Rail API in graphic detail in a previous article. The only real difference between this script and my previous adventures with the API is that this time I did unit testing.

There’s a fair bit of business logic in the twitter app: rules about when to post and when to be quiet, duplicate message detection and all sorts of time- and data-based rules which can’t be tested using real data. It’s also pretty bad form to test code like this against a live API, so I mocked out the NR query and the Twitter API and wrote a small suite of tests to check that the behaviour is right.

Like I said, all the code is on GitHub, so I won’t bang on about it here.

Live Train Route Animation

The code for this article is available on my github, here:

Building on the Live Departures Board project from the other day, I decided to try out mapping some departure data. The other article shows pretty much all the back-end code, which wasn’t changed much.


The AngularJS app takes the routes of imminent departures from various stations and displays them on a Leaflet map as polylines. I used this great Snake library to animate the lines as they appear. Map tiles come from CartoDB, which is free, unlike Mapbox.


Here’s the code-behind for the Angular app:

var mapApp = angular.module('mapApp', ['ngRoute']);

mapApp.config(function($routeProvider) {
        $routeProvider
            .when('/', {
                controller: 'MapController',
                templateUrl: 'map.html'
            })
            .otherwise({redirectTo: '/'});
    })
    .controller('MapController', function($scope, $http, $timeout, $routeParams) {

        var mymap ='mapid').fitBounds([ [51.3933180851, -1.24174419711], [51.5154681995, -0.174688620494] ]);
        L.tileLayer('http://{s}{z}/{x}/{y}.png', {
            attribution: '© <a href="">OpenStreetMap</a> © <a href="">CartoDB</a>',
            subdomains: 'abcd',
            maxZoom: 19
        }).addTo(mymap);

        $scope.routeLayer = L.featureGroup().addTo(mymap);
        $scope.categoryScale = d3.scale.category10();

        // Draw each route returned for a station as a coloured polyline
        $scope.doStation = function(data) {
            data.forEach(function(route) {
                var color = $scope.categoryScale(route[0].crs);
                var path = [];

                route.filter(function(x) {return x.latitude && x.longitude}).forEach(function(station) {
                    var location = [station.latitude, station.longitude];
                    path.push(location);
                });

                var line = L.polyline(path, {
                    weight: 4,
                    color: color,
                    opacity: 0.5
                }).addTo($scope.routeLayer);
                line.snakeIn(); // animate the line as it appears (SnakeAnim plugin)
            });
        };

        // Poll the back end for imminent departures every 10 seconds
        $scope.refresh = function() {
            $scope.crsList.forEach(function(crs) {
                $http.get("/routes/" + crs).success($scope.doStation);
            });
            $timeout($scope.refresh, 10000);
        };

        $http.get("/loaded-crs").success(function(crsData) {
            $scope.crsList = crsData;