Transport for London (TfL) is a fascinating organization. The iconic red circle is practically part and parcel of the everyday life of the 1.3 billion people that the TfL network transports across London.
As part of their mandate, TfL is constantly on the search for ways better manage traffic, train capacity, maintenance, and even account for air quality during commutes. These are some very interesting challenges, so when TfL, Amazon Web Services and Geovation hosted a public hackathon, we at WSO2 decided to come up with our own answers to some of these problems.
Framing the problem
TfL pushes out a lot of data regarding the many factors that affect public transport within Greater London; a lot of this is easily accessible via the TfL Unified API from https://api.tfl.gov.uk/. In addition to volumes of historical data, TfL also controls a network of SCOOT traffic sensors deployed across London. Given a two-day timeframe, we narrowed our focus down to three main areas:
- To use historical data regarding the number of passengers at stations to predict how many people would be on a selected train or inside a selected station
- To use Google Maps and combine that with sensor data from TfL sensors across the city to pick the best routes from point A to B, while predicting traffic, five to ten minutes into the future, so that commuters could pick the best routes
- To pair air quality data from any given region and suggest safer walking and cycling routes for the denizens of Greater London
Using WSO2 Complex Event Processor (which holds our Siddhi CEP engine) with Apache Spark and Lucene (courtesy of WSO2 Data Analytics Server), we were able to use TfL’s data to build a demo app that provided a solution for these three scenarios.
For starters, here’s how we addressed the first problem. With data analysis, it’s not just possible to estimate how many people are inside a station; we can break this down to understand traffic from entrance to a platform, from a platform to the exit, and between platforms. This makes it possible to predict incoming and outgoing crowd numbers. The map-based user interface that you see above allows us to represent this analysis.
The second solution makes use of the sensor network we spoke of earlier. Here’s how TfL sees traffic.
The red dots are junctions; yellow dots are sensors; dashed lines indicate traffic flow. The redder the dashed lines are, the denser the traffic at that area. We can overlay the map with reported incidents and ongoing roadworks, as seen in the screenshot below:
Once this picture is complete, we have the data needed to account for road and traffic conditions while finding optimal routes.
This is what Google suggests:
We can push the data we have to WSO2 CEP, which runs streaming queries to perform flow, traffic, and density analytics. Random Forest classification enables us to use this data to build a machine learning model for predicting traffic - a model which, even with relatively little data, was 88% accurate in our tests. Combining all of this gives us a richer traffic analysis picture altogether.
For the third problem - the question of presenting safer walking and cycling routes using air quality - our app pulled air pollution data from TfL’s Unified API.
This helps us to map walking routes; since we know where the bike stations are, it also lets us map safer cycling routes. It also allows us to push weather forecasts and air quality updates to commuters.
A better understanding of London traffic
In each scenario, we were also able to pinpoint ways of expanding on, or improving what we hacked together. What this essentially means is that we can better understand traffic inside train stations, both for TfL and for commuters. We can use image processing and WiFi connections to better gauge the number of people inside each compartment; we can show occupancy numbers in real-time across screens in each station, and on apps, and assist passengers with finding the best platform to catch a less crowded compartment.
We can even feed Oyster Card tap data into WSO2 Data Analytics Server, apply machine learning to build a predictive model, and use WSO2 CEP to predict source to destination travel times. Depending on screen real estate, both air quality and noise level measures could be integrated to keep commuters better informed of their travelling conditions.
How can we improve on traffic prediction? By examining historical data, making a traffic prediction, then comparing that with actual traffic levels, we could potentially predict traffic incidents that our sensors might have missed. We could also add location-based alerts pushed out the commuters - and congestion warnings and time-to-target countdowns on public buses.
We have to say that there were a number of other companies hacking away on excellent solutions of their own; it was rather gratifying to be picked as the winners of the hackathon. For more information, and to learn about the solutions that we competed against, please read TfL’s blog post on the hackathon.