Friday, December 27, 2013

Amtrak real-time train data

Recently the US passenger rail operator Amtrak announced a nationwide real-time map of it's services.

That's great...until you try it.

The United States is big, even though the number of trains is fairly small. The number of Google map tiles is big. The polygons that show the routes are enormous. The nationwide train data is big.

That's a lot of data to load --and so slow to display-- if all I really want is realtime data on a single train (is my train late?).
 
After looking at the GETs emitted by all the (slow, unnecessary) javascript on that site, I figured out how to bypass it and pull just the table of current train status. It's 400kb, but it's a lot less than the over 1MB to go through the website and map....

https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/
features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&maxResults=250&
callback=jQuery19106722136430546803_1388112103417&dataType=jsonp&
jsonpCallback=retrieveTrainsData&contentType=application%2Fjson&_=1388112103419


The only fields required are 'version' and 'key' (which is different from the site cookie):

https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/
features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4

So we can easily download the current status table using wget:

$ wget -O train_status https://www.googleapis.com/mapsengine/v1/tables/
01382379791355219452-08584582962951999356/features?version=published\&
key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4\&jsonpCallback=retrieveTrainsData


Since parsing 400kb of JSON using shell script will be awkward and annoying, let's change over to Python3:

#/usr/bin/python3
import httplib2
url   = \
"""https://www.googleapis.com/mapsengine/v1/table
/01382379791355219452-08584582962951999356/features?version=published&
key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData"""

h = httplib2.Http(None)
resp, content = h.request(url, "GET")

Let's tell Python that the content is a JSON string, and iterate through the list of trains looking for train number 7. That's a handy train because each run takes over 24 hours - it will always have a status.

import json
all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
   if train["properties"]["TrainNum"] == "7":
       do something

After locating the train we care about, here is data about it:

latitude     = train["geometry"]["coordinates"][0]
longitude    = train["geometry"]["coordinates"][1]
speed        = train["properties"]["Velocity"]
report_time  = train["properties"]["LastValTS"]
next_station = train["properties"]["EventCode"]


Let's find a station (FAR) along the route. This is a little tougher, because Python's JSON module reads each Station as a string instead of a dict. We need to identify Station lines, and feed those through the JSON parser (again).

for prop in train["properties"].keys():
          if "Station" in prop:
              sta = json.loads(train["properties"][prop])
              if sta["code"] == "FAR":
                  # Schedule data
                  station_time_zone        = sta['tz']
                  scheduled_arrival_time   = sta['scharr'] # Only if a long stop
                  scheduled_departure_time = sta['schdep'] # All stations

                  # Past stations
                  actual_arrival_time      = sta['postarr']
                  actual_departure_time    = sta['postdep']
                  past_station_status      = sta['postcmnt']

                  # Future stations
                  estimated_arrival_time   = sta['estarr']
                  future_station_status    = sta['estarrcmnt']

The final Python script for Train 7 at FAR is:

#/usr/bin/python3

import httplib2
import json

url = "https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData"
h = httplib2.Http(None)
resp, content = h.request(url, "GET")

all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
   if train["properties"]["TrainNum"] == "7":
      for prop in train["properties"].keys():
          if "Station" in prop:
              sta = json.loads(train["properties"][prop])
              if sta["code"] == "FAR":
                  print('Train number {}'.format(train["properties"]["TrainNum"]))
                  if 'schdep' in sta.keys():
                      print('Scheduled: {}'.format(sta['schdep']))
                  if 'postcmnt' in sta.keys():
                      print("Status: {}".format(sta['postcmnt']))
                  if 'estarrcmnt' in sta.keys():
                      print("Status: {}".format(sta['estarrcmnt']))

And the result looks rather like:

$ python3 Current\ train\ status.py 

Train number 7
Scheduled: 12/26/2013 03:35:00
Status: 1 HR 4 MI LATE

Train number 7
Scheduled: 12/27/2013 03:35:00




Two more items.

First, the Python script can be tweaked to show *all* trains for a station. Just remove the train number filter. To be useful, you might add some time comprehension, since some scheduled times can be a day in the future or past.

Second, the station information table has a similar static page: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&maxResults=1000&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417&dataType=jsonp&jsonpCallback=retrieveStationData&contentType=application%2Fjson&_=13881121034

As with the train status JSON application, many of the fields are dummies. This shorter URL works, too: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417




2 comments:

Edward Vielmetti said...

Thanks, this is awesome.

I did a GET on the URL you mention and got back a "login required" message;

curl 'https://www.googleapis.com/mapsengine/v1/tables/
01382379791355219452-08584582962951999356/features?version=published&
key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData'
{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Login Required",
"locationType": "header",
"location": "Authorization"
}
],
"code": 401,
"message": "Login Required"
}
}

Ian said...

Well, I can't seem to duplicate your issue. Both curl and wget return lots of great data.

However, I *did* try the Amtrak website in a browser first to see if perhaps the URL had changed (it hadn't).

It's possible that Amtrak is tracking IPs and sessions, and blocking requests from IPs that lack an associated session.

Do play around with it a bit and let us know what you figure out!