Getting Realtime Transit Data from the STM API using Python
I recently used one of the Realtime data APIs provided by the Société de transport de Montréal (STM) to get data about bus locations using Python.
This was my first attempt at working with transit data and I learned a lot. I am sharing what I learned here as a guide. I hope you find it useful.
Setting up the project
To set up the project:
- Create a directory to store all the data and code for the project. I call the directory transit_data_project.
- Create and activate a virtual environment. I use Python 3.12, but you can use any recent Python version (3.8+). I use Python's venv to create the virtual environment and activate it in VS Code. See Python environments in VS Code for a good intro to Python environments in VS Code.
- In the virtual environment, install the packages we'll use:
pip install requests protobuf
Note on what we are installing
- requests to fetch the data.
- protobuf to deserialize the data. More on Protocol Buffers later.
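If you want to confirm that both packages landed in the active virtual environment, a quick check along these lines can help. This is only a sketch: installed_versions is a hypothetical helper name, and it relies on the standard library's importlib.metadata (available from Python 3.8).

```python
from importlib import metadata


def installed_versions(packages):
    """Return a {package: version-or-None} mapping for the given package names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


# The two packages installed above; a value of None means the package is missing.
print(installed_versions(["requests", "protobuf"]))
```

If either value comes back as None, check that the virtual environment is activated before running pip install again.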
Creating an account and getting an API key
To access realtime data, you'll need to create an account on the STM Developer Portal and generate an API key for the Realtime data API.
To get an API key:
- Sign up for the developer portal.
- Once you have access to the portal, create an application from the Application menu, following the instructions for How to create an application available in the Wiki menu. When completing step 4 of those instructions, ensure you add the Données Ouverte iBUS - GTFS-Realtime (v2.0) API.
Retrieving the API key
To retrieve the API key:
- Log in to the portal (if you aren't already) and find the application you created in the Applications menu.
- Select the application and scroll to Authentication & Credentials.
- Copy the API Key.
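One way to avoid pasting the key directly into your code is to read it from an environment variable. The sketch below assumes you have stored the key in a variable called STM_API_KEY; that name, and the build_headers helper, are my own choices for illustration and not something the STM portal prescribes. The header names match the ones used in the request later in this guide.

```python
import os


def build_headers(api_key):
    """Build the request headers for the STM API from an API key."""
    if not api_key:
        raise ValueError("No API key provided; set the STM_API_KEY environment variable.")
    return {
        "accept": "application/x-protobuf",
        "apiKey": api_key,
    }


# STM_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("STM_API_KEY", "")
```

Calling build_headers(api_key) then gives you a dict to pass to requests, and it fails early with a clear message if the variable is not set.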
With an API key, you can start making calls to get realtime data, but first it's good to know a bit more about the data that the API returns.
What is GTFS?
The API is Données Ouverte iBUS - GTFS-Realtime (v2.0), but what are GTFS and GTFS-Realtime?
The General Transit Feed Specification (GTFS) is an open standard for transit data. Because it is a shared format, different organisations can use it to publish their transit data, and software applications can then easily consume that data.
There is a static data specification known as GTFS Schedule and a data specification for providing realtime updates, called GTFS Realtime, which is what we explore here.
For more details on GTFS, see the specification docs. There's also a good overview of GTFS Realtime on this Google Developers page.
GTFS Realtime
There are a few different aspects to the GTFS Realtime data. It provides a specification to deliver info on trip updates, service alerts, vehicle positions, and trip modifications.
As of June 2024, it looks like the STM API provides feeds for trip updates and vehicle positions.
When I was exploring this, the data I was particularly interested in was vehicle positions, because I'd like to be able to map them to see all the buses running in Montreal currently.
See Overview of GTFS Realtime feeds for more reading on this.
Data from the API
If you've retrieved data from APIs before, you may have seen that they often return data in JSON format. You might do something like this to get the data and access the JSON in Python:
import requests
url = "https://fantasy.premierleague.com/api/bootstrap-static/"
response = requests.get(url)
data = response.json()
This snippet uses the requests library to make the request, and also to decode the JSON.
The Realtime Data API doesn't return content as JSON, however. It uses Protocol Buffers.
Protocol Buffers
Protocol Buffers are another way to serialize data.
With Protocol Buffers, a schema for the data is defined using a .proto file. So, there is a GTFS .proto file for Realtime data that describes the structure of that data. You can see that file at https://gtfs.org/realtime/proto/
Protocol Buffers are language-neutral: we can compile a .proto file for use in different languages, allowing us to read and write data based on the specified format. When we compile the file for Python, we'll get a .py file that we can use.
Here, I go through the steps of compiling the .proto file, because I wanted to learn more about Protocol Buffers. However, there are Python GTFS-realtime Language Bindings available on the General Transit Feed Specification website, which means you can avoid this step. You'll also find a code example there if you want to adapt the code later on this page to use the Python package it provides.
Compiling the proto file
1. Download the compiler
To compile the .proto file, you need the Protocol Buffers compiler (protoc). The latest version at time of writing is available on the Protocol Buffers releases tab on GitHub.
2. Save the .proto file to your project directory
Save the file at https://gtfs.org/realtime/proto/ to your project directory. The compile command in step 4 expects it to be named gtfs-realtime.proto.
3. cd into your project directory in the terminal.
4. Run the compiler on the .proto file
protoc --python_out=. gtfs-realtime.proto
This will create a .py file called gtfs_realtime_pb2.py.
See the Python Generated Code Guide for a good intro to Protocol Buffers in Python.
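If you would rather run the compile step from Python, for example from a setup script, something like the following could work. It is a minimal sketch that assumes protoc is on your PATH; build_protoc_command and compile_proto are hypothetical helper names, and the command they build is the same one shown in step 4.

```python
import shutil
import subprocess


def build_protoc_command(proto_file, out_dir="."):
    """Return the protoc invocation from step 4 as an argument list."""
    return ["protoc", f"--python_out={out_dir}", proto_file]


def compile_proto(proto_file, out_dir="."):
    """Run protoc, raising if the compiler is missing or exits with an error."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_protoc_command(proto_file, out_dir), check=True)
```

Calling compile_proto("gtfs-realtime.proto") from the project directory should then produce the same gtfs_realtime_pb2.py file as running the command by hand.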
Getting the endpoint
We will also need to know where the data is available to use it in our Python code. This is the API endpoint. To retrieve it:
- Go to the APIs menu.
- Select Données Ouverte iBUS - GTFS-Realtime (v2.0).
- Go to the Specs tab.
- Select Authorize and input your API key.
- Under Positions, select Try it out.
- Copy the Request URL.
Python code
Open your project directory in your favourite code editor. I use VS Code, which makes it easy to work with Jupyter Notebooks.
Right now in our project directory, we have the .proto file, and the Python output from compiling it, in a file called gtfs_realtime_pb2.py. Next, we'll want a file to write our code in. In this code, we'll make the request to the STM API and process the response.
In the project directory, create a file called realtime_data.ipynb and add the following code. We'll walk through it line by line.
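import requests
import gtfs_realtime_pb2  # generated by protoc from gtfs-realtime.proto

url = "https://api.stm.info/pub/od/gtfs-rt/ic/v2/vehiclePositions"
headers = {
    "accept": "application/x-protobuf",
    "apiKey": "<your-api-key>",  # replace this with your API key
}

# Fetch the raw protobuf bytes from the API.
response = requests.get(url, headers=headers)
transit_data = response.content

# Parse the bytes into a FeedMessage, the root type of the schema.
message = gtfs_realtime_pb2.FeedMessage()
message.ParseFromString(transit_data)

message  # as the last line of a notebook cell, this displays the parsed feed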
The code explained
- First, we import requests. This is a Python package for making HTTP requests.
- Then, we import the Python module gtfs_realtime_pb2, which is what was generated when we compiled the .proto file.
- Next, we specify the endpoint URL that we want to get the data from. See the Getting the endpoint section from earlier for details on finding this in the STM portal.
- We declare a dict of headers to pass with the request to the endpoint, specifying what type of data we accept as a response, and the API key from the STM portal.
- With response = requests.get(url, headers=headers) we make the request to the endpoint.
- We access the response's content with response.content and save it to a variable called transit_data.
- With message = gtfs_realtime_pb2.FeedMessage() we create an instance of FeedMessage, which is the root type in the Realtime schema.
- That FeedMessage object has a ParseFromString method, to which we pass the transit_data we received from the API.
- Finally, by specifying message as the last line of the cell, we can see the output in the notebook.
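Once the feed is parsed, you will probably want the vehicle positions in a simpler shape than the raw message. Here is a rough sketch of how that could look. The vehicle_rows function and the dict keys are names I made up for illustration; the fields it reads (entity, vehicle, trip, position, timestamp) come from the GTFS Realtime schema. I have not checked which optional fields the STM feed actually fills in, and unset fields come back as default values such as an empty string or 0.

```python
from datetime import datetime, timezone


def vehicle_rows(message):
    """Turn a parsed FeedMessage into a list of plain dicts, one per vehicle."""
    rows = []
    for entity in message.entity:
        # A FeedEntity may carry a trip update, a vehicle position, or an alert.
        if not entity.HasField("vehicle"):
            continue
        vehicle = entity.vehicle
        rows.append(
            {
                "vehicle_id": vehicle.vehicle.id,
                "route_id": vehicle.trip.route_id,
                "latitude": vehicle.position.latitude,
                "longitude": vehicle.position.longitude,
                # timestamp is POSIX time (seconds since the epoch, UTC).
                "timestamp": datetime.fromtimestamp(vehicle.timestamp, tz=timezone.utc),
            }
        )
    return rows
```

In the notebook, calling vehicle_rows(message) in a new cell should give you one dict per bus in the feed, which is a convenient starting point for mapping.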
In a future post, I'll look at processing the data and saving it.