Saving Realtime Transit Data to a DataFrame
In the post Getting Realtime Transit Data from the STM API using Python, I looked at getting realtime transit data and printing it to a notebook cell.
Here, I'll instead save it to a Polars DataFrame so it's easy to analyze. I like working with Polars because it has a really intuitive API.
Setup
Install polars into the same environment used in the Getting Realtime Transit Data post:
pip install polars
Notebook setup
In the first cell of the notebook, import the libraries we'll use:
import requests
import gtfs_realtime_pb2
import polars as pl
The first two here are the same as in the "Getting Realtime Transit Data" post.
Getting the data
The following code comes from the earlier post, with two small changes: we wrap it in a function, realtime, and instead of hardcoding the API key, we pass it in as an api_key parameter. This gives us code that's easier to work with in the notebook, and we can call it multiple times to capture realtime data at multiple points in time.
def realtime(api_key):
    url = "https://api.stm.info/pub/od/gtfs-rt/ic/v2/vehiclePositions"
    headers = {
        "accept": "application/x-protobuf",
        "apiKey": api_key,
    }
    response = requests.get(url, headers=headers)
    protobuf_data = response.content
    message = gtfs_realtime_pb2.FeedMessage()
    message.ParseFromString(protobuf_data)
Processing to a DataFrame
So message contains our realtime data. We'll pull out the fields we want, representing each returned entity as a dictionary and collecting those dictionaries in a list:
[
    {'trip_id': '123', 'route_id': '45', 'longitude': ...},
    {'trip_id': '456', 'route_id': '29', 'longitude': ...},
    ...
]
With a list of dictionaries, where each dictionary represents one entity, we'll then convert it to a DataFrame.
Here's what the code will look like:
    ...
    # Create a list to store each entity in
    data = []
    # Get the timestamp from the message header
    header_timestamp = message.header.timestamp
    # Loop through the entities
    for entity in message.entity:
        # Create an empty dict to store the entity information
        entity_data = {}
        # Extract all the relevant fields and add them to the dict
        entity_data['header_timestamp'] = header_timestamp
        entity_data['entity_id'] = entity.id
        trip = entity.vehicle.trip
        entity_data['trip_id'] = trip.trip_id
        entity_data['start_time'] = trip.start_time
        entity_data['start_date'] = trip.start_date
        entity_data['route_id'] = trip.route_id
        position = entity.vehicle.position
        entity_data['latitude'] = position.latitude
        entity_data['longitude'] = position.longitude
        entity_data['bearing'] = position.bearing
        entity_data['speed'] = position.speed
        entity_data['current_stop_sequence'] = entity.vehicle.current_stop_sequence
        entity_data['current_status'] = entity.vehicle.current_status
        entity_data['timestamp'] = entity.vehicle.timestamp
        vehicle = entity.vehicle.vehicle
        entity_data['vehicle_id'] = vehicle.id
        entity_data['occupancy_status'] = entity.vehicle.occupancy_status
        # Add this record to the list
        data.append(entity_data)
    # Convert the list of dicts to a polars DataFrame
    df = pl.DataFrame(data)
    return df
Now calling the function with an API key:
df = realtime("<api_key>")
We get a polars DataFrame object back.
We can now start to explore the data. For example, we can filter for the buses currently running on a particular route. Here, I use route 45... because it's the best in Montreal.
df.filter(pl.col("route_id") == "45")
In a future post, I'll look at exploring the data in more detail.