Exploring the Structure of Realtime Transit Data
When we queried the STM Realtime API in Getting Realtime Transit Data from the STM API using Python, this is what the content we got back looked like:
To work with the data returned by the API, we first need to understand its structure. What are the different fields? How do we access them? And how do they relate to what is defined in the .proto file?
The response fields
Let's explore the response (which we saved to a variable called message) in more detail, and see how it relates to what's in the GTFS Realtime .proto file.
FeedMessage
In Getting Realtime Transit Data from the STM API using Python, we saw that FeedMessage is the root type of the Realtime schema. This is the type of our message variable, which we can see in Python with:
type(message)
This shows the following output:
gtfs_realtime_pb2.FeedMessage
This FeedMessage has different fields, each of which also has a type. Those types can be simple data types such as string, bool, or float, but a field's type can also be another Protocol Buffer message.
If we look at FeedMessage in the .proto file, it defines these two fields:
// Metadata about this feed and feed message.
required FeedHeader header = 1;
// Contents of the feed.
repeated FeedEntity entity = 2;
For the field of type FeedHeader, the field name is header, and it's a required field. (The number, = 1, is a unique tag used to identify the field when the message is encoded.)
For the second field, of type FeedEntity, the name is entity, and repeated means a FeedMessage can contain zero or more of these.
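To make the role of those field numbers a little more concrete, here is a minimal sketch of how Protocol Buffers combines a field number with a wire type into the key that precedes each encoded field. The function name field_key is my own and is not part of any library. The sketch assumes both header and entity use wire type 2 (length-delimited), which is how embedded messages are encoded.

```python
# Wire type 2 ("length-delimited") is what Protocol Buffers uses for
# embedded messages, strings, and bytes.
LENGTH_DELIMITED = 2


def field_key(field_number: int, wire_type: int) -> int:
    """Return the key that precedes a field in the encoded bytes.

    Valid as a single byte only for field numbers below 16; larger
    numbers need a multi-byte varint, which this sketch does not handle.
    """
    if not 1 <= field_number < 16:
        raise ValueError("this sketch only handles field numbers 1 to 15")
    return (field_number << 3) | wire_type


# header = 1 and entity = 2 in FeedMessage
print(hex(field_key(1, LENGTH_DELIMITED)))  # 0xa
print(hex(field_key(2, LENGTH_DELIMITED)))  # 0x12
```

This is why the field names themselves never appear in the encoded bytes: only the numbers do, and the .proto file is what maps them back to header and entity.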
FeedHeader
Looking closer at FeedHeader, we can see the header info of this response by accessing it via its field name:
message.header
The output looks like this:
gtfs_realtime_version: "2.0"
incrementality: FULL_DATASET
timestamp: 1718676059
So, the header is what we see in the earlier screenshot of the content returned by the API. It has a timestamp, the gtfs_realtime_version, and incrementality.
Looking at the FeedHeader message in the .proto file, we can see each of these with comments explaining what they are. For example, the timestamp is "the moment when the content of this feed has been created".
The comment also tells us it's in POSIX time, which we can convert to a more readable date and time with Python's datetime module:
import datetime
response_date_time = datetime.datetime.fromtimestamp(message.header.timestamp)
print(response_date_time)
Output:
2024-06-17 22:00:59
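One thing to keep in mind is that datetime.datetime.fromtimestamp, when called without a timezone, converts to the local time of the machine running the code, so the same timestamp can print differently elsewhere. As a sketch, the conversion can be made explicit by passing a timezone. The fixed UTC-4 offset below is an assumption for illustration (Montreal in summer); it is hard-coded so the example has no dependencies, and a real script would more likely use a named timezone.

```python
import datetime

# The timestamp from the header shown above.
timestamp = 1718676059

# Timezone-aware conversion to UTC.
utc_time = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
print(utc_time)  # 2024-06-18 02:00:59+00:00

# Assumed fixed offset of UTC-4, for illustration only.
montreal_summer = datetime.timezone(datetime.timedelta(hours=-4))
local_time = utc_time.astimezone(montreal_summer)
print(local_time)  # 2024-06-17 22:00:59-04:00
```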
FeedEntity
The FeedMessage can have zero or more fields of type FeedEntity, with the name entity. We can see the entities in the response with:
message.entity
The output we get looks like this:
[id: "42003"
vehicle {
  trip {
    trip_id: "277329494"
    start_time: "21:48:00"
    start_date: "20240617"
    route_id: "193"
  }
  position {
    latitude: 45.5492249
    longitude: -73.6228638
    bearing: 44
    speed: 3.33336
  }
  current_stop_sequence: 15
  current_status: IN_TRANSIT_TO
  timestamp: 1718676044
  vehicle {
    id: "42003"
  }
  occupancy_status: STANDING_ROOM_ONLY
}
, id: "42004"
vehicle {
  trip {
    trip_id: "277329391"
    start_time: "20:51:00"
    start_date: "20240617"
    route_id: "48"
  }....
It's returning multiple entities. We can confirm this by checking the length of message.entity:
len(message.entity)
If we want to see just one, for example the third one returned, we can access it by its zero-based index:
message.entity[2]
If we want to loop through the entities returned and print each one, we can do it with a for loop in Python:
for entity in message.entity:
    print(entity)
Each of those entities is a FeedEntity message. Similar to what we saw with FeedMessage, if we go to FeedEntity in the .proto file, we see it also has fields, with types and names. A FeedEntity should include exactly one of:
optional TripUpdate trip_update = 3;
optional VehiclePosition vehicle = 4;
optional Alert alert = 5;
Returning to the API response, we see that what we have are VehiclePosition messages, because they have the field name vehicle:
[id: "42003"
vehicle {
  trip {
    trip_id: "277329494"
    start_time: "21:48:00"
    start_date: "20240617"
    route_id: "193"
  }
  position {
    latitude: 45.5492249
    longitude: -73.6228638
    bearing: 44
    speed: 3.33336
  }
  .....
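If we wanted to check this in code rather than by eye, protobuf messages in Python provide a HasField method for optional fields. The sketch below wraps it in a helper, entity_kind, which is a name I made up. So that the snippet runs without the GTFS Realtime bindings installed, it uses a small stand-in class, FakeEntity, which is also hypothetical and mimics only HasField; with a real response you would pass in the entities from message.entity instead.

```python
def entity_kind(entity) -> str:
    """Return which of the three optional payload fields is set."""
    for name in ("trip_update", "vehicle", "alert"):
        if entity.HasField(name):
            return name
    return "none"


class FakeEntity:
    """Hypothetical stand-in that mimics only HasField, for illustration."""

    def __init__(self, present: str):
        self._present = present

    def HasField(self, name: str) -> bool:
        return name == self._present


print(entity_kind(FakeEntity("vehicle")))  # vehicle
```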
VehiclePosition
As we continue to walk through the .proto file, we see VehiclePosition also has many fields. In our API response, we have a trip field. The type of this field in the .proto file is TripDescriptor, another message type defined in the file:
// The Trip that this vehicle is serving.
// Can be empty or partial if the vehicle can not be identified with a given
// trip instance.
optional TripDescriptor trip = 1;
We also have a position field. In the .proto file, this field is of type Position:
// Current position of this vehicle.
optional Position position = 2;
This is another message type in the .proto file.
TripDescriptor and Position
TripDescriptor also defines fields, but now some of these are simple data types. The field with the name route_id is a string:
// The route_id from the GTFS that this selector refers to.
optional string route_id = 5;
The Position message's fields are simple data types, rather than other messages. Both the latitude and longitude fields are of type float.
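To summarize the nesting we've walked through, here is a rough sketch that mirrors it with plain Python dataclasses. These are not the classes generated from the .proto file, and they include only the handful of fields discussed so far; they are stand-ins to show how each message type contains the next.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Position:
    latitude: float
    longitude: float


@dataclass
class TripDescriptor:
    route_id: Optional[str] = None


@dataclass
class VehiclePosition:
    trip: Optional[TripDescriptor] = None
    position: Optional[Position] = None


@dataclass
class FeedEntity:
    id: str
    vehicle: Optional[VehiclePosition] = None


@dataclass
class FeedHeader:
    gtfs_realtime_version: str
    timestamp: Optional[int] = None


@dataclass
class FeedMessage:
    header: FeedHeader                                       # required
    entity: List[FeedEntity] = field(default_factory=list)  # repeated


# Rebuild the first entity from the output shown earlier.
sample = FeedMessage(
    header=FeedHeader(gtfs_realtime_version="2.0", timestamp=1718676059),
    entity=[
        FeedEntity(
            id="42003",
            vehicle=VehiclePosition(
                trip=TripDescriptor(route_id="193"),
                position=Position(latitude=45.5492249, longitude=-73.6228638),
            ),
        )
    ],
)

print(sample.entity[0].vehicle.trip.route_id)  # 193
```

Reading the last line from left to right follows the same path we took through the .proto file: FeedMessage, FeedEntity, VehiclePosition, TripDescriptor, and finally a simple string.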
Printing all vehicle locations
So we've gone through the structure of the data from the root type, looking at the structure in the .proto file and what's included in some of the data returned by the API. We know:
- There's a header field, which returns metadata such as the timestamp: message.header.timestamp
- There's an entity field, and there can be zero or more of these returned: message.entity
- We can loop through all the entities returned with: for entity in message.entity: print(entity)
- Each entity has a vehicle field, whose trip field in turn has a route_id.
- Each entity's vehicle field also has a position field, which in turn has both a longitude and a latitude.
Putting all that together, we can print all vehicle locations: the latitude and longitude of each vehicle, the route ID, and the time the feed was created.
import datetime
response_date_time = datetime.datetime.fromtimestamp(message.header.timestamp)
for entity in message.entity:
    print(f'At {response_date_time}, {entity.vehicle.trip.route_id} is at {entity.vehicle.position.latitude}, {entity.vehicle.position.longitude}')
    print('----')
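If we'd rather keep the values than print them, the same loop can be wrapped in a function that returns one row per vehicle. This is only a sketch: vehicle_rows is a name I chose, and fake_message is a hypothetical stand-in built with SimpleNamespace from the first entity shown earlier, so that the snippet runs without calling the API. With a real response you would pass message instead.

```python
import datetime
from types import SimpleNamespace


def vehicle_rows(message):
    """Return (feed time, route_id, latitude, longitude) for each entity."""
    feed_time = datetime.datetime.fromtimestamp(message.header.timestamp)
    return [
        (
            feed_time,
            entity.vehicle.trip.route_id,
            entity.vehicle.position.latitude,
            entity.vehicle.position.longitude,
        )
        for entity in message.entity
    ]


# Hypothetical stand-in for the API response, for illustration only.
fake_message = SimpleNamespace(
    header=SimpleNamespace(timestamp=1718676059),
    entity=[
        SimpleNamespace(
            vehicle=SimpleNamespace(
                trip=SimpleNamespace(route_id="193"),
                position=SimpleNamespace(latitude=45.5492249, longitude=-73.6228638),
            )
        )
    ],
)

rows = vehicle_rows(fake_message)
print(rows[0][1:])  # ('193', 45.5492249, -73.6228638)
```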
Additional resources
- Here, I went through the .proto file, getting information on different fields through the file's comments. There is also a complete GTFS Realtime Reference available.