Exploring the Structure of Realtime Transit Data

When we queried the STM Realtime API in Getting Realtime Transit Data from the STM API using Python, this is what the returned content looked like:

Data structure

To work with this data returned by the API, we'll need to first understand its structure. What are the different fields? How do we access them? And how do they relate to what is defined in the .proto file?

The response fields

Let's explore the response (which we saved to a variable called message) in more detail, and see how it relates to what's in the GTFS Realtime .proto file.

FeedMessage

In Getting Realtime Transit Data from the STM API using Python, we saw that FeedMessage is the root type of the Realtime schema. This is the type of our message variable, which we can check in Python with:

type(message)

This shows the following output:

gtfs_realtime_pb2.FeedMessage

This FeedMessage has several fields, each of which has a type. A field's type can be a simple data type such as string, bool, or float, but it can also be another Protocol Buffer message.

If we look at the FeedMessage in the .proto file, it defines these two fields:

// Metadata about this feed and feed message.
required FeedHeader header = 1;

// Contents of the feed.
repeated FeedEntity entity = 2;

The first field is of type FeedHeader, its name is header, and it's a required field. (The number in = 1 is a unique tag that identifies the field when the message is encoded.)

The second field is of type FeedEntity, its name is entity, and repeated means a FeedMessage can contain zero or more of these.

FeedHeader

Looking closer at FeedHeader, we can see the header info of this response by accessing it via its field name:

message.header

The output looks like this:

gtfs_realtime_version: "2.0"
incrementality: FULL_DATASET
timestamp: 1718676059

So, the header is what we see in the earlier screenshot of the content returned by the API. It has a timestamp, the gtfs_realtime_version, and incrementality.

Looking for the FeedHeader message in the .proto file, we can see each of these with comments to explain what they are. For example, the timestamp is "the moment when the content of this feed has been created".

It tells us it's in POSIX time, which we can convert to a more readable date and time with Python's datetime module:

import datetime

response_date_time = datetime.datetime.fromtimestamp(message.header.timestamp)

print(response_date_time)

Output:

2024-06-17 22:00:59
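
Note that fromtimestamp() without a tz argument converts to the local time zone of the machine running the code, so the output above depends on where it runs. Here is a minimal sketch of making the time zone explicit, using the timestamp value from the header above. The fixed UTC-4 offset is an assumption, standing in for Montreal's daylight time on that date:

```python
import datetime

# Timestamp copied from the header output above (POSIX seconds).
header_timestamp = 1718676059

# Interpret the timestamp explicitly as UTC instead of the machine's local zone.
utc_time = datetime.datetime.fromtimestamp(header_timestamp, tz=datetime.timezone.utc)

# Assumption: a fixed UTC-4 offset, standing in for Montreal daylight time in June.
montreal_offset = datetime.timezone(datetime.timedelta(hours=-4))
local_time = utc_time.astimezone(montreal_offset)

print(utc_time.isoformat())    # 2024-06-18T02:00:59+00:00
print(local_time.isoformat())  # 2024-06-17T22:00:59-04:00
```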

FeedEntity

The FeedMessage has a repeated field named entity, which holds zero or more messages of type FeedEntity. We can see the entities in the response with:

message.entity

The output we get looks like this:

[id: "42003"
vehicle {
  trip {
    trip_id: "277329494"
    start_time: "21:48:00"
    start_date: "20240617"
    route_id: "193"
  }
  position {
    latitude: 45.5492249
    longitude: -73.6228638
    bearing: 44
    speed: 3.33336
  }
  current_stop_sequence: 15
  current_status: IN_TRANSIT_TO
  timestamp: 1718676044
  vehicle {
    id: "42003"
  }
  occupancy_status: STANDING_ROOM_ONLY
}
, id: "42004"
vehicle {
  trip {
    trip_id: "277329391"
    start_time: "20:51:00"
    start_date: "20240617"
    route_id: "48"
  }....

It's returning multiple entities. We can confirm by checking the length of message.entity:

len(message.entity)

If we want to see just one, for example the third one returned, we could access it like this (indexes start at 0, so the third entity is at index 2):

message.entity[2]

If we want to loop through the entities returned and print each one, we can do it with a for loop in Python:

for entity in message.entity:
    print(entity)
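
As far as I can tell, the repeated entity field supports the usual list-style access in Python. A small sketch of zero-based indexing, negative indexes, slicing, and looping with enumerate follows. The SimpleNamespace objects are hypothetical stand-ins for message.entity (with ids copied from the sample output) so the snippet runs without calling the API:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for message.entity; ids copied from the sample output.
entities = [SimpleNamespace(id="42003"), SimpleNamespace(id="42004")]

first_entity = entities[0]   # indexing is zero-based
last_entity = entities[-1]   # negative indexes count from the end
first_two = entities[:2]     # slicing returns a list of entities

# enumerate() gives the index alongside each entity.
for index, entity in enumerate(entities):
    print(index, entity.id)
```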

Each of those entities is a FeedEntity message. Similar to what we saw with FeedMessage, if we go to FeedEntity in the .proto file, we see it also has fields, each with a type and a name:

FeedEntity should include exactly one of:

optional TripUpdate trip_update = 3;
optional VehiclePosition vehicle = 4;
optional Alert alert = 5;

Returning to the API response, we can see that what we have are VehiclePositions, because each entity uses the field named vehicle:

[id: "42003"
vehicle {
  trip {
    trip_id: "277329494"
    start_time: "21:48:00"
    start_date: "20240617"
    route_id: "193"
  }
  position {
    latitude: 45.5492249
    longitude: -73.6228638
    bearing: 44
    speed: 3.33336
  }
 .....
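
Rather than reading the printed output, we can also check this in code. Generated protobuf messages in Python provide a HasField() method that reports whether an optional message field is set. Below is a minimal sketch of using it to work out which kind of entity we have. The names entity_kind and StubEntity are hypothetical, and StubEntity is only a stand-in that mimics HasField() so the snippet runs without the API:

```python
# The three FeedEntity fields listed in the .proto excerpt above.
ENTITY_FIELDS = ("trip_update", "vehicle", "alert")


def entity_kind(entity):
    """Return the name of the first populated FeedEntity field, or None."""
    for field_name in ENTITY_FIELDS:
        if entity.HasField(field_name):
            return field_name
    return None


class StubEntity:
    """Hypothetical stand-in that mimics a protobuf message's HasField()."""

    def __init__(self, populated_field=None):
        self._populated_field = populated_field

    def HasField(self, field_name):
        return field_name == self._populated_field


print(entity_kind(StubEntity("vehicle")))  # vehicle
print(entity_kind(StubEntity()))           # None
```

With a real response, entity_kind(message.entity[0]) would be expected to return 'vehicle' for this feed.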

VehiclePosition

As we continue to walk through the .proto file, we see VehiclePosition also has many fields. In our API response, we have a trip field. The type of this field from the .proto file is TripDescriptor, another message type defined in the file:

// The Trip that this vehicle is serving.
// Can be empty or partial if the vehicle can not be identified with a given
// trip instance.
optional TripDescriptor trip = 1;

We also have a position field. This field from the .proto file is of type Position:

// Current position of this vehicle.
optional Position position = 2;

This is another message type in the .proto file.

TripDescriptor and Position

TripDescriptor also defines fields, but now some of these are simple data types. The field with the name route_id is a string:

// The route_id from the GTFS that this selector refers to.
optional string route_id = 5;

The Position message's fields are simple data types, rather than other messages. Both the latitude and longitude fields are of type float.
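
To summarize the nesting we've walked through, here is a rough sketch of the same hierarchy as plain Python dataclasses. These are not the classes generated from the .proto file. They are simplified, hypothetical mirrors that include only some of the fields discussed so far, filled with values from the sample output:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Position:
    latitude: float
    longitude: float


@dataclass
class TripDescriptor:
    route_id: Optional[str] = None


@dataclass
class VehiclePosition:
    trip: Optional[TripDescriptor] = None
    position: Optional[Position] = None


@dataclass
class FeedEntity:
    id: str
    vehicle: Optional[VehiclePosition] = None


@dataclass
class FeedHeader:
    gtfs_realtime_version: str
    timestamp: int


@dataclass
class FeedMessage:
    header: FeedHeader
    # "repeated" in the .proto file: zero or more entities.
    entity: List[FeedEntity] = field(default_factory=list)


# Simplified mirror of the first entity in the sample output.
sketch = FeedMessage(
    header=FeedHeader(gtfs_realtime_version="2.0", timestamp=1718676059),
    entity=[
        FeedEntity(
            id="42003",
            vehicle=VehiclePosition(
                trip=TripDescriptor(route_id="193"),
                position=Position(latitude=45.5492249, longitude=-73.6228638),
            ),
        )
    ],
)

print(sketch.entity[0].vehicle.trip.route_id)  # 193
```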

Printing all vehicle locations

So we've gone through the structure of the data from the root type, looking at how it's defined in the .proto file and at what's included in some of the data returned by the API. We know:

  • There's a header field, which returns metadata such as the timestamp: message.header.timestamp

  • There's an entity field, and there can be zero or more entities returned:
    message.entity

  • We can loop through all the entities returned with:

    for entity in message.entity:
        print(entity)
    
  • Each entity has a vehicle field. Its trip field in turn has a route_id.

  • The vehicle field also has a position field, which in turn has both a latitude and a longitude.

Putting all that together, we can print all vehicle locations: the latitude and longitude of each vehicle, its route ID, and the time the realtime vehicle positions are for.

import datetime

response_date_time = datetime.datetime.fromtimestamp(message.header.timestamp)

for entity in message.entity:
    print(f'At {response_date_time}, {entity.vehicle.trip.route_id} is at {entity.vehicle.position.latitude}, {entity.vehicle.position.longitude}')
    print('----')
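
The loop above assumes every entity carries a vehicle with a position. A slightly more defensive sketch collects the values into a list of dictionaries and skips entities where HasField() reports that the data is missing. The function name vehicle_locations and the Stub class are hypothetical. Stub only mimics the attribute access and HasField() of a protobuf message so the example runs without the API, and the second sample entity is made up to show the skip:

```python
import datetime


class Stub:
    """Hypothetical stand-in for a protobuf message: attributes plus HasField()."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def HasField(self, name):
        return name in self.__dict__


def vehicle_locations(message):
    """Return one dict per entity that has both a vehicle and a position."""
    feed_time = datetime.datetime.fromtimestamp(
        message.header.timestamp, tz=datetime.timezone.utc
    )
    locations = []
    for entity in message.entity:
        if not entity.HasField("vehicle") or not entity.vehicle.HasField("position"):
            continue  # skip entities without a vehicle, or vehicles without a position
        vehicle = entity.vehicle
        locations.append({
            "time_utc": feed_time.isoformat(),
            "route_id": vehicle.trip.route_id,
            "latitude": vehicle.position.latitude,
            "longitude": vehicle.position.longitude,
        })
    return locations


# First entity: values copied from the sample output.
# Second entity: made up for illustration, with no vehicle field set.
sample_message = Stub(
    header=Stub(timestamp=1718676059),
    entity=[
        Stub(id="42003", vehicle=Stub(
            trip=Stub(route_id="193"),
            position=Stub(latitude=45.5492249, longitude=-73.6228638),
        )),
        Stub(id="no-vehicle"),
    ],
)

for location in vehicle_locations(sample_message):
    print(location)
```

With a real response, the same function could be called as vehicle_locations(message).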

Additional resources

  • Here, I went through the .proto file, getting information on different fields through the file's comments. There is also a complete GTFS Realtime Reference available.