Measuring Transit Equity

Introduction

In this lab, we will learn how to use different types of data to explore and visualize access to transportation. This will allow us to identify the spatial disparities due to unequal distribution of transportation infrastructure (e.g. transit stops, transit routes, transit frequency, etc.). We are also able to see the repercussions on commute quality (i.e., trip duration, transportation expenses, etc.)

In this lab(link) the main things you will learn are the following:

Calculate and visualize accessibility measure to transportation infrastructure (i.e., bus stops, charging stations, etc)
Calculate and visualize percentage of public transportation users
Calculate and visualize average commute times

You can build on those to see the correlation between income/race/gender and the different transportation equity measures.

Github Lab Repository

If you have not already done so, follow this link to accept the lab Github Classroom assignment repository.

Access to transportation infrastructure

In this example I am using bus stops as a proxy for transportation infrastructure. You can use many other infrastructure types and explore their distribution in relation to socioeconomic and demographics of you neighborhood/region. You can find transit information from GTFS as well as open source portals. For the example I showed during class on electric charging stations, I had to grab this data from a private provider.

First lets start by loading our libraries. By now, you should be familiar with tidycensus, tidyverse, and tigris. I am introducing Simple Futures (sf) for some spatial analysis (i.e., calculating the density of buses per census tract)

Code

# Load required packages
library(tidycensus)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.1     ✔ purrr   0.3.4
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.2.0     ✔ stringr 1.4.1
✔ readr   2.1.2     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Code

library(sf)

Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE

Code

library(tigris)

To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.

Let’s download our shape files for our neighborhood. I am using Cook County as my case study. I am using the census tract level. I have defined a certain type of projection crs=4326 to unify our projection system

Code

# Download and filter Chicago census tracts
cook_county_tracts <- tracts(state = "IL", county = "Cook") |>
  st_transform(crs = 4326)

Retrieving data for the year 2020


  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |============================================                          |  64%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%

Now that we have our census tract shapefile, let’s download the bus stops data, read them, and project them.

Code

# Read bus stop data
bus_stops <- read_csv("stops.txt")

Rows: 11217 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): stop_name, stop_desc
dbl (7): stop_id, stop_code, stop_lat, stop_lon, location_type, parent_stati...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

# Convert bus_stops data to an sf object
bus_stops_sf <- st_as_sf(bus_stops, coords = c("stop_lon", "stop_lat"), crs = 4326)

Last data input we need is the socioeconomic of our neighborhood. For the sake of this tutorial I am limiting it to population and household median and income. But please feel free to explore further attribute.

Code

# Specify the ACS variables you want to download
variables <- c("B01003_001", # Total population
               "B19013_001") # Median household income

# Download the data for Cook County census tracts
cook_county_data <- get_acs(geography = "tract",
                            state = "17",
                            county = "031",
                            variables = variables,
                            year = 2019,
                            output="wide",
                            survey = "acs5",
                            geometry = FALSE)

Getting data from the 2015-2019 5-year ACS

To make our table more legible, I am renaming our columns and rescaling the median income variable

Code

cook_county_data <- cook_county_data |> 
  rename(name = NAME, pop_tot = B01003_001E, med_income = B19013_001E) |> 
  select(GEOID, name, pop_tot, med_income) 

# scale income data so you are able to correlate it with transportation infrastructure
cook_county_data <- cook_county_data |>  mutate(income_scaled = scale(med_income))

library(gt)
cook_county_data |> head() |> gt()

GEOID	name	pop_tot	med_income	income_scaled
17031630200	Census Tract 6302, Cook County, Illinois	1825	37422	-0.83877210
17031580700	Census Tract 5807, Cook County, Illinois	5908	47000	-0.57701469
17031590600	Census Tract 5906, Cook County, Illinois	3419	46033	-0.60344186
17031600700	Census Tract 6007, Cook County, Illinois	2835	45294	-0.62363801
17031611900	Census Tract 6119, Cook County, Illinois	1639	24507	-1.19172646
17031804505	Census Tract 8045.05, Cook County, Illinois	3445	71438	0.09085204

Great! Now we have our census demographic data ready. Let’s spend time on preparing our bus stops data. Remeber these are represented in Lat, Long. We have to associate them to the correct census tracts. To do so, we will have to spatial join the bus shapefile to the census tract shapefile. We can then calculate how many bus stops per tract are available.

Code

# buses per census tract
#sf_use_s2(FALSE)
bus_stops_per_census_tract <- bus_stops_sf %>%
  st_join(cook_county_tracts) %>%
  group_by(TRACTCE) %>%
  summarise(bus_stop_count = n()) %>%
  ungroup()

View results

Code

bus_stops_per_census_tract |> head() |> gt()

TRACTCE	bus_stop_count	geometry
010100	2	c(-87.6770962, -87.6663015, 42.02289954, 42.01930539)
010201	3	c(-87.68426462, -87.68054061, -87.6777439, 42.01941115, 42.01936686, 42.01937362)
010202	18	c(-87.67632528, -87.67580666, -87.67532833, -87.675138, -87.67380984, -87.67305981, -87.67316244, -87.672892, -87.67290599, -87.67281382, -87.67279844, -87.67279597, -87.6724214, -87.67456121, 42.01918517, 42.01798492, 42.01611703, 42.01595, 42.01665359, 42.0180352, 42.01896555, 42.019063, 42.01874262, 42.01850551, 42.01822603, 42.01776729, 42.01744751, 42.01276478)
010300	17	c(-87.67058707, -87.67103858, -87.6692571266, -87.669092, -87.66932276, -87.6689834, -87.66617692, -87.665802, -87.66551305, -87.665018, -87.66511368, -87.66467448, -87.663375, -87.663994, -87.664145, -87.66469399, 42.01928162, 42.01808861, 42.0160204165, 42.015876, 42.01931247, 42.0190888, 42.01919502, 42.018021, 42.01802283, 42.016268, 42.01580068, 42.01520955, 42.013005, 42.014113, 42.013757, 42.01482404)
010400	9	c(-87.65693556, -87.65834886, -87.66071682, -87.66060059, -87.660658, -87.660848, -87.661598, -87.661995, -87.662803, 41.99833161, 41.99827386, 42.003871, 42.00038209, 42.001702, 42.00572, 42.008133, 42.009358, 42.01166)
010501	8	c(-87.66177918, -87.661749, -87.662146, -87.6623874, -87.662952, -87.663526, -87.66563004, -87.66379711, 42.00790263, 42.007777, 42.009002, 42.0092214, 42.011414, 42.012649, 42.0078605, 42.00787643)

Now, lets create the final dataset that includes both the bus data and demographics/socioeconomics of our neighborhood. Use join.

Code

# Merge the count of bus stations per census tract back to the original census tracts shapefile.
cook_county_tracts_merged <- st_join(cook_county_tracts, bus_stops_per_census_tract, left = FALSE)

# Merge the count of bus stations per census tract back to the original census tracts shapefile.
cook_county_tracts_merged <- left_join(cook_county_tracts_merged, cook_county_data, by = "GEOID")

Now let’s visualize this and see the ratio of population served by buses. Where are they clustered and if there are any inferences we can make.

Code

# Create a map.
ggplot() +
  geom_sf(data = cook_county_tracts_merged, 
          mapping = aes(fill =cut_number((pop_tot/bus_stop_count), 5)), 
          color = "white", 
          size = 0.1) +
  scale_fill_brewer(palette = "PuBu", 
                     name = "Bus stations normalized per populatin",
                    na.value = "gray")+       
  labs(title = "Bus Stations per Census Tract in Cook County",
       subtitle = "Number of Bus Stations",
       caption = "Source: Census data and Bus Stations data") +
  theme_minimal() +
  theme(panel.background = element_blank()) +                   
  coord_sf(datum = NA)

Now it’s your turn to explore other demographic attributes and see how they are correlated to various transportation infrastructure.

Travel behavior

Providing transportation infrastructure is not sufficient for assuring equitable access to mobility. By investigating travel behavior, one can see the discrepancies in accessibility, as well as the burden that certain communities have to face. In this example, I will look at transportation modes and commute times. To make this more informative, try to explore the relationship between average commute times and race ratios per census tract.

For this analysis we will need commute information from ACS and census tract shape files for visualization.

Let’s first start by identifying the variables we are interested in. I haven’t renamed them, but good practice is to always rename your columns with legible names.

Code

variables <- c("B19013_001", # Median Household Income
"B01003_001", # Total population
"B03002_003", # White population
"B03002_012", # Hispanic or Latinx population
"B03002_004", # African American population
"B08201_001", # Number of households
"B08141_001", # population using transportation
"B08141_006", # population drive
"B08141_011", # population carpool
"B08141_016", # population transit
"B08141_021", # population walk
"B08141_026", # population other
"B08141_031", # Population working from home 
"B08303_001", # Population of commuters to work
"B08303_002", # Population of commuters less than 5 min
"B08303_003", # Population of commuters 5 to 9 minutes
"B08303_004", # Population of commuters 10 to 14 minutes
"B08303_005", # Population of commuters 15 to 19 minutes
"B08303_006", # Population of commuters 20 to 24 minutes 
"B08303_007", # Population of commuters 25 to 29 minutes 
"B08303_008", # Population of commuters 30 to 34 minutes 
"B08303_009", # Population of commuters 35 to 39 minutes 
"B08303_010", # Population of commuters 40 to 44 minutes 
"B08303_011", # Population of commuters 45 to 59 minutes 
"B08303_012", # Population of commuters 60 to 89 minutes
"B08303_013") # Population of commuters 90 or more minutes

Let’s download the data from ACS using tidycensus.

Code

cook_county_data <- get_acs(geography = "tract", 
                            variables = variables, 
                            state = "IL", 
                            county = "Cook",
                            survey = "acs5", 
                            output="wide",
                            year = 2019)

Getting data from the 2015-2019 5-year ACS

We can use the same shapefile from the previous exercise, so there is no need to re download the census tract shapefile. Now let’s join our demographic data to our shapefile using GEOID as our common index.

Code

cook_county_tracts <- tracts(state = "IL", county = "Cook") |>
  erase_water(area_threshold = .9, year = 2020) %>% 
  st_transform(crs = 4326)

Retrieving data for the year 2020

Fetching area water data for your dataset's location...

Erasing water area...
If this is slow, try a larger area threshold value.

Code

cook_county_data_sf <- left_join(cook_county_tracts, cook_county_data, by = c("GEOID" = "GEOID"))

Now that we have master dataframe with all the data in it. Let’s go ahead and calculate two measure:

Percentage of transit users (pct_public_transit)
Average commute time (avg_commute_travel_time). For this measure, I am multiplying each range of commute time by the lowest value in the range (i.e. 5 mins in 5-10mins). I then add all the values and divide them by the total number pf commuters.

Code

cook_county_data_sf <- cook_county_data_sf %>%
  mutate(pct_public_transit = B08141_016E / B08141_001E * 100,
         avg_commute_travel_time = ((B08303_002E*1+
                                      B08303_003E*5+
                                      B08303_004E*10+
                                      B08303_005E*15+
                                      B08303_006E*20+ 
                                      B08303_007E*25+ 
                                      B08303_008E*30+
                                      B08303_009E*35+ 
                                      B08303_010E*40+ 
                                      B08303_011E*45+
                                      B08303_012E*60+
                                      B08303_013E*90)/
                                      1)/B08303_001E)

Since the are regions with no commuters, we get inf (infinite values). We have to make sure to convert them to NA so we are able to analyse these and visualize them.

Code

cook_county_data_sf$avg_commute_travel_time[sapply(cook_county_data_sf$avg_commute_travel_time, is.infinite)] <- NA

Lets have a look at the public transportation users and where they are located

Code

# Create public trasnportation users map
ggplot() +
  geom_sf(data = cook_county_data_sf, 
          mapping = aes(fill =cut_number((avg_commute_travel_time), 5)), 
          color = "white", 
          size = 0.1) +
  scale_fill_brewer(palette = "PuBu", 
                     name = "Percentage",
                    na.value = "gray")+       
  labs(title = "Percentage of Workers with Public Transportation by Census Tract in Cook County",
       caption = "Source: US Census Bureau ACS 5-Year Estimates, 2019") +
  theme_minimal() +
  theme(panel.background = element_blank()) +                   
  coord_sf(datum = NA)

Now let’s have a look at the average commute time in each census tract and how they are spatially distributed.

Code

# Create average commute times map
ggplot() +
  geom_sf(data = cook_county_data_sf, 
          mapping = aes(fill =cut_number((avg_commute_travel_time), 5)), 
          color = "white", 
          size = 0.1) +
  scale_fill_brewer(palette = "PuBu", 
                     name = "Avergae commute times",
                    na.value = "gray")+       
  labs(title = "Average Commute map by Census Tract in Cook County",
       caption = "Source: US Census Bureau ACS 5-Year Estimates, 2019") +
  theme_minimal() +
  theme(panel.background = element_blank()) +                   
  coord_sf(datum = NA)

Lab Evaluation

In evaluating your lab submission, we’ll be paying attention to the following:

As you get into the lab, please feel welcome to ask us questions, and please share where you’re struggling with us and with others in the class.

Introduction

Github Lab Repository

Access to transportation infrastructure

Travel behavior

References