Presentation Exercise

Author

Joaquin Ramirez

Introduction

In this analysis, I reproduce a graph from the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been” and build tables to support its findings. The dataset, data_aging_congress.csv, contains demographic information about members of the United States Senate and House of Representatives, covering the 66th Congress (1919-1921) through the 118th Congress (2023-2025). The analysis visualizes trends in median age over time for both the House and Senate. The graph is built with plotly so that it is interactive, and the tables are generated with the gt package.

Data

The data used for this analysis is sourced from the FiveThirtyEight Congress demographics dataset, located in the “fivethirtyeight- data”

To document how the visualizations were recreated and refined, some of the AI prompts I used are listed below:

Reproduce Graphs/Tables

  • Prompt: Can you recreate this graph to show the median age trends over time for the House and Senate?
  • Response: The initial graph was created to visualize the median age trends for the House and Senate over time, using plotly for interactivity and styling elements from ggplot2 such as labs and theme settings.

Error Fixing and Debugging

  • Prompt: How do I fix the error in geom_line(aes(color = chamber))?
  • Response: Error fixed by ensuring the correct mapping of the chamber column and adjusting the data preprocessing steps.
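The original error message is not shown, so the following is only a minimal sketch of the kind of check that helps here. It assumes the problem was a missing or misnamed chamber column; the toy data frame and its values are made up for illustration.

```r
library(ggplot2)

# Illustrative data only; these values are invented for the sketch.
toy <- data.frame(
  year = rep(c(2019, 2021, 2023), times = 2),
  chamber = rep(c("House", "Senate"), each = 3),
  median_age = c(58.0, 58.5, 57.9, 63.5, 64.3, 65.3)
)

# geom_line(aes(color = chamber)) needs a column named `chamber` in the data,
# so confirm it exists before building the plot.
stopifnot("chamber" %in% names(toy))

p <- ggplot(toy, aes(x = year, y = median_age)) +
  geom_line(aes(color = chamber)) +
  labs(x = NULL, y = "Median age", color = "Chamber")
```

Checking the column name first turns a confusing rendering error into an immediate, readable failure.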

Table Creation

  • Prompt: Create a publication-quality table showing the median age for each Congress session.

  • Response: A table was generated using the gt package, formatted to match publication standards.

Visual Enhancements

  • Prompt: How can I make the graph more visually appealing with color schemes and annotations?

  • Response: The graph was enhanced with a color scheme and annotations to highlight key trends and make it more visually appealing.

  • Prompt: Can I include interactive features in the graph for better user engagement?

  • Response: Interactive features were added to the graph using Plotly, allowing users to hover over data points for additional information.

These questions guided the step-by-step process for creating and modifying the graphs and tables in R.

Load Necessary Libraries

As we move forward, the next crucial step involves loading the appropriate libraries and verifying the data to ensure it is ready for the upcoming analyses and visualizations.

library(readr)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.3.3

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(gt)
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
library(tibble)
library(here)
Warning: package 'here' was built under R version 4.3.3
here() starts at C:/Users/Joaquin/School/DA - 6833 (Summer 2024)/Joaquin_Ramriez_Portfolio_II

Data Loading and Verification

# Check the root directory determined by here
print(here::here())
[1] "C:/Users/Joaquin/School/DA - 6833 (Summer 2024)/Joaquin_Ramriez_Portfolio_II"
# Define the file path for the aging congress data
data_aging_congress_path <- here::here("presentation-exercise", "data_aging_congress.csv")

# Load the data from the CSV file
data_aging_congress <- read_csv(data_aging_congress_path)
Rows: 29120 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): chamber, state_abbrev, bioname, bioguide_id, generation
dbl  (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
date (2): start_date, birthday

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the first few rows of the data frame to verify it is loaded correctly
print(head(data_aging_congress))
# A tibble: 6 × 13
  congress start_date chamber state_abbrev party_code bioname        bioguide_id
     <dbl> <date>     <chr>   <chr>             <dbl> <chr>          <chr>      
1       82 1951-01-03 House   ND                  200 AANDAHL, Fred… A000001    
2       80 1947-01-03 House   VA                  100 ABBITT, Watki… A000002    
3       81 1949-01-03 House   VA                  100 ABBITT, Watki… A000002    
4       82 1951-01-03 House   VA                  100 ABBITT, Watki… A000002    
5       83 1953-01-03 House   VA                  100 ABBITT, Watki… A000002    
6       84 1955-01-03 House   VA                  100 ABBITT, Watki… A000002    
# ℹ 6 more variables: birthday <date>, cmltv_cong <dbl>, cmltv_chamber <dbl>,
#   age_days <dbl>, age_years <dbl>, generation <chr>
# Check the structure of the data frame
str(data_aging_congress)
spc_tbl_ [29,120 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ congress     : num [1:29120] 82 80 81 82 83 84 85 86 87 88 ...
 $ start_date   : Date[1:29120], format: "1951-01-03" "1947-01-03" ...
 $ chamber      : chr [1:29120] "House" "House" "House" "House" ...
 $ state_abbrev : chr [1:29120] "ND" "VA" "VA" "VA" ...
 $ party_code   : num [1:29120] 200 100 100 100 100 100 100 100 100 100 ...
 $ bioname      : chr [1:29120] "AANDAHL, Fred George" "ABBITT, Watkins Moorman" "ABBITT, Watkins Moorman" "ABBITT, Watkins Moorman" ...
 $ bioguide_id  : chr [1:29120] "A000001" "A000002" "A000002" "A000002" ...
 $ birthday     : Date[1:29120], format: "1897-04-09" "1908-05-21" ...
 $ cmltv_cong   : num [1:29120] 1 1 2 3 4 5 6 7 8 9 ...
 $ cmltv_chamber: num [1:29120] 1 1 2 3 4 5 6 7 8 9 ...
 $ age_days     : num [1:29120] 19626 14106 14837 15567 16298 ...
 $ age_years    : num [1:29120] 53.7 38.6 40.6 42.6 44.6 ...
 $ generation   : chr [1:29120] "Lost" "Greatest" "Greatest" "Greatest" ...
 - attr(*, "spec")=
  .. cols(
  ..   congress = col_double(),
  ..   start_date = col_date(format = ""),
  ..   chamber = col_character(),
  ..   state_abbrev = col_character(),
  ..   party_code = col_double(),
  ..   bioname = col_character(),
  ..   bioguide_id = col_character(),
  ..   birthday = col_date(format = ""),
  ..   cmltv_cong = col_double(),
  ..   cmltv_chamber = col_double(),
  ..   age_days = col_double(),
  ..   age_years = col_double(),
  ..   generation = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Having successfully loaded and verified the dataset, we can now proceed to visualize the data. The next step involves recreating plots to analyze the trends in median age for both the U.S. Senate and House of Representatives from 1919 to 2023.
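Beyond printing the head and structure, the verification step can be made explicit. The sketch below uses a hypothetical helper, check_columns(), which is not part of the original analysis, and a small stand-in data frame whose values are copied from the first rows printed above.

```r
# Hypothetical helper: stop early if any expected column is missing.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in for data_aging_congress (two rows from the printed head).
toy <- data.frame(
  congress = c(82, 80),
  chamber = c("House", "House"),
  age_years = c(53.7, 38.6)
)

# The plots and tables below rely on these three columns.
check_columns(toy, c("congress", "chamber", "age_years"))
```

In the real analysis the same call could be made on data_aging_congress right after read_csv().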

Plot Creation: Median age of the U.S. Senate and U.S. House by Congress from 1919 to 2023.

This graph is adapted from the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been”.

# Aggregate the data to get the median age for each Congress for House and Senate
median_age_congress <- data_aging_congress %>%
  group_by(congress, chamber) %>%
  summarize(median_age = median(age_years, na.rm = TRUE), .groups = 'drop')

# Adjust the year calculation based on the 66th Congress starting in 1919
first_congress <- 66
first_congress_year <- 1919
median_age_congress <- median_age_congress %>%
  mutate(year = first_congress_year + (congress - first_congress) * 2)

# Create the plotly plot
interactive_plot <- plot_ly(data = median_age_congress, 
                            x = ~year, 
                            y = ~median_age, 
                            color = ~chamber, 
                            colors = c("Senate" = "purple", "House" = "darkgreen"),
                            type = 'scatter', 
                            mode = 'lines+markers', 
                            line = list(width = 2),
                            marker = list(size = 5),
                            text = ~paste(
                              chamber, "<br>",
                              year, "<br>",
                              round(median_age, 1)
                            ),
                            hoverinfo = 'text') %>%
  layout(
    title = list(
      text = "<b>The House and Senate are older than ever before</b>", 
      font = list(size = 14, family = "Georgia", weight = "bolder"),
      x = .075, # Align title to the left
      y = 0.95, # Position title at the top
      xanchor = 'left',
      yanchor = 'top',
      pad = list(b = 0) # No extra padding between title and subtitle
    ),
    annotations = list(
      list(
        x = 0, # Align subtitle to the left
        y = 1.23, # Position just below the title
        text = "Median age of the U.S. Senate and U.S. House by Congress, 1919 to 2023",
        showarrow = FALSE,
        xref = "paper",
        yref = "paper",
        font = list(size = 13, family = "Georgia"),
        align = "left"
      )
    ),
    xaxis = list(
      title = TRUE, 
      range = c(1918, 2024),
      tickvals = seq(1920, 2020, 10),
      tickfont = list(size = 12),
      showline = TRUE,
      linewidth = 2,
      linecolor = 'white',
      showgrid = FALSE
    ),
    yaxis = list(
      title = FALSE, 
      range = c(43, 68),
      tickvals = seq(45, 70, 5),
      tickfont = list(size = 12),
      showline = FALSE,
      zeroline = FALSE, # Ensure the zero line is not shown
      showgrid = TRUE # Show horizontal grid lines
    ),
    legend = list(
      orientation = "h", 
      x = 0, 
      y = 1, 
      xanchor = "left", 
      yanchor = "bottom",
      font = list(size = 14)
    ),
    plot_bgcolor = 'white',
    paper_bgcolor = 'white',
    margin = list(t = 120) # Increase top margin to shift graph down and accommodate title and subtitle
  )

# Print the interactive plot
interactive_plot
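The year on the x-axis comes from the Congress number: each Congress lasts two years and the 66th began in 1919. As a small worked example, the same arithmetic can be wrapped in a function. The name congress_to_year() is an assumption introduced here for illustration.

```r
# Convert a Congress number to the year that Congress began,
# anchored on the 66th Congress starting in 1919.
congress_to_year <- function(congress, first_congress = 66, first_year = 1919) {
  first_year + (congress - first_congress) * 2
}

congress_to_year(c(66, 100, 118))
# 1919 1987 2023
```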

Several trends are evident in the graph of median age for Senate and House members. The median age of both chambers has fluctuated over the years. From 1919 to the early 2000s, the median age of Senate members never exceeded 60 years, and the median age of House members never surpassed 55 years.

According to the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been”, the median age of the 118th Congress is 59 years. Specifically, the median age of a senator is 65 years, while the median age of a representative is about 58 years. In comparison, the median age of the entire U.S. population is around 39 years, with about 42 percent of people being 45 years or older.

The House of Representatives has also experienced fluctuations in median age, reaching its lowest point in the 1980s. Despite these fluctuations, the median age in Congress has increased over the past decade.

It is also worth noting that the House of Representatives has consistently had a younger median age than the Senate. One possible contributing factor is the minimum age requirement: members of the House must be at least 25 years old, while senators must be at least 30.
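The claim that the House is younger in every Congress can be checked directly rather than read off the chart. The sketch below uses made-up ages in a toy data frame with the same column names as the dataset; replacing toy with data_aging_congress should run the same check on the real data.

```r
# Illustrative ages only, not taken from the dataset.
toy <- data.frame(
  congress = rep(c(117, 118), each = 6),
  chamber = rep(rep(c("House", "Senate"), each = 3), times = 2),
  age_years = c(45, 58, 70, 50, 64, 80, 44, 57, 71, 52, 65, 79)
)

# Median age per Congress (rows) and chamber (columns).
med <- with(toy, tapply(age_years, list(congress, chamber), median))

# TRUE for each Congress in which the House median is below the Senate median.
house_younger <- med[, "House"] < med[, "Senate"]
all(house_younger)
```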

To better understand these trends, let’s look at the detailed data presented in the table below.

Table: Median Age of U.S. Congress Members by Chamber and Party.

# Calculate median age by chamber and party, and sort by median age within each chamber
median_age_table <- data_aging_congress %>%
  group_by(chamber, party_code) %>%
  summarize(
    median_age = median(age_years, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  # Arrange by chamber first, then by median_age in descending order within each chamber
  arrange(chamber, desc(median_age)) %>%
  gt() %>%
  tab_header(
    title = "Median Age of U.S. Congress Members by Chamber and Party",
    subtitle = "From 1919 to 2023"
  ) %>%
  cols_label(
    chamber = "Chamber",
    party_code = "Party Code",
    median_age = "Median Age"
  ) %>%
  fmt_number(
    columns = c(median_age),  # Updated from vars(median_age) to c(median_age)
    decimals = 1
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightgrey"),
      cell_borders(
        sides = "all",
        color = "black",
        weight = px(1)
      )
    ),
    locations = cells_body(
      columns = c(chamber, party_code, median_age)  # Updated from vars(chamber, party_code, median_age) to c(chamber, party_code, median_age)
    )
  ) %>%
  tab_style(
    style = list(
      cell_text(weight = "bold"),
      cell_fill(color = "lightblue")
    ),
    locations = cells_column_labels()
  ) %>%
  tab_options(
    table.font.size = 12,
    heading.title.font.size = 14,
    heading.subtitle.font.size = 12,
    table.border.top.color = "black",
    table.border.top.width = px(2),
    table.border.bottom.color = "black",
    table.border.bottom.width = px(2),
    column_labels.border.top.color = "black",
    column_labels.border.top.width = px(2),
    column_labels.border.bottom.color = "black",
    column_labels.border.bottom.width = px(2),
    data_row.padding = px(5)
  )

# Print the table
median_age_table
Median Age of U.S. Congress Members by Chamber and Party
From 1919 to 2023

Chamber   Party Code   Median Age
House     380          64.0
House     370          57.6
House     537          55.4
House     328          54.2
House     356          54.0
House     347          53.6
House     200          52.7
House     100          52.2
House     331          51.7
House     329          42.9
House     523          42.2
House     522          40.1
House     402          34.4
Senate    328          69.3
Senate    200          58.0
Senate    100          57.4
Senate    537          51.5
Senate    112          49.8
Senate    370          44.9
The table above shows the median age of U.S. Congress members by chamber and party from 1919 to 2023. Key observations include:

House of Representatives:

  • The median age of House members ranges from 34.4 years to 64.0 years across parties.
  • Liberal (party code 402): lowest median age at 34.4 years.
  • Socialist (party code 380): highest median age at 64.0 years.
  • Democrats (party code 100): median age of 52.2 years.
  • Republicans (party code 200): median age of 52.7 years.

Senate:

  • The median age of Senate members ranges from 44.9 years to 69.3 years across parties.
  • Progressive (party code 370): lowest median age at 44.9 years.
  • Independent (party code 328): highest median age at 69.3 years.
  • Democrats (party code 100): median age of 57.4 years.
  • Republicans (party code 200): median age of 58.0 years.
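Because the table shows numeric party codes, a small lookup can make it easier to read. The sketch below covers only a few codes for illustration, using the labels given in the text; label_party() is a hypothetical helper, and any code without a label falls back to the raw number.

```r
# Labels for a few party codes, as named in the observations above.
party_labels <- c(
  "100" = "Democrat",
  "200" = "Republican",
  "328" = "Independent",
  "370" = "Progressive",
  "402" = "Liberal"
)

# Hypothetical helper: return a label, or "Code <n>" when none is known.
label_party <- function(code) {
  label <- unname(party_labels[as.character(code)])
  ifelse(is.na(label), paste("Code", code), label)
}

label_party(c(100, 200, 537))
# "Democrat" "Republican" "Code 537"
```

A column built this way could be passed to gt() in place of party_code.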

The differences in median age across chambers and party affiliations give one view of the demographics of Congress. For another view, the next table examines the composition of Congress by generation and shows how each generation has been represented in each chamber.

Table: Number of U.S. Congress Members by Generation and Chamber.

# Calculate the number of members by generation and chamber
generation_chamber_table <- data_aging_congress %>%
  group_by(chamber, generation) %>%
  summarize(
    member_count = n(),
    .groups = 'drop'
  ) %>%
  # Arrange by chamber and then by member_count in descending order within each chamber
  arrange(chamber, desc(member_count)) %>%
  gt() %>%
  tab_header(
    title = "Number of U.S. Congress Members by Generation and Chamber",
    subtitle = "From 1919 to 2023"
  ) %>%
  cols_label(
    chamber = "Chamber",
    generation = "Generation",
    member_count = "Number of Members"
  ) %>%
  fmt_number(
    columns = c(member_count),  # Updated from vars(member_count) to c(member_count)
    decimals = 0
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightgrey"),
      cell_borders(
        sides = "all",
        color = "black",
        weight = px(1)
      )
    ),
    locations = cells_body(
      columns = c(chamber, generation, member_count)  # Updated from vars(chamber, generation, member_count) to c(chamber, generation, member_count)
    )
  ) %>%
  tab_style(
    style = list(
      cell_text(weight = "bold"),
      cell_fill(color = "lightblue")
    ),
    locations = cells_column_labels()
  ) %>%
  tab_options(
    table.font.size = 12,
    heading.title.font.size = 14,
    heading.subtitle.font.size = 12,
    table.border.top.color = "black",
    table.border.top.width = px(2),
    table.border.bottom.color = "black",
    table.border.bottom.width = px(2),
    column_labels.border.top.color = "black",
    column_labels.border.top.width = px(2),
    column_labels.border.bottom.color = "black",
    column_labels.border.bottom.width = px(2),
    data_row.padding = px(5)
  )

# Print the table
generation_chamber_table
Number of U.S. Congress Members by Generation and Chamber
From 1919 to 2023

Chamber   Generation    Number of Members
House     Greatest      5,858
House     Silent        4,416
House     Boomers       4,323
House     Lost          3,830
House     Missionary    3,713
House     Gen X         1,014
House     Progressive   325
House     Millennial    129
House     Gilded        13
House     Gen Z         1
Senate    Greatest      1,289
Senate    Silent        1,185
Senate    Missionary    1,055
Senate    Lost          902
Senate    Boomers       785
Senate    Progressive   160
Senate    Gen X         116
Senate    Millennial    4
Senate    Gilded        2

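Each row in the dataset is one member in one Congress, so the counts below are member-terms rather than distinct individuals. With that in mind, the five largest generations in each chamber from 1919 to 2023 are: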

House of Representatives:

  1. Greatest Generation: 5,858 members
  2. Silent Generation: 4,416 members
  3. Boomer Generation: 4,323 members
  4. Lost Generation: 3,830 members
  5. Missionary Generation: 3,713 members

Senate:

  1. Greatest Generation: 1,289 members
  2. Silent Generation: 1,185 members
  3. Missionary Generation: 1,055 members
  4. Lost Generation: 902 members
  5. Boomer Generation: 785 members
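Since n() counts rows, a member who served in several Congresses is counted once per Congress. A minimal sketch of the difference, using a toy data frame built from the IDs and generations in the first rows printed earlier (an illustration, not the full dataset):

```r
# Four rows: one member with a single term, one member with three terms.
toy <- data.frame(
  bioguide_id = c("A000001", "A000002", "A000002", "A000002"),
  chamber = "House",
  generation = c("Lost", "Greatest", "Greatest", "Greatest")
)

# Member-terms: one count per row (what n() reports in the table above).
terms <- table(toy$generation)

# Distinct members: drop repeated member/generation pairs before counting.
unique_members <- unique(toy[, c("bioguide_id", "generation")])
distinct <- table(unique_members$generation)

terms[["Greatest"]]     # 3
distinct[["Greatest"]]  # 1
```

Either count is defensible; the choice depends on whether the question is about seats held over time or about individuals.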

The Greatest Generation is the most represented group in both the House and the Senate, followed in both chambers by the Silent Generation. The Boomer Generation ranks third in the House but fifth in the Senate, where the Missionary and Lost Generations are ahead of it.

Moving forward, we take a closer look at the Baby Boomers, Lost Generation, Greatest Generation, and Silent Generation. The next analysis plots the median age of whichever generation was the largest in each Congress, showing how generational representation has evolved over time.
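The rankings can be computed rather than read off the table. This sketch re-enters the top-five counts from the table above by hand and ranks them within each chamber using base R.

```r
# Top-five counts per chamber, copied from the table above.
counts <- data.frame(
  chamber = rep(c("House", "Senate"), each = 5),
  generation = c("Greatest", "Silent", "Boomers", "Lost", "Missionary",
                 "Greatest", "Silent", "Missionary", "Lost", "Boomers"),
  member_count = c(5858, 4416, 4323, 3830, 3713,
                   1289, 1185, 1055, 902, 785)
)

# Rank within each chamber; negating the count gives rank 1 to the largest.
counts$rank <- ave(-counts$member_count, counts$chamber, FUN = rank)

counts[counts$generation == "Boomers", c("chamber", "rank")]
# Boomers: rank 3 in the House, rank 5 in the Senate
```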

Plot Creation: Congress is never dominated by generations as old as boomers.

# Calculate the share of each generation within each Congress
generation_share <- data_aging_congress %>%
  group_by(congress, generation) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(congress) %>%
  mutate(total = sum(count),
         share = count / total) %>%
  ungroup()

# Identify the largest generation in each Congress
largest_generation <- generation_share %>%
  group_by(congress) %>%
  filter(share == max(share)) %>%
  select(congress, generation)

# Calculate the median age for the largest generation
median_age_largest_gen <- data_aging_congress %>%
  semi_join(largest_generation, by = c("congress", "generation")) %>%
  group_by(congress, generation) %>%
  summarize(median_age = median(age_years, na.rm = TRUE), .groups = 'drop') %>%
  mutate(year = 1919 + (congress - 66) * 2)  # Convert Congress number to year

# Keep Congresses that began between 1940 and 2023
median_age_largest_gen <- median_age_largest_gen %>%
  filter(year >= 1940 & year <= 2023)


# Create the plotly plot directly with adjusted year range and annotations
interactive_plot <- plot_ly(
  data = median_age_largest_gen,
  x = ~year,
  y = ~median_age,
  color = ~generation,
  colors = c("Lost" = "black", "Greatest" = "darkgreen", "Silent" = "orange", "Boomers" = "red"),
  type = 'scatter',
  mode = 'lines+markers',  # Show lines and markers
  line = list(width = 2),
  marker = list(size = 5, symbol = 'circle', line = list(width = 2)),
  text = ~paste(
    "", generation, "<br>",
    "", year, "<br>",
    "", round(median_age, 1))  # Round the age to the nearest tenth
  ,
  hoverinfo = 'text'
) %>%
  layout(
    title = list(
      text = "<b>Congress is never dominated by generations as old as boomers</b>",
      font = list(size = 14, family = "Georgia", weight = "bolder"),
      x = .075,
      y = 0.95,
      xanchor = 'left',
      yanchor = 'top',
      pad = list(b = 0)
    ),
    annotations = list(
      list(
        x = 0,
        y = 1.23,
        text = "Median age of the largest generation in each Congress, 1940 to 2023",  # Updated subtitle to reflect the extended year range
        showarrow = FALSE,
        xref = "paper",
        yref = "paper",
        font = list(size = 13, family = "Georgia"),
        align = "left"
      )
    ),
    xaxis = list(
      title = TRUE,
      range = c(1935, 2027),  # Ensures the x-axis covers years from 1935 to 2027
      tickvals = seq(1940, 2023, 10),
      tickfont = list(size = 12),
      showline = TRUE,
      linewidth = 2,
      linecolor = 'white',
      showgrid = FALSE
    ),
    yaxis = list(
      title = FALSE,
      range = c(37, 73),
      tickvals = seq(40, 70, 5),
      tickfont = list(size = 12),
      showline = TRUE,
      linecolor = 'white',
      zeroline = FALSE,
      showgrid = TRUE
    ),
    legend = list(
      orientation = "h",
      x = 0,
      y = 1,
      xanchor = "left",
      yanchor = "bottom",
      font = list(size = 14)
    ),
    plot_bgcolor = 'white',
    paper_bgcolor = 'white',
    margin = list(t = 120)
  )

# Print the interactive plot
interactive_plot

The plot “Congress is Never Dominated by Generations as Old as Boomers” shows the median age of the largest generation in each Congress from 1940 to 2023. Each line traces one generation over the Congresses in which it was the largest group, so the handoffs between lines mark changes in generational leadership. As the title suggests, the comparison is about age rather than headcount: earlier generations such as the Greatest Generation and the Silent Generation were also the largest group in their time, but the Baby Boomers have held that position at an older median age.
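The step that picks the largest generation in each Congress is worth a note on ties. filter(share == max(share)) keeps every tied row, so a Congress could contribute two generations. The base R sketch below, on made-up counts, uses which.max(), which keeps only the first maximum. This is one possible tie rule, not the one used in the analysis above.

```r
# Illustrative counts only, not taken from the dataset.
toy <- data.frame(
  congress = c(100, 100, 100, 118, 118),
  generation = c("Greatest", "Silent", "Boomers", "Boomers", "Gen X"),
  count = c(150, 250, 140, 250, 180)
)

# For each Congress, keep the single row with the highest count.
largest <- do.call(rbind, lapply(split(toy, toy$congress), function(d) {
  d[which.max(d$count), c("congress", "generation")]
}))

as.character(largest$generation)
# "Silent" "Boomers"
```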

Conclusion

This analysis explored the evolving age demographics of the U.S. Congress from 1919 to 2023, focusing on trends in median age and generational representation. The graph recreated from the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been” showed that the median age of Congress members has increased over time, with the article reporting a median age of 59 years for the 118th Congress.

The examination of the median age of Congress members by chamber and party revealed that, while the House of Representatives generally has a younger median age compared to the Senate, both chambers have seen significant age increases in recent decades.

Additionally, the analysis of generational representation showed that the Greatest Generation and the Silent Generation had the largest presence in both the House and the Senate. The Boomer, Lost, and Missionary Generations followed, in an order that differs between the two chambers.

Finally, the plot “Congress is Never Dominated by Generations as Old as Boomers” traced the median age of the largest generation in each Congress. As its title states, no earlier generation was as old as the Baby Boomers while it was the largest group in Congress.

Overall, this study underscores the dynamic nature of Congressional demographics and offers a detailed view of how age and generational shifts shape U.S. legislative bodies.