In this analysis, I will reproduce a graph from the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been” and develop tables to support the findings. The dataset, data_aging_congress.csv, includes demographic information about the United States Senate and House of Representatives over time, focusing on age demographics from the 66th Congress (1919-1921) to the 118th Congress (2023-2025). The analysis aims to visualize trends in median age over time for both the House and Senate. The graph is created with plotly to allow interactive features, and the tables are generated with the gt package.
Data
The data used for this analysis is sourced from the FiveThirtyEight Congress demographics dataset, located in the “fivethirtyeight- data”
To recreate and refine the visualizations, I used an AI assistant. Some of the prompts and a summary of each response are documented below:
Reproduce Graphs/Tables
Prompt: Can you recreate this graph to show the median age trends over time for the House and Senate?
Response: The initial graph was created to visualize the median age trends for the House and Senate over time, using plotly for interactivity and styling elements from ggplot2 such as labs and theme settings.
Error Fixing and Debugging
Prompt: How do I fix the error in geom_line(aes(color = chamber))?
Response: Error fixed by ensuring the correct mapping of the chamber column and adjusting the data preprocessing steps.
Table Creation
Prompt: Create a publication-quality table showing the median age for each Congress session.
Response: A table was generated using the gt package, formatted to match publication standards.
Visual Enhancements
Prompt: How can I make the graph more visually appealing with color schemes and annotations?
Response: The graph was enhanced with a color scheme and annotations to highlight key trends and make it more visually appealing.
Prompt: Can I include interactive features in the graph for better user engagement?
Response: Interactive features were added to the graph using Plotly, allowing users to hover over data points for additional information.
These questions guided the step-by-step process for creating and modifying the graphs and tables in R.
Load Necessary Libraries
As we move forward, the next crucial step involves loading the appropriate libraries and verifying the data to ensure it is ready for the upcoming analyses and visualizations.
library(readr)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.3.3
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(gt)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(tibble)
library(here)
Warning: package 'here' was built under R version 4.3.3
here() starts at C:/Users/Joaquin/School/DA - 6833 (Summer 2024)/Joaquin_Ramriez_Portfolio_II
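Before loading the libraries, it can help to confirm that each one is installed. The following is a minimal sketch in base R, not part of the original workflow; the package list simply mirrors the library() calls above, and the variable names are illustrative.

```r
# Packages loaded in this analysis (taken from the library() calls above)
required_packages <- c("readr", "dplyr", "ggplot2", "gt", "plotly", "tibble", "here")

# requireNamespace() returns TRUE when a package is installed and loadable,
# without attaching it to the search path
installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs install.packages()
missing_packages <- required_packages[!installed]
if (length(missing_packages) > 0) {
  message("Packages to install: ", paste(missing_packages, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

This check does not attach any package, so it produces none of the masking messages shown above.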
Data Loading and Verification
# Check the root directory determined by here
print(here::here())

# Define the file path for the aging congress data
data_aging_congress_path <- here::here("presentation-exercise", "data_aging_congress.csv")

# Load the data from the CSV file using the path defined above
data_aging_congress <- read_csv(data_aging_congress_path)
Rows: 29120 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): chamber, state_abbrev, bioname, bioguide_id, generation
dbl (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
date (2): start_date, birthday
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the first few rows of the data frame to verify it is loaded correctly
print(head(data_aging_congress))
# A tibble: 6 × 13
congress start_date chamber state_abbrev party_code bioname bioguide_id
<dbl> <date> <chr> <chr> <dbl> <chr> <chr>
1 82 1951-01-03 House ND 200 AANDAHL, Fred… A000001
2 80 1947-01-03 House VA 100 ABBITT, Watki… A000002
3 81 1949-01-03 House VA 100 ABBITT, Watki… A000002
4 82 1951-01-03 House VA 100 ABBITT, Watki… A000002
5 83 1953-01-03 House VA 100 ABBITT, Watki… A000002
6 84 1955-01-03 House VA 100 ABBITT, Watki… A000002
# ℹ 6 more variables: birthday <date>, cmltv_cong <dbl>, cmltv_chamber <dbl>,
# age_days <dbl>, age_years <dbl>, generation <chr>
# Check the structure of the data frame
str(data_aging_congress)
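Beyond printing the head and structure, the verification step can be made explicit with a few programmatic checks. The sketch below is one possible approach and was not part of the original analysis: check_congress_data() is a hypothetical helper, the expected Congress range (66 to 118) comes from the introduction, and the toy rows are made up to trigger each check.

```r
# Hypothetical helper: returns a character vector of problems (empty if none found)
check_congress_data <- function(df) {
  problems <- character(0)

  # Columns the later plots and tables rely on
  required_cols <- c("congress", "chamber", "age_years", "generation")
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  if (!all(df$chamber %in% c("House", "Senate"))) {
    problems <- c(problems, "unexpected chamber value")
  }
  if (any(df$congress < 66 | df$congress > 118, na.rm = TRUE)) {
    problems <- c(problems, "congress outside 66-118")
  }
  if (anyNA(df$age_years)) {
    problems <- c(problems, "missing age_years")
  }
  problems
}

# Made-up rows: the third row deliberately violates all three checks
toy <- data.frame(
  congress   = c(82, 80, 119),
  chamber    = c("House", "House", "Senat"),
  age_years  = c(44.5, 38.7, NA),
  generation = c("Lost", "Greatest", "Boomers")
)
toy_problems <- check_congress_data(toy)
print(toy_problems)
```

On the real data, the same call would be check_congress_data(data_aging_congress); an empty result means none of these particular checks failed.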
Having successfully loaded and verified the dataset, we can now proceed to visualize the data. The next step involves recreating plots to analyze the trends in median age for both the U.S. Senate and House of Representatives from 1919 to 2023.
Plot Creation: Median age of the U.S. Senate and U.S. House by Congress from 1919 to 2023.
# Aggregate the data to get the median age for each Congress for House and Senate
median_age_congress <- data_aging_congress %>%
  group_by(congress, chamber) %>%
  summarize(median_age = median(age_years, na.rm = TRUE), .groups = 'drop')

# Adjust the year calculation based on the 66th Congress starting in 1919
first_congress <- 66
first_congress_year <- 1919
median_age_congress <- median_age_congress %>%
  mutate(year = first_congress_year + (congress - first_congress) * 2)

# Create the plotly plot
interactive_plot <- plot_ly(
  data = median_age_congress,
  x = ~year,
  y = ~median_age,
  color = ~chamber,
  colors = c("Senate" = "purple", "House" = "darkgreen"),
  type = 'scatter',
  mode = 'lines+markers',
  line = list(width = 2),
  marker = list(size = 5),
  text = ~paste(chamber, "<br>", year, "<br>", round(median_age, 1)),
  hoverinfo = 'text'
) %>%
  layout(
    title = list(
      text = "<b>The House and Senate are older than ever before</b>",
      font = list(size = 14, family = "Georgia", weight = "bolder"),
      x = .075,          # Align title to the left
      y = 0.95,          # Position title at the top
      xanchor = 'left',
      yanchor = 'top',
      pad = list(b = 0)  # No extra padding between title and subtitle
    ),
    annotations = list(
      list(
        x = 0,     # Align subtitle to the left
        y = 1.23,  # Position just below the title
        text = "Median age of the U.S. Senate and U.S. House by Congress, 1919 to 2023",
        showarrow = FALSE,
        xref = "paper",
        yref = "paper",
        font = list(size = 13, family = "Georgia"),
        align = "left"
      )
    ),
    xaxis = list(
      title = TRUE,
      range = c(1918, 2024),
      tickvals = seq(1920, 2020, 10),
      tickfont = list(size = 12),
      showline = TRUE,
      linewidth = 2,
      linecolor = 'white',
      showgrid = FALSE
    ),
    yaxis = list(
      title = FALSE,
      range = c(43, 68),
      tickvals = seq(45, 70, 5),
      tickfont = list(size = 12),
      showline = FALSE,
      zeroline = FALSE,  # Ensure the zero line is not shown
      showgrid = TRUE    # Keep the horizontal grid lines visible
    ),
    legend = list(
      orientation = "h",
      x = 0,
      y = 1,
      xanchor = "left",
      yanchor = "bottom",
      font = list(size = 14)
    ),
    plot_bgcolor = 'white',
    paper_bgcolor = 'white',
    margin = list(t = 120)  # Increase top margin to shift graph down and accommodate title and subtitle
  )

# Print the interactive plot
interactive_plot
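The two data steps behind this plot (the per-chamber median and the Congress-to-year conversion) can be checked in isolation. The sketch below uses base R rather than dplyr so it runs on its own; congress_start_year() is a hypothetical helper that wraps the same formula used above, and the toy ages are made up.

```r
# Hypothetical helper wrapping the conversion used in the plot code:
# each Congress lasts two years, and the 66th began in 1919
congress_start_year <- function(congress, first_congress = 66, first_congress_year = 1919) {
  first_congress_year + (congress - first_congress) * 2
}

congress_start_year(c(66, 82, 118))  # 1919 1951 2023

# Made-up ages for two Congresses, to show the median aggregation
toy <- data.frame(
  congress  = c(66, 66, 66, 66, 67, 67),
  chamber   = c("House", "House", "Senate", "Senate", "House", "Senate"),
  age_years = c(50, 54, 58, 62, 51, 60)
)

# Base-R equivalent of group_by(congress, chamber) %>% summarize(median(...))
toy_medians <- aggregate(age_years ~ congress + chamber, data = toy, FUN = median)
toy_medians$year <- congress_start_year(toy_medians$congress)
print(toy_medians)
```

The conversion agrees with the start_date column shown earlier, where the 82nd Congress begins on 1951-01-03.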
Several trends are evident in the graph of the median age of Senate and House members. The median age of both chambers has fluctuated over the years. From 1919 to the early 2000s, the median age of Senate members never exceeded 60 years, and the median age of the House of Representatives never surpassed 55 years.
According to the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been”, the median age of the 118th Congress is 59 years. Specifically, the median age of a senator is 65 years, while the median age of a representative is about 58 years. In comparison, the median age of the entire U.S. population is around 39 years, with about 42 percent of people being 45 years or older.
The House of Representatives has also experienced fluctuations in median age, reaching its lowest point in the 1980s. Despite these fluctuations, the median age in Congress has increased over the past decade.
It is also important to note that the House of Representatives has consistently had a younger median age compared to the Senate. This could be a result of the fact that members of the House of Representatives must be at least 25 years old, while members of the Senate must be at least 30 years old.
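The claim that the House is consistently younger can be tested directly by computing the Senate-minus-House gap for every Congress. The following is a minimal base-R sketch under the assumption that the medians are stored in long format with congress, chamber, and median_age columns, as in median_age_congress above; the toy values are made up.

```r
# Made-up medians in the same long format as median_age_congress
toy_medians <- data.frame(
  congress   = c(66, 66, 67, 67),
  chamber    = c("House", "Senate", "House", "Senate"),
  median_age = c(52, 60, 51, 59)
)

# Pivot to one row per Congress with a column per chamber
wide <- reshape(toy_medians, idvar = "congress", timevar = "chamber", direction = "wide")

# Positive gap means the Senate median is older than the House median
wide$gap <- wide$median_age.Senate - wide$median_age.House

# TRUE only if the House is younger in every Congress
house_always_younger <- all(wide$gap > 0)
print(wide)
print(house_always_younger)
```

Running the same steps on median_age_congress would show whether "consistently" holds for all 53 Congresses or only for most of them.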
To better understand these trends, let’s look at the detailed data presented in the table below.
Table: Median Age of U.S. Congress Members by Chamber and Party.
# Calculate median age by chamber and party, and sort by median age within each chamber
median_age_table <- data_aging_congress %>%
  group_by(chamber, party_code) %>%
  summarize(
    median_age = median(age_years, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  # Arrange by chamber first, then by median_age in descending order within each chamber
  arrange(chamber, desc(median_age)) %>%
  gt() %>%
  tab_header(
    title = "Median Age of U.S. Congress Members by Chamber and Party",
    subtitle = "From 1919 to 2023"
  ) %>%
  cols_label(
    chamber = "Chamber",
    party_code = "Party Code",
    median_age = "Median Age"
  ) %>%
  fmt_number(
    columns = c(median_age),  # Updated from vars(median_age) to c(median_age)
    decimals = 1
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightgrey"),
      cell_borders(sides = "all", color = "black", weight = px(1))
    ),
    # Updated from vars(...) to c(...)
    locations = cells_body(columns = c(chamber, party_code, median_age))
  ) %>%
  tab_style(
    style = list(
      cell_text(weight = "bold"),
      cell_fill(color = "lightblue")
    ),
    locations = cells_column_labels()
  ) %>%
  tab_options(
    table.font.size = 12,
    heading.title.font.size = 14,
    heading.subtitle.font.size = 12,
    table.border.top.color = "black",
    table.border.top.width = px(2),
    table.border.bottom.color = "black",
    table.border.bottom.width = px(2),
    column_labels.border.top.color = "black",
    column_labels.border.top.width = px(2),
    column_labels.border.bottom.color = "black",
    column_labels.border.bottom.width = px(2),
    data_row.padding = px(5)
  )

# Print the table
median_age_table
Median Age of U.S. Congress Members by Chamber and Party
From 1919 to 2023

Chamber   Party Code   Median Age
House     380          64.0
House     370          57.6
House     537          55.4
House     328          54.2
House     356          54.0
House     347          53.6
House     200          52.7
House     100          52.2
House     331          51.7
House     329          42.9
House     523          42.2
House     522          40.1
House     402          34.4
Senate    328          69.3
Senate    200          58.0
Senate    100          57.4
Senate    537          51.5
Senate    112          49.8
Senate    370          44.9
The table above shows the median age of U.S. Congress members by chamber and party, pooled across all Congresses from 1919 to 2023. Key observations include:
House of Representatives:
The median age of House members ranges from 34.4 years to 64.0 years across parties.
Liberal (party code 402): lowest median age, at 34.4 years.
Party code 380: highest median age, at 64.0 years.
Democrats (party code 100): median age of 52.2 years.
Republicans (party code 200): median age of 52.7 years.
Senate:
The median age of Senate members ranges from 44.9 years to 69.3 years across parties.
Progressive (party code 370): lowest median age, at 44.9 years.
Independent (party code 328): highest median age, at 69.3 years.
Democrats (party code 100): median age of 57.4 years.
Republicans (party code 200): median age of 58.0 years.
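Because the table shows only numeric party codes, a small lookup can make it easier to read. The sketch below is illustrative and not part of the original analysis: label_party() is a hypothetical helper, and the lookup contains only the code-to-label pairs named in the discussion above (100, 200, 328, 402). Any other code is passed through unlabeled rather than guessed.

```r
# Code-to-label pairs taken from the text above; deliberately incomplete
party_labels <- c(
  "100" = "Democrat",
  "200" = "Republican",
  "328" = "Independent",
  "402" = "Liberal"
)

# Hypothetical helper: returns a label for known codes and a neutral
# "Other (code N)" placeholder for codes missing from the lookup
label_party <- function(party_code) {
  key <- as.character(party_code)
  label <- unname(party_labels[key])
  ifelse(is.na(label), paste0("Other (code ", key, ")"), label)
}

label_party(c(100, 200, 380))
```

With dplyr, the same helper could be applied as mutate(party = label_party(party_code)) before the table is passed to gt().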
The trends in median age across chambers and party affiliations provide insight into the changing demographics of Congress. To deepen this picture, the next table examines the composition of Congress by generation, showing which generations have been most represented in each chamber.
Table: Number of U.S. Congress Members by Generation and Chamber.
# Calculate the number of members by generation and chamber
generation_chamber_table <- data_aging_congress %>%
  group_by(chamber, generation) %>%
  summarize(
    member_count = n(),
    .groups = 'drop'
  ) %>%
  # Arrange by chamber and then by member_count in descending order within each chamber
  arrange(chamber, desc(member_count)) %>%
  gt() %>%
  tab_header(
    title = "Number of U.S. Congress Members by Generation and Chamber",
    subtitle = "From 1919 to 2023"
  ) %>%
  cols_label(
    chamber = "Chamber",
    generation = "Generation",
    member_count = "Number of Members"
  ) %>%
  fmt_number(
    columns = c(member_count),  # Updated from vars(member_count) to c(member_count)
    decimals = 0
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightgrey"),
      cell_borders(sides = "all", color = "black", weight = px(1))
    ),
    # Updated from vars(...) to c(...)
    locations = cells_body(columns = c(chamber, generation, member_count))
  ) %>%
  tab_style(
    style = list(
      cell_text(weight = "bold"),
      cell_fill(color = "lightblue")
    ),
    locations = cells_column_labels()
  ) %>%
  tab_options(
    table.font.size = 12,
    heading.title.font.size = 14,
    heading.subtitle.font.size = 12,
    table.border.top.color = "black",
    table.border.top.width = px(2),
    table.border.bottom.color = "black",
    table.border.bottom.width = px(2),
    column_labels.border.top.color = "black",
    column_labels.border.top.width = px(2),
    column_labels.border.bottom.color = "black",
    column_labels.border.bottom.width = px(2),
    data_row.padding = px(5)
  )

# Print the table
generation_chamber_table
Number of U.S. Congress Members by Generation and Chamber
From 1919 to 2023

Chamber   Generation    Number of Members
House     Greatest      5,858
House     Silent        4,416
House     Boomers       4,323
House     Lost          3,830
House     Missionary    3,713
House     Gen X         1,014
House     Progressive   325
House     Millennial    129
House     Gilded        13
House     Gen Z         1
Senate    Greatest      1,289
Senate    Silent        1,185
Senate    Missionary    1,055
Senate    Lost          902
Senate    Boomers       785
Senate    Progressive   160
Senate    Gen X         116
Senate    Millennial    4
Senate    Gilded        2
The historical data reveals notable generational distributions within the U.S. Congress from 1919 to 2023. Because the dataset has one row per member per Congress, each count is a number of member-terms, so a legislator who served in several Congresses is counted once for each. The five largest groups in each chamber are:
House of Representatives:
Greatest Generation: 5,858 members
Silent Generation: 4,416 members
Boomer Generation: 4,323 members
Lost Generation: 3,830 members
Missionary Generation: 3,713 members
Senate:
Greatest Generation: 1,289 members
Silent Generation: 1,185 members
Missionary Generation: 1,055 members
Lost Generation: 902 members
Boomer Generation: 785 members
The Greatest Generation is the most represented group in both the House and Senate, followed by the Silent Generation. In the House, the Boomer Generation ranks third, while in the Senate it ranks fifth, behind the Missionary and Lost generations.
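The counts above come from n(), which counts rows. Since the dataset repeats a legislator for every Congress served (the head() output earlier shows bioguide_id A000002 in five consecutive Congresses), row counts and counts of distinct people can differ substantially. The following base-R sketch illustrates the difference on made-up rows; the IDs other than A000002 and all generation labels here are hypothetical.

```r
# Made-up rows: one legislator serving three Congresses, two serving one each
toy <- data.frame(
  bioguide_id = c("A000002", "A000002", "A000002", "B000001", "C000001"),
  congress    = c(80, 81, 82, 82, 82),
  generation  = c("Lost", "Lost", "Lost", "Greatest", "Greatest")
)

# Member-terms: one count per row, equivalent to summarize(n())
member_terms <- table(toy$generation)

# Unique members: distinct bioguide_id values per generation,
# equivalent to summarize(n_distinct(bioguide_id))
unique_members <- tapply(
  toy$bioguide_id, toy$generation,
  function(ids) length(unique(ids))
)

print(member_terms)
print(unique_members)
```

In this toy example the "Lost" group has three member-terms but only one person, which is why the two measures can rank generations differently.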
Moving forward, we will dive deeper into the generational impact on Congress by focusing specifically on the Baby Boomers, Lost Generation, Greatest Generation, and Silent Generation. This next analysis will provide a closer look at how these generations have influenced the legislative landscape and how their representation has evolved over time.
Plot Creation: Congress is never dominated by generations as old as boomers.
# Calculate the share of each generation within each Congress
generation_share <- data_aging_congress %>%
  group_by(congress, generation) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(congress) %>%
  mutate(
    total = sum(count),
    share = count / total
  ) %>%
  ungroup()

# Identify the largest generation in each Congress
largest_generation <- generation_share %>%
  group_by(congress) %>%
  filter(share == max(share)) %>%
  select(congress, generation)

# Calculate the median age for the largest generation
median_age_largest_gen <- data_aging_congress %>%
  semi_join(largest_generation, by = c("congress", "generation")) %>%
  group_by(congress, generation) %>%
  summarize(median_age = median(age_years, na.rm = TRUE), .groups = 'drop') %>%
  mutate(year = 1919 + (congress - 66) * 2)  # Convert Congress number to year

# Filter years to 1940 through 2023
median_age_largest_gen <- median_age_largest_gen %>%
  filter(year >= 1940 & year <= 2023)

# Create the plotly plot directly with adjusted year range and annotations
interactive_plot <- plot_ly(
  data = median_age_largest_gen,
  x = ~year,
  y = ~median_age,
  color = ~generation,
  colors = c("Lost" = "black", "Greatest" = "darkgreen", "Silent" = "orange", "Boomers" = "red"),
  type = 'scatter',
  mode = 'lines+markers',  # Show lines and markers
  line = list(width = 2),
  marker = list(size = 5, symbol = 'circle', line = list(width = 2)),
  # Round the age to the nearest tenth
  text = ~paste("", generation, "<br>", "", year, "<br>", "", round(median_age, 1)),
  hoverinfo = 'text'
) %>%
  layout(
    title = list(
      text = "<b>Congress is never dominated by generations as old as boomers</b>",
      font = list(size = 14, family = "Georgia", weight = "bolder"),
      x = .075,
      y = 0.95,
      xanchor = 'left',
      yanchor = 'top',
      pad = list(b = 0)
    ),
    annotations = list(
      list(
        x = 0,
        y = 1.23,
        # Updated subtitle to reflect the extended year range
        text = "Median age of the largest generation in each Congress, 1940 to 2023",
        showarrow = FALSE,
        xref = "paper",
        yref = "paper",
        font = list(size = 13, family = "Georgia"),
        align = "left"
      )
    ),
    xaxis = list(
      title = TRUE,
      range = c(1935, 2027),  # Ensures the x-axis covers years from 1935 to 2027
      tickvals = seq(1940, 2023, 10),
      tickfont = list(size = 12),
      showline = TRUE,
      linewidth = 2,
      linecolor = 'white',
      showgrid = FALSE
    ),
    yaxis = list(
      title = FALSE,
      range = c(37, 73),
      tickvals = seq(40, 70, 5),
      tickfont = list(size = 12),
      showline = TRUE,
      linecolor = 'white',
      zeroline = FALSE,
      showgrid = TRUE
    ),
    legend = list(
      orientation = "h",
      x = 0,
      y = 1,
      xanchor = "left",
      yanchor = "bottom",
      font = list(size = 14)
    ),
    plot_bgcolor = 'white',
    paper_bgcolor = 'white',
    margin = list(t = 120)
  )

# Print the interactive plot
interactive_plot
The plot “Congress is Never Dominated by Generations as Old as Boomers” visualizes the median age of the largest generation in each Congress from 1940 to 2023. Older generations such as the Greatest Generation and the Silent Generation each had a period as the largest group in Congress, but, as the title indicates, the Baby Boomers stand out for how old they are while holding that position. The plot’s lines trace how generational leadership in Congress has changed hands and how the median age of the largest generation has evolved over the decades.
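The "largest generation" step in the code above keeps every generation whose share equals the maximum, so a Congress with an exact tie would contribute two generations to the plot. The sketch below shows the same share calculation in base R with an explicit tie-break; it is an illustration on made-up rows, not the method used for the plot, and the "first in alphabetical order" rule is an arbitrary assumption.

```r
# Made-up rows: Congress 76 has a clear largest generation, Congress 77 is tied
toy <- data.frame(
  congress   = c(76, 76, 76, 77, 77, 77, 77),
  generation = c("Lost", "Lost", "Greatest", "Lost", "Lost", "Greatest", "Greatest"),
  age_years  = c(55, 59, 40, 57, 61, 42, 44)
)

# Share of each generation within each Congress (rows sum to 1)
counts <- table(toy$congress, toy$generation)
shares <- unclass(prop.table(counts, margin = 1))

# Pick one largest generation per Congress; ties go to the first column
largest <- data.frame(
  congress   = as.numeric(rownames(shares)),
  generation = colnames(shares)[max.col(shares, ties.method = "first")]
)

# Keep only rows from each Congress's largest generation, then take the median age
dominant_rows <- merge(toy, largest, by = c("congress", "generation"))
median_by_congress <- aggregate(
  age_years ~ congress + generation,
  data = dominant_rows, FUN = median
)
print(median_by_congress)
```

With dplyr, slice_max(share, n = 1, with_ties = FALSE) inside group_by(congress) would serve the same purpose as the max.col() call here.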
Conclusion
This analysis explored the evolving age demographics of the U.S. Congress from 1919 to 2023, focusing on trends in median age and generational representation. The recreated graph from the FiveThirtyEight article “Congress Today Is Older Than It’s Ever Been” highlighted that the median age of Congress members has increased over time, with the 118th Congress showing a median age of 59 years compared to historical figures.
The examination of the median age of Congress members by chamber and party revealed that, while the House of Representatives generally has a younger median age compared to the Senate, both chambers have seen significant age increases in recent decades.
Additionally, the analysis of generational representation showed that the Greatest Generation and the Silent Generation had the largest presence in both the House and Senate, with the Baby Boomers, the Lost Generation, and the Missionary Generation completing the top five in each chamber.
Finally, the plot “Congress is Never Dominated by Generations as Old as Boomers” compared the median age of the largest generation in each Congress and, as its title states, pointed to the Baby Boomers as older than earlier generations were when they were the largest group. This plot visualizes how generational leadership in Congress has shifted since 1940.
Overall, this study underscores the dynamic nature of Congressional demographics and offers a detailed view of how age and generational shifts shape U.S. legislative bodies.