For this project, I’m seeking to explore the differing experiences of COVID-19 based on racial identity and state of residence. A lack of equitable healthcare infrastructure, types of employment, and location are all factors that can and have contributed to the disproportionate impact of COVID-19 on racial and ethnic minorities in the United States. This is evidenced by CDC data showing that Native Hawaiian or Pacific Island (NHOPI), Hispanic, and American Indian and Alaska Native (AIAN) people face a 1.5 times higher right of COVID-19 infection that White people. In addition, AIAN, Hispanic, NHOPI and Black people face a 2x higher likelihood of death than White people from COVID-19 infections. Compounding racial disparities are differences in state response to the pandemic.The extent of COVID-19 protection measures varied greatly by state, depending on factors such as the state’s political leanings and attitudes of leadership regarding the pandemic. Some states saw more lenient regulations vs others, which ultimately resulted in significantly different experiences with COVID-19. At times, these factors, compounded by pre-existing genetic conditions related to race, have contributed to heightened rates of infection, hospitalization, and deaths within these vulnerable communities. In this project, I’d like to do a deeper dive into the extent of the experience gap between these marginalized communities and white communities through variables such as location and race.
The data I’m using comes from a collaboration between The Covid Tracking Project at The Atlantic and the Boston University Center for Antiracist Research. It tracks data from April 15, 2020 to March 7, 2021, so almost a full year of data at the height of the pandemic. The data was then updated twice a week.
data <- read_csv("CovidRace.csv")
acs <- read_csv("ACSDP5Y2020.DP05-Data.csv")
saveRDS(data, "data.rds")
# remove all rows without a total number of cases
# remove all territories (Puerto Rico, Northern Mariana Islands, Guam, Virgin Islands) for just the 50 states and Washington D.C.
data <- data %>% filter(
!is.na(Cases_Total),
!State %in% c("PR","MP","GU","VI"))
data <- data %>%
mutate(date1 = ymd(Date))
Here I’m pulling the total population of each state, as well as population of each racial group for normalization purposes and renaming the variables to match the racial groups in the original data. Also adding postal codes so that I can join the two tables later.
acs <- acs %>%
select(GEO_ID, NAME, DP05_0001E, DP05_0037E, DP05_0038E, DP05_0039E, DP05_0044E, DP05_0052E, DP05_0057E, DP05_0058E, DP05_0070E) %>%
filter(!NAME == "Puerto Rico", !GEO_ID == "Geography") %>%
mutate(post_code = c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI", "ID", "IL", "IN", "IO", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
"NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"))
colnames(acs)[c(3:11)] <- c("Total_pop", "White_pop", "Black_pop", "AIAN_pop", "Asian_pop", "NHPI_pop", "Other_pop", "Multiracial_pop", "Latinx_pop")
### Joining the tables using postal code!
norm_data <- acs %>% select(-Total_pop) %>% # don't want to add the total to the other cases, so removing it
left_join(data, by = c("post_code" = "State"))
norm_data_total <- acs %>%
left_join(data, by = c("post_code" = "State"))
### Cases, hospitalizations, tests, and deaths normalized by population- without total cases
norm_data <- norm_data %>% mutate(
White_cases_norm = Cases_White/as.numeric(White_pop),
White_hosp_norm = Hosp_White/as.numeric(White_pop),
White_tests_norm = Tests_White/as.numeric(White_pop),
White_deaths_norm = Deaths_White/as.numeric(White_pop),
Black_cases_norm = Cases_Black/as.numeric(Black_pop),
Black_hosp_norm = Hosp_Black/as.numeric(Black_pop),
Black_tests_norm = Tests_Black/as.numeric(Black_pop),
Black_deaths_norm = Deaths_Black/as.numeric(Black_pop),
AIAN_cases_norm = Cases_AIAN/as.numeric(AIAN_pop),
AIAN_hosp_norm = Hosp_AIAN/as.numeric(AIAN_pop),
AIAN_tests_norm = Tests_AIAN/as.numeric(AIAN_pop),
AIAN_deaths_norm = Deaths_AIAN/as.numeric(AIAN_pop),
Asian_cases_norm = Cases_Asian/as.numeric(Asian_pop),
Asian_hosp_norm = Hosp_Asian/as.numeric(Asian_pop),
Asian_tests_norm = Tests_Asian/as.numeric(Asian_pop),
Asian_deaths_norm = Deaths_Asian/as.numeric(Asian_pop),
NHPI_cases_norm = Cases_NHPI/as.numeric(NHPI_pop),
NHPI_hosp_norm = Hosp_NHPI/as.numeric(NHPI_pop),
NHPI_tests_norm = Tests_NHPI/as.numeric(NHPI_pop),
NHPI_deaths_norm = Deaths_NHPI/as.numeric(NHPI_pop),
Other_cases_norm = Cases_Other/as.numeric(Other_pop),
Other_hosp_norm = Hosp_Other/as.numeric(Other_pop),
Other_tests_norm = Tests_Other/as.numeric(Other_pop),
Other_deaths_norm = Deaths_Other/as.numeric(Other_pop),
Multiracial_cases_norm = Cases_Multiracial/as.numeric(Multiracial_pop),
Multiracial_hosp_norm = Hosp_Multiracial/as.numeric(Multiracial_pop),
Multiracial_tests_norm = Tests_Multiracial/as.numeric(Multiracial_pop),
Multiracial_deaths_norm = Deaths_Multiracial/as.numeric(Multiracial_pop),
Latinx_cases_norm = Cases_Latinx/as.numeric(Latinx_pop),
Latinx_hosp_norm = Hosp_Latinx/as.numeric(Latinx_pop),
Latinx_tests_norm = Tests_Latinx/as.numeric(Latinx_pop),
Latinx_deaths_norm = Deaths_Latinx/as.numeric(Latinx_pop))
### Cases, hospitalizations, tests, and deaths normalized by population- With total cases
norm_data_total <- norm_data_total %>% mutate(
Total_cases_norm = Cases_Total/as.numeric(Total_pop),
White_cases_norm = Cases_White/as.numeric(White_pop),
Black_cases_norm = Cases_Black/as.numeric(Black_pop),
AIAN_cases_norm = Cases_AIAN/as.numeric(AIAN_pop),
Asian_cases_norm = Cases_Asian/as.numeric(Asian_pop),
NHPI_cases_norm = Cases_NHPI/as.numeric(NHPI_pop),
Other_cases_norm = Cases_Other/as.numeric(Other_pop),
Multiracial_cases_norm = Cases_Multiracial/as.numeric(Multiracial_pop),
Latinx_cases_norm = Cases_Latinx/as.numeric(Latinx_pop),
Total_hosp_norm = Hosp_Total/as.numeric(Total_pop),
White_hosp_norm = Hosp_White/as.numeric(White_pop),
Black_hosp_norm = Hosp_Black/as.numeric(Black_pop),
AIAN_hosp_norm = Hosp_AIAN/as.numeric(AIAN_pop),
Asian_hosp_norm = Hosp_Asian/as.numeric(Asian_pop),
NHPI_hosp_norm = Hosp_NHPI/as.numeric(NHPI_pop),
Other_hosp_norm = Hosp_Other/as.numeric(Other_pop),
Multiracial_hosp_norm = Hosp_Multiracial/as.numeric(Multiracial_pop),
Latinx_hosp_norm = Hosp_Latinx/as.numeric(Latinx_pop),
Total_tests_norm = Tests_Total/as.numeric(Total_pop),
White_tests_norm = Tests_White/as.numeric(White_pop),
Black_tests_norm = Tests_Black/as.numeric(Black_pop),
AIAN_tests_norm = Tests_AIAN/as.numeric(AIAN_pop),
Asian_tests_norm = Tests_Asian/as.numeric(Asian_pop),
NHPI_tests_norm = Tests_NHPI/as.numeric(NHPI_pop),
Other_tests_norm = Tests_Other/as.numeric(Other_pop),
Multiracial_tests_norm = Tests_Multiracial/as.numeric(Multiracial_pop),
Latinx_tests_norm = Tests_Latinx/as.numeric(Latinx_pop),
Total_deaths_norm = Deaths_Total/as.numeric(Total_pop),
White_deaths_norm = Deaths_White/as.numeric(White_pop),
Black_deaths_norm = Deaths_Black/as.numeric(Black_pop),
AIAN_deaths_norm = Deaths_AIAN/as.numeric(AIAN_pop),
Asian_deaths_norm = Deaths_Asian/as.numeric(Asian_pop),
NHPI_deaths_norm = Deaths_NHPI/as.numeric(NHPI_pop),
Other_deaths_norm = Deaths_Other/as.numeric(Other_pop),
Multiracial_deaths_norm = Deaths_Multiracial/as.numeric(Multiracial_pop),
Latinx_deaths_norm = Deaths_Latinx/as.numeric(Latinx_pop))
cases_long_norm_start <- norm_data_total %>%
filter(date1 == as.Date("2020-04-15")) %>%
select(post_code, Total_cases_norm:Latinx_cases_norm) %>%
mutate(across(Total_cases_norm:Latinx_cases_norm, ~replace_na(.x, 0))) %>%
pivot_longer(-post_code, names_to = "race", values_to = "cases",
names_prefix = "Cases_")
cases_long_norm_end <- norm_data_total %>%
filter(date1 == as.Date("2021-03-07")) %>%
select(post_code, Total_cases_norm:Latinx_cases_norm) %>%
mutate(across(Total_cases_norm:Latinx_cases_norm, ~replace_na(.x, 0))) %>%
pivot_longer(-post_code, names_to = "race", values_to = "cases",
names_prefix = "Cases_")
cases_long_norm_start %>%
filter(race == "Total_cases_norm") %>%
ggplot(aes(y = fct_reorder(post_code, cases), x = cases)) +
geom_col(fill = "darkolivegreen4") +
labs(y = "State", x = "Percentage of Population with COVID", title = "Percentage of Cases in Population by State in April 2020") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
cases_long_norm_end %>%
filter(race == "Total_cases_norm") %>%
ggplot(aes(y = fct_reorder(post_code, cases), x = cases)) +
geom_col(fill = "darkolivegreen4") +
labs(y = "State", x = "Percentage of Population with COVID", title = "Percentage of Cases in Population by State in March 2021") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
bw_start <- cases_long_norm_start %>%
filter(race %in% c("White_cases_norm", "Black_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in April 2020") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
bw_start
bw_end <- cases_long_norm_end %>%
filter(race %in% c("White_cases_norm", "Black_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in March 2021") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
bw_end
lxw_start <- cases_long_norm_start %>%
filter(race %in% c("White_cases_norm", "Latinx_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in April 2020") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
lxw_start
lxw_end <- cases_long_norm_end %>%
filter(race %in% c("White_cases_norm", "Latinx_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in March 2021") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
lxw_end
aw_start <- cases_long_norm_start %>%
filter(race %in% c("White_cases_norm", "Asian_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in April 2020") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
aw_start
aw_end <- cases_long_norm_end %>%
filter(race %in% c("White_cases_norm", "Asian_cases_norm")) %>%
ggplot(aes(x = reorder(post_code, desc(post_code)), y = cases, color = race)) +
geom_point() +
coord_flip() +
scale_color_manual("Race", values = c("darkolivegreen4", "lightgoldenrod2")) +
labs(x = "State", y = "Cases Normalized for Population",
title = "Comparing Proportion of Cases by Race in each State in March 2021") +
theme(axis.title = element_text(size = 10),
axis.text = element_text(size = 5))
aw_end
Some interesting insights here, namely that in all states but Indiana, Kansas, Kentucky, Nebraska, Oklahoma, and Tennessee, the proportion of cases in the Black community is higher than the proportion of cases in the White community after normalization for population at the end of the data collection period in March of 2021. Here, we can clearly see that data collection was not as robust at the beginning of the pandemic with only 27 states and D.C. reporting cases by race at this point in April of 2020. However, even at the beginning of the pandemic, all states but Montana see disproportionate cases in Black communities over White communities. For the Latinx demographic group, the results are less stark, instead portraying higher proportions of White cases compared to the overall White population, regardless of the year. The scatterplot for Asian Americans are even more ambiguous, with different states seeing a different group with higher proportion of cases compared to total population, regardless of the year.
cases_long_counts_start <- norm_data_total %>%
filter(date1 == as.Date("2020-04-15")) %>%
select(post_code, Cases_Total:Cases_Unknown) %>%
mutate(across(Cases_Total:Cases_Unknown, ~replace_na(.x, 0))) %>%
pivot_longer(-post_code, names_to = "race", values_to = "cases",
names_prefix = "Cases_")
cases_long_counts_end <- norm_data_total %>%
filter(date1 == as.Date("2021-03-07")) %>%
select(post_code, Cases_Total:Cases_Unknown) %>%
mutate(across(Cases_Total:Cases_Unknown, ~replace_na(.x, 0))) %>%
pivot_longer(-post_code, names_to = "race", values_to = "cases",
names_prefix = "Cases_")
# long pop counts
pop_long_counts_start <- norm_data_total %>%
filter(date1 == as.Date("2020-04-15")) %>%
select(post_code, Total_pop:Latinx_pop) %>%
mutate(across(Total_pop:Latinx_pop, as.numeric)) %>%
pivot_longer(-post_code, names_to = "race", values_to = "pop") %>%
mutate(race = str_remove(race, "_pop"))
pop_long_counts_end <- norm_data_total %>%
filter(date1 == as.Date("2021-03-07")) %>%
select(post_code, Total_pop:Latinx_pop) %>%
mutate(across(Total_pop:Latinx_pop, as.numeric)) %>%
pivot_longer(-post_code, names_to = "race", values_to = "pop") %>%
mutate(race = str_remove(race, "_pop"))
# joining the two (and pivotting again)
cases_long_counts_start <- cases_long_counts_start %>%
left_join(pop_long_counts_start) %>%
pivot_longer(-c(post_code, race), names_to = "group", values_to = "count")
cases_long_counts_end <- cases_long_counts_end %>%
left_join(pop_long_counts_end) %>%
pivot_longer(-c(post_code, race), names_to = "group", values_to = "count")
# creating stacked bars to compare by state
cases_long_counts_start %>%
filter(race != "Total") %>%
ggplot(aes(x = group, y = count, fill = race)) +
scale_fill_manual(values = c("#800000FF", "#FFA319FF", "#8A9045FF", "#155F83FF", "#C16622FF", "#8F3931FF", "#58593FFF", "#767676FF", "#350E20FF")) +
geom_col(position = "fill") +
facet_wrap(~ post_code)
cases_long_counts_end %>%
filter(race != "Total") %>%
ggplot(aes(x = group, y = count, fill = race)) +
scale_fill_manual(values = c("#800000FF", "#FFA319FF", "#8A9045FF", "#155F83FF", "#C16622FF", "#8F3931FF", "#58593FFF", "#767676FF", "#350E20FF")) +
geom_col(position = "fill") +
facet_wrap(~ post_code)
Something interesting that pops up in this view is the minimal proportion of cases in Latinx communities. The proportion of community members with cases is significantly smaller for Latinx communities than the total Latinx population might suggest there would be. In several states, such as Mississippi, South Carolina, Louisiana, and Maryland, there are higher proportions of Black cases than the total Black population in each state might suggest. This suggests a potentially disproportionate experience of COVID-19 among different communities.
Another question I wanted to explore with my data was whether certain states had higher rates of infection over others. I was specifically looking to test the hypothesis that more liberal states with stricter COVID-19 regulations had lower rates of infections, hospitalizations, and deaths of Black Americans compared to more conservative states with lenient COVID-19 regulations. I first looked at how the proportion of Black cases to the state’s Black population compared to 2020 election results. Then I looked at how the proportion of White cases to the state’s White population compared to 2020 election results.
party <- norm_data_total %>%
mutate(party2020 = ifelse(post_code %in% c("AK", "AL", "AR", "FL", "ID", "IN", "IA", "KS", "KY", "LA", "MS", "MO", "MT", "NE", "NC", "ND", "OH", "OK", "SC", "SD", "TN", "TX", "UT", "WV", "WY"), "Republican", "Democrat"
))
usstates <- ne_states(iso_a2 = "US", returnclass = "sf")
us48 <- usstates %>%
filter(!woe_name %in% c("Alaska", "Hawaii"))
party1 <- party %>%
filter(!post_code %in% "AK" | !post_code %in% "HI") %>%
mutate(postal = post_code)
us48_join <- left_join(us48, party1, by = "postal")
plot6 <- ggplot(us48_join) +
geom_sf(aes(fill= party2020), alpha = 0.4) +
scale_fill_manual("Party Affiliation \n in 2020 Election", values = c("dark blue", "dark red")) +
theme(legend.position = "bottom") +
labs(title = "2020 Election Results")
plot6
plot7 <- ggplot(us48_join) +
geom_sf(aes(fill= Black_cases_norm)) +
scale_fill_carto_c(palette = "Burg", name = "Black COVID-19 Cases") +
theme(legend.position = "bottom") +
labs(title = "Black COVID-19 Cases by State")
plot7
plot7.1 <- ggplot(us48_join) +
geom_sf(aes(fill= White_cases_norm)) +
scale_fill_carto_c(palette = "Burg", name = "White COVID-19 Cases") +
theme(legend.position = "bottom") +
labs(title = "White COVID-19 Cases by State")
plot7.1
At first glance, it doesn’t seem like the maps depicting proportion of members in a demographic community with COVID However, if you look closer at the scale, it’s clear that the proportion of members in white communities with COVID is much lower than the proportion in Black communities. In general, most states seem to have similar proportions across demographic communities, regardless of how their state voged in the 2020 election.
I also created a box plot to see the distribution of cases by 2020 Election results.
#total cases in a boxplot by how the state voted in the 2020 election histogram
#indicator variable for political party
partybplot <- party %>%
ggplot(aes(x = party2020, y = Total_cases_norm, fill = party2020)) +
geom_boxplot(outlier.size = 0.5, alpha= 0.4) +
scale_fill_manual("Party Affiliation \n in 2020 Election", values = c("dark blue", "dark red")) +
labs(x = "Party of Candidate Voted for in the 2020 Election", y = "Distribution of Cases", title = "Distribution of Total Cases by 2020 Election Results ")
partybplot
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
After normalizing for population differences, the states with the higher counts of COVID cases are states that voted for the Republican candidate in the 2020 election. The interquartile range for the Republican voting states is larger, indicating more variability, but ultimately the median proportion of cases to total state population and maximum cases is higher for Republican voting states than Democrat voting states. Other factors could contribute to this higher count, but there does seem to maybe be a relationship between state voting patterns and the prevalance of COVID.
Overall, my data visualizations seem to generally imply a disproportionate experience of COVID-19 based on race, with differences being especially stark between the Black and White communities. Among other racial groups, differences in proportions and case prevalance do seem to exist but to a lesser degree. I’m not surprised by the findings of this project, but I think it’s interesting that differences are less tangible for other racial groups other than the Blck community. I do wish there was more data available from earlier on in the pandemic at the start of data collection. I think this could’ve allowed for a more robust analysis, especially when comparing case counts and response from the start of the pandemic to partway through it. I think more could be done to explore the experiences of demographic groups other than larger minority populations, such as Native Hawaiian and Pacific Islanders and American Indian/Alaskan Natives with that data.