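# Packages used throughout this document
library(tidyverse)    # dplyr, ggplot2, forcats
library(kableExtra)   # kbl(), kable_styling(), scroll_box()
library(wesanderson)  # wes_palette()

paygap <- read.csv("paygap.csv")

# Fix the level order of department and education
departments <- c("Operations", "Management", "Administration", "Sales", "Engineering")
paygap <- paygap %>%
  mutate(dept = factor(dept, levels = departments))

educ <- c("College", "Masters", "PhD", "High School")
paygap <- paygap %>%
  mutate(edu = factor(edu, levels = educ))

# Bin age, compute total pay, and order the remaining categories by frequency
paygap <- paygap %>%
  mutate(age_bin = cut(age,
                       breaks = c(0, 25, 35, 45, 55, Inf),
                       right = FALSE)) %>%
  mutate(total_pay = basePay + bonus) %>%
  mutate_if(is.character, fct_infreq) %>%
  mutate(age_bin = fct_infreq(age_bin))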
This project uses data from a Glassdoor survey.
head(paygap, 10) %>%
kbl(align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
scroll_box(height = "300px")
jobTitle | gender | age | perfEval | edu | dept | seniority | basePay | bonus | age_bin | total_pay |
---|---|---|---|---|---|---|---|---|---|---|
Graphic Designer | Female | 18 | 5 | College | Operations | 2 | 42363 | 9938 | [0,25) | 52301 |
Software Engineer | Male | 21 | 5 | College | Management | 5 | 108476 | 11128 | [0,25) | 119604 |
Warehouse Associate | Female | 19 | 4 | PhD | Administration | 5 | 90208 | 9268 | [0,25) | 99476 |
Software Engineer | Male | 20 | 5 | Masters | Sales | 4 | 108080 | 10154 | [0,25) | 118234 |
Graphic Designer | Male | 26 | 5 | Masters | Engineering | 5 | 99464 | 9319 | [25,35) | 108783 |
IT | Female | 20 | 5 | PhD | Operations | 4 | 70890 | 10126 | [0,25) | 81016 |
Graphic Designer | Female | 20 | 5 | College | Sales | 4 | 67585 | 10541 | [0,25) | 78126 |
Software Engineer | Male | 18 | 4 | PhD | Engineering | 5 | 97523 | 10240 | [0,25) | 107763 |
Graphic Designer | Female | 33 | 5 | High School | Engineering | 5 | 112976 | 9836 | [25,35) | 122812 |
Sales Associate | Female | 35 | 5 | College | Engineering | 5 | 106524 | 9941 | [35,45) | 116465 |
The gender pay gap measures the difference between the average remuneration of men and women in the workforce. The graph shows that, on average, men are paid more than women; the graphs that follow break this down by occupation, age and education level.
# mcp comments: no separate paygap_clean object is needed; summarise inside the plotting
# chunk and change group_by() when plotting the other graphs. Just use the mean. Bin total
# pay into categories afterwards, not at the beginning. For density estimates, plot the value
# of total_pay (not the mean) with fill = gender. geom_col() is the same as
# geom_bar(stat = "identity"), so use geom_col().
paygap %>%
group_by(gender) %>%
summarize(mean_pay = mean(total_pay)) %>%
ggplot(aes(x = gender, y = mean_pay, fill = gender))+
geom_col() +
labs(
title = "Gender Pay Gap based on Mean Pay",
x = "Gender",
y = "Mean Pay") +
scale_fill_manual(values = c(wes_palette("Rushmore1") [3], wes_palette("Royal1") [2]))
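The gap defined above can also be reported as a number rather than read off the bars. The following is a minimal sketch in base R, not the author's method: the data frame `toy` is hypothetical and is not taken from the Glassdoor file, and expressing the gap as a percentage of mean male pay is one common convention assumed here.

```r
# Hypothetical toy data for illustration only (NOT from paygap.csv)
toy <- data.frame(
  gender    = c("Female", "Female", "Male", "Male"),
  total_pay = c(50000, 70000, 60000, 90000)
)

# Mean total pay per gender, returned as a named vector
mean_pay <- tapply(toy$total_pay, toy$gender, mean)

# Absolute gap, and the gap as a percentage of mean male pay (assumed convention)
gap_abs <- unname(mean_pay["Male"] - mean_pay["Female"])
gap_pct <- 100 * gap_abs / unname(mean_pay["Male"])

print(gap_abs)  # 15000
print(gap_pct)  # 20
```

Replacing `toy` with `paygap` should give the corresponding figures for the survey data, assuming `gender` contains the values "Male" and "Female".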
The graph shows the difference in mean total pay between men and women in the same occupation. The pay gap is larger for the warehouse associate, software engineer, data scientist and driver occupations.
paygap %>%
  group_by(gender, jobTitle) %>%
  summarize(mean_pay = mean(total_pay)) %>%
  ggplot(aes(x = jobTitle, y = mean_pay, fill = gender)) +
  # Dodge the bars so the two means sit side by side instead of being stacked
  geom_col(position = "dodge") +
  labs(
    title = "Gender Pay Gap based on Occupation",
    x = "Occupation",
    y = "Mean Pay") +
  scale_fill_manual(values = c(wes_palette("Royal1")[4], wes_palette("Royal2")[5])) +
  coord_flip()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
paygap %>%
  group_by(gender, jobTitle) %>%
  summarize(mean_pay = mean(total_pay)) %>%
  ggplot(aes(x = jobTitle, y = mean_pay)) +
  # One grey segment per occupation, joining the male and female means
  geom_line(color = "dark grey") +
  # size is a constant, so it is set outside aes() to avoid a spurious size legend
  geom_point(aes(color = gender), size = 3) +
  guides(color = guide_legend("Gender")) +
  labs(
    title = "Gender Pay Gap based on Occupation",
    x = "Occupation",
    y = "Mean Pay") +
  scale_color_manual(values = c(wes_palette("Royal1")[4], wes_palette("Royal2")[5])) +
  coord_flip() +
  theme_minimal()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
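The occupations with the largest gap can also be ranked numerically instead of being judged by eye from the two plots above. This is a minimal sketch in base R under stated assumptions: the data frame `toy` is hypothetical, not taken from the Glassdoor file, and the gap is taken as mean male pay minus mean female pay.

```r
# Hypothetical toy data for illustration only (NOT from paygap.csv)
toy <- data.frame(
  jobTitle  = c("Driver", "Driver", "Driver", "IT", "IT", "IT"),
  gender    = c("Female", "Male", "Male", "Female", "Female", "Male"),
  total_pay = c(80000, 90000, 94000, 88000, 92000, 91000)
)

# Matrix of mean total pay: one row per occupation, one column per gender
mean_pay <- tapply(toy$total_pay, list(toy$jobTitle, toy$gender), mean)

# Gap per occupation (male minus female), largest first
gap <- sort(mean_pay[, "Male"] - mean_pay[, "Female"], decreasing = TRUE)

print(gap)
```

In this toy example the gap is 12000 for Driver and 1000 for IT, so Driver is listed first.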
The graph shows the difference in mean total pay between men and women across age groups. Pay increases consistently with age for both men and women, yet women are paid considerably less than men in the same age group.
paygap %>%
group_by(gender, age_bin) %>%
summarize(mean_pay = mean(total_pay)) %>%
ggplot(aes(x = age_bin, y = mean_pay, fill = gender ) ) +
geom_col() +
labs(
title = "Gender Pay Gap based on Age",
x = "Age Bins",
y = "Mean Pay") +
scale_fill_manual(values = c(wes_palette("Zissou1") [1], wes_palette("Zissou1") [4])) +
facet_wrap(~gender)
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
The density plot shows the distribution of total pay for men and women at each education level. Women with a high school education or a master's degree are paid less than men with the same educational background.
paygap %>%
  # No grouping is needed: the density is estimated per fill group and facet
  ggplot(aes(x = total_pay, fill = gender)) +
  geom_density(alpha = 0.4) +
  labs(
    title = "Gender Pay Gap based on Education",
    x = "Total Pay",
    y = "Density") +
  scale_fill_manual(values = c(wes_palette("Royal1")[2], wes_palette("Rushmore1")[3])) +
  facet_wrap(~edu)
Source: https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv