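# Packages used throughout (assumed from the functions called in this document).
library(tidyverse)    # dplyr, ggplot2, forcats
library(kableExtra)   # kbl(), kable_styling(), scroll_box()
library(wesanderson)  # wes_palette()

paygap <- read.csv("paygap.csv")

# Fix the level order of the department and education factors.
departments <- c("Operations", "Management", "Administration", "Sales", "Engineering")
educ <- c("College", "Masters", "PhD", "High School")

paygap <- paygap %>% 
  mutate(dept = factor(dept, levels = departments),
         edu = factor(edu, levels = educ))

paygap <- paygap %>% 
  # Bin ages into left-closed intervals: [0,25), [25,35), ...
  mutate(age_bin = cut(age,
                       breaks = c(0, 25, 35, 45, 55, Inf),
                       right = FALSE)) %>%
  mutate(total_pay = basePay + bonus) %>% 
  # Order the remaining character columns, and then the age bins, by frequency.
  mutate(across(where(is.character), fct_infreq)) %>%
  mutate(age_bin = fct_infreq(age_bin))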
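To make the binning step concrete, here is a minimal base R sketch of how cut() behaves with right = FALSE. The ages are toy values chosen to sit on and around the bin edges; they are not taken from the survey.

```r
# Toy ages, for illustration only (not from the survey data).
ages <- c(18, 24, 25, 34, 35, 60)

# Same breaks as in the chunk above; right = FALSE closes each bin on the left.
age_bin <- cut(ages,
               breaks = c(0, 25, 35, 45, 55, Inf),
               right = FALSE)

# An age of exactly 25 lands in [25,35), and 35 lands in [35,45).
data.frame(age = ages, age_bin = age_bin)
```

With right = TRUE (the default) the same edge values would instead fall into the lower bin, for example 25 into (0,25].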

A Glimpse into the data

This project uses data from a Glassdoor survey.

head(paygap, 10) %>%
  kbl(align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(height = "300px")
| jobTitle | gender | age | perfEval | edu | dept | seniority | basePay | bonus | age_bin | total_pay |
|---|---|---|---|---|---|---|---|---|---|---|
| Graphic Designer | Female | 18 | 5 | College | Operations | 2 | 42363 | 9938 | [0,25) | 52301 |
| Software Engineer | Male | 21 | 5 | College | Management | 5 | 108476 | 11128 | [0,25) | 119604 |
| Warehouse Associate | Female | 19 | 4 | PhD | Administration | 5 | 90208 | 9268 | [0,25) | 99476 |
| Software Engineer | Male | 20 | 5 | Masters | Sales | 4 | 108080 | 10154 | [0,25) | 118234 |
| Graphic Designer | Male | 26 | 5 | Masters | Engineering | 5 | 99464 | 9319 | [25,35) | 108783 |
| IT | Female | 20 | 5 | PhD | Operations | 4 | 70890 | 10126 | [0,25) | 81016 |
| Graphic Designer | Female | 20 | 5 | College | Sales | 4 | 67585 | 10541 | [0,25) | 78126 |
| Software Engineer | Male | 18 | 4 | PhD | Engineering | 5 | 97523 | 10240 | [0,25) | 107763 |
| Graphic Designer | Female | 33 | 5 | High School | Engineering | 5 | 112976 | 9836 | [25,35) | 122812 |
| Sales Associate | Female | 35 | 5 | College | Engineering | 5 | 106524 | 9941 | [35,45) | 116465 |

Visualizations

Pay gap based on mean pay for both genders across all occupations, education levels and ages

The gender pay gap measures the difference between the average remuneration of men and women in the workforce. The graph shows that, on average, men are paid more than women when all occupations, education levels and ages are taken together.

# Reviewer notes (mcp):
# - No separate paygap_clean object is needed; summarise inside the plotting chunk.
# - Change group_by() for each of the other graphs and just use the mean.
# - Bin total pay into categories after the initial cleaning, not at the beginning.
# - For density estimates, plot total_pay (not the mean) and fill by gender.
# - geom_col() is the same as geom_bar(stat = "identity"), so use geom_col().
paygap %>% 
  group_by(gender) %>% 
  summarize(mean_pay = mean(total_pay)) %>% 
  ggplot(aes(x = gender, y = mean_pay, fill = gender))+
  geom_col() +
  labs(
    title = "Gender Pay Gap based on Mean Pay",
    x = "Gender",
    y = "Mean Pay") +
  scale_fill_manual(values = c(wes_palette("Rushmore1") [3], wes_palette("Royal1") [2]))
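The bar heights can also be turned into a single number. The sketch below computes the gap as an absolute difference and as a percentage of mean male pay; the percentage convention is an assumption, and the pay values are made up for illustration rather than taken from the survey.

```r
library(dplyr)

# Made-up pay values, for illustration only (not from the survey data).
toy <- data.frame(
  gender    = c("Female", "Female", "Male", "Male"),
  total_pay = c(90000, 100000, 100000, 110000)
)

mean_by_gender <- toy %>%
  group_by(gender) %>%
  summarize(mean_pay = mean(total_pay))

# Look the two means up by name so the row order does not matter.
means <- setNames(mean_by_gender$mean_pay, as.character(mean_by_gender$gender))

gap_abs <- means[["Male"]] - means[["Female"]]  # absolute gap in pay units
gap_pct <- 100 * gap_abs / means[["Male"]]      # gap as a share of mean male pay

gap_abs
gap_pct
```

Replacing toy with paygap would apply the same calculation to the survey data, assuming the gender column contains exactly the values "Male" and "Female".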

Gender, mean pay and occupation

The graph shows the difference in average pay between men and women in the same occupation. The pay gap is larger for warehouse associates, software engineers, data scientists and drivers than for the other occupations.

paygap %>% 
  group_by(gender, jobTitle) %>% 
  summarize(mean_pay = mean(total_pay)) %>% 
  ggplot(aes(x = jobTitle, y = mean_pay, fill = gender ) ) +
  # Dodge the bars so men and women sit side by side instead of being stacked.
  geom_col(position = "dodge") +
  labs(
    title = "Gender Pay Gap based on Occupation",
    x = "Occupation",
    y = "Mean Pay") +
  scale_fill_manual(values = c(wes_palette("Royal1") [4], wes_palette("Royal2") [5])) +
  coord_flip()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
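The message above is informational: after summarize() on two grouping variables, dplyr drops only the last one by default, so the result is still grouped by gender. A minimal sketch of the difference, using made-up rows rather than the survey data:

```r
library(dplyr)

# Made-up rows, for illustration only (not from the survey data).
toy <- data.frame(
  gender    = c("Female", "Male", "Female", "Male"),
  jobTitle  = c("IT", "IT", "Driver", "Driver"),
  total_pay = c(90000, 92000, 80000, 95000)
)

# Default behaviour: the result stays grouped by gender and the message is printed.
still_grouped <- toy %>%
  group_by(gender, jobTitle) %>%
  summarize(mean_pay = mean(total_pay))
group_vars(still_grouped)

# .groups = "drop" returns an ungrouped result and silences the message.
ungrouped <- toy %>%
  group_by(gender, jobTitle) %>%
  summarize(mean_pay = mean(total_pay), .groups = "drop")
group_vars(ungrouped)
```

For the plots in this document either form draws the same figure; the argument only matters if further grouped operations follow.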

paygap %>% 
  group_by(gender, jobTitle) %>% 
  summarize(mean_pay = mean(total_pay)) %>% 
  ggplot(aes(x = jobTitle, y = mean_pay) ) +
  geom_line(color = "dark grey") +
  # A constant size belongs outside aes(); inside, it is treated as data and adds a legend entry.
  geom_point(aes(color = gender), size = 3) +
  guides(color = guide_legend("Gender")) +
  labs(
    title = "Gender Pay Gap based on Occupation",
    x = "Occupation",
    y = "Mean Pay") +
  scale_color_manual(values = c(wes_palette("Royal1") [4], wes_palette("Royal2") [5])) +
  coord_flip() +
  theme_minimal()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
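To rank occupations by the size of the gap rather than reading it off the plot, one option is to spread the two gender means into columns and subtract them. This is a sketch under assumptions: it uses tidyr::pivot_wider(), made-up rows instead of the survey data, and a gender column with exactly the values "Male" and "Female".

```r
library(dplyr)
library(tidyr)

# Made-up rows, for illustration only (not from the survey data).
toy <- data.frame(
  jobTitle  = c("Driver", "Driver", "IT", "IT"),
  gender    = c("Female", "Male", "Female", "Male"),
  total_pay = c(80000, 95000, 90000, 92000)
)

gap_by_job <- toy %>%
  group_by(jobTitle, gender) %>%
  summarize(mean_pay = mean(total_pay), .groups = "drop") %>%
  # One row per occupation, with a Female and a Male column of mean pay.
  pivot_wider(names_from = gender, values_from = mean_pay) %>%
  mutate(gap = Male - Female) %>%
  arrange(desc(gap))

gap_by_job
```

Applied to paygap, the first rows of the result would be the occupations with the largest gaps.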

Gender, mean pay and age

The graph shows the difference in average pay between men and women in each age group. Pay rises consistently with age for both men and women, yet women are paid considerably less than men in the same age group.

paygap %>% 
  group_by(gender, age_bin) %>% 
  summarize(mean_pay = mean(total_pay)) %>% 
  ggplot(aes(x = age_bin, y = mean_pay, fill = gender ) ) +
  geom_col() +
  labs(
    title = "Gender Pay Gap based on Age",
    x = "Age Bins",
    y = "Mean Pay") +
  scale_fill_manual(values = c(wes_palette("Zissou1") [1], wes_palette("Zissou1") [4])) +
  facet_wrap(~gender)
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.

Gender, total pay and education

The density plots show the distribution of total pay for men and women at each education level. Women with a high school education or a master's degree tend to be paid less than men with the same educational background.

paygap %>% 
  # No group_by() is needed here: nothing is summarised before plotting.
  ggplot(aes(x = total_pay, fill = gender)) +
  geom_density(alpha = 0.4) +
  labs(
    title = "Gender Pay Gap based on Education",
    x = "Total Pay",
    y = "Density") +
  scale_fill_manual(values = c(wes_palette("Royal1") [2], wes_palette("Rushmore1") [3])) +
  facet_wrap(~edu)

Source: https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv