Problem set questions?

Artwork by @allison_horst

Wrangling & Visualization

ggplot expects tidy data: data structured such that

  • Each variable has its own column
  • Each observation has its own row
  • Each value has its own cell
Wickham and Grolemund Ch 12
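
As a small illustration of the three rules, here is a sketch that reshapes an untidy table into tidy form with tidyr's pivot_longer(). The data frame `wide` and its values are made up for this example.

```r
library(tidyr)

# Hypothetical untidy table: the "year" variable is spread across column names
wide <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25),
  check.names = FALSE
)

# Gather the year columns so each row is one country-year observation
tidy <- pivot_longer(
  wide,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)
tidy
```

After reshaping, country, year, and cases each have their own column, and each row is a single observation.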

Separate/Unite

separate: Split a single column into multiple columns by separating each cell in the column into a row of cells.

separate(df, col = rate, into = c("cases", "pop"), sep = "/")
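
A minimal sketch of the call above on a made-up data frame (the name `df` and its values are assumptions for illustration). The extra argument convert = TRUE asks tidyr to turn the split strings into numbers.

```r
library(tidyr)

# Hypothetical data: rate stores "cases/population" as a single string
df <- data.frame(
  country = c("A", "B"),
  rate = c("745/19987071", "2666/20595360")
)

# Split rate at "/" into two new columns; convert = TRUE makes them numeric
out <- separate(df, col = rate, into = c("cases", "pop"), sep = "/", convert = TRUE)
out
```

Recent tidyr releases mark separate() as superseded by separate_wider_delim(), but it still works as shown.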

unite: Combine several columns into a single column by uniting their values across rows.

unite(df, col = year, century:year, sep = "")
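
The same call on a made-up data frame (again, `df` and its values are assumptions for illustration). The two input columns are pasted together and, by default, removed from the result.

```r
library(tidyr)

# Hypothetical data: the year is stored in two pieces
df <- data.frame(
  country = c("A", "B"),
  century = c("19", "20"),
  year = c("99", "00")
)

# Paste century and year together with no separator into a single year column
out <- unite(df, col = year, century:year, sep = "")
out
```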

Joins

Joins merge data sets based on key variables. The syntax always follows the pattern name_join(x, y, by = "key"), where name is full, left, right, or inner.

Animated visuals created by Garrick Aden-Buie

  • full_join(): keeps all observations in x and y

  • left_join(): keeps all observations in x

  • right_join(): keeps all observations in y

  • inner_join(): keeps only observations whose key appears in both x and y
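
The four joins can be compared side by side on two small made-up tables (the names `x`, `y`, `val_x`, and `val_y` are assumptions for illustration). Unmatched rows that are kept get NA in the columns from the other table.

```r
library(dplyr)

# Hypothetical tables sharing a key column
x <- data.frame(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- data.frame(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

full  <- full_join(x, y, by = "key")   # keys 1, 2, 3, 4
left  <- left_join(x, y, by = "key")   # keys 1, 2, 3
right <- right_join(x, y, by = "key")  # keys 1, 2, 4
inner <- inner_join(x, y, by = "key")  # keys 1, 2
```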

More Vis

  • Amounts
  • Proportions
  • Scatterplots
  • Slope graphs, dumbbell plots

To the Script

Bad Viz Examples

To the slack!

Do No Harm

  • “If I were one of the data points on this visualization, would I feel offended?” – Kim Bui
  • “If I only saw this chart on Twitter, would I draw the correct conclusion?”
  • Understand the data – how are they generated, what/whose purpose do they serve, who is included or excluded, and more
  • Use language thoughtfully, use colors thoughtfully, consider missing groups

Color

Scales

Color used to distinguish groups requires a qualitative color scale that is

  • finite and unordered
  • readily distinguished
  • approximately equivalent

Color used to represent values or comparative magnitude requires a sequential color scale that

  • uses a many-valued gradient to distinguish larger/smaller values
  • represents the distance between values
  • may be single-hued, multi-hued, or diverging

Color used to highlight a group or threshold value requires accent colors that

  • stand out/pop relative to the rest of the colors
  • may be a single color against a grey backdrop
  • may be based on the intensity of colors in the color scale
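
The three uses of color can be sketched in ggplot2 roughly as follows. This uses the mpg data that ships with ggplot2. The specific palettes ("Dark2", viridis) and the highlighted group are choices made for this example, not the only options.

```r
library(ggplot2)

# Qualitative scale: unordered groups (drive type)
p_qual <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")

# Sequential scale: a numeric value mapped to a gradient
p_seq <- ggplot(mpg, aes(displ, hwy, color = cty)) +
  geom_point() +
  scale_color_viridis_c()

# Accent color: one group highlighted against grey
p_accent <- ggplot(mpg, aes(displ, hwy, color = class == "2seater")) +
  geom_point() +
  scale_color_manual(
    values = c("FALSE" = "grey70", "TRUE" = "firebrick"),
    labels = c("Other", "2seater"),
    name = NULL
  )
```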

Pitfalls

  1. Encoding too much information (e.g., too many groups)
    • Wilke suggests qualitative scales work best with 3 to 5 groups and work poorly beyond 8 groups
    • Labeling points is an alternative
  2. Coloring for the sake of coloring
    • And using oversaturated colors
  3. Using non-monotonic scales for values (e.g., the rainbow scale)
  4. Ignoring accessibility (e.g., color perception)
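
One way to address the accessibility pitfall is to use a palette designed with color vision deficiency in mind. The sketch below uses the Okabe-Ito palette available through base R's palette.colors() (R 4.0 or later). Picking entries 2 to 4 is an arbitrary choice for this example.

```r
library(ggplot2)

# Okabe-Ito palette; unname() so ggplot2 assigns colors by position, not by name
okabe_ito <- unname(palette.colors(n = 8, palette = "Okabe-Ito"))

# Three groups only, in line with the advice to keep qualitative scales small
p <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = okabe_ito[2:4])
```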

To the Script

R Markdown

R Markdown creates dynamic documents by combining markdown (an easy-to-write plain-text format) with embedded R code chunks. When the document is compiled, the code is evaluated so that the code, its output, and your prose all appear in the final document, which makes reports reproducible.

  • R Markdown documents (.Rmd files) can be rendered to multiple formats including HTML and PDF.
  • The R code in an .Rmd document is processed by knitr, while the resulting .md file is rendered by pandoc to the final output formats (e.g. HTML or PDF).

R Markdown files contain

  • A YAML header (YAML originally stood for "Yet Another Markup Language"), offset by ---
  • Text with markdown formatting
  • Chunks of R code, offset by ```{r} and ``` (RStudio keyboard shortcut: Ctrl + Alt + I, or Cmd + Option + I on macOS)
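
To see the three pieces together, the sketch below writes a minimal .Rmd file from R and renders it if rmarkdown and pandoc are available. The file name example.Rmd, the title, and the chunk contents are made up for this illustration. The chunk fence is built with strrep() only so the example can be shown inside a code block.

```r
# Three backticks, built programmatically to avoid nesting fences in this example
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                        # YAML header opens
  "title: \"Example report\"",
  "output: html_document",
  "---",                        # YAML header closes
  "",
  "Some *markdown* text.",      # prose with markdown formatting
  "",
  paste0(fence, "{r}"),         # R chunk opens
  "summary(cars)",
  fence                         # R chunk closes
)

rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# render() runs knitr on the chunks, then pandoc on the resulting markdown
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```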

Additional Resources


XKCD Inspiration

XKCD, Randall Munroe, https://xkcd.com/2048/