Data Viz in R: Week 2

Problem set questions?

Evaluating Visualization

Cairo’s Qualities
Schwabish’s Guidelines
D’Ignazio’s and Klein’s Emotion/Embodiment

tidyverse: factors

Viz in R: ggplot

To the practice script

Problem set questions?

Artwork by @allison_horst

Evaluating Visualization

Cairo’s Qualities

Is it Truthful? Does it get the information as right as possible? Are you honest with yourself and your audience?

Is it Functional? Does it help your audience interpret the information correctly? Has the purpose in producing the figure shaped the information?

Is it Beautiful? Does it elicit an emotional experience for the reader – awe, wonder, pleasure, surprise? If appropriate, is it simple and elegant?

Is it Insightful? Does it reveal new knowledge, either spontaneously or gradually? Does it reveal something to the reader, helping them build knowledge?

Is it Enlightening? Does it provoke a reader to change their mind? Does it contribute to improving well-being?

Schwabish’s Guidelines

Show the data: understanding what data is central to the point (highlighting)
Reduce the clutter: removing unnecessary visual elements (gridlines, 3D effects, texture/fill, etc.)
Integrate graphics and text: directly labeling where possible, creating active titles that describe the conclusion rather than the data, and adding annotations and detail to guide readers
Avoid spaghetti: small multiples to break up one overly-full chart into many
Start with gray: be intentional in use of color, labels, etc.

Considering equity: Are we using language and images that are inclusive? When do we need to provide historical and social context for problems people are facing? How might our work be misunderstood? When should we collaborate to bridge substantive and visual expertise?

From Better Data Visualizations

D’Ignazio’s and Klein’s Emotion/Embodiment

Reject the idea that data is neutral and that data visualization can be objective (true?); instead, data visualization should embrace emotion and embodiment.

Visualization as rhetoric: data visualizations arise from a set of “choices about the selection and representation of reality” – it is rhetorical whether it intends to persuade or not.
Data visceralization: representations of data to be experienced beyond sight (e.g., emotionally as well as physically, audio as well as sight)
Periscopic’s visualization of gun deaths

From Data Feminism: Washington Post Figure

Next week: your examples of bad data viz!

tidyverse: factors

Factors are variables which take on a limited number of values, aka categorical variables. In R, factors are stored as a vector of integer values with the corresponding set of character values you’ll see when displayed (colloquially, labels; in R, levels).
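A minimal sketch of that storage model, using a made-up character vector (a hypothetical stand-in for a column like condition, not the course data) and base R only:

```r
# Hypothetical toy vector standing in for a categorical column
condition <- c("Good", "Poor", "Good", "Excellent")

f <- factor(condition)

levels(f)      # the labels; sorted alphabetically unless you say otherwise
as.integer(f)  # the integer codes stored under the hood, one per element
typeof(f)      # the underlying storage type is "integer"
```

Each integer code is a position in levels(f), which is why the order of the levels matters for sorting and plotting.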

library(tidyverse) # assumes the course's property data frame is already loaded

property %>% count(condition) # currently a character

property %>% 
  mutate(condition = factor(condition)) %>% # make a factor
  count(condition)

# assert the ordering of the factor levels
cond_levels <- c("Excellent", "Good", "Average", "Fair", "Poor", "Very Poor", "Unknown")
property %>% 
  mutate(condition = factor(condition, levels = cond_levels)) %>% 
  count(condition)

The forcats package, part of the tidyverse, provides helper functions for working with factors, including:

fct_infreq(): reorder factor levels by frequency of levels
fct_reorder(): reorder factor levels by another variable
fct_relevel(): change order of factor levels by hand
fct_recode(): change factor levels by hand
fct_collapse(): collapse factor levels into defined groups
fct_lump(): collapse least/most frequent levels of factor into “other”
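The helpers above can be tried on a small example. This is a sketch on made-up data: the condition and score vectors are hypothetical and are not the course's property data.

```r
library(forcats)

# Hypothetical toy data: a factor plus a made-up numeric value per row
condition <- factor(c("Good", "Poor", "Good", "Fair", "Good", "Poor", "Excellent"))
score <- c(3, 1, 3, 2, 3, 1, 4)

by_freq  <- fct_infreq(condition)                 # most frequent level first
by_score <- fct_reorder(condition, score)         # order levels by median score
poor_1st <- fct_relevel(condition, "Poor")        # move "Poor" to the front by hand
renamed  <- fct_recode(condition, Great = "Excellent")  # new name = "old name"
grouped  <- fct_collapse(condition,
                         High = c("Excellent", "Good"),
                         Low  = c("Fair", "Poor"))      # merge levels into groups
lumped   <- fct_lump(condition, n = 2)            # keep the 2 most frequent, lump the rest

# Inspect how each helper changed the levels
levels(by_freq)
levels(by_score)
levels(poor_1st)
levels(renamed)
levels(grouped)
levels(lumped)
```

None of these calls change the underlying values except fct_recode(), fct_collapse(), and fct_lump(), which rename or merge levels; the others only change level order.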

Viz in R: ggplot

ggplot breaks up the task of making a graph into a series of distinct tasks (layering); each task is carried out in code based on identifiable functions (geom_, scale_, labs, legends, and more).
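As a rough illustration of layering, here is one plot built a piece at a time. It uses the mpg data that ships with ggplot2 as a stand-in for the course data; the palette, title, and theme choices are arbitrary assumptions.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the course data here
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +  # data and aesthetic mappings
  geom_point(alpha = 0.6) +                                  # geom_: what to draw
  scale_color_brewer(palette = "Dark2") +                    # scale_: how values map to colors
  labs(title = "Larger engines get fewer highway miles per gallon",
       x = "Engine displacement (liters)",
       y = "Highway miles per gallon",
       color = "Vehicle class") +                            # labs: titles and axis/legend labels
  theme_minimal() +                                          # overall look
  theme(legend.position = "bottom")                          # legend placement

p  # printing the object draws the plot
```

Each line after the first adds or adjusts one layer, so any of them can be commented out to see what it contributes.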

To the practice script

Distributions (fill, color, alpha; axes ranges, legends, and labels)

Amounts (fill, position, color; legends and themes)

Proportions (polar coordinates, themes)
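For proportions, one common approach is to draw a single stacked bar and wrap it into a circle with polar coordinates. The sketch below uses hypothetical counts, not the course data, and is not necessarily how the practice script does it.

```r
library(ggplot2)

# Hypothetical counts per category
df <- data.frame(
  condition = c("Good", "Average", "Poor"),
  n = c(50, 30, 20)
)

pie <- ggplot(df, aes(x = "", y = n, fill = condition)) +
  geom_col(width = 1) +        # one stacked bar holding all categories
  coord_polar(theta = "y") +   # map the bar's height to angle, making a pie
  theme_void()                 # drop axes, gridlines, and background

pie
```

Removing the coord_polar() line shows the stacked bar the pie is built from.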

XKCD, Randall Munroe, https://xkcd.com/1338/