How to create within-subject scatter plots in R with ggplot2
Scatterplots can be a very effective form of visualization for data from within-subjects experiments.
Today, we’ll take a look at creating a specific type of visualization for data from a within-subjects experiment (also known as repeated measures, but that can sometimes be a misleading label). You’ll often see within-subject data visualized as bar graphs (condition means, and maybe mean differences if you’re lucky). But alternatives exist, and this post covers one of them: within-subjects scatterplots.
For example, Ganis and Kievit (2015) asked 54 people to observe, on each trial, two 3-D shapes with various rotations and judge whether the two shapes were the same or not.
There were 4 angles (0, 50, 100, and 150 degree rotations), but for simplicity, today we’ll only look at items that were not rotated with respect to each other, and items rotated 50 degrees. The data are freely available (thanks!) in Excel format, and the snippet below loads the data and cleans them into a usable format:
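```r
library(dplyr)   # filter(), transmute(), bind_rows(), %>%
library(purrr)   # map()
library(readxl)  # read_xlsx()

# Download the archive once; mode = "wb" keeps the zip intact on Windows
if (!file.exists("data.zip")) {
  download.file(
    "https://ndownloader.figshare.com/files/1878093",
    "data.zip",
    mode = "wb"
  )
}
unzip("data.zip")

# Read every subject's Excel file as text and stack them,
# recording which file each row came from in `id`
files <- list.files(
  "Behavioural_data/",
  pattern = "sub[0-9]+.xlsx",
  full.names = TRUE
)
dat <- map(
  files,
  ~ read_xlsx(.x, range = "A4:G100", col_types = rep("text", 7))
) %>%
  bind_rows(.id = "id")

# Keep the 0 and 50 degree trials and convert columns to usable types
dat <- dat %>%
  filter(angle %in% c("0", "50")) %>%
  transmute(
    id = factor(id),
    angle = factor(angle),
    rt = as.numeric(Time),
    accuracy = as.numeric(`correct/incorrect`)
  )
```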
Warning: There was 1 warning in `transmute()`.
ℹ In argument: `rt = as.numeric(Time)`.
Caused by warning:
! NAs introduced by coercion
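Example data.

| id | angle | rt | accuracy |
|----|-------|------|----------|
| 1 | 0 | 1355 | 1 |
| 1 | 50 | 1685 | 1 |
| 1 | 50 | 1237 | 1 |
| 1 | 0 | 1275 | 1 |
| 1 | 50 | 2238 | 1 |
| 1 | 0 | 1524 | 1 |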
We’ll focus on comparing the reaction times between the 0 degree and 50 degree rotation trials.
Subject means
We’ll be graphing subjects’ means and standard errors, so we first compute both for each subject in each condition.
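The code and figure for this step did not survive in this copy of the post, so here is a minimal sketch of how it might look. It assumes the long-format columns `id`, `angle`, and `rt` from above. To keep the example runnable on its own, it uses a simulated stand-in (`toy`) with made-up values rather than the real data; `summarise_subjects()`, `dat_sum`, and `p_means` are names chosen here for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for `dat` (made-up values, NOT the Ganis & Kievit data):
# 10 subjects x 2 angles x 20 trials, with slower RTs at 50 degrees.
set.seed(1)
toy <- expand.grid(id = factor(1:10), angle = factor(c(0, 50)), trial = 1:20)
toy$rt <- rnorm(
  nrow(toy),
  mean = ifelse(toy$angle == "50", 1900, 1500),
  sd = 300
)

# Per-subject, per-angle mean and standard error of the mean.
# SE = sd / sqrt(number of non-missing trials).
summarise_subjects <- function(d) {
  d %>%
    group_by(id, angle) %>%
    summarise(
      m = mean(rt, na.rm = TRUE),
      se = sd(rt, na.rm = TRUE) / sqrt(sum(!is.na(rt))),
      .groups = "drop"
    )
}

dat_sum <- summarise_subjects(toy)

# One jittered point per subject in each condition,
# plus a larger diamond for the mean of the subject means.
p_means <- ggplot(dat_sum, aes(x = angle, y = m)) +
  geom_point(
    position = position_jitter(width = 0.1, height = 0),
    alpha = 0.5
  ) +
  stat_summary(fun = mean, geom = "point", size = 4, shape = 18) +
  labs(x = "Angle (degrees)", y = "Mean RT")

p_means
```

With the real data, `summarise_subjects(dat)` should give the same kind of table. Counting only non-missing trials in the SE matters here because some reaction times were coerced to `NA` when the data were read in.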
This figure shows quite clearly that the mean reaction time in the 50 degree angle condition was higher than in the 0 degree angle condition, and it also shows the spread across individuals in each condition. However, we are often specifically interested in the within-subject effect of condition, which is difficult to see in this image. We could draw lines to connect each person’s two points, and the effect would then be visible as a “spaghetti plot”. While useful, these plots can be a little overwhelming, especially if there are too many people (spaghetti is great but nobody likes too much of it!).
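For reference, a spaghetti plot of this kind could be sketched roughly as follows. The per-subject means here are made-up numbers in the long format described above (`id`, `angle`, `m`), and `p_spaghetti` is just an illustrative name.

```r
library(ggplot2)

# Hypothetical per-subject means (made-up numbers) in long format:
# two rows per subject, one for each angle.
set.seed(2)
dat_sum <- data.frame(
  id = factor(rep(1:10, each = 2)),
  angle = factor(rep(c(0, 50), times = 10))
)
dat_sum$m <- rnorm(
  20,
  mean = ifelse(dat_sum$angle == "50", 1900, 1500),
  sd = 200
)

# group = id makes geom_line() draw one line per subject,
# connecting that subject's two condition means.
p_spaghetti <- ggplot(dat_sum, aes(x = angle, y = m, group = id)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  labs(x = "Angle (degrees)", y = "Mean RT")

p_spaghetti
```

With ten subjects this is still readable; with 54 the lines start to overlap heavily, which is the problem described above.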
Within-subject scatterplots
To draw within-subjects scatterplots, we’ll need a slight reorganization of the data, such that it is in wide format with respect to the conditions.
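One way to do this reshaping is `tidyr::pivot_wider()`. The sketch below assumes long-format subject summaries with columns `id`, `angle`, `m` (mean), and `se` (standard error); the numbers are made up for illustration, and `dat_sum_wide` is a name chosen here.

```r
library(tidyr)

# Hypothetical long-format subject summaries (made-up numbers)
dat_sum <- data.frame(
  id = factor(rep(1:3, each = 2)),
  angle = factor(rep(c(0, 50), times = 3)),
  m = c(1400, 1800, 1550, 1700, 1300, 1900),
  se = c(40, 60, 55, 50, 35, 70)
)

# Spread the two angles into columns: one row per subject,
# with columns m_0, m_50, se_0, and se_50.
dat_sum_wide <- pivot_wider(
  dat_sum,
  names_from = angle,
  values_from = c(m, se)
)

dat_sum_wide
```

Each subject now occupies a single row, so the two condition means can serve directly as the x and y coordinates of one point.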
Then we can simply map the per-subject angle-means and standard errors to the X and Y axes. I think it’s important for these graphs to usually have a 1:1 aspect ratio, an identity line, and identical axes, which we add below.
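A rough sketch of such a plot follows. It assumes a wide data frame with columns `m_0`, `m_50`, `se_0`, and `se_50` (filled with made-up numbers here) and puts the 0 degree condition on the x axis and the 50 degree condition on the y axis. The SE lines are drawn with `geom_segment()` and sized with `linewidth`, since ggplot2 3.4.0 deprecated `size` for lines.

```r
library(ggplot2)

# Hypothetical wide-format subject summaries (made-up numbers)
set.seed(3)
dat_sum_wide <- data.frame(
  id = factor(1:10),
  m_0 = rnorm(10, mean = 1500, sd = 200),
  se_0 = runif(10, min = 30, max = 80)
)
dat_sum_wide$m_50 <- dat_sum_wide$m_0 + rnorm(10, mean = 400, sd = 150)
dat_sum_wide$se_50 <- runif(10, min = 30, max = 80)

# Shared limits so that both axes cover exactly the same range
lims <- range(with(
  dat_sum_wide,
  c(m_0 - se_0, m_0 + se_0, m_50 - se_50, m_50 + se_50)
))

p_within <- ggplot(dat_sum_wide, aes(x = m_0, y = m_50)) +
  # Identity line: points on it have equal means in both conditions
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  # Horizontal line: +/- 1 SE in the 0 degree condition
  geom_segment(
    aes(x = m_0 - se_0, xend = m_0 + se_0, y = m_50, yend = m_50),
    linewidth = 0.3
  ) +
  # Vertical line: +/- 1 SE in the 50 degree condition
  geom_segment(
    aes(x = m_0, xend = m_0, y = m_50 - se_50, yend = m_50 + se_50),
    linewidth = 0.3
  ) +
  geom_point() +
  # 1:1 aspect ratio with identical axes
  coord_fixed(ratio = 1, xlim = lims, ylim = lims) +
  labs(x = "Mean RT, 0 degrees", y = "Mean RT, 50 degrees")

p_within
```

Passing the same `lims` to both axes in `coord_fixed()` is what keeps the identity line at a true 45 degree angle running corner to corner.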
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
This plot shows each person (mean) as a point and their SEs as thin lines. The difference between conditions can be directly seen by how far from the diagonal line the points are. Were we to use CIs, we could also get a rough view of subject-specific significant differences. Points above the diagonal indicate that the person’s (mean) RT was greater in the 50 degrees condition. All of the points lie above the identity line, indicating that the effect was as we predicted, and robust across individuals.
This is a very useful diagnostic plot that simultaneously shows the population- (or group-) level trend (are the points, on average, below or above the identity line?) and the expectation (mean) for every person (roughly, how far apart are the points from each other?). The points are naturally connected by their location, unlike in a bar graph, where they would have to be connected by lines. You may well find it an informative graph; it’s certainly very easy to make in R with ggplot2. I also think it is visually very convincing, and it doesn’t lead one to focus unduly on the group means: I am both convinced and informed by the graph.
Conclusion
Within-subject scatter plots are pretty common in some fields (psychophysics), but underutilized in many fields where they might have a positive impact on statistical inference. Why not try them out on your own data, especially when they’re this easy to do with R and ggplot2?
Recall that for real applications, it’s better to transform reaction times or to model them with a skewed distribution. Here we summarized them with means and standard errors, as if they were normally distributed, just for convenience.
Finally, this post was made possible by Ganis and Kievit (2015), who generously shared their data online.
References
Ganis, Giorgio, and Rogier Kievit. 2015. “A New Set of Three-Dimensional Shapes for Investigating Mental Rotation Processes: Validation Data and Stimulus Set.” Journal of Open Psychology Data 3 (1). https://doi.org/10.5334/jopd.ai.