Beyond multivariate applications: Quarto for reproducible interactive documents

Sean van der Merwe, Johané Nienkemper-Swanepoel, Raeesa Ganey

Introduction

Welcome to our workshop!

You can follow the slides at

seanvdm.co.za/mdag2025

On the last slide of Part 1 you will find a link to all the resources for the day.

Outline

Hit refresh (F5 or Ctrl+r) if nothing is showing.

Part 1

Think about your own situations. With each example, consider where you might apply these ideas in your own:

  • Research presentations
  • Teaching presentations
  • Resources you make available to students

Interactive data exploration

Example: survey exploration

What about time series?

We could simulate a pair of \(VARIMA_2(1,1,1)-tGARCH(1,1)\) financial time series like so:
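The slide's own simulation code is not reproduced in this text. The following base-R sketch is one way such a pair could be generated. It rests on several assumptions:

- "tGARCH" is read as GARCH(1,1) with Student-t innovations, though it may mean something else on the original slide.
- Each series has its own GARCH recursion, and the shocks have a constant correlation.
- All coefficients (`Phi`, `Theta`, `omega`, `alpha`, `beta`, `nu` and the 0.5 correlation) are hypothetical values chosen so that the process is stationary after differencing.

```r
set.seed(2025)
n     <- 500
Phi   <- matrix(c(0.5, 0.1, 0.2, 0.4), 2)    # hypothetical VAR(1) coefficients (eigenvalues 0.6, 0.3)
Theta <- matrix(c(0.3, 0, 0, 0.3), 2)        # hypothetical VMA(1) coefficients
omega <- c(0.05, 0.05)                       # hypothetical GARCH(1,1) intercepts
alpha <- c(0.10, 0.10)                       # ARCH terms
beta  <- c(0.85, 0.85)                       # GARCH terms (alpha + beta < 1)
nu    <- 5                                   # Student-t degrees of freedom
R     <- chol(matrix(c(1, 0.5, 0.5, 1), 2))  # constant correlation between the shocks

# Unit-variance t draws, then correlated across the two series
z <- matrix(rt(2 * n, df = nu), n) * sqrt((nu - 2) / nu)
z <- z %*% R

# Start the conditional variances at their unconditional level
sig2 <- matrix(omega / (1 - alpha - beta), n, 2, byrow = TRUE)
e <- x <- matrix(0, n, 2)
for (t in 2:n) {
  sig2[t, ] <- omega + alpha * e[t - 1, ]^2 + beta * sig2[t - 1, ]  # GARCH(1,1) per series
  e[t, ] <- sqrt(sig2[t, ]) * z[t, ]                                # heavy-tailed shocks
  x[t, ] <- Phi %*% x[t - 1, ] + e[t, ] + Theta %*% e[t - 1, ]      # VARMA(1,1) on the differences
}
y <- apply(x, 2, cumsum)  # integrate once: the middle "1" in VARIMA(1,1,1)
matplot(y, type = "l", lty = 1, ylab = "Simulated series")
```

Simulating the differenced series and then cumulatively summing keeps the ARMA and GARCH recursions simple. The integration step is what produces the wandering, price-like paths.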

Or you can just explore data directly

The ACTG 315 dataset, available in the ushr R package, includes longitudinal measurements of HIV viral load (log\(_{10}\) RNA copies/mL) over time. It covers 46 patients, with the last measurement taken on Day 196 after baseline (Day 0).

Data set

Interactive model fits for conveying research results

Exploring simulation study results

When you want to know whether two population means are different based on independent samples, is it better to

  1. do an F test for differing variances, followed by a t-test assuming equal or unequal variances depending on that result, or
  2. always do an unequal-variances (Welch) t-test?

Let’s simulate a few scenarios:

  1. Equal means, equal variances
  2. Equal means, different variances (1 vs 2)
  3. Equal variances, different means (0 vs 1)
  4. Unequal variances, different means

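We calculate the rejection rate when testing at \(\alpha = 0.05\) and compare the approaches under each scenario. Under equal means this rate is the type I error rate, and under different means it is the power.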
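The comparison above can be sketched in base R with `var.test()` and `t.test()`. This is not the authors' simulation, and several settings are assumptions:

- 20 observations per group and 2000 replications.
- "1 vs 2" is read as variances 1 and 2, so the standard deviations are 1 and \(\sqrt{2}\).
- Scenario 4 uses means 0 vs 1 together with variances 1 vs 2.

```r
set.seed(1)
# Rejection rate of both approaches for one scenario (n1, n2, reps are hypothetical choices)
reject_rate <- function(mu2, sd2, n1 = 20, n2 = 20, reps = 2000, alpha = 0.05) {
  res <- replicate(reps, {
    x <- rnorm(n1, mean = 0, sd = 1)
    y <- rnorm(n2, mean = mu2, sd = sd2)
    # Approach 1: F test first, then choose the pooled or Welch t-test
    equal_var <- var.test(x, y)$p.value >= alpha
    p1 <- t.test(x, y, var.equal = equal_var)$p.value
    # Approach 2: always use Welch's unequal-variances t-test
    p2 <- t.test(x, y)$p.value
    c(two_stage = p1 < alpha, welch = p2 < alpha)
  })
  rowMeans(res)  # proportion of replications that rejected H0
}

# The four scenarios: means 0 vs mu2, variances 1 vs sd2^2
scenarios <- data.frame(mu2 = c(0, 0, 1, 1), sd2 = c(1, sqrt(2), 1, sqrt(2)))
results <- t(mapply(reject_rate, scenarios$mu2, scenarios$sd2))
cbind(scenarios, results)
```

In scenarios 1 and 2 the printed rates should sit near 0.05 if a procedure holds its size. In scenarios 3 and 4 they estimate power.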

Simulation study results

Zoom in to see the ‘difference’.

Shower thought for the day

When is it better to use a table and when is it better to use a plot to present results?

  • Do you have variables with many levels, particularly continuous or ordinal variables?
    • Do you have 5 sample sizes or 50 threshold choices?
    • What about variable hyperparameters?
  • A plot better conveys information that changes gradually
    • allowing the reader to see patterns in the bigger picture
  • Do you have a bunch of factors with few levels, particularly binary or trinary factors?
  • A table better accommodates such cases because there are only so many factors you can effectively show on a plot
    • Once you’ve used colour, shape/linetype, size, and a facet grid or lattice, anything further becomes noise
  • Does someone need to process your results further?
  • Are there so many factors and statistics that they are difficult to fit on a page?
  • Output big tables to a neat spreadsheet and upload it publicly (e.g. GitHub or Figshare)
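For the last point, here is a minimal base-R sketch of exporting a results table in a spreadsheet-friendly layout, with one row per combination of settings. The factor names and the `NA` placeholder column are hypothetical. The file goes to a temporary directory only so the example leaves no clutter.

```r
# Hypothetical results grid: every combination of the design factors
results <- expand.grid(n = c(10, 20, 50),
                       variances = c("equal", "unequal"),
                       method = c("two_stage", "welch"))
results$rejection_rate <- NA_real_  # placeholder: fill in from your own simulation

# One row per setting is easy for others to filter, pivot or re-plot
out_file <- file.path(tempdir(), "simulation_results.csv")
write.csv(results, out_file, row.names = FALSE)
```

A long, one-row-per-setting layout usually makes further processing easier than a wide table formatted for print.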

Three continuous, one nominal

Plotting data on the model

One of the best ways to evaluate the fit of a model is to plot the model and show the data over it. Here we are analysing a reduction in pollution at a dam:
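The dam data and the model from the slide are not included here. The base-R sketch below shows the "model first, data on top" idea using simulated data. The exponential decay towards a floor and all of its numbers are hypothetical.

```r
set.seed(7)
# Simulated (hypothetical) data: pollution decaying towards a floor after an intervention
dam <- data.frame(day = rep(seq(0, 180, by = 15), each = 3))
dam$pollution <- 40 * exp(-0.02 * dam$day) + 5 + rnorm(nrow(dam), sd = 2)

# Fit an exponential decay with a non-zero floor k
fit <- nls(pollution ~ a * exp(-b * day) + k, data = dam,
           start = list(a = 35, b = 0.015, k = 3))

# Draw the fitted curve first, then overlay the observed data on top of it
grid <- data.frame(day = seq(0, 180, length.out = 200))
pred <- predict(fit, newdata = grid)
plot(grid$day, pred, type = "l", lwd = 2,
     ylim = range(c(pred, dam$pollution)), xlab = "Day", ylab = "Pollution")
points(dam$day, dam$pollution, pch = 19, col = adjustcolor("steelblue", 0.6))
```

Systematic runs of points above or below the curve are the kind of misfit this plot makes obvious.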

Curvy data

Teaching a class

Example: Let’s shuffle a deck of cards

deck <- paste( c(2:10, 'Jack', 'Queen', 'King', 'Ace') |> rep(times=4), 'of',
               c('Spades','Diamonds','Clubs','Hearts') |> rep(each=13) )
# To draw a hand of 7 cards:
hand <- deck |> sample(7)

# Shuffling is taking a sample the same size (52) without replacement:
shuffled_deck <- deck |> sample(length(deck))

# A bootstrap sample is the same size but drawn with replacement (add: , TRUE):
boot_deck <- deck |> sample(length(deck), TRUE)
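Building on the deck code above, here is a hypothetical classroom follow-up: estimate by simulation the chance that a 7-card hand contains at least one Ace, then compare it with the exact answer \(1 - \binom{48}{7} / \binom{52}{7}\). The number of simulated hands (10,000) is an arbitrary choice.

```r
set.seed(315)
deck <- paste(rep(c(2:10, 'Jack', 'Queen', 'King', 'Ace'), times = 4), 'of',
              rep(c('Spades', 'Diamonds', 'Clubs', 'Hearts'), each = 13))

# Deal many 7-card hands and record whether each contains an Ace
has_ace <- replicate(10000, any(grepl('^Ace', sample(deck, 7))))
simulated <- mean(has_ace)

# Exact answer: 1 minus P(all 7 cards come from the 48 non-Aces)
exact <- 1 - choose(48, 7) / choose(52, 7)
c(simulated = simulated, exact = exact)
```

The exact value is about 0.45. Students can see the simulated estimate settle towards it as the number of hands grows.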

Explaining MCMC

This plot is the basis for my MCMC class:

plotly::plot_ly(z=~volcano) |> plotly::add_surface()
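The slide does not say how the surface is used in class. One plausible illustration, offered as an assumption rather than the author's method, treats the `volcano` heights as an unnormalised target density on the grid and runs a random-walk Metropolis sampler over it. The step size (up to 3 cells), starting corner and chain length are all hypothetical choices.

```r
set.seed(42)
n_iter <- 5000
chain <- matrix(NA_integer_, n_iter, 2, dimnames = list(NULL, c("row", "col")))
current <- c(1L, 1L)  # start in a corner of the grid

for (k in seq_len(n_iter)) {
  # Symmetric random-walk proposal: move up to 3 cells in each direction
  proposal <- current + sample(-3:3, 2, replace = TRUE)
  inside <- all(proposal >= 1) &&
    proposal[1] <= nrow(volcano) && proposal[2] <= ncol(volcano)
  # Metropolis acceptance: ratio of target heights; off-grid proposals are rejected
  if (inside && runif(1) < volcano[proposal[1], proposal[2]] /
                           volcano[current[1], current[2]]) {
    current <- proposal
  }
  chain[k, ] <- current
}

# Overlay the chain's path on a contour map of the same surface
contour(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)), z = volcano)
lines(chain[, "row"], chain[, "col"], col = "red")
```

Because every height in `volcano` is positive, the chain can move anywhere on the grid, but it spends more time on the high ground.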

Spatial data class

Exploring code live

Text information

Shall we summarise the complete works of Shakespeare in a word cloud?
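A word cloud is built from word frequencies. The base-R sketch below counts words in a single line of *Hamlet*. For the complete works, you would read a plain-text copy with `readLines()`; the file is up to you. The optional plotting step assumes the wordcloud package is installed and uses its `wordcloud(words, freq)` function.

```r
text <- "To be, or not to be, that is the question:"

# Split on anything that is not a letter or apostrophe, then lower-case
words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
words <- words[words != ""]

# Frequency table, most common words first
freq <- sort(table(words), decreasing = TRUE)
freq

# Optional: draw the cloud if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
}
```

On a real corpus you would normally also drop common stop words before plotting.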

Multivariate fun - Let’s try a biplot

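X <- rnorm(200) |> matrix(50)                  # 50 observations on 4 variables
colnames(X) <- paste('Var', 1:ncol(X))
Group <- sample(c('🐬', '🐪'), nrow(X), TRUE)   # random group label for each row
# biplot.plotly() is Sean's own function (see the workshop resources);
# fig_w and fig_h are the figure width and height, assumed set earlier in the slides
biplot.plotly(X, grouping_points = Group, width = fig_w, height = fig_h)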
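If you don't have `biplot.plotly()` to hand, base R's `prcomp()` and `biplot()` give a static PCA biplot of the same kind of data. The sketch below is not Sean's function. Plain letters stand in for the emoji group labels, since emoji may not render in base graphics.

```r
set.seed(1)
X <- matrix(rnorm(200), 50)                  # 50 observations on 4 variables
colnames(X) <- paste('Var', 1:ncol(X))
Group <- sample(c('A', 'B'), nrow(X), TRUE)  # hypothetical two-group labelling

pca <- prcomp(X, scale. = TRUE)              # principal components of standardised data
# Points are labelled by group; arrows show each variable's loadings on PC1 and PC2
biplot(pca, xlabs = Group, cex = 0.8)
```

The interactive version adds hovering and zooming, which is exactly what a static biplot lacks once the points start to overlap.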

Conclusion

  • Interactive visualisations have many uses
  • Presentations do not have to be static
    • Interactivity is better than animation
  • Have you thought about how such tools can help you convey information to your audiences?

This presentation was created in Quarto using the Reveal.js format and the RStudio IDE. The background image was created in the image editor GIMP by compositing images generated with Copilot.

Workshop resources

You likely received an email with these resources:

All the workshop resources are on GitHub