This presentation focusses on some consultation projects that I worked on this year. Some are easy, some are more complex.
4 recent projects that might interest you
I will only highlight some of the results for discussion.
And maybe we might learn something…
This analysis was done for Elani Boshoff, a lecturer in English.
The strongest result in the study was that UFS students (studying English) were incredibly anxious about how much reading they would need to do at university.
I find this strange.
As I understand it, UFS reading requirements are much lighter than those at top US and EU universities.
The best surveys are highly focussed, asking exactly what one wants to know to answer a specific research question.
Practically though, you can’t do a new survey every time you think of a new question. So people tend to ask a lot of questions at once.
The most interesting questions are a mix of nominal and ordinal items. One approach to visualising them together is Multiple Correspondence Analysis (MCA).
Item | Description |
---|---|
Answer 32 | I get nervous when I have to read academic texts on a digital device. |
Answer 7 | How do you usually read your prescribed academic texts? |
Answer 8 | If you could choose, what would you prefer to use to read academic texts? |
We note that those who get least nervous about reading on a digital device both read, and wanted to read, on a laptop or computer, while those most nervous about reading on a digital device indicated that they want to read printed text. The more neutral respondents indicated that they used a mobile phone but might prefer a computer.
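For anyone who wants to see what MCA does under the hood, here is a minimal sketch: one-hot encode the answers into an indicator matrix, then run a correspondence analysis on it via the SVD. The responses below are made up purely for illustration (they are not the survey data), and the function names `indicator_matrix` and `mca` are my own, not from any package.

```python
import numpy as np

def indicator_matrix(responses, item_names):
    """One-hot encode per-respondent answer tuples into a 0/1 matrix."""
    columns = []
    for j, item in enumerate(item_names):
        for category in sorted({row[j] for row in responses}):
            columns.append((j, item, category))
    Z = np.array([[1.0 if row[j] == category else 0.0
                   for (j, _, category) in columns]
                  for row in responses])
    labels = [f"{item}={category}" for (_, item, category) in columns]
    return Z, labels

def mca(Z, n_dims=2):
    """Correspondence analysis of an indicator matrix (basic MCA)."""
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (respondents)
    c = P.sum(axis=0)                    # column masses (categories)
    # Standardised residuals from the independence model r * c'
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sing) / np.sqrt(r)[:, None]     # principal coordinates
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    inertia = sing ** 2                  # variance explained per dimension
    return row_coords[:, :n_dims], col_coords[:, :n_dims], inertia

# Hypothetical toy answers: (nervousness, usual device, preferred device)
item_names = ["nervous", "usual", "preferred"]
responses = (
    [("low", "laptop", "laptop")] * 3
    + [("neutral", "phone", "laptop")] * 2
    + [("neutral", "phone", "print")]
    + [("high", "print", "print")] * 2
    + [("high", "phone", "print")]
)

Z, labels = indicator_matrix(responses, item_names)
row_coords, col_coords, inertia = mca(Z)

for label, (x, y) in zip(labels, col_coords):
    print(f"{label:20s} {x: .3f} {y: .3f}")
```

Plotting the category coordinates on the first two dimensions gives the kind of map discussed above: categories that tend to be chosen by the same respondents land close together.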
Yes, you can drink it.
Warning
Do not try this at home!
Solution
Mixed effects regression models
For each dependent variable, models considered included:
The predictive distributions (random future sample taken from a random future animal) will be compared against these limits and the probability of exceeding each safety threshold visualised.
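As a rough sketch of that last step, suppose we already have posterior draws of the overall mean, the between-animal standard deviation and the residual standard deviation. Every number below is a hypothetical placeholder (including the thresholds and their names), and the simple random-intercept structure is an assumption for illustration, not the fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# Hypothetical stand-ins for posterior draws from a fitted mixed model.
mu = rng.normal(10.0, 0.5, n_draws)                   # overall mean
sd_animal = np.abs(rng.normal(2.0, 0.3, n_draws))     # between-animal SD
sd_resid = np.abs(rng.normal(1.0, 0.1, n_draws))      # within-animal SD

def predict_new_animal(mu, sd_animal, sd_resid, rng):
    """One future sample from one future animal, per posterior draw."""
    animal_effect = rng.normal(0.0, sd_animal)  # new animal's random effect
    noise = rng.normal(0.0, sd_resid)           # new sample's residual
    return mu + animal_effect + noise

def exceedance_probability(pred_draws, threshold):
    """Share of predictive draws above the threshold."""
    return float(np.mean(np.asarray(pred_draws) > threshold))

pred = predict_new_animal(mu, sd_animal, sd_resid, rng)

# Hypothetical safety limits, for illustration only.
thresholds = {"limit_a": 12.0, "limit_b": 15.0}
probs = {name: exceedance_probability(pred, t)
         for name, t in thresholds.items()}
print(probs)
```

Because the new animal's random effect is drawn afresh for every posterior draw, the predictive distribution is wider than the uncertainty in the mean alone, which is what matters when the question is about a future animal.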
In this “controlled” experiment some students were helped to brainstorm using a chatbot before starting their essay, while the control group brainstormed without AI. All the students then did their essays in what was supposed to be a controlled environment (no AI).
Treatment is not significant. Word count is barely significant, but you can’t give marks for writing if there’s no writing I suppose. The vast majority of the variation is assigned to the facilitators by the model.
This analysis is a nightmare. Your advice would be appreciated.
The reviewers say that the analysis is not complex enough, but it is already so complicated that they can’t fully follow or understand it 😢.
How many of you remember studying it?
Why isn’t it a major outcome in all our modules?
Because it is nearly impossible to teach!
Some universities (e.g. UCT I think) try to teach it at postgrad level, mostly via examples; but even if you work through a dozen examples it won’t really prepare you for what you’ll face at work.
Kaitlyn Taylor, a Master's student in Animal Science, went to 3 reserves and filmed 252 videos of giraffes reacting to sounds that she played to them, then coded 55 behaviours for the 180 s following each sound.
She also noted a lot of metadata for each video, such as:
Video_Number | Stimuli_Order | Wind_Direction |
Location | ID | Temperature |
Date | Sex | Speaker_Lat |
Time_At_Start | Age | Speaker_Long |
Habituation_Period | Group_Size | Observer_Lat |
Sound_Type | Speaker_Distance | Observer_Long |
Sound_Variation | Wind_Speed | Giraffe1_Lat |
But how do we go from the raw data to something we can analyse statistically to address the research questions?
Feature engineering varies dramatically; it can include:
Contrast | Average | Median | Lower | Upper | p_value | adj_p_val |
---|---|---|---|---|---|---|
SubAdult_M - Adult_F | 3.803 | 3.801 | 2.451 | 5.173 | 0.000 | 0.000 |
SubAdult_M - Adult_M | 4.506 | 4.514 | 2.993 | 5.930 | 0.000 | 0.000 |
SubAdult_M - SubAdult_F | 4.049 | 4.057 | 2.775 | 5.416 | 0.000 | 0.000 |
Adult_F - Adult_M | 0.703 | 0.704 | -0.353 | 1.704 | 0.183 | 0.548 |
SubAdult_F - Adult_M | 0.457 | 0.458 | -0.328 | 1.183 | 0.237 | 0.548 |
Adult_F - SubAdult_F | 0.246 | 0.241 | -0.605 | 1.105 | 0.572 | 0.572 |
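A note on the adj_p_val column: the values in the table appear consistent with a Holm step-down adjustment, although that is my inference from the numbers rather than something stated on the slide. A minimal sketch of that adjustment, applied to the rounded p-values above:

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Rounded p-values from the age and sex contrast table, in table order.
p_values = [0.000, 0.000, 0.000, 0.183, 0.237, 0.572]
adjusted = holm_adjust(p_values)
print([round(a, 3) for a in adjusted])
```

With the rounded inputs the fourth and fifth contrasts come out at about 0.549 rather than the 0.548 shown, which is what one would expect if the table was computed from unrounded p-values.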
Location | Contrast | Average | Median | Lower | Upper | p_value | adj_p_val |
---|---|---|---|---|---|---|---|
APGR | Drone - Dove | 3.803 | 3.801 | 2.451 | 5.173 | 0.000 | 0.000 |
APGR | Vehicle - Dove | 5.201 | 5.199 | 3.571 | 6.833 | 0.000 | 0.000 |
FGR | Drone - Dove | 1.969 | 1.977 | 1.059 | 2.854 | 0.000 | 0.000 |
WGL | Talking - Dove | 1.552 | 1.547 | 0.847 | 2.316 | 0.000 | 0.000 |
APGR | Talking - Dove | 2.306 | 2.305 | 1.021 | 3.689 | 0.001 | 0.017 |
WGL | Vehicle - Dove | 1.171 | 1.173 | 0.388 | 1.927 | 0.003 | 0.044 |
WGL | Drone - Dove | 1.106 | 1.110 | 0.334 | 1.852 | 0.004 | 0.046 |
APGR | Vehicle - Talking | 2.895 | 2.899 | 0.952 | 4.858 | 0.005 | 0.057 |
FGR | Talking - Dove | 1.116 | 1.122 | 0.259 | 1.932 | 0.010 | 0.098 |
FGR | Vehicle - Dove | 1.062 | 1.069 | 0.249 | 1.895 | 0.011 | 0.103 |
APGR | Drone - Talking | 1.497 | 1.504 | -0.120 | 3.222 | 0.080 | 0.642 |
FGR | Drone - Vehicle | 0.907 | 0.909 | -0.143 | 1.950 | 0.089 | 0.642 |
FGR | Drone - Talking | 0.853 | 0.858 | -0.232 | 1.881 | 0.117 | 0.704 |
APGR | Vehicle - Drone | 1.398 | 1.394 | -0.517 | 3.374 | 0.149 | 0.746 |
WGL | Talking - Drone | 0.445 | 0.452 | -0.465 | 1.358 | 0.348 | 1.000 |
WGL | Talking - Vehicle | 0.380 | 0.375 | -0.551 | 1.308 | 0.423 | 1.000 |
WGL | Vehicle - Drone | 0.065 | 0.062 | -0.867 | 1.006 | 0.900 | 1.000 |
FGR | Talking - Vehicle | 0.054 | 0.050 | -0.942 | 1.036 | 0.922 | 1.000 |
Since the models here have the same outcome but different explanatory variable construction, we can compare the models to determine which explanatory variable better matches the data generating process.
Model | elpd_diff | se_diff | elpd_loo | se_elpd_loo | p_loo | se_p_loo | looic | se_looic |
---|---|---|---|---|---|---|---|---|
HNR_Model | 0.000 | 0.000 | -1167.259 | 23.278 | 23.706 | 2.488 | 2334.518 | 46.557 |
Sound_Type_Model | -9.825 | 4.969 | -1177.084 | 23.671 | 42.469 | 4.446 | 2354.168 | 47.342 |
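The HNR model is more parsimonious (p_loo of about 24 versus 42) and has the higher expected predictive accuracy (the Sound_Type model's elpd is lower by 9.8, roughly two standard errors), so it is more likely to resemble the data generating process (actual giraffe behavioural processes).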
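To make the comparison table easier to read, here is a small sketch that recomputes two of its columns from the others. It uses only the numbers in the table; comparing the elpd difference with its standard error is an informal check rather than a formal test.

```python
# Values copied from the comparison table: (elpd_loo, se_elpd_loo)
elpd = {
    "HNR_Model": (-1167.259, 23.278),
    "Sound_Type_Model": (-1177.084, 23.671),
}
elpd_diff, se_diff = -9.825, 4.969  # Sound_Type_Model relative to HNR_Model

# looic is the elpd on the deviance scale: looic = -2 * elpd_loo
looic = {name: -2.0 * e for name, (e, _) in elpd.items()}

# Difference between the two elpd_loo values, and its size in SE units
diff_check = elpd["Sound_Type_Model"][0] - elpd["HNR_Model"][0]
ratio = elpd_diff / se_diff

print(looic)
print(round(diff_check, 3), round(ratio, 2))
```

The ratio comes out just under two standard errors in favour of the HNR model, so the preference is reasonably clear but not overwhelming.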
This presentation was created using the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours follow UFS branding, and the background image was made in the image editor GIMP by compositing images from Copilot.
2025/10/03 - Consultation