Bayes Assignment 1 of 2025

Your goal in this assignment is to write a short introductory guide to a common statistical topic for your fellow students, as if you were a statistics lecturer.

The assignment should be 2 to 4 pages long. It should cover the basics of the topic, its key underlying assumptions, and common misconceptions about it. It should also include in-text citations (with matching references at the end) for optional further reading, and two relatable examples: one whose data satisfy the assumptions and one whose data do not.

For the examples, simulate the data yourself so that they illustrate the concept as clearly as possible. Each example must open with a fictional, decolonised introduction to the data. Your variable names and descriptions should relate to a problem in any South African culture (be it Zulu, Xhosa, South African English, Benoni culture, farm culture, Shoprite culture, or anything else that is special to South Africans).

One of the graduate outcomes this course aims to develop is the ability to do reproducible research: research in which someone else (or a future you) can take your work, follow exactly what you did, and arrive at the same results (and, ideally, the same conclusions). One way to practise this principle is to submit a single integrated document that neatly interleaves your structured reasoning (in sections with headings), links and citations, code, and results. A large share of the marks is therefore allocated to these principles.

For this assignment, generate the data with R code, visualise the generated data, and then use them to explain or demonstrate your chosen statistical principle. Call set.seed(your student number) once, at the start of your document, so that your output is the same every time the document is run.
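As an illustration only, the sketch below shows one possible shape for this workflow in R: set the seed once, simulate a variable with a South African flavour, plot it, and summarise it. The student number 12345678, the variable name `taxi_wait_min`, and the choice of an exponential distribution with a mean of 15 minutes are all hypothetical placeholders, not requirements; your own scenario and distribution should be chosen to suit your topic.

```r
# Set the seed once, at the very top of the document.
# 12345678 is a hypothetical placeholder: replace it with your own student number.
set.seed(12345678)

# Hypothetical scenario: minutes spent waiting for a minibus taxi to fill up
# at a taxi rank, simulated as exponential with a mean of 15 minutes.
n <- 200
taxi_wait_min <- rexp(n, rate = 1 / 15)

# Visualise the simulated data before using them to explain the concept.
hist(taxi_wait_min,
     breaks = 20,
     main = "Simulated taxi-rank waiting times",
     xlab = "Waiting time (minutes)")

# A quick numerical summary to accompany the plot.
summary(taxi_wait_min)
```

Because the seed is set once and nothing else draws random numbers before `rexp()`, anyone who runs the document with the same seed gets exactly the same simulated values, plot, and summary.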

Suitable topics are things you touched on in your undergraduate studies but did not study in depth, such as Kruskal-Wallis or Mann-Whitney tests, visual assessment of a QQ plot, Mahalanobis distances or other metrics, interval coverage, prediction intervals for exponential smoothing models, time series filters, interpreting a box plot, or the EM algorithm. Alternatively, you could pose an interesting question, such as: how does maximum likelihood give intervals, and what do those intervals mean? Feel free to come up with your own ideas, but ideally your topic should not match any other student's, so try to be creative.