# Statistics

Universities require that students be given opportunities to evaluate the modules, courses, or subjects they attend. Evaluation processes are often heavily flawed, and I hope to address a number of those flaws. My aim is to make module evaluations easier and faster for lecturers. Why use my system? It requires no intervention from anyone other than you and your students, which eliminates the wait times that arise when other people are involved.

## The power of markdown for teaching, research, and consultation

Do you ever work through an analysis example in class with code and then never get around to typing it up nicely? Do you ever have students copy-pasting assignments from their classmates? Do you ever write up an analysis by pasting in graph after graph or table after table, only to realise there’s a problem with the data and you have to redo everything? Do you ever need to include code in one or more languages and struggle to get the syntax highlighted nicely?

## Assignment Memo Example

Currently marking student 2001234567

```r
# This code block (if run manually) generates an Excel file with different data for each student according to the list of student numbers
# In this case the data is for a regression with one significant explanatory variable and one irrelevant variable
library(openxlsx)
students <- c(2001234567, 2012345678, 2000000123)
nStudents <- length(students)
n <- 100
datasets <- vector('list', nStudents)
for (i in 1:nStudents) {
  x1 <- rnorm(n, 4, 1)
  x2 <- rgamma(n, 4, 2)
  y <- 20 + 2*x1 + rnorm(n)
  datasets[[i]] <- data.
```
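The data-generation loop in this example is cut off in the excerpt. As a minimal sketch of the same idea, assuming (this is not shown in the original) that each student's data is regenerated from a seed equal to their student number, the memo for any student can be recomputed on demand by refitting the regression. The helpers `make_student_data` and `memo_for` and the output file name are hypothetical; only base R is needed, and the Excel step runs only if openxlsx is installed.

```r
# Hypothetical helper: regenerate one student's dataset deterministically.
# Assumption: the seed is the student number (not shown in the original excerpt).
# Note that set.seed() resets the global random-number stream as a side effect.
make_student_data <- function(student, n = 100) {
  set.seed(student)
  x1 <- rnorm(n, 4, 1)            # significant explanatory variable
  x2 <- rgamma(n, 4, 2)           # irrelevant explanatory variable (shape 4, rate 2)
  y <- 20 + 2 * x1 + rnorm(n)     # y depends on x1 only
  data.frame(y = y, x1 = x1, x2 = x2)
}

# Hypothetical memo: the coefficient table a student's regression should reproduce
memo_for <- function(student) {
  fit <- lm(y ~ x1 + x2, data = make_student_data(student))
  summary(fit)$coefficients
}

students <- c(2001234567, 2012345678, 2000000123)
memos <- lapply(students, memo_for)
names(memos) <- as.character(students)
memos[["2001234567"]]

# Optional: one worksheet per student, named by student number
if (requireNamespace("openxlsx", quietly = TRUE)) {
  datasets <- lapply(students, make_student_data)
  names(datasets) <- as.character(students)
  openxlsx::write.xlsx(datasets, file = file.path(tempdir(), "assignment_data.xlsx"))
}
```

The design choice here is that each dataset can be regenerated from the student number alone, so nothing needs to be stored between handing out the assignment and marking it.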

## Analysis of experiment by Talana Cronje, UFS, on behalf of Potatoes South Africa

Opportunistic potatoes are a pest: they affect farms in many negative ways and need to be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for this purpose. In the original experiment, eight measures were taken for various cultivars and doses; here we restrict ourselves to two of them.

The data pertain to a specific set of conditions. We should not extend the results beyond that setting without carefully considering and accounting for any systematic differences between it and any broader setting.
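One common way to define an economically optimal dose, offered here only as an assumption since the post's actual response measures, prices and model are not shown in this excerpt, is to fit a concave dose-response curve and choose the dose at which the value of the extra response equals the extra treatment cost. The sketch below assumes a quadratic response and uses made-up simulated numbers; `optimal_dose`, `price` and `cost` are hypothetical names. If the measure is something to be minimised (such as regrowth), the sign of `price` would need to be flipped.

```r
# Assumed model: response(d) = b0 + b1*d + b2*d^2 with b2 < 0, valued at `price`
# per unit of response, with treatment costing `cost` per unit dose.
# Profit is maximised where price * (b1 + 2*b2*d) = cost,
# i.e. d* = (cost - price*b1) / (2*price*b2).
optimal_dose <- function(b1, b2, price, cost) {
  stopifnot(b2 < 0)  # a maximum exists only for a concave response
  (cost - price * b1) / (2 * price * b2)
}

# Simulated dose-response trial (made-up numbers, for illustration only)
set.seed(123)
dose <- rep(seq(0, 3, by = 0.5), each = 4)
response <- 10 + 40 * dose - 10 * dose^2 + rnorm(length(dose), sd = 2)
fit <- lm(response ~ dose + I(dose^2))
b <- coef(fit)

d_star <- optimal_dose(b[["dose"]], b[["I(dose^2)"]], price = 1, cost = 8)
# Keep the answer inside the range of doses actually tested
d_star <- min(max(d_star, min(dose)), max(dose))
d_star
```

Clamping `d_star` to the tested doses follows the same caution as the disclaimer: the fitted curve says nothing reliable outside the experimental setting.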

## Skew t fit to IBM log returns

First we read in the data. The data is available here.

```r
library(openxlsx)
sourcedata <- read.xlsx('IBM19891998.xlsx')
lret <- sourcedata$LogReturns
# Approximate decimal date; subtracting 1 maps 1 January to the start of its year
dates <- sourcedata$year + (sourcedata$month - 1)/12 + (sourcedata$day - 1)/365
```

Then we plot the data.

```r
library(viridisLite)
cols <- viridis(3)
par(mfrow = c(2, 1), mar = c(4, 4, 2, 0.2))
# One colour per panel: the return series over time, then a histogram of the returns
plot(dates, lret, type = 'l', col = cols[1], lwd = 1, xlab = 'Date', ylab = 'Log return', main = 'IBM')
hist(lret, 50, col = cols[2], density = 20, ylab = 'Frequency', xlab = 'Log return', main = 'IBM')
```

A newer way to fit statistical models such as this one is through the Stan interface.
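The post fits the skew t distribution with Stan, and that model code is not part of this excerpt. As a rough stand-in that shows what is being estimated, here is a minimal maximum-likelihood sketch in base R, assuming Azzalini's skew t parametrization (location `xi`, scale `omega`, slant `alpha`, degrees of freedom `nu`). The functions `dskewt`, `rskewt` and `fit_skewt` are hypothetical helpers, not functions from the post or from any package.

```r
# Azzalini-type skew t density (assumed parametrization):
# f(x) = (2/omega) * t_nu(z) * T_{nu+1}(alpha * z * sqrt((nu + 1) / (nu + z^2))), z = (x - xi)/omega
dskewt <- function(x, xi = 0, omega = 1, alpha = 0, nu = 5, log = FALSE) {
  z <- (x - xi) / omega
  logd <- log(2) - log(omega) + dt(z, df = nu, log = TRUE) +
    pt(alpha * z * sqrt((nu + 1) / (nu + z^2)), df = nu + 1, log.p = TRUE)
  if (log) logd else exp(logd)
}

# Simulate skew t draws: a skew-normal variate divided by sqrt(chi-square / nu)
rskewt <- function(n, xi = 0, omega = 1, alpha = 0, nu = 5) {
  delta <- alpha / sqrt(1 + alpha^2)
  z <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
  xi + omega * z / sqrt(rchisq(n, nu) / nu)
}

# Maximum-likelihood fit on a standardised scale; omega and nu are log-transformed
# so the optimiser cannot make them negative
fit_skewt <- function(x) {
  m <- median(x)
  s <- sd(x)
  z <- (x - m) / s
  negll <- function(p) {
    -sum(dskewt(z, xi = p[1], omega = exp(p[2]), alpha = p[3], nu = exp(p[4]), log = TRUE))
  }
  start <- c(0, 0, sign(mean(z)), log(5))   # skew direction started from mean vs median
  opt <- optim(start, negll, method = "BFGS")
  # Undo the standardisation: location and scale change, alpha and nu do not
  c(xi = m + s * opt$par[1], omega = s * exp(opt$par[2]),
    alpha = opt$par[3], nu = exp(opt$par[4]))
}

set.seed(1)
sim <- rskewt(2000, xi = 0, omega = 0.01, alpha = -3, nu = 4)
fit <- fit_skewt(sim)
fit
```

With the IBM series read in as above, `fit_skewt(lret)` would return point estimates only; unlike a Stan fit, it gives no posterior distribution for the parameters.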

## Case study: The Challenger data

We proceed with a logistic regression analysis of the famous Challenger O-ring data, but from a Bayesian perspective.

The data is available here.

```r
alpha <- 0.1 # The chosen significance level for intervals
library(openxlsx)
alldata <- read.xlsx('challenger.xlsx', 'Challenger')  # read the 'Challenger' worksheet
n <- nrow(alldata)
```

The significance level chosen is 0.1. This means that results with p-values below 0.1 are treated as significant and intervals are reported at the 90% level; results with larger p-values could easily be due to chance variation alone.
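The excerpt stops before the Bayesian model itself. As a minimal sketch of what a Bayesian logistic regression involves, assuming one predictor (for example launch temperature), a binary outcome (an O-ring incident), and independent N(0, 10^2) priors, here is a random-walk Metropolis sampler in base R with equal-tailed credible intervals at the post's level of alpha = 0.1. The function names, prior, step sizes and burn-in are hypothetical choices, and the data below are simulated stand-ins, not the Challenger data.

```r
# Log-posterior: Bernoulli log-likelihood plus independent N(0, prior_sd^2) priors
log_post <- function(beta, x, y, prior_sd = 10) {
  eta <- beta[1] + beta[2] * x
  # log p(y = 1) = log plogis(eta), log p(y = 0) = log plogis(-eta), computed stably
  loglik <- sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
  loglik + sum(dnorm(beta, 0, prior_sd, log = TRUE))
}

# Random-walk Metropolis; x is centred so the intercept is the log-odds at the
# average x, which also improves mixing
metropolis_logit <- function(x, y, n_iter = 20000, step = c(0.3, 0.1)) {
  xc <- x - mean(x)
  draws <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("intercept", "slope")))
  current <- c(0, 0)
  lp_current <- log_post(current, xc, y)
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(2, 0, step)
    lp_proposal <- log_post(proposal, xc, y)
    if (log(runif(1)) < lp_proposal - lp_current) {   # Metropolis acceptance step
      current <- proposal
      lp_current <- lp_proposal
    }
    draws[i, ] <- current
  }
  draws
}

# Equal-tailed credible intervals at level 1 - alpha after discarding burn-in
credible_intervals <- function(draws, alpha = 0.1, burn_in = 5000) {
  kept <- draws[-seq_len(burn_in), , drop = FALSE]
  apply(kept, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
}

# Simulated stand-in data (NOT the Challenger data)
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))
draws <- metropolis_logit(x, y, step = c(0.2, 0.2))
ci <- credible_intervals(draws, alpha = 0.1)
ci
```

For the real data, the predictor and outcome columns of `alldata` would be passed in place of `x` and `y`; their names are not shown in the excerpt, and the step sizes would need tuning for the temperature scale.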