R

Increasing probability of success

Problem

You run a series of trials. Trials are independent of each other and of any other results. The probability of success on a trial starts at 2% and increases linearly (additively) by 3% after each failure. The probability of success resets to 2% after a success, but this is irrelevant here: you only run trials until you obtain a single success. Should you be interested in multiple successes, simply repeat the entire experiment.
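To make the setup concrete: the probability of success on trial k is p_k = min(0.02 + 0.03(k - 1), 1), so the first success occurs on trial k with probability p_k(1 - p_1)(1 - p_2)...(1 - p_{k-1}). A minimal R sketch of this distribution follows; the cap at 100% is an assumption on my part, since the post's excerpt does not say what happens once the probability would exceed 1.

```r
# Success probability on each trial, capped at 1 (the cap is an assumption)
p <- pmin(0.02 + 0.03 * (0:40), 1)
# P(first success on trial k) = p[k] * prod(1 - p[1:(k-1)])
pmf <- p * cumprod(c(1, 1 - p))[seq_along(p)]
sum(pmf)                   # sanity check: probabilities sum to 1
sum(seq_along(pmf) * pmf)  # expected number of trials to the first success
```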

The power of markdown for teaching, research, and consultation

Introduction

Why? Do you ever work through an analysis example in class with code and then never get around to typing it up nicely? Do you ever get students copy-pasting assignments from their classmates? Do you ever type up a piece of analysis by copy-pasting graph after graph or table after table, only to realise there’s a problem with the data and you have to redo everything? Do you ever need to include code from one or more languages and struggle to get the syntax highlighted nicely?

Assignment Memo Example

Currently marking student 2001234567

```r
# This code block (if run manually) generates an Excel file with different
# data for each student, according to the list of student numbers.
# In this case the data is for a regression with one significant
# explanatory variable and one irrelevant variable.
library(openxlsx)
students <- c(2001234567, 2012345678, 2000000123)
nStudents <- length(students)
n <- 100
datasets <- vector('list', nStudents)
for (i in 1:nStudents) {
  x1 <- rnorm(n, 4, 1)       # significant explanatory variable
  x2 <- rgamma(n, 4, 2)      # irrelevant variable
  y <- 20 + 2*x1 + rnorm(n)
  datasets[[i]] <- data.frame(y, x1, x2)  # assumed completion: the excerpt is truncated at 'data.'
}
```
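The excerpt cuts off inside the loop; presumably each data frame is then written out to its own workbook, as the opening comment describes. A hedged sketch of that step with openxlsx (the file-name pattern is illustrative, not the post's):

```r
# Sketch: write one workbook per student; the file-name pattern is an assumption
for (i in 1:nStudents) {
  write.xlsx(datasets[[i]], paste0('student_', students[i], '.xlsx'))
}
```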

Analysis of experiment by Talana Cronje, UFS, on behalf of Potatoes South Africa

Experiment description

Opportunistic potatoes are a pest that affects farms in many negative ways and needs to be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for this purpose. In the original experiment 8 measures were taken for various cultivars and doses, but we restrict ourselves to only two here.

Introduction and disclaimer

The data collected pertains to a specific set of conditions, and we should not try to extend the results beyond that setting without seriously considering and accounting for any systematic differences between that setting and any broader setting.

Skew t fit to IBM log returns

Introduction

First we read in the data. The data is available here.

```r
library(openxlsx)
sourcedata <- read.xlsx('IBM19891998.xlsx')
lret <- sourcedata$LogReturns
dates <- sourcedata$year + sourcedata$month/12 + sourcedata$day/365
```

Then we plot the data:

```r
library(viridisLite)
cols <- viridis(3)
par(mfrow = c(2, 1), mar = c(4, 4, 2, 0.2))
plot(dates, lret, type = 'l', col = cols[1], lwd = 1,
     xlab = 'Date', ylab = 'Log return', main = 'IBM')
hist(lret, 50, col = cols[2], density = 20,
     ylab = 'Frequency', xlab = 'Log return', main = 'IBM')
```

Stan

A modern way to fit statistical models such as this one is the Stan interface.
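The post itself fits the skew t in Stan; as a quick non-Bayesian cross-check, one can instead maximise the Azzalini skew-t likelihood directly in R. This is a sketch assuming the sn package, whose dst density is parameterised by location xi, scale omega, slant alpha, and degrees of freedom nu; it is not the post's Stan fit.

```r
# Sketch: ML fit of an Azzalini skew t to the log returns (assumes the 'sn' package)
library(sn)
negll <- function(par) {  # par = (xi, log omega, alpha, log nu)
  -sum(log(dst(lret, xi = par[1], omega = exp(par[2]),
               alpha = par[3], nu = exp(par[4]))))
}
fit <- optim(c(mean(lret), log(sd(lret)), 0, log(5)), negll)
fit$par  # location, log scale, slant, and log degrees of freedom at the optimum
```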

Case study: The Challenger data

Introduction

We proceed with a logistic regression analysis of the famous Challenger O-ring data, but from a Bayesian perspective.

Data and properties

The data is available here.

```r
alpha <- 0.1  # The chosen significance level for intervals
library(openxlsx)
alldata <- read.xlsx('challenger.xlsx', 'Challenger')
n <- nrow(alldata)
```

The significance level chosen is 0.1. This means that we only take seriously results where the p-value is less than 0.1, as other results could easily be just chance variation.
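As a point of departure, the classical (non-Bayesian) logistic regression is one line of R. The sketch below assumes the workbook has a temperature column and a 0/1 failure indicator; the column names Failure and Temperature are guesses, not the actual sheet's headers.

```r
# Frequentist baseline; column names 'Failure' and 'Temperature' are assumptions
fit <- glm(Failure ~ Temperature, family = binomial, data = alldata)
summary(fit)
confint(fit, level = 1 - alpha)  # 90% profile intervals, matching alpha above
```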