Problem
You run a series of trials. Trials are independent of each other and any other results.
The probability of success on a trial starts at 2% and increases linearly / additively by 3% after each failure.
The probability of success resets to 2% after a success; but this is irrelevant as you only run trials until you obtain a single success. Should you be interested in multiple successes then merely repeat the entire experiment exactly.
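The distribution of the number of trials needed for the first success can be computed exactly: trial k succeeds with probability p_k = 0.02 + 0.03(k - 1), and the first success occurs at trial k with probability p_k times the product of (1 - p_j) over the earlier trials. A minimal sketch, assuming the success probability is capped at 100% (the additive rule alone would give 101% on trial 34); the function names are illustrative, and a small Monte Carlo simulation is included as a cross-check.

```r
# Exact distribution of the trial on which the first success occurs.
# Assumption: the success probability is capped at 1.
trials_pmf <- function(start = 0.02, step = 0.03) {
  # First trial whose (uncapped) success probability reaches at least 1
  k_max <- ceiling((1 - start) / step) + 1
  k <- seq_len(k_max)
  p <- pmin(1, start + step * (k - 1))
  # Probability of failing every trial before trial k
  survive <- c(1, cumprod(1 - p)[-k_max])
  data.frame(k = k, p = p, prob = survive * p)
}

dist <- trials_pmf()
expected_trials <- sum(dist$k * dist$prob)

# Monte Carlo check: run trials until the first success
simulate_trials <- function(start = 0.02, step = 0.03) {
  k <- 1
  repeat {
    if (runif(1) < min(1, start + step * (k - 1))) return(k)
    k <- k + 1
  }
}

set.seed(1)
sims <- replicate(20000, simulate_trials())
print(c(exact = expected_trials, simulated = mean(sims)))
```

Passing different `start` and `step` values to both functions gives the corresponding answer for other escalation rules.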

Introduction
Why?
Do you ever work through an analysis example in class with code and then never get around to typing it up nicely?
Do you ever get students copy-pasting assignments from their classmates?
Do you ever type up a piece of analysis by copy-pasting graph after graph or table after table, only to realise there’s a problem with the data and you have to redo everything?
Do you ever need to include code from one or more languages and struggle to get the syntax highlighted nicely?

Currently marking student 2001234567
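# This code block (if run manually) generates an Excel file with different data for each student according to the list of student numbers
# In this case the data is for a regression with one significant explanatory variable and one irrelevant variable
library(openxlsx)
students <- c(2001234567, 2012345678, 2000000123)
nStudents <- length(students)
n <- 100
datasets <- vector('list', nStudents)
for (i in 1:nStudents) {
  x1 <- rnorm(n, 4, 1)    # relevant explanatory variable
  x2 <- rgamma(n, 4, 2)   # irrelevant explanatory variable (no effect on y)
  y <- 20 + 2*x1 + rnorm(n)
  datasets[[i]] <- data.frame(y = y, x1 = x1, x2 = x2)
}
# One worksheet per student, named by student number (the filename is hypothetical)
names(datasets) <- students
write.xlsx(datasets, 'studentdata.xlsx')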
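When marking, a natural check is to fit the regression on one student's data and confirm that x1 comes out clearly significant while x2 does not. The sketch below is an assumption, not the original marking script: instead of reading the Excel file it regenerates a dataset with the same recipe, seeding the generator with a student number purely so the example is reproducible.

```r
# Hypothetical stand-in: regenerate one student's data with the same recipe
# (same distributions and coefficients) rather than reading the Excel file.
set.seed(2001234567)
n <- 100
x1 <- rnorm(n, 4, 1)
x2 <- rgamma(n, 4, 2)
y <- 20 + 2 * x1 + rnorm(n)
studentdata <- data.frame(y = y, x1 = x1, x2 = x2)

# Regress y on both candidate explanatory variables
fit <- lm(y ~ x1 + x2, data = studentdata)
coefs <- summary(fit)$coefficients
print(coefs)

# x1 has a true slope of 2; x2 has no true effect on y
p_x1 <- coefs['x1', 'Pr(>|t|)']
p_x2 <- coefs['x2', 'Pr(>|t|)']
```

Because x2 has no true effect, its p-value varies from student to student and is only expected to exceed a conventional threshold most of the time.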

Experiment description
Opportunistic potatoes are a pest that affects farms in many negative ways and needs to be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for this purpose.
The original experiment recorded 8 measures for various cultivars and doses; here we restrict ourselves to two of them.
Introduction and disclaimer
The data collected pertains to a specific set of conditions, and we should not try to extend the results beyond that setting without seriously considering and accounting for any systematic differences between that setting and any broader setting.

Introduction
First we read in the data
The data is available here.
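library(openxlsx)
sourcedata <- read.xlsx('IBM19891998.xlsx')
lret <- sourcedata$LogReturns
# Decimal date: months and days are counted from 0 so that 1 January maps to the start of the year
dates <- sourcedata$year + (sourcedata$month - 1)/12 + (sourcedata$day - 1)/365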
Then we plot the data
library(viridisLite)
cols <- viridis(3)
# Two stacked panels: the time series on top, the histogram below
par(mfrow=c(2,1), mar=c(4,4,2,0.2))
plot(dates, lret, type='l', col=cols[1], lwd=1, xlab='Date', ylab='Log return', main='IBM')
hist(lret, breaks=50, col=cols[2], density=20, ylab='Frequency', xlab='Log return', main='IBM')
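The histogram shows the shape of the return distribution; a few sample moments put numbers on it, in particular whether the tails are heavier than a normal distribution's (positive excess kurtosis). A minimal sketch using simple moment estimators; the function name is illustrative, and because the IBM file is not reproduced here, the demonstration uses simulated heavy-tailed returns (a scaled Student t with 4 degrees of freedom) as a hypothetical stand-in. With the real data you would call `describe_returns(lret)`.

```r
# Sample moments of a numeric vector (simple moment estimators, no bias correction)
describe_returns <- function(x) {
  x <- x[is.finite(x)]
  m <- mean(x)
  s2 <- mean((x - m)^2)
  c(n = length(x),
    mean = m,
    sd = sd(x),
    skewness = mean((x - m)^3) / s2^1.5,
    excess_kurtosis = mean((x - m)^4) / s2^2 - 3)
}

# Hypothetical stand-in for the IBM log returns: heavy-tailed simulated data
set.seed(1)
lret_demo <- 0.015 * rt(2500, df = 4)
stats <- describe_returns(lret_demo)
print(round(stats, 4))
```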
Stan
A newer way to fit statistical models is Stan, a probabilistic programming language that can be used from R through interfaces such as rstan.

Introduction
We proceed with a logistic regression analysis of the famous Challenger O-ring data, but from a Bayesian perspective.
Data and properties
The data is available here.
alpha <- 0.1 # The chosen significance level for intervals
library(openxlsx)
alldata <- read.xlsx('challenger.xlsx','Challenger')
n <- nrow(alldata)
The significance level chosen is 0.1, so intervals are reported at the 90% (1 - alpha) level. Correspondingly, we only treat results as significant when the p-value is below 0.1; results with larger p-values are consistent with chance variation.
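Since `alpha` is used for intervals, an equal-tailed (1 - alpha) interval takes the alpha/2 and 1 - alpha/2 quantiles. A minimal sketch, assuming the interval is computed from posterior draws of a coefficient; the draws below are simulated placeholders with a hypothetical mean and spread, not results from the Challenger data.

```r
alpha <- 0.1
probs <- c(alpha / 2, 1 - alpha / 2)  # lower and upper tail probabilities

# Hypothetical stand-in for posterior draws of a regression coefficient
set.seed(42)
draws <- rnorm(4000, mean = -0.2, sd = 0.08)

# Equal-tailed 90% interval from the draws
interval <- quantile(draws, probs)
print(interval)
```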

© 2021 · Powered by the Academic theme for Hugo.