Currently marking student 2001234567
# This code block (if run manually) generates an Excel file with different data for each student according to the list of student numbers
# In this case the data is for a regression with one significant explanatory variable and one irrelevant variable
library(openxlsx)
students <- c(2001234567,2012345678,2000000123)
nStudents <- length(students)
n <- 100
datasets <- vector('list',nStudents)
for (i in seq_len(nStudents)) {
  x1 <- rnorm(n,4,1)          # relevant explanatory variable
  x2 <- rgamma(n,4,2)         # irrelevant variable: does not enter y
  y <- 20 + 2*x1 + rnorm(n)
datasets[[i]] <- data.
Experiment description
Opportunistic potatoes are a pest that harms farms in many ways and must be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for controlling them.
In the original experiment 8 measures were taken for various cultivars and doses, but we will restrict ourselves to only two now.
Introduction and disclaimer
The data collected pertains to a specific set of conditions, and we should not try to extend the results beyond that setting without seriously considering and accounting for any systematic differences between that setting and any broader setting.
Introduction
First we read in the data.
The data is available here.
library(openxlsx)
sourcedata <- read.xlsx('IBM19891998.xlsx')
lret <- sourcedata$LogReturns
# Approximate decimal year; (month - 1) keeps January at the start of its own year
dates <- sourcedata$year + (sourcedata$month - 1)/12 + sourcedata$day/365
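The decimal date built above is only an approximation, since months and years differ in length. A minimal sketch of a more exact conversion based on the day of the year follows. The helper `decimal_year` is a hypothetical name introduced here for illustration and is not part of the original analysis; it assumes the year, month and day columns hold valid calendar dates.

```r
# Hypothetical helper: convert year/month/day into a decimal year
# using the day of the year and the actual length of that year.
decimal_year <- function(year, month, day) {
  year <- as.integer(year)
  d <- as.Date(sprintf('%04d-%02d-%02d', year, as.integer(month), as.integer(day)))
  start <- as.Date(sprintf('%04d-01-01', year))
  days_in_year <- as.numeric(as.Date(sprintf('%04d-01-01', year + 1L)) - start)
  year + as.numeric(d - start) / days_in_year
}

# Illustrative dates only, not taken from the IBM file
example_dates <- decimal_year(c(1989, 1989, 1992), c(1, 12, 7), c(1, 31, 1))
print(example_dates)
```

If adopted, it could be called as `decimal_year(sourcedata$year, sourcedata$month, sourcedata$day)`, assuming those columns exist as used above.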
Then we plot the data.
library(viridisLite)
cols <- viridis(3)
par(mfrow=c(2,1), mar=c(4,4,2,0.2))
plot(dates, lret, type='l', col=cols[1], lwd=1, xlab='Date', ylab='Log return', main='IBM')
hist(lret, 50, col=cols[2], density = 20, ylab='Frequency', xlab='Log return', main='IBM')
Stan
Stan is a modern platform for fitting statistical models. We will attempt to use it from R, via the RStan interface, to implement our models.
Introduction
We proceed with a logistic regression analysis of the famous Challenger O-ring data, but from a Bayesian perspective.
Data and properties
The data is available here.
alpha <- 0.1 # The chosen significance level for intervals
library(openxlsx)
alldata <- read.xlsx('challenger.xlsx','Challenger')
n <- nrow(alldata)
The significance level chosen is 0.1, so intervals are reported at the 90% level. This means that we only treat a result as significant when its p-value is less than 0.1, as other results could easily be chance variation.
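To make the role of the chosen level concrete, here is a minimal sketch of how a level of 0.1 translates into a 90% interval computed from simulation draws. The draws below are simulated from a standard normal purely for illustration; they are an assumption standing in for posterior draws and are not results from the Challenger data.

```r
alpha <- 0.1   # same level as chosen above
set.seed(1)

# Simulated stand-in for posterior draws of a parameter (illustration only)
draws <- rnorm(10000, mean = 0, sd = 1)

# Equal-tailed 100*(1 - alpha)% interval: cut alpha/2 from each tail
interval <- quantile(draws, probs = c(alpha/2, 1 - alpha/2))

# Share of draws falling inside the interval, which should be close to 1 - alpha
coverage <- mean(draws >= interval[1] & draws <= interval[2])
print(interval)
print(coverage)
```

An equal-tailed interval is only one convention; a highest-density interval would be an alternative choice.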
Introduction and disclaimer
The data simulated here pertains to a specific set of assumptions, and we should not try to extend the results beyond that without seriously considering and accounting for any systematic differences between that situation and any broader situation.
The data analysed here is inherently random. If the study were repeated, the results would differ.
While the computer software used is tried and tested, the analysis involves multiple human elements.
Observed data
We begin by reading in data. We will use the historical FTSE stock exchange index from R's built-in EuStockMarkets data set for this example. We will try to reach stationarity by calculating log returns.
data(EuStockMarkets)
Index <- EuStockMarkets[,'FTSE']
n <- length(Index)
LogReturns <- diff(log(Index))
n1 <- length(LogReturns)
par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
plot(Index,col='#2A9FD6',lwd=2)
plot(LogReturns,col='#2A9FD6')
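The log return used above is the difference of consecutive log prices, which is the same as the log of the price ratio. A small sketch with made-up prices (an assumption for illustration, not FTSE values) shows the equivalence and how the price path can be rebuilt from the returns.

```r
# Made-up price series, for illustration only
prices <- c(100, 110, 99, 104.5)

# Log returns as computed in the analysis: difference of log prices
log_returns <- diff(log(prices))

# Equivalent form: log of the ratio of consecutive prices
ratio_form <- log(prices[-1] / prices[-length(prices)])

# Simple (percentage) returns recovered from the log returns
simple_returns <- exp(log_returns) - 1

# Log returns add up over time, so the price path can be rebuilt
rebuilt <- prices[1] * exp(cumsum(log_returns))

print(cbind(log_returns, ratio_form, simple_returns))
```

Note that differencing drops one observation, which is why the series of returns is one shorter than the index.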
Next we check whether any autocorrelation remains in the log returns.
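As a cross-check, the sample autocorrelations and partial autocorrelations can also be computed with base R's `acf` and `pacf`. This is a sketch under the assumption that a base-R calculation is an acceptable substitute; the sourced `autocor` helper used below is not shown in this document, so this is not a reproduction of it. The names `lr`, `max_lag` and `band` are introduced here for illustration.

```r
data(EuStockMarkets)
lr <- diff(log(EuStockMarkets[, 'FTSE']))   # log returns, recomputed so the sketch stands alone

max_lag <- 5   # illustrative choice of lags

# Sample autocorrelations for lags 1..max_lag (lag 0 is dropped)
sample_acf <- as.numeric(acf(lr, lag.max = max_lag, plot = FALSE)$acf)[-1]

# Sample partial autocorrelations for lags 1..max_lag
sample_pacf <- as.numeric(pacf(lr, lag.max = max_lag, plot = FALSE)$acf)

# Approximate 95% band under the hypothesis of no autocorrelation
band <- qnorm(0.975) / sqrt(length(lr))

print(cbind(Autocor = sample_acf, PartialAutocor = sample_pacf))
print(band)
```

Values well outside plus or minus `band` suggest autocorrelation at that lag; at lag 1 the autocorrelation and partial autocorrelation coincide by definition.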
source('autocor.r')
par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
autocor(LogReturns)
## Autocor Partial Autocor
## [1,] 0.0920293254 0.092029325
## [2,] -0.0080311473 -0.016641487
## [3,] 0.