R | SeanvdM

Assignment Memo Example

Currently marking student 2001234567

    # This code block (if run manually) generates an Excel file with different data for each student according to the list of student numbers
    # In this case the data is for a regression with one significant explanatory variable and one irrelevant variable
    library(openxlsx)
    students <- c(2001234567,2012345678,2000000123)
    nStudents <- length(students)
    n <- 100
    datasets <- vector('list',nStudents)
    for (i in 1:nStudents) {
      x1 <- rnorm(n,4,1)
      x2 <- rgamma(n,4,2)
      y <- 20 + 2*x1 + rnorm(n)
      datasets[[i]] <- data.

Analysis of experiment by Talana Cronje, UFS, on behalf of Potatoes South Africa

Experiment description

Opportunistic potatoes are a pest that affects farms in many negative ways and needs to be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for this purpose. In the original experiment eight measures were taken for various cultivars and doses, but we restrict ourselves to only two here.

Introduction and disclaimer

The data collected pertains to a specific set of conditions, and we should not try to extend the results beyond that setting without seriously considering and accounting for any systematic differences between that setting and any broader setting.

Skew t fit to IBM log returns

Introduction

First we read in the data

The data is available here.

    library(openxlsx)
    sourcedata <- read.xlsx('IBM19891998.xlsx')
    lret <- sourcedata$LogReturns
    dates <- sourcedata$year + sourcedata$month/12 + sourcedata$day/365

Then we plot the data

    library(viridisLite)
    cols <- viridis(3)
    par(mfrow=c(2,1), mar=c(4,4,2,0.2))
    plot(dates, lret, type='l', col=cols[1], lwd=1, xlab='Date', ylab='Log return', main='IBM')
    hist(lret, 50, col=cols[2], density = 20, ylab='Frequency', xlab='Log return', main='IBM')

Stan

A newer way to fit statistical models is through Stan. We will attempt to use it via the RStan interface to implement our models.

Case study: The Challenger data

Introduction

We proceed with a logistic regression analysis of the famous Challenger O-ring data, but from a Bayesian perspective.

Data and properties

The data is available here.

    alpha <- 0.1 # The chosen significance level for intervals
    library(openxlsx)
    alldata <- read.xlsx('challenger.xlsx','Challenger')
    n <- nrow(alldata)

The significance level chosen is 0.1. This means that we only look at results where the p-value is less than 0.1, as other results could easily just be chance variation.
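As a rough illustration of how a level such as `alpha` is typically used for intervals in a Bayesian setting, the sketch below computes an equal-tailed interval with coverage `1 - alpha` from a set of draws. The draws here are simulated stand-ins (the name `draws` and the normal distribution generating them are assumptions made purely for illustration); they are not the posterior of the Challenger model and carry no information about the real data.

```r
# Minimal sketch: equal-tailed interval at level 1 - alpha from simulated draws.
# ASSUMPTION: 'draws' is a hypothetical stand-in for posterior draws of a
# single coefficient; the values below are simulated, not fitted to any data.
set.seed(1)
alpha <- 0.1
draws <- rnorm(10000, mean = -0.2, sd = 0.1)

# Lower and upper limits cut off alpha/2 of the draws in each tail
interval <- quantile(draws, probs = c(alpha/2, 1 - alpha/2))

# Proportion of draws inside the interval (should be close to 1 - alpha)
coverage <- mean(draws >= interval[1] & draws <= interval[2])

print(interval)
print(coverage)
```

With real posterior draws, an interval of this kind that excludes zero would be one way to judge whether a coefficient is distinguishable from chance variation at the chosen level.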

Simulation attempt for Andréhette Verster

Introduction and disclaimer

The data simulated here pertains to a specific set of assumptions, and we should not try to extend the results beyond that without seriously considering and accounting for any systematic differences between that situation and any broader situation. The data analysed here is inherently random. If the study were to be repeated then the results would differ. While the computer software used is tried and tested, the analysis involves multiple human elements.

Time Series Prediction via simulated paths

Observed data

We begin by reading in data. We will use the old FTSE stock exchange index for this example. We will try to reach stationarity by calculating log returns.

    data(EuStockMarkets)
    Index <- EuStockMarkets[,'FTSE']
    n <- length(Index)
    LogReturns <- diff(log(Index))
    n1 <- length(LogReturns)
    par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
    plot(Index,col='#2A9FD6',lwd=2)
    plot(LogReturns,col='#2A9FD6')

Next we consider any residual correlation.

    source('autocor.r')
    par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
    autocor(LogReturns)

    ##          Autocor Partial Autocor
    ## [1,]  0.0920293254  0.092029325
    ## [2,] -0.0080311473 -0.016641487
    ## [3,] 0.
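The `autocor.r` script sourced above is not shown here, so its contents are unknown. For readers without it, a similar table can probably be approximated with the base R functions `acf()` and `pacf()`. The sketch below is a substitute under that assumption, not the author's function; `maxLag` and `result` are names introduced only for this illustration.

```r
# Minimal sketch using base R (stats) in place of the unseen autocor() helper.
# ASSUMPTION: acf()/pacf() are a reasonable stand-in; the original script may
# use a different estimator or a different number of lags.
data(EuStockMarkets)
LogReturns <- diff(log(EuStockMarkets[, 'FTSE']))

maxLag <- 5  # hypothetical choice of how many lags to display

# acf() reports lag 0 first (always 1), so drop it to align with pacf()
ac  <- as.numeric(acf(LogReturns, lag.max = maxLag, plot = FALSE)$acf)[-1]
pac <- as.numeric(pacf(LogReturns, lag.max = maxLag, plot = FALSE)$acf)

result <- cbind(Autocor = ac, PartialAutocor = pac)
print(result)

# Approximate 95% bound for a single lag under the white-noise hypothesis
bound <- qnorm(0.975) / sqrt(length(LogReturns))
print(bound)
```

Values inside `-bound` to `bound` are roughly what one would expect from uncorrelated returns, so only lags falling outside that band suggest correlation worth modelling. The first row of `result` can be compared with the first row of the output above as a sanity check.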