Observed data
We begin by reading in the data. For this example we use historical values of the FTSE stock exchange index from R's built-in EuStockMarkets data set. We try to reach stationarity by calculating log returns.
data(EuStockMarkets)
Index <- EuStockMarkets[,'FTSE']
n <- length(Index)
LogReturns <- diff(log(Index))
n1 <- length(LogReturns)
par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
plot(Index,col='#2A9FD6',lwd=2)
plot(LogReturns,col='#2A9FD6')
Next we check whether any correlation remains in the log returns.
source('autocor.r')
par(mfrow=c(1,2),bg=rgb(0,0,0),fg=rgb(1,1,1),col.axis='white',col.main='white',col.lab='white',mar=c(4,4,0.6,0.2))
autocor(LogReturns)
## Autocor Partial Autocor
## [1,] 0.0920293254 0.092029325
## [2,] -0.0080311473 -0.016641487
## [3,] 0.
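The autocor() function above comes from the file autocor.r, which is not shown here. As a rough stand-in, and assuming that all we need is the sample autocorrelations and partial autocorrelations, base R's acf() and pacf() can produce a similar table. The names max_lag and cor_table are made up for this sketch, and the numbers are not guaranteed to match the helper's output exactly.

```r
# Sketch only: autocor() is the author's helper and is not reproduced here.
# Base R's acf() and pacf() give sample (partial) autocorrelations directly.
data(EuStockMarkets)
LogReturns <- diff(log(EuStockMarkets[, 'FTSE']))

max_lag <- 10  # illustrative choice, not taken from the original post

sample_acf  <- acf(LogReturns, lag.max = max_lag, plot = FALSE)
sample_pacf <- pacf(LogReturns, lag.max = max_lag, plot = FALSE)

# acf() reports lag 0 first (always 1), so drop it to line up with pacf()
cor_table <- cbind(as.numeric(sample_acf$acf)[-1],
                   as.numeric(sample_pacf$acf))
colnames(cor_table) <- c('Autocor', 'Partial Autocor')
print(head(cor_table, 3))
```

At lag 1 the autocorrelation and the partial autocorrelation are the same number by definition, which is a quick sanity check on any table like this.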
Why do we care?
Understanding and predicting time series can help us make better decisions. Better decisions can lead to more profit, fewer losses, less waste of natural resources, and other benefits.
How can we predict the unpredictable?
By breaking down the problem into parts we know how to deal with.
We know how to deal with independent and identically distributed values:
Draw a histogram and get summary statistics.
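As a minimal sketch of that last step, here is one way to do it in base R. The data are simulated standard normal values, which is purely an assumption for illustration; any vector of roughly i.i.d. values could take their place.

```r
# Assumption: simulated values stand in for real i.i.d. observations
set.seed(1)
x <- rnorm(500)

# Histogram of the values
hist(x, breaks = 30, col = '#2A9FD6', main = '', xlab = 'Value')

# A few summary statistics: centre, spread and some quantiles
stats <- c(mean = mean(x), sd = sd(x), quantile(x, c(0.05, 0.5, 0.95)))
print(round(stats, 3))
```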
What are we doing and why?
We are going to draw a single graph in R with several lines on it. This is worth a look because the way base R builds up a graph can seem strange to people who are used to other packages, so some explanation is useful.
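To give a feel for the base R approach before the real example, here is a small sketch on made-up curves (the data and colours are assumptions for illustration only). The first call to plot() opens the graph and fixes the axes; every further line is then added on top with lines().

```r
# Made-up data for illustration
x  <- 1:50
y1 <- sin(x / 5)
y2 <- cos(x / 5)
y3 <- 0.5 * sin(x / 3)

# plot() fixes the axes, so give it a range that fits every line
y_range <- range(y1, y2, y3)
plot(x, y1, type = 'l', col = '#2A9FD6', lwd = 2,
     ylim = y_range, xlab = 'x', ylab = 'y')

# lines() adds to the existing graph instead of starting a new one
lines(x, y2, col = 'orange', lwd = 2)
lines(x, y3, col = 'darkgreen', lwd = 2)

legend('topright', legend = c('y1', 'y2', 'y3'),
       col = c('#2A9FD6', 'orange', 'darkgreen'), lwd = 2, bty = 'n')
```

Setting ylim in the first call matters: base R does not rescale the axes when lines are added later, so anything outside the initial range is silently cut off.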
Specific example
In this example we are going to use the lengths of the 25 most popular movies of each year from 1931 to 2013, as explained here by Randy Olson.
What are we doing and why?
We are going to look forward. We want to predict the next thing. Companies want to know what’s going to happen with the next client, the next month’s profits, the next deal, etc.
We are thus very much in the data science paradigm. However, a lot of data scientists are narrow in their thinking: they just want to predict the values as closely as they can.
Step 1: How long must the time series be?
Begin by deciding on a reasonable length for your time series, based on the problem at hand.
Then add a burn-in period: an extra stretch that we add at the start of the time series while simulating and throw away afterwards. When you simulate a time series, the first values depend on arbitrary starting values rather than on your chosen model, so they must be discarded.
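A minimal sketch of the idea, assuming an AR(1) model with coefficient 0.8 and a burn-in of 100 values. Both numbers, and the names n_keep, n_burn and phi, are illustrative choices rather than recommendations.

```r
set.seed(42)

n_keep <- 200   # length we actually want (illustrative)
n_burn <- 100   # burn-in length (illustrative)
phi    <- 0.8   # assumed AR(1) coefficient

n_total <- n_keep + n_burn
e <- rnorm(n_total)        # white noise innovations
x <- numeric(n_total)
x[1] <- 0                  # arbitrary starting value

# Simulate the full series, burn-in included
for (t in 2:n_total) {
  x[t] <- phi * x[t - 1] + e[t]
}

# Throw away the burn-in and keep only the last n_keep values
sim <- x[(n_burn + 1):n_total]
plot(sim, type = 'l', col = '#2A9FD6', xlab = 'Time', ylab = 'Simulated value')
```

For ARIMA models, arima.sim() handles this internally through its n.start argument, but writing the loop out makes it clear what the burn-in does.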