In this post, we examine how useful Facebook Prophet is for forecasting stock market indices, using the Dow Jones Industrial Average (DJIA) as an example.
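library(xts)       #Extensible time series
library(quantmod)  #chartSeries()
library(prophet)   #Prophet forecasting model
library(ggplot2)   #Plotting
library(forecast)  #accuracy()
library(lubridate) #Date parsing, e.g. dmy()
setwd("F:/My Files/R Studio/DOW S&P500") #Adjust to the folder that holds the CSV file
dow_data=read.csv('dow_data_v3.csv',header = TRUE,sep = ',')
head(dow_data)
tail(dow_data)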
Standardize the dates and prepare the data for conversion to xts time-series format, keeping only observations from 2008 onward.
dow_data$Date=dmy(dow_data$X) #Standardize the day-month-year date strings using the lubridate package
df=data.frame(dow_data$DJI) #Keep the DJI column
rownames(df)=dow_data$Date #Dates become row names, which as.xts() uses as the time index
colnames(df)=c('DJI')
df_xts=as.xts(df) #Convert to xts format
df_xts=df_xts['2008/'] #Extract data from 2008 onward
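The date handling above relies on lubridate's dmy() and on xts's ISO-8601 subsetting, where '2008/' means "from 1 January 2008 onward". As a rough base-R equivalent, assuming the X column holds day-month-year strings with dashes (dmy() is more forgiving about separators), the two steps look like this. The dates and values are made up for illustration.

```r
# Made-up day-month-year strings and values, standing in for the CSV columns X and DJI
raw <- data.frame(X = c("28-12-2007", "31-12-2007", "02-01-2008", "03-01-2008"),
                  DJI = c(100, 101, 102, 103))

# Parse the dates; assumes a "%d-%m-%Y" layout
raw$Date <- as.Date(raw$X, format = "%d-%m-%Y")

# Keep observations from 1 January 2008 onward, as df_xts['2008/'] does
from_2008 <- raw[raw$Date >= as.Date("2008-01-01"), c("Date", "DJI")]
print(from_2008)
```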
The chart below shows an upward, multiplicative (exponential) trend. Because Prophet fits an additive regression model, we log-transform the series so that the trend becomes linear and multiplicative effects become additive.
chartSeries(df_xts,
theme=chartTheme('white'))
#Build the two-column data frame Prophet expects: ds (date) and y (log of the index)
df=data.frame(df_xts)
df=data.frame(ds=as.Date(rownames(df)),y=log(df$DJI))
ggplot(df,aes(ds,y))+
geom_point()+
ggtitle('Dow Jones Industrial Average in log scale')
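To see why the log transform helps, consider a hypothetical series that grows at a constant rate, so that it is exactly exponential. A straight line fitted to the raw values leaves large, systematic residuals, while a straight line fitted to the log values fits exactly, and its slope recovers the growth rate. The growth rate and starting level below are invented for illustration.

```r
# Hypothetical series: constant growth of 0.14% per trading day over 1,000 days
t <- 1:1000
level <- 10000 * exp(0.0014 * t)

fit_raw <- lm(level ~ t)       # linear trend on the raw scale
fit_log <- lm(log(level) ~ t)  # linear trend on the log scale

max(abs(residuals(fit_raw)))   # large: a straight line cannot follow the curve
max(abs(residuals(fit_log)))   # essentially zero: log(level) is linear in t
coef(fit_log)[["t"]]           # recovers the daily growth rate
```

The same identity turns multiplicative seasonality into additive seasonality: log(trend × season) = log(trend) + log(season).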
We hold out the last 252 trading days (about one calendar year) as a validation set and fit the model on the remaining data.
training_length=length(df$y)-252
test_start=training_length+1
df_training=df[1:training_length,]
df_test=df[test_start:length(df$y),]
m=prophet()
m=fit.prophet(m,df_training)
#Predict on the actual trading dates so that forecast rows line up with df (training + validation)
future=data.frame(ds=df$ds)
forecast=predict(m,future)
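A detail worth checking when building the prediction frame: by default, prophet's make_future_dataframe() extends the history by consecutive calendar days, while the index only trades on weekdays (and not on exchange holidays). The base-R sketch below, using a hypothetical last training date, shows that 252 calendar days contain only 180 weekdays, so a 252-row calendar-day horizon covers roughly eight months rather than a trading year, and its rows do not line up with the 252 validation days. Passing the actual trading dates to predict(), for example predict(m, data.frame(ds = df$ds)), keeps the forecast aligned with the data.

```r
# Hypothetical last training date (a Friday); not taken from the DJIA data
last_train <- as.Date("2019-06-28")

# What a 252-period daily horizon looks like: consecutive calendar days
calendar_days <- seq(last_train + 1, by = "day", length.out = 252)

# POSIXlt weekday numbers: 0 = Sunday, 6 = Saturday (locale-independent)
wday <- as.POSIXlt(calendar_days)$wday
is_weekday <- !(wday %in% c(0, 6))

sum(is_weekday)                              # 180 weekdays, not 252
as.numeric(max(calendar_days) - last_train)  # horizon ends 252 calendar days later
```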
The component plots show a long-term upward trend and a yearly seasonal pattern: the index tends to peak around May and August and to bottom out around March and November. No meaningful weekly pattern appears.
prophet_plot_components(m,forecast)
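Prophet represents yearly seasonality as a truncated Fourier series, a sum of sine and cosine terms with a period of 365.25 days (the Prophet documentation gives a default yearly Fourier order of 10). The sketch below builds such features by hand and fits them with lm() to a made-up seasonal pattern. It illustrates the idea only; it is not Prophet's internal code, and the helper fourier_features() is our own.

```r
# Hypothetical helper (not part of prophet): Fourier features for a given period
fourier_features <- function(t, period, order) {
  cols <- lapply(seq_len(order), function(n) {
    cbind(cos(2 * pi * n * t / period), sin(2 * pi * n * t / period))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("cos", "sin"), order), rep(seq_len(order), each = 2))
  X
}

t <- 0:729  # two years of daily time stamps, in days
# Made-up seasonal pattern built from the first two harmonics
season <- 0.02 * sin(2 * pi * t / 365.25) + 0.01 * cos(2 * pi * 2 * t / 365.25)

X <- fourier_features(t, period = 365.25, order = 3)
fit <- lm(season ~ X)

round(coef(fit), 4)  # Xsin1 is about 0.02, Xcos2 about 0.01, the rest about 0
```

Raising the order lets the seasonal curve bend more sharply, at the cost of more parameters to estimate.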
The scatter plot of predicted against actual values (both in log scale) shows a roughly linear, upward relationship, with an R-squared of 0.9825: 98.25% of the variation in the predicted values is explained by variation in the actual values. However, R-squared measures goodness of fit, not forecast accuracy; a forecast that is consistently too high or too low can still have an R-squared close to 1.
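pred=forecast$yhat #Predicted values (log scale), training and validation periods
actuals=df$y #Actual values (log scale)
plot(actuals,pred)
abline(lm(pred~actuals),col='red',lwd=2) #Fitted regression line
summary(lm(pred~actuals)) #Reports the R-squared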
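The caveat about R-squared is easy to demonstrate. In the made-up example below, every prediction is 0.1 too high in log units (roughly a 10.5% overshoot), yet the squared correlation between predictions and actuals, which equals the R-squared of the simple regression of predictions on actuals, is 1. An error measure such as RMSE exposes the bias.

```r
# Made-up actual values on the log scale (not DJIA data)
actual <- c(9.90, 9.95, 10.02, 10.10, 10.05, 10.20)
biased <- actual + 0.1  # every prediction is 0.1 log units too high

r_squared <- cor(biased, actual)^2      # equals R-squared of lm(biased ~ actual)
rmse <- sqrt(mean((actual - biased)^2)) # root mean squared error

r_squared  # 1: the predictions move perfectly with the actuals
rmse       # 0.1: yet every single prediction is off
```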
#Create data frame of residuals (actual minus predicted, log scale)
residuals_m=df$y-forecast$yhat
df_residuals=data.frame(ds=df$ds,residuals=residuals_m)
#Forecast plot
plot(m,forecast)
#Residual plot; the blue line marks the start of the validation period
ggplot(df_residuals,aes(ds,residuals))+
geom_point()+
ggtitle('Plot of residuals in log scale')+
geom_vline(xintercept = as.numeric(df_test$ds[1]),
color = "blue")
The accuracy metrics deteriorate on the validation set compared with the training set, which suggests overfitting. One likely reason is the high volatility of the stock index: the variance of the residuals is not constant, which makes a model fitted under a least-squares (constant-variance) error assumption less accurate. Facebook Prophet is therefore of limited use for predicting a stock index during periods of high volatility, and the accuracy of its forecasts deteriorates as we project further into the future (medium to long term).
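One simple way to check whether the residual variance is constant is an F-test comparing the residual variances of two periods, available in base R as var.test(). The sketch below applies it to simulated residuals whose standard deviation triples in the second half; with the real model, one could pass, for example, the training-period and validation-period residuals from df_residuals instead. The simulated numbers are illustrative only, and the F-test assumes approximately normal residuals.

```r
set.seed(1)
# Simulated residuals: a calm period (sd = 0.01) followed by a volatile one (sd = 0.03)
calm     <- rnorm(250, mean = 0, sd = 0.01)
volatile <- rnorm(250, mean = 0, sd = 0.03)

# F-test of equal variances
vt <- var.test(calm, volatile)
vt$estimate  # ratio of sample variances, far below 1
vt$p.value   # tiny p-value: reject equal variances
```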
#Performance metrics for training set
predicted_train=forecast$yhat[1:(length(forecast$yhat)-252)]
actuals_train=df_training$y
round(accuracy(predicted_train,actuals_train),2)
#Performance metrics for test/validation set
predicted_v=forecast$yhat[(length(forecast$yhat)-252+1):length(forecast$yhat)]
actuals_v=df_test$y
round(accuracy(predicted_v,actuals_v),2)
#Validation metrics in the original (index-point) scale
round(accuracy(exp(predicted_v),exp(actuals_v)),2)
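For reference, the ME, RMSE, MAE, MPE and MAPE columns that forecast::accuracy() reports for a plain vector of forecasts can be recomputed by hand. The sketch below follows the definitions as we understand them from the forecast package documentation, with error = actual minus forecast and percentage errors taken relative to the actual values; the helper name basic_accuracy() and the numbers are hypothetical.

```r
# Hypothetical helper mirroring the ME, RMSE, MAE, MPE and MAPE columns of accuracy()
basic_accuracy <- function(forecast_values, actual_values) {
  error <- actual_values - forecast_values  # positive error = under-forecast
  pe <- 100 * error / actual_values         # percentage error
  c(ME   = mean(error),
    RMSE = sqrt(mean(error^2)),
    MAE  = mean(abs(error)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)))
}

# Made-up index levels and forecasts
actual_values   <- c(100, 110, 120, 130)
forecast_values <- c(102, 108, 125, 126)
metrics <- basic_accuracy(forecast_values, actual_values)
round(metrics, 2)
```

Percentage-based measures such as MAPE are unit-free, which makes them easier to compare between the log-scale and index-scale results than RMSE or MAE.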
#Back-transform actual and predicted values to the original index scale
df$dowjones=exp(df$y)
df$predicted=exp(forecast$yhat)
df_residuals$Residuals=df$dowjones-df$predicted
ggplot(df, aes(ds)) +
geom_line(aes(y = dowjones, color='dow')) +
geom_line(aes(y = predicted, color='predicted'))+
geom_vline(xintercept = as.numeric(df_test$ds[1]),
color = "blue")
ggplot(df_residuals,aes(ds,Residuals))+
geom_point()+
ggtitle('Plot of residuals')+
geom_vline(xintercept = as.numeric(df_test$ds[1]),
color = "blue")
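Because the model is fitted on log values, a residual r in log scale corresponds to a relative error of exp(r) - 1, which is roughly r itself when r is small. The same relative error therefore translates into a larger error in index points when the index level is higher, which is one reason residuals in the original scale tend to fan out as the index rises. The levels below are hypothetical.

```r
log_residual <- 0.02                     # actual is 0.02 log units above the prediction
relative_error <- exp(log_residual) - 1  # about 0.0202, i.e. roughly 2%

# The same log residual at two hypothetical index levels
predicted_levels <- c(10000, 30000)
actual_levels <- predicted_levels * exp(log_residual)
point_errors <- actual_levels - predicted_levels
point_errors                             # the error in points triples with the level
```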