Stock Market Analysis

Walk-through of a stock market time-series analysis project

Justin Giovatto
Nov 12, 2021

Overview

This article walks through building a time series model to predict cumulative stock returns over the next six-month period. The stocks come from three market sectors and are chosen as the highest-volume traded stocks in each sector. The analysis follows the Cross-Industry Standard Process for Data Mining (CRISP-DM). We first analyze the stock datasets to gain an understanding of the data. Next, the data is cleaned and processed in preparation for the time series modeling phase. Two different model types are then built on the data. Finally, the models are evaluated using forecasted returns and risk as the main evaluation metrics.

Business Objective/Methods

Hypothetical situation: A stock market investor is interested in a portfolio consisting of three technology stocks, three healthcare stocks, and one cryptocurrency for the next six-month period. The investor would also like to minimize the portfolio's risk as much as possible. With this in mind, we will analyze the top ten tech and healthcare stocks by volume traded, as well as the top five cryptocurrencies by volume traded. Each asset must also have at least five years' worth of trading history.

The stocks will then be analyzed according to past returns dating back to January 2016. To assess potential risk, each stock's volatility over that period will also be taken into account, along with price-to-earnings (P/E) ratios for the tech and healthcare stocks. A SARIMA model and a Facebook Prophet model will then be fit to each stock to predict returns over the next six-month period. The model results will be compared, and portfolio recommendations will be made based on the analysis.

Data Description

The stock data used for this project comes from the Yahoo Finance API. In the technology and healthcare sectors, the top 10 stocks by volume traded with at least five years of trading data will be analyzed, along with the top five cryptocurrencies by volume traded that meet the same five-year requirement. The start date is January 1st, 2016 and the end date is June 1st, 2021. The stocks will be split into individual data frames and also combined into three data frames, one per sector. The data is daily and provides the following features: “High”, “Low”, “Open”, “Close”, “Volume”, and “Adj Close”. Additional columns for “Stock” (name), “Returns”, and “Cumulative Returns” will be added for analysis.
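The added columns could be derived along the following lines. This is a minimal sketch, not the project's actual code: the helper name `add_return_columns` and the small price series are made up for illustration, and it assumes returns are simple daily percentage changes of “Adj Close”.

```python
import pandas as pd


def add_return_columns(prices: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a copy with 'Stock', 'Returns' and 'Cumulative Returns' columns.

    Hypothetical helper: assumes simple daily returns on 'Adj Close'.
    """
    out = prices.copy()
    out["Stock"] = name
    # Daily simple return from the adjusted close (first row is NaN)
    out["Returns"] = out["Adj Close"].pct_change()
    # Growth of one unit invested on the first day, minus the initial unit
    out["Cumulative Returns"] = (1 + out["Returns"].fillna(0)).cumprod() - 1
    return out


# Made-up prices standing in for a Yahoo Finance download
demo = pd.DataFrame(
    {"Adj Close": [100.0, 110.0, 99.0, 121.0]},
    index=pd.date_range("2016-01-04", periods=4, freq="D"),
)
result = add_return_columns(demo, "DEMO")
print(result[["Stock", "Returns", "Cumulative Returns"]])
```

Compounding the daily returns, rather than summing them, keeps the cumulative figure equal to the overall price change from the first day.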

Exploratory Analysis

We will check the autocorrelation, decomposition, and stationarity of the top stock in each sector to get a better feel for the data. We will also analyze the past five-year returns of the top stocks, the volatility of the top stocks in each sector, and the P/E ratios of the tech and healthcare stocks to gather further information for the portfolio recommendations. Stock volatility is calculated by taking the standard deviation of the annual log returns of the stock's close price. This is represented in the following sector comparison graphs.
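One common way to compute such a volatility figure is sketched below. It is an assumption about the details, not the article's exact calculation: it takes daily log returns of the close price and annualizes their standard deviation with a hypothetical `periods_per_year` of 252 trading days (a cryptocurrency that trades every day might use 365 instead).

```python
import numpy as np
import pandas as pd


def annualized_volatility(close: pd.Series, periods_per_year: int = 252) -> float:
    """Standard deviation of daily log returns, scaled to a yearly figure.

    Hypothetical helper; the 252-day scaling is an assumed convention.
    """
    log_returns = np.log(close / close.shift(1)).dropna()
    return float(log_returns.std() * np.sqrt(periods_per_year))


# Made-up series: one grows at a constant rate, one swings up and down
steady = pd.Series([100.0 * 1.01 ** i for i in range(30)])
choppy = pd.Series([100.0, 105.0, 98.0, 107.0, 95.0, 110.0])

vol_steady = annualized_volatility(steady)
vol_choppy = annualized_volatility(choppy)
print(f"steady: {vol_steady:.4f}  choppy: {vol_choppy:.4f}")
```

A series that grows at a constant rate has essentially zero volatility under this definition, which is why the measure captures variability rather than direction.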

Modeling

We will first run a SARIMA model on the data. SARIMA was chosen because it accounts for the seasonal component of the series as well as the trend and autoregressive components. We will also run a Facebook Prophet model on each stock; like SARIMA, it models trend and seasonality. The models will be compared by root mean squared error (RMSE) on the testing data, and the model type with the lowest overall RMSE will be chosen as our final evaluation model.

Results/Final Analysis

After analyzing the two models, we find that the SARIMA models tend to have a lower RMSE overall than the Prophet models. For this reason, we base our recommendations on the SARIMA results. To take portfolio risk into account, we add another evaluation metric: the minimum value of the lower bound of each SARIMA model's 95% confidence interval. Because the model expects returns to fall within this interval about 95% of the time, the minimum of the lower bound is essentially the lowest return the model predicts at that confidence level. Our recommendations are based mainly on this metric together with the models' forecasted cumulative returns.
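The minimum-lower-bound metric can be illustrated with a short sketch. The tickers `AAA` and `BBB` and their interval values are invented, and `min_lower_bound` is a hypothetical helper. It assumes each interval table has the lower bound in its first column, which is the layout returned by `conf_int(alpha=0.05)` on a statsmodels forecast.

```python
import pandas as pd


def min_lower_bound(conf_int: pd.DataFrame) -> float:
    """Smallest value in the first (lower-bound) column of an interval table."""
    return float(conf_int.iloc[:, 0].min())


# Invented six-month-style intervals for two made-up tickers
intervals = {
    "AAA": pd.DataFrame(
        {"lower": [0.10, 0.05, 0.02], "upper": [0.30, 0.45, 0.60]}
    ),
    "BBB": pd.DataFrame(
        {"lower": [0.08, -0.20, -0.35], "upper": [0.50, 0.90, 1.40]}
    ),
}

# Rank tickers from least to most downside according to the metric
ranking = pd.Series(
    {name: min_lower_bound(ci) for name, ci in intervals.items()}
).sort_values(ascending=False)
print(ranking)
```

In this toy data `BBB` has the larger upside but also the deeper worst case, so a risk-averse ranking on this metric puts `AAA` first.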

Portfolio Recommendations

Looking at the final comparison graphs, our two main tech sector recommendations are Nvidia and Tesla, due to their high SARIMA forecasts and high minimum lower confidence interval predictions. Although both have high P/E ratios, as seen in the analysis below, they have the highest forecasts on the main evaluation metrics. The third tech recommendation is TSMC, which has the third-highest SARIMA six-month forecast and a low P/E ratio compared to the other stocks in the sector.

For the healthcare sector, I would recommend BIO, HCA, and UNH, as these stocks rank highest in the sector by SARIMA forecast and minimum lower confidence interval, as seen in the above graph. These stocks all have low P/E ratios, as seen in the analysis below.

Finally, for the crypto recommendation: although DOGE has the highest SARIMA forecast, it also has a significantly lower minimum lower confidence interval than ETH, as observed in the above graph. Because avoiding risk is a major factor in choosing this portfolio, I would recommend ETH over DOGE.

Future Work

I would like to add another model type, such as an RNN, for further model comparisons.

I would also like to analyze more stocks in each sector in order to find potentially better portfolio recommendations.

Finally, I would like to look into more stock evaluation metrics in order to improve the recommendations further.
