Learn how to quantify correlations between stocks with Python and build portfolios using this information.
This article will cover:
- A brief background on modern portfolio theory and correlation;
- Connecting to IEX Cloud to obtain historical stock price data;
- Creating stock price charts;
- Determining correlations between stock pairs; and
- Constructing portfolios using stock correlations.
To follow along, some familiarity with Python is required. We will be using Python 3 with these packages: numpy and pandas (for data manipulation), plotly (for visualization), and scipy (for data analysis).
Install each package (in your virtual environment) with a simple pip install [PACKAGE_NAME] and you should be ready to go.
The entire code is available here for you to download, or edit as you please.
We often find unpredictability exciting. Chance can lead us to great things: an unexpected job opportunity, adventures, or a lifelong passion. In investments, however, such unpredictability is referred to as risk. And as the name suggests, this is generally seen as undesirable.
Modern Portfolio Theory, or MPT, is one framework for managing this risk. MPT argues that an investor can construct a portfolio that maximizes expected return for a given level of risk by selecting assets based on their correlations with one another.
Now let’s step back for a moment and talk about correlation.
Correlation describes how strongly two properties or variables are related. When they tend to rise or fall together, like real estate prices and the cost of living, they are said to be positively correlated. If they tend to move in opposite directions, they are said to be negatively correlated, and those whose movements are unrelated are said to be uncorrelated.
Examples of data sets and their x-y correlation (image: Wikipedia)
You can see how this might apply to investment portfolios. When a portfolio holds negatively correlated assets, a fluctuation in one asset may be cancelled out or reduced by another asset’s movement in the opposite direction.
Take a look at this example, where the average of the two stocks fluctuates much less than either stock does:
A portfolio with negatively correlated stocks can even out individual fluctuations
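The reason the averaged line is calmer follows from a standard identity for an equally weighted pair: Var((A + B) / 2) = (Var(A) + Var(B) + 2 Cov(A, B)) / 4. A negative covariance therefore pulls the portfolio’s variance down. The sketch below checks this on two synthetic price series invented for illustration. They are not market data.

```python
import numpy as np

# Two synthetic daily price series (illustrative only, not real market data)
t = np.arange(60)
stock_a = 100 + 5 * np.sin(t / 5)                        # oscillates around 100
stock_b = 100 - 4 * np.sin(t / 5) + 0.5 * np.cos(t / 2)  # mostly mirrors stock_a
portfolio = (stock_a + stock_b) / 2                      # equally weighted average

corr_ab = np.corrcoef(stock_a, stock_b)[0, 1]
var_a, var_b, var_p = np.var(stock_a), np.var(stock_b), np.var(portfolio)

# Var((A + B) / 2) = (Var A + Var B + 2 Cov(A, B)) / 4, using population (ddof=0) moments
cov_ab = np.cov(stock_a, stock_b, bias=True)[0, 1]
predicted_var_p = (var_a + var_b + 2 * cov_ab) / 4

print(f"correlation: {corr_ab:.3f}")
print(f"variance A: {var_a:.2f}, B: {var_b:.2f}, portfolio: {var_p:.2f}")
```

Because the two series are strongly negatively correlated, the covariance term nearly cancels the individual variances.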
Where negatively correlated assets are undesirable or not available, collecting uncorrelated assets may also protect a portfolio from fluctuations by reducing the probability of asset prices moving together.
Some investors seek to diversify their portfolios for these reasons. Their theory is that by collecting assets in different classes (e.g. bonds, real estate, or stocks) or in different sectors (e.g. healthcare vs. materials), a diversified portfolio will end up holding negatively correlated or uncorrelated assets. Now, let’s construct some hypothetical demonstrations of this theory with actual market data from IEX Cloud.
Connecting to IEX Cloud to obtain historical price data
Getting data from IEX Cloud is straightforward. (If you are new to IEX Cloud, sign up for a free account here.)
Just get your API token, which tells IEX Cloud who you are. Then build a request for the endpoint of your choosing, following the syntax in the API documentation. In this example, we’re using the Historical Prices endpoint.
Here’s how I put it together as a function:
To get the data for, say, Microsoft for the last three months, we could run:
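A minimal sketch of such a function, using only the Python standard library, might look like the one below. It rests on several assumptions that should be checked against the API documentation:

- the base URL is `https://cloud.iexapis.com/stable`;
- Historical Prices lives at `/stock/{symbol}/chart/{range}`;
- the token is passed as a `token` query parameter.

The helper names (`read_token`, `build_chart_url`, `fetch_historical_prices`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed base URL for IEX Cloud's REST API (check the API documentation)
BASE_URL = "https://cloud.iexapis.com/stable"


def read_token(path: str) -> str:
    """Read the API token from a local text file so it never appears in source code."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_chart_url(symbol: str, date_range: str, token: str) -> str:
    """Build an (assumed) Historical Prices URL, e.g. date_range="3m" for three months."""
    path = f"/stock/{urllib.parse.quote(symbol, safe='')}/chart/{date_range}"
    query = urllib.parse.urlencode({"token": token})
    return f"{BASE_URL}{path}?{query}"


def fetch_historical_prices(symbol: str, date_range: str, token: str) -> list[dict]:
    """Request historical prices and return the decoded JSON (a list of daily records)."""
    url = build_chart_url(symbol, date_range, token)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_historical_prices("MSFT", "3m", read_token("iex_token.txt"))` would request the last three months of Microsoft prices. The token filename `iex_token.txt` is also hypothetical.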
This reads my token from a text file (to keep it safe and prevent accidental uploads) and calls IEX Cloud’s Historical Prices endpoint to obtain the last three months of price data. The returned data is simply a list of dates and the end-of-day prices for those dates.
We process the returned data, first into a list and then into a pandas DataFrame for easy manipulation:
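A minimal sketch of that conversion follows. It assumes each daily record is a dict with `date` and `close` keys, which are the field names we expect from the Historical Prices response. The helper name `records_to_frame` is hypothetical.

```python
import pandas as pd


def records_to_frame(records: list[dict], symbol: str) -> pd.DataFrame:
    """Convert daily price records into a date-indexed DataFrame with one column named after the symbol."""
    # Step 1: flatten the JSON records into a plain list of (date, close) tuples
    rows = [(record["date"], record["close"]) for record in records]
    # Step 2: build the DataFrame and index it by parsed dates in chronological order
    frame = pd.DataFrame(rows, columns=["date", symbol])
    frame["date"] = pd.to_datetime(frame["date"])
    return frame.set_index("date").sort_index()


# Made-up records in the assumed shape, deliberately out of order
sample = [{"date": "2021-03-02", "close": 10.5}, {"date": "2021-03-01", "close": 10.0}]
print(records_to_frame(sample, "MSFT"))
```

Naming the column after the ticker makes it easy to join several stocks into one table later.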
Visualizing the stock price data
Before going further, we plot the resulting data as a sanity check:
After validating this chart against another trusted source, we can repeat the same process for multiple stocks.
Let’s choose a few examples: some in a sector similar to Microsoft’s, such as Nvidia and Apple, and others in different sectors, such as Johnson & Johnson (Healthcare), Kraft (Consumer), and Allstate (Financial).
The data is converted into a DataFrame and visualized using the same process outlined above:
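Combining several tickers can be sketched with `pandas.concat`, which aligns the series on their dates. The helper name `combine_closes` is hypothetical. Dropping dates on which any ticker lacks a price is a design choice assumed here, not necessarily what the original code does.

```python
import pandas as pd


def combine_closes(closes: dict[str, pd.Series]) -> pd.DataFrame:
    """Align several date-indexed closing-price series into one wide DataFrame (one column per ticker)."""
    wide = pd.concat(closes, axis=1)   # dict keys become the column names
    return wide.sort_index().dropna()  # keep only dates on which every ticker has a price


# Made-up series: the second ticker is missing the first date
dates = pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"])
first = pd.Series([1.0, 2.0, 3.0], index=dates)
second = pd.Series([5.0, 6.0], index=dates[1:])
print(combine_closes({"NVDA": first, "ALL": second}))
```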
You might notice that the differences in starting prices make comparisons difficult. Instead, let’s normalize the data to chart the relative changes in price:
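One common way to normalize is to divide each column by its first value, so that every stock starts at 1.0 and the chart shows relative changes. The sketch below does this with a hypothetical `normalize` helper and made-up prices.

```python
import pandas as pd


def normalize(prices: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column by its first row so every series starts at 1.0."""
    return prices / prices.iloc[0]  # divides column-wise, aligned on column labels


# Made-up prices with very different starting levels
prices = pd.DataFrame({"MSFT": [230.0, 241.5, 225.4], "ALL": [110.0, 104.5, 115.5]})
print(normalize(prices))
```

After normalization, a value of 1.05 means 5% above the starting price, whatever the stock’s absolute price.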
These graphs are not easy to decipher. One could say that Microsoft stock generally moves in the same direction as Nvidia, while Allstate does not. But visual interpretation is difficult even over short sections, and judging the overall level of correlation is harder still.
Furthermore, this type of visual analysis is clearly not scalable. There are over 124,000 possible pairs of stocks in the S&P 500, far too many to compare visually. So how do we proceed?
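The pair count comes from counting unordered pairs, n choose 2. With n = 500:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{500}{2} = \frac{500 \times 499}{2} = 124{,}750
```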
Determining correlations between stock pairs
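Luckily, we can turn to one of the many statistical methods for measuring the correlation between two series. Here we care about whether two stocks’ prices tend to move together, so we will use the Pearson correlation coefficient. It measures the strength of the linear relationship between two variables on a scale from -1 to 1.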
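For reference, take two price series x and y observed over the same n days. Their Pearson coefficient is the covariance divided by the product of the standard deviations, where x̄ and ȳ are the means of each series:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Values near 1 mean the two series move together, values near -1 mean they move in opposite directions, and values near 0 mean there is no linear relationship.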
SciPy’s scipy.stats.pearsonr function can be used here. Doing so, we see that Microsoft stock had its strongest correlations with Johnson & Johnson (0.612) and Nvidia (0.338), and its lowest with Allstate (-0.044).
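Below is a minimal sketch of the call, cross-checked against a direct NumPy implementation of the formula. The `pearson_r` helper and the toy normalized prices are made up for illustration. `scipy.stats.pearsonr` returns the coefficient together with a p-value, and the result can be unpacked as a pair.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation computed directly from the definition (for cross-checking SciPy)."""
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Toy normalized prices (made-up numbers, not market data)
prices = pd.DataFrame({
    "MSFT": [1.00, 1.02, 1.01, 1.05, 1.07],
    "NVDA": [1.00, 1.03, 1.02, 1.08, 1.09],
    "ALL": [1.00, 0.99, 1.01, 0.98, 0.97],
})

for other in ["NVDA", "ALL"]:
    r, p_value = pearsonr(prices["MSFT"], prices[other])  # coefficient and two-sided p-value
    print(f"MSFT vs {other}: r = {r:.3f} (p = {p_value:.3f})")
```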
The results (thankfully) agree with our intuition. More importantly, this method lets us quantify correlations, which in turn lets us rank stock pairs by similarity, no matter how large the dataset is.
Let’s think a little bigger and compare a random set of 250 stocks from the S&P 500 based on their last three months of prices. With a dataset of this size, I can construct a matrix of correlation values between all 250 stocks (over 31,000 pairs).
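Below is a minimal sketch of how such a matrix, and its list of unique pairs, could be built with pandas. It assumes the prices sit in one DataFrame with a column per ticker, and `correlation_pairs` is a hypothetical helper name. Only the strict upper triangle is kept, because the matrix is symmetric and its diagonal is always 1.

```python
import numpy as np
import pandas as pd


def correlation_pairs(prices: pd.DataFrame) -> pd.Series:
    """Return the Pearson correlation of every unique ticker pair, indexed by (stock_a, stock_b)."""
    corr = prices.corr(method="pearson")             # n x n symmetric correlation matrix
    tickers = corr.columns
    rows, cols = np.triu_indices(len(tickers), k=1)  # strict upper triangle: i < j
    index = pd.MultiIndex.from_arrays([tickers[rows], tickers[cols]], names=["stock_a", "stock_b"])
    return pd.Series(corr.to_numpy()[rows, cols], index=index, name="correlation")


# 250 synthetic random walks over ~3 months of trading days (not market data)
rng = np.random.default_rng(seed=0)
fake_prices = pd.DataFrame(
    100 + rng.normal(size=(63, 250)).cumsum(axis=0),
    columns=[f"STOCK{i}" for i in range(250)],
)
pairs = correlation_pairs(fake_prices)
print(len(pairs))  # 31125 unique pairs
```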
(You should be able to replicate this analysis with IEX Cloud’s Apperate plans. If you would like to go further, with data from a longer period or for more stocks, you may be interested in IEX Cloud’s data bundles.)
And using this data, we can find the stock pair with the most negative correlation:
The pair with the weakest correlation (closest to zero):
And for the best-correlated stock pair:
Note: Depending on which 250 stocks end up in your random set, you might see the algorithm find almost identical stock pairs like GOOG and GOOGL. Don’t worry, nothing has gone wrong. Some companies are listed multiple times with different “classes” of shares, each with slightly different properties (e.g. voting rights) but almost identical prices. I have excluded these from my dataset.
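Once every pair’s correlation is in a Series indexed by ticker pairs, each of the three lookups above is a single `idxmin` or `idxmax`. The sketch below assumes that layout. The helper `extreme_pairs` and its `exclude` set for share-class duplicates are hypothetical, since the article does not describe how those duplicates were removed.

```python
import numpy as np
import pandas as pd


def extreme_pairs(pairs: pd.Series, exclude: frozenset = frozenset()) -> dict:
    """Find the most negative, closest-to-zero, and most positive pairs, skipping excluded ones."""
    # Skip a pair if it, or the same pair in reverse order, is in the exclusion set
    keep = np.array([(a, b) not in exclude and (b, a) not in exclude for a, b in pairs.index], dtype=bool)
    candidates = pairs[keep]
    return {
        "most_negative": candidates.idxmin(),
        "closest_to_zero": candidates.abs().idxmin(),
        "most_positive": candidates.idxmax(),
    }


# Toy correlations (made-up values), including a share-class duplicate
pairs = pd.Series({("AAA", "BBB"): -0.71, ("AAA", "CCC"): 0.04,
                   ("BBB", "CCC"): 0.88, ("GOOG", "GOOGL"): 0.99})
print(extreme_pairs(pairs, exclude=frozenset({("GOOG", "GOOGL")})))
```

Sorting the whole Series with `sort_values()` would give a full ranking instead of just the extremes.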
As you can see, sorting and finding the stock pairs is a cinch with this approach regardless of the size of the dataset.
As a last step, let’s construct a few hypothetical portfolios and compare their past performances.
The first set of portfolios will each consist of stocks from a single sector, and the second set will each consist of ten stocks randomly selected from the dataset. The effect of diversification can then be tested by adding to each portfolio the most negatively correlated stock for each of its constituents. As a result, the “diversified” portfolios will include twice as many stocks.
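The pairing step can be sketched greedily over a correlation matrix such as `prices.corr()`. For each constituent, we add the stock with the most negative correlation to it that is not already held, which guarantees the portfolio doubles in size. The name `diversify` is hypothetical, and the skip-already-held rule is our assumption about how collisions are handled.

```python
import pandas as pd


def diversify(portfolio: list[str], corr: pd.DataFrame) -> list[str]:
    """Add, for each constituent, its most negatively correlated stock not already in the portfolio."""
    result = list(portfolio)
    for ticker in portfolio:
        # Correlations with this constituent, excluding everything already held (including itself)
        candidates = corr[ticker].drop(labels=result)
        result.append(candidates.idxmin())  # most negative correlation wins
    return result


# Toy correlation matrix with hypothetical tickers
tickers = ["A", "B", "C", "D"]
corr = pd.DataFrame(
    [[1.0, 0.2, -0.8, -0.1],
     [0.2, 1.0, -0.3, -0.6],
     [-0.8, -0.3, 1.0, 0.1],
     [-0.1, -0.6, 0.1, 1.0]],
    index=tickers, columns=tickers,
)
print(diversify(["A", "B"], corr))  # ['A', 'B', 'C', 'D']
```

Because the loop is greedy, constituent order can matter when two constituents share the same best partner. `idxmin` raises a `ValueError` if no candidates remain.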
For simplicity, these portfolios will be equally weighted and tracked by their prices relative to the start of the three-month period. The risk of each portfolio is then quantified as the variance of its price fluctuations, as shown here:
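Below is a minimal sketch of both steps. It rests on two assumptions of ours:

- an equally weighted portfolio is the average of its constituents’ relative prices (equal amounts invested at the start, then held);
- “variance of price fluctuations” means the variance of the portfolio’s daily percentage changes.

`portfolio_value` and `portfolio_risk` are hypothetical names.

```python
import pandas as pd


def portfolio_value(prices: pd.DataFrame, tickers: list[str]) -> pd.Series:
    """Equally weighted portfolio value, relative to the first day (starts at 1.0)."""
    relative = prices[tickers] / prices[tickers].iloc[0]  # each stock's price relative to day one
    return relative.mean(axis=1)                          # equal weights: simple average across stocks


def portfolio_risk(value: pd.Series) -> float:
    """Risk as the sample variance of day-to-day percentage changes in portfolio value."""
    return float(value.pct_change().dropna().var())


# Made-up prices: two stocks whose moves mirror each other
prices = pd.DataFrame({"UP": [100.0, 110.0, 100.0, 110.0], "DOWN": [50.0, 45.0, 50.0, 45.0]})
print(portfolio_risk(portfolio_value(prices, ["UP"])))          # single stock: swings every day
print(portfolio_risk(portfolio_value(prices, ["UP", "DOWN"])))  # mirrored moves cancel out
```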
(See 4_compare_portfolios.py for details on how the charts in this section are generated.)
The resulting variances are plotted below:
The results reveal a few interesting insights. First, single-sector portfolios vary wildly in their levels of risk, indicating the kind of unpredictability that many undiversified portfolios experience. Second, the diversified portfolios (shown in green) markedly reduce the level of risk in most cases.
Now let’s take overall returns into account and look at relative risk (the risk taken for the return gained), quantified as the variance divided by the return. How did the hypothetical diversified portfolios perform?
The results are in line with our expectations. The diversified portfolios in general show lower relative risk (risk / return), indicating that their returns have been steadier than those of their undiversified counterparts.
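The ratio itself is short to write. This sketch takes relative risk as the variance of daily percentage changes divided by the total return over the period, so lower values mean less risk per unit of return. That reading and the `relative_risk` name are our assumptions. The ratio is only meaningful for a positive total return, since a negative or near-zero return would flip its sign or make it explode.

```python
import pandas as pd


def relative_risk(value: pd.Series) -> float:
    """Risk per unit of return: variance of daily % changes divided by total return over the period."""
    daily_changes = value.pct_change().dropna()
    total_return = value.iloc[-1] / value.iloc[0] - 1.0
    if total_return <= 0:
        raise ValueError("relative risk is only defined here for a positive total return")
    return float(daily_changes.var() / total_return)


# Two made-up portfolio value paths with the same 5% total return
steady = pd.Series([1.00, 1.01, 1.02, 1.03, 1.04, 1.05])
choppy = pd.Series([1.00, 1.10, 0.95, 1.08, 0.97, 1.05])
print(relative_risk(steady), relative_risk(choppy))
```

Both paths end in the same place, but the choppy one earns its return with far more variance, so its relative risk is much higher.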
Regardless of whether it is applied to a single-sector portfolio or a randomly selected one, introducing additional, negatively correlated stocks generally appears to reduce the portfolio’s relative risk. While this is a small sample of results, they are consistent with MPT.
In this article, we have seen the core concepts of MPT as they relate to correlation, and that portfolios built according to MPT can lower relative risk, resulting in better returns for a given amount of risk. More concretely, we have seen how to obtain historical price data from IEX Cloud and how that data can be used to quantify correlation between stocks.
The next time you are deciding how to structure a portfolio, or evaluating different portfolios, you should be able to apply these techniques, for example to quantify correlations between assets or to reduce a portfolio’s relative risk. Data from IEX Cloud can provide the fuel to help you carry these out.
Of course, nobody can predict the fate of any given asset’s price, and there are other factors to consider in making such decisions. However, we think these skills and this knowledge can be a useful tool in your arsenal for analyzing stocks and the market at large.
The entire codebase used to produce the above analysis is available here to help you get started. We hope you found this tutorial useful, and look forward to seeing your feedback and comments on GitHub!
IEX Cloud Services LLC makes no promises or guarantees herein regarding results from particular products and services, and neither the information, nor any opinion expressed here, constitutes a solicitation or offer to buy or sell any securities or provide any investment advice or service.