The Misuse of Correlation Part 2 – the results

In this post, I want to talk about an insidious error that creeps in with the usage of correlation in finance.

The FT Lexicon supports the idea that:

“a correlation is said to be positive if movements between the two variables are in the same direction and negative if it moves in the opposite direction.”

This definition is not unusual; it is commonly seen in finance textbooks.

Occasionally the formula may be presented:

But caveats in using the formula will likely be absent or at the least hidden from view.
By that, I mean the terms x̄ and ȳ (the means of the two variables) are critically important, but this importance is rarely appreciated. [1]
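For reference, the formula usually meant here is the Pearson sample correlation coefficient. The version below is the standard textbook form; the symbols are an assumption (any labelling would do), with x_i and y_i as the paired observations and \bar{x} and \bar{y} as their sample means.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Every term in it is a deviation from a mean, never a raw value.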

In the previous post, I asked you about the correlation of the changes of the two assets in the chart below:

a. Positive correlation

b. Negative correlation

c. They are uncorrelated

d. Not sure (be honest!)


The most obvious answer is of course b)
One line goes up and the other goes down so this means they have a negative correlation. This is unfortunately strictly incorrect if you paid attention to the instruction to consider the “changes in the two assets”.

A good answer is d)
Given the amount of information I had supplied, it’s a perfectly reasonable one.

Because another answer is a)

The correlation of the changes in the variables is +1, PERFECT POSITIVE correlation
and the lines are going in the OPPOSITE direction!!

(If you doubt this result, please look at the data and calculations in the sheet attached (download) and use the CORREL function in Excel.)

b) is an intuitive answer but a) is the answer that a financial analyst would calculate. If you imagine situations where you are being given financial advice, it is clear there could be an immediate conflict!


First insidious confusion – the importance of the mean

If you have never seen this before, you may think I am lying or this is a convoluted trick. But it rests upon one key part of the calculation of correlation that is missing from virtually every definition I see, and is certainly missing from the vast bulk of work done by analysts in the finance industry.

The key is that correlation is calculated by looking at the relationship in deviations from the means (the terms x̄ and ȳ in the complicated mathematical equation).

In our example, the changes in the two variables in the chart have equal and opposite means, and so trend in different directions. However, the day-to-day volatility (deviation from the mean of the changes) is identical for both variables, and it is this term that drives the correlation whilst having no impact on the trend.

Here is a scatterplot of the % changes for each variable. Observe all the dots are distributed along the line – a perfect POSITIVE correlation.

This has a clear relationship to the way we think about the change in market prices of any asset:

In financial markets, the daily noise is usually much greater than the daily trend, and so forms the focus of most market commentary.

The key result is that if the noise term correlates for two assets, then they will correlate irrespective of their underlying trend, given the way correlation is calculated.
i.e. they could end up in very different places even if they are positively correlated!
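One way to see this is to write each asset's change as a constant trend plus noise. This is only a sketch under that simplifying assumption; the symbols \mu (trend) and \varepsilon (noise) are chosen here for illustration.

```latex
\Delta A_t = \mu_A + \varepsilon_{A,t}, \qquad \Delta B_t = \mu_B + \varepsilon_{B,t}

\Delta A_t - \overline{\Delta A} = \varepsilon_{A,t} - \bar{\varepsilon}_A, \qquad
\Delta B_t - \overline{\Delta B} = \varepsilon_{B,t} - \bar{\varepsilon}_B

\operatorname{corr}(\Delta A, \Delta B) = \operatorname{corr}(\varepsilon_A, \varepsilon_B)
```

The trends \mu_A and \mu_B cancel as soon as the means are subtracted, so they can have opposite signs without changing the correlation at all.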

Second insidious confusion – levels vs changes

The second insidious confusion can arise from a reference to correlation of the CHANGES or a correlation in the LEVELS of the two variables.

In financial markets, the method invariably used is to look at the changes in variables. In our example, we get the answer of positive 1 i.e. perfect positive correlation.

If we calculate the correlation using the levels or prices, we get an answer of -0.97
i.e. strong negative correlation

The intuitive result is the opposite of the result most likely to be calculated by financial analysts.
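The effect is easy to reproduce. The sketch below does not use the spreadsheet data from this post; it builds two hypothetical series (250 days, a 0.2% daily trend up for A and down for B, and an identical made-up wiggle as the "noise"), so the levels figure it prints will not match the -0.97 quoted above. The `pearson` and `to_levels` helpers are illustrative names, not from any library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed directly from deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def to_levels(changes, start=100.0):
    """Compound fractional changes into a price path starting at `start`."""
    levels = [start]
    for c in changes:
        levels.append(levels[-1] * (1.0 + c))
    return levels


# Hypothetical inputs (assumptions, not the post's data).
DAYS = 250
TREND = 0.002
noise = [0.01 * math.sin(1.7 * t) for t in range(1, DAYS + 1)]

# Same noise for both assets, equal and opposite trends.
changes_a = [TREND + e for e in noise]
changes_b = [-TREND + e for e in noise]

levels_a = to_levels(changes_a)
levels_b = to_levels(changes_b)

corr_changes = pearson(changes_a, changes_b)
corr_levels = pearson(levels_a, levels_b)

print(f"correlation of changes: {corr_changes:+.4f}")
print(f"correlation of levels:  {corr_levels:+.4f}")
```

With these inputs the correlation of changes comes out at essentially +1 while the correlation of levels is strongly negative, even though both numbers describe the same two price paths.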

Why does finance prefer to use the correlation of changes?

It is done for good reason. When you are looking at data with strong trends, as a lot of asset prices have, the correlation of levels can yield very strange results. Let’s take an example.

Let’s look at the US equity market (S&P 500 price – white line) and its PE ratio (orange line) over the last 30 years.

If we first look at the correlation of levels, we get a correlation of virtually zero.
This suggests a rather unintuitive result that there is no meaningful correlation between PE ratio and equity prices!

If we instead look at the correlation of changes, we get a meaningful positive correlation of 0.78, which makes a lot more sense.

Conclusion

If these differences in the correlation results were just some statistical fluke, from a couple of silly examples, then it would not matter.
But it is not an unusual result and it occurs when looking at the biggest and most commonly traded financial markets. It is therefore critical to avoid confusions such as these when thinking about what type of correlation to use or, more often, what someone else has used in the analysis you are reading.


[1] I very much enjoyed this paper by Francois-Serge Lhabitant which explains this issue very well. http://www.edhec-risk.com/edhec_publications/all_publications/RISKReview.2011-09-07.3757/attachments/EDHEC_Working_Paper_Correlation_vs_Trends_F.pdf

The misuse of Correlation Part 1 – Quick Refresher and Quiz

First, let’s refresh our memories of what correlation means.
This may seem very basic right now, but I would like to make sure the meaning is clear before we move on to its use.

I have included a question at the end, once you have read and thought about the definition:

  • A definition from the FT Lexicon:
    “a correlation is said to be positive if movements between the two variables are in the same direction and negative if it moves in the opposite direction.”
  • You can read examples in a number of sources such as

https://www.mathsisfun.com/data/correlation.html

and
http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/scatterdiagramsrev2.shtml

Here is a range of correlations, shown via a scatterplot:

Some important concepts

  • A positive correlation is “when the values increase together”
    An example would be temperature and ice cream sales as “warmer weather and higher sales go together”.
  • A negative correlation is “when one value increases and the other decreases”
    Note this is sometimes called an “inverse correlation”.
    An example would be weight of a car and its fuel efficiency as “cars that are heavier tend to get less miles per gallon.”
  • No correlation is when “there is no connection”. An example would be IQ and house number.
  • For those of you with a more formal approach the mathematical formula for correlation is:
  • In practice, most of us find it much easier to use the function CORREL() in Excel!
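For readers who would rather see the calculation than call a spreadsheet function, here is a minimal sketch of the standard Pearson calculation, which is what CORREL() is understood to return. The car weights and fuel-efficiency figures are made up purely for illustration, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: co-movement of deviations from each series' mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Made-up data: heavier cars, fewer miles per gallon.
weight_kg = [1000, 1200, 1500, 1800, 2200]
miles_per_gallon = [45, 40, 34, 29, 22]

r = pearson(weight_kg, miles_per_gallon)
print(f"correlation of weight and fuel efficiency: {r:+.3f}")
```

The result is strongly negative, matching the "inverse correlation" example in the list above.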

Question time

Here is an example with two asset prices A and B. When we represent the data in a chart it can often be done in one of two ways.

This chart has two lines, showing how both the prices of asset A and B moved over time.

The other way to chart this is to put the prices of A and B on the two axes instead. It looks like this.

To make sure you have understood the basic concept of correlation, I would appreciate it if you could vote on an answer to the following question. (all anonymous of course!)

The misuse of Significance

Definition

What does the word “significant” mean?

Dictionaries most often suggest a range of closely related definitions.
In a more everyday sense:

  1. Importance e.g. this new discovery is a significant development
  2. Meaningful e.g. the significance of the message was not lost on John

In mathematics, you get the example of:

  1. Significant figures – e.g. 1.524658 is 1.5 to 2 sig fig

This use of the word is mathematical jargon with a precise meaning, but it also tallies with our general use of the word. We only want to look at the digits which are important and mean something.

In statistics:

  1. “significant” means probably true (not due to chance)

Some issues arise from this

  1. Something statistically significant may not be important

A result may be true, and therefore significant when backed up by statistics; that does not, however, mean it is important in the everyday English sense. I think this statistical interpretation can easily come into conflict with the everyday meaning and is fraught with danger.

When you jump out of a plane without a parachute it is likely that holding up an umbrella has a “significant” effect on your speed. I doubt you would think that this effect was important when you hit the ground.

I’m sure you can think of many things that are probably true but not important!

  2. Statistical relationships are not transitive

An example from medicine: drugs are, for the most part, tested against a placebo rather than against each other. Drug A may perform better in tests against a placebo than Drug B (i.e. it has more significant results). However, that does not mean you know that Drug A will perform better in tests against Drug B. Unfortunately, current medical practice makes this implicit assumption when approving drugs.

It is a common misconception that you can use simple logic to infer other relationships. Unfortunately, this is not true. There is a similarly confused relationship with correlation. Statistical relationships like this are not transitive. https://iase-web.org/documents/papers/isi56/CPM80_CastroSotos.pdf

  3. The 5% threshold for statistical significance is arbitrary

    Suppose you say that one result is significant and another is not because one has a 4.9% chance of being random and the other has 5.1%. This is the correct usage of the technical term, but people ascribe more meaning to the word than that. One of the ideas is held to be “true” and the other is discarded.
  4. A significant result may have happened by random chance

Saying that a certain outcome would only occur 1 time in 20 if it were random sounds good. But what if you ran 20 sets of analysis? By random chance you should expect one of them to pass the “significance” test.
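The arithmetic behind that claim is short enough to check directly. This is a sketch that assumes the 20 analyses are independent and that there is no real effect in any of them; the variable names are illustrative.

```python
ALPHA = 0.05   # the conventional 5% significance threshold
N_TESTS = 20   # number of independent analyses run on pure noise

# Expected number of tests that pass the threshold by chance alone.
expected_false_positives = N_TESTS * ALPHA

# Probability that at least one of them passes by chance.
p_at_least_one = 1.0 - (1.0 - ALPHA) ** N_TESTS

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one 'significant' result): {p_at_least_one:.1%}")
```

Under those assumptions you expect one spurious "significant" result on average, and there is roughly a 64% chance of getting at least one.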


Was the test constructed properly?

This relates to a supremely important point: statistics are often quoted in situations where they are not supposed to be used, or where they have not been properly applied.

  1. How many relationships did you test?
    In finance, all analysts look at lots of different data sets, over different time periods in search of something “significant”.
  2. Did you look at any of the data before choosing what test to run?
    I cannot imagine how someone could not fall into this trap. We only run tests on things we think might work. But the reason we think they might work is that we have done some rough statistical work already e.g. looked at a picture or perhaps just subconsciously noted some signs of a relationship. This means that the data has been mined and your choice of test is not independent.
  3. How many people are trying to find these relationships?
    Let’s say that you are extremely careful in how you do your statistics. Let’s imagine that everyone else in the firm you work at is similarly careful. Then when you produce a “significant” result you may reasonably think it is meaningful. After all you only ran one test and it worked! You then show your boss. Should she be impressed? Maybe not.
  4. How many failed tests are not shown?
    In my experience, analysts do not show me large quantities of research they have done which they think is completely meaningless. Highly trained with great degrees, they want to show me “good” work with “good” results. This means that the 19 analysts who did not find anything today do not show me anything. From the perspective of the individual the result appears to be strongly non-random. From my perspective, it looks entirely consistent with being random.


Is it meaningless?

No. It just means exactly what the equation says it means. You should remain aware of the context if you want to use it. My experience of interacting with professionals of all types is that they are enormously well trained in the complexity of statistical methods and woefully undertrained in their limitations. In fact, their high proficiency with manipulating the data and the methods makes them even more prone to methodological error of this type, as they have essentially been trained in the art of data-mining.

Conclusion

I am yet to read a research piece from a bank which presents data demonstrating that their hypothesis has no statistical significance. We should remember that this is significant.

How to reduce your Risk Part III

Trick question (click here for the question, and here for the answers)
There is no right answer because risk cannot be minimised.
It can only be transformed from one type into another.


What did people choose?

Option A was the most common answer. For those who trade in financial markets, this may be surprising.

If I reframed the question and asked:

  • Please calculate the DV01 of Options A and B
  • Please calculate the VAR of Options A and B
  • Please tell me which of A or B has greater risk

You would quickly work out that B has zero DV01 and zero VAR. Hence by the definition of risk used on trading floors, A has higher risk. Unsurprisingly, when I ask this question to a room of traders at investment banks, the overwhelming answer is B, because that is the context in which they think about “risk”.

If I ask the question to people who work in property or private equity, then I am more likely to get the answer A as certainty of cashflow is critical, especially when thinking about assets and liabilities. In the accrual accounting world of regular banking, they think about Earnings at Risk (EAR) and Option A is the way to reduce the risk.

The answer given likely relates to your personal circumstances and the exact framing of the question. If I had the time, I think running a series of experiments with slightly different wording, rates or quantities would give interesting results.

But for now, the practical lesson is important. People do not instinctively understand risk at all well. We are presented with questionnaires from investment advisors which ask us for our risk preferences with no definition of risk. Judging by the portfolios typically recommended as a result, the working assumption is that bonds are low risk and equities high risk.

My approach

I think that the best way to think of this question is in terms of a balance sheet. Whether choice A or B “reduces” your risk depends on the extent to which it matches the tenor of your liabilities. If your liability is short term, then Option B is the sensible answer. Investment banks have no corresponding long-term liability apart from capital. They typically hold wafer-thin amounts of capital against mark-to-market assets, so they naturally recognise A as a risk. For someone who is keenly aware of what they see as fixed longer-term liabilities, such as paying school fees or retirement expenses, the choice of a long-term asset, i.e. Option A, is far more natural.

Risk matters

Whenever risk gets mentioned, I very rarely observe a discussion of this nature. Often only one side of the balance sheet is being examined and the vastly important implicit assumptions from the liability side are not considered. I am an advocate of multiple forms of risk measurement, including VAR, but only if it is used in the correct context. Many of the worst financial disasters have occurred by taking a risk and accounting concept that was appropriate in one context and transplanting it to another. AIG and Enron are the biggest ones that spring to mind.