Using Statistics – Page 3

Are Golf Handicaps Fair?

I played in a two-day golf tournament recently and had a conversation about whether golf handicaps were fair, even for completely honest golfers. As I thought more about it, I realised that they are not, but not always in the ways I had imagined. I was aware that both high and low handicappers often thought the system was biased against them. I had not realised that, depending on the circumstances, they were both right.

What is the handicap for?

According to the USGA “the purpose of the USGA Handicap System is to make the game of golf more enjoyable by enabling players of differing abilities to compete on an equitable basis.”

http://www.usga.org/Handicapping/handicap-manual.html#!rule-14367

But it does not simply create a handicap by taking an average of your scores. It “disregards high scores that bear little relation to the player’s potential ability”. The method mixes up the idea of “equitable” with “potential”, which has profound implications for which golfer should expect to win.

Note: I will use the USGA system in this post, as it is the system I understand best. It is worth noting how many different systems are currently used across the world. These other systems will have some impact on the “fairness”, but the key points are true for all of them.

How is your handicap calculated? (slightly simplified!)

Take your score on a full round of golf and adjust it slightly by eliminating any very bad holes. For example, an 18 handicapper records any hole which is more than a 7 as a 7.
Enter your last 20 adjusted scores and compare each with par, e.g. a score of 82 on a par 72 course is 10 over.
Take an average of your best 10 scores compared to par. Ignore the worst 10.
That is your handicap.
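The simplified steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the USGA formula: every hole is capped at one fixed maximum (7, as for the 18-handicapper in the example), every course is par 72, and the handicap is just the mean of the best 10 of the last 20 adjusted rounds. The real system also uses course and slope ratings, which are ignored here, and the function names are hypothetical.

```python
from statistics import mean

def adjusted_round(hole_scores, max_per_hole=7):
    """Cap each hole at max_per_hole (7 for the 18-handicapper in the text) and total the round."""
    return sum(min(score, max_per_hole) for score in hole_scores)

def over_par(total, par=72):
    """Express a round relative to par, e.g. 82 on a par-72 course is +10."""
    return total - par

def simplified_handicap(rounds_over_par, window=20, best=10):
    """Average of the best `best` scores among the last `window` rounds (lower is better)."""
    recent = rounds_over_par[-window:]
    return mean(sorted(recent)[:best])

# A hypothetical card with one disaster: the 10 on the last hole is recorded as a 7.
card = [5, 4, 6, 5, 4, 5, 6, 4, 5,
        5, 4, 6, 5, 4, 5, 6, 4, 10]
print(adjusted_round(card), over_par(adjusted_round(card)))   # 90 18

# Twenty hypothetical rounds from 10 to 29 over par: the best ten average 14.5.
print(simplified_handicap(list(range(10, 30))))               # 14.5
```

Sorting and slicing keeps the "ignore the worst 10" step explicit; the worst rounds simply never enter the average.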

The same mathematical process is applied to the scores of high and low handicappers alike. Does this mean that the result is fair?

The problem is that golfers of different abilities differ not just in their average score. They also have different volatilities. A low handicapper has a far more consistent game: consistency on each shot carries through to each hole score and to the total score for each round. This difference in the volatility of the players has a big impact on who you should expect to win.

Let’s take a simplified case to illustrate the issue. Let the scratch (zero handicap) golfer have zero volatility, i.e. they shoot level par, gross and net, every time. Let the 18-handicapper have a more realistic (and obviously higher) volatility of score.

Head to Head – low handicapper wins

If we put these two golfers head to head, then it is pretty clear that the low handicapper is very likely to win. The low handicapper’s “potential” is the same as his average performance. The high handicapper has the “potential” to win the match, but since his average score is far higher he is pretty unlikely to do so.

High handicapper A wins only 4 times out of 20 while low handicapper B wins 15 times out of 20.

This is often how a high handicapper perceives golf handicaps. They know they are unfair. They get regularly beaten by low handicappers and have (hopefully) learned not to bet with them on the golf course. It is worth noting though that the lower handicapper will still often moan about how unfair it is to give strokes on some particular hole such as a par 3.

What if we made a different handicap system that included not only the golfer’s “potential” but all the data on how they actually perform? The 18-handicapper actually averages 22 over par, so if we use that as the handicap instead we get this table of results, in which each player wins 10 times.
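One way to see both head-to-head results is to model the 18-handicapper’s gross score over par as a normal distribution. The mean of 22 over par comes from the text; the standard deviation of 5 strokes is my assumption, chosen only for illustration (with it, the average of the best 10 of 20 rounds comes out close to 18). The scratch golfer always shoots level par, so the high handicapper wins whenever his net score is below par. Ties are ignored because the model is continuous.

```python
from statistics import NormalDist

# Assumed model of the high handicapper's gross score relative to par.
HIGH_HANDICAPPER = NormalDist(mu=22, sigma=5)   # sigma = 5 is an illustrative assumption

def head_to_head_win_prob(strokes_received):
    """P(high handicapper's net score is below the scratch golfer's level par).

    Net over par = gross over par - strokes received, so he wins
    whenever gross over par < strokes_received.
    """
    return HIGH_HANDICAPPER.cdf(strokes_received)

for strokes in (18, 22):
    p = head_to_head_win_prob(strokes)
    print(f"{strokes} strokes: high handicapper wins {p:.0%}, about {20 * p:.0f} of 20")
```

Under these assumptions it prints roughly 21% (about 4 of 20) with 18 strokes and 50% (10 of 20) with 22 strokes, in the same ballpark as the 4 wins out of 20 above and the even split with a 22-stroke handicap.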

So is this system fairer? Well not necessarily.

Tournament – high handicapper wins

We just saw that the low handicapper has an advantage in head-to-head competition. But what if there are lots of players in a tournament? Let’s have 40 golfers: 20 scratch golfers who shoot level par and 20 18-handicappers with the range of scores above.

If we simply rank all the scores then the top 4 in the tournament will be high handicappers having an unusually great day. But the bottom 15 golfers are also high handicappers having a more typical or even poor day.

The low handicapper has virtually no chance of winning a tournament as the top spots will be taken by a high volatility golfer having a good day. This leads to justifiable frustration from the low handicappers and sometimes the incorrect assumption that the high handicappers must be sandbaggers.
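A small Monte Carlo sketch of this 40-golfer field makes the point concrete. It assumes the high handicappers’ gross scores are independent normal draws averaging 22 over par (from the text) with an assumed standard deviation of 5 strokes, rounded to whole strokes; the scratch golfers always net level par. The parameter values and seed are illustrative, not real tournament data.

```python
import random

N_HIGH = 20          # 20 high handicappers; the 20 scratch golfers always net level par
STROKES = 18         # strokes received by each high handicapper
MEAN_OVER_PAR = 22   # average gross score over par, from the text
SD_OVER_PAR = 5      # assumed round-to-round volatility (illustrative only)

def simulate_tournaments(n_events=2000, seed=1):
    """Return (share of events won outright by a high handicapper,
    average number of high handicappers finishing behind every scratch golfer)."""
    rng = random.Random(seed)
    outright_wins = 0
    behind_scratch = 0
    for _ in range(n_events):
        # Net score relative to par for each high handicapper, in whole strokes.
        nets = [round(rng.gauss(MEAN_OVER_PAR, SD_OVER_PAR)) - STROKES
                for _ in range(N_HIGH)]
        # Every scratch golfer nets 0, so only a net score below 0 wins outright.
        if min(nets) < 0:
            outright_wins += 1
        # A net score above 0 puts that player behind the whole scratch field.
        behind_scratch += sum(net > 0 for net in nets)
    return outright_wins / n_events, behind_scratch / n_events

win_share, avg_behind = simulate_tournaments()
print(f"high handicapper wins outright in {win_share:.0%} of events")
print(f"on average {avg_behind:.1f} of 20 high handicappers finish behind all scratch golfers")
```

With these assumptions a high handicapper wins outright in almost every simulated event, while on average around 15 of them finish behind the entire scratch field: both halves of the picture described above.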

Summary

In head-to-head competition, the low handicapper has a large advantage
In a tournament, a high handicapper is more likely to win

How to combat the problem?

I see the second problem (tournaments) combatted very frequently. It is common for only a fraction of the handicap to be used, such as 2/3 or ¾. I do not have the data to know whether this makes it equally likely for a low and a high handicapper to win. But I am extremely confident that there will be a host of high handicappers with very poor net scores at the end of such an event. So even making it “fair” in terms of the overall winner will not result in all participants feeling that way.
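Real data would be needed to answer this properly, but a rough sketch shows how sensitive the outcome is to the fraction used. It assumes a normal model for the high handicappers’ gross scores over par (mean 22 from the text, standard deviation 5 as an illustrative assumption), 20 independent high handicappers against a scratch field at level par, and no ties. It computes the chance that at least one high handicapper beats level par on a net basis.

```python
from statistics import NormalDist

HIGH = NormalDist(mu=22, sigma=5)   # assumed gross score over par; sigma is an assumption
FULL_STROKES = 18                   # full handicap of each high handicapper
N_HIGH = 20                         # number of high handicappers in the field

def high_handicapper_wins_event(fraction):
    """P(at least one high handicapper nets below the scratch field's level par)."""
    strokes = FULL_STROKES * fraction
    p_single = HIGH.cdf(strokes)            # one player's net score is under par
    return 1 - (1 - p_single) ** N_HIGH     # at least one of N_HIGH independent players

for label, fraction in (("full", 1.0), ("3/4", 0.75), ("2/3", 2 / 3)):
    print(label, round(high_handicapper_wins_event(fraction), 2))
```

In this toy model the full handicap gives the high handicappers roughly a 99% chance of the title, ¾ roughly 60% and 2/3 roughly 37%, so the "fair" fraction depends heavily on the assumed volatility and field sizes.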

I have never seen the first problem (head to head) addressed. In practice, if anything, I tend to see the lower handicapper try to argue for a reduction in strokes given!

A theoretical solution

A theoretical solution would be to recognise that a single number cannot cover both of these:

Difference in average score
Difference in volatility of score

A revised system could involve a measure of both.

I do not think this is a sensible idea. It would be complex and given the poor quality of the underlying data (self-reported ad hoc scoring) it would be hard to rely on it.

My solution

Head-to-head handicap golf tends to be social, and there are more fun ways to decide a handicap. For a regular partner, the winner has to give an additional stroke the next time you play. I doubt you will convince them to give you more strokes any other way.

Handicap golf tournaments perhaps should not be taken too seriously. The system does a decent job and will make the contest close enough and the result uncertain enough to be fun. With the common correction in tournaments everyone has a chance (unless there are real sandbaggers of course) but high handicappers have to accept they have a good chance of a terrible score.

But maybe that is just because I have been playing this tournament for 15 years without winning anything….

Next steps for golf

The global handicap system is being revised.

http://www.randa.org/News/2017/04/World-Handicap-System-to-be-developed-for-golf

I will be interested to see how they deal with the issues.

The Misuse of Correlation Part 5 – Hedging and Portfolio Management

For macro trading, thinking about how one asset moves versus another is important.
To this end, correlation is most commonly calculated using daily changes. The results of a reasonable relationship might look something like this:

Hedging

This concept is particularly useful if you are a market maker, or anyone in need of a reasonable short-term hedge for your risk. However, if you are holding for a longer period, any difference in the trends (the means that we touched on in part 1) is likely to dominate your returns, irrespective of the correlations.

Portfolio Risk

For constructing portfolios, measures like VaR (Value at Risk) are often used to explain and think about risks. The inputs to these measures are usually daily returns.

This can lead to problems with serious consequences if you are using this analysis to understand the risk of a portfolio you are planning to hold for a longer period.

In this chart, I take a selection of major markets that a typical macro portfolio may contain: major currency pairs, interest rates and the S&P.
I plot the correlations of the pairs calculated two ways:

Return correlation along the X axis
Calculated from daily changes, as we did previously
Price correlation up the Y axis
The correlation of the levels of each price series (i.e. if both assets went up over the year, this implies a high positive correlation)
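A minimal sketch with made-up data (not the market data behind the chart) shows how the two measures can disagree. Two hypothetical price series share a common daily shock, so their daily changes correlate at about 0.5 by construction, but they trend in opposite directions over a 250-day year, which is the kind of pattern discussed below for USDJPY and 2yr US rates. The drifts, noise sizes and seed are arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def synthetic_prices(days=250, drift_a=-0.5, drift_b=0.5, seed=7):
    """Two hypothetical price paths: a shared daily shock plus opposite trends."""
    rng = random.Random(seed)
    a, b = [100.0], [100.0]
    for _ in range(days):
        common = rng.gauss(0, 1)           # shock that moves both assets the same way
        a.append(a[-1] + drift_a + common + rng.gauss(0, 1))
        b.append(b[-1] + drift_b + common + rng.gauss(0, 1))
    return a, b

a, b = synthetic_prices()
changes_a = [today - yesterday for yesterday, today in zip(a, a[1:])]
changes_b = [today - yesterday for yesterday, today in zip(b, b[1:])]

return_corr = pearson(changes_a, changes_b)   # X axis: daily changes (about 0.5 by construction)
price_corr = pearson(a, b)                    # Y axis: levels, dominated by the opposite trends
print(f"return correlation {return_corr:.2f}, price correlation {price_corr:.2f}")
```

The daily-change correlation is positive, while the level correlation is strongly negative, because over a year the trends swamp the shared day-to-day noise.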

The results are important for the construction of a longer-term macro portfolio. Take 2yr US rates and USDJPY as an example:

Daily return correlation is decent, at around 50%
If you are long USDJPY and add an opposite US rates position, overall reported portfolio risk would therefore decrease

If you hold the portfolio for one day it is reasonable to expect that your hedge will act to reduce the volatility of your returns.
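Why the reported risk falls can be seen with a variance-covariance (parametric) VaR calculation, which is one common way VaR is computed; I am assuming that method here. The 50% correlation is the figure quoted above; the position sizes, the 1% daily volatilities, the 50% hedge size and the 99% confidence level are hypothetical.

```python
import math
from statistics import NormalDist

def parametric_var(position_a, position_b, vol_a, vol_b, corr, confidence=0.99):
    """One-day variance-covariance VaR for a two-asset portfolio.

    Positions are in currency units (negative = short); vols are daily return volatilities.
    """
    variance = ((position_a * vol_a) ** 2
                + (position_b * vol_b) ** 2
                + 2 * corr * position_a * vol_a * position_b * vol_b)
    z = NormalDist().inv_cdf(confidence)   # about 2.33 at 99%
    return z * math.sqrt(variance)

# Hypothetical book: long 100m of asset A, optionally short 50m of asset B.
unhedged = parametric_var(100e6, 0.0, 0.01, 0.01, corr=0.5)
hedged = parametric_var(100e6, -50e6, 0.01, 0.01, corr=0.5)
print(f"unhedged VaR {unhedged / 1e6:.2f}m, hedged VaR {hedged / 1e6:.2f}m")
```

With these inputs the reported one-day VaR drops from about 2.33m to about 2.01m. Nothing in the calculation looks beyond the daily return correlation.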

Price correlation is actually negative

As above, your reported risk is determined by the daily return correlation, so it decreases.

But if you took the supposedly offsetting position above and lost money on USDJPY by the end of the year, you would have lost on your “hedge” too
The “hedge” would have reduced your reported risk but increased your return volatility on a one-year horizon.
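The same effect can be sketched with simulated data. Two hypothetical assets have daily changes that correlate at about 0.5 (a shared shock), but asset A drifts down while asset B drifts up. A long position in A is "hedged" with a short position of 0.5 units of B, the minimum-variance ratio for this made-up model. All drifts, sizes and the seed are assumptions for illustration only.

```python
import random
import statistics

def simulate_year(days=250, seed=11):
    """Hypothetical daily changes: common shock, opposite drifts (A falls, B rises)."""
    rng = random.Random(seed)
    changes_a, changes_b = [], []
    for _ in range(days):
        common = rng.gauss(0, 1)
        changes_a.append(-0.5 + common + rng.gauss(0, 1))
        changes_b.append(0.5 + common + rng.gauss(0, 1))
    return changes_a, changes_b

HEDGE_RATIO = 0.5   # short 0.5 units of B per unit of A: cov(A, B) / var(B) in this model

changes_a, changes_b = simulate_year()
unhedged_daily = changes_a
hedged_daily = [da - HEDGE_RATIO * db for da, db in zip(changes_a, changes_b)]

print("daily P&L vol:", round(statistics.stdev(unhedged_daily), 2),
      "->", round(statistics.stdev(hedged_daily), 2))
print("year-end P&L on long A:", round(sum(changes_a), 1))
print("year-end P&L on the short B 'hedge':", round(-HEDGE_RATIO * sum(changes_b), 1))
```

Day to day, the hedge lowers P&L volatility; over the year both legs lose money, so the "hedge" adds to the loss instead of offsetting it.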

So what does a “good hedge” look like?

Two plausible but very different definitions seem clear:

High correlation of daily changes
Consistent with VaR, and the best hedge for VaR users, short-term traders, market makers and options traders (delta hedging). A lot of option hedging is done via proxies, and this is the statistic they would care about.

Long term hedge
Much more important for longer-term positions, such as those of a macro hedge fund, a pension fund or your personal portfolio. Here a hedge would mean that if you hold both assets for a year, they would have similar (offsetting) P+Ls.

The above analysis shows how different these two time horizons can be. The big risk we face as portfolio managers is that we base too much of our analysis on short-term price changes, which link conveniently to VaR-style risk reporting. This gives a completely misleading guide to the long-term P+L risk we are actually taking.

Conclusion

In these pieces, we have seen that correlation is probably not what you thought it was.
Correlations are used in risk reporting (as we have mentioned here), but also in Portfolio Theory, in CAPM and in how an investor should think about designing a portfolio.

This topic has important ramifications for many areas of modern finance and I will return to it later.

The Misuse of Correlation Part 4

Continuing from the previous post, I’m looking at issues and common mistakes arising from the use of the word correlation.

Uncorrelated does not mean unrelated
Correlation does not imply causation
Correlation is not transitive
Data issues

We have covered numbers 1 and 2 already.

Correlation is not transitive

In my post on significance, we saw that only some relationships are transitive.
Weight is a simple example of a transitive relationship:

If Adam weighs more than Bert
And if Bert weighs more than Charlie
then Adam weighs more than Charlie.

But, just as for significance, this is not true for correlation.
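A quick simulation shows how the chain breaks. In this sketch A and C are independent random variables (hypothetical data) and B is simply their sum. A is clearly correlated with B, and B with C, yet A and C are unrelated. The sample size and seed are arbitrary.

```python
import math
import random

def pearson(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

rng = random.Random(0)
n = 10_000
a = [rng.gauss(0, 1) for _ in range(n)]      # hypothetical driver A
c = [rng.gauss(0, 1) for _ in range(n)]      # an independent driver C
b = [x + y for x, y in zip(a, c)]             # B depends on both

print(round(pearson(a, b), 2), round(pearson(b, c), 2), round(pearson(a, c), 2))
```

In theory the three correlations are about 0.71, 0.71 and 0: "A correlates with B" and "B correlates with C" tell you nothing about A and C.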

Modern medicine gives us many clear examples

High cholesterol correlates with a higher risk of heart disease
Certain drugs (e.g. statins) correlate with lower cholesterol
Therefore

Certain drugs (e.g. statins) correlate with lower risk of heart disease

Correlation is not transitive, so this is a common logic error.
We have to study the direct relationship between the drug and heart disease to find out. But when the advice on statins was given, trials were still not conclusive. Respecting this problem makes proper drug testing expensive and difficult, but ignoring it makes writing an article in the Daily Mail really easy.

I saw a similar result in the news recently.

Taking aspirin correlates with a lower risk of heart attack
Heart attack correlates with early death
Therefore

Taking aspirin correlates with a lower risk of early death
However, when this was trialled with real patients, the results found:
Taking aspirin also correlated to fatal internal bleeds (especially in the elderly)

Doctors are now much less confident in their original advice, as the number of deaths due to taking aspirin is thought to be material, so other factors must be considered.

Not only is this relationship non-transitive, but it also shows how complicated the real world is, and how overly simple results from correlation may be very misleading.

We see similar examples in financial markets.
Let’s build a basic model for the oil price using two drivers: inventories and the US dollar. If I run a regression of the oil price against each driver over 10 years, I get correlation coefficients of:

−0.78 for the oil price and the level of inventories
−0.44 for the oil price and the US dollar.

What do I get if I do a regression of oil inventories and the US dollar?

0.04 i.e. virtually no correlation at all.
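There is a standard way to see how little the first two numbers constrain the third. Because a correlation matrix must be positive semi-definite, if corr(A, B) = r_ab and corr(B, C) = r_bc, then corr(A, C) must lie within r_ab·r_bc ± sqrt((1 − r_ab²)(1 − r_bc²)). The sketch below applies this to the oil figures above; the function name is mine.

```python
import math

def implied_correlation_range(r_ab, r_bc):
    """Range of corr(A, C) consistent with corr(A, B) = r_ab and corr(B, C) = r_bc.

    Follows from the 3x3 correlation matrix having to be positive semi-definite.
    """
    centre = r_ab * r_bc
    half_width = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    return centre - half_width, centre + half_width

# Inventories vs oil = -0.78 and oil vs US dollar = -0.44 (figures from the text).
low, high = implied_correlation_range(-0.78, -0.44)
print(f"corr(inventories, USD) could be anywhere from {low:.2f} to {high:.2f}")
```

The range is roughly −0.22 to 0.91, so 0.04 is entirely consistent with the two strong-looking correlations. The sign of the third correlation is only pinned down when r_ab² + r_bc² > 1, i.e. when both links are very strong.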

Data issues

A. The problem of selective attention

Jordan Ellenberg’s “How Not to Be Wrong” gives this good example of Berkson’s fallacy:

“Why do literary snobs believe that popular books are badly written?”

Let’s imagine a world in which half the books are popular and half the books are unpopular.
Only 20% of the books are good, with 80% being bad.
Let’s assume no relationship (correlation) between those two variables.

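With 100 books, we would get the grid below:

Good and popular: 10
Good and unpopular: 10
Bad and popular: 40
Bad and unpopular: 40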

However, who pays attention to books which are both bad and unpopular?
No one

From a literary snob’s point of view, you can redraw the grid with only the books they are aware of (i.e. the unpopular bad books are in a blind spot).

The grid they perceive looks like this:

Good and popular: 10
Good and unpopular: 10
Bad and popular: 40
Bad and unpopular: (not noticed)

In this case, they see the important statistics as:

Half of good books are unpopular (10/20)
80% of popular books are bad (40/50).
A bad book has a 100% chance (!!) of being popular (40/40)
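The selection effect can be checked directly. This sketch uses the book counts implied by the percentages above (100 books: 10 good and popular, 10 good and unpopular, 40 bad and popular, 40 bad and unpopular), drops the bad unpopular books to get the snob’s view, and computes the conditional shares and the correlation (phi coefficient) between "popular" and "good" in each grid. The helper names are hypothetical.

```python
import math

# Book counts implied by the example: keys are (quality, popularity).
FULL = {("good", "popular"): 10, ("good", "unpopular"): 10,
        ("bad", "popular"): 40, ("bad", "unpopular"): 40}

# The snob never notices bad, unpopular books.
PERCEIVED = {k: v for k, v in FULL.items() if k != ("bad", "unpopular")}

def is_bad(key):
    return key[0] == "bad"

def is_popular(key):
    return key[1] == "popular"

def share(grid, condition, given):
    """P(condition | given), with both predicates applied to (quality, popularity) keys."""
    base = sum(v for k, v in grid.items() if given(k))
    return sum(v for k, v in grid.items() if given(k) and condition(k)) / base

def phi(grid):
    """Correlation (phi coefficient) between the 'popular' and 'good' indicators."""
    n = sum(grid.values())
    p_pop = sum(v for k, v in grid.items() if is_popular(k)) / n
    p_good = sum(v for k, v in grid.items() if not is_bad(k)) / n
    p_both = grid.get(("good", "popular"), 0) / n
    return (p_both - p_pop * p_good) / math.sqrt(p_pop * (1 - p_pop) * p_good * (1 - p_good))

for name, grid in (("full", FULL), ("perceived", PERCEIVED)):
    print(name, "P(popular | bad) =", share(grid, is_popular, is_bad),
          "corr =", round(phi(grid), 2))
```

In the full grid a bad book is popular half the time and the correlation is zero; in the perceived grid the same question gives 100%, and a strong negative correlation (about −0.63) between popularity and quality appears purely from what was left out.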

Conclusion “people have terrible taste and to make some money I should write a bad book!”

Given the perceived data set, this conclusion would be solid. But looking at the entire data set, it’s clear that they are making a mistake.

We are in danger of doing this all the time in economics and finance. But finding a good example is hard as the whole point of a blind spot is that we tend to be unaware of it.
What is clear is that the choice of data set is critically important, and likely a far more important choice than the sophistication of the statistical tools you later apply.

B. Choosing a time window

A related and common problem in financial markets analysis is the biased selection of data.
As I wrote about more fully in Significance (https://appliedmacro.com/2017/06/12/the-misuse-of-significance/), analysts often want to produce statistics with compelling results.

For correlation-type analysis, the most common trick is the selection of the time window. Take the short- and long-term interest rate example from Part 3, “Uncorrelated does not mean unrelated”: we observed that the correlation is very close to zero for the last 5 years. However, if we extend the time window to the last 15 years, the correlation increases dramatically to 0.84, which is a decent relationship.

Conclusion

In these posts, I have discussed a number of ways in which correlation analysis is misunderstood and misused. Analysts often know these issues, but they still manage to fall into the traps. I have certainly been guilty of making all the mistakes above many times! I have tried hard to train my analysts to watch out for this sort of error in their work and encourage them to look for it in the work of others – it can be hard to spot once you’ve already worked hard on something.

Of course, these are not the only errors made with correlation in finance. The more serious mistakes follow from a more profound misunderstanding of correlation, which took me many years and a lot of painful experience to appreciate. I will turn to those in the next post.

The Misuse of Correlation Part 3

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” Mark Twain

Definition

Dictionaries have a range of broadly related definitions, mostly suggesting that the mathematical meaning and the everyday sense are identical. As we saw with the word “significance” https://appliedmacro.com/2017/06/12/the-misuse-of-significance/, when there is no clear distinction between statistical and everyday usage, it can lead directly to confusion and often to dangerous errors.

A few meanings:

Historically, the word likely comes from medieval Latin.
In modern common usage, it is “a mutual relationship or connection between two things”.
In statistics, it broadly means a “quantity measuring the interdependence of variable quantities”.
Here a difficulty arises: although statistics gives it this broad meaning, often (especially in economics or finance) when people say correlation they are talking about the Pearson correlation coefficient, also referred to as Pearson’s r.
Pearson’s r is the formula I gave you in the previous posts (https://appliedmacro.com/2017/06/28/the-misuse-of-correlation-part-1-quick-refresher-and-quiz/) and is specifically a measure of linear correlation between two variables.
It has a value between +1 and −1, where 1 is perfect positive linear correlation, 0 is no linear correlation, and −1 is perfect negative linear correlation.
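For reference, here is a minimal Python sketch of Pearson’s r, written out from the standard definition (the covariance of the two series divided by the product of their standard deviations) rather than from any particular library. The example inputs are made up to show the +1, −1 and "strong but not perfect" cases.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    r = sum((x - x_mean) * (y - y_mean))
        / sqrt(sum((x - x_mean) ** 2) * sum((y - y_mean) ** 2))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_ss = sum((x - x_mean) ** 2 for x in xs)
    y_ss = sum((y - y_mean) ** 2 for y in ys)
    return cov / math.sqrt(x_ss * y_ss)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))     # 1.0: perfectly linear, positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))     # -1.0: perfectly linear, negative
print(pearson_r([1, 2, 3, 4], [1, 4, 9, 16]))    # about 0.98: monotonic but not linear
```

Note the last case: y rises with every step of x, yet r is below 1 because r only measures how close the points are to a straight line.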

Issues

I would like to deal with well-known issues that still lead to common mistakes.

Uncorrelated does not mean unrelated
Correlation does not imply causation
Correlation is not transitive
Data issues

I’ll deal with the first two in the remainder of this post, and the last two in the next one, to keep the posts to a reasonable length.

Uncorrelated does not mean unrelated

In everyday usage, when people say things are uncorrelated, they generally mean there is no relationship between them. Let’s look at a couple of examples where this common logic error leads to dangerously incorrect conclusions:

A). Relationships can be non-linear

When a cannonball is fired, its path will form a parabola (OK, strictly only in a vacuum!). If we look for a relationship between height and time by drawing a chart, the relationship is very clear.

If we ran a Pearson’s r correlation analysis instead of drawing the picture, we would find ZERO correlation between height and time (taking the whole flight, from firing to landing at the same height). Other statistical correlation measures, such as Spearman’s rho, which looks for monotonicity, may pick up non-linear monotonic relationships, but they will not help you here either, because a parabola rises and then falls.
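Both claims are easy to check. This sketch uses an idealised, hypothetical trajectory, height = t × (20 − t) for t = 0 to 20 in arbitrary units (a parabola symmetric about its peak), and computes Pearson’s r and Spearman’s rho (Pearson’s r applied to ranks, with ties given their average rank) from scratch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # sorted positions i..j share their average rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Idealised cannonball: fired at t = 0, lands at t = 20 (arbitrary units).
times = list(range(21))
heights = [t * (20 - t) for t in times]

print(pearson_r(times, heights))     # essentially zero: no *linear* relationship
print(spearman_rho(times, heights))  # essentially zero: no *monotonic* relationship either
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0: Spearman does catch monotonic curves
```

Height is completely determined by time here, yet both statistics report nothing, which is exactly why filtering on correlation before looking at a chart is dangerous.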

If you are using a statistical analysis early on in your investigations, the absence of a measurable correlation, such as a linear one, can lead you to assume there is not any relationship and thus discard and ignore a very valid but non-linear relationship. This is a serious problem.

As with “significance”, using the same word for a mathematical term that we use in everyday language can lead to serious mistakes. Once explained in this way, it may seem obvious, but starting an analysis by filtering early, only looking for relationships with high correlations, remains shockingly common even among people who are aware of the problem.

B). There can be a logical relationship which is important

Take an example from financial markets: there is an intuitive connection between the short end and the long end of the bond market. If we look at US government bond yields, 2 year (x-axis) versus 10 year (y-axis), over the last 5 years (chart below), the correlation between their levels is virtually zero, and if we were to look only at correlation, we might falsely conclude there is no relation between them.

A more dramatic example with very dangerous consequences was the lack of a correlation between US house prices and the price of AAA tranches of mortgage backed securities before the crisis.

Before the crisis: correlation coefficient virtually zero

During the crisis: correlation coefficient = 0.96 (virtually 1!)

This was a very bad way to think about the relationship but a huge amount of money was invested on this poor assumption. It also relates to a serious logic flaw – just because something hasn’t happened in the past does not mean it can’t happen in the future.

Correlation does not imply causation

Everyone is taught this early in the study of statistics, often with an example like the one below of a spuriously high correlation, where the intuitive relationship suggests something rather different.

https://www.mathsisfun.com/data/correlation.html

Everyone knows that ice cream and sunglasses have a common driver, i.e. the weather. Perhaps less well known is how frequently this error is made in economics and finance, even in the upper echelons of academia and policy making.
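A common-driver relationship like this can be sketched with simulated data. Here a hypothetical "weather" index drives both ice cream and sunglasses sales, with no direct link between the two. Their raw correlation is high, but the partial correlation (correlating what is left of each after regressing it on the weather) is close to zero. The coefficients, noise levels and seed are arbitrary assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def residuals(ys, xs):
    """Residuals of a simple least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(3)
n = 5_000
weather = [rng.gauss(0, 1) for _ in range(n)]                 # hypothetical sunniness index
ice_cream = [2.0 * w + rng.gauss(0, 1) for w in weather]      # driven by the weather
sunglasses = [1.5 * w + rng.gauss(0, 1) for w in weather]     # also driven by the weather

raw = pearson_r(ice_cream, sunglasses)
# Partial correlation: correlate what remains after removing the weather effect from each.
partial = pearson_r(residuals(ice_cream, weather), residuals(sunglasses, weather))
print(f"raw correlation {raw:.2f}, controlling for weather {partial:.2f}")
```

The raw correlation comes out around 0.74, while the partial correlation is close to zero: once the common driver is accounted for, there is nothing left to explain.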

The policy impact and subsequent furore over the paper “Growth in a Time of Debt” by Carmen Reinhart and Ken Rogoff is a notable example. They found a correlation between national debt and growth rates, stating that “for levels of external debt in excess of 90%” of GDP, growth was “roughly cut in half”.

On both sides of the political spectrum, the calculated correlation had become all that mattered:

For those who wanted to reduce budget deficits in the US and UK, this was referred to as “conclusive empirical evidence” (Paul Ryan) and “convincing” (George Osborne). A strong correlation proved the case for austerity.
For their opponents, their attention was focused on the details of a data error which reduced the strength of the calculated relationship. The weak correlation proved there should be no restriction on debt levels.

Both sides of that argument were so simplistic, it was bizarre. This is not a fault of the original work (doing statistical analysis is a good idea); it is a fault of over-simplistic interpretations of its meaning.

The relationship between macro data and financial crises is similarly an area of extreme concern.

It may be true that there is a correlation between budget deficits and currency crises.

If you then conclude that budget deficits cause currency crises, then it is a quick jump to proposing that the way to prevent a currency crisis is to focus on the deficit and cut spending.

This, of course, fails to explore some crucial causal links. If budget problems and currency weakness are both manifestations of a common underlying problem, then treating one of the symptoms will not cure anything. Once again, an overly simple analysis based on correlations can lead to disastrous policy recommendations.