Trading System Performance Metrics
Your pocket glossary of the most important formulas for evaluating a trading system
If you're a trader building your own system, or if you want to follow someone else's, then you need to learn how to assess a strategy to find out if it's worth investing in. Unfortunately, it's not as simple as just looking at returns.
You need to look beyond profits and ask questions like:
- How much drawdown should I expect with this trading system?
- What do the day-to-day returns look like: are they steady, or do they all come at once?
- How correlated is it with the market? Is it basically just a leveraged S&P 500 fund, or is it really generating unique returns?
The questions don't stop there, and the math needed to answer them can be intimidating.
That's why we've broken down the most popular metrics for evaluating a trading system in the glossary below.
For each measurement we define:
- The term “in plain English” to make it easy to understand
- The formula with each of its elements defined
- “High vs. low values,” which tells you the outcome you want to see
- Limitations, or where the shortfalls or blind spots might be
We hope by the end of this glossary you'll feel more confident measuring the efficiency of trading systems and ultimately become a more informed trader.
Compound Annual Growth Rate (CAGR)
In plain English: CAGR stands for compound annual growth rate. It is the constant annual growth rate required to grow an investment from its beginning value to its ending value. This rate takes into account the compounding effect of reinvesting profits after each period.
$$\text{CAGR} = \left(\frac{EV}{BV}\right)^{1/n} - 1$$
- EV: ending value
- BV: beginning value
- n: number of years
High vs. low values: You want to see a high CAGR, as it means the trading strategy earned a greater annual return over time.
Limitation: As an investor, you earn dollars, not returns, so regardless of what your CAGR shows, you are still exposed to sequence-of-returns risk. Even a high positive CAGR can't guarantee that you will end your next period in positive territory. CAGR is one of the best long-term measures of a system's profitability, but it is not a reliable guide to total return in the near term.
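As a minimal Python sketch, the CAGR formula translates directly into code. The function names `cagr` and `equity_path`, and all dollar figures, are hypothetical, and `years` is assumed to be measured in years. The second half illustrates the sequence-of-returns limitation: two return sequences can share the same CAGR while taking very different paths.

```python
def cagr(beginning_value: float, ending_value: float, years: float) -> float:
    """CAGR = (EV / BV) ** (1 / n) - 1, with n measured in years."""
    if beginning_value <= 0 or years <= 0:
        raise ValueError("beginning_value and years must be positive")
    return (ending_value / beginning_value) ** (1.0 / years) - 1.0


def equity_path(start: float, yearly_returns: list[float]) -> list[float]:
    """Account value at the start and after each year, compounding each return."""
    values = [start]
    for r in yearly_returns:
        values.append(values[-1] * (1.0 + r))
    return values


# Hypothetical example: $10,000 grows to $20,000 over 5 years.
print(f"CAGR: {cagr(10_000, 20_000, 5):.2%}")  # CAGR: 14.87%

# Two hypothetical two-year sequences with identical CAGR but different paths.
path_a = equity_path(10_000.0, [0.50, -0.20])  # up first, then down
path_b = equity_path(10_000.0, [-0.20, 0.50])  # down first, then up
print(path_a)  # [10000.0, 15000.0, 12000.0]
print(path_b)  # [10000.0, 8000.0, 12000.0]
```

Both paths end at $12,000 (a CAGR of roughly 9.5%), yet an investor on path B was down 20% after the first year.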
Maximum Drawdown (MDD)
In plain English: Maximum drawdown is the largest peak-to-trough decline in a portfolio's value over a given period. It signifies the downside risk of the investment strategy.
$$\text{MDD} = \frac{\text{Trough value} - \text{Peak value}}{\text{Peak value}}$$
where each trough comes after its peak, and MDD is the largest such decline over the period.
High vs. low values: A smaller MDD (closer to zero) is preferred, as it implies the investment strategy does a good job of managing risk and keeping drawdowns from getting out of hand.
Limitation: MDD measures only the size of the largest decline, not how often drawdowns occur or how long the strategy stays underwater.
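A minimal Python sketch of the drawdown calculation follows, assuming the equity curve is a list of positive account values in time order. `max_drawdown` tracks the running peak (high-water mark); the hypothetical `longest_underwater` helper counts the longest stretch of periods spent below a prior peak, which is the dimension MDD alone ignores.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25)."""
    if not equity or min(equity) <= 0:
        raise ValueError("equity must be a non-empty list of positive values")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)               # running high-water mark
        worst = min(worst, value / peak - 1)  # decline from that peak
    return worst


def longest_underwater(equity: list[float]) -> int:
    """Longest run of consecutive periods spent below the prior peak."""
    peak = equity[0]
    current = longest = 0
    for value in equity:
        if value >= peak:
            peak, current = value, 0          # new high: the drawdown is over
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical month-end account values.
curve = [100, 120, 105, 90, 110, 130, 125]
print(f"Max drawdown: {max_drawdown(curve):.2%}")  # Max drawdown: -25.00%
print(f"Longest underwater: {longest_underwater(curve)} periods")  # Longest underwater: 3 periods
```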
Sharpe Ratio
In plain English: The Sharpe ratio is one of the most popular risk-adjusted return ratios. It shows the average return the portfolio earns above the risk-free rate per unit of risk taken. Risk is represented by the standard deviation (volatility) of the portfolio’s excess return.
$$\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p}$$
- Rp: return of portfolio
- Rf: risk-free rate, usually the yield of Treasury bills
- σp: standard deviation of the portfolio’s excess return
High vs. low values: A high Sharpe ratio is preferred as it shows the investment strategy is producing high returns relative to the risks taken.
Limitation: Because the denominator uses the standard deviation of all returns, not just losses, this ratio penalizes upside volatility (periods of strong positive performance) as well as downside volatility.
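The sketch below computes an annualized Sharpe ratio in Python from periodic returns. It assumes evenly spaced returns (monthly by default), a constant per-period risk-free rate, and the common convention of annualizing by the square root of the number of periods per year; the function name and the return series are hypothetical. The example also illustrates the limitation above.

```python
import math
import statistics


def sharpe_ratio(returns: list[float], risk_free_per_period: float = 0.0,
                 periods_per_year: int = 12) -> float:
    """Annualized Sharpe ratio: mean excess return / std dev of excess return."""
    excess = [r - risk_free_per_period for r in returns]  # Rp - Rf each period
    sigma_p = statistics.stdev(excess)                    # sample std dev
    return statistics.mean(excess) / sigma_p * math.sqrt(periods_per_year)


# Hypothetical monthly returns. "spiky" matches "steady" in every month
# except the last, where it earns far more.
steady = [0.010, 0.012, 0.008, 0.011, 0.009, 0.010]
spiky = [0.010, 0.012, 0.008, 0.011, 0.009, 0.080]
print(f"steady: {sharpe_ratio(steady):.2f}, spiky: {sharpe_ratio(spiky):.2f}")
```

Even though `spiky` earns at least as much as `steady` in every month, its Sharpe ratio comes out much lower, because the standard deviation in the denominator treats the big winning month as risk.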
MAR Ratio
In plain English: MAR is the ratio of a strategy’s annualized compounded return (CAGR) to its maximum drawdown.
$$\text{MAR} = \frac{\text{CAGR}}{\lvert \text{MDD} \rvert}$$
High vs. low values: A higher MAR ratio is better as it shows a higher return on the investment after adjusting for drawdown risk.
Limitation: Like maximum drawdown, MAR takes into account only the size of the largest decline, not how often drawdowns occur or how long the strategy stays in them.
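Combining the CAGR and maximum drawdown formulas, here is a self-contained Python sketch of MAR computed from one equity curve. It assumes the curve spans exactly `years` years; the function name and the sample values are hypothetical.

```python
def mar_ratio(equity: list[float], years: float) -> float:
    """MAR = CAGR / |maximum drawdown|, both measured on the same equity curve."""
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)           # running high-water mark
        mdd = min(mdd, value / peak - 1)  # deepest decline so far
    if mdd == 0.0:
        raise ValueError("no drawdown in this sample, so MAR is undefined")
    return cagr / abs(mdd)


# Hypothetical year-end values over three years: +30%, -20%, +30%.
curve = [100.0, 130.0, 104.0, 135.2]
print(f"MAR: {mar_ratio(curve, years=3):.2f}")
```

Here the CAGR is about 10.6% and the maximum drawdown is 20%, so MAR is roughly 0.53. Note that two curves with very different drawdown frequency or duration could produce the same number.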
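Correlation
In plain English: Correlation measures the degree to which two securities (or trading strategies) move in relation to one another. When measuring trading systems, the most interesting correlation is usually the one between the strategy and its benchmark.
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where X and Y are the two return series, Cov is their covariance, and σX and σY are their standard deviations.
The value of a correlation coefficient falls between -1 and 1: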
- Positive correlation: 1 is a perfect correlation in the same direction
- Negative correlation: -1 is a perfect correlation in the opposite direction
- No relationship: 0
High vs. low values: This can depend on what you are looking to achieve, but generally when assessing investment strategies, a lower correlation (or negative if you can find it) is most desirable in order to maximize diversification benefits.
Limitation: Because correlations are calculated from past price movements, they are not static and may not hold in the future. This becomes particularly noticeable during periods of high market stress, when correlations among many asset classes and investment strategies converge toward 1.
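For readers who want to compute this themselves, here is a Python sketch of the Pearson correlation coefficient, the usual measure behind the -1 to 1 scale above. It assumes two equal-length series of periodic returns (correlating raw price levels instead can be misleading). The hypothetical `rolling_correlation` helper recomputes the coefficient over a trailing window, one simple way to see how much the estimate moves over time.

```python
import math
import statistics


def correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors cancel, so plain sums of products are enough.
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(x: list[float], y: list[float], window: int) -> list[float]:
    """Correlation over each trailing window of `window` periods."""
    return [correlation(x[i - window:i], y[i - window:i])
            for i in range(window, len(x) + 1)]


# Hypothetical monthly returns for a strategy and its benchmark.
strategy = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.02]
benchmark = [0.01, -0.02, 0.02, 0.00, -0.03, 0.02, 0.01, 0.01]
print(f"Full-sample correlation: {correlation(strategy, benchmark):.2f}")
print([round(c, 2) for c in rolling_correlation(strategy, benchmark, 4)])
```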
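Win Rate
In plain English: Win rate tells you what percentage of a system's trades are profitable.
$$\text{Win rate} = \frac{\text{Number of winning trades}}{\text{Total number of trades}} \times 100\%$$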
High vs. low values: A high win rate is preferred as it shows the trading system has a higher probability of producing profitable trades.
Limitation: Win rate tells us only the percentage of trades that were profitable; it says nothing about the size of the winning trades relative to the losing ones.
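A minimal Python sketch of win rate, assuming a list of closed-trade profits and losses and counting breakeven trades as non-wins (a convention that varies between traders). The second example shows the limitation above with hypothetical numbers: a high win rate can coexist with a net loss.

```python
def win_rate(trade_pnls: list[float]) -> float:
    """Share of closed trades with a positive profit (breakeven counts as a non-win)."""
    if not trade_pnls:
        raise ValueError("no trades to evaluate")
    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    return wins / len(trade_pnls)


# Hypothetical closed-trade P&L in dollars.
trades = [250, -100, 400, -150, -80, 0, 320, -90]
print(f"Win rate: {win_rate(trades):.1%}")  # Win rate: 37.5%

# Hypothetical system that wins often but loses big: 75% wins, net loss.
small_wins_big_loss = [10, 10, 10, -100]
print(f"{win_rate(small_wins_big_loss):.0%} wins, net P&L {sum(small_wins_big_loss)}")  # 75% wins, net P&L -70
```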
Average Win-to-Loss Ratio
In plain English: The average win-to-loss ratio tells you how large your average winning trade is compared with your average losing trade. Ideally, this ratio is computed using R-multiples (see next definition) so you can compare win sizes based on the risk taken.
$$\text{Average win-to-loss ratio} = \frac{\text{Average winning trade}}{\lvert \text{Average losing trade} \rvert}$$
High vs. low values: A high average win-to-loss ratio is preferred as it shows the strategy makes more money on its winning trades than it loses on its losing trades.
Limitation: The average win-to-loss ratio measures the average size of winning and losing trades, but not how often the strategy wins.
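Here is a Python sketch of the average win-to-loss ratio, assuming trade results are already expressed in R-multiples (dollar P&L works the same way). The function name and trade lists are hypothetical; the second list illustrates the limitation: an impressive ratio paired with infrequent wins can still lose money.

```python
def avg_win_to_loss(results: list[float]) -> float:
    """Average winning trade divided by the absolute size of the average losing trade."""
    wins = [r for r in results if r > 0]
    losses = [r for r in results if r < 0]
    if not wins or not losses:
        raise ValueError("need at least one winning and one losing trade")
    avg_win = sum(wins) / len(wins)
    avg_loss = abs(sum(losses) / len(losses))
    return avg_win / avg_loss


# Hypothetical trade outcomes in R-multiples.
trades_r = [2.5, -1.0, 4.0, -1.0, -0.8, 3.2, -0.9]
print(f"Average win-to-loss: {avg_win_to_loss(trades_r):.2f}")

# One 10R winner and twelve 1R losers: a 10.0 ratio, yet a net loss of 2R.
rare_big_winner = [10.0] + [-1.0] * 12
print(avg_win_to_loss(rare_big_winner), sum(rare_big_winner))  # 10.0 -2.0
```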
R & R-Multiples
In plain English: R represents the amount of risk in a given trade, dictated by the strategy's initial stop loss. This creates a standard “risk per trade,” which then allows traders to express the outcome of every trade as an R-multiple. R-multiples are useful for comparing the efficiency and profitability of an investment strategy's entire trade history.
$$\text{R-multiple} = \frac{\text{Trade profit or loss}}{R}$$
High vs. low values: For a profitable trade, a high R-multiple is preferred, as it shows the trade returned a large profit relative to the initial risk. For a losing trade, a smaller loss is preferred, and generally you want your loss to be 1R or less.
Limitation: R and R-multiples tell you about a trade's outcome relative to the initial risk taken, but they do not give any information about how often each outcome occurs.
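To make R concrete, the Python sketch below computes R and R-multiples for long trades, assuming 1R per share is the distance from the entry price to the initial stop. The `Trade` class, its field names, and the prices are hypothetical and not tied to any particular platform.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """A hypothetical long trade: prices per share and position size."""
    entry_price: float
    stop_price: float   # initial stop loss that defines 1R
    exit_price: float
    shares: int

    def __post_init__(self) -> None:
        if self.stop_price >= self.entry_price:
            raise ValueError("a long trade's initial stop must sit below entry")

    @property
    def risk_per_share(self) -> float:
        return self.entry_price - self.stop_price

    @property
    def dollar_risk(self) -> float:
        # 1R in dollars: what the trade loses if the initial stop is hit.
        return self.risk_per_share * self.shares

    @property
    def r_multiple(self) -> float:
        # Profit or loss expressed in units of the initial risk.
        return (self.exit_price - self.entry_price) / self.risk_per_share


winner = Trade(entry_price=50.0, stop_price=48.0, exit_price=56.0, shares=100)
stopped = Trade(entry_price=50.0, stop_price=48.0, exit_price=48.0, shares=100)
gapped = Trade(entry_price=50.0, stop_price=48.0, exit_price=47.0, shares=100)
print(winner.dollar_risk, winner.r_multiple)  # 200.0 3.0
print(stopped.r_multiple, gapped.r_multiple)  # -1.0 -1.5
```

The `gapped` trade shows why a loss larger than 1R is a warning sign: the exit happened below the planned stop.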
Measuring Trading Performance in Practice
To put some of the terms we explored above into practice, we can look at the historical performance of the Trade Risk’s own Merlin Trading Strategy from January 2006 to March 2020.
We built two model portfolios to showcase Merlin's track record and profitability: a Margin portfolio, which can be up to 150% invested (50% margin), and a traditional IRA portfolio, which uses no leverage. Please refer to our Performance Disclosures to understand the underlying assumptions for each portfolio.
The Importance of Benchmarks
Choosing the right benchmark is a topic worth its own blog post, but for the scope of this guide, a benchmark is the standard that we will measure our trading systems against. Sometimes the best (and only) way to know if something is "good" or "bad" is to compare it to a benchmark, and eventually other trading systems, to find out how it stacks up.
The benchmark we selected for our portfolios is a 60/40 stocks/bonds allocation, which is widely regarded as the gold standard and default allocation for the majority of investors. If a trading system can't beat the benchmark, then why invest in the trading system? Just own the benchmark, and wait until you find a better system.
What do the terms say about each portfolio’s performance?
CAGR: Both the IRA and Margin portfolios outperformed the benchmark, with the Margin portfolio delivering the most impressive return on the initial investment.
Drawdowns: The 60/40 benchmark portfolio suffered the largest drawdown, at -27.96%; however, its second- and third-largest drawdowns were smaller than those of both the IRA and Margin Merlin portfolios. This is a good illustration of the maximum pain you may have to endure when trading any one of these portfolios. If you want to enjoy the high CAGR of the Merlin Margin portfolio, you’ll also have to accept its larger drawdown profile compared with the more modest IRA equivalent.
Sharpe: Both the IRA and Margin portfolios have a Sharpe ratio above 1. This is an excellent outcome, as it means their excess returns, in essence, outweigh the underlying volatility. Meanwhile, the benchmark 60/40 portfolio has a Sharpe ratio of 0.75. For a passive allocation, a 0.75 Sharpe is very strong, which is partly why we selected this benchmark: to set a high standard. Both Merlin portfolios cleared this high bar, with the Margin portfolio being the most attractive.
MAR: MAR is another risk-adjusted return measurement. Unlike the Sharpe ratio, which rewards consistent returns and penalizes a wide dispersion of returns, MAR ignores dispersion and focuses on the biggest loss relative to the “normal” annualized return. Once again, both the IRA and Margin portfolios significantly outperformed the benchmark, with annualized returns that outweigh their largest losses (a MAR above 1). In contrast, the annualized return of the 60/40 portfolio does not make up for its level of drawdown risk.
Correlation: The IRA and Margin portfolios have comparable positive correlations with the 60/40 benchmark portfolio. Since 0 represents no correlation and 1 represents perfect correlation, Merlin’s 0.65 falls a bit above the midpoint, in what we could consider moderate correlation with the benchmark. Thus, for traders, the Merlin portfolios offer a unique advantage: they generate strong risk-adjusted returns while behaving differently from the market rather than simply following it.
Trading System Performance Metrics Final Thoughts
It's important to remember that even if you find the perfect trading system, with all of the best metrics, you can still run into unexpected behavior with that system going forward. All of these measurements are backward-looking and based on historical data; the future may not look like the past, and markets and strategies can and will change.
Last but not least, while it can seem overwhelming and redundant to track all these different measurements, they really are necessary, and each plays a part in painting the full picture. In other words, none of these measurements on its own is sufficient to assess the profitability or efficiency of an investment strategy. A good example is the Sharpe ratio vs. MAR: both are risk-adjusted return measurements, but the Sharpe ratio defines risk by the dispersion of returns, while MAR defines it by the largest loss. By considering both in portfolio analysis, traders can bridge the gap between max drawdown and consistent system performance, and become better informed about the overall performance of the strategy.
Now that you have a good understanding of these important trading metrics, and have seen how well our Merlin trading system has held up versus its benchmark, head over to this page to learn how the strategy works.