What’s the best method to calculate normalized power?
Thesis: There probably isn’t one, but the PeaksWare method certainly isn’t the best.
Background: The objective of a normalized power metric is to report a value for the relative ‘difficulty in output’ of executing an effort with a certain power profile. The specific desire is that this be representative of the metabolic and cardiovascular challenge of the effort; it is not generally used to assess neuromuscular, strength, or anaerobic difficulty. There have been additional metrics built off of Normalized Power, most notably V.I. for assessing whether or not an effort was well paced. I want to be specific that the criticism of NP algorithms that follows is an evaluation against the above-stated goal only, not of all of the various interpretations and rules of thumb based on it.
Normalized Power herein refers specifically to Normalized Power™, the most widely spread variant of this algorithm used by cyclists, trademarked in 2013 by PeaksWare LLC, the owners of TrainingPeaks and WKO software.
The PeaksWare method for calculating normalized power applies two pieces of logic and then averages the result. Part 1, the power sequence is smoothed to better represent the load on the cardiovascular system. Heart rate, breathing rate, and glucose consumption change smoothly over many seconds, more seconds than the adenosine triphosphate (ATP) and creatine phosphate (CP) energy systems in the muscles can supply. Part 2, the power sequence is weighted. The logic here is that the difficulty in sustaining the production of power goes up really quickly above threshold because limited reservoirs are being drawn down. Those reservoirs are basically blood sugar and blood oxygen content, and to a lesser degree, muscle and liver glycogen. They are not ATP and CP; those were theoretically addressed by the Part 1 smoothing. Part 3, this weighted sequence is averaged to get a single value.
The PeaksWare method of calculation is to use a 30 second square window moving average to pre-smooth the power sequence and use a 4th power weighting function on the smoothed profile. The TrainingPeaks guys claim that this algorithm combined with these parameters achieves the outcome of representing metabolic and cardiovascular challenge, and that the balance of these parameters is generally correct for everyone and generally correct for all variations of efforts.
This is the point where everyone’s bullshit alarm should be blinking red and the sirens sounding. The 30 second square window moving average is clearly not correct for everyone. Some people are absolutely murdered by power variations on the order of 10 seconds, some people are not. Early in a base phase of training, going a little “into the red” can be a workout ender and require very long recoveries, whereas in the midst of a good season when primed up for racing, an athlete can go deeper into the red and recover from it faster. That’s a change in anaerobic capacity that follows from training that system. Should the results of a metric that purportedly assesses metabolic and cardiovascular challenge be sensitive to the high variability between athletes in anaerobic capacity? No, the metric should be generally insensitive.
Quick test on the PeaksWare method. 10 minute ride total. 9.5 minutes spent at 50% of FTP with a 30 second sprint at 3x FTP in the middle of it. The average power is 63% of FTP. I would suggest that the normalized power for this effort should be lower than FTP. The athlete didn’t demonstrate the cardiovascular capacity to do anything above FTP. They showed an OK sprint. What does the PeaksWare method suggest for NP for this effort? 120% of FTP. I generally think that’s wrong.
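The quick test above can be sketched in a few lines of Python. This is a rough sketch, not PeaksWare's own code: it assumes 1 Hz samples expressed in units of FTP, a trailing 30 second window that averages whatever samples are available at the start of the ride, and a final 4th root to bring the averaged value back to power units. The function names are made up for illustration.

```python
def rolling_mean(power, window):
    """Trailing moving average; early samples average whatever is available."""
    out, running = [], 0.0
    for i, p in enumerate(power):
        running += p
        if i >= window:
            running -= power[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def peaksware_style_np(power, window=30, exponent=4.0):
    """Smooth, raise to `exponent`, average, then take the root (assumed step)."""
    smoothed = rolling_mean(power, window)
    mean_weighted = sum(s ** exponent for s in smoothed) / len(smoothed)
    return mean_weighted ** (1.0 / exponent)


# 1 Hz samples in units of FTP: 285 s easy, 30 s sprint, 285 s easy.
ride = [0.5] * 285 + [3.0] * 30 + [0.5] * 285

average_power = sum(ride) / len(ride)
normalized = peaksware_style_np(ride)
print(f"average power: {average_power:.3f} x FTP")
print(f"30 s / 4th power NP: {normalized:.2f} x FTP")
```

Under these assumptions the result should land close to the 120% of FTP quoted above, even though 95% of the ride was spent at half of FTP.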
Further, it’s my assessment (based on my collected power data alone) that the algorithm is more sensitive to the parameters of 30 seconds and 4th power than it should be, even if those parameters give good results for certain kinds of efforts. If we had a better algorithm for normalized power we wouldn’t need to memorize the laundry list of situations where “normalized power doesn’t apply”. The concept always applies; the algorithm, however, sometimes fails.
Onto a recent example: The 2upTT I competed in last week [link]. The effort is particularly appropriate to inform the discussion as it is both “max effort” and not perfectly even pacing on the short time scale but is relatively well paced on the long scale.
The plot below shows the NP reported (y-axis) for each value of the power averaging parameter (x-axis) and the power weighting (colour-axis). The dashed lines intersect at the PeaksWare parameters, which give a value of 336.8 W normalized power for that effort. Is it reasonable? Yes, it’s a reasonable evaluation of the effort. BUT, I do want to highlight a couple of features on this plot. For any weighting (colour), the output curve is much steeper around 30 seconds than around 60-90 seconds. The meaning is that for *some* efforts the PeaksWare NP algorithm is quite sensitive to its parameters in the region where it is used.
Followup plot to the above. I put the colour-axis from the above plot on the Y-axis (didn’t adjust colours) and then highlighted all of the combinations of exponent and an averaging window that give the same result as the PeaksWare parameters. Basically, there’s a tradeoff, the more power smoothing you apply to the raw data, the bigger the weighting exponent you need to boost the value for NP up. The less smoothing you apply, the less you need to boost the weighting of the hard efforts. The argument for these parameters cannot be made from one effort, and I am not making one based on the 2upTT being analyzed. Just showing that you can get the same answer many different ways.
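For anyone who wants to build the same kind of parameter map from their own ride file, here is a rough sketch of the sweep. The power series is a synthetic stand-in (20 second surges every two minutes), NOT the 2upTT data, and the grid values, the 1 W tolerance, and the function name are arbitrary choices for illustration. It assumes 1 Hz samples and the same smooth, weight, average, root sequence described earlier.

```python
def np_variant(power, window, exponent):
    """Moving-average NP with an arbitrary window (s) and weighting exponent."""
    running, total = 0.0, 0.0
    for i, p in enumerate(power):
        running += p
        if i >= window:
            running -= power[i - window]
        total += (running / min(i + 1, window)) ** exponent
    return (total / len(power)) ** (1.0 / exponent)


# Synthetic placeholder ride in watts (not real data): 20 s surges every 2 min.
power = ([400.0] * 20 + [310.0] * 100) * 10

windows = [5, 10, 15, 30, 45, 60, 90]
exponents = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
grid = {(w, e): np_variant(power, w, e) for w in windows for e in exponents}

reference = grid[(30, 4.0)]
# Parameter pairs whose NP is within 1 W of the 30 s / 4th power value.
matches = [(w, e) for (w, e), v in grid.items() if abs(v - reference) < 1.0]

for w in windows:
    print(w, [round(grid[(w, e)], 1) for e in exponents])
print("within 1 W of the 30 s / 4th power value:", matches)
```

Each printed row is one averaging window and each column one exponent, so reading along a row shows the effect of the weighting, and the `matches` list is a coarse version of the equal-NP band highlighted in the follow-up plot.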
The question of “why the 4th power” is a glaring one. That rate of scaling is a red flag for me. It may be the most appropriate parameter to put into an algorithm with flawed logic to yield correct results. That doesn’t mean it is a good solution to the overall objective. Let’s assess for a moment what a 30 second 3x FTP sprint thrown into the effort should mean for normalized power. The PeaksWare algorithm is going to weight a portion of that effort as 81 times (3^4) as demanding as continuing to ride at FTP. Considering 3x the power was transferred, the effort is really weighted “up” by 27 times (81/3). Does the body really respond cardiovascularly by such an enormous factor? My experience is no. Substrate consumption efficiency is measured as degraded in the lab when you draw down CP, but it’s not a factor of 27. A factor of more like ~4-5 seems more appropriate. That parameter doesn’t get the “correct” result in the PeaksWare algorithm, but it could mean that the algorithm and parameter are co-broken and compensating for one another.
Final critique: PeaksWare provides no satisfactory analog for instantaneous NP. Such a concept shouldn’t be impossible. As they’re not in the business of providing ANT+ scraping and display to head units (like Garmin for example) they have largely evaded this shortfall. If you ride along with some power variability, it is not logically impossible to assess what the instantaneous draw on the cardiovascular/metabolic systems in your body is/are. Instead of spouting that “instantaneous NP has no meaning”, it’s more appropriate to make your NP algorithm provide the meaning that is logically connected to the concept.
Now, it’s easier to critique than to provide solutions… and I am sure to be critiqued for the above because people love to get religious about their power numbers. So, I’ll present an alternative.
I want to draw on first principles for muscle/O2 transport/substrate consumption energy systems. I am going to guess the weighting factor a priori. The argument is that burning anaerobic fuel is done at a discounted efficiency compared with aerobic fuel burn. When I ask muscles to generate power above FTP, I’ll agree that I’m going into debt, but it’s not the 27x debt from an exponent of 4, it’s more like a 4x or 5x debt. Consider that theoretical 3x FTP sprint, which an athlete can generally do with cadence on flat ground (reasonably achieved with the CP system, as opposed to a strength/neuromuscularly limited 4-5x FTP sprint that also relies on torque and may only be achievable sprinting uphill). For that sprint the exponent should be between log(4×3)/log(3) = 2.26 and log(5×3)/log(3) = 2.46. If you think you can only sprint at 2.5x FTP, maybe the exponent is 2.5-2.7, but then you’re probably getting old, or you need to work on getting your gainz!
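The exponent arithmetic can be checked directly. The sketch below assumes the "debt" is defined the same way as the 27x figure for the 4th power: the weighted value divided by the power ratio itself, so that ratio^w = debt × ratio. The function name is made up for illustration.

```python
import math


def weighting_exponent(power_ratio, debt_factor):
    """Exponent w such that power_ratio ** w == debt_factor * power_ratio."""
    return math.log(debt_factor * power_ratio) / math.log(power_ratio)


for ratio in (3.0, 2.5):
    low = weighting_exponent(ratio, 4.0)   # 4x debt
    high = weighting_exponent(ratio, 5.0)  # 5x debt
    print(f"sprint at {ratio} x FTP: exponent between {low:.2f} and {high:.2f}")
```

The same function run backwards is a sanity check on the PeaksWare case: an exponent of 4 at 3x FTP corresponds to a debt factor of 3^4 / 3 = 27.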
Then that debt has the potential to be repaid as you work under FTP as extra O2 and glucose are delivered. How long that takes is basically an assessment of how long it takes you to catch your breath after a sprint. Coach Corey typically wanted to know peak HR and HR 1 minute after cresting Emily Murphy Hill (2min @ ~FTP into 40sec max effort sprint), which was certainly not resting HR, but I was usually back to zone 2 with a coasting/pedaling recovery and sometimes all the way back to zone 1. I don’t really have any other assessment for how long it takes to catch my breath except for that one example. It doesn’t matter so much, whether the HR or breathing rate is back down, but those are the simple markers that your body is generally not still trying to “catch up” from an anaerobic effort for much more than a minute after the fact.
OK, so one simple proposal for power normalization is that you would weight power numbers with an exponent (w) and then take an exponentially weighted moving average (EWMA) with a time constant T. The weighting is done first, representing the effect of the instantaneous cardiovascular efficiency of the effort. The time averaging models the impact on the breathing rate or HR over time. There are assumptions here, but without growing the model to include three parameters I don’t have a bright idea for a solution. The appropriate parameters are guessed to be an exponent of 2.36 and an EWMA weight of maybe 1/20 or 1/30 each second. The impulse response for a weight of 1/20 will have decreased by 80% well within the minute (after roughly 32 seconds), which seems appropriate. For a weight of 1/30 it will have decreased by 80% well before 90 seconds (after roughly 48 seconds). The summary metric for this normalized power for an effort is simply the average of the instantaneous normalized powers.
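A minimal sketch of this proposal, under stated assumptions: samples arrive at 1 Hz, the EWMA weight is 1/T per second, the filter starts settled at the first sample, and each smoothed value is brought back to power units with a w-th root. The root is not spelled out above, but it is what makes the instantaneous values comparable to watts or to multiples of FTP. Function names are hypothetical.

```python
def instantaneous_np(power, exponent=2.36, time_constant=25.0):
    """Weight each sample, EWMA the weighted values, convert back to power units."""
    alpha = 1.0 / time_constant        # per-second EWMA weight
    state = power[0] ** exponent       # assumption: start settled at the first sample
    out = []
    for p in power:
        state += alpha * (p ** exponent - state)
        out.append(state ** (1.0 / exponent))  # assumed root back to power units
    return out


def summary_np(power, exponent=2.36, time_constant=25.0):
    """Summary metric: the plain average of the instantaneous values."""
    inst = instantaneous_np(power, exponent, time_constant)
    return sum(inst) / len(inst)


def seconds_to_decay(time_constant, fraction=0.8):
    """Whole seconds until the EWMA impulse response has dropped by `fraction`."""
    alpha, remaining, seconds = 1.0 / time_constant, 1.0, 0
    while remaining > 1.0 - fraction:
        remaining *= 1.0 - alpha
        seconds += 1
    return seconds


print("80% decay, weight 1/20:", seconds_to_decay(20), "s")
print("80% decay, weight 1/30:", seconds_to_decay(30), "s")
print("steady 250 W ride:", round(summary_np([250.0] * 120), 1), "W")
```

A perfectly steady ride comes back unchanged, which is the minimum any normalization scheme should satisfy, and the decay helper is a quick way to see how long a given weight keeps an anaerobic effort "on the books".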
To start to analyze this algorithm, let’s map this normalized power metric against the parameter space for the same TT. I am using the denominator from the exponential weighting as a proxy for the square window width from PeaksWare. They aren’t identical but they are analogs, so I am going to plot the same parameter space and use the same axis for NP even though it overflows the top with this version of the algorithm. Increasing the weighting of an anaerobic effort increases the normalized power, as expected. The larger weightings give rise to much larger values; the cause is the order of operations, weighting then time averaging vs time averaging then weighting. It is easy to observe from the plot that increasing the time averaging also increases the normalized power. With the PeaksWare method, you don’t get this effect “in general”, although in some cases you will. The interpretation is based on the principle of what we modeled. That is: if you believe the impact of going anaerobic takes longer to recover from (longer time constant), you simultaneously believe that the cardiovascular performance requirement to make that effort is a higher benchmark. Interestingly, after 15-20 seconds of weighting, the algorithm becomes less sensitive to this parameter. The plotted values of exponent 2.3 and 2.4 are demarcated on this plot, showing that an NP estimate in the range of the one provided by the PeaksWare algorithm is achieved. It’s actually unnerving how close.
Now perhaps most interestingly: what is the instantaneous normalized power profile from the 2upTT? I am plotting here with a ^2.36 weighting and a 25 second EWMA time constant.
The trendline points out two really big things. The hardest part of team time trials is the sprint to get on the wheel of the person pulling through. Easy to see that when Will pulls through he is putting me in the hurt box big time, it takes the better part of my effort to recover from those spikes. They are prominent after rotation 1, 2, 3, 5, 7, 8, 9, 11, 12, 15… i.e. most of the swaps, I was most in the hurt-box after getting back on, not when I quit on the front. Interestingly, late in the race, it becomes more prevalent that I am resting when not on the front and going deep when I am on the front. The cause is basically that the speeds are higher due to the downhill and the draft is better.
Quick test on that sprint effort. 10 minute ride total. 9.5 minutes spent at 50% of FTP with a 30 second sprint at 3x FTP in the middle of it. The average power is still 63% of FTP. Instantaneous normalized power peaks at 2.6x FTP, which falls appreciably short of 3x FTP. That seems approximately correct to me, maybe a bit high. I had previously suggested that the normalized power for the effort as a whole should be lower than FTP, and the result is indeed 72% of FTP. Increasing the EWMA time constant from 25 seconds to 90 would blunt the peak instantaneous NP to only 1.8x FTP, and change the overall effort’s NP by only 1%. This is not wholly surprising; I had previously shown that after 15 seconds the algorithm is not terribly sensitive to changes in this parameter.
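The peak values and the 25 second overall figure in this test can be checked with a short sketch. It assumes 1 Hz samples in units of FTP, an EWMA weight of 1/T per second starting settled at the first sample, and a w-th root applied to each smoothed value; those details are assumptions, and the function name is made up for illustration.

```python
def instantaneous_np(power, exponent, time_constant):
    """EWMA of power ** exponent, converted back with the matching root."""
    alpha = 1.0 / time_constant
    state = power[0] ** exponent   # assumption: filter starts settled
    out = []
    for p in power:
        state += alpha * (p ** exponent - state)
        out.append(state ** (1.0 / exponent))
    return out


# 1 Hz samples in units of FTP: 285 s easy, 30 s sprint, 285 s easy.
ride = [0.5] * 285 + [3.0] * 30 + [0.5] * 285

inst_25 = instantaneous_np(ride, 2.36, 25.0)
inst_90 = instantaneous_np(ride, 2.36, 90.0)

peak_25 = max(inst_25)
peak_90 = max(inst_90)
overall_25 = sum(inst_25) / len(inst_25)

print(f"25 s time constant: peak {peak_25:.2f} x FTP, overall {overall_25:.2f} x FTP")
print(f"90 s time constant: peak {peak_90:.2f} x FTP")
```

Under these assumptions the peaks should come out near 2.6x and 1.8x FTP and the 25 second overall value near 72% of FTP, in line with the figures above.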
Disadvantages of this algorithm: If you’ve got a really lopsided effort, going kinda hard in one part and really hard in the other part, it’s not going to give you as much “credit” towards an overall normalized power as the PeaksWare strategy would/does. If you think that’s a big disadvantage I’ll propose that you were construing more from your NP numbers than you should have been doing in the first place.