When I was a kid, I was obsessed with sabermetrics. For those of you who went on dates in high school, sabermetrics is the analysis of baseball statistics to develop an advanced understanding of the game (like in Moneyball). I may not have gone to many school dances, but I could talk for 3 hours on the merits of Wins Above Replacement in baseball players before 1950.
Sabermetrics never did me much good in life… UNTIL TODAY. Because in this article, we’re going to analyze some advanced statistics for the Western States 100, with the help of some sabermetric principles along the way. My co-conspirator is Marshall Burke, associate professor of Global Environmental Policy at Stanford University, whose work on wildfire smoke you may have seen over the last few weeks in outlets like the New York Times. He’s back after last year’s time predictions, which were shockingly accurate. I didn’t ask him whether he went on dates in high school, but based on his statistics knowledge, I think we all know the answer.
Let’s break down this year’s data fun, starting with the best women’s performances ever.
Ann Trason is the Babe Ruth of Ultrarunning
In 1920, Babe Ruth hit 54 home runs, at a time when no other team in the league hit more than 50. That wasn’t even his best season. The aforementioned Wins Above Replacement aims to quantify how many wins a player adds to their team relative to a solid player that the team could theoretically add from the minor leagues. In 1923, the Babe had 14.1 WAR. For comparison, in those years when Barry Bonds took steroids and broke the entire game of baseball (for context, his on-base percentage was .609 in 2004, when the league average was .335), his highest WAR was 11.9.
In ultrarunning, Ann Trason is Babe Ruth, but better.
(Also, fun fact: Barry Bonds is an avid cyclist on Strava. For some reason, this makes me so, so happy.)
Here’s how Marshall conducted the thought experiment to determine the best performances ever. For each year starting in 1985, he fit a model on all the other years, leaving that year out, then predicted that year’s winning time from the overall trend in times and the overall field average performance on that day. It’s akin to a simplified Wins Above Replacement, comparing the winner to the middle of the pack.
Marshall’s rationale is that the overall gender-specific average is going to pick up the temperature effect and anything else that made that day fast or slow (snow, boats, mountain lions, Mercury in retrograde, etc.). And controlling for field average is probably better than controlling for average time in top-10 or top-20, since a fast leader might cause the elite times to be faster.
A model that considers the progression of performances over time and field average time explains more than 75% of the variation in winning times. So even though this is just a thought experiment, it’s coming from Marshall’s brain, which could be hooked up to a turbine to power a medium-sized city.
There is one more problem, though. How do we deal with athletes who have won multiple editions of the race? For example, Jim Walmsley’s 2018 performance could theoretically make his 2019 performance seem less remarkable since he’s competing against himself in the analysis. Marshall dealt with the problem by creating two models and letting us decide. What a boss! It’ll be a shame when his cerebellum is used to power microwaves in Albuquerque.
- The first model drops the winning times of every year that athlete won the race (“What would have happened in 2019 had Jim Walmsley not existed?”)
- The second model just drops the winning time in a single year (“What would have happened in 2019 had Jim missed that year?”).
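For the fellow dateless stat kids, the two approaches above can be sketched in a few lines of Python. This is not Marshall’s code or data. The linear form (winning time as a function of year and field average), the function names, and every number in the toy dataset are assumptions made up for illustration.

```python
from statistics import mean

def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (fine for tiny problems)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def leave_out_residuals(years, field_avg, win_time, winners, drop_all_wins):
    """Actual minus predicted winning time for each year (negative = faster than expected).

    drop_all_wins=True  -> first model: refit without any year that year's winner won.
    drop_all_wins=False -> second model: refit without only the year being predicted.
    Assumed model: win_time ~ intercept + a * year + b * field_avg.
    """
    yc, fc = mean(years), mean(field_avg)  # center predictors for numerical stability
    rows = [[1.0, yr - yc, fa - fc] for yr, fa in zip(years, field_avg)]
    residuals = []
    for i, row in enumerate(rows):
        if drop_all_wins:
            keep = [j for j, w in enumerate(winners) if w != winners[i]]
        else:
            keep = [j for j in range(len(rows)) if j != i]
        coef = fit_ols([rows[j] for j in keep], [win_time[j] for j in keep])
        predicted = sum(c * x for c, x in zip(coef, row))
        residuals.append(win_time[i] - predicted)
    return residuals

# Hypothetical toy data in minutes (NOT real Western States results).
years = list(range(2000, 2008))
field_avg = [1500, 1540, 1480, 1520, 1560, 1490, 1510, 1530]
winners = ["A", "B", "C", "D", "E", "D", "F", "G"]
win_time = [400 - 3 * (yr - 2000) + 0.5 * fa for yr, fa in zip(years, field_avg)]
win_time[3] -= 30  # made-up athlete D beats the toy trend by 30 minutes...
win_time[5] -= 30  # ...in both of D's wins

model_one = leave_out_residuals(years, field_avg, win_time, winners, drop_all_wins=True)
model_two = leave_out_residuals(years, field_avg, win_time, winners, drop_all_wins=False)
```

In this toy setup, the first model scores both of D’s wins at 30 minutes faster than predicted, because every remaining year sits exactly on the made-up trend. Under the second model, D’s other win stays in the fit and shifts the prediction, which is the competing-against-yourself problem described above.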
Now we have the context to think about the immortal Ann Trason. She won Western States 14 times, in bonkers performances that put her in the overall top six eight times. Let’s start with the model that predicts winning times assuming that the year’s winner never existed, since Marshall thinks it’s the fairest to outliers. We included the average finishing time and high temperature in Auburn for context. Prepare to have your mind blown.
KABOOM, there goes your mind! Ann was so far ahead of her time that she has run 13 of the best 14 performances ever based on the model. Here’s the second model, which assumes the winner didn’t compete in that year only.
Ann still has 12 of the top 20 performances ever, but she is dethroned at the top spot by legend Ellie Greenwood’s course record. While I am not a statistician, I actually think this model might be fairest because Ann won so many editions of the race that we would be tossing a lot of data. Ellie’s time is historic, and my gut tells me it’s the best run ever, maybe in the entire sport. I am biased toward the present, though.
And all of that brings us back to baseball. If we take WAR at face value, it’s hard to argue against Babe Ruth being the greatest of all time. However, baseball has some unique considerations. Players back then had names like Rock Saw McGee, they threw fastballs that couldn’t break glass, and, disgustingly, the game was segregated. Babe Ruth would have hit fewer home runs if he had to face Smokey Joe Williams in 1930, or an exploding 100-mile-an-hour cut fastball in the modern era. He probably would have still been quite good, but no way he’s the clear-cut greatest.
Ultrarunning is a lot different, of course. Here, we’re talking the 1990s and early 2000s, not 100 years ago. And the segregation piece is not directly relevant (though trail running has a long way to go with inclusion). But I do think that more recent athletes might be getting penalized for the advances of the sport more generally, as the average finishing time in the middle of the pack might have gone from someone who had never heard of a hill stride in 1994 to someone who reads everything about training theory today. Yes, in this formulation, I think hill strides are the most statistically significant variable. Is there anything hill strides can’t do?!
In addition to training theory, it’s possible that increased use of cooling, a larger talent pool, and better equipment could be driving down average times. But no matter how you slice it, Ann set records that will never be broken. 100 years from now, my great-great grandson’s coach will probably be writing an article on how Ann Trason is untouched historically.
Quickly, though, let’s highlight some of those modern performances. Every year since 2018, it has taken a historically stellar performance to win. Courtney, Clare, Beth, and Ruth, in chronological order, have beaten statistical expectations by 20-40 minutes. That informs my takeaway for this year for women. If you want to be in the top-5, you’re racing the competition. If you want to win, you’re racing history.
We can’t run this exact model before race day, since it needs the field’s average finish time, but we can use temperature and the same historical trends to predict what time the winner will run. The current high forecast in Auburn, CA is a downright temperate 78 degrees F. Marshall’s model would predict that the winning woman will finish in 17:06. If it’s 84 degrees, that time would be 17:18. It’s going to be snowy in the high country, so it’s possible that these predictions need to be tossed aside. But 20 minutes faster than 17:06 is 16:46.
Ellie Greenwood’s GOAT performance is 16:47. Get your popcorn popping.
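If you want to check the napkin math, here is a small Python sketch using only the numbers quoted above. The helper names are made up, and treating the temperature effect as a straight line between the two quoted forecasts is an assumption for illustration, not a description of how Marshall’s model works.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' finish time to total minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def to_hhmm(total_minutes):
    """Convert total minutes back to 'H:MM'."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

pred_78 = to_minutes("17:06")  # predicted women's winning time at a 78 F high
pred_84 = to_minutes("17:18")  # predicted women's winning time at an 84 F high

# Implied sensitivity between the two quoted forecasts (assumes a straight line).
minutes_per_degree = (pred_84 - pred_78) / (84 - 78)

# Recent winners have beaten expectations by 20-40 minutes; take the low end.
beat_by_20 = to_hhmm(pred_78 - 20)

# Gap to the course record as quoted in the article (rounded to the minute).
margin_under_record = to_minutes("16:47") - (pred_78 - 20)
```

That works out to 2 minutes per degree, and a 20-minute beat of the 78-degree prediction lands at 16:46, one minute under the record as quoted.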
Jim Walmsley is a Superhero
In last year’s analysis, we had to run predictions with Jim and without Jim. The problem? Jim breaks equations like he breaks records.
We continued that trend for the time predictions, but for the best all-time performances, we don’t see the same effect as we saw with Ann. While Ann is like Babe Ruth, Jim might be like Pedro Martinez or Sandy Koufax, demonstrating absolute dominance, but at a time when the game was a bit more developed. Here are the best performances ever with all of that athlete’s performances omitted, which will be the only graph since the differences are marginal.
Jim’s 2021 win was his slowest time, but his best performance, and the best of all time by a ton. That year, it was hot and times were slower across the board. Except Jim, who ran an unthinkable time. And I love Mike Morton’s performance sneaking in with the 2nd spot! This ranking really shows the effect of temperature on men’s times, with some of the fastest times ever being relatively close to model predictions (as a reminder, every winning time is legendary in its own right, and all hate mail can go to Marshall and his big brain, P.O. Box 190 IQ Way).
When you take out Jim’s performances, most recent men’s races have aligned with model predictions, or even been slower, contrasting with what we see for women. In 2023, it’s safe to assume that the snow will lead to slower times than the model predicts, since the model doesn’t account for snowpack in the high country.
If the high in Auburn is 78 F, the model predicts the men’s winning time will be 14:24. Holy shit. Let’s take Jim out of the stats entirely to give a more accurate prediction. Without Jim, we’re looking at 14:41. And based on what we see historically, it’s safe to assume the time might be a bit slower than that unless we see an all-time great run.
At 84 degrees, those times go to 14:34 and 14:56. Even with snow, we’re probably looking at a barn burner this year.
David Roche’s Predictions
The temps are going to be cool, the competition is going to be hot, and the sentence structure is going to be predictable. My bold prediction is that the women’s course record is broken, along with the best performance ever using this model. I don’t want to name names, but if you know, you know.
For men, I think the winning time will not break 15 hours or the top-20 performance list. Yes, I am going against Marshall’s model. He’s on vacation in Iceland, so what’s he going to do about it? Iceland doesn’t have baseball! I’m not even sure Iceland has internet!
My rationale is that the men’s times are so fast, with a lower rate of improvement over time, and sub-15 pace in the first 30 snowy miles will eat athletes alive in the final 30 miles. So I think the men’s win will come from a slightly more conservative pacing strategy than the model would predict, or will involve a fade from model-predicted times. But again: my brain is filled with Rod Carew batting averages and Fergie lyrics, so maybe that’s crowding out some useful predictive neurons.
Let’s end with one more baseball stat: Fielding Independent Pitching, or FIP. I absolutely love the story behind this statistic, so here I am telling you about it in a trail running magazine. Believe in your dreams, dateless kids!
In the early 2000s, researcher Voros McCracken discovered a wild baseball quirk. Across seasons and massive datasets, the rate at which batted balls became hits showed almost no consistency from year to year for individual pitchers. In other words, the probability that a batted ball becomes a hit might be out of a pitcher’s control.
That seems wrong. Wouldn’t a great pitcher give up less solid contact, leading to fewer hits? And wouldn’t Tugboat McGee with the middle-school fastball have every pitch belted a billion miles per hour? Shockingly, it doesn’t seem like that’s the way it works. The league-average batting average for balls in play is relatively stable across seasons, and individual deviations from that mean might just be luck, rather than skill (with some exceptions for certain pitchers).
FIP isolates what the pitcher actually controls: strikeouts, walks, hit-by-pitches, and home runs. It seems like that’s such a small part of the game, but those numbers alone added a depth of understanding to pitching performance that informs how the game unfolds now. And I think FIP is a solid metaphor for how athletes can think about Western States.
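A quick aside for the stat-curious: the commonly published form of FIP weights those four outcomes and divides by innings pitched. Here is a minimal Python sketch of that formula. The function name, the example stat line, and the default league constant of 3.10 are assumptions for illustration; the real constant is recalculated every season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching, in its commonly published form.

    hr, bb, hbp, k: home runs, walks, hit-by-pitches, and strikeouts allowed.
    ip: innings pitched as a true decimal (200 and one-third innings is 200.333,
        not the box-score shorthand 200.1).
    constant: league-wide scaling constant; 3.10 is a placeholder assumption.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical season line for a made-up pitcher (not real data).
example = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
```

For that made-up line, the result is about 3.23. Notice what is missing from the inputs: every ball in play, which is where the luck lives.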
There are things you control: training, cooling, logistics, and mindset. There are many more things you don’t control: health, temperature, snow, trail conditions, and all the other vagaries of race day. Think about what you control, and try to remember that the uncontrollable variables are what make ultrarunning so special.
When there are two outs and the bases are loaded, just like when there is a make-or-break moment in an ultra, it doesn’t matter how much good luck or bad luck led to that moment. What matters are the fundamental elements of performance you can control.
There is a lot of luck involved in all of this stuff. But luck plays a much smaller role if you take a deep breath, regroup, and strike the next hitter out.
David Roche partners with runners of all abilities through his coaching service, Some Work, All Play. With Megan Roche, M.D., he hosts the Some Work, All Play podcast on running (and other things), and they answer training questions in a bonus podcast and newsletter on their Patreon page starting at $5 a month.