
What’s a Hall of Fame Quarterback Worth?

Sidecar Falcon


Note: The images are missing here but are in the link. 


By almost all accounts, the Pittsburgh Steelers have had a horrible, no good, very bad offseason. They were forced into trading Antonio Brown, one of the greatest wide receivers ever to play the game, and had to sit idly by, watching Le’Veon Bell, a generational talent, hit free agency after a year spent on the sidelines in a contract dispute. That’s a ton of production lost on the offensive side of the ball. Or, in other words, trouble for the Pittsburgh Steelers.

But things aren’t so straightforward. Complicating matters is that both Brown and Bell had the luxury of playing with a Hall of Fame quarterback, Ben Roethlisberger, and, as much football analysis has shown, the QB is significantly more valuable than any other position. So who was responsible for all of those gaudy stats? And more importantly, what does that mean for Brown and Bell, both of whom will be playing with significantly worse quarterbacks next year?

Counterintuitively, we can’t answer these questions by examining specific pass catchers. Instead, we have to examine quarterbacks: specifically, quarterbacks similar to Roethlisberger and quarterbacks more similar to Brown and Bell’s new QBs. Ultimately, if we want to generalize our conclusions beyond the three aforementioned players, we need to convert our initial questions into a more general analytic question: What is the isolated impact of replacing a non-HOF QB with a HOF QB? By correctly interrogating this question, we can:

Determine the more important driver of offensive production (QB vs Pass Catcher)

Quantify the value of a Hall of Fame QB

Create an expectation for pass catchers that either join or depart from a team quarterbacked by a Hall of Famer

To answer the above question, I once again turned to the nflscrapR package to gather play-by-play data from 2009–2018 (you can find all the code for this analysis here).

Once I had my data cleaned and prepared, I began my analysis. Our initial question seems rather simple to answer: choose a KPI of interest and compare its levels for HOF QBs versus non-HOF QBs. But this would be the wrong approach. By measuring production in this way, we wouldn’t actually be measuring the QB, but rather the QB’s team. This underpins one of the fundamental issues with all sports analysis, and arguably none more than football: sports are dynamic systems, and therefore suffer from entanglement. In other words, measuring the QB’s performance is just another way of measuring the wide receivers’ performance, the offensive line’s performance, and the opposing defense’s performance, and vice versa.

This is obviously a problem when it comes to quantifying things. It may not matter to Patriots fans if the Patriots make Tom Brady good, or if Tom Brady makes the Patriots good (the Patriots are good either way!) but it matters when we’re trying to determine the value of each piece of the pie, so that we can make better decisions about how we pay players, whether or not to trade players, and how we should talk about the game itself. So how can we disentangle the entanglements? How can we know if Antonio Brown or Ben Roethlisberger was the driver of the Steelers offense? And how can we know what to expect when we replace a Hall of Fame QB with someone…normal?

The With or Without Analysis

Well, the short answer is that we can’t, at least not really. The game is too complex, there are too many moving parts, and we don’t have free rein to run an actual experiment within the confines of the league. But we can approximately separate passer from receiver using a relatively simple analytical method: the with or without (WoWo) analysis.

So what is WoWo? The name is fairly descriptive. At the most basic level, we’re going to compare every play in which Ben Roethlisberger (or one of our other HOF QBs) threw to Antonio Brown (or Le’Veon Bell, or any other WR / RB) against every play in which someone else threw to AB, Bell, or any other receiver Roethlisberger also threw to (i.e., with Roethlisberger vs. without Roethlisberger). We’re going to repeat this process for every WR-QB combo, paying particular attention to our HOF QBs, and from there we can start to generalize the impact of a Hall of Famer on receiver performance, because we’re theoretically controlling for all other variables. (This last part isn’t very accurate; in fact, we’re making a lot of assumptions, some of which may be flawed, and I’ll point them out as we go along.)
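As a rough illustration of the with/without split for a single QB, here is a minimal Python sketch. It is not the author's code (the original analysis was done in R with nflscrapR); it assumes a simplified, hypothetical play layout of (passer, receiver, epa) tuples and averages EPA per target with and without the chosen QB, keeping only receivers who appear on both sides.

```python
from collections import defaultdict
from statistics import mean


def wowo_split(plays, qb):
    """Split each receiver's targets into 'with qb' and 'without qb' groups.

    plays: iterable of (passer, receiver, epa) tuples (a hypothetical,
    simplified layout, not nflscrapR's actual column schema).
    Returns {receiver: (mean EPA with qb, mean EPA without qb)} for
    receivers targeted by qb AND by at least one other passer.
    """
    with_qb = defaultdict(list)
    without_qb = defaultdict(list)
    for passer, receiver, epa in plays:
        if passer == qb:
            with_qb[receiver].append(epa)
        else:
            without_qb[receiver].append(epa)
    # Only receivers seen on both sides give a with/without comparison.
    shared = with_qb.keys() & without_qb.keys()
    return {r: (mean(with_qb[r]), mean(without_qb[r])) for r in sorted(shared)}


# Toy data with made-up EPA values; "Other QB" and "Rookie WR" are placeholders.
plays = [
    ("Roethlisberger", "Brown", 0.5),
    ("Roethlisberger", "Brown", 0.25),
    ("Other QB", "Brown", 0.0),
    ("Roethlisberger", "Rookie WR", 0.75),  # never targeted by anyone else
]
print(wowo_split(plays, "Roethlisberger"))  # {'Brown': (0.375, 0.0)}
```

The rookie is dropped because there is no "without" sample to compare against, which is the same reason the analysis needs receivers who played with more than one QB.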

My first step was to find the set of players I felt reasonably reflected a “Hall of Fame” class of players. With that in mind, I isolated all passing plays from 2009–2018 for the following QBs (feel free to quibble with my selections, but this is what we’re going with):

Ben Roethlisberger

Tom Brady

Peyton Manning

Eli Manning

Aaron Rodgers

Philip Rivers

Matt Ryan

Drew Brees

Tony Romo

Russell Wilson

At least seven of the selected QBs are sure-fire Hall of Famers, while Romo, Ryan, and Wilson either have decent shots or were at some point considered elite.

These selections prove out when we examine our KPI of choice: Expected Points Added, or EPA (see the link for a detailed explanation; I used Ron Yurko’s version of the EPA model in this analysis). The Hall of Famers I selected averaged 188% more EPA per attempt than the non-HOF QBs over the same time frame (for context, the overall mean was 0.14 EPA per attempt).

So we’ve established that the Hall of Famers produce more EPA than the non-HOF group, but that doesn’t actually tell us about the quality of the QB versus the quality of the team. For example, the higher EPA for Hall of Fame QBs might simply reflect a league-wide trend in EPA over time. So let’s check that.

Okay, so the trend actually favors the non-HOF group over time, but that intuitively makes sense: we can only identify Hall of Famers by their past performance, which means our sample is inherently a little age-biased. In fact, two of the selected QBs retired during our study period after spending at least a year significantly injured. So the slight decline in HOF performance isn’t surprising, and the increase in non-HOF performance, while substantial, still pales in comparison.
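To check whether the HOF gap is just a league-wide trend, one could average EPA per attempt by season for each group and fit a simple trend line per group. A minimal sketch, assuming a hypothetical (season, passer, epa) tuple layout; the slope is plain least squares and is not necessarily what the original chart used.

```python
from collections import defaultdict
from statistics import mean


def mean_epa_by_season(plays, hof_qbs):
    """Average EPA per attempt for each (season, group) pair.

    plays: iterable of (season, passer, epa) tuples (hypothetical layout).
    """
    buckets = defaultdict(list)
    for season, passer, epa in plays:
        group = "HOF" if passer in hof_qbs else "non-HOF"
        buckets[(season, group)].append(epa)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def trend_slope(seasons, values):
    """Ordinary least-squares slope of values on seasons (EPA change per year)."""
    ms, mv = mean(seasons), mean(values)
    sxx = sum((s - ms) ** 2 for s in seasons)
    sxy = sum((s - ms) * (v - mv) for s, v in zip(seasons, values))
    return sxy / sxx


def group_trends(season_means):
    """Fit one slope per group from the output of mean_epa_by_season."""
    by_group = defaultdict(lambda: ([], []))
    for (season, group), value in season_means.items():
        by_group[group][0].append(season)
        by_group[group][1].append(value)
    return {g: trend_slope(s, v) for g, (s, v) in by_group.items()}
```

A negative slope for "HOF" alongside a positive one for "non-HOF" is the pattern described above: the gap narrows over time but does not close.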

Perhaps the higher EPA for our Hall of Famers can be explained more by how they play the game. It could be that either the HOF or the non-HOF group is more boom-or-bust, which could artificially help or hurt EPA (depending on your view of that strategy) relative to a more consistent approach. We can quickly check this by examining the density plots of both groups.

The plot to the left shows that the distributions are fairly similar in shape, which tells us that the overall styles aren’t likely to be that different. But notice that the peak to the left of the black line (zero EPA) is higher for the red group, while the “shelf” of EPA to the right of zero is a bit higher for the blue group: our Hall of Famers are just consistently producing higher EPA than their counterparts.
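The density plots aren't reproduced here, but a rough numeric companion is easy to sketch: a histogram density for each group, plus the share of attempts with positive EPA (the "shelf" to the right of zero). This plain histogram is a stand-in for the smoothed density curves a plotting library would draw; the function names are hypothetical.

```python
def histogram_density(values, lo, hi, bins):
    """Histogram-based density estimate on [lo, hi).

    Heights are normalized by the total number of values, so the bars
    integrate to the share of values that fall inside [lo, hi).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding pushing v past the last bin.
            counts[min(int((v - lo) // width), bins - 1)] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


def share_positive(values):
    """Fraction of attempts with EPA above zero."""
    return sum(1 for v in values if v > 0) / len(values)
```

Running both functions on the HOF and non-HOF EPA lists and comparing the bars right of zero gives a numeric version of the visual comparison above.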

Finally, before we dive into the full analysis, I wanted to double check that my selections — while strong in aggregate — also made sense at the individual level (and don’t act like you’re not curious who the best of my QBs are by our KPI of choice).

So these guys are pretty solid across the board. Hey, even Eli is worth nearly double the average non-HOFer in the data set (plus, his two Super Bowl MVPs virtually guarantee a Hall of Fame bust).

Now that we’ve established that our sample group is pretty reasonable, it’s time to begin our WoWo analysis to isolate the expected impact of a HOF QB in the passing game.

The first step was to isolate a reasonable set of receivers who played with one of my chosen Hall of Famers and at least one other QB. I made an arbitrary decision to limit the sample to receivers who were thrown at least 20 passes by two QBs, one of whom is in our HOF class. This leaves us with every pass thrown to this number of receivers per QB.
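The eligibility rule can be written as a small filter. A minimal sketch, assuming one (passer, receiver) pair per pass attempt and reading the rule as "at least 20 targets from each of at least two QBs, one of them in the HOF group"; the exact interpretation in the original code may differ.

```python
from collections import Counter, defaultdict


def eligible_receivers(targets, hof_qbs, min_targets=20):
    """Receivers thrown at least `min_targets` passes by two or more QBs,
    at least one of whom is in the HOF group.

    targets: iterable of (passer, receiver) pairs, one per attempt
    (hypothetical layout). hof_qbs: a set of QB names.
    """
    counts = Counter((receiver, passer) for passer, receiver in targets)
    qualifying_passers = defaultdict(set)
    for (receiver, passer), n in counts.items():
        if n >= min_targets:
            qualifying_passers[receiver].add(passer)
    return {
        receiver
        for receiver, passers in qualifying_passers.items()
        if len(passers) >= 2 and passers & hof_qbs
    }
```

The `min_targets` default mirrors the arbitrary cutoff of 20 described above; raising it trades sample size for more stable per-pair averages.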

One thing to note here is that there is overlap between QBs and receivers. For example, Brandin Cooks played with both Brady and Brees, and Emmanuel Sanders played with both Roethlisberger and Peyton Manning, among others in the data set. And many of the receivers played with multiple non-HOF QBs. When we expand the data set to the individual throws to each of these receivers, we’re left with 44,013 attempts, with some replication. For instance, we have to analyze Cooks’ targets four times: once “without Brady”, once “without Brees”, and another two times for “with” each HOF QB.
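The replication can be sketched by building one with/without comparison per HOF QB and copying each partner receiver's targets into it. A minimal sketch with a hypothetical (passer, receiver, epa) layout; note that here a throw from another HOF QB counts as "without" for the QB being evaluated, which matches the "someone else threw" description but is an assumption about the original code.

```python
def expand_wowo(plays, hof_qbs, receivers):
    """One with/without comparison per HOF QB, replicating targets as needed.

    plays: (passer, receiver, epa) tuples (hypothetical layout).
    Returns (hof_qb, receiver, is_with, epa) rows.
    """
    rows = []
    for qb in sorted(hof_qbs):
        # Receivers in the sample who were targeted by this HOF QB.
        partners = {r for p, r, _ in plays if p == qb and r in receivers}
        for passer, receiver, epa in plays:
            if receiver in partners:
                rows.append((qb, receiver, passer == qb, epa))
    return rows


plays = [("Brady", "Cooks", 0.5), ("Brees", "Cooks", 0.25), ("Goff", "Cooks", -0.25)]
rows = expand_wowo(plays, {"Brady", "Brees"}, {"Cooks"})
```

With the Cooks example, each of his three targets appears twice (once in the Brady comparison, once in the Brees comparison), giving two "with" sets and two "without" sets.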

Before we get to modeling the impact of HOF QBs, let’s just check the mean EPA for receivers when they were targeted by someone in our sample versus anyone else. These simple results are pretty interesting.

Eli…woof. Maybe the guy shouldn’t be in the Hall of Fame after all! Receivers were 17% less productive, on average, when they were targeted by Eli versus any other QB they played with. But outside of Eli, receivers averaged between 18% and 213% more EPA per target when playing with a HOF QB than when they played with someone else. Not too shabby.
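These percentages are relative changes, (With − Without) / Without. A minimal sketch of that calculation, pooling targets per HOF QB from hypothetical (hof_qb, receiver, is_with, epa) rows; whether the original pooled targets or averaged per receiver first isn't stated, so this is an assumption.

```python
from collections import defaultdict
from statistics import mean


def relative_lift_by_qb(rows):
    """Relative change in mean EPA per target, with vs. without each HOF QB.

    Returns {hof_qb: change}; 0.5 means 50% more EPA with the QB,
    -0.17 would mean 17% less.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for qb, _receiver, is_with, epa in rows:
        groups[qb][is_with].append(epa)
    lifts = {}
    for qb, g in groups.items():
        with_mean, without_mean = mean(g[True]), mean(g[False])
        if without_mean <= 0:
            # A relative change is not meaningful against a zero or negative baseline.
            raise ValueError(f"non-positive 'without' baseline for {qb}")
        lifts[qb] = (with_mean - without_mean) / without_mean
    return lifts
```

The guard matters because EPA can be negative; a percentage change computed against a negative baseline would flip sign and mislead.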

One thing I noticed in these results was that both the With and Without numbers tended to be on the high side: every group except Matt Ryan’s and Eli’s With receivers produced above the study-wide average EPA.

That can imply a few different things, the most obvious being that receivers who play with HOF QBs tend to be great, and therefore the receiver, not the QB, is driving the production. The fact that the mean Without EPA sits above the overall mean EPA suggests that this interpretation isn’t without merit. And that may be true! But ultimately, I find enough evidence of a drastic decrease in production when receivers play without one of our Hall of Famers to reasonably assign more credit to the QB than to the receivers, in general. If we really wanted to get to the bottom of the “credit” question, we could run (you guessed it) a WoWo analysis in the other direction, examining QBs with and without certain receivers. But that’s for another analysis…

In either case, the important thing to observe is not the level of production but rather the relative difference in production, and irrespective of the quality of the receivers, there is little doubt that they were more productive when playing with a HOF caliber QB. That brings us to the final step of our WoWo analysis: creating a statistical model to quantify and generalize the relative impact of replacing an average QB with a Hall of Famer.

To do this, let’s use a simple linear regression with a single independent variable: an indicator of whether or not the pass was thrown by a HOF QB. Note that this is an incredibly simple model that leaves out a ton of potential variables: age of receiver, age of QB, defense, season, week, weather…I could go on and on, but that’s not the point. Some of the more clever readers (and the statisticians) have probably already figured out the results of this model based on an earlier graph, but for everyone else, here you go.

Notice anything about the placement of the point estimates (the dots in the middle of the lines)? They’re the exact averages shown in the column chart just before! Because a linear regression on a single indicator variable models the mean of each group, it’s no different from taking the simple averages. But it does give us two crucial pieces of evidence: (1) the significance of the effect (p ≈ 0) and (2) the uncertainty of our mean estimates (the vertical lines surrounding the dots represent 95% confidence intervals). We can then convert the point estimates to a relative scale to draw our WoWo conclusion: the data suggest that HOF QBs generate 92% higher EPA per attempt than non-HOF QBs, all else held equal.
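The equivalence between a single-indicator regression and the group averages is easy to verify directly. A minimal plain-Python sketch (not the author's model code): the intercept recovers the non-HOF mean, the slope recovers the HOF minus non-HOF difference, and slope / intercept gives the relative lift. The interval here is a normal-approximation 95% CI for the difference (the slope), which assumes a large sample, rather than the per-group intervals in the chart.

```python
from math import sqrt
from statistics import mean


def dummy_regression(epa, is_hof):
    """OLS of EPA on a 0/1 HOF indicator, with a 95% CI for the slope."""
    x = [1.0 if h else 0.0 for h in is_hof]
    n = len(epa)
    mx, my = mean(x), mean(epa)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, epa))
    slope = sxy / sxx                 # HOF mean minus non-HOF mean
    intercept = my - slope * mx       # non-HOF mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, epa)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se = sqrt(sigma2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se": se,
        "ci95": (slope - 1.96 * se, slope + 1.96 * se),
        "relative_lift": slope / intercept,  # e.g. 0.92 for "92% higher"
    }


fit = dummy_regression([0.0, 0.5, 1.0, 1.5], [False, False, True, True])
```

On this toy data the non-HOF mean is 0.25 and the HOF mean is 1.25, so the fit returns an intercept of 0.25 and a slope of 1.0, exactly the two averages.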

This is a fairly stark conclusion, and one that appears pretty substantial…but if HOF QBs are worth nearly twice as much as non-HOF QBs, why are so many clear non-HOF QBs paid like our sample group? There are a whole bunch of reasons why this could be the case that I won’t get into, but perhaps our relative estimate isn’t that impactful on the actual field. After all, a 92% increase in effectiveness on a relatively low EPA might seem like a lot, but we’re still talking fairly small numbers, at least per attempt. So with that in mind, let’s translate our WoWo results into the one KPI that truly matters: wins.

Passing EPA and Win Probability

Converting our relative numbers to wins is fairly straightforward: we just need to find out how passing EPA predicts success. First, I collected all game outcomes from 2009–2018. Then I had to decide which variables to use to construct my win probability model. Since we’ve been focused on passing EPA, it makes sense to use it as our primary variable of interest, but we’ve been working with EPA per attempt, a rate statistic. While rate statistics are great for comparing efficiency, volume (counting) statistics are also extremely important for predicting the likelihood that a team will win.

As an example, let’s say we have two teams playing one another: Team A and Team B. Team A scores one TD for every 5 minutes of possession, while Team B averages one TD for every 15 minutes of possession. You’d rightly conclude that Team A is the more efficient scoring team. But if Team A only controls the ball for 10 minutes, and Team B has possession for the other 50 minutes, then, assuming both teams score at their average rates and convert every extra point, Team B would be expected to win 23 to 14. Volume, as well as efficiency, counts for a lot when the game is decided by a counting statistic (in this case, points).
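The Team A / Team B arithmetic, written out. The 7 points per touchdown is an assumption (every extra point made), which is what the 23 to 14 score implies.

```python
def projected_points(possession_minutes, minutes_per_td, points_per_td=7):
    """Points scored if a team scores at its average TD rate (assumes 7 points per TD)."""
    return possession_minutes / minutes_per_td * points_per_td


team_a = projected_points(10, 5)   # 2 TDs in 10 minutes
team_b = projected_points(50, 15)  # about 3.33 TDs in 50 minutes
print(round(team_b), "to", round(team_a))  # 23 to 14
```

Team A scores 1.4 points per minute of possession versus roughly 0.47 for Team B, yet loses because it has the ball for only a sixth of the game.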

So with that in mind, I decided to calculate the Total Passing EPA for each team in each game, and use that as the foundation for our win probability model.
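Summing, rather than averaging, per-play EPA is what turns the rate statistic into a volume statistic. A minimal sketch, assuming a hypothetical (game_id, posteam, play_type, epa) tuple layout; the field names are placeholders, not a claim about nflscrapR's schema.

```python
from collections import defaultdict


def total_passing_epa(plays):
    """Sum passing EPA for each (game_id, posteam) pair.

    plays: iterable of (game_id, posteam, play_type, epa) tuples
    (hypothetical layout). Non-pass plays are ignored.
    """
    totals = defaultdict(float)
    for game_id, posteam, play_type, epa in plays:
        if play_type == "pass":
            totals[(game_id, posteam)] += epa
    return dict(totals)


plays = [
    ("g1", "PIT", "pass", 0.5),
    ("g1", "PIT", "run", 1.0),    # excluded: not a pass
    ("g1", "PIT", "pass", 0.25),
    ("g1", "CLE", "pass", -0.5),
]
totals = total_passing_epa(plays)
```

A team that throws more will accumulate more total EPA, positive or negative, which is exactly the volume effect the Team A / Team B example describes.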

Once I had calculated each team’s Total Passing EPA and recorded whether or not it won the game, I fit a simple logit model using only those two variables (the game outcome and Total Passing EPA) to understand the influence of passing production on the likelihood of victory. With the model fit, I could then apply a 92% increase in Total Passing EPA to simulate what replacing a non-HOF QB with a HOF QB should do in terms of win probability.
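The win-probability step can be sketched as a one-variable logistic regression. To stay dependency-free, this sketch fits it with Newton-Raphson in plain Python; a statistics library's logistic regression would normally do the same job. The function names are hypothetical, and the 0.92 lift is simply the WoWo estimate applied multiplicatively to Total Passing EPA, as described above.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logit(x, y, iterations=25):
    """Fit P(win) = sigmoid(b0 + b1 * x) by Newton-Raphson on the log-likelihood.

    x: Total Passing EPA per team-game; y: 1 for a win, 0 otherwise.
    Assumes the data are not perfectly separable (otherwise the MLE diverges).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p           # score (gradient) for the intercept
            g1 += (yi - p) * xi    # score for the slope
            h00 += w               # entries of the Fisher information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def win_prob_with_lift(b0, b1, total_epa, lift=0.92):
    """Win probability after scaling Total Passing EPA by (1 + lift).

    Caveat: scaling a negative total makes it more negative.
    """
    return sigmoid(b0 + b1 * total_epa * (1.0 + lift))
```

The caveat in `win_prob_with_lift` is worth keeping in mind: a multiplicative lift helps most where passing EPA is already positive, which is one reason not every QB swap is created equal.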

The results are striking.

So on average, QBs produced about 4.8 total expected points from the passing game, which corresponds to roughly a 50% win probability. This makes sense, since the average QB, by definition, should fall right in the middle. If we apply our all-else-equal 92% lift for HOF QB performance, we should expect about 9.3 points from the passing game, which should win the team about 62% of the time: 12 percentage points higher, or a 24% relative increase in win probability. This suggests that replacing an average QB on an average team with a Hall of Fame QB should be worth about 1.92 more wins over the course of a 16-game season (0.62 * 16 - 0.5 * 16). And since the average team is expected to win 8 games, this pushes the expected record to roughly 10–6.
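The arithmetic in this paragraph, written out. The inputs are the rounded figures quoted above, so the outputs are approximate; the 12-point gain is an absolute (percentage-point) change, while 24% is the relative change.

```python
GAMES = 16
base_wp = 0.50  # average QB on an average team
hof_wp = 0.62   # after applying the 92% lift to Total Passing EPA

pp_gain = hof_wp - base_wp                      # absolute gain in win probability
relative_gain = pp_gain / base_wp               # relative gain in win probability
extra_wins = hof_wp * GAMES - base_wp * GAMES   # 0.62 * 16 - 0.5 * 16
expected_wins = base_wp * GAMES + extra_wins

print(f"+{pp_gain * 100:.0f} percentage points ({relative_gain:.0%} relative)")
print(f"{extra_wins:.2f} extra wins, {expected_wins:.2f} expected wins")
```

Rounding 9.92 expected wins gives the 10–6 record cited above.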

This is nothing to sneeze at — according to FiveThirtyEight, 8–8 teams only make the playoffs about 10% of the time, while 10–6 teams make the playoffs 88% of the time. That’s a huge increase! So Hall of Fame QBs, when they theoretically replace non-HOF QBs, and everything else is held equal (impossible, but still!) should increase a team’s chance of making the playoffs by an astonishing 780%. Of course, there are a ton of other variables that we ignored, but this is still an illustrative exercise that is — at least to me — a pretty convincing argument for finding a HOF talent and never letting him go.
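The 780% figure is a relative change in playoff probability rather than a change in percentage points (the absolute jump is 78 points). A quick check using the two quoted rates:

```python
p_8_8 = 0.10   # playoff rate for 8-8 teams, as quoted above
p_10_6 = 0.88  # playoff rate for 10-6 teams, as quoted above

relative_increase = (p_10_6 - p_8_8) / p_8_8
print(f"{relative_increase:.0%}")  # 780%
```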

Finally, I wanted to point out that not all non-HOF-for-HOF QB swaps are created equal. If a terrible team (say, the Bills) were to swap their QB (Josh Allen) for an all-time great like Tom Brady, they would get an enormous relative lift in passing production. But they would still be a terrible team, and there is only so much Tom Terrific could do. So even with an Allen-to-Brady swap, we probably shouldn’t expect the Bills to win too many more games. The same would be true if you swapped Jared Goff for Drew Brees: the Rams are already pretty great, so even though Brees is arguably the greatest passer ever, there is a limit to the number of wins he could add to a 13-win team. The best time to get a Hall of Fame caliber QB is when you’re already somewhere in the middle…

Conclusions, caveats, and what to expect from AB and Bell in 2019

A With or Without analysis is most certainly a simplistic technique, but that doesn’t mean it isn’t a powerful tool in the analytics arsenal, particularly when we’re dealing with entanglement like we see in NFL play-by-play data. Sure, there are a host of variables I could have accounted for, particularly controlling for the talent of the team around the QBs using other metrics, or incorporating information about receivers’ ages (perhaps the Without statistics are driven by older receivers who have moved on rather than by poor QB play). But at the end of the day, I don’t have any strong issues with my assumptions, because my posterior beliefs aren’t that different from my priors. I believe having a great QB is the secret to winning, and not the other way around, and there is at least some reasonable evidence that this is true.

One way I may update this analysis in the future is by treating it as a longitudinal or panel study, and fitting a mixed effects model to better control for age, usage, and mid-season trades or injuries. But that’s for another time, and another post.

And finally, using our insights from this analysis (and making the not too crazy prediction that Derek Carr and Sam Darnold aren’t quite HOF worthy), what might we expect from Brown and Bell next year?

From 2009–2018, Brown ranked 39th in EPA per target (while amassing a whopping 1,234 targets and by far the most total EPA), while Bell ranked a more paltry 166th (in pure passing EPA, ignoring rushing). If we apply the relative change expected from leaving a HOF caliber QB to each player’s career EPA average, the “new” EPA would rank only 143rd and 227th, respectively, in the NFL over the same period.
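The re-ranking step can be sketched as follows. This is an assumption about how the relative change was applied: if HOF QBs produce 1.92 times the EPA, leaving one is modeled as dividing a player's career EPA per target by 1.92, while every other player's value stays unadjusted. The data below are made up; the real rankings (39th to 143rd, 166th to 227th) aren't reproduced here.

```python
def rank_after_leaving_hof(epa_per_target, player, hof_lift=0.92):
    """Rank (1 = best) of `player` after undoing the estimated HOF lift.

    epa_per_target: {player: career EPA per target} (hypothetical data).
    Dividing by (1 + hof_lift) is an assumed way to apply the relative
    change; it only makes sense for positive EPA values.
    """
    adjusted = dict(epa_per_target)
    adjusted[player] = epa_per_target[player] / (1.0 + hof_lift)
    ranking = sorted(adjusted, key=adjusted.get, reverse=True)
    return ranking.index(player) + 1


toy = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.2}
new_rank = rank_after_leaving_hof(toy, "A")
```

In the toy data, player A drops from 1st to 3rd (0.5 / 1.92 ≈ 0.26). A fuller version would also adjust other receivers who played with HOF QBs, which this sketch deliberately leaves out.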

That’s a shocking result, and one that could make the often predictable anything but. Get your popcorn ready — it’s going to be a wild season. I can’t wait.

