Respect for the Complexity of the Problem
We're not as far apart as we think we are :- )

 

Geoffy linked up our last article on WAR dogma ... no doubt there's some fresh traffic to SSI on the subject.  A little Exec Sum would probably be helpful.  

Okay, here's the thing.  Neither James, Zduriencik, I, nor anybody else disputes the idea that WAR (or VORP, or Win Shares, or...) has a lot of value.  Everybody on SSI, and everybody with the Mariners we're sure, considers WAR (or Win Shares, or VORP, or a similar one-stop stat) carefully when discussing a roster move. 

WAR is great.  It is ruined when it is used 2-dimensionally, as though it were the blinkin' Periodic Table of Baseball Elements.  

For example:

............

1) When John Jaso was traded, who were the catchers that were going to replace him?  What is the DELTA between this 2.5 WAR player and whoever is behind him?

At any given time, the Mariners are allowed 25 men on their roster.  In this sense, player #26 is not worth 1.0 WAR; he is worth absolutely zero.  Players #27, #28, etc., are worth ZERO during tonight's game.  

This dynamic is constantly overlooked by WAR dogmatists.  They'll compare single players, Jaso vs Morse, but refuse to compare the player pairs, say [Jaso + Wells] vs [Morse + Zunino] or [Morse + Shoppach].  Much less the player 10-sets that will cascade over the next several years.
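
To make the pairs idea concrete, here is a minimal sketch in Python.  Only Jaso's 2.5 WAR comes from the discussion above; every other number is a made-up placeholder, and pair_war is a hypothetical helper, not anybody's published method.

```python
# Illustrative only: Jaso's 2.5 WAR is from the article; every other
# figure below is a made-up placeholder, not a real projection.
war = {
    "Jaso": 2.5,
    "Morse": 2.0,     # placeholder
    "Wells": 0.0,     # placeholder
    "Zunino": 1.5,    # placeholder
    "Shoppach": 0.5,  # placeholder
}

def pair_war(players):
    """Total WAR of the players who would actually fill the roster slots."""
    return sum(war[p] for p in players)

# Single-player comparison: the narrow view.
single_delta = war["Morse"] - war["Jaso"]

# Pair comparison: who fills both affected roster slots in each scenario?
keep_jaso = pair_war(["Jaso", "Wells"])
trade_a = pair_war(["Morse", "Zunino"])
trade_b = pair_war(["Morse", "Shoppach"])

print(f"Morse minus Jaso alone:              {single_delta:+.1f}")
print(f"[Morse+Zunino] minus [Jaso+Wells]:   {trade_a - keep_jaso:+.1f}")
print(f"[Morse+Shoppach] minus [Jaso+Wells]: {trade_b - keep_jaso:+.1f}")
```

Plug in different placeholders and the sign can flip either way.  The only point is that the single-player delta and the pair delta need not agree.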

............

2) What is the dynamic FUTURE of your 25-man roster, one month from now, one year from now and three years from now?  

How does a Stars & Scrubs acquisition -- dealing four 1.5 WAR meatballs for one Doug Fister -- affect the AGILITY of your in-season reaction?    Where does your WAR formula factor in the reality that you gave yourself the option to (say) replace Brandon Maurer with Danny Hultzen?

Picture the variations of your club's possible futures as --- >  a transparent 3-D spiral cylinder of computer code that you are trying to read.  You can't capture it with a formula that has three variables in it, such as John Jaso's WAR x 3 years vs. Michael Morse's WAR x 1 year ... 

.... and what is your chance of re-signing Michael Morse?  Did you attempt to capture that in your formula?

.............

3) What is John Jaso's FUTURE value going to be, after he goes from part-time play to full-time play?  

Stats are backwards-looking by their very nature.  WAR dogmatists get used to measuring previous seasons quite accurately, and then (through sleight-of-hand) represent themselves as having a "correct" projection of a player's future seasons.

Nobody has a "correct" projection of Morse's or Jaso's 2013 seasons.  Better projections, vs. worse projections, that's what makes a ball game between Zduriencik and Beane.

..............

4) What is the real defensive value of a player, in distinction to what his UZR says?

This isn't a small issue.  And right off the bat, it appears visually to many observers that Morse is going to be worth 0.5 to 1.0 WAR more than we thought (he was given a huge downgrade based on defense, and if he's even mediocre, he could be a real star).  

And that Jaso could be worth 1.0 to 2.0 WAR less.  Did you see that throw on Saunders?!  

... and what about his pitch framing, worth +2.0 to -2.0 WAR per year?  And everything else a catcher does?  Supposing, for the sake of argument, that Jaso really isn't a viable fulltime catcher.

OOPS.  MY BAD.  :shrug:

You don't have to sign this off in order to recognize the broader problem.  The problem is simply that WAR doesn't warrant the dogma that it attracts.

.............

5) Why do real GM's -- all 30 of them, backed by armies of saberdweebs -- pay far more (like 2x, 3x as much) for proven 100-RBI first basemen, DH's, LF's and RF's than WAR says they should?  

Does it have anything to do with there being no way to "solve" these hitters?  Is there something "harder" or "more sound" about Morales' and Morse's offensive production -- more predictable going forward -- than there is about other players'?

................

6) When a legit 100-RBI man delivers a big blow in the early innings, does it knock the SP out of his rocking chair?  Does he pitch differently when behind 2-0?

MLB players respect certain other players.  Does this affect their own play?  What CAUSES players to have UP seasons or DOWN seasons?  Sabermetrics hasn't even begun to answer this question -- the causes of good or bad seasons -- and it in fact resents the question.

..............

7) etc.  8) etc.  9) etc.

...............

WAR dogmatists tend to brush all this off with "if Jack Zduriencik can't prove the Rocking Chair theory, he has no right to factor it into his evaluations.  He has the moral obligation to use the best mathematical formulas available, and nothing else.  Of course, those formulas are the ones we publish."

To which Bill James responds, "We do not have near-perfect measurements of baseball players.  It is foolish to assume that we do."

Modern sabertistas, if they're not careful, become comparable to 1906 sportswriters, who measured players by batting average.  If you traded a .280 AVG player for a .260 AVG player, they'd fricassee you alive.  They could have just as easily argued that your love for defense, walks, stolen bases, etc., was ephemeral.

The 1906 sportswriter could have just as easily demanded that you use his AVG paradigm -- exclusively -- until you proved you had a better one.  Sorry, Charlie.  Just because you think you've got it figured out, doesn't mean the rest of the world has to buy in.  Baseball isn't chemistry, and precious little about individual player performance is provable through repeatable experiment.

There are few areas of life that can be captured by mathematical formula.  Life is complex.  Are murderers convicted that way?  Do you choose your wife that way?  Do you decide whether a salesman is lying to you, through mathematical formula?

No, and Jack Zduriencik chooses not to rule his own intuition out of bounds as he makes difficult judgments about the future of the Mariners.

.............

We're not saying that WAR is useless.  We're just saying, have some respect for the complexity of the problem.

Your friend,

Jeff

Comments

1

Somebody had a comment, "I think you're also saying," and I accidentally deleted it when deleting the comment that I promoted here.  Sincere apologies.  Hopefully it was something that can be quickly re-posted?
::sheepish::
Edit to add, sorry Matt.  Can you re-post?

2

I responded to your original comment...and then you moved it to a post and my response disappeared! :)

3

I had a quick trigger finger...saw your post and felt the need to chime in...LOL
What I originally said:
I think you're also saying that the model itself needs to be re-evaluated. Am I correct about that?
I certainly believe that. The specific example of Mike Morse vs. John Jaso is not an isolated incident. We had the same fight over Raul Ibanez vs. Endy Chavez (looking back on that one now...which of those guys do you suppose would have made the Mariners better in 2009/2010?) and Adam Dunn vs. the glove men and several other similar situations in the past. WAR calls Kendrys Morales an average player...does he feel average to you from the stands when he's carrying the offense?
Here's what I believe:
The problem with WAR is not JUST that it doesn't respect the complexity of the problem, but that it has systematic biases against or for certain types of players that defy basic common sense. When a fleet of Endy Chavezes gets deployed, teams go 61-101 and set records for offensive futility and somehow...their WARs all go down that year. The Mariners are not the only team to have this happen.
I think that the positional adjustments, defensive methodologies, and presentations of the results need to be adjusted or rethought entirely from the basis of "what are our underlying assumptions?"
For myself, here's what I think is going on:
The position adjustment is based on the assumption that if you take the non-starters at each position and average their performance, you get an estimate of what the minimum production at position is that each team can expect to get back if they go wire trawling. That assumption is manifestly false. It also assumes that the wire average for defense is dead average and that the variation in the value of the positions is entirely driven by the offense. That assumption is also manifestly false.
WAR, in total, assumes that value is linearly additive. I think, though I have not thus far proven, that value is NOT linear...that there are significant non-linearities that explain a lot of the variability in player performance, especially on defense, where each play is a cooperative effort and at the extremes of performance, where you start to produce negative value in either your own team or the opposition.
UZR assumes that defense is essentially eight men competing against the ball all by themselves. That assumption is clearly not right.
All value metrics that exist today assume that the parks and the strength of schedule have minor impacts on performance and that impacts are ratio multiplicative. Parks adjust scoring by X percent, etc. In fact those impacts are additive, non-linear, and net cumulative...they compound on each other in unpredictable ways.
And finally, all statistics in baseball are, as you say, backwards-looking and ignorant of contexts that cannot be or at least have not been directly measured. Is it easier for a guy to hit when he feels less pressured to carry a club? You might not find that in a study of all players...but it may be true for some subset of players.
I also believe that value metrics need to be presented in scientific notation, including the uncertainties, if you're going to think of yourself as a scientist. I think UZR has a very high uncertainty. All UZR values should be +/- a LOT until you start building a large sample. That may be why Chone Figgins' 4 WAR only happened every once in a while...his types of skills may have high uncertainty and be heavily context-driven at the same time.
In fact, I'm fairly "certain" that soft-skills WAR heroes like Gutierrez, Figgins and E. Chavez, when collected together on one roster, fail as often as they do at least in part because those WAR values are highly uncertain so you can't bank a high stoploss and the failure rate year to year is higher.
When you look at PCA defensive wins, which are more stable than UZR year to year, you see lots and lots of players with a peppering of a few great fielding seasons that could match something Ozzie Smith did...but few players who had a low stoploss for defensive wins the way that Smith did.
For an example of that problem...see Cameron, Mike. I have him as producing, literally, 5.5 defensive wins above margin (3 above average) in 2003 (!!), but his stoploss was closer to 0.5 wins above average (!).
So I think what we have here is a combination of (a) the problem being more complex than WAR makes it and (b) the WAR formula being based on a series of assumptions that clearly are not true all of the time, leading to intrinsic biases and errors that need further refinement and more scientific presentation style.
If local bloggers were more apt to think of WAR as having an error bar...they would be better able to see that some trades that seem horrible to them might work out differently than they expect.
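As a rough sketch of what an error bar does to a comparison, here is a little Python. The WAR figures and the +/- values are placeholders I made up for illustration, and war_gap and is_clear_gap are hypothetical helper names. It also assumes the two errors are independent, which may well not hold for real players.

```python
import math

def war_gap(a, a_err, b, b_err):
    """Gap between two WAR estimates, with a combined uncertainty.

    Assumes the two errors are independent, so they add in quadrature.
    That independence is an assumption of this sketch.
    """
    gap = a - b
    gap_err = math.sqrt(a_err ** 2 + b_err ** 2)
    return gap, gap_err

def is_clear_gap(gap, gap_err):
    """True only if the gap is larger than its own combined uncertainty."""
    return abs(gap) > gap_err

# Placeholder figures: a glove-first player whose value leans on UZR
# (wide error bar) vs. a bat-first player (narrower error bar).
gap, gap_err = war_gap(3.0, 1.5, 2.0, 0.5)
print(f"gap = {gap:+.1f} +/- {gap_err:.1f} WAR, clear: {is_clear_gap(gap, gap_err)}")
```

With these placeholders the "better" player leads by a full win on paper, but the combined uncertainty is bigger than the lead, so the sketch does not treat the gap as settled.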

4

I agree with just about all of that...
..........
To take one example, your remarks about WAR's assumption of replacement level.  That assumption is not only dubious; it's demonstrably wrong.  What is available to the Seattle Mariners, at SS, on 10 days' notice is --- > not what is available to the St. Louis Cardinals on 10 days' notice.  
Particularly after you factor in the "blood-brain barrier" of a AAA player's adjustment to a new (major!) league.  Often, a GM can't afford to invest that time.
You could say, go get a Ronny Cedeno, and that's RL ... but there too.  Circumstances in a given org may prohibit them from deploying a AAAA infielder for two months.
I didn't even include the RL problem in my list.  File it under 8) etc.  LOL.  And your other points under 9) etc and 10) etc.
..........
And what about the case where 8 HOF shortstops -- like a few years ago -- skew your idea of what the world's shortstops are?  If there are 15 astronauts in the world, and you average them with 15 ditch-diggers, maybe you get a 130 IQ.  Does that mean you, going after "player" #31, can expect to hire a 130 IQ?  Or should he be shooting for a 105?
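Here is the astronaut arithmetic as a tiny Python sketch.  The 155 and 105 figures are placeholders, picked only so that the blended average lands on 130 and the lower group sits at 105; nothing here is real data.

```python
from statistics import mean

# Placeholder IQs, chosen only so the blended average comes out to 130.
astronauts = [155] * 15
ditch_diggers = [105] * 15
pool = astronauts + ditch_diggers

# The "positional average" view: blend everybody together.
blended_average = mean(pool)

# If the astronauts are all taken, "player" #31 looks like the other group.
realistic_hire = mean(ditch_diggers)

print(f"blended average: {blended_average}, realistic hire: {realistic_hire}")
```

The blended number describes nobody in the pool, which is the problem with using it as the expected level of the next guy available.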
..........
Definitely.  We are not just talking about items that WAR fails to capture.  We are talking about assumptions it makes that are often catastrophic fails.
............
All of that would be fine!!  ... but for the folks who try to sell it as the Periodic Table.
You've got to start somewhere, and WAR (or WS, or VORP) is a great place to start.  We're just calling for a sense of proportion -- one that GM's, in fact, do apply.

5

To the horrific Bavasi era (remember that after Leone for Third, Sully's blog was called "Fire Bavasi") and the long series of inexplicable moves that doomed the team. Players like Carlos Guillen gone, gritty veterans who know how to play the game taking their place, and the team continued to suck, while Bavasi's explanations made little if any sense.

6

Over the top, Doc and Matt. Over the top!
Love these two paragraphs from Matt:
"I also believe that value metrics need to be presented in scientific notation, including the uncertainties, if you're going to think of yourself as a scientist. I think UZR has a very high uncertainty. All UZR values should be +/- a LOT until you start building a large sample. That may be why Chone Figgins' 4 WAR only happened every once in a while...his types of skills may have high uncertainty and be heavily context-driven at the same time.
In fact, I'm fairly "certain" that soft-skills WAR heroes like Gutierrez, Figgins and E. Chavez, when collected together on one roster, fail as often as they do at least in part because those WAR values are highly uncertain so you can't bank a high stoploss and the failure rate year to year is higher."
It would be interesting to do a bit of research and determine just how many UZR-WAR guys a lineup could field at one time and still be viable/potent. Could you afford to have a Belanger and a batless Paul Blair? Probably doesn't work in the NL, minus a terrific staff, because you then have 3 "outs" in the lineup. In essence, can the rest of your bats make up for 3 Mendoza-line players?
And I still think that UZR is too subjective. UZR depends on the leap of faith that a particular player got that ball, but that few others would. But it is dang hard to factor in positioning and jump, among other factors. A hit, however, is a hit.....not subjective at all.
The more I look at WAR and UZR the more I think managers long ago figured out that you maximize your corner bats and your SS/CF gloves and you have a neat arrangement. If your CF has + bat, too...then you're off to the races.
UZR doesn't change that......and WAR doesn't factor in WHO the replacement for a team might be. If Montero breaks an ankle tonight, his replacement isn't (probably) a replacement level player. So, to some degree, Montero's WAR is higher than it should be in real life.
I'm done, because I'm afraid I'm mucking up great posts by you guys.
moe

7

Not just WAR, but sabermetrics in general, had some really bright days in the sun when it was stamping out the last vestiges of really brain-dead Good Ole Boyz decisions...
At times it seems to wander about in search of those old days, in which it could scoff at obvious nonsense...

9

I actually am guessing that GMs learned quickly to think about the data that way in their heads at least, and most likely require it in the exec summary reports they get from their interns and assistants and such.

10

:- ) 
So first thing I'd do, as it were, would be to file an absurdly confusingly maddeningly math-heavy paper proposing your +/- figures for a few stats such as UZR games 50-100, 101-200, 201-300 ... recommending that it become the companywide convention...
BOOM SHAKA LAKA c'mon with me to Oakland, kid...

11

Though I won't be at liberty to tell you what I learned. LOL
Will be interesting to see what kind of work the Yankees' interns are doing right now and pitch a few of my own ideas on various data I wish I personally had that I know they do have ready.

12

Thank you Jeff, nice article, good comments. And it may be super pertinent to the Mariners, as I think how to incorporate statistics and quantitative analysis in general is something Jack Zduriencik has been wrestling with. Certainly Mr. Z is not treating statistical info the same as when he arrived. Blengino is gone, Tango is distanced, and Jeff Kingston is in. Kingston has a degree in Economics and supposedly has a close working relationship with Z. Some interpret these moves to mean that Jack has abandoned statistical analysis for what he is comfortable with, scouting. Perhaps. But perhaps our GM hasn’t wavered in his belief in the importance of effective quantitative analysis, but, rather, has a new appreciation for the complexity of it (at the GM level). If so, I wish Jack success in integrating quantitative analysis into the process. It may be the difference between the Mariners being a contender versus being a PERENNIAL contender. Baseball is a big and broad business now. More size and diversity => more need for effective quants.
Are the teams that have recognized ‘this is not your father’s baseball’ enjoying a huge edge now? Look at Tampa Bay. How is it that they are so effective in converting $s into wins? Their scouting is often mentioned as a reason, but both the owner and the GM are ‘numbers’ people from Wall Street. Perhaps the Rays are an example of a successful marriage between quantitative and qualitative analysis. It has been said that the creative companies Disney and Apple would not have enjoyed their degree of success without the tight working relationship between the CEO and a sharp CFO. Has Mr. Zduriencik found his CFO equivalent in Jeff Kingston? Questions!
If Mr. Kingston is tasked with putting numbers to Zduriencik’s investment options, he has a very interesting and very challenging job. The asset evaluations and investment decisions in baseball seem incredibly diverse and complicated.
For example, how do you decide whether or not to invest in a new Dominican Academy? Do you factor in the new CBA rule changes and the implied enhanced value of recruiting? Do you factor in projected rising revenues (cable deals etc.) for the Mariners and MLB as a whole? Another example: when deciding the top price to pay for a Michael Morse type, do you just base it on projected wins added, or do you factor in ‘Star Power’, that is, the ability to fill seats and pump ratings? What if you have to decide between investing in example 1 (the Dominican Academy) or example 2 (Michael Morse); apples or oranges? Which do you choose?
This is where I feel a good numbers man in the organization can earn his money. If he can properly describe the complexity of the market and thus make accurate valuations, the team will spend its revenue much more effectively. This hopefully leads to more wins and thus increased revenue, which then can be used to ‘purchase’ further wins. Easier said than done!
One final slightly related comment. I sure would like to see ownership consider moving to some form of a 5-year budgeting plan in view of the projected sharp revenue increase from the impending cable deal. Now is the time to be making more investments in the future like the Dominican Academy.
