Seattle Sports Insider

Bill James' Strong Season Leading Index

Posted by

11/20/09

Thanks to Geoff Baker for linking us to Bill James' system for predicting "UP" ballplayer seasons.

I'll use this system heavily in roto. I expect it to be my biggest "secret weapon" since the days when Ron Shandler used to be secret.

Q. Why a new UP/DWN system, considering that PECOTA already issues the 75%, 50%, 25%, etc. probabilities?

A. An Excel spreadsheet will calculate the math perfectly, of course ... AFTER a human programmer has told it what he thinks defines a "comparable player."

Underlying PECOTA's calculations is a Baseball Prospectus employee's arbitrary opinion of what makes two ballplayers comparable. And, especially, that employee's opinion of how those 12-14 factors should be weighted against each other.

Who is Carl Crawford's best comp: Johnny Damon or Chone Figgins? Are BB more important than SB? If BB and SB each comprise 1/12 of the formula, why are both weighted the same?

PECOTA is cool, but the assumptions underlying it are very, very simple and very, very arbitrary. Take it from a chess-AI geek. It's a good system, and a very useful one. But if you're thinking PECOTA's simple little forecasts are From On High, beyond question, think again. :- )

..............

Bill James' system has -- as far as I can tell -- vastly more intelligent human intuition applied, in terms of the underlying criteria used to define "comparable player." (I know, I know, the player is mostly compared to himself; that's a whole other subject. Bear with me.)

...............

PECOTA, at least a few years ago, picked a dozen criteria out of a hat and threw them all into the mix (equally, at least at that time) to find "most-comparable players" who were, at best, roughly comparable.

Bill James is a 60-year-old historian who has spent his life comparing baseball players to each other, and he applied his finely-honed judgment to come up with criteria that he felt mattered most.

If James' underlying comp-criteria are better, then his system will be more accurate.

Q. Is there any evidence for believing that James' system will accurately forecast UP and DOWN seasons?

A. The money chart is in this article, page 83. Take a look.

If you want a system to show more accuracy than that, you'll have to call the 31st century. Bill would probably be the best guy to call about whether that is do-able, too.

Q. What does it mean to have an UP season?

A. It means that you had the same* OPS as the year before, and at least 80% as many at-bats.

So bear in mind, if Russell Branyan is forecasted for a DOWN season, that would become the case IF he had (1)

For example, if Branyan hit .260/.345/.495 in 145 games with 42 homers and 105 RBI, that would be a DOWN season per James' system.

Q. Does that make any sense?

A. Yes, because if you take 100 players having DOWN seasons, only a few are going to have the type of "in practical terms, UP" season above.
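The UP/DOWN test described above can be written out as a minimal sketch. The `Season` class, the function name, and the sample numbers are illustrative assumptions, not James' own code. The footnote behind "same*" is not reproduced here, so the allowed OPS slippage is left as a parameter (`ops_tolerance`) rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One hitter-season. Field names are hypothetical."""
    at_bats: int
    obp: float
    slg: float

    @property
    def ops(self) -> float:
        # OPS is on-base percentage plus slugging percentage.
        return self.obp + self.slg


def is_up_season(prev: Season, curr: Season,
                 ops_tolerance: float = 0.0,
                 min_ab_ratio: float = 0.80) -> bool:
    """UP = OPS held (within an assumed tolerance) AND at least 80% of last year's at-bats."""
    kept_playing_time = curr.at_bats >= min_ab_ratio * prev.at_bats
    held_ops = curr.ops >= prev.ops - ops_tolerance
    return kept_playing_time and held_ops


# Made-up seasons: more at-bats, OPS down five points.
prev = Season(at_bats=400, obp=0.380, slg=0.520)
curr = Season(at_bats=700, obp=0.375, slg=0.520)

strict = is_up_season(prev, curr)                        # no slippage allowed
lenient = is_up_season(prev, curr, ops_tolerance=0.020)  # 20 points of slack (assumed)
```

Under the strict reading, a small OPS dip counts as DOWN even with far more playing time, which is the "in practical terms, UP" case the answer above concedes.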

Q. Is it "fair" to take 7 or 8 criteria and kind of mold them like Silly Putty until your prophecies come out accurate?

A. This is "fuzzy logic." It's state-of-the-art for dealing with math problems that can't be captured otherwise.

For example, I want to know whether the phrase "Stunning" sells more eBay items -- for jewelry? For dresses? For black dresses? I can't isolate the variables.

So what I do, I take one phrase -- "Stunning" or "Throws Left Handed" or "Born In Puerto Rico" -- and I simply track that to see if gains or losses are reflected.

Then I cascade other search phrases -- "No Reserve" or "Versace" or "95 MPH" or whatever -- and form a 2x2 grid, then a 3x3 -- going back and changing "Spectacular" for "Stunning" and so on. (This actually is how Cindy and I built a PowerSeller business.)

Trial-and-error "points systems" -- in which you assume values in columns A, B, C and proceed on those assumptions -- are often the starting point for capturing variables that can't be captured algebraically. You start molding the data like Play-Doh until you're, Eureka!, coming up with some predictive validity.
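A trial-and-error points system of this kind can be sketched in a few lines: assign points per criterion, then check whether higher scores really go with more UP seasons. Everything below (the rules, the point values, the field names) is a made-up illustration of the general technique, not the criteria James actually used.

```python
from collections import defaultdict


def indicator_score(player: dict, rules: list) -> int:
    """Add up the points for every rule the player satisfies."""
    return sum(points for predicate, points in rules if predicate(player))


def up_rate_by_score(records):
    """records: iterable of (score, went_up) pairs -> {score: share that went UP}."""
    tally = defaultdict(lambda: [0, 0])  # score -> [ups, total]
    for score, went_up in records:
        tally[score][0] += int(went_up)
        tally[score][1] += 1
    return {score: ups / total for score, (ups, total) in sorted(tally.items())}


# Hypothetical rules: (test on the player, points awarded). Tweak and re-run.
rules = [
    (lambda p: p["age"] <= 27, 3),
    (lambda p: p["ops_change"] < 0, 2),
]

score = indicator_score({"age": 25, "ops_change": -0.030}, rules)
rates = up_rate_by_score([(5, True), (5, False), (2, False)])
```

The "molding" step is the loop a human runs by hand: change a rule or a point value, recompute `up_rate_by_score` over the historical seasons, and keep the change only if the UP rate climbs more cleanly with the score.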

............

Bill wasn't throwing spaghetti against the wall. He took a bunch of criteria that he knew were important, and he experimented until his hypothesis was matching up beautifully to baseball history.

He'll probably improve and revise his system in the future. Right now I'm guessing that his system is a big step forward. I'll certainly be using it a lot in Roto.

Q. How might the system be improved?

A. First question I would ask Bill: how do you think it would affect your system, if you added the John Benson criterion "had a great/terrible second half"?

James' system doesn't do that. And if we just plugged in 1H/2H, the predictive validity might collapse. So we can't just bolt it on.

But tell you what: you got a roto draft pick who gets UP from James, and had a great 2H, you got a sleeper pick, bro'.

I get the hitters who passed both, you get the hitters who failed both, you'll lose. And you won't even know why. ;- )

Cheers,

Dr D

Comments

My proposed alterations.

Supposing I trusted this kind of fuzzy math (I don't)...the first thing I'd do is drop the nonsense with R and RBI. A player certainly has some control over his average RBI and R production, but year to year fluctuations? No. Sorry.
I would also adjust the BABIP section so that it did not just account for HRs but speed score as well. It is known that speedy guys hit for a high adjusted BABIP for obvious reasons.
Finally, I think his age section needs to be position-specific. If you're a catcher, the odds of decline begin increasing by age 30. If you're a first baseman...it's more like age 34 or 35...if you're a DH...more like 38. This needs to be accounted for.

What is the need for "trust"

Considering the predictive validity shown on page 83?

Remember that with R and RBI

James is capturing, among other things, big increases in playing time.
When a player gets a big increase in PT in Y+1, James is positing that he's less likely to make a big stride in Y+2, than if he'd been playing a lot before that, like Jose Lopez has, for instance. Otherwise, you're asking for two consecutive big leaps (in absolute value) in two years.
Whatever the skepticism towards R and RBI, a guy who goes from 50-50 to 105-105 has consolidated some things -- even if it's only the ability to withstand "exposure," to hit all pitchers. Usually a plateau leap comes after a period of stagnation/consolidation.
That seems very sensible to me, and as a completely separate issue, it WORKED well.

And before we start adding umpteen refinements

Let's remember the Zen imperative. :- )

Adjusted BABIP

Matt sez,
I would also adjust the BABIP section so that it did not just account for HRs but speed score as well. It is known that speedy guys hit for a high adjusted BABIP for obvious reasons.
Good call. I sort of wonder why James didn't do that. Would like to hear his reply.
My guess is that he'd say he only wanted a small adjustment "capturing" the ability to hit the ball harder, but I dunno... you'd think if you were using Adjusted Ball In Play or whatever for a variety of applications, you'd want to do exactly as you said.
........
Also, there's little doubt that making the age factor position-specific would improve it. Zero chance that he didn't consider that, being probably the original guy to point out that Catchers' hitting arcs are right-shifted. I wonder if he'd just plead Zen on that.

As it applies to Russ Branyan

Remember that the question at hand is, "can he PROGRESS in 2010 even more."
In 2009, the progress was to play every day at the same level -- to hit just as well despite facing LHP's.
The system sees him as a guy who went from 30 RBI to 70 RBI. The system is now very skeptical that he'll add large gains to what he just now added.
I dunno. Do *you* expect Branyan to go, say, .290/.390/.570 in 2010? :- )
.......................
Considering that James' system predicts UP/DWN seasons with such stunning accuracy -- the fact that it deploys R and RBI to get there, is itself one piece of evidence suggesting that R and RBI have weight.
If you used parsley, sage, rosemary and thyme to predict the weather, and it worked, that would in itself be a piece of evidence that rosemary was weird. :- )

If you want to capture PT...

...then rate PLAYING TIME! Not garbage stats with no meaning like R and RBI delta.
As for zen and "results"...our current general circulation models do a great job simulating the annual fluctuations in temperature experienced by our planet in the 20th century...that doesn't mean we should trust them to predict planetary temperatures into the 21st century. I am not one iota sympathetic to that kind of argument. I spend too much time watching GCMs completely fail to predict climate in the last decade to buy into past-correlation = future ability.
Now, I do understand his comments on making the thing as simple as he can...but it's not adding much complication to switch from RBI/R to PA/G and GS. And it's not asking much more to get him to include positional aging or speed in his BABIP. Heck, you can just steal FanGraphs' xBABIP which includes a bunch of things that have been shown to improve BABIP over time (LD%, HR rate, speed, high GB/FB etc).

I also think...

...that it's incorrect to count progress only in terms of production RATE. I think Branyan is a good candidate to add more playing time at the same production rate...a good system should see that as UP...not DOWN.

Run & RBI

While I understand the urge to ignore the R and RBI categories, Matt ... there is something that they capture beyond physical ability. They capture BOTH playing time and deployment choices of previous managers. We all understand that lineup placement is more critical for RBI and Run totals than simple ability. Leadoff hitters score runs and don't get RBI -- 3/4/5 guys get RBI.
But, for good or ill, the vast majority of baseball choices are very, very similar. The managers are an inbred species, who often make choices based far more on what they are looking at than what the numbers tell them. If you get a 6'4" 230 pounder with speed, you simply won't see him leading off in most cases. So, I could see R and RBI being indicators that capture PT and deployment likelihood, (which, of course, impacts production).
Ultimately, however, I think it is simply a case of understanding that you're going to appeal to a much wider audience if you include the most familiar stats in the mix. There are many, many players that still insist on the value of R and RBI ... and will dismiss any system that opts to ignore them. Even today, the vast majority of ROTO leagues are based on which stats? R, RBI, HR, SB. It's a big leap from F = MA to E = MC^2 --- non-scientists need time to catch up. And James has always been a leader in the industry NOT because he's the best scientist, (he often notes this himself), but because he is a master of communicating the science to the masses. Opting to include the classic stats is a part of that communication.

Good 'puts gentlemen :- )

Just to letcha know, Bill would bristle at the idea that he was including R and RBI for the benefit of readers who like the stat.
James quote: "when in doubt, think runs -- R, RBI, ERA, runs anything." James values R and RBI more than most sabermetricians do, and not because he's unaware of the large random factors involved. He was THE man to point out that ERA and RBI can be very misleading. Literally, it was Bill James who taught the world that it's wrong to judge pitchers primarily on ERA and to judge Entitled Veterans on their RBI.
He wrote the book (literally) on what's wrong with ERA and RBI. But it's not right to treat those stats as though they contain zero information, either.
.................
All of us have a different 200 of 1000 light bulbs on, some overlapping, and I believe that a few of James' pertain to R and RBI.
.................
Of course James has RATE stats imbedded in his prediction formula; R and RBI are the main category that captures volume.

Do either of you have any comment

On the predictive RESULTS that James gets with his formula (page 83)?
Would you concede that as a ballplayer scores more points per James' formula, that he is more likely to have a good year?

No. See the comment above.

I already commented on that question. I reject the notion that a tool that works for one time period of data is guaranteed to work well for the next time period. For a very specific reason I already highlighted.

The 'time period of data'

was this:
Using those initial assumptions, I figured a “20 point indicator score” for every player in major league history who had 400 or more plate appearances in a season. Some people had to be thrown out of the study for one reason or another, of course—probably for making out on the school bus—and I wound up with a field of a little more than 16,000 players who could be studied. - p. 78.
............
Are you saying that Bill's system works perfectly for 1876-2009, but that you expect baseball to change now?
That his system was fine for predicting 2006-to-2007 transitions, but you don't like its chances to be accurate for 2009-to-2010? Is that what you're saying?
If that's the case, why put any credence in any statistical measurement of performance over the period 1876-2009? Maybe OBP will not be predictive over the next time period, either.
..............
I missed the "specific reason" that you grant James' system's accuracy for 1876-2009 but disallow its relevance for 2010. Could you re-state your "specific reason" for this skepticism?

Doc...do you read the comments?

I ask again...because my specific reason is like...right above this string of posts.
I work in atmospheric sciences. The way climate forecasting is traditionally done is through general circulation models which are calibrated with real weather history...going back at least 100 and sometimes 200 years. The models all do a fine job of recreating the year to year variability in temperature and precipitation over that long recorded history of ours...but that doesn't mean we should be trusting them to get the future planetary temperature right. In fact, models that were quite well calibrated to reproduce the weather leading up to 2000 or 2005 even...have already significantly BUSTED on their global temperature forecasts for the years that have followed...as models all call for increased temperature, the planet is COOLING over the last several years in the means.
I have therefore seen firsthand the dangers of trusting a metric which is calibrated to historic data to accurately predict the future. You ask...why should I trust ANY metric? You should trust the ones that are grounded in sound logic that is irrefutable. OBP will always correlate with run scoring...the logic is obvious as to why. James' metric is not grounded in consistent logic. It's numbers thrown together that sort of kind of measure a difficult to define pattern in the hopes of maybe giving you a heads up on some kind of future trend...a future trend which is not even correctly defined by James (if you get 400 ABs and OPS .900 one year...then get 700 ABs and OPS .895 the next year...that's an UP year...not a DOWN year...he starts with the wrong gosh darned premise!).
The GCMs that are already busting on their planetary forecasts are based on similar mushy logic and a whole spate of very rough estimates of parameters that could be off by orders of magnitude. There's a reason climate skeptics exist.
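The calibration worry raised in this comment is usually tested with a chronological hold-out: tune the system on earlier seasons only, then score it on later seasons it never saw. The sketch below assumes each record carries a `year`, an indicator `score`, and a `went_up` flag; those field names, the cutoff, and the tiny data set are synthetic illustrations, not results from James' study.

```python
def split_by_year(records, cutoff_year):
    """Earlier seasons for tuning, later seasons held out for checking."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test


def hit_rate(records, threshold):
    """Share of correct calls when 'score >= threshold' is read as a prediction of UP."""
    if not records:
        return float("nan")
    correct = sum((r["score"] >= threshold) == r["went_up"] for r in records)
    return correct / len(records)


def best_threshold(train, candidates):
    """Pick the cutoff that fits the tuning seasons best."""
    return max(candidates, key=lambda t: hit_rate(train, t))


# Synthetic records, for illustration only.
records = [
    {"year": 2005, "score": 15, "went_up": True},
    {"year": 2005, "score": 6, "went_up": False},
    {"year": 2006, "score": 12, "went_up": True},
    {"year": 2006, "score": 9, "went_up": False},
    {"year": 2008, "score": 14, "went_up": True},
    {"year": 2008, "score": 11, "went_up": False},
]

train, test = split_by_year(records, cutoff_year=2006)
threshold = best_threshold(train, candidates=[5, 10, 15])
in_sample = hit_rate(train, threshold)
out_of_sample = hit_rate(test, threshold)
```

In this toy data the tuned cutoff is perfect on the seasons it was fitted to and only a coin flip on the held-out ones. Whether a real system shows that kind of gap is an empirical question that only a test of this shape can answer.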

Problemo there

Is that there are underlying geological & meteorological phenomena that cycle over very long periods of time.
Your analogy is equivalent to a situation in which James had measured the period 1904-1909 and then applied that to today.
In order for the meteorological historico-predictive models to be analogous to James', they'd have had to have gone back not 100-200, but a minimum of 25,000 years.
But then, you knew that.
.............
James' system obviously captures the "world" going back to the cooling of the oceans. What is going to fundamentally change in baseball players' performance fluctuations, that is analogous to the cycling of ice ages and warming periods?!
Baseball isn't about to be hit by a comet, in the way that human beings have UP and DWN seasons. You know that.

You're assuming...

...that baseball doesn't have long running cycles of change too...that the game is static. It's not static...we talk every day about how managers and GMs are changing to adapt to improved medical, psychological and statistical understanding of the game. That's a very slow process so you won't see 2010's predictions veer way off course if you accept James' incorrect definition of what he's measuring as you apparently have, but the game does evolve.
And BTW, you're not correct about the main source of error in the GCMs either. The big problem with these models is that they assume the recent warming is caused by CO2 (that's where the climate forcing terms come from that cause the models to rapidly warm into the 21st century) when in fact there's a much stronger correlation between global temperature and the combined effects of total solar irradiance, the Pacific Decadal Oscillation and the Atlantic Multidecadal Oscillation...none of which are correctly modeled by the GCMs...and all three of which have DECADAL...not geological time scales for variability...the same kinds of time scales over which baseball changes.
But thanks for the "but you knew that" condescension...it was AWESOME.

How about some more of the same...

You, personally, are as intelligent as anyone in cyber-Seattle. This silly "wiggle room" plausible-deniability argument is not.
It isn't IQ that allows us to judge the merits of a new hypothesis. It's objectivity.
.............
James creates a system that predicts UP/DWN seasons from 1876-2009, with stunning precision, and your response is "well, maybe baseball will be different in 2010 ff than it has been so far. After all, the weather changes."
Very seldom does an argument at MC/SSI hit me the way that one did.
I'll give you the last word. :wanders off:

I guess if you're going to be rude...

...you might as well go all in and be rude, dismissive, inaccurate in your depiction of my argument, and evasive to my actual points. You've got the whole package today, babe...nicely done.
Just in case you're wondering where I get that...my point was never that I expected James' system to be ineffective in 2010...at least no more ineffective than it already is given its (I'll say it again) *WRONG* definition for what constitutes an UP or DOWN year. My point was that arguing "look, I made a purdy correlation!!!" is not proof that you're doing something useful. Not unless you can logically explain - with data that fits what you're trying to accomplish! - why that correlation exists. That's how science works. I use an example from my real life in the sciences to illustrate why calibrated correlation is not sufficient to convince me of the utility of a metric and you talk down to me like I'm a 3 year old and completely ignore my point. That's not like you, Doc...and frankly...I'm extremely disappointed in your behavior. Maybe you ought to think about what tone you want the rest of this conversation to follow. (there...how does it feel to YOU when I start talking to you like a child, hm?)
Throwing some numbers together isn't new...fans have been doing the kind of stuff James does for years. The difference is...James is a great writer and so has won himself a big money job, the backing of the wealthiest stat-tracking firm in the game for publication, and a massive audience. You act as though James is the only one who ever thought of this kind of toy...and that's all it is...a toy...he's not. He's just the most visible.

A sure thing

SSLI will miss on its prediction about Ichiro in 2010.
It takes into account only sabre data. That is only 400 of 1000 bulbs.
On the other hand we know that Ichiro had a bleeding ulcer in SP.
We assume that he was heavily affected by the ulcer at the WBC.
My friend, a PhD who is an ulcer specialist, thinks that Ichiro suffered from the ulcer in 2007 and 2008 too. (a bleeding ulcer does not develop overnight....)
So Ichiro had a good 2009 season because his ulcer was cured after his first DL.
I predict a strong Ichiro season for 2010.

Wow -

so your expectation is ? for Ichiro's performance in 2010?

As promised

Yours will stand as the last word. :- )

As good as 2009. At least

As good as 2009.
At least Ichiro's stomach does not ache in pressure situations.

The future itself is unpredictable.
But anyone can have an opinion.
At the end of the 2010 season you will know who was correct.

In this case the shades of difference may be subtle

A hitter can lose only 20 OPS points per James' system and be categorized as having a DOWN year.
So, for instance, if Ichiro bats .340/.375/.450 with 240 base hits -- better than his career avg -- he will *technically* become an accurate prediction for James.
...................
Bear in mind that James' system is -- in spirit -- trying to capture groups of players who significantly improve on their previous seasons only, not on their career averages.
Ichiro just had, in essence, his career year, and James' system is merely saying that he's very unlikely to perform a lot better than his career year.
....................
I think you both will be correct in spirit: Ichiro will more-or-less sustain his HOF performance, and James will be right that Ichiro is 90% unlikely to hit better.

I agree that Ichiro is

... healthy, interested, and in a positive state-of-mind for 2010.
I expect him to repeat his 2009 performance for 2010-12 at least. In my opinion, his 2006-08 results suffered a bit from the fact that he psychologically is a poor fit for a team that does not play to win.
...........
No argument about the ulcer, either. For a person who plays with high "chi", a very energized physical state is important.