I get the argument about LD definitions being problematic.
But, by including HR data, I think Fast may be pulling a fast one.
BABIP (by definition) does not include HRs. The advanced pitcher metrics all include some form of HR rate. So, the advanced metrics (DIPS, FIP, etc.) should all be (to some degree) capturing batted ball speed, since HRs are easily going to be the highest spike on the ball-speed chart.
So, I'm not sure how groundbreaking this is. If down the line we remove HR rate and replace it with off-the-bat speed, and the formula becomes more predictive than the current ones, then Fast deserves applause. But, I think part of the original Voros paradigm was that, "after excluding HRs", the pitcher doesn't have meaningful control of the outcome. That position clearly supports the notion that pitchers do have control over HRs. Voros was just utilizing HR rate as the proxy for the specific control that pitchers have "that matters".
But, I will say that if this research pans out and additional control over outcome is moved into the pitcher category - it gets there at the expense of defensive ability. If you say pitchers have more control over outcome than is currently accepted - then fielders must have less. And that kind of change would have massive implications for every WAR computation for every fielder in the game.
An ESPN heatmap of Michael Pineda's first pitches during April-May, 2011. See Michael Pineda's Pinpoint Accuracy by Lee Singer ....
..........
In a stroke of coincidence rivalling the King Humberto destiny script, CA asked on November 20th whether saberdudes still believed that pitchers do not affect BABIP. Dr. D kind of laughed that a lot of them are softening on it some...
... and then, on November 22nd, Mike Fast published a BABIP article that may be the most important saber article of the last five years. It seems to be in front of BP's pay wall.
CA: 'net rat, career scout, or time traveller from the 31st century? You be the judge.
.
Q. What did Fast discover?
A. That pitchers indeed have a lot of control over how well the batter hits the ball. We always figured that the pitcher (partially) controls the "launch angle" -- some pitchers get lots of groundballs, and some don't.
Fast proved that pitchers also (partially) control how fast the ball comes off the bat, and this of course has a lot to do with whether the batted ball turns into a safe base hit.
.
Q. How did he discover that?
A. Firstly, in 2009, SportVision started collecting MPH data off hitters' bats. They gave Fast a year's worth of it.
Secondly, Fast threw out the old paradigm that categorized batted balls this way: POPUP - GROUNDER - LINE DRIVE - FLY BALL - HOME RUN.
........
Fast argued that these categories, the way they were defined, argued in circles. About 25% of "line drives" were softly-hit (60-70 mph) bloopers that landed between the IF and OF.
These were called line drives because they happened to fall in areas that were hits - and then you argued from these "line drives" to show that a pitcher was giving up balls that deserved to be hits. Circular logic.
Also, BABIP by its very definition takes away the hardest-hit balls -- home runs -- from a pitcher's account sheet. Why should we watch Carlos Silva give up an upper-tank shot, and then say that ball doesn't count in BABIP, and then argue that we're not seeing Silva's poor pitching show up in the balls that aren't homers?
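To make the "takes away the hardest-hit balls" point concrete, here is a minimal sketch of the standard BABIP formula, (H - HR) / (AB - K - HR + SF), next to a version that leaves the homers in. The second function and its name `contact_average` are hypothetical, made up for illustration; it is not Fast's metric, and the season line is invented.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Standard BABIP: home runs are removed from both hits and chances."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def contact_average(hits, at_bats, strikeouts, sac_flies=0):
    """Hypothetical companion stat: same idea, but home runs stay in."""
    contacted = at_bats - strikeouts + sac_flies
    if contacted <= 0:
        raise ValueError("no contacted balls")
    return hits / contacted


# Invented season line, for illustration only (not a real pitcher).
line = dict(hits=150, at_bats=600, strikeouts=100, sac_flies=5)
without_hr = babip(home_runs=20, **line)
with_hr = contact_average(**line)
```

The gap between the two numbers is the part of a pitcher's hard contact that BABIP never sees.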
........
Fast added HR's back in, and he threw out the groundball and line drive definitions, and he asked the right question. Does Carlos Silva give up a faster batted ball than CC Sabathia does? Counting HR's, popups, "line drives," and everything?
He found that Carlos Silva gives up a much faster batted ball than Sabathia does.
If you're using a category system that proclaims Silva's high BABIP an accident relative to Harden's, then you're using the wrong system. Fast's system proclaims Silva's BABIP a non-accident.
Comments
I am now going to launch a series of experiments with my team data files, for both the PBP era and the pre-modern era (each requiring a different set of correlations).
I am aware that some sabes have already shown a correlation between K rate and BABIP...that's well established. I figured that this was because high-K pitchers tended to be good pitchers in other ways and because high-K pitchers tend to pitch in patterns for which the defense can better prepare (they're smarter...know the hitters better, work with catchers better). But now...I think it's because of the batted ball velo. And if so, I can dramatically improve my defensive analysis (and my pitching analysis) by looking for patterns in the available statistics correlated to team DER. (I am not going to go to the individual pitcher level first because that has the small-ish sample problem and the additional circular logic problem of assuming that variability in pitcher skills is correlated to the pitcher himself...need broader piles on which to base judgments.)
If I can find some 30% or 40% explained ratios...they're going into my DNRA/PCA defense/pitching calculations as linear adjustments.
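For anyone following along, the "explained ratio" of a one-variable linear fit is just R-squared, and the fitted slope is what would become the linear adjustment. A minimal sketch, assuming a plain least-squares fit of team DER on a single team stat; the function name `linear_fit` and the team values are hypothetical, not real data and not the actual DNRA/PCA calculation.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained
    return slope, intercept, r_squared


# Hypothetical team-season values (K rate vs. DER), invented for illustration.
team_k_rate = [0.16, 0.18, 0.20, 0.22, 0.24]
team_der = [0.690, 0.702, 0.698, 0.710, 0.715]
slope, intercept, r_squared = linear_fit(team_k_rate, team_der)
```

An `r_squared` of 0.30 to 0.40 on real team data would be the threshold described above.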
This,
"BABIP (by definition) does not include HRs. The advanced pitcher metrics all include some form of HR rate. So, the advanced metrics (DIPS, FIP, etc.) should all be (to some degree) capturing batted ball speed, since HRs are easily going to be the highest spike on the ball-speed chart.
So, I'm not sure how groundbreaking this is."
Well, they "include" HR rate in a very poorly-targeted manner. It's like saying ERA "includes" doubles, or OPS "includes" walks. The question is how well a stat isolates component skills.
xFIP, for example, "includes" homers -- only to "normalize" them. To xFIP, a homer is no more and no less than ... one more fly ball.
Not sure how FIP captures batted ball speed any better than ERA captures it. Sure, FIP makes a pass at batted ball speed by saying, "well, 4% of the BIP's went over the fence." But that's as far as it goes. Not very useful, especially since sabes believe that the 4% figure was random and luck-driven.
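As a quick illustration of how little FIP knows about contact quality, here is a minimal sketch of the usual FIP formula with its 13 / 3 / -2 weights. The `league_constant` default of 3.10 is an assumed placeholder (the real constant is recomputed each season so league FIP matches league ERA), and the season line is invented.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, league_constant=3.10):
    """Fielding Independent Pitching from the standard 13 / 3 / -2 weights.

    league_constant is an assumed placeholder, not a real season's value.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings + league_constant


# Invented season line, for illustration only (not a real pitcher).
example = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=180, innings=200)
```

Note the inputs: nothing here distinguishes a 110 mph lineout from a 60 mph blooper. The only batted balls FIP sees at all are the ones that left the yard.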
............
The groundbreaking part of the MPH paradigm is that the IF/F, LD, and HR/FB categories are so vague and inaccurate. Actual pitcher performance is clearly blurred by the noise. When a blooper is scored a line drive, that's noise. When a 50 mph clonker is scored a GB just like an 85 mph one-hopper is scored a GB, that's noise.
What is inaccurate about MPH, though? The split-half correlations were perfectly sound.
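For readers who haven't run one, a split-half check looks roughly like this: cut each pitcher's batted balls into two halves, average the speed in each half, and see whether the two halves agree across pitchers. This is a minimal sketch assuming an alternating (odd/even) split; I don't know which split Fast actually used, and the function names and speeds below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def split_half_correlation(speeds_by_pitcher):
    """Correlate each pitcher's mean speed in alternating halves of his sample.

    speeds_by_pitcher maps a pitcher name to his batted-ball speeds (mph)
    in chronological order. The odd/even split is an assumption.
    """
    first_half, second_half = [], []
    for speeds in speeds_by_pitcher.values():
        odd, even = speeds[0::2], speeds[1::2]
        if not odd or not even:
            continue  # skip pitchers with fewer than two batted balls
        first_half.append(sum(odd) / len(odd))
        second_half.append(sum(even) / len(even))
    return pearson(first_half, second_half)


# Invented speeds for three hypothetical pitchers, for illustration only.
sample = {
    "A": [80, 82, 81, 83],
    "B": [70, 72, 71, 73],
    "C": [90, 88, 91, 89],
}
r = split_half_correlation(sample)
```

A high correlation between halves says the stat is measuring something about the pitcher rather than noise.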
...given Fast's results over one season's worth of data - very suggestive they are indeed - we need to figure out if there are other reliable predictors of BABIP in the common statistics (we don't have his velo data to work with...and even if we did, it doesn't go back beyond 2008).
But this is actually a bit more complicated than it sounds, because the park can influence BABIP and HR/Fly without the speed of the ball off the bat being any different.
So the problem I have is that I can't park adjust for K, HR, BB, and BABIP because I can't assume that equal pitching was in each park in equal measure when it comes to controlling for those factors. The standard ratio method of park factors is demonstrably a false approach, which is the entire reason I pursued a linear matrix solver approach to park factors, umpire factors, league scoring contexts and strength of schedule. So I need some kind of multivariable linear solver for this problem too...I need Fiato/Souders matrix solutions for HR park factors, K park factors, BB park factors, and BABIP park factors that account for the variability in the pitchers and batters facing each park.
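To show why the ratio method breaks down, here is a toy version of the problem. It is a generic two-way additive model (rate = mean + team effect + park effect) fit by simple backfitting. It is emphatically not the Fiato/Souders matrix method, it ignores batters and sample sizes, and the function name and data are hypothetical.

```python
def fit_additive(observations, iterations=200):
    """Fit rate = mean + team_effect + park_effect by alternating averages.

    observations is a list of (team, park, rate) tuples. Only differences
    between park effects (or between team effects) are meaningful.
    """
    mean = sum(rate for _, _, rate in observations) / len(observations)
    teams = {team: 0.0 for team, _, _ in observations}
    parks = {park: 0.0 for _, park, _ in observations}
    for _ in range(iterations):
        for team in teams:
            resid = [r - mean - parks[p] for t, p, r in observations if t == team]
            teams[team] = sum(resid) / len(resid)
        for park in parks:
            resid = [r - mean - teams[t] for t, p, r in observations if p == park]
            parks[park] = sum(resid) / len(resid)
    return mean, teams, parks


# Invented BABIP-like rates. Team X allows +.020 more than team Y, and
# park B adds +.020 over park A, but X pitches in B three times as often.
data = [
    ("X", "A", 0.300), ("X", "B", 0.320), ("X", "B", 0.320), ("X", "B", 0.320),
    ("Y", "A", 0.280), ("Y", "B", 0.300),
]
mean, teams, parks = fit_additive(data)
fitted_gap = parks["B"] - parks["A"]

# Naive comparison: just average what happened in each park.
naive_a = sum(r for _, p, r in data if p == "A") / 2
naive_b = sum(r for _, p, r in data if p == "B") / 4
naive_gap = naive_b - naive_a
```

In this toy data the naive park gap comes out at .025 because park B saw more of the worse staff, while the fitted gap recovers the .020 that was built in. A real version would need the full matrix treatment described above.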
As I do not have time to do that...my best efforts will not be very conclusive, I fear...but what I CAN do is control for the park...or at least restrict the sample to non-extreme parks, those within +/- several points of a neutral park HR factor...under the assumption that HR, K, BB, and BABIP will not vary much in the "standard" parks, and thus ignore park effects entirely for now.
This is what I'm going to have to attempt here while I have a few days off for Thanksgiving.