February 25th, 2008
Figure 1: Roger Clemens ERA and WHIP vs Others, fitted curves only.
Figure 2: Roger Clemens WHIP, with fitted curve and raw data.
Figure 3: Statistical Software Analysis of Roger Clemens' WHIP versus his age.
Folks,
Recently the New York Times published an article titled “Report Backing Clemens Chooses Its Facts Carefully,” by Eric Bradlow, Shane Jensen, Justin Wolfers and Adi Wyner (published February 10, 2008; http://www.nytimes.com/).
This article is a critique of the “Clemens Report,” (http://www.rogerclemensreport.com/) which is an analysis of Roger Clemens’ pitching career by Hendricks Sports Management. The brief critique by Bradlow et al. discusses the use of two metrics, earned run average (ERA) and walks plus hits per inning pitched (WHIP), plotted against the age of pitchers, to evaluate how Roger Clemens’ performance changed with age. The article favors WHIP as the better performance predictor. It concludes that Roger Clemens is unusual in that his WHIP improved later in his career, whereas for most pitchers this metric becomes worse after about age 35. (A lower number for both metrics is better.) The graphs in the Times article are shown below as Figure 1.
In the article, the authors conclude:
One of the authors, Justin Wolfers, published the raw data and the fitted curve for Clemens’ WHIP at http://freakonomics.blogs.nytimes.com/. It appears below in Figure 2.
A quick glance at the data would cause even the least skeptical to ask, “Is the curve a good fit to the data?” In the original Times article the authors did not publish this raw data and have not published curve fitting correlation coefficients.
I carefully read the values of the data points from the chart and analyzed the data with two well-known and widely used statistical analysis software packages [i]. The results are in Figure 3 below.
In Figure 3 we see that R-sq (R²) equals 4.4%. Roughly speaking, this means that the fitted curve explains only 4.4% of the variation in the data. The remaining 95.6% of the variation is in the randomness of the data (i.e., the “scatter”) [iv]. Such a low R² value makes the curve almost meaningless in describing the data. By examining the raw data, even from a commonsense perspective, one sees the difficulty in fitting such data with a curve. In one year, from age 24 to 25, WHIP drops about 25%. From age 33 to 34 it increases nearly 30%. Yet over the entire curve from ages 23 to 46 the maximum variation is less than 10% [v]. Statistical analysis also helps us by providing confidence when answering the question, “Is there a correlation between age and WHIP?” The answer: we can only be 38% confident that there is [vi]. Hardly the stuff from which to draw strong conclusions, especially when one considers that we can be 50% confident in the result of a coin toss!
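For readers who want to try this kind of check themselves, here is a minimal sketch in Python with NumPy. It is not the Excel or Minitab analysis described above. It assumes the fitted curve is a quadratic in age, it uses made-up stand-in data rather than Clemens' actual WHIP values, and it swaps in a permutation test (shuffling the WHIP values across ages) as one simple way to ask whether the observed R² could arise by chance. The function names are hypothetical.

```python
import numpy as np

def r_squared(x, y, degree=2):
    """R^2 = 1 - SS_res / SS_tot for a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)          # variation left unexplained by the curve
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation about the mean
    return 1.0 - ss_res / ss_tot

def permutation_p_value(x, y, degree=2, n_perm=2000, seed=0):
    """Share of shuffled datasets whose R^2 is at least the observed R^2."""
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y, degree)
    hits = sum(
        r_squared(x, rng.permutation(y), degree) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Hypothetical stand-in data: a flat "WHIP-like" series with random scatter.
# These are NOT Clemens' actual numbers.
rng = np.random.default_rng(1)
age = np.arange(23, 47, dtype=float)
whip_like = 1.15 + rng.normal(0.0, 0.15, size=age.size)

r2 = r_squared(age, whip_like)
p = permutation_p_value(age, whip_like)
print(f"R^2 = {r2:.3f}, permutation p-value = {p:.3f}")
```

A small R² together with a large p-value would say the same thing as the Minitab output: the curve accounts for little of the scatter, and data with no age effect at all could plausibly produce a fit just as good.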
In studying Clemens’ available pitching performance metrics, there is not enough statistical confidence in the fitted curve to let us draw any strong conclusions about whether or not “some unusual factors may have been at play in producing his excellent late-career statistics.”
- i. Excel® and Minitab®
- ii. Minitab
- iii. Explanation of the confidence intervals in the graph. The 95% CI (confidence interval), shown in the broken line with large segments, gives us the statistical error for each data point with a confidence of 95%. Note that the 95% CI varies from about 15 to 30% for each data point, which is much more than the fitted curve’s maximum variation of less than 10%. The 95% PI (prediction interval), shown in the other, finer broken line, answers the question: if Roger Clemens repeated any year (let’s say when he was age 30), given the variation of the data, what would be the prediction limits of that repeated year with 95% confidence? In looking at the 95% PI for age 30, we see a low of about 0.9 and a high of about 1. Although impossible to do in this case (i.e., Roger Clemens can’t become 30 again and repeat that baseball year), this type of analysis is possible when measuring something like the hardness of a razor blade under certain processing conditions. Both the 95% CI and 95% PI suggest much “scatter” or randomness in the data, which thwarts strong statistical conclusions.
- iv. Technical note: When a curve is almost horizontal, it is difficult to achieve a high R² value. This curve is close to horizontal. To see if this is a concern for this curve, I generated data alternately 1% above and 1% below the curve and obtained a respectable R² value of 85%. Hence, the low R² = 0.044 for the Clemens data is a valid concern and not just an artifact of the curve’s flatness.
- v. The authors don’t discuss the most interesting part of the data, the dramatic drop in WHIP from age 38 to 44 of almost 50%. Although I’m not suggesting that this is a reasonable thing to do, fitting just these data alone gives an R² value of 83%.
- vi. Minitab analysis.
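The flatness check in note iv can be sketched along the following lines. This is only an illustration under stated assumptions: the quadratic below is a hypothetical near-horizontal curve (its total swing is under 10% of its level), not the curve fitted in the Times article, and the R² it prints should not be read as a reproduction of the 85% reported in the note.

```python
import numpy as np

def r_squared(x, y, degree=2):
    """R^2 = 1 - SS_res / SS_tot for a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)

age = np.arange(23, 47, dtype=float)

# Hypothetical near-horizontal curve (an assumption for illustration only).
curve = 1.15 + 0.0006 * (age - 36.0) ** 2

# +1, -1, +1, ... so consecutive points sit alternately above and below the curve.
signs = np.where(np.arange(age.size) % 2 == 0, 1.0, -1.0)

def r2_with_alternating_scatter(fraction):
    """R^2 of a quadratic refit to points placed +/- `fraction` around the curve."""
    return r_squared(age, curve * (1.0 + signs * fraction))

r2_tight = r2_with_alternating_scatter(0.01)   # 1% scatter, as in note iv
r2_loose = r2_with_alternating_scatter(0.15)   # scatter closer to the CI widths in note iii

print(f"R^2 with 1% scatter:  {r2_tight:.2f}")
print(f"R^2 with 15% scatter: {r2_loose:.2f}")
```

The point of comparing the two cases is that a nearly flat curve can still earn a high R² when the points hug it closely, so a very low R² reflects the size of the scatter and not merely the shape of the curve.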
Joel:
I agree that for Roger Clemens’ data, there does not appear to be much, if any, evidence (based on the R² value) that Age has any kind of predictive qualities for Roger. That said, the NYT graph shows a strong curve for the “typical” pitcher. Roger Clemens showing no relationship may then be sufficient evidence that he differs from typical pitchers (possibly from artificial means).