FB501: Advanced Sabermetrics

05/20/2007 8:23 PM

FantasyBaseball.com University Series
Contributed By: Bryan P. Douglass
Special thanks to Ray Flowers for his assistance with this article

FantasyBaseball.com University has brought you many informative and entertaining lessons thus far, but the time has come to step it up. During your Senior session, lesson FB401: Sabermetric Principles brought you the University’s first insight into the world of sabermetric analysis. We looked at the history of these methods and evaluations as well as how those objective views on the game of baseball are being incorporated into roto leagues around the world. Many of the statistics and concepts we discussed in that lesson were heavily rooted in the traditional principles of the game and are easily recognized by the typical baseball fan.

As we enter the Master session of your FantasyBaseball.com University experience, we will take a deeper plunge into the sabermetric view of the world of player evaluation. We’ll delve into the more complicated theories and concepts utilized by the sabermetric enthusiasts to assess the value of the baseball player, and we will discuss how these philosophies can be used to the advantage of the fantasy baseball competitor.


We brought a general history lesson to the table in FB401: Sabermetric Principles. Bill James, a former security guard, spent much of his time questioning how the traditional minds in baseball were assessing the value of the players they were putting on the field, in the farm systems, and in the Hall of Fame. He developed formulas and appraisals with the objective of producing values for these same commodities based on more objective elements. James believed that to determine the true worth of a player, one must eliminate, as much as possible, all of the aspects of that player’s game controlled or influenced by the actions and performances of others.

James also illustrated how the concepts and theories applied by so many throughout the history of baseball had taken the value of the game away from its true source. The typical baseball fan had been raised to believe batting averages, earned run averages, and other traditional statistics were the controlling factors in determining a player’s effectiveness. James questioned these views and put forth one simple question: isn’t the goal of the game to score runs? In an attempt to simplify the process, James used this view to formulate a unique vision of the game based on two premises: (a) players are better evaluated as individuals, without letting the actions of others influence those evaluations, and (b) players should be evaluated on their ability to stimulate the production of runs as well as their ability to hinder the opposition’s efforts to do the same.

With this in mind, let’s take a look at the offerings from Mr. James. We covered the basic principles in the first sabermetric lesson, so let’s turn our attention to the more complex theories and concepts of the sabermetric world.

Statistical Categories, Theories, and Definitions

For this particular piece, we will discuss the advanced statistical categories and theories employed by the sabermetric enthusiast. We will take a look at fantasy applications after we review these measurements and how they are derived.

On-base plus Slugging Percentage: Often abbreviated as “OPS,” the formula is just as it sounds. First, you must calculate the player’s on-base percentage (OBP). Add this number to the player’s slugging percentage (SLG). The number is considered a reflection of a player’s ability to gain bases.
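As a rough illustration of the OPS calculation, here is a minimal Python sketch. The article does not spell out the OBP and SLG formulas, so the sketch assumes the standard definitions, OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = TB / AB. The function names and the sample stat line are hypothetical.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """Standard OBP (an assumption; the article does not define it)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(tb, ab):
    """Standard SLG: total bases per at-bat."""
    return tb / ab


def ops(h, bb, hbp, ab, sf, tb):
    """OPS is simply OBP added to SLG."""
    return on_base_percentage(h, bb, hbp, ab, sf) + slugging_percentage(tb, ab)


# Hypothetical season line, for illustration only.
example_ops = ops(h=150, bb=50, hbp=5, ab=500, sf=5, tb=250)
print(round(example_ops, 3))
```

For this made-up hitter, OBP is 205/560 and SLG is 250/500, so the printed OPS comes out to roughly .866.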


Runs Created: Often abbreviated as “RC,” Runs Created is a statistical expression invented by Bill James in an effort to estimate the number of runs a hitter contributes to his team.

James invented several different formulas to determine this number. In the first formula, add the number of hits (H) to the number of walks (BB), then multiply this number by the player’s total bases (TB). This result is then divided by the sum of the player’s at-bats (AB) and walks (BB).

RC = (H + BB)(TB)/(AB + BB)

Another accepted formula producing the same result states the player’s on-base percentage (OBP) is multiplied by the player’s total bases (TB). James also states this statistic can be expressed by multiplying a player’s on-base percentage (OBP), slugging percentage (SLG), and at-bats (AB) to achieve the desired value. James has also composed other formulas for this statistic often considered to be more accurate.

RC = [(H + BB - CS + HBP - GIDP) * (TB + 0.26*(BB - IBB + HBP) + 0.52*(SH + SF + SB))] / (AB + BB + HBP + SH + SF)

Essentially, James wanted to temper a player’s production with a consideration for his playing time, resulting in a number representing a player’s level of production. Runs Created is often expressed as a rate stat, meaning the number is associated with a number of outs. For example, it is common for a sabermetrician to express this number as RC/27, as 27 is the number of outs per team in a standard 9-inning baseball game.
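The basic Runs Created formula and the RC/27 rate can be sketched in a few lines of Python. This is only an illustration: the function names and sample numbers are hypothetical, and the outs total is approximated here as AB - H, which is an assumption (the article does not say how outs are counted).

```python
def runs_created_basic(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def runs_created_per_27(rc, outs):
    """Express Runs Created per 27 outs (one nine-inning game's worth)."""
    return rc * 27 / outs


# Hypothetical season line, for illustration only.
h, bb, tb, ab = 150, 50, 250, 500
rc = runs_created_basic(h, bb, tb, ab)

# With OBP simplified to (H + BB) / (AB + BB), OBP * TB gives the same number.
simple_obp = (h + bb) / (ab + bb)
assert abs(simple_obp * tb - rc) < 1e-9

outs = ab - h  # assumption: outs approximated as at-bats minus hits
print(round(rc, 1), round(runs_created_per_27(rc, outs), 2))
```

For this made-up hitter the basic formula yields about 90.9 runs created, or roughly 7.01 runs per 27 outs.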

Pythagorean Expectation: Another invention of Bill James, Pythagorean Expectation is a formula used to estimate how many games a team “should” have won based on the number of runs they scored and allowed.

The result of this formula is expressed as a “Win%.” It is derived by squaring the team’s Runs Scored, then dividing this number by the sum of the square of the team’s Runs Scored and the square of the team’s Runs Allowed. Once calculated, this percentage can be multiplied by a number of games to estimate how many of those games will be won by the particular team. (NOTE: many sabermetricians feel the results will be more accurate if, instead of SQUARING the numbers, they are raised to the power of 1.83; the formula below is expressed in these terms)

Win % = RS^1.83 / (RS^1.83 + RA^1.83)

This theory has proven to be very accurate, though James admits there will be deviations based on, as he identifies it, luck, the quality of the team’s bullpen, and the situation in the game when runs are scored.
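For readers who want to try the calculation themselves, here is a small Python sketch of Pythagorean Expectation. The function names, the 162-game default, and the sample run totals are illustrative assumptions; the exponent defaults to 1.83 as described above and can be set to 2 for the original squared version.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Scale the expected winning percentage to a number of games."""
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


# Hypothetical team: 800 runs scored, 700 runs allowed.
print(round(pythagorean_win_pct(800, 700), 3))
print(round(expected_wins(800, 700), 1))
```

A team that scores exactly as many runs as it allows comes out at .500 regardless of the exponent, which is a handy sanity check.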

Total Player Rating: Often abbreviated as “TPR,” Total Player Rating is a method invented by Pete Palmer, printed in a series of baseball encyclopedias titled Total Baseball, for measuring a player’s true value regardless of position, team, or the time in history in which they play.

TPR is computed by assigning each event occurring in a baseball game a value in runs. At the end of said game, every player will have a rating in Batting Runs, Pitching Runs, and Fielding Runs. These numbers may be adjusted for the particular ballpark and the player’s position, and the total of the numbers is divided by 10 (the number of runs determined to represent 1 “game”) to result in a number that is then compared to the “average” baseball player. This number is normally expressed in terms of this comparison.
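The final step of the TPR calculation, turning runs into “games,” can be sketched as follows. This is a simplification under stated assumptions: the three inputs are treated as run totals already expressed relative to an average player and already adjusted for park and position, and the function name and sample values are hypothetical. Palmer’s actual run values for individual events are not reproduced here.

```python
RUNS_PER_GAME_WON = 10  # the article's figure: 10 runs represent 1 "game"


def total_player_rating(batting_runs, pitching_runs, fielding_runs):
    """Sum the run components and convert to games versus an average player.

    Assumes each input is already measured against the average player
    and already park- and position-adjusted.
    """
    return (batting_runs + pitching_runs + fielding_runs) / RUNS_PER_GAME_WON


# Hypothetical position player: +85 batting runs, +15 fielding runs.
print(total_player_rating(85, 0, 15))
```

With these made-up inputs the player rates at 10 games better than average.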

For instance, a player of high stature may have a TPR of “10 games more than the average player,” meaning this player’s total of the Batting Runs, Pitching Runs, and Fielding Runs, after being divided by 10, resulted in a calculation of 10. In the history of sabermetrics, this method of statistical evaluation is considered by many to be antiquated.

The modern statistical mind will note this method tends to credit players for events over which they have no control (such as RBIs, often awarded to a hitter despite the fact that hitter has no control over whether the player batting before him was able to get on base). Many sabermetricians also feel TPR overlooks the importance of fielding. As a result, sabermetricians have spent much time developing new ways of reaching the same goal, resulting in alternate evaluations such as Equivalent Average, Value Over Replacement Player, Win Shares, and Super Linear Weights, a method extremely similar to Palmer’s equation but employing a new system of components. Despite the problematic nature of the original method of calculating TPR, it is recognized by many as the first widely accepted sabermetric stat.

Equivalent Average: Often abbreviated as “EqA,” Equivalent Average was invented by Clay Davenport with the goal of expressing the production of a hitter regardless of the ballpark or the league. It is formulated to mirror the scale of batting average: MLB players with an EqA over .300 are considered above-average performers, while hitters with an EqA of .230 or lower are considered below average.

Davenport formulated the equation for EqA as follows. First, multiply the sum of the total walks and the total hit-by-pitch by 1.5. Add this number to the total hits, total bases, total stolen bases, total sacrifice hits, and total sacrifice flies. We will call this total Number A. Next, divide the total stolen bases by 3, and add this number to the total at-bats, walks, hit-by-pitch, sacrifice hits, sacrifice flies, and caught stealing. This will give us a result we will call Number B. Dividing Number A by Number B gives us the REqA, the Raw Equivalent Average. This number is then “normalized” to account for the league to create EqA.

REqA = (H + TB + 1.5*(BB + HBP) + SB + SH + SF) / (AB + BB + HBP + SH + SF + CS + SB/3)
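The REqA formula above translates directly into code. The Python sketch below covers only the raw value; the league normalization step that turns REqA into EqA is not specified in this lesson, so it is deliberately left out. The function name and the sample stat line are hypothetical.

```python
def raw_equivalent_average(h, tb, bb, hbp, sb, sh, sf, ab, cs):
    """Raw Equivalent Average (REqA), before league normalization."""
    number_a = h + tb + 1.5 * (bb + hbp) + sb + sh + sf
    number_b = ab + bb + hbp + sh + sf + cs + sb / 3
    return number_a / number_b


# Hypothetical season line, for illustration only.
reqa = raw_equivalent_average(
    h=150, tb=250, bb=50, hbp=5, sb=12, sh=2, sf=5, ab=500, cs=4
)
print(round(reqa, 3))
```

Note that the raw figure is not yet on the batting-average scale; that only happens after the normalization step.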

Davenport feels this formula will give the user the most accurate and equitable estimation of runs scored for a player based on team and league statistics. Also worthy of note, the resulting number is a close model to batting average, making its utilization and comprehension simple for the typical baseball fan.

This statistic is seen as a highly valuable commodity by many baseball fans as it offers an objective “projection” of major league potential from minor league hitting statistics (as stated, the raw statistics used for this value can be adjusted for park and league). Sabermetricians often cringe at the use of the word “projection” here: the measure is actually a direct conversion of current statistics, not necessarily a forecast of a player’s major league performances to come, though it is often used for that purpose. James developed a comparable metric he termed Major League Equivalency.

Value Over Replacement Player: Often abbreviated as “VORP,” the goal of this value is to determine how much a player contributes both offensively and defensively to his team in comparison to a fictitious replacement player. This replacement player is typically assigned an average fielding value for his position and a below-average hitting value, and is assumed to receive the same percentage of team plate appearances. The values are set this way to represent the “replacement level,” or the level of performance an average team can expect when trying to replace a starter at minimal cost. This statistic is also adjustable depending on the ballpark, league, and position in which the player is performing.

This statistic was invented by Keith Woolner. Woolner has written extensively on the subject, and the calculations he has undertaken to define the VORP of a player are best described as “strenuous.” He submits a full report of his findings and calculation results every season, often with minor revisions to the method. In short, Woolner lists players by position and then researches the statistics of these players in an effort to correctly determine the value of the “replacement level.” Once this is determined, the calculations for each individual player can be compared to this result, giving us the VORP for that player. Woolner has performed these tasks for several years now, and his annual results (along with explanations, definitions, accessory calculations, and other details) have been published in Baseball Prospectus, an annual publication dedicated to the sabermetric analysis of baseball.

Many sabermetricians feel VORP is an excellent tool for assessing a player’s overall value with a single measure. It accounts for both the quantity of a player’s performance and the quality of that performance.

Major League Equivalency: Often abbreviated as “MLE,” Major League Equivalency is a formula invented by Bill James with the goal of converting minor league statistics into their major league equivalents. It is important to note the results are not necessarily considered “projections.” Simply stated, MLE is a conversion of a player’s current statistics. These calculations are adjusted for the league as well as the team.

Generally speaking, James feels this conversion works well with Triple-A statistics, but the quality of the results may worsen as you move further down from the Triple-A level. This statistic is purely an offensive tool, addressing only the batting performance of a player. Others have formulated similar applications for pitchers, but in general their value is not as great as James’ version for hitters.

For various reasons, James has never published, in full, his formula for determining MLE. He has provided calculated results in the past in various locations and publications, but the exact formula he employs to achieve these results remains a mystery.

Peripheral ERA: Often abbreviated as “PERA,” Peripheral ERA is a pitcher’s earned run average (ERA) as estimated from his peripheral statistics. Peripheral statistics are, essentially, traditional statistics corrected for a league offensive level, ballpark, and/or team. The corrected results are also calibrated to a number resulting from a calculation producing what is commonly referred to as “an ideal major league,” allowing the results to maintain accuracy while providing a number that is easily used by the typical baseball fan.

The goal of PERA is to value a pitcher with a calculation that is less subject to luck than the traditional calculation of ERA. Many sabermetricians feel PERA is also a much more accurate indicator of a pitcher’s potential ERA for the future as it is heavily rooted in statistics directly resulting from the performance of the pitcher rather than the performance of the defensive team supporting that pitcher.

Also of worth to the fantasy baseball player, while PERA results may not differ dramatically from the traditional numbers of a major league pitcher, these methods may produce remarkable differences when applied to minor league pitchers, thus allowing users to see past anomalies and other sources of discrepancy that might result from traditional calculations.

Defense Independent Pitching Statistics: Often abbreviated as “DIPS” and commonly referred to as DIPS ERA (or “dERA”), this sabermetric statistic is a measure of a pitcher’s effectiveness based on three statistical categories which are directly controlled by his performance: home runs allowed, strikeouts, and walks. It is important to note that while it is obvious the hitter has an effect on these results, these categories are seen as those “directly controlled by his performance” due to the inability of the fielders to affect the outcome.

A man named Voros McCracken is given credit for the invention of this statistic. Following a long and arduous process in which every MLB game from 1973 to the present was studied, play-by-play, the following formula was derived for properly calculating this statistic: multiply the total home runs allowed by 13, multiply the total walks allowed by 3, and add these numbers together. From this number, subtract the total number of strikeouts multiplied by 2. This value is then divided by the total innings pitched, resulting in the DIPS value.

DIPS = [13*HR + 3*BB – 2*K]/IP + 3.70

Because this calculation often results in a number that does not closely resemble the typical ERA value for a pitcher, many sabermetricians add 3.70 to this DIPS value in an effort to bring it onto the ERA scale. It is important to note this statistic is not considered useful for pitchers taking the approach of the typical “groundball pitcher,” or a pitcher putting emphasis on forcing hit balls easily turned into outs by the fielders.
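The DIPS calculation described above is easy to sketch in Python. The function name and the sample pitching line are hypothetical, and the sketch assumes innings pitched are supplied as a true decimal (for example, 200 and one-third innings as 200.333, not the box-score shorthand 200.1).

```python
def dips_era(hr, bb, k, ip, constant=3.70):
    """DIPS as described in the text: (13*HR + 3*BB - 2*K) / IP + constant.

    The constant (3.70 here) shifts the raw value onto the ERA scale.
    Assumes ip is a decimal number of innings, not box-score notation.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical pitcher: 20 HR, 60 BB, 180 K over 200 innings.
print(round(dips_era(hr=20, bb=60, k=180, ip=200), 2))
```

For this made-up pitcher the raw value is 80/200 = 0.40, so the printed figure is 4.10 once the 3.70 constant is added. Passing constant=0 returns the raw value on its own.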

Many sabermetricians like to employ a formula affectionately referred to as the Down & Dirty method, which they feel produces a reasonably accurate result:

DIPS = [(IP*2.35) + (H*0.805) + (HR*10.76) + (BB*2.76) - (K*1.53)] / [(IP*0.712) + (H*0.244) + (K*0.096) + (HR*0.244)]

Secondary Average: Often abbreviated as “SecA,” Secondary Average is essentially a sabermetric companion to the traditional batting average. At-bats remain the denominator, but instead of counting hits, the numerator counts the additional bases gained from extra-base hits, walks, and stolen bases. This results in a number expressed on the same scale as batting average, though individual values vary far more widely.

For example, a player with a SecA of approximately .500 is well above average. However, the average baseball player’s SecA value is similar to batting average by comparison (typically in the .250 to .280 range).

The formula is simple: subtract the hits from the total bases of said player, add the walks and stolen bases, and then subtract the number of times said player has been caught stealing. Then, divide this result by the player’s total at-bats.
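Written out, the steps above amount to SecA = (TB - H + BB + SB - CS) / AB. A minimal Python sketch follows; the function name and the sample stat line are hypothetical.

```python
def secondary_average(tb, h, bb, sb, cs, ab):
    """Secondary Average: (TB - H + BB + SB - CS) / AB."""
    return (tb - h + bb + sb - cs) / ab


# Hypothetical season line, for illustration only.
print(round(secondary_average(tb=250, h=150, bb=50, sb=12, cs=4, ab=500), 3))
```

This made-up hitter gains 158 “secondary” bases in 500 at-bats, for a SecA of .316.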

Like many sabermetric calculations, SecA is an attempt to account for the total effectiveness of the player, not just the production resulting from base hits.

Win Shares: Win Shares may be the most notable and valuable contribution Bill James has made to the baseball statistic community. The process of calculating a win share is so involved that James authored an entire book on the category! The formula is extremely complex and some of the components are rooted in arbitrary numbers and, for lack of a better term, educated guesses. In the interest of time, we will discuss this stat without delving into the detailed particulars of the calculation itself.

Simply stated, a Win Share is a means of considering the statistics of a player, viewed in the context of that player’s team, then using this information to assess a single number for this player which represents his contributions for the season. Every contribution by a player is taken into account, and each statistical component is adjusted for the ballpark, the league, and the era in which the player performs or has performed.

James defines a win share as 1/3 of a team win. Therefore, if a team wins 70 games, the players on that team will split 210 win shares. Players cannot be awarded negative win shares according to James, though some sabermetricians feel negative win shares are necessary to perform the process properly. The calculation is a top-down approach, starting with the number of games a team won and continuing with the distribution of credit to each individual player in proportion to their statistics. Pitching and defensive contributions receive 52% of the total win shares while hitting contributions receive 48%.
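The top-level arithmetic of Win Shares, before any credit is handed to individual players, can be sketched in Python. This covers only the team total and the 48/52 split described above; the player-level distribution is far more involved and is not attempted here. The function names are hypothetical.

```python
WIN_SHARES_PER_WIN = 3   # one win share is 1/3 of a team win
HITTING_FRACTION = 0.48  # the remaining 52% goes to pitching and defense


def team_win_shares(team_wins):
    """Total win shares available to a team's players."""
    return team_wins * WIN_SHARES_PER_WIN


def split_win_shares(team_wins):
    """Return (hitting, pitching_and_defense) win shares for the team."""
    total = team_win_shares(team_wins)
    hitting = total * HITTING_FRACTION
    return hitting, total - hitting


# The 70-win team from the text has 210 win shares to distribute.
print(team_win_shares(70))
print(split_win_shares(70))
```

For the 70-win team, roughly 100.8 win shares go to the hitters and 109.2 to pitching and defense.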

The hitting contributions of the equation are based on runs created (RC). The arbitrary numbers and educated guesses then enter the equation as one must determine the amount of the pitching credit that is distributed to the pitcher as well as the amount of the pitching credit that is distributed to the fielder. Pitching contributions are based on runs prevented (the pitcher’s version of RC). Fielding contributions are based on a series of educated guesses as well as a selection of traditional defensive statistics.

It is important to note the intention of this evaluation. Win shares are a representation of a player’s value rather than a player’s ability. Some critics believe this process is flawed as it rewards the player based on team wins, and therefore some players will receive more win shares because their team exceeds the expected number of wins as determined by Pythagorean Expectation. Likewise, players on teams failing to meet expectations are penalized with fewer win shares. Therefore, it is crucial for those employing win shares to view this number in the context of the player’s particular team. James addressed this issue in his book on the subject, stating that the process does not discriminate against players on below-average teams: the context of the evaluation is extremely important, but the final results should not penalize a player in a less-than-optimal situation.

James developed this system as a tool to assist the objective comparison of two players from different eras, playing in different environments, and playing for different teams. The book he authored on the subject, appropriately titled Win Shares, has been widely accepted as an excellent companion piece to his earlier works, and the topic continues to spark debate amongst those in the sabermetric community when the proper calculation of this statistic is discussed.

Fantasy Baseball Perspective

The typical sabermetric fantasy baseball league is not likely to incorporate most of these statistics and theories into its scoring system. The complex nature of these calculations alone makes them rather difficult to work with, especially for those in leagues where the scoring and points are totaled at the kitchen table with a set of box scores and a calculator. The history of baseball has also been a bit of a hindrance to the progression of the sabermetric statistic into the world of roto baseball. So many fans have been raised on batting averages, RBIs, and ERAs that it is only natural for those fans to continue with those traditional views as they enter the fantasy realm.

However, more and more statistical providers are looking at these sabermetric calculations as viable offerings. As baseball has matured and the support of this particular brand of analysis has grown, the two have grown closer together. Bill James is now employed by the Boston Red Sox. Several professional organizations have devoted staff members specifically to the sabermetric analysis of their players and prospects. Billy Beane, the GM of the Oakland Athletics, not only popularized the use of these views for determining the value of a player, but most of his assistants and staff have moved into powerful positions with other franchises as those organizations seek to find the same success Beane has enjoyed in a financially inferior environment. It has become almost impossible to find a baseball mind, be it a reporter or a fan or professional member of an organization, without a working knowledge of simple sabermetrics, such as OPS and WHIP, and a strong opinion of those methods of evaluation.

It is impossible for the true baseball fan to ignore the emergence of these views. It is even harder to ignore the results. With this in mind, the fantasy participant would be wise to consider adding such knowledge to their mental toolbox. While 99% of the roto leagues in the world do not use these statistics and calculations directly, their value in the field of player evaluation and assessment is immeasurable. The primary goal of sabermetrics is to provide an objective judgment of the production of a baseball player while also providing insight into their potential for the future. These are tools geared towards providing success in the present as well as the future, and I don’t know a competitive fantasy player who wouldn’t benefit from their use. It may take some time and an open mind, but sabermetrics can provide results making it all worthwhile.
