20 Comments
Feb 6, 2023 · Liked by Eli Ben-Porat

I love this. I've been playing around with BAPP on an individual level to see how it might shift our perception of players. One thing I'm noticing is that, at least relative to OPS rankings, the players hurt most by a shift to BAPP are those with a high BA but moderate-to-low BB% and ISO.

You sort of addressed this at the top, but I'm wondering if you'd agree with this takeaway: BA matters, but in the effort to replace BA, some stats used as a replacement (mainly OPS) were accidentally overvaluing BA.

I'm a Cleveland fan so I've been playing around with their stats. An example that jumped out was 2014 Carlos Santana vs 1995 Carlos Baerga:

Based on OBP and OPS you'd say they're similar:

Baerga: .355 OBP, .807 OPS

Santana: .365 OBP, .792 OPS

But BAPP shows Santana as a much more valuable player:

Baerga: .314 BA / .058 BB% / .138 ISO - .510 BAPP

Santana: .231 BA / .171 BB% / .196 ISO - .598 BAPP

Thoughts on this comparison and takeaway?
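
For anyone checking the numbers, here is a minimal sketch of the arithmetic. It assumes BAPP is the plain sum BA + BB% + ISO, which is consistent with the figures quoted above; the function name `bapp` is just illustrative.

```python
# Assumption: BAPP = BA + BB% + ISO (matches the figures quoted in this comment).
def bapp(ba, bb_rate, iso):
    """Sum the three rate stats into a single BAPP value."""
    return ba + bb_rate + iso

baerga_1995 = bapp(0.314, 0.058, 0.138)
santana_2014 = bapp(0.231, 0.171, 0.196)

print(f"Baerga 1995:  {baerga_1995:.3f}")
print(f"Santana 2014: {santana_2014:.3f}")
print(f"Gap:          {santana_2014 - baerga_1995:.3f}")
```

Under that assumption, Santana's edge in BB% and ISO more than offsets Baerga's 83-point edge in BA.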

Feb 5, 2023 · Liked by Eli Ben-Porat

Great article, Eli! Thanks for sharing! Essentially, each of the traditional triple slash line metrics needs the others to complete the story.

SLG doesn't tell us how often the player avoids outs.

OBP doesn't tell us what proportion of the non-outs come from the more valuable hits (and doesn't reward extra base hits).

BA doesn't tell us the quality of the hits or the other ways to get on base.

I really like the idea of BAPP.

Feb 5, 2023 · Liked by Eli Ben-Porat

This is excellent work. Thank you!!!

Great read. I played baseball in college from 2009-2012. Before the start of my junior year, the top half of our lineup created a competition to see who would walk the fewest times over the course of the season. We also had side wagers, with the first hitter to get to 5 walks being the loser. This was born from a late-night conversation over a couple of frosty beers where guys with higher BAs were ragging on guys with higher OBPs.

Something odd happened: our walks went down, our average went up…and we had our best conference finish in school history.

On an unrelated note, we were A LOT more aggressive on the base paths too…attempting to stretch every 50/50 play into a double/triple etc.

Obviously not MLB data nor super objective. But my view on baseball theory changed during that season.

Appreciate the write-up and information!

In your example, how is it possible that gaining 55 pts of BA outweighs losing 55 pts of BB% and 55 pts of ISO when the coefficients are +0.224 vs -0.156 and -0.2?
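
To spell out the arithmetic behind the question, here is a rough sketch. It reads the quoted coefficients as simple per-unit weights and 55 points as a rate of 0.055; both are assumptions, and the article may scale or combine the terms differently.

```python
# Coefficient magnitudes as quoted in the comment; treating them as
# per-unit linear weights is an assumption, not the article's stated method.
COEF_BA, COEF_BB, COEF_ISO = 0.224, 0.156, 0.2
POINTS = 0.055  # 55 points expressed as a rate

gain = COEF_BA * POINTS                # from adding 55 points of BA
loss = (COEF_BB + COEF_ISO) * POINTS   # from giving up 55 points each of BB% and ISO
net = gain - loss

print(f"gain: {gain:.4f}  loss: {loss:.4f}  net: {net:+.4f}")
```

On this naive reading the net comes out negative, which is the apparent contradiction being asked about.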

I knew this Ben Clemens answer was wrong when he wrote the article over at FanGraphs. But everything you say as a commenter at FanGraphs is deemed wrong if you don't agree with the authors. You touched on the idea in your article, but you did not mention the technical term: multicollinearity, meaning overlapping or redundant predictive power. I also knew the article was wrong because I did my own analysis. You can take any year from 1980-1992, 96 and any year from 2010-2022 (excluding the strike years 1981, 1994, and 1995 and the 2020 COVID season, and adjusting for only 26 teams in the '80s) and compare them across the 30-to-40-year gap. Batting average and on-base percentage together (BA and walks don't overlap) are a better predictor of runs than OPS and SLG. Why? 1. The modern game devalues the single by default: far fewer are hit than in the '80s, because the mantra is don't swing unless you can get a double or homer. 2. Lineup turnover was higher in the '80s than it is in the modern game. 3. More runs were scored in the '80s. Go do the calculations yourself, or if you want, you can have my spreadsheets.

This is interesting research, Eli, but I think there's a question being missed here. What do we want to predict? Are we looking for metrics that will predict how many runs a team DID score? Or how many runs a team WILL score? One is backward-looking, one is forward-looking. The forward-looking question leads us to that mythic search for "true talent level."

If I am given the statistics for a game (AVG, OBP, SLG, wRC+, etc.) and told to guess the final score, honestly, simpler is better (really, the Runs metric is pretty dang simple, and it will give me the best answer). But if you give me a team's stats for June (AVG, OBP, etc.) and ask me to predict their record in July -- now THAT'S a tougher, and more important, question.

The problem with Batting Average isn't that it doesn't report facts -- it certainly does! (Albeit in a weird way, thanks to the use of ABs as the denominator.) The problem is that AVG takes some 1200 PA to stabilize, meaning we end up with random oddities like in 2000, when outfielder Jeffrey Hammonds (a career .272 hitter) randomly hit .335 for a whole season. He never did that before, and never did it again. Did he help Colorado win some games that year? You bet your butt he did. But Milwaukee was probably pretty disappointed the following two seasons when, after acquiring him, he averaged .250.

This is also why wRC+ has an even worse correlation with Runs scored -- wRC+ is trying to strip park effects from the data. Hammonds hit .335 in the 2000 Colorado run environment. That place was next to the moon. So a metric like wRC+ gives less credit -- not because Hammonds didn't get hits and create runs. But because he probably couldn't do it again. (That said, his wRC+ was pretty good in 2000.)

I think the next step for this test is seeing if a team's AVG on even days can predict their run scoring on odd days. Or can a team's June AVG predict July runs? Because if it performs poorly there (and it historically has), then AVG is right back where it was at the beginning -- an interesting, but comparatively very limited metric.
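
The even/odd-day check could be sketched roughly like this. The input layout (one tuple per team-game with a day number, at-bats, hits, and runs) and the function names are hypothetical; real game logs would need to be mapped into that shape first.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; needs at least two points with some spread in each list."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half_r(games):
    """games: iterable of (team, day, at_bats, hits, runs), a hypothetical layout.

    Correlates each team's AVG on even days with its runs per game on odd days.
    """
    totals = {}
    for team, day, ab, hits, runs in games:
        t = totals.setdefault(team, {"ab": 0, "hits": 0, "runs": 0, "odd_games": 0})
        if day % 2 == 0:
            t["ab"] += ab
            t["hits"] += hits
        else:
            t["runs"] += runs
            t["odd_games"] += 1
    avg_even, runs_odd = [], []
    for t in totals.values():
        if t["ab"] and t["odd_games"]:  # skip teams missing either half
            avg_even.append(t["hits"] / t["ab"])
            runs_odd.append(t["runs"] / t["odd_games"])
    return pearson(avg_even, runs_odd)
```

Swapping AVG for OBP, SLG, or BAPP in the even-day half would give a like-for-like comparison of how well each metric predicts out of sample, and the same idea works for June versus July.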

Hi Ben,

Really enjoyed this piece. Would you mind sharing your calculation for linear weights when comparing a .250 hitter to a .333 one? I’m trying to better understand it. Thanks!

This is interesting work; thanks for presenting it. How does your model account for HBP? There is a great deal of variability in HBP% and in some instances it is a significant component of a player's OBP. Unless I'm missing something, your calculation of BAPP disregards HBP.

3 questions:

1) The title of the BAPP vs r/PA chart says the r^2 is .72, and then the description says it's .86. Which is it?

2) Did you graph out wRC+ against r/PA? What was that r^2?

3) Is r/PA the standard way to measure offense? I would've thought r/outs made, but idk?
