Using Pythagorean Win Percentage as a prediction tool

I blogged about defense, scoring differential, and the definition of Pythagorean Win Percentage in my first three blogs of the 2023-2024 basketball season. You can open each of those blogs by clicking on the linked words above.

Some of you may have read these discussions and said, "So what?" Others, who have a keen interest in statistics and their power of prediction, devoured these blogs. I don't guarantee that I will entertain everyone as we wait for the season to start.

I will be putting out pre-season rankings once the MSHSAA sets up the class assignments for all teams, which typically doesn't happen until about mid-November. In the meantime, I am working on improving my early rankings with predictive statistics. Once the season starts, ratings will be based on head-to-head games and comparative game results, which makes the algorithm dynamic. My formulae are set up so that a team's rating for early-season wins can improve if the teams it beat get better. Conversely, a rating can be dragged down when teams it beat early tank during the season.

But enough about ratings. I want to get back to the statistical mouthful... Pythagorean Win Percentage, or PWP for short. PWP is used by the NBA as a statistic to predict winning percentage based on point differential. The only problem is that it is something of what I would call a "rearview mirror" statistic, which means the teams have already played the games, so it accurately describes the past results and nothing more.
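For readers who missed the earlier definition, here is a minimal sketch of the usual Pythagorean form, which is not necessarily the exact version used for these ratings. The function name and the default exponent of 14 are assumptions for illustration. Exponents somewhere around 14 to 16.5 are commonly quoted for the NBA, and the best value for high school games may well be different.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 14.0) -> float:
    """Estimate win percentage from points scored and points allowed.

    The exponent is an assumed value, not one taken from the blog.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals cannot be negative")
    scored = points_for ** exponent
    allowed = points_against ** exponent
    if scored + allowed == 0:
        raise ValueError("at least one point total must be positive")
    return scored / (scored + allowed)


if __name__ == "__main__":
    # Hypothetical team averaging 62 points scored and 55 allowed per game.
    print(round(pythagorean_win_pct(62, 55), 3))
```

A team that scores exactly as much as it allows comes out at .500, and the estimate climbs quickly as the scoring margin grows because of the large exponent.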

What if you could somehow use historical PWP to predict future results? I have found that high school basketball programs are fairly consistent: consistently good and sometimes consistently bad. Schools with great feeder programs reload with fresh, talented kids as quickly as their talented seniors graduate. Of course, some private schools can "recruit" talent on the strength of traditionally strong programs. Kids, and their parents, opt for a private education at schools with talented and successful sports programs. In both cases, the public and private schools are good or great, year after year. At some schools, the culture just isn't there for success until some coach comes along with a fresh outlook and changes the culture both in the school and in the community. That has happened with Coach Eric Bennaka at Smithville, where they have turned the program around. Some schools, like South Iron, have had that success culture for years, starting kids in a feeder program in early elementary school.

As a statistician, I have seen this consistency (good and bad) in ratings over the past six or seven years that I have conducted Gramps Ratings. I decided to look at PWP for different periods. I looked at a three-year average, a two-year average, last year's PWP, and five-year and ten-year average PWP. I also looked at a five-game PWP for the beginning of last season. I threw out the five-year and ten-year PWP for this discussion.
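As a rough illustration of those look-back windows, the sketch below averages a team's most recent season PWP values. It assumes the season PWPs are averaged directly (averaging the point differentials first would be another reasonable reading), and the function name and the sample numbers are hypothetical.

```python
from statistics import mean


def average_pwp(season_pwps: list[float], n_seasons: int) -> float:
    """Average the most recent n_seasons PWP values.

    season_pwps is ordered oldest to newest.
    """
    if n_seasons < 1 or n_seasons > len(season_pwps):
        raise ValueError("n_seasons must be between 1 and the number of seasons")
    return mean(season_pwps[-n_seasons:])


if __name__ == "__main__":
    # Made-up PWP values for one team, oldest season first.
    team_pwps = [0.40, 0.55, 0.70]
    print("three-year:", average_pwp(team_pwps, 3))
    print("two-year:", average_pwp(team_pwps, 2))
    print("last year:", average_pwp(team_pwps, 1))
```

For a team on the rise like this made-up one, the shorter windows give a higher number because they leave out the weaker early season.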

So do any of those PWP averages reliably predict the team's winning percentage for last year? The table is quite large, so I will drop it in at the end of the blog. Let's look at the correlation coefficients and scatter plots for four scenarios:

1. Three-year PWP averages - these are the Pythagorean Win Percentage predictions based on the team's point differential averaged for the 2019-20, 2020-21, and 2021-22 seasons.

This analysis suggests that the three-year PWP could explain about 60% of the team's winning percentage for the 2022-23 season. Other factors (injuries, depth, etc.) make up the 40% that PWP leaves unexplained. 60% isn't a bad result, but you could probably look at the team's record over the past ten years and come fairly close to their actual record. In addition, team composition changes about every three to four years, and the quality of the team can change remarkably. The eyeball factor is probably just as good as a three-year PWP estimate.

2. Two-year PWP averages - these are the Pythagorean Win Percentage predictions based on the team's point differential averaged for the 2020-21 and 2021-22 seasons.

The two-year PWP appears to explain a little over 80% of the team's actual winning percentage, better than the three-year PWP. That still leaves 20% unexplained by PWP, but it is better than any other predictive formula I have looked at. You may recall that I have a complex formula with return scoring, VPS values for returning players, and some other factors, which explained about 60% of the winning percentage, so I will take 80%. I actually found that return scoring by itself explained only about 7% of the team's winning percentage.

3. Last Year's PWP vs. this year's Win % - would the most recent point differential predict next season's success?

I was surprised to see that only 40% of the winning percentage for the 2022-23 season was explained by the 2021-22 Pythagorean Win Percentage (based on point differential). While teams build on successful seasons, talent can fluctuate year-to-year AND coaches can change the culture from one year to the next. Success doesn't always guarantee success, and previous failures do not always forecast future failure. Sometimes, failure is a learning experience that can be utilized to build future success.

4. Five-games-in PWP - this was the Pythagorean Win Percentage prediction based on the team's point differential averaged for the first five games of the 2022-23 season.

I thought that early-season results might be a good predictor for the entire season. The first-five-games PWP explains a little less than 60% of the team's final season record. It looks like some teams jump into really tough competition early (which probably makes them better later) and end up better than their early-season results, while other teams have a patsy schedule early, which raises expectations but ultimately ends in disappointment. Both of those situations drag down the value of the first five games' PWP as a season record predictor.
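The "explains about X%" figures in the four scenarios above read like squared correlation coefficients (R-squared) from the scatter plots. That reading is an assumption, as are the function names and the made-up numbers in this minimal sketch of how such a figure can be computed from paired PWP and actual winning percentages.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Share of the variation in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Hypothetical two-year PWP averages and actual win percentages.
    two_year_pwp = [0.25, 0.40, 0.55, 0.70, 0.85]
    actual_win_pct = [0.30, 0.35, 0.60, 0.65, 0.90]
    print(round(r_squared(two_year_pwp, actual_win_pct), 2))
```

Running the same calculation on each look-back window for the same set of teams is one way to compare how well the different periods line up with the actual results.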

Conclusions:

Pythagorean Win Percentage can be used as a predictive tool to some degree when it is based on the two-year average scoring differential. It is not just a "rearview mirror" statistic. However, I have another tool that I have used with success for several teams, and I will discuss that in a future blog. Hint: It has to do with Gramps Ratings and the team's schedule.

Here's the raw data table that went into the above scatter plots. Hope you have enjoyed the stroll down Statistics Lane.
