How much better can the 2019 NBA Draft class get?

Ved Pujari

Introduction

As an avid watcher of the NBA, I was surprised to see how much talent has come out of this season's rookies. Other fans and NBA commentators share the same view: these are some of the best-performing rookies to step onto an NBA court in a while. Some sources include "UNDISPUTED" on YouTube, as well as https://clutchpoints.com/is-the-2019-nba-draft-class-the-best-from-the-last-decade/. The question I want to answer is how much potential these rookies have if we take their stats from this year and apply a "Hall of Fame" slope to them. A Hall of Fame exists in every major sport and is described as an institution that honors the best individuals in a particular activity. In this case, that activity is basketball.

I plan to do this by gathering several years of data from Hall of Famers who played from the 1980s onward. This will include the four most essential stats a player can have: points per game, shooting percentage, rebounds, and assists. I will exclude anything to do with championships, such as championship rings and Finals MVPs, because we are predicting a young player's career growth, not whether his team wins a title. Championships depend on the team he is on, injuries, supporting players, etc. After I gather this data, I will separate it by position and create a linear regression model for each position. I will then plot each of the top 15 picks from the 2019 Draft class to see how they compare against the Hall of Famers' statistics and to gauge their potential.

Data Collection

In this project, we collect our data from several different websites. In data science, we learned that data can come in many formats, such as CSV files and HTML pages. In my case, I will be using HTML pages. First, we need to import all the libraries that we will be using.

Now we have to initialize a header to send with our requests when collecting the data. I found the value with a quick Google search, as it is IP based. We first have to collect the names of all the NBA Hall of Famers. In addition to NBA players, the Basketball Hall of Fame includes WNBA players, international players, and coaches. The website I used lists only NBA players, which made the data collection a bit easier.

Here, I initialize five arrays, one for each position in the NBA: Point Guard, Shooting Guard, Small Forward, Power Forward, and Center. There are currently 135 NBA players in the Hall of Fame, so my for loop has a range of 135. I then filter to keep only players who played in the 1980s or later. I chose the 1980s as the cutoff because Michael Jordan, widely regarded as the greatest player of all time, played in this era, and I wanted to base my statistics on what I consider the peak of the NBA, from the 1980s to the present. I then filter again by position and add each Hall of Famer's name to the matching positional array.
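As a rough illustration of this filtering step, here is a minimal sketch. The sample records, the played_since_1980 helper, and the dictionary layout are assumptions made for illustration and are not the original notebook code. The real names, positions, and years would come from the scraped Hall of Fame page.

```python
# Hypothetical records: (name, position, last_year_played).
# In the project these values would be scraped, not typed in by hand.
players = [
    ("Player A", "PG", 1991),
    ("Player B", "C", 1973),
    ("Player C", "SG", 2003),
    ("Player D", "PF", 2016),
]

POSITIONS = ("PG", "SG", "SF", "PF", "C")


def played_since_1980(last_year):
    """True if any part of the career falls in 1980 or later."""
    return last_year >= 1980


def group_by_position(records):
    """Return {position: [names]} for players who pass the era filter."""
    groups = {pos: [] for pos in POSITIONS}
    for name, pos, last_year in records:
        if pos in groups and played_since_1980(last_year):
            groups[pos].append(name)
    return groups


hof_names = group_by_position(players)
print(hof_names)
```

A dictionary keyed by position holds the same information as five separate arrays, and it keeps the later per-position loops short.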

Now that we have the players' names for each position, we need to gather the statistics for each player. I decided to tackle this by using a website called basketball-reference.com. Each player has their own unique stats page, so I created an array of URLs, one per player. Essentially, the method creates a list of URLs instead of a list of names.
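One way to turn a name into a link is sketched here. It assumes the URL pattern commonly seen on basketball-reference.com: the first letter of the last name as a folder, then the first five letters of the last name, the first two letters of the first name, and a two-digit suffix. The player_url function is a hypothetical helper, and the "01" suffix is an assumption that does not hold for every player, so each link should be checked.

```python
def player_url(full_name, suffix="01"):
    """Build a likely basketball-reference.com URL from 'First Last'.

    Assumes the name has at least two words; the suffix may need
    adjusting for players who share the same name pattern.
    """
    first, last = full_name.lower().split(" ", 1)
    # Keep letters only, so "O'Neal" becomes "oneal".
    first = "".join(ch for ch in first if ch.isalpha())
    last = "".join(ch for ch in last if ch.isalpha())
    player_id = last[:5] + first[:2] + suffix
    return f"https://www.basketball-reference.com/players/{last[0]}/{player_id}.html"


names = ["Michael Jordan", "Shaquille O'Neal"]
urls = [player_url(name) for name in names]
print(urls)
```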

Now that we have a way to get the links for each of these players, we have to find a way to collect the data behind each link. The getData method does just that: it collects data based on the link you pass in as a parameter. This will come in handy later.
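The original getData code is not reproduced here, so the following is only a sketch of how such a function could work using the Python standard library. The names get_data, parse_stat, and StatColumnParser are hypothetical, and the data-stat attribute value "pts_per_g" is an assumption about the page markup that should be checked against the live page. The sample HTML and its numbers are made up so the parsing step can be tried without a network connection.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class StatColumnParser(HTMLParser):
    """Collect the text of every <td> whose data-stat attribute matches."""

    def __init__(self, stat):
        super().__init__()
        self.stat = stat
        self.values = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("data-stat") == self.stat:
            self._in_cell = True
            self.values.append("")

    def handle_data(self, data):
        if self._in_cell:
            self.values[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False


def parse_stat(html, stat):
    """Return the numeric values of one stat column, skipping blank cells."""
    parser = StatColumnParser(stat)
    parser.feed(html)
    numbers = []
    for text in parser.values:
        try:
            numbers.append(float(text))
        except ValueError:
            pass  # empty or non-numeric cell
    return numbers


def get_data(url, stat="pts_per_g", user_agent="Mozilla/5.0"):
    """Download a player page and return one stat column (needs network)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_stat(html, stat)


# Made-up table used only to exercise the parser offline.
sample = """
<table>
<tr><td data-stat="pts_per_g">28.2</td><td data-stat="ast_per_g">5.9</td></tr>
<tr><td data-stat="pts_per_g">22.7</td><td data-stat="ast_per_g">2.9</td></tr>
</table>
"""
print(parse_stat(sample, "pts_per_g"))  # [28.2, 22.7]
```

On a real page the same column may also appear in career-total or playoff rows, so the returned list would likely need trimming before use.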

Going back to what I said earlier about collecting the data, I created a method that collects data based on the player type it is given. It essentially creates five 2D arrays, one per position, in which every player gets a row of 15 slots. Example for PG: [ [Year 0, Year 1, Year 2, ... Year 14], [Year 0, Year 1, Year 2, ... Year 14], ... for all the point guards ]
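The 2D layout described above can be sketched as follows. The collect_position_data function, the fetch parameter, and the padding with None are assumptions made for illustration. The fetch argument stands in for whatever function returns one player's season-by-season values, and a fake lookup table replaces real downloads here.

```python
def collect_position_data(urls, fetch, n_years=15):
    """Return one row per player, each with exactly n_years slots."""
    rows = []
    for url in urls:
        seasons = list(fetch(url))[:n_years]
        # Pad short careers with None so every row has the same length.
        seasons += [None] * (n_years - len(seasons))
        rows.append(seasons)
    return rows


# Fake data in place of real pages: hypothetical points per game.
fake_pages = {
    "url_a": [10.0, 12.0, 15.0, 18.0],
    "url_b": [8.0],
}
pg_data = collect_position_data(["url_a", "url_b"], fake_pages.get, n_years=3)
print(pg_data)  # [[10.0, 12.0, 15.0], [8.0, None, None]]
```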

Data Visualization

Now that we have all the Hall of Fame data we need, we can start graphing it to get a better understanding of it.

I created a graph method that draws a violin plot for the type of player passed in. Because of the way I formatted my data points earlier, I had to do a bit of reshaping to match the input seaborn expects. Seaborn is a library that makes it easy to draw violin and box plots from an x list and a y list. For more information on seaborn, look here: https://seaborn.pydata.org/.

Essentially, I had to combine all of my data into one-dimensional arrays. I originally had one row of yearly statistics per player: [ [Year 0, Year 1, ... Year 4], [Year 0, Year 1, ... Year 4], ... for all the point guards ]. I had to convert this into two flat lists of equal length: an x list of year indices, in which each of 0 through 4 appears once for every Hall of Famer, and a y list holding the statistic that goes with each of those entries. The reason I cut the data down to only five years was to capture the rise at the start of each player's career, rather than the period after their peak or plateau. Although this looks confusing, a simple flatten() call was all I needed to convert my 2D array into a 1D array. After this, I also added x and y labels depending on which player type was passed in.
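The reshaping step can be sketched with NumPy as follows. The numbers are made up, and only three years and two players are used to keep the output short. Note that flatten() walks the array row by row, so the flat y list runs through one player's years before moving on to the next player, and the x list has to repeat the year indices in that same order.

```python
import numpy as np

# Hypothetical points per game: one row per player, one column per year.
stats = np.array([
    [10.0, 12.0, 15.0],  # player 0, years 0 to 2
    [8.0, 11.0, 13.0],   # player 1, years 0 to 2
])
n_players, n_years = stats.shape

# y: every statistic in a single 1D array, row by row.
y = stats.flatten()

# x: the year index that goes with each entry of y.
x = np.tile(np.arange(n_years), n_players)

print(x.tolist())  # [0, 1, 2, 0, 1, 2]
print(y.tolist())  # [10.0, 12.0, 15.0, 8.0, 11.0, 13.0]
```

These two arrays can then be handed to seaborn, for example with seaborn.violinplot(x=x, y=y), which draws one violin per distinct x value.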

Data Violin Plot Analysis

From these five graphs, we can talk a bit about whether the data is normally distributed or skewed in any way. A normal distribution is described as a "distribution that is symmetrical on both sides of the mean." In this case, it would mean that most players had around the same stats. For some positions this holds true, but for others it does not. For most positions, the data seems to be skewed toward lower stats, especially in the early years, with the exception of Centers. This may indicate that as the years went by, the game changed for the better and players started to score more. It may reflect a change in the pace of the game during the prime years of the NBA, with many talented players entering the league, such as Michael Jordan.

Rookie Data Collection

Now it is time to return to our original question: how much potential does the 2019 NBA Draft class have? Similar to how I collected the names of the Hall of Famers earlier, I used a website to collect the names of the first-round picks of the 2019 Draft.

The for loop here is essentially an adaptation of the method I created for calculating and saving the Hall of Fame player data. Instead of saving all the data into a 2D array, I created five separate arrays, one per position: PG_Stat, SG_Stat, SF_Stat, etc. I did this because I am only collecting one year of data for the top 15 picks, so creating the graphs is much easier this way. An example for Point Guards: PG_Name = [Player A, Player B], PG_stat = [15, 17]. With this organization, I can easily grab a rookie's name and statistic.

To compare the 2019 Draft Class to the Hall of Famers, I need equations for lines to plot against the Hall of Famers' linear regression. The equation I used takes the slope of the Hall of Fame linear model and uses the rookie's statistic as the y-intercept: projected stat = (Hall of Fame slope × year) + rookie statistic. I collected the slope for each position by running a linear regression on that position's Hall of Fame data.
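The slope-plus-intercept idea can be sketched as follows. This uses numpy.polyfit for the straight-line fit, which may differ from the regression tool used in the original notebook. The hof_slope and rookie_projection names and all of the numbers are hypothetical.

```python
import numpy as np


def hof_slope(years, stats):
    """Slope of the least-squares line through the Hall of Fame points."""
    slope, _intercept = np.polyfit(years, stats, 1)
    return slope


def rookie_projection(slope, rookie_stat, years):
    """Line with the Hall of Fame slope, starting at the rookie's stat."""
    return slope * np.asarray(years, dtype=float) + rookie_stat


# Made-up Hall of Fame data: two players over years 0 to 2.
x = [0, 1, 2, 0, 1, 2]
y = [10.0, 12.0, 14.0, 8.0, 10.0, 12.0]

slope = hof_slope(x, y)
projection = rookie_projection(slope, 15.0, range(5))
print(round(slope, 3))
print(np.round(projection, 3).tolist())
```

Because every rookie line shares the Hall of Fame slope, the lines are parallel, and the only thing that separates two rookies at the same position is their first-year statistic.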

Linear Regression

The last step in this project is to visualize the potential of the rookies. In order to start this off, I imported a few things to help me while graphing linear regression models.

The graphnew method will be useful when graphing the rookies' potential against the Hall of Famers.

The graph method here is a modified version of the graph method I created previously. The main difference is that instead of drawing a violin plot, it draws a scatter plot with a linear regression line. I created a new method so I could showcase the two kinds of graph and what each one shows (normal distribution and rookie potential). For more information on the normal distribution, look here: https://www.mathsisfun.com/data/standard-normal-distribution.html. Additionally, I wanted two separate graphs: one that shows only the Hall of Famers, and one that shows both the Hall of Famers and the rookies.
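A plot of this kind can be sketched with matplotlib as follows. This is not the original graph or graphnew code. The plot_with_rookies function, its parameters, and the sample numbers are assumptions for illustration, and the shaded band around the regression line in the original graphs is not reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_with_rookies(x, y, rookie_names, rookie_stats, title="Point Guards"):
    """Scatter the Hall of Fame points, fit a line, and add rookie lines."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    years = np.arange(int(x.min()), int(x.max()) + 1)

    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.5, label="Hall of Famers")
    ax.plot(years, slope * years + intercept, color="black",
            label="Hall of Fame fit")
    # Each rookie gets the Hall of Fame slope with their own starting stat.
    for name, stat in zip(rookie_names, rookie_stats):
        ax.plot(years, slope * years + stat, linestyle="--", label=name)
    ax.set_xlabel("Year in the NBA")
    ax.set_ylabel("Points per game")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Made-up data: two Hall of Famers and two rookies.
fig, ax = plot_with_rookies(
    [0, 1, 2, 0, 1, 2],
    [10.0, 12.0, 14.0, 8.0, 10.0, 12.0],
    ["Rookie A", "Rookie B"],
    [15.0, 17.0],
)
fig.savefig("pg_projection.png")
```

Passing empty rookie lists gives the Hall of Famers-only version of the graph, so one function can cover both of the views described above.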

Linear Regression Analysis and Conclusion

The five graphs above show how the 2019 Draft Class, also known as this past year's rookies, compares to the Hall of Famers, and how much potential each rookie has. Based on these graphs, the Shooting Guards of this class were the most talented, with the Point Guards a close second. The Small Forwards and Power Forwards were pretty equal, but the one lonely Center from the 2019 Draft class unfortunately does not have Hall of Fame potential. The Point Guard with the most potential is Ja Morant, the Shooting Guards with the most potential are RJ Barrett and Tyler Herro, and the Power Forward with the most potential is Zion Williamson, by a mile. I made these picks by taking the players who fall outside the middle range, indicated by the shaded band in the graph (closest to the outliers).

The data does not tell the entire story, though. There are many more years to come in these young players' careers, so these conclusions can all change. This project shows how data science can be used in many real-world fields, and just how much it can reveal about a topic. The data I collected consisted only of rookie-year statistics, but NBA data could also be used to predict future Hall of Famers, or to determine who the greatest player of all time is.