

Source: cs229.stanford.edu/proj2016/poster/ (CS229 2016 course project poster)

Automatic Classification of Pick-and-Roll Plays in the NBA

Will Qiu

Introduction

The pick-and-roll is a basic and fundamental type of play frequently employed by teams in the NBA. Knowing both how to execute a pick-and-roll and how to stop its execution is an important skill that players and coaches must develop. This has led teams to develop a wide variety of pick-and-roll plays to use during games, and many teams have come to be known for their particular style of play. Recognizing when a pick-and-roll is about to occur, and of what general type, is vital to a defense's success in stopping its execution in a timely manner. More broadly, knowing which kinds of pick-and-roll tend to be effective is also important for teams as they learn how best to use their players on the court. Traditionally, this task is relegated to the coaching staff and involves spending a tedious amount of time in both tape-watching sessions and scrimmages. A method that can take the large volumes of available game data and recognize the specific play-making strategies of teams in the league would therefore help coaches make better use of their time when developing new styles of play.

[Figure: pick-and-roll diagram, via coachesclipboard.net]

Data and Features

NBA SportVU tracking data of player and ball movements, in the form of (x, y) coordinates, from 630 regular-season games of the 2015-2016 season was used. The data is discretized into individual events and further into single frames (moments) at 25 frames per second. Relevant "pick-and-roll-able" moments and their features were extracted. The data were then labeled into several categories for prediction: {no play, screened shot, pick-and-pop, classic pick-and-roll, pass to an open man}. I attempted both binary classification, {pick-and-roll, not pick-and-roll}, and multi-class classification.

Feature Representation

    Moment length (normalized to 1):                      m_end − m_start
    Time of ball-screen:                                  t_s = argmin_t d_t(p_d, p_s) [1]
    Ball handler's starting distance to basket:           d_0(p_h, b)
    Speed of player i:                                    S_i = (Σ_t d_t) / t
    Min distance (all pairwise players and basket):       min({d_1, d_2, ..., d_t})
    Max distance (all pairwise players and basket):       max({d_1, d_2, ..., d_t})
    Average distance (all pairwise players and basket):   avg({d_1, d_2, ..., d_t})
    Total distance traveled by player i:                  sum({d_i(2−1), d_i(3−2), ..., d_i(t−(t−1))})
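As a concrete illustration, the pairwise-distance features above can be computed from raw (x, y) moment data roughly as follows. This is a minimal sketch, not the poster's actual code: the array layout and the function name `distance_features` are my own assumptions.

```python
import numpy as np

def distance_features(positions):
    """positions: array of shape (T, P, 2) -- T frames (25 fps),
    P tracked points (players and basket), (x, y) coordinates."""
    T, P, _ = positions.shape
    # All pairwise distances between tracked points at every frame.
    diffs = positions[:, :, None, :] - positions[:, None, :, :]   # (T, P, P, 2)
    dists = np.linalg.norm(diffs, axis=-1)                        # (T, P, P)
    iu = np.triu_indices(P, k=1)            # upper triangle: count each pair once
    pair_dists = dists[:, iu[0], iu[1]]     # (T, n_pairs)
    # Per-point total distance traveled: sum of frame-to-frame displacements,
    # matching the sum({d_i(2-1), d_i(3-2), ...}) feature above.
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=-1)   # (T-1, P)
    total_traveled = steps.sum(axis=0)                            # (P,)
    return {
        "min_dist": pair_dists.min(),
        "max_dist": pair_dists.max(),
        "avg_dist": pair_dists.mean(),
        "total_traveled": total_traveled,
    }
```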

[Figure: scatter plot of main features (handler, screener, and on-ball defender)]

Methods

I train and assess the performance of two classifiers, gradient boosting and an SVM; specifically, two separate groups of models. The first group attempts simple binary classification of pick-and-roll plays. The second group attempts a multi-class classification task in which different types of plays, in addition to the classic pick-and-roll, are predicted. The data is split 70/30 in order to compute validation errors. The training set contains n_training = 134,165 total unique plays and the test set contains n_test = 4,816 total unique plays.

Gradient Boosting Classification

Gradient boosting is a boosting method that uses gradient descent to train a number of weak classifiers, which are then combined to form a single, more robust classifier. For the case of binary classification, gradient boosting is formulated in the following way. Start with an initial model F_i for a given class i; then, for each iteration t until convergence:

1. Calculate the negative gradient for class i: −g_i(x_i) = −∂L(y_i, F(x_i)) / ∂F(x_i)

2. Fit a regression tree h_i to −g_i(x_i).

3. Update the model: F_{t+1} := F_t + ρ h_i

Here, L is a loss function and 0 < ρ ≤ 1 is the learning rate; L is chosen as the logistic loss function. To find the best classifier, I experimented with different learning rates ρ and assessed each one's performance.
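The learning-rate sweep described above can be sketched with scikit-learn's GradientBoostingClassifier, whose default loss is the logistic loss. The library choice and the synthetic dataset are my assumptions; the poster does not name its implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted play features.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep the learning rate rho, as in the poster's experiments.
accs = {}
for rho in (0.1, 0.2, 0.5):
    clf = GradientBoostingClassifier(learning_rate=rho, n_estimators=100,
                                     random_state=0)
    clf.fit(X_tr, y_tr)
    accs[rho] = clf.score(X_te, y_te)
    print(f"rho={rho}: test accuracy {accs[rho]:.3f}")
```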

Support Vector Machines

Support Vector Machines are a more traditional method that attempts to find the maximal separating hyperplane between two sets of points. For classes y ∈ {−1, 1}, the typical SVM classifier is defined as follows:

    h_{w,b}(x) = g(w^T x + b)

where

    w^T x = Σ_i h_i y_i K(x_i, x)

    b = (1/|SV|) Σ_{i∈SV} ( y_i − Σ_j h_j y_j K(x_j, x_i) )

Here the h_i are the learned coefficients, SV is the set of support vectors, and K is the kernel function. For the classification task here, the radial basis function (RBF) is used as the kernel. The RBF kernel is defined as:

    K(x, x′) = exp( −||x − x′||² / (2σ²) )
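Concretely, an RBF-kernel SVM can be fit with scikit-learn's SVC, where the gamma parameter plays the role of 1/(2σ²) in the kernel above. Again, this is a sketch on a synthetic dataset, not the poster's pipeline.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs as a stand-in for the play features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (200, 2)),
               rng.normal(1.0, 0.5, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# In scikit-learn's parameterization, gamma = 1 / (2 * sigma^2).
sigma = 1.0
clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```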

Results and Error Analysis

Model Training and Test Accuracy (training size: 134,165; test size: 4,816)

    Model                                       Training Accuracy   Test Accuracy
    Binary SVM with RBF kernel                  0.9459              0.946
    Multi-class SVM with RBF kernel             0.9918              0.695
    Gradient boosting (binary, ρ = 0.1)         0.937               0.930
    Gradient boosting (binary, ρ = 0.2)         0.941               0.933
    Gradient boosting (binary, ρ = 0.5)         0.946               0.940
    Gradient boosting (multi-class, ρ = 0.1)    0.939               0.931
    Gradient boosting (multi-class, ρ = 0.2)    0.947               0.941
    Gradient boosting (multi-class, ρ = 0.5)    0.918               0.90
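The ROC curves in the figure compare classifiers by sweeping the decision threshold over each model's predicted scores. A minimal sketch of how such a curve is computed with scikit-learn (synthetic data and library choice are my assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]          # score for the positive class
fpr, tpr, thresholds = roc_curve(y_te, scores)  # one (FPR, TPR) per threshold
print("AUC:", auc(fpr, tpr))
```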

[Figure: ROC curves (false positive rate vs. true positive rate), one panel per task: binary classification (pick-and-roll vs. not pick-and-roll) and, for the multi-class task, No_play, Screened_shot, Pass_to_open_man_and_drive, pnp_non_roll_man, pick-n-roll, and Handler_drive_to_basket. Each panel compares gradient boosting with 5,000 trees at learning rates 0.1, 0.2, and 0.5 against the RBF-kernel SVM.]

ROC Curves For Different Methods (Binary and Multi-Label Classifiers)

Discussion and Extension

All of the methods perform very well for binary classification, and their differences are negligible. There is, however, greater variation between methods in multi-class classification, depending on the specific label being predicted. Surprisingly, the SVM performed far worse overall than gradient boosting on the multi-class task.

Gradient boosting is the much better method overall, but the high false positive rates for some of the labels suggest that the extracted features do not provide adequate predictive value for distinguishing between more nuanced types of plays.

One possible future extension is to take into account the sequential nature of the subject matter and use, as input, a sequence of player positions to make predictions about the next likely action or sequence of actions.

References

[1] Armand McQueen, Jenna Wiens, and John Guttag. "Automatically recognizing on-ball screens". In: 2014 MIT Sloan Sports Analytics Conference, 2014.
