Py "Baseball" Data Introduction: Hiroshima Toyo Carp Edition #pyconhiro

Posted on 12-Jan-2017


Py “Baseball” Data, PyCon mini Hiroshima 2016

Python

Shinichi Nakagawa(Baseball Analyst&Pythonista)

Starting Member

• Who am I?

• PyData

• PyData / #

• Python

• PyData + (FIP/RC27)

Who am I?

• Shinichi Nakagawa(@shinyorke)

• Python, Hack ※ Python

• HR

• Python/Agile/PyData/SABRmetrics

• HR

• 1 2

• Python (Django)

• https://service.visasq.com

• https://tech.visasq.com

• etc…

• Web Python

• IPython + pandas (Hello World)

• Deep Learning

• pandas

PyData / #

PyData

""" PyData

Python Python Library

"""

※@iktakahiro http://www.slideshare.net/iktakahiro/pydata-67913897

PyData

• Python

• Excel Python, Deep Learning, etc… PyData

PyData

• 1970

• FA

• etc…

• J

× ※ × +


5

• ( - ) = 5

• ( - ) 5 5 (rest omitted)

• Pythagorean expectation: $\mathrm{Win\%} = \dfrac{R^2}{R^2 + RA^2}$ (R = runs scored, RA = runs allowed)
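The Pythagorean expectation estimates a team's winning percentage from runs scored (R) and runs allowed (RA) as R² / (R² + RA²), using an exponent of 2. Below is a minimal pure-Python sketch of that formula. The function names are hypothetical. The 143-game default assumes the 2016 NPB schedule, which the pandas code in this talk also uses.

```python
def pythagorean_win_pct(runs_scored: int, runs_allowed: int) -> float:
    """Pythagorean expectation with exponent 2: R^2 / (R^2 + RA^2)."""
    if runs_scored == 0 and runs_allowed == 0:
        return 0.0  # no runs either way: avoid 0/0 and report 0, like fillna(0)
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return rs2 / (rs2 + ra2)


def expected_wins(runs_scored: int, runs_allowed: int, games: int = 143) -> float:
    """Expected wins over a season of `games` games (143 assumed for NPB 2016)."""
    return pythagorean_win_pct(runs_scored, runs_allowed) * games


# Hypothetical numbers, not real team stats
print(round(pythagorean_win_pct(600, 500), 3))  # prints 0.59
```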

Python×pandas


# Python 3 (3.4 or later)
$ pip install ipython pandas beautifulsoup4 numpy lxml html5lib
# start IPython (Jupyter also works)
$ ipython


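# imports
import pandas as pd
import numpy as np

# read every <table> on the standings page into a list of DataFrames (uses lxml, or beautifulsoup4 + html5lib)
df = pd.read_html('http://baseball.yahoo.co.jp/npb/standings/')

# the first table is the Central League (hence df_cl); drop its first row (label 0)
df_cl = df[0].drop([0])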


# rename the columns (r = runs scored, er = runs allowed)
df_cl.columns = ['rank', 'name', 'games', 'win', 'lose', 'draw', 'pct', 'gb', 're_games', 'r', 'er', 'hr', 'sb', 'ba', 'era']

# convert the columns used below to numbers (NaN cells become 0)
df_cl['win'] = df_cl['win'].fillna(0).astype(np.int64)
df_cl['lose'] = df_cl['lose'].fillna(0).astype(np.int64)
df_cl['pct'] = df_cl['pct'].fillna(0).astype(np.float64)
df_cl['r'] = df_cl['r'].fillna(0).astype(np.int64)
df_cl['er'] = df_cl['er'].fillna(0).astype(np.int64)


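# run differential (runs scored - runs allowed)
df_cl['difference'] = df_cl['r'] - df_cl['er']

# Pythagorean winning percentage: R^2 / (R^2 + RA^2)
df_cl['pythagorean_win_per'] = (df_cl['r'] ** 2) / (df_cl['r'] ** 2 + df_cl['er'] ** 2)

# Pythagorean wins and losses over the 143-game season (wins truncated to an integer)
df_cl['pythagorean_win'] = (df_cl['pythagorean_win_per'] * 143).fillna(0).astype(np.int64)
df_cl['pythagorean_lose'] = 143 - df_cl['pythagorean_win']

# sort by Pythagorean winning percentage, best first
df_cl.sort_values(by='pythagorean_win_per', ascending=False)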

https://gist.github.com/Shinichi-Nakagawa/8ff55af83390fcd2e2dd34bcb914868c
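You can try the same pipeline offline by running it on a small in-memory DataFrame instead of the scraped page. The standings page layout may also have changed since 2016. The team names and numbers below are hypothetical. The sketch swaps in `pd.to_numeric(..., errors='coerce')` for the original `fillna(0).astype(...)`. With that change, a non-numeric cell such as '-' becomes NaN (and then 0) instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical standings (made-up numbers) standing in for the scraped table.
# Cells are strings, as scraped HTML often is; '-' marks a missing value.
df_cl = pd.DataFrame({
    'name': ['Team A', 'Team B', 'Team C'],
    'win': ['80', '70', '-'],
    'lose': ['60', '70', '-'],
    'r': ['650', '550', '-'],
    'er': ['500', '600', '-'],
})

# errors='coerce' turns unparseable cells into NaN; fillna(0) then allows the int cast
for col in ['win', 'lose', 'r', 'er']:
    df_cl[col] = pd.to_numeric(df_cl[col], errors='coerce').fillna(0).astype(np.int64)

df_cl['difference'] = df_cl['r'] - df_cl['er']
df_cl['pythagorean_win_per'] = (df_cl['r'] ** 2) / (df_cl['r'] ** 2 + df_cl['er'] ** 2)
# Team C has r == er == 0, so 0/0 gives NaN; treat it as 0 like the original code
df_cl['pythagorean_win_per'] = df_cl['pythagorean_win_per'].fillna(0)
df_cl['pythagorean_win'] = (df_cl['pythagorean_win_per'] * 143).astype(np.int64)
df_cl['pythagorean_lose'] = 143 - df_cl['pythagorean_win']

result = df_cl.sort_values(by='pythagorean_win_per', ascending=False)
print(result[['name', 'difference', 'pythagorean_win', 'pythagorean_lose']])
```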

×

• Run differential (+187)

• 5

• DeNA

?

×PyData

• FIP

• RC27

• scrapy, CSV

• CSV, pandas, seaborn, jupyter
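Here is a sketch of the CSV-to-pandas step, assuming the scrapy spider wrote one row per pitcher. The CSV content, the column names (`name`, `team`, `ip`, `hr`, `bb`, `hbp`, `so`) and the `load_pitching` helper are all hypothetical. The 50-inning default is only an illustrative threshold.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file written by a scrapy spider (made-up numbers)
CSV_TEXT = """name,team,ip,hr,bb,hbp,so
Pitcher A,Carp,180.0,15,40,5,160
Pitcher B,Carp,45.0,3,10,1,50
Pitcher C,Giants,120.0,12,35,4,100
"""


def load_pitching(csv_source, min_ip: float = 50.0) -> pd.DataFrame:
    """Load pitching stats and keep only pitchers with at least `min_ip` innings."""
    df = pd.read_csv(csv_source)
    return df[df['ip'] >= min_ip].reset_index(drop=True)


pitchers = load_pitching(io.StringIO(CSV_TEXT))
print(pitchers)
```

With a real file, pass its path instead of the `io.StringIO` object.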

FIP(Fielding Independent Pitching)

• HR, BB (+ HBP), K

• xFIP

FIP
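FIP is usually written as (13×HR + 3×(BB + HBP) − 2×K) / IP + C. Here C is a league- and season-specific constant, chosen so that the league-average FIP equals the league ERA. Below is a minimal sketch with hypothetical helper names. It also converts innings written as 150.1 / 150.2 (150⅓ / 150⅔), assuming the data uses that common baseball notation.

```python
def ip_to_innings(ip: float) -> float:
    """Convert innings notation (150.1 = 150 1/3, 150.2 = 150 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts extra outs
    if outs not in (0, 1, 2):
        raise ValueError(f"not valid innings notation: {ip}")
    return whole + outs / 3


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_so, lg_ip):
    """League constant that makes league-average FIP equal league ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_so) / lg_ip


def fip(hr, bb, hbp, so, ip, constant):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant; ip in true innings."""
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical league totals and pitcher line (made-up numbers)
c = fip_constant(lg_era=3.50, lg_hr=100, lg_bb=300, lg_hbp=30, lg_so=900, lg_ip=1200.0)
print(round(fip(15, 40, 5, 160, ip_to_innings(180.0), c), 2))
```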

FIP (TOP 20)

FIP

[Figure: FIP (50 Histogram), three slides]

FIP

• FIP

FIP

RC27

• How many runs would a lineup of 9 copies of this batter score in 1 game?

• VS ?

• RC (Runs Created) 1
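RC27 divides Runs Created by the outs a batter used, then scales the result to 27 outs (one nine-inning game). The talk does not show which RC variant it uses. This sketch assumes Bill James' basic formula RC = (H + BB) × TB / (AB + BB) and counts outs as AB − H + SH + SF + CS + GIDP. The function names are hypothetical.

```python
def runs_created_basic(h, bb, tb, ab):
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def rc27(h, bb, tb, ab, sh=0, sf=0, cs=0, gidp=0):
    """Runs created per 27 outs, using the basic RC formula above."""
    outs = ab - h + sh + sf + cs + gidp
    if outs <= 0:
        raise ValueError("RC27 is undefined when the batter made no outs")
    return runs_created_basic(h, bb, tb, ab) / outs * 27


# Hypothetical batting line (made-up numbers)
print(round(rc27(h=150, bb=60, tb=250, ab=500, sf=5, cs=5, gidp=10), 2))
```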

RC27 (350)

RC27 TOP30 (350)

[Figure: RC27 (Histogram), three slides]

RC27

• 1-6

• RC27 Top30 6

6 Top30

FIP

[ ]

• FIP, WHIP, K/BB, etc…

• RC27 3 (6)
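WHIP and K/BB sit next to FIP in the list of pitching metrics, and both are simple ratios. Here is a sketch with hypothetical function names. It assumes `ip` is already in true innings (for example 150⅓ rather than 150.1).

```python
def whip(h, bb, ip):
    """Walks plus hits per inning pitched: (BB + H) / IP."""
    return (bb + h) / ip


def k_per_bb(so, bb):
    """Strikeout-to-walk ratio; infinite when the pitcher issued no walks."""
    return so / bb if bb else float('inf')


# Hypothetical pitcher line (made-up numbers)
print(whip(h=150, bb=40, ip=190.0), k_per_bb(so=160, bb=40))
```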

Py "Baseball" Data - Python ※pandas, Re:dash

MonotaRO TechTalk #4

http://www.kokuchpro.com/event/monotarotech4/


Shinichi Nakagawa(Twitter/Facebook/visasQ:@shinyorke)
