Soviet Popular Music Landscape: Community Structure and Success Predictors
Dmitry Zinoviev
Department of Mathematics and Computer Science
Suffolk University, Boston

Post on 18-Feb-2017



Dmitry Zinoviev * IC S * Suffolk University 2

Research Question

Who Rocks and Why?

Real Research Questions
● Does sharing performers with other groups influence the groups' eventual success?
● If so, is the success predictable from the performers' sharing network?
● What is the linguocultural and genre structure of the ex-Soviet music universe?

Research Strategy
● Collect data about sharing and success
● Build a network based on shared musicians
● Define “success”
● Correlate network measures (such as centralities) with success measures
● Attempt to predict success from the network measures using machine learning techniques
● Look into genres/languages and communities

DATA

Data Set
● 4,560 non-academic music groups performing in the USSR and post-Soviet countries in 1960–2015
● 17,000 performers (at least 3,600 shared)
● 275 coded genres (rock, pop, disco, jazz, folk, etc.)
● Wikipedia pages in 122 languages

New Groups by Year

2,216 Groups on Wikipedia

● Russia

● Estonia

● Ukraine

● Latvia

● Lithuania

● Belarus

● Moldova

NETWORK

Network Construction
● Group → node; labels in the original language
● Two nodes are connected if the groups shared at least one musician over their lifetime
● Undirected, unweighted, disconnected graph with no loops and no parallel edges
● For each node, calculate degree, average neighbor degree, closeness, betweenness, and eigenvector centrality, and clustering coefficient
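
The construction rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used for the study: the `rosters` dictionary and all function names are hypothetical, and only degree, average neighbor degree, and clustering coefficient are computed.

```python
from itertools import combinations

# Hypothetical toy roster: group name -> set of musicians who ever played in it.
rosters = {
    "A": {"ivan", "olga"},
    "B": {"olga", "petr"},
    "C": {"petr", "ivan"},
    "D": {"petr", "anna"},
    "E": {"lena"},  # shares nobody, so it stays isolated
}

def build_sharing_graph(rosters):
    """Adjacency dict: two groups are adjacent if they share at least one musician."""
    adj = {g: set() for g in rosters}
    for g, h in combinations(rosters, 2):
        if rosters[g] & rosters[h]:  # non-empty intersection = shared musician
            adj[g].add(h)
            adj[h].add(g)
    return adj

def degree(adj, g):
    """Number of groups that share a musician with g."""
    return len(adj[g])

def average_neighbor_degree(adj, g):
    """Mean degree of g's neighbors (0.0 for an isolated node)."""
    nbrs = adj[g]
    return sum(len(adj[n]) for n in nbrs) / len(nbrs) if nbrs else 0.0

def clustering(adj, g):
    """Fraction of pairs of g's neighbors that are themselves adjacent."""
    nbrs = adj[g]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

graph = build_sharing_graph(rosters)
```

Using sets for adjacency rules out parallel edges, and `combinations` never pairs a group with itself, so there are no loops. Closeness, betweenness, and eigenvector centrality are left out for brevity; a graph library such as NetworkX provides them.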

Network Overview

● Node size represents degree (number of shares)

Network Description
● 80% of the groups (3,602) are in the giant connected component; all other connected components have <13 groups each
● Excellent community structure (m=0.76), 43 communities; each of the largest 25 communities has 20+ groups
● Community = groups that have a lot of mutual musician sharing
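
The two quantities on this slide, the size of the giant component and the quality of a community partition, can be illustrated as follows. This is a sketch under assumptions: the slide's m is taken to be Newman modularity, the toy `edges` list is hypothetical, and the partition is supplied by hand because the slides do not say which community detection algorithm was used.

```python
from collections import deque

# Hypothetical toy graph: two triangles joined by one edge, plus a separate pair.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("x", "y")]

def adjacency(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connected_components(adj):
    """Connected components found by breadth-first search, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

def modularity(adj, communities):
    """Newman modularity: sum over communities of e_c/m - (d_c/2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        total_degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

adj = adjacency(edges)
comps = connected_components(adj)
giant_share = len(comps[0]) / len(adj)  # fraction of nodes in the giant component
q = modularity(adj, [{"a", "b", "c"}, {"d", "e", "f"}, {"x", "y"}])
```

Putting every node into a single community gives a modularity of zero, which is why values such as 0.76 indicate a partition far from random.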

SUCCESS

What's “Success”?
● No sales data!
● No charts!
● Informal/semi-legal/illegal status
● Proxies for long-term success (we still remember them!):
– Wikipedia page(s) visit frequency within the last 3 years (collected from http://stats.grok.se)
– Wikipedia page(s) Google PageRank
– Available for 2,000 groups

PageRank (PR) Correlations

Visit Frequency (VF) Correlations

Prediction (1)
● Random Decision Forest (RDF) machine learning predictor
● Predict above-median VF vs below-median VF: accuracy 69% (expected by chance: 50%)
● Predict Google PR: accuracy 50% (expected by chance: 17%); 95% if 1 error allowed
● Quite poor, but not hopeless
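
The scoring behind these figures can be sketched as below. The numbers and names are hypothetical; the sketch shows only how a median split produces two balanced classes and how "1 error allowed" can be read as accepting a prediction within one class of the true value. The 17% chance level would be consistent with about six equally likely PageRank classes (1/6 ≈ 0.17), which is an inference, not something the slides state.

```python
from statistics import median

def median_split(values):
    """Label each value 1 if it is above the median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

def accuracy(true, predicted, tolerance=0):
    """Fraction of predictions within `tolerance` classes of the true class."""
    hits = sum(1 for t, p in zip(true, predicted) if abs(t - p) <= tolerance)
    return hits / len(true)

# Hypothetical visit frequencies for six groups.
visits = [12, 40, 7, 95, 33, 60]
vf_labels = median_split(visits)

# Hypothetical true and predicted PageRank classes for the same six groups.
pr_true = [3, 4, 2, 5, 0, 1]
pr_pred = [3, 5, 2, 3, 1, 1]
exact = accuracy(pr_true, pr_pred)                      # exact-match accuracy
within_one = accuracy(pr_true, pr_pred, tolerance=1)    # off-by-one counts as a hit
```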

Prediction (2)
● But isn't visit frequency affected by group size? (More performers, more search queries?)
● Add group size as a control variable
● Predict above-median VF vs below-median VF: accuracy 69% (was: 69%)
● No difference!

GENRES

Genres and Sharing
● Build a network of similar genres (recursive generalized similarity):
– Two genres are similar if used by similar groups
– Two groups are similar if they play similar genres
● Genre → node; two nodes are connected if the genres are “very similar”
● Community structure (m=0.3):
– Punk/jazz, metal, disco/pop, blues/hip-hop, light rock
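
The mutual definition above (genres are similar if similar groups use them, and vice versa) can be illustrated with a simplified stand-in. The sketch below is a SimRank-style iteration on the group/genre bipartite data, not the recursive generalized similarity measure named on the slide; the toy data, the `decay` factor, and the number of rounds are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy data: group -> set of genres it plays.
group_genres = {
    "g1": {"punk", "rock"},
    "g2": {"punk", "rock"},
    "g3": {"disco", "pop"},
    "g4": {"disco", "pop", "rock"},
}

def co_similarity(group_genres, decay=0.8, rounds=10):
    """Alternately refine group and genre similarities from each other."""
    groups = sorted(group_genres)
    genres = sorted(set().union(*group_genres.values()))
    genre_groups = {x: {g for g in groups if x in group_genres[g]} for x in genres}

    def identity(items):
        # Start from "every item is similar only to itself".
        return {(a, b): 1.0 if a == b else 0.0 for a, b in product(items, repeat=2)}

    def update(items, neighbors, other_sim):
        # Similarity of a and b = decayed mean similarity of their neighbors.
        new = {}
        for a, b in product(items, repeat=2):
            if a == b:
                new[a, b] = 1.0
                continue
            pairs = list(product(neighbors[a], neighbors[b]))
            mean = sum(other_sim[p] for p in pairs) / len(pairs) if pairs else 0.0
            new[a, b] = decay * mean
        return new

    group_sim, genre_sim = identity(groups), identity(genres)
    for _ in range(rounds):
        group_sim = update(groups, group_genres, genre_sim)
        genre_sim = update(genres, genre_groups, group_sim)
    return group_sim, genre_sim

group_sim, genre_sim = co_similarity(group_genres)
```

In this toy data, punk ends up closer to rock than to disco because punk and rock are played by the same groups. A genre network would then link pairs whose similarity exceeds some cutoff; the slides do not give the cutoff used for “very similar”.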

Genre Network
[Figure: genre network with clusters labeled Metal, Light rock, Punk, Soul, Folk/jazz/hh, Disco, and Ethno]

Some genres are hierarchical (rock/metal/black metal). TODO: Assign them to different levels.

Musicians Prefer Similar Genres

LINGUOCULTURAL STRUCTURE

Languages, Genres, and Sharing

● Group sharing network has 25 communities with 20+ groups in each

● Preferred language = language of the most frequently visited Wikipedia page

● Look into genres and preferred languages within each community: Are they homo- or heterogeneous?
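
One way to make "homogeneous" concrete is to compute the share of each community's groups that carry its most common label and compare it with 50%. The sketch below does this for preferred languages; the community contents are hypothetical, and the same function would apply to genres only under the added assumption that each group is tagged with a single genre.

```python
from collections import Counter

# Hypothetical data: community id -> preferred language of each member group.
communities = {
    0: ["ru", "ru", "ru", "uk"],
    1: ["et", "et", "ru", "lv", "lt"],
    2: ["uk", "uk", "ru"],
}

def dominant_share(labels):
    """Most common label and the fraction of groups that carry it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def homogeneous_communities(communities, threshold=0.5):
    """Communities where more than `threshold` of groups share one label."""
    result = {}
    for cid, labels in communities.items():
        label, share = dominant_share(labels)
        if share > threshold:
            result[cid] = (label, share)
    return result

homogeneous = homogeneous_communities(communities)
```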

Genres per Community
● In 9 communities, >50% of groups perform a single genre.
● In 23 communities, >50% of groups perform in no more than 2 genres.
● 71% of all shares are homogeneous

Preferred Languages per Community
● In 24 communities, >50% of groups have the same preferred language!
● 84% of all shares are homogeneous

Language and Genre Homogeneity: Either or Both?

[Figure: communities marked as Language-defined, Genre-defined, or Mixed; annotation: “Not very convincing?”]

Conclusion
● Musician sharing networks of non-academic music groups in the USSR and post-Soviet countries have community structure inspired by preferred language and musical genre
● Centrality and clustering measures of this network are correlated with long-term success of groups in terms of popularity on Wikipedia and to some extent can serve as success predictors

Dataset Available
● https://github.com/dzinoviev/sovietmusic

Made in Pythonia
Get your copy of “Data Science Essentials in Python” at https://pragprog.com/book/dzpyds/data-science-essentials-in-python