visualizing music similarity: clustering and mapping 500 ... · scientometrics (2019)...
TRANSCRIPT
Vol.:(0123456789)
Scientometrics (2019) 120:975–1003https://doi.org/10.1007/s11192-019-03166-0
1 3
Visualizing music similarity: clustering and mapping 500 classical music composers
Patrick Georges1 · Ngoc Nguyen2
Received: 29 September 2018 / Published online: 11 July 2019 © The Author(s) 2019
AbstractThis paper applies clustering techniques and multi-dimensional scaling (MDS) analysis to a 500 × 500 composers’ similarity/distance matrix. The objective is to visualize or trans-late the similarity matrix into dendrograms and maps of classical (European art) music composers. We construct dendrograms and maps for the Baroque, Classical, and Romantic periods, and a map that represents seven centuries of European art music in one single graph. Finally, we also use linear and non-linear canonical correlation analyses to identify variables underlying the dimensions generated by the MDS methodology.
Keywords Mapping classical music composers · Similarity measures · Dendrograms · Hierarchical clustering · Multidimensional scaling · Canonical correlation · Music information retrieval
Introduction
This paper uses several techniques that permit to visualize and graph some aspects of a Classical composers’ similarity matrix computed by Georges (2017). The number of composers analysed is 500 and the matrix is of dimension 500 × 500, leading to 250,000 pairwise (bilateral) composers’ similarity indices. Sheer dimension prevents easy report-ing of the results in a standard article. Hence, ‘visualizing’ or ‘translating’ the similarity matrix into clusters and mapping of composers is an important communication tool. Before embarking on this mapping project, it may be useful to provide some background on the motivation for building a composers’ similarity matrix.
* Patrick Georges [email protected]
Ngoc Nguyen [email protected]
1 Graduate School of Public and International Affairs, University of Ottawa, Social Sciences Building, Room 6011, 120 University, Ottawa, ON K1N 6N5, Canada
2 Department of Mathematics, Western Kentucky University, 1906 College Heights Blvd., Bowling Green, KY 42101, USA
976 Scientometrics (2019) 120:975–1003
1 3
Through digital music files and streaming, classical music consumers have access to an enormous range of composers and styles of classical music. However, the vast store of music makes the problem of what, exactly, to listen to, even more acute than during the pre-digital age. Many introductions to the classical music world are in the business of inculcation through lists of ‘mandatory’ composers and compositions to explore. Yet, as mentioned by Smith (2000), most people explore new subjects by starting with the famil-iar, and in the case of music, this may mean hearing a composer that one likes and search-ing for more music of the same type. Pandora, Spotify, Last.fm, YouTube and other music streaming platforms have algorithms proposing what an auditor may want to listen next. These algorithms and their improvement are largely tributary to the field of music informa-tion retrieval (MIR), which develops innovative content-, context- and user-based search-ing schemes, music recommendation systems, and novel interfaces to make the vast store of music available to all.1
Content-based MIR aims at uncovering from an audio signal meaningful music quali-ties (rhythm, timbre, melody, harmony, loudness, etc.) that can be used for music similarity and retrieval tasks. The unit of study is typically the ‘composition’ and the analysis is to compare similarities/differences across audio signals from a series of compositions or from different segments of a same composition (self-similarity). This particular angle has been used among others, by Foote (1999), Pampalk (2006), and more recently by Weiss (2017) and Weiss et al. (2018) for classical music, and by Mauch et al. (2015) for popular music where they investigate the evolution of musical diversity and disparity and whether evolu-tion has been gradual or punctuated. Context-based MIR is motivated by the fact that there are aspects not encoded in an audio signal or that cannot be extracted from it, but which are nevertheless important to human perception of music, for example, the cultural background of a composer.2 In this case, the unit of analysis may also be the ‘composer’ or the ‘artist/performer’, not just the ‘composition’. Examples are Pachet et al. (2001) and Baccigalupo et al. (2008). In this vein, Smith and Georges (2014) propose a statistical analysis that cap-tures similarity across pairs of composers by mean of pairwise comparison of presence-absence of personal musical influences (e.g., other composers/influencers). Data on per-sonal musical influences are taken (and available) from ‘The Classical Music Navigator’ (Smith 2000; hereafter referred to as CMN) where each of the 500 composers of the data-base is associated with a list of composers who have had a documented positive influence on a subject composer.3 In essence, they infer similarities among composers by assuming that if two composers share many of the same personal musical influences, their music will likely have some similarities. On the other hand, if two composers have been influenced by
1 One of the earlier survey article on MIR is Orio (2006). Schedl et al. (2014) provide a survey of more recent developments and applications.2 Besides content- and context-based MIR, the user also plays a key role in many MIR applications, includ-ing for recommendations, retrieval, and similarity analyses, which may also depend on user properties and user context. User context represents frequently changing factors such as activity, emotion and current social context of a user. User properties refer to constant or slowly changing characteristics of a user such as music taste and music education. See Schedl et al. (2014) for a survey of MIR explorations to model and analyse user behavior and to incorporate this knowledge into retrieval and recommendation systems.3 See the ‘composers’ page of the CMN. For example, the set of composers who had a documented posi-tive influence on Debussy is:{J.S. Bach, Wagner, Chopin, Tchaikovsky, Liszt, R. Schumann, Ravel, Fauré, Grieg, Rimsky-Korsakov, Mussorgsky, Franck, Gounod, Massenet, Satie, Borodin, Rameau, Albéniz, F. Couperin, Joplin, Delibes, Chausson, Lalo, Chabrier, Dukas, Alkan}.
977Scientometrics (2019) 120:975–1003
1 3
very distinct sets of composers, then their music is likely to have little similarity.4 They use and compare the performance of a dozen similarity indices or measure of association com-monly used in fields such as biogeography and biological systematics such as the first and second Kulczynski coeffecients (1927), the Jaccard coefficient (1901), the Dice coefficient (1945), the Simpson coefficient (1943), the Smith coefficient (1983), the binary distance coefficient (Sneath 1968), and the binomial index of dispersion �2 statistic (Potthoff and Whittinghill 1966).
Smith and Georges (2015) examine how much, if any, additional explanation can be added to their first study when ‘ecological’ measures (i.e., other characteristics such as time period, geographical location, school association, instrumentation emphases, etc.) are also taken into account. Each composer is associated with a subset of 298 ecological categories (also extracted from and available at the CMN),5 and the authors infer a statistical asso-ciation between pairs of composers by assuming, in essence, that if two composers share many ecological categories, then their musical ‘ecological niches’ are very similar, so that, in this sense, they may be considered similar.6 Although the two methods (personal musi-cal influences and ecological characteristic) do not generate identical similarities rankings, their combination does provide, arguably, an improved system of ranking. Finally, Georges (2017) applies on the same database a new measure of similarity, equipped with a statisti-cal significance test, the ‘centralised’ cosine measure. The centralised cosine is based on
4 For any pair of composers (i, j) for i, j ∈ C (among the n × n possible pairs with n = 500), we have a set Ii of all personal musical influences on composer i, and a set Ij of all personal musical influences on composer j. We are interested in capturing whether a composer k ∈ C had a reported influence on both i and j, on i but not j, on j but not i, and on neither i nor j. Thus, for any pair (i, j), Ii ∩ Ij = CIi,j is the set of composers k that have influenced both i and j; Ii − Ii ∩ Ij = Ii,−j is the set of composers k that have influenced i but not j;Ij − Ii ∩ Ij = Ij,−i is the set of composers k that have influenced j but not i and DIi,j = Ii,−j ∪ Ij,−i is the set of composers k that have influenced either i or j but not both. From this we can produce a count table for any pair (i, j) that sums the elements (the number of composers) in each of the four sets CIi,j , Ii,−j , Ij,−i , and C − CIi,j − DIi,j , and from which similarity indices for all pairs of composers (i, j) can be computed on the basis of well-known formulas. For example, the binomial index of dispersion used in Smith and Georges (2014) can be computed for any pair (i, j) as: BIDi, j = n(ad-bc)2/[(a + b)(c + d)(a + c)(b + d)], where a, b, c, d, and n are the count/number of composers in each of the five sets CIi,j , Ii,−j , Ij,−i , C − CIi,j − DIi,j and C.5 See the ‘Index of Forms and Styles of music’ in the CMN. For example, the ecological characteristics and style influences associated with Debussy are represented by the following elements:{ballets 1900 on, cello chamber music, chamber music 1825 to 1925, ‘Dance’ in composition title, etudes, flute unaccompanied, flute chamber music, harp chamber music, harp orchestral music, Impressionist style, nocturnes, operas 1900 on, orchestra incidental music, orchestra symphonic poems, orchestration, Paris composers 1800 on, piano unaccompanied 1775 to 1900, piano unaccompanied 1900 on, piano chamber music general, quartets for strings, song cycles and collections, songs 1800 to 1900, songs 1900 on, suites, trios for other combi-nations, viola unaccompanied or chamber music, violin chamber music 1850 on, Asian music, gamelan music, Renaissance Period music}. An 8-page list of all ecological categories is available in Smith and Georges (2015).6 We use the word ‘ecological’ to highlight the context-based nature of these categories. Categories such as ‘harp chamber music’, ‘piano composition from 1800 onwards’, ‘impressionist style’ and ‘gamelan music’, in order to describe Debussy, or ‘second Viennese school’, ‘serial technique’ and ‘expressionist style’ for Schoenberg, are fundamentally different from content-based information extracted from a score or audio file (e.g., harmonic intervals, tonal complexity, chord transitions, etc.). While there might be, in general, a fine line between ‘content’ and ‘context’-based information, in this paper content-based information refers to audio features (as mentioned just above) that can be extracted from one specific piece of music, while context-based information represents more general information about the style of music (including instru-mentation emphasis) of a composer.
978 Scientometrics (2019) 120:975–1003
1 3
earlier literature in scientometrics and bibliographic couplings.7 It provides a more sensi-tive measure of similarity than the binomial index used by Smith and Georges (2014, 2015) as it also tracks composers who (consciously or not) attempted to ‘differentiate’ themselves from others, leading to negative values for the index. Using the centralised cosine measure, three 500 × 500 composers’ matrices of similarity indices are computed, the first one (Sinfl), on the basis of personal musical influences, the second one (Secol), on the basis of 298 eco-logical characteristics and the third one (Scomb), that combines information from personal musical influences and ecological data. A subset of this last matrix for the Top-5 major composers of the database is given in Table 1. Eventually, Georges (2017) compares the first two matrices and shows that the interactions between similarity information extracted from personal musical influences and from ecological characteristics can provide a basis for an evolutionary model of Western classical music. In particular the analysis aims to understand why composers are perceived to ‘sound’ different (on the basis of ecological characteristics) despite similar ‘lineages’ (personal influences network) or why they could ‘sound’ alike even when their ‘lineages’ are different. Using an analogy to biodiversity ter-minology, these two cases are respectively referred to as ‘music speciation and evolution’ and ‘convergent evolution’. The paper finally proposes a statistical test in order to identify ‘transitional’ composers, ‘innovators’ and ‘followers’ in the development of Western clas-sical music.
Besides the application proposed in Georges (2017), more can be done with a compos-ers’ matrix of distance or similarity. In the rest of the introduction we present the contri-bution of this paper with respect to what exists in the literature. First, this paper applies clustering techniques to the similarity matrix Scomb in order to compute dendrograms or family tree diagrams for composers. Previous applications range from fields as diverse as biology (the tree of life evolution) to linguistics (grouping of languages according to lin-guistic affinities). Classifying (or clustering) composers into pre-defined groups such as
7 The ordinary (non-centralised) cosine similarity measure (also known as the Salton’s measure) is a statis-tic familiar to bibliometrics and scientometrics. The idea was mathematically formalized by Sen and Gan (1983) and later extended by Glänzel and Czerwon (1996) who also applied the methodology. As discussed in Smith et al. (2015), when all the vectors are Boolean vectors, the null distribution of the cosine similar-ity under the assumption of independence between two composers is unknown and has a nonzero mean; in order to derive a statistical test for the cosine measure, a centralised cosine measure was proposed (Giller 2012). The centralised cosine measure is the cosine measure computed on the centralised vectors, with respect to the mean (average) vectors, and this is the measure used in Georges (2017) as applied to classical music composers. The formula is: CSCi, j = (ad − bc)/sqrt [(a + b)(c + d)(a + c)(b + d)], where a, b, c, d are the count of composers in sets described in Footnote 4.
Table 1 A 5 × 5 subset of a 500 × 500 matrix of composers’ similarities, Scomb
Source: Georges (2017)
1. Bach, JS 2. Mozart, WA 3. Beethoven 4. Schubert 5. Brahms
1. Bach, JS 1.0000 0.1121 0.0962 0.0547 0.08472. Mozart, WA 0.1121 1.0000 0.3947 0.2684 0.20033. Beethoven 0.0962 0.3947 1.0000 0.4590 0.32524. Schubert 0.0547 0.2684 0.4590 1.0000 0.26005. Brahms 0.0847 0.2003 0.3252 0.2600 1.0000
979Scientometrics (2019) 120:975–1003
1 3
Renaissance, Baroque, Classical, Romantic, Modern, is of course a customary method in musicology. Recent computational methods based on statistical analyses of content-based information such as symbolic scores representations, MIDI files, or audio recordings (e.g., Rodriguez Zivic et al. 2013; White 2013; Weiss 2017; Weiss et al. 2018) typically show that composers tend to cluster in ways that conform to our intuitions about stylistic tra-ditions. In this paper we will compare the clustering performance of our context-based approach versus the content-based approach of Weiss (2017). Beyond grouping composers on a fixed number of clusters, we also investigate, as Weiss (2017), hierarchical clustering, which better highlights evolution and trends in classical music. Weiss (2017) does it for 70 composers and we will again compare results in the next section. Besides the 70 composers analysed in Weiss, our method permits to compute dendrograms for up to 500 composers representing seven centuries of classical music. We will also cluster composers within spe-cific period (Baroque, Classical, and Romantic).
The second objective of the paper is to apply multi-dimensional scaling (MDS) to the similarity matrix Scomb so as to generate maps of classical music composers. We occasion-ally see such maps on the internet that can be used to discover composers who are ‘close’ or ‘similar’ to a subject composer. Yet, these maps come with their own problems. For example, a search on Beethoven suggesting both Schubert, Stravinsky, the Spice Girls and Miles Davis, as ‘related’ composers is one example of a methodology that requires improvement.8 This in itself is perhaps not the key problem; the real issue is that these user-oriented maps rarely focus on the methodology and the data behind their construction. As a mapping methodology, MDS has been used in many fields from marketing research to cognitive psychology. Recently, Vehkalahti and Everitt (2019) suggest that MDS can be applied to classical music and they illustrate this with a map of ten classical compos-ers. This said, our MDS application maps up to 500 composers and we also map them by main periods (Baroque, Classical, Romantic, etc.). In terms of data on composers, we refer readers to the CMN website, and to Smith and Georges (2014, 2015) and Georges (2017) for a complete description of the data collection, the computing of similarity matri-ces, and the similarity index that is used (centralised cosine measure). In the first section (Clustering techniques and dendrograms) and the second section (Multidimensional scal-ing), we will be using the 500 × 500 composer similarity matrix Scomb, built on the basis of combined personal musical influences and ecological data. Indeed, as discussed in Smith and Georges (2015) and already mentioned above, adding both sets of data provides an arguably improved composers’ similarity ranking and similarity matrix relative to the cases when we use only the personal musical influences or the ecological data. This said, we will often apply the methods analysed here to a partition of the full 500 × 500 matrix (i.e., a matrix of smaller dimension) in order to present graphs that remain readable if printed. Finally, the next section extends the MDS analysis to three dimensions and uses linear and nonlinear canonical correlation analyses as an attempt to discriminate factors explaining the MDS dimensions. In that section we will be using the similarity matrix Sinfl, based on personal musical influences only, in order to extract MDS dimensions (the composers coor-dinates). Then canonical correlation will be applied to these computed MDS coordinates and a few selected composers’ ecological characteristics to assess the discriminatory power of ecological categories on the MDS coordinates.
8 See music-map.com. Accessed February 2, 2019.
980 Scientometrics (2019) 120:975–1003
1 3
Clustering techniques and dendrograms
Musicologists have typically recounted the history of classical music by classifying com-posers into defined groups such as Renaissance, Baroque, Classical, Romantic, and Mod-ern periods. See for example Taruskin (2010), and Taruskin and Gibbs (2013), henceforth T&G. Recently, clustering composers using statistical analyses of content-based informa-tion such as audio recordings in the case of Weiss (2017) and Weiss et al. (2018), have generated results that, on average, tend to conform to traditional musicology analyses. For example, Weiss (2017) selects commercial audio recordings of 1600 pieces (movements) for 70 composers and averages audio features (such as chord progressions, interval classes, and complexity features) of the recordings over each composer. This generates a com-poser feature matrix on which he performs principal component analysis (PCA) followed by K-means clustering on the first three principal components, with K = 5 (i.e., the algo-rithm groups the 70 composers into exactly 5 clusters). In this section, we follow the same methodology on the same list of composers, applying PCA to our context-based similarity matrix Scomb (instead of an audio, or content-based, feature matrix) and then using K-means clustering on the first three principal components.9
Our results are given in Fig. 1 and can be compared with Figure 7.24 in Weiss (2017). As in Weiss, mostly, composers with a similar lifetime belong to a same cluster. The first cluster corresponds to the traditionally-defined Baroque period (e.g., Lully, Corelli, Vivaldi, J.S. Bach, Handel, D. Scarlatti). We then have a pre-classical cluster (e.g., J. Stamitz, J.C. Bach).10 This is followed by a classical cluster (e.g., J. Haydn, Mozart, Beethoven, Schu-bert), a Romantic cluster (e.g., Weber, Chopin, Bruckner, Mussorgsky, Mahler, R. Strauss), and finally a Modern cluster (e.g., Debussy, Ravel, Schoenberg, Bartók, Stravinsky, Mes-siaen, Boulez). There are very few surprises in Fig. 1 with respect to conventional grouping of composers, with perhaps the exception of classical composer Cimarosa clustered with earlier pre-classical composers.11 It could also be argued that a few ‘transitional’ compos-ers might be grouped into one instead of another cluster. For example C.P.E. Bach, a tran-sitional composer between Baroque and Classical periods, often classified as Pre-classical
10 The period of transition from Baroque to Classical is, according to T&G (2013), “a historiographical black hole that scholars tried to plug by searching for a ‘missing’ link between the Bach/Handel and Haydn/Mozart poles.” According to T&G (2013), “[no] period has been in greater need of fundamental research than that from the 1730s to the 1760s, what was long commonly known as Preclassic (and thus relegated by its very name to a status of relative insignificance) or labeled as Rococo, style galant, and empfindsamer Stil (sensibility style).” T&G (2013) classify CPE Bach and JC Bach as, respectively, representatives of the ‘empfindsamer Stil’ and the ‘style galant’.11 Technically, one reason why Cimarosa (1749–1801) is pulled into this pre-classical group of composers is that some of his personal musical influences in our database are rather backward-looking (D. Scarlatti, Pergolesi, and Piccinni), and in terms of ecological characteristics, his name is associated with a relatively imprecise category of operas from 1600 to 1800. Interestingly, from a musicological point of view, Sheldon (1989) argues that “the essential problem of ‘Pre-Classic,’ is actually one of ‘Classic,’ an aesthetic norma-tive concept based on the accomplishments of a few ‘great men’. In other words, the obvious drawback of ‘Viennese Classicism’ as a style period designation is that it excludes far more than it includes.[…] [C]ontemporaries (Germans as well as Italians) felt more appropriately represented musically by Antonio Sac-chini, Giuseppe Sarti, Giovanni Paisiello, and Domenico Cimarosa.”
9 We actually do this for 65 of the 70 composers selected by Weiss because five of them (Giustini, Platti, Pleyel, Leopold Mozart, and Antheil) are not included in the list of 500 composers given in the CMN data-base.
981Scientometrics (2019) 120:975–1003
1 3
Lully
,55
Cor
elli,
60Pu
rcel
l,36
Cou
peri
n, F
,65
Alb
inon
i,80
Viv
aldi
,63
Tele
man
n,86
Ram
eau,
81B
ach,
JS,
65H
ande
l,74
Scar
latt
i, D
,72
Bac
h, C
PE,7
4St
amitz
, J,4
0H
aydn
, J,7
7B
ach,
JC
,47
Hay
dn, M
,69
Boc
cher
ini,
62C
imar
osa,
52Sa
lieri
,75
Cle
men
ti,80
Moz
art,
WA
,35
Dus
sek,
52B
eeth
oven
,57
Web
er,4
0R
ossin
i,76
Schu
bert
,31
Ber
lioz,
66M
ende
lssoh
n,38
Cho
pin,
39Sc
hum
ann,
R,4
6Li
szt,
75W
agne
r,70
Ver
di,8
8Sc
hum
ann,
C,7
7B
ruck
ner,
72Sm
etan
a,60
Bra
hms,
64B
orod
in,5
4 Sain
t-Sa
ëns,
86M
usso
rgsk
y,46
Tcha
ikov
sky,
53D
vorá
k,63
Gri
eg,6
4R
imsk
y-K
orsa
kov,
64Fa
uré,
79M
ahle
r,51
Deb
ussy
,56
Stra
uss,
R,8
5Si
beliu
s,92
Scho
enbe
rg,7
7Iv
es,8
0R
avel
,62
Bar
tók,
64St
ravi
nsky
,89
Web
ern,
62V
arès
e,82
Ber
g,50 Prok
ofie
v,62
Milh
aud,
82H
inde
mith
,68
Wei
ll,50 Sh
osta
kovi
ch,6
9M
essia
en,8
4B
ritt
en,6
3B
oule
z,91
1630
1680
1730
1780
1830
1880
1930
1980
Year
Fig.
1
K-m
eans
clu
sterin
g (K
= 5)
of 6
5 se
lect
ed c
ompo
sers
and
thei
r cor
resp
ondi
ng li
fetim
e
982 Scientometrics (2019) 120:975–1003
1 3
(see Footnote 10), is clustered in Fig. 1 with other Baroque composers, and Schubert, a transitional composer between Classical and Romantic periods, is clustered in the Classical period.12
Comparing our results with those of Weiss, his clustering does not lead to a Pre-classical cluster (even if such a period is typically mentioned by musicologists). Also he generates two parallel clusters covering the modern period (what he refers to the ‘avant-garde’ of that time (e.g., Schoenberg, Berg, Varèse, Boulez, and, more contentiously, Britten and Bartók) and a more moderate harmonic style (such as Prokofiev, Shostakovich, Hindemith, Milhaud). Observe that, as suggested by Griffiths (1978) in his ‘Concise history of modern music: From Debussy to Boulez’, our clustering of the modern period includes the impressionists, Debussy and Ravel, while they are both clustered with other Romantic composers in Weiss’ analysis. Also quite surprising in Weiss’ analysis, are the clustering of Vivaldi and, to a lesser extent, D. Scarlatti, with Classical composers, C.P.E Bach with Romantic composers, or Mussorgsky and Fauré in the (moderate harmonic) modern cluster.13 As Weiss (2017) argues, “[t]hese outliers point to the difficulties of clustering composers to a fixed number of top-level clus-ters”. This leads him to propose applying hierarchical clustering, as in bioinformatics where “phylogenetic trees are popular tools for clustering DNA sequences in order to highlight evo-lutionary development and trends.” In the following, we quickly describe hierarchical cluster-ing and then apply it to the list of composers selected by Weiss. We finally apply hierarchical clustering to specific periods so as to highlight intra-period developments.
The agglomerative hierarchical clustering algorithm builds a cluster hierarchy displayed as a tree diagram called a dendrogram. In our case, the primary input for the algorithm is the 500 × 500 matrix of composers similarity Scomb. Typically, the algorithm applied to the composers’ similarity matrix begins with each composer in a separate cluster. In the very first step a two-composer cluster is formed between the two most similar composers. Then, at each successive step, the two clusters that are most similar are joined into a new cluster. Several methods are available to compute distance between clusters of composers
13 Domenico Scarlatti (1685–1757) is primarily a Baroque composer, clustered under Baroque in Fig. 1. Pre-classical (galant) music emerged during his long life, and there is some evidence that D. Scarlatti might have been the first great advocate of the Florentine piano (see Sutherland, 1995). Yet, it remains odd to see a clustering analysis such as the one in Weiss, which groups D. Scarlatti together with other Classical composers. It would be interesting to see how Weiss’ results are sensitive to his selection of recordings that use exclusively the piano, instead of the harpsichord. Carl Maria von Weber (1786–1826) is also clustered in Weiss together with other Classical composers, while he is clustered with Romantic composers in our Fig. 1. Weber’s grouping remains ambiguous as he was a transition composer from the Classical to the Romantic periods. Yet, according to Finscher (1983), his opera, Freischütz is often considered to be the first German Romantic opera (first performed on June 18, 1821), a genre which was to dominate the opera houses of the German-speaking countries until well beyond the middle of the [nineteenth] century, leading the way to the Romantic operas of Wagner.
12 There is a continuing question and problem as to how the music of Franz Schubert (1797–1828) is best to be understood, as representative of Classicism or of Romanticism. As argued by Gray (1978), “the Romantics created a Schubert in their own image with the aid of countless colorful legends which have little to do with fact. […]. The Romantics tended to dismiss Schubert’s early works, which are clearly Clas-sical, with the important exception of his songs; they preferred to believe that the more personal late works simply burst forth in a paroxysm of Romantic inspiration.” Yet, Gray (1971) argues that Schubert’s works are largely reflective of the Viennese Classicism of Haydn, Mozart and Beethoven. Furthermore, he argues that “historians have usually drawn upon the form wherein Schubert is universally regarded as supreme—the Lied—as evidence that that he is, most of all, a romantic.” But this attitude may follow from a reverse historicism, conferring the accolade of ‘romantic’ upon Schubert’s songs because the art song is one of the most romantic forms in the hands of Schumann, Franz, Loewe, Brahms, and Wolf. Yet, according to Gray (1971), “that is exactly in his songs where Schubert most perfectly achieves classic stature.”
983Scientometrics (2019) 120:975–1003
1 3
(as opposed to the distance between pairs of composers, which is the primary input), such as single linkage, complete linkage, simple average, centroid, median, group average (unweighted pair-group), Ward’s minimum variance, etc.14 Here we use the group aver-age method whereby the distance between two clusters is defined as the average distance between each of their members. Let the distance between composers/clusters i and j be represented as di,j and suppose that these are the two most similar composers in the data-base. Then i and j will be merged into a new cluster k from which a new set of distance must be computed between k and any other composers/clusters m in the database using the formula: dk,m =
(
ni∕nk)
di,m +
(
nj∕nk)
dj,m where nx is the number of composers in cluster x. Hence, just after the very first step, when the two most similar composers are fused into a cluster k, the distance between this newly formed cluster and any other composer m is dk,m = (1∕2)di,m + (1∕2)dj,m , that is, the average of the distance between i and m and j and m.15 This permits to progressively build the cophenetic distance matrix which can then graphically be represented with a dendrogram.16
Figures 2, 3, 4 and 5 show dendrograms resulting from different subsets of the 500 × 500 similarity matrix Scomb, as starting points for the algorithm.17 Figure 2 focuses on the list of composers in Weiss (2017). Figures 3, 4 and 5 give dendrograms for composers of the Baroque, Classical, and Romantic periods. Dendrograms for other periods (Medieval and Renaissance, and twentieth century music (Impressionism, atonal, dodecaphonic, neo-Classical, neo-Romantic and ‘avant-garde’ composers)) and for all 500 composers are avail-able from the authors upon request, to be studied from a deeper musicological angle. In all these figures, each joining of two clusters is represented by a node (the splitting of a vertical line into two vertical lines) and the vertical height of the split (as measured on the vertical axis) shows the distance between the two clusters (the larger the value the closer the clusters or individual composers). One typically reads these dendrograms by ‘cutting’ the tree at a specific height (see for example the horizontal dash line as the cut-level) from which one can examine and study the resulting main clusters. Figure 2 shows three main clusters (Baroque and Classical in blue, Romantic in green and Modern in red) and sev-eral lower-level sub-clusters. Without going into much detail, we can see that the results at sub-clusters levels are consistent with common knowledge, for example, Baroque compos-ers are grouped by nationality (French: Lully, Rameau, F. Couperin; German: Telemann, J.S. Bach, Handel; and Italian: Corelli, Vivaldi, Albinoni). There is a group of transition composers from Baroque to Classical (C.P.E Bach, J.C. Bach, J. Stamitz), a core of Classi-cal composers (Boccherini, Mozart, J. Haydn, M. Haydn, Cimarosa, Salieri) and late Clas-sical composers including those transitioning into early Romantic music (Clementi, Bee-thoven, Schubert, Weber, Rossini). Mendelssohn, an early Romantic composer is also in this
14 Chapter 445 of the NCSS Statistical Software offers an accessible introduction to hierarchical clustering.15 Note that, once fused, composers are never separated.16 In the process of constructing a dendrogram, a cophenetic matrix is computed. It is a distance matrix wherein the original pairwise distance between composers are replaced by the computed distance between their clusters at the time these clusters are merged. The cophenetic correlation coefficient provides the cor-relation between both matrixes. The cophenetic correlation coefficients for Figs. 2, 3, 4 and 5 are above 0.75, which is typically good.17 The dendrograms have been generated with XLSTAT (Addinsoft 2019).
984 Scientometrics (2019) 120:975–1003
1 3
156. Clemen�417. Dussek3. Beethoven17. Mendelssohn42. Weber4. Schubert23. Rossini105. Boccherini2. Mozart, WA9. Haydn, J150. Haydn, M215. Cimarosa228. Salieri467. Stamitz, J61. Bach, CPE119. Bach, JC40. Purcell108. Couperin, F86. Rameau122. Lully65. Scarla�, D35. Telemann1. Bach, JS8. Handel94. Corelli22. Vivaldi135. Albinoni124. Boulez63. Messiaen127. Varèse81. Milhaud14. Debussy20. Ravel75. Webern34. Schoenberg72. Berg26. Bri�en73. Weill25. Bartók44. Hindemith16. Stravinsky27. Shostakovich28. Prokofiev13. Schumann, R138. Schumann, C41. Saint-Saëns6. Wagner83. Smetana10. Chopin12. Liszt47. Bruckner7. Verdi29. Berlioz32. Fauré33. Grieg5. Brahms21. Dvorák18. Strauss, R19. Mahler11. Tchaikovsky46. Rimsky-Korsakov49. Mussorgsky79. Borodin36. Sibelius43. Ives
-0.0
7
0.13
0.33
0.53
0.73
0.93
Similarity
Fig.
2
Euro
pean
art
mus
ic. D
endr
ogra
m fo
r the
65
com
pose
rs in
Fig
. 1
985Scientometrics (2019) 120:975–1003
1 3
1640. Sanz1583. Gibbons, O1602. Lawes1621. Locke1649. Blow1659. Purcell1679. Zelenka1683. Heinichen1687. Weiss1697. Quantz1688. Fasch1692. Tar�ni1685. Scarla�, D1602. Cavalli1626. Legrenzi1686. Marcello1670. Caldara1670. Bononcini1660. Scarla�, A1605. Carissimi1667. Lo�1653. Corelli1678. Vivaldi1685. Handel1687. Geminiani1639. Stradella1671. Albinoni1658. Torelli1695. Locatelli1583. Frescobaldi1616. Froberger1585. Schütz1586. Schein1587. Scheidt1685. Bach, JS1637. Buxtehude1665. Bruhns1653. Pachelbel1661. Böhm1626. Couperin, L1689. Boismor�er1697. Leclair1656. Marais1668. Couperin, F1657. Lalande1676. Clérambault1643. Charpen�er, M-A1632. Lully1683. Rameau1660. Fux1620. Schmelzer1644. Biber1653. Muffat1681. Telemann
0.03
0.13
0.23
0.33
0.43
0.53
0.63
0.73
0.83
0.93
Similarity
Baroqu
e
Fig.
3
Euro
pean
art
mus
ic. D
endr
ogra
m fo
r the
Bar
oque
per
iod
(unw
eigh
ted-
pair
grou
p av
erag
e)
986 Scientometrics (2019) 120:975–1003
1 3
1746. Billings1713. Krebs1791. Meyerbeer1795. Mercadante1784. Spohr1763. Danzi1759. Krommer1786. Kuhlau1791. Czerny1752. Clemen�1782. Field1760. Dussek1778. Hummel1782. Paganini1778. Sor1770. Carulli1781. Giuliani1774. Spon�ni1763. Mayr1763. Méhul1760. Cherubini1770. Reicha1755. Vio�1775. Boieldieu1782. Auber1786. Weber1792. Rossini1710. Arne1711. Boyce1745. Stamitz, K1700. Sammar�ni, GB1717. Stamitz, J1706. Galuppi1699. Hasse1710. Pergolesi1729. Soler1706. Mar�ni1710. Bach, WF1714. Bach, CPE1737. Haydn, M1739. Di�ersdorf1743. Boccherini1735. Bach, JC1770. Beethoven1732. Haydn, J1756. Mozart, WA1714. Gluck1714. Jommelli1749. Cimarosa1750. Salieri1741. Grétry1728. Piccinni1740. Paisiello
0.02
0.12
0.22
0.32
0.42
0.52
0.62
0.72
0.82
0.92
Similarity
Classical
Fig.
4
Euro
pean
art
mus
ic. D
endr
ogra
m fo
r the
Cla
ssic
al p
erio
d (u
nwei
ghte
d-pa
ir gr
oup
aver
age)
987Scientometrics (2019) 120:975–1003
1 3
1819. Offenbach1859. Herbert1819. Suppé1827. Strauss, Jos1804. Strauss, J Sr.1825. Strauss, J Jr.1844. Sarasate1820. Vieuxtemps1835. Wieniawski1838. Bruch1805. Hartmann, JPE1854. Humperdinck1821. Bo�esini1857. Leoncavallo1858. Puccini1834. Ponchielli1842. Boito1860. Charpen�er, G1836. Delibes1842. Massenet1797. Donize�1801. Bellini1812. Flotow1803. Adam1799. Halévy1811. Thomas1803. Berlioz1813. Verdi1818. Gounod1804. Glinka1801. Lortzing1796. Berwald1810. Nicolai1813. Wagner1854. Janácek1852. Tárrega1860. Albéniz1813. Dargomïzhsky1855. Lyadov1840. Tchaikovsky1844. Rimsky-Korsakov1837. Balakirev1833. Borodin1835. Mussorgsky1829. Rubinstein1835. Cui1859. Ippolitov-Ivanov1856. Taneyev1861. Arensky1810. Chopin1810. Schumann, R1797. Schubert1809. Mendelssohn1845. Fauré1811. Liszt1822. Franck1857. Elgar1833. Brahms1841. Dvorák1856. Sinding1860. MacDowell1854. Moszkowski1860. Paderewski1829. Go�schalk1857. Chaminade1841. Chabrier1848. Duparc1835. Saint-Saëns1838. Bizet1813. Alkan1837. Guilmant1844. Widor1851. Indy1855. Chausson1805. Mendelssohn-Hensel1819. Schumann, C1796. Loewe1815. Franz1824. Bruckner1824. Reinecke1839. Rheinberger1848. Parry1852. Stanford1853. Foote1854. Chadwick1842. Sullivan1824. Smetana1843. Grieg1860. Mahler1860. Wolf1823. Lalo1830. Goldmark1850. Fibich1840. Svendsen1817. Gade1822. Raff
0.07
0.17
0.27
0.37
0.47
0.57
0.67
0.77
0.87
0.97
Similarity
Roman
�c
Fig.
5
Euro
pean
art
mus
ic. D
endr
ogra
m fo
r the
Rom
antic
per
iod
(unw
eigh
ted-
pair
grou
p av
erag
e)
988 Scientometrics (2019) 120:975–1003
1 3
group, paired with Beethoven.18 Pairs of composers in other periods seem reasonable, such as Debussy and Ravel, Schoenberg and Berg, Prokofiev and Shostakovich, Fauré and Grieg, Dvorák and Brahms, R. Strauss and Mahler, Chopin and Liszt, Robert and Clara Schumann. Recall that the list of composers studied here was taken directly from Weiss (2017) whose Figure 7.25 also gives a hierarchical clustering.19 Yet, as the author pinpoints, some of his results do not reflect meaningful stylistic similarities, such as the pairing of Sibelius and C.P.E. Bach, Fauré and Prokofiev, J. Stamitz and Corelli. Comparing our results with those in Weiss (2017) shows that using context-based data, instead of content-based (audio files) data, remains a useful approach for clustering and visualising similarities across composers.
It is interesting to note that grouping well-known composers into either a fixed number of clusters or into hierarchical clusters (as in Figures 1 and 2 and in Weiss 2017) roughly coincides with common musicological knowledge. However, more useful perhaps, would be to cluster a larger number of composers belonging (born into) a specific period, so as to shed light on ‘finer’ intra-period connections between composers (including composers less well-known than the very familiar names of Figs. 1, 2).20 Figure 3 shows the dendro-gram for the Baroque period with six main clusters that seem related to a regional/national/country dimension.21 Sanz is a single-element cluster representing Spain (a country which has long been isolated from the rest of Europe in terms of musical influences). We then see a cluster of English baroque composers (from Gibbons, in fact, late Renaissance, to Purcell), and an ‘Italian’ cluster (from the early Baroque opera composer Cavalli, up to the high/late baroque violin virtuoso Locatelli, passing by Corelli, A. Scarlatti, Vivaldi and Handel who, although ‘German’ and working in England, was very much influenced by ‘Italian’ music).22,23 German composers are divided into two clusters, the early to mid-dle Baroque cluster, centered on the German keyboard (harpsichord and organ) tradi-tion (from the early German Baroque of Schein, Schütz and Scheidt, culminating to J.S. Bach, and passing by some important keyboard composers such as Froberger, Buxtehude, Bruhns, Pachelbel and Böhm), and a high/late-Baroque cluster largely based on the city of Dresden (Zelenka, Heinichen, Weiss, and Quantz (a student of Zelenka). Interestingly, the late Renaissance/early Baroque Italian keyboard composer, Frescobaldi, is part of the former cluster, which reminds us the influence that he had on the development of German
20 Here, composers have been assigned to a specific period on the basis of their birthdate. Baroque (birth-dates):1583–1698, Classical: 1699–1795, and Romantic: 1796–1860.21 In this and the following figures, the name of the composers is preceded by his/her date of birth.22 Recall that neither Italy nor Germany were politically unified at that time. Yet these geographical con-cepts are useful, here, from a clustering point of view.23 Technically, our database indicates ‘Italian music’ and ‘Italian opera’ as style influences for Handel. According to T&G (2013) “[Handel] spent his true formative years—from 1706 to 1710—in Florence and Rome, where he worked for noble and ecclesiastical patrons and met Alessandro Scarlatti, Corelli, and other luminaries of the day. He was known affectionately as il Sassone, ‘the Saxon,’ meaning really ‘the Saxon turned Italian,’ in the musical sense.”
18 As argued by Todd (2008), “[Mendelssohn] is the staunch upholder of conservative aesthetic value; no radical reformer or innovator, he is the composer of finely polished chamber works that fall easily and unobtrusively in the Classical paradigms (duo sonata, piano trio, string quartet, and quintet) established by Mozart and Haydn in the eighteenth century and redefined by Beethoven in the early decades of the nine-teenth.”19 To be fully consistent with Weiss (2017), Fig. 2 is built using complete-linkage hierarchical clustering. But the unweighted pair-group algorithm provides very similar results to Fig. 2. Figures 3, 4 and 5 are built using the unweighted pair-group algorithm described in the text.
989Scientometrics (2019) 120:975–1003
1 3
keyboard music).24,25 Finally the last cluster is itself composed of two main sub-clusters. First we have the French baroque tradition (from Lully and Charpentier, up to late Baroque Leclair, passing by Marais, F. Couperin, Rameau and Boismortier); and then we have a Bohemian-Austrian sub-cluster of composers centered on Vienna and Salzburg represented by Schmeltzer, Biber (who was, allegedly, Schmeltzer’s student), Fux, and the rather cos-mopolitan composer Muffat). Violin, instead of harpsichord, is also more predominant in this group (at least for Schmeltzer and Biber) than in the German ‘keyboard’ cluster men-tioned above. That the cosmopolitan German composer Telemann belongs to this last clus-ter is interesting as he is often portrayed as having incorporated and mixed several national styles (German, French, Italian, and Polish traditions) in his compositions.26
The dendrogram for the Classical period is given in Fig. 4. First, observe two clusters with a single-element, Krebs and Billings. Krebs was a German composer and a student of J.S. Bach, and whose contrapuntal late baroque style was being supplanted by the newer galant style that eventually led to the Classical music period. Hence he does not belong to the Classical period and the cluster analysis neatly points out this fact. Billings was a Boston-based American choral music composer, which also isolates him from the Euro-pean classical music tradition. The other three main clusters are the ‘early’ pre-Classical cluster (from Arne to C.P.E. Bach, passing by Martini and the opera composers Pergolesi and Galuppi), ‘core’ Classical (from Dittersdorf to Beethoven, passing by Grétry, Mozart, Cimarosa and Salieri) and ‘late’ Classical (from Clementi to Rossini and Mercadante, pass-ing by Hummel, Field, Czerny, and Paganini). Whether Beethoven should be viewed as ‘late’ instead of ‘core’ Classical composer is an open question. The cluster analysis puts him close to J. Haydn, one of his teacher who epitomises core Classical music. Yet, Bee-thoven’s mid-period compositions also point toward the emergence of Romantic music which would tend to position him into the late Classical cluster (transitioning into Roman-tic period).27 Finally Fig. 5 shows the six clusters for the Romantic period.28 First we have the operetta and waltz composers (e.g., Offenbach and J. Strauss Jr.). Then, the violin vir-tuosi (Sarasate, Vieuxtemps, Wieniawski), and a cluster of composers who predominantly wrote operas (from Bottesini to Wagner, passing by Bellini, Verdi, Massenet, Flotow and
24 As recounted by Silbiger (1980), Frescobaldy’s far-reaching influence on seventeenth- and eighteenth-century keyboard (organ) music is universally acknowledged today. The true inheritors of his art are to be found outside his native Italy. “[Frescobaldy’s] brilliant pupil Johann Jacob Froberger carried the torch north and established a tradition there that was passed on by successive generations of German organists across at least a century.”25 Another important influencer of the North German organ tradition, especially on Scheidt, and whose influence was eventually handed on to Buxtehude and later yet on to J.S. Bach, is the Dutch composer and organist Sweelinck, who is excluded from this graph because he was born earlier, in 1562 (late Renais-sance).26 As mentioned by Ritter (1994), “Telemann’s works display a complete absorption and fusion of many individual national characteristics which establish him as an important link in the development of Baroque music and a precursor of the new galant style.”27 See for example Cassedy (2010): “While some of [Beethoven’s] earliest compositions in the 1790s had represented a departure from a musical past associated with Mozart and Haydn, by 3 years into the new century he was creating music that defiantly rejected traditions and redefined conventions. […] [T]he Sonata in F minor, op. 57 (Appassionata), composed in 1804 and 1805, serves as the earliest and boldest example of Beethoven’s break with the past and his entry into the territory of what [ETA] Hoffmann char-acterized as ‘romantic’.” On the other hand Clementi, clustered in Fig. 4 with late Classical composers, is sometimes presented as the forerunner of Beethoven (see de Saint-Foix and Herter Norton 1931).28 To limit the size of the dendrogram in Fig. 5 (and the corresponding MDS map in Fig. 8), late Romantic composers born after 1861 have been excluded.
990 Scientometrics (2019) 120:975–1003
1 3
Gounod).29 We have a single element cluster, Janáček, whose forward-looking innovative compositions led to modernism (see Steinberg 2006). And finally a very large group of ‘core’ Romantic composers (from Tarrega to Raff, passing by Mendelssohn, Liszt, Tchai-kovsky and Mahler). This last cluster is made up of many sub-clusters. Without going into much detail, we observe that the sub-clusters have a typically strong regional component. First, Spain with Tarrega and Albéniz. Then comes Russia with ‘The Five’ of the new Russian School (Balakirev, Borodin, Rimsky-Korsakov, Mussorgsky, Cui), plus Tchaiko-vsky and a few other Russians. Then, a core of early to middle key German and German-influenced composers (from Robert Schumann to Brahms, passing by Schubert and Felix Mendelssohn, plus Liszt (Hungarian), Dvorák (Czech), Chopin (Polish/French), Franck (Belgian/French), Fauré (French) and Elgar (British)).30 Then, a sub-cluster of French composers (from Chaminade to Chausson, and including the American pianist and com-poser of French/Creole origin Louis Moreau Gottschalk). Finally we have two sub-clusters which are not specifically based on a regional grouping. The first one, made of Sinding (Norwegian), MacDowell (American), Moszkowski and Paderewski (both Polish), repre-sent a group of composers who are well-known for their piano compositions. The second, more heteroclite group, is made up of a bunch of ‘national school’ composers whose music often became a means of asserting their national identity or else reflects their countries or regions of origin (Smetana and Fibich (Czech), Grieg and Svendsen (Norwegian), Gade (Danish), Stanford (Irish), Parry and Sullivan (English), Chadwick and Foote (Americans), Lalo (French), the German composers and pedagogues Carl Reinecke who taught to Grieg, Svendsen, and Stanford, and Rheinberger who taught to Chadwick. Arguably more diffi-cult to cluster here, we also have Raff (Swiss), and Goldmark, Wolf, Bruckner and Mahler (Austrians). Interestingly, we also find two women in this sub-cluster, grouped with two German composers (C. Loewe and R. Franz) essentially known, as them, for their lieder and piano pieces: Clara Wieck-Schumann and Fanny Mendelssohn-Hensel, respectively the wife of Robert Schumann and the sister of Felix Mendelssohn (who we encountered previ-ously in the ‘key’ German Romantic cluster). That they are not in the cluster of their hus-band or brother is perhaps the result of what T&G (2013) call ‘genius restrained’.31 Finally, one cluster in Fig. 5, which is made of Bruch, Hartmann, and Humperdinck, is the result of the algorithm used but does not seem to have much justification on the basis of their music. Humperdinck would probably fit better into the cluster of predominantly-opera composers, J.P.E. Hartmann (Danish) would fit better in the more heteroclite national school cluster while Bruch (German) falls into the Romantic classicism exemplified by Brahms.
31 Technically, they are mainly associated in our ecological categories with lieder and piano pieces. As a comparison, R. Schumann and F. Mendelssohn are also associated with larger, more ambitious categories such as orchestral works (including concertos) and symphonies. As for ‘genius restrained’, T&G (2013) recount that “[Fanny Mendelssohn’s] artistic isolation and probably her creative blocks were the result of the discouragement she received, from her father and later from her brother, when it came to pursuing a career.” As for Clara Wieck-Schumann, a far more significant and public figure of the nineteenth century, T&G (2013) explain that “she stopped composing entirely after [her husband]’s death in 1856, when at age thirty-six she found herself a single mother of seven (one child had died).”
29 Bottesini was also a famous double bass virtuoso and composer for his instrument.30 These non-German composers, although also having a ‘national’ school component (see later), have all been very much influenced by the symphony, concerto, chamber, lieder/songs and piano music tradition of their German colleagues in this cluster.
991Scientometrics (2019) 120:975–1003
1 3
Multidimensional scaling: 2 dimensions
Multidimensional scaling (MDS) is a technique that generates a map displaying the relative positions of a number of objects based on a given set of pairwise distances between these objects. The following example may help to understand the essence of MDS. Given a geo-graphical map of North America and a scale, one can compute the aerial distances between cities. If instead the initial data is a set of pairwise distances between North American cities, one can attempt to recover the geographical map of North America (within about a symmetry and/or rotation). MDS is a methodology that uses algorithms to implement this idea. Although MDS can generate a two-dimensional map that could perhaps, or hopefully, be interpreted as latitude and longitude in the geographical example, the technique per se can be used to generate more than two dimensions from a given distance matrix. A third dimension could perhaps be interpreted as the average altitude of cities with respect to sea level (i.e., topography).
We can apply the MDS methodology to the 500 × 500 matrix of pairwise distances across composers, Scomb.32 In this case, the MDS algorithm aims to position each of the 500 composers in an N-dimensional space (i.e., assigning coordinates) such that the primary distances between composers are preserved as well as possible, according to an optimisa-tion procedure.33 Choosing N = 2 optimizes composers location in a two-dimensional scat-terplot. In this case, given the distance di,j between composers i and j, the algorithm gener-ates the coordinates (xi, yi) and (xj, yj). The MDS algorithm typically computes coordinates (x, y) so as to minimize a loss function called ‘stress’, which is a sum of squared errors between the actual distance across any two composers, di,j, and the predicted distance di,j * computed by the algorithm:
and where the predicted distances depend on the number of dimensions kept and the algo-rithm that is used. Stress values near zero are the best.34
We first apply the MDS methodology to composers of specific periods (Baroque, Clas-sical and Romantic) in Figs. 6, 7 and 8.35 Note that the clusters that we have observed in dendrograms of Figs. 3, 4 and 5 are circled, as much as possible, on the MDS maps of the corresponding period, using the same color code. We can notice a strong correspondence between clusters in the dendrograms and the relative placement of composers on the MDS maps. The maps and dendrograms developed here also provide a music educational tool. For example, if one is interested in the guitar music of classical composer Fernando Sor,
Stress =
√
∑
i<j
(
di,j − d∗i,j
)2/
∑
i<jd2i,j,
32 Chapter 435 of the NCSS Statistical Software offers an accessible introduction to multidimensional scal-ing.33 Our initial composers’ proximity matrix does not represent pairwise distances across composers, di,j, but pairwise similarities, si,j. Typically similarity indices are converted into distance indices using the formula: di,j =
√
si,i + si,j − 2si,j.34 Following Kruskal (1964), a value of 0 is a perfect goodness-of-fit, 0.05 is good, 0.1 is fair and 0.2 is poor. More recent articles caution against using this advice since acceptable values of stress depends on the quality of the distance matrix and the number of objects in that matrix.35 The MDS maps have been generated with XLSTAT (Addinsoft 2019).
992 Scientometrics (2019) 120:975–1003
1 3
1583
. Fre
scob
aldi
1583
. Gib
bons
, O
1585
. Sch
ütz
1586
. Sch
ein
1587
. Sch
eidt
1602
. Cav
alli
1602
. Law
es
1605
. Car
issim
i
1616
. Fro
berg
er
1620
. Sch
mel
zer
1621
. Loc
ke16
26. C
oupe
rin, L
1626
. Leg
renz
i
1632
. Lul
ly
1637
. Bux
tehu
de
1639
. Str
adel
la
1640
. San
z
1643
. Cha
rpen
�er,
M-A
1644
. Bib
er
1649
. Blo
w
1653
. Cor
elli
1653
. Pac
helb
el16
53. M
uffat
1656
. Mar
ais
1657
. Lal
ande
1658
. Tor
elli
1659
. Pur
cell
1660
. Sca
rla�
, A
1660
. Fux
1661
. Böh
m
1665
. Bru
hns
1667
. Lo�
1668
. Cou
perin
, F
1670
. Cal
dara
1670
. Bon
onci
ni
1671
. Alb
inon
i
1676
. Clé
ram
baul
t
1678
. Viv
aldi
1679
. Zel
enka
1681
. Tel
eman
n
1683
. Ram
eau
1683
. Hei
nich
en
1685
. Bac
h, JS
1685
. Han
del
1685
. Sca
rla�
, D16
86. M
arce
llo
1687
. Gem
inia
ni
1687
. Wei
ss
1688
. Fas
ch
1689
. Boi
smor
�er
1692
. Tar
�ni
1695
. Loc
atel
li
1697
. Lec
lair
1697
. Qua
ntz
-101
-10
1
Baroqu
e
Fig.
6
Euro
pean
art
mus
ic. T
he B
aroq
ue p
erio
d. A
n ap
plic
atio
n of
the
MD
S m
etho
dolo
gy
993Scientometrics (2019) 120:975–1003
1 3
1699
. Has
se
1700
. Sam
mar
�ni,
GB
1706
. Gal
uppi
1706
. Mar
�ni
1710
. Per
gole
si17
10. B
ach,
WF
1710
. Arn
e1711
. Boy
ce
1713
. Kre
bs
1714
. Bac
h, C
PE
1714
. Glu
ck
1714
. Jom
mel
li
1717
. Sta
mitz
, J
1728
. Pic
cinn
i
1729
. Sol
er
1732
. Hay
dn, J
1735
. Bac
h, JC17
37. H
aydn
, M
1739
. Di�
ersd
orf
1740
. Pai
siello
1741
. Gré
try
1743
. Boc
cher
ini
1745
. Sta
mitz
, K1746
. Bill
ings
1749
. Cim
aros
a17
50. S
alie
ri1752
. Cle
men
�
1755
. Vio
�
1756
. Moz
art,
WA
1759
. Kro
mm
er
1760
. Che
rubi
ni1760
. Dus
sek
1763
. May
r
1763
. Dan
zi
1763
. Méh
ul
1770
. Bee
thov
en
1770
. Rei
cha
1770
. Car
ulli
1774
. Spo
n�ni
1775
. Boi
eldi
eu
1778
. Hum
mel
1778
. Sor
1781
. Giu
liani
1782
. Pag
anin
i
1782
. Fie
ld
1782
. Aub
er
1784
. Spo
hr
1786
. Web
er17
86. K
uhla
u17
91. M
eyer
beer
1791
. Cze
rny
1792
. Ros
sini
1795
. Mer
cada
nte
-101
-10
1
Classic
al
Fig.
7
Euro
pean
art
mus
ic. T
he C
lass
ical
per
iod.
An
appl
icat
ion
of th
e M
DS
met
hodo
logy
994 Scientometrics (2019) 120:975–1003
1 3
1796
. Loe
we
1796
. Ber
wal
d17
97. S
chub
ert
1797
. Don
ize�
1799
. Hal
évy18
01. B
ellin
i
1801
. Lor
tzin
g
1803
. Ber
lioz
1803
. Ada
m
1804
. Glin
ka
1804
. Str
auss
, J S
r.
1805
. Men
delss
ohn-
Hens
el
1805
. Har
tman
n, JP
E
1809
. Men
delss
ohn
1810
. Cho
pin
1810
. Sch
uman
n, R
1810
. Nic
olai
1811
. Lisz
t
1811
. Tho
mas18
12. F
loto
w
1813
. Wag
ner
1813
. Ver
di
1813
. Alk
an
1813
. Dar
gom
ïzhsk
y
1815
. Fra
nz
1817
. Gad
e
1818
. Gou
nod
1819
. Offe
nbac
h
1819
. Sch
uman
n, C
1819
. Sup
pé
1820
. Vie
uxte
mps
1821
. Bo�
esin
i
1822
. Fra
nck
1822
. Raff
1823
. Lal
o
1824
. Bru
ckne
r
1824
. Sm
etan
a
1824
. Rei
neck
e
1825
. Str
auss
, J Jr
.
1827
. Str
auss
, Jos18
29. R
ubin
stei
n
1829
. Go�
scha
lk
1830
. Gol
dmar
k
1833
. Bra
hms
1833
. Bor
odin
1834
. Pon
chie
lli
1835
. Sai
nt-S
aëns
1835
. Mus
sorg
sky
1835
. Wie
niaw
ski
1835
. Cui
1836
. Del
ibes
1837
. Bal
akire
v
1837
. Gui
lman
t
1838
. Bize
t
1838
. Bru
ch
1839
. Rhe
inbe
rger
1840
. Tch
aiko
vsky
1840
. Sve
ndse
n
1841
. Dvo
rák
1841
. Cha
brie
r
1842
. Mas
sene
t
1842
. Sul
livan
1842
. Boi
to
1843
. Grie
g
1844
. Rim
sky-
Kors
akov
1844
. Sar
asat
e
1844
. Wid
or
1845
. Fau
ré18
48. P
arry
1848
. Dup
arc
1850
. Fib
ich
1851
. Ind
y
1852
. Sta
nfor
d
1852
. Tár
rega
1853
. Foo
te
1854
. Jan
ácek
1854
. Hum
perd
inck
1854
. Mos
zkow
ski
1854
. Cha
dwic
k
1855
. Cha
usso
n
1855
. Lya
dov
1856
. Tan
eyev
1856
. Sin
ding
1857
. Elg
ar 1857
. Leo
ncav
allo
1857
. Cha
min
ade
1858
. Puc
cini
1859
. Her
bert
1859
. Ipp
olito
v-Iv
anov
1860
. Mah
ler
1860
. Wol
f
1860
. Alb
éniz
1860
. Mac
Dow
ell
1860
. Pad
erew
ski
1860
. Cha
rpen
¡er,
G
1861
. Are
nsky
-101
-10
1
Romantic
Fig.
8
Euro
pean
art
mus
ic. T
he R
oman
tic p
erio
d. A
n ap
plic
atio
n of
the
MD
S m
etho
dolo
gy
995Scientometrics (2019) 120:975–1003
1 3
then Fig. 7 (west side) suggests that the music of Giuliani and Carulli might also appeal to them. And indeed these two composers also mainly composed for the classical guitar.
Finally, Fig. 9 presents MDS results based on a 200 × 200 partition of the 500 × 500 similarity matrix Scomb. Hence the map represents 200 ‘key’ composers among a set of 500.36 The date of birth of composers is never used in our methodology to assess the simi-larity of composers.37 Yet, the map unfolds the history of classical music, starting in the South and moving ‘clockwise’ as centuries pass by. We start our journey in the fourteenth century in France with Guillaume de Machaut and the Medieval ‘Ars Nova’ tradition. We then encounter the musical practice of the Renaissance (1400–1600) with Dufay, in what is now Belgium, a practice that eventually spreads to the rest of Europe through Josquin, Lasso, Palestrina, Gesualdo, Gabrieli, Victoria, and Byrd. Baroque music (1600–1750) emerges in ‘Italy’ (Monteverdi, Frescobaldi), and then in ‘Germany’ (Schütz), evolving in phases to reach mid-Baroque (Lully, M.-A. Charpentier, Buxtehude, Pachelbel, Corelli, and A. Scarlatti), during which classical music progressively emancipates from previously pre-dominant vocal works, and high/late Baroque (F. Couperin, Telemann, Vivaldi, Albinoni, Rameau, J.S. Bach, Handel, D. Scarlatti, and Tartini). The Classical period (1750–1820) arises with a period of transition that is represented by early or pre-Classical composers (C.P.E. Bach, J.C. Bach, and Gluck) followed by ‘core’ Classical composers (Boccherini, J. Haydn, M. Haydn, W.A. Mozart, Clementi, and Beethoven), and ‘late’ Classical com-posers (Hummel, Schubert, Spohr, and Rossini). We have moved clockwise to more than 90° and are now on the North-West side of the map. Romanticism (1810–1920) starts and develops into two branches. Towards the outside of the map we encounter composers who are mainly remembered for their opera (Weber and the first German Romantic opera, Bell-ini and Donizetti and the Bel Canto (showcase of the singers’ talents), Meyerbeer and the ‘French Grand Opera’, followed by Glinka who signals the emergence of Russia in the classical music world. Then comes Verdi and his progressive focus on the dramatic aspects of opera, Wagner and his ‘Music Dramas’, the French with Gounod, Bizet, Massenet, Delibes, and the Verismo (realism) tradition epitomised by Leoncavallo. This has moved us another 90° and we are now on the North side of the map. The Verismo tradition eventually dies out with Puccini and Mascagni (in the North-East quadrant). Meanwhile, composers for who opera is not the main or only contribution, represent the other branch of Romanti-cism. It can be traced back from Beethoven and Schubert (that we encountered before, in the North West side of the map). Moving inside the circle (instead of toward/along the bor-der), we meet Mendelssohn, R. and C. Schumann, Brahms, Bruckner, Liszt, Franck, Saint-Saëns, Dvorák, Tchaikovsky who merges with a few other Russians (Balakirev, Borodin, Rubinstein, Mussorgsky, and Rimsky-Korsakov).
We are now in the late Romantic (North-East) quadrant with composers born well into the second half of the nineteenth century and who also composed during the twentieth cen-tury. The history of European art music becomes more complex to recount. Composers adopted distinct paths, as if there were ‘two centuries in one’. As Pauls (2014) puts it: “An outstanding feature of the twentieth-century has been the divergence of European ‘art’
36 The 200 composers are selected on the basis of their ranking in terms of ‘importance’, a ranking pro-posed by Smith (2000). A (very crowded) sister map of 500 composers is available from the authors.37 Of course this does not mean that there is no time dimension in our data set. Clearly, data on the per-sonal musical influences of a subject composer will also include some contemporary composers, and the ecological data have general references to periods. For example for Debussy (from Footnote 5): ballets 1900 on, chamber music 1825 to 1925, etc.
996 Scientometrics (2019) 120:975–1003
1 3
1. B
ach,
JS
2. M
ozar
t, W
A3. B
eeth
oven
4. S
chub
ert
5. B
rahm
s
6. W
agne
r7.
Ver
di
8. H
ande
l
9. H
aydn
, J
10. C
hopi
n
11. T
chai
kovs
ky
12. L
iszt
13. S
chum
ann,
R
14. D
ebus
sy
15. P
ucci
ni
16. S
trav
insk
y
17. M
ende
lssoh
n
18. S
trau
ss, R
19. M
ahle
r
20. R
avel
21. D
vorá
k
22. V
ival
di23. R
ossin
i
24. R
achm
anin
ov
25. B
artó
k
26. B
ri�en
27. S
host
akov
ich28. P
roko
fiev
29. B
erlio
z
30. G
ersh
win
31. C
opla
nd
32. F
auré
33. G
rieg
34. S
choe
nber
g
35. T
elem
ann
36. S
ibel
ius
37. B
izet
38. D
onize
�
39. V
augh
an W
illia
ms
40. P
urce
ll
41. S
aint
-Saë
ns42
. Web
er
43. I
ves
44. H
inde
mith
45. E
lgar
46. R
imsk
y-Ko
rsak
ov
47. B
ruck
ner
48. S
trau
ss, J
Jr.
49. M
usso
rgsk
y
50. B
erns
tein
51. B
ellin
i
52. F
ranc
k
53. M
onte
verd
i
54. B
arbe
r55
. Pou
lenc
56. G
ouno
d
57. M
asse
net
58. S
ulliv
an
59. C
age
60. S
cria
bin
61. B
ach,
CPE
62. W
olf
63. M
essia
en
64. J
anác
ek
65. S
carla
�, D
66. F
alla
67. V
illa-
Lobo
s
68. G
luck
69. P
agan
ini
70. S
a�e
71. B
yrd
72. B
erg
73. W
eill
74. K
odál
y75
. Web
ern
76. O
ffenb
ach
77. L
asso
78. J
osqu
in
79. B
orod
in
80. P
ales
trin
a
81. M
ilhau
d
82. R
eger
83. S
met
ana
84. N
ielse
n
85. W
alto
n
86. R
amea
u
87. A
lbén
iz88
. Hol
st
89. S
chüt
z
90. M
asca
gni
91. D
eliu
s
92. G
rain
ger
93. M
ar�n
u
94. C
orel
li
95. L
ige�
96. S
carla
�, A
97. D
owla
nd
98. P
iazz
olla
99. M
eyer
beer 10
0. B
uxte
hude
101.
Bus
oni
102.
Glin
ka
103.
Orff
104.
Lehá
r10
5. B
occh
erin
i
106.
Sto
ckha
usen
107.
Rod
rigo
108.
Cou
perin
, F
109.
Gra
nado
s
110.
Car
ter
111.
Pac
helb
el
112.
Jopl
in
113.
Ber
io
114.
Res
pigh
i
115.
Blo
ch
116.
Leon
cava
llo
117.
Pär
t
118.
Hon
egge
r11
9. B
ach,
JC12
0. G
inas
tera
121.
Gla
zuno
v
122.
Lully
123.
Kha
chat
uria
n
124.
Bou
lez
125.
Luto
slaw
ski
126.
Tak
emits
u
127.
Var
èse
128.
Cha
rpen
�er,
M-A
129.
Fre
scob
aldi
130.
Szy
man
owsk
i13
1. H
umm
el
132.
Ror
em
133.
Boi
to
134.
Per
gole
si
135.
Alb
inon
i13
6. H
enze
137.
Gab
rieli,
G
138.
Sch
uman
n, C
139.
Vic
toria
140.
Bru
ch
141.
Kre
isler
142.
Del
ibes
143.
Cow
ell
144.
Sch
ni�k
e
145.
Hild
egar
d
146.
Hov
hane
ss
147.
Pen
dere
cki
148.
Spo
hr
149.
Kor
ngol
d
150.
Hay
dn, M
151.
Fel
dman
152.
Gla
ss
153.
Sar
asat
e
154.
Mar
�n
155.
Arn
old
156.
Cle
men
�
157.
Ene
scu
158.
Tur
ina
159.
Tar
�ni
160.
Xen
akis
161.
Gio
rdan
o
162.
Tip
pe�
163.
Cru
mb
164.
Che
rubi
ni
165.
Mac
haut
166.
Sta
nfor
d16
7. C
haus
son
168.
Bea
ch
169.
Pon
chie
lli
170.
Rub
inst
ein
171.
Iber
t
172.
Hum
perd
inck
173.
Rei
ch
174.
Duf
ay
175.
Wie
niaw
ski
176.
Wid
or
177.
Lalo
178.
Dav
ies
179.
Tho
mso
n
180.
Mac
Dow
ell
181.
Ru�
er18
2. G
ibbo
ns, O
183.
Bol
com
184.
Cha
brie
r
185.
Men
o�
186.
Kre
nek
187.
Dup
ré
188.
Go�
scha
lk
189.
Rou
ssel
190.
Doh
nány
i
191.
Ada
ms
192.
Sw
eelin
ck
193.
Bal
akire
v
194.
How
ells
195.
Sor
196.
Ges
uald
o
197.
Duk
as
198.
Fin
zi
199.
Her
bert
200.
Tav
ener
-0.90
0.9
-0.9
00.
9
Dim2
Dim1
Fig.
9
Euro
pean
art
mus
ic. S
even
cen
turie
s in
one
grap
h. A
n ap
plic
atio
n of
the
MD
S m
etho
dolo
gy
997Scientometrics (2019) 120:975–1003
1 3
music into two general areas which do not overlap to the same extent that they do in previ-ous centuries. That is, the performing repertoire is at odds, sometimes dramatically so, with a competing canon of works considered to be of greater importance from an evolutionary historical point of view”.38 Thus, we see the late Romantic composers (Sibelius, Rach-maninoff), and to the left on the graph, we encounter the late German/Austrian Romantic tradition (R. Strauss, Mahler, and Reger). Then, reacting to the extreme form of German/Austrian late Romanticism, the impressionism movement emerges in France (Debussy, Ravel), while Bartók’s interest for ethnomusicology leads him to synthetize Modernism and folk music. The First World War shatters the ideals of Romanticism and Stravinsky responds with the Neoclassicism movement, an anti-Romantic aesthetic, while a more rational form of expression (expressionism/atonal and dodecaphonic/12-tone/serial music) emerges in Germany (Schoenberg, Berg, and Webern). Later, in France, Messiaen explores rhythm, the Indonesian gamelan, birdsong, and absorbs many non-Western influences. These developments push us in the South East quadrant of the twentieth century music. Towards the inside, we have ‘neo-classical’ and ‘neo-romantic’ currents with relatively accessible (moderate harmonic style) composers of the twentieth century (Hindemith, Honegger, Shostakovich, Martinů, Schnittke, Britten, Walton, Hovhaness, Arnold, etc.) who still compose broadly tonal music in the ‘Classical’ form or the ‘Romantic’ tradition but with elements of atonality, serialism, chromaticism, and in the case of Schnittke, poly-stylistic techniques, and Lustoslawski who has used most of the compositional techniques of the twentieth century (but never considered himself as part of the ‘avant-garde’). But towards the outside of the circle we meet the ‘second century’ within the twentieth century, the ‘avant-garde’ composers and their experimental works: Varèse (organised sound, early electronic music), Carter (metrical modulation), Ligeti (micropolyphony), Cage (electronic and aleatoric/chance music), Boulez (total serialism), Stockhausen (electronic, aleatoric, and serial), Xenakis (computer, electronic, stochastic processes and game theory applied to music), Berio (non-conventional vocal techniques), Feldman (chance, slowly evolving music), Reich (minimalism), Takemitsu (combining oriental and western music) and oth-ers, including Penderecki who in the 1970 s re-embraced neoromantic-sounding elements.
To conclude, the MDS map (and its sister map of 500 composers not included here) gives a basic overview, in a single graph, of the evolution of classical music across seven centuries.
Multidimensional scaling (more than 2 dimensions), and canonical correlations
In the previous section, we have presented MDS results on 2-dimension graphs. In Fig. 6 (the Baroque period), the first dimension (the x-axis) seems to be ‘somewhat’ linked to time (composers to the right of the graph are typically born before those on the left). The second dimension (the y-axis) seems to be somewhat associated with the country/region of
38 As the history of music unfolds, the dispersion of (contemporary) composers on the map increases, sug-gesting larger style differences among composers, a trend which has accelerated in the twentieth century as seen on the East side of the map. Rutherford-Johnson (2017) in his Music after the Fall provides an highlighting discussion of how much diverse and fragmented contemporary composition has become, since 1989, a period not captured in Fig. 9.
998 Scientometrics (2019) 120:975–1003
1 3
origin of composers (Italians above, then Germans and Austrians in the center, and French and English composers below). Yet, this ‘guess’-estimation is approximate and appears more difficult to do in maps for the Classical and Romantic periods in Figs. 7 and 8.
One difficulty is that many ecological categories (actually 298 variables including a composer’s country, period, city, style of music (opera, orchestral, etc.), schools of asso-ciation, etc.) and all personal musical influences are ‘squeezed’ into just two MDS dimen-sions. If we increased the number of dimensions, perhaps we might be able to identify or ‘match’ some ecological variables with MDS dimensions. One task of MDS is indeed to determine the number of appropriate ‘dimensions’ in the MDS model. The usual technique is to run the MDS algorithm for a series of dimensions (e.g., from 2 to 6 dimensions), and adopt the model with the smallest number of dimensions that achieves a reasonably small value for the stress loss function. Table 2 and Fig. 10 show Kruskal’s stress values we obtained for dimensions 1 to 6 for the 200 ‘Top’ composers analysed in Sect. 2 (Fig. 9). A high stress value tends to create some distortion on the MDS maps, misplacing some composers. In Fig. 9 for example, Giordano (1867–1948) should be close to Puccini and Mascagni (Verismo) instead of being localised on the West side of the map (on the x-axis). Menotti (1911–2007) whose scores are often reminiscent of Puccini is also oddly located in the West side of the map. Marcel Dupré (1886–1971), a French organist and composer, should perhaps be right in the middle of the North East quadrant half way between Widor and Messiaen (instead of being in the South side on the y-axis). Moving from 2 to 3 dimen-sions permits to lower the stress value significantly. Therefore we should at least explore MDS results with three dimensions. Choosing N = 3 optimizes the 200 composers location in a three-dimensional space. Given the original distance di,j between composers i and j, the algorithm generates the coordinates (xi, yi, zi) and (xj, yj, zj).
Instead of drawing three-dimensional graphs that are difficult to visualise our objective is to attempt to identify variables explaining these dimensions. In the example of a geo-graphical map, the three dimensions can be interpreted as latitude, longitude, and perhaps altitude (if this is a ‘level’ or topographic map). Can we say something about the MDS-derived dimensions of a (three-dimension) music map? To address this issue, as mentioned in the Introduction we now use the similarity matrix that is built on personal musical influ-ences only, Sinfl (instead of Scomb). In this setting, composers who have been influenced by a common series of composers tend to be more similar than composers who have been influenced by very distinct composers and they will tend to be closer on the MDS map. Assuming three dimensions, the MDS methodology generates coordinates (xi, yi, zi) for any composer i among the ‘Top’ 200, and we wish to say something about what these three dimensions represent. To do this we also characterize each composer by a few ecological characteristics (period, country of birth, main city where the composer lived) and the com-poser’s rank from 1 to 200 (as assigned by Smith 2000) which reflects the relative impor-tance of a composer. We then use linear and non-linear canonical correlations to explore the relationship between ecological variables and MDS dimensions.
Canonical correlation analysis is a method for exploring the relationships between mul-tivariate sets of variables. We do not necessarily think of one set of variables as independ-ent and the other as dependent. It is the multivariate extension of correlation analysis. In our example, each composer i is described by a series of coordinates (xi, yi, zi) representing the three dimensions (M1, M2, M3) generated by the MDS methodology, and by a series of ecological variables (countries, cities, period, and the composers’ rank). It might be pos-sible to create pairwise scatter plots with variables in the first set (e.g., the MDS dimen-sions of the composers), and variables in the second set (e.g., the ecological characteristics of the composers). But if the dimension of the first set is p and that of the second set is
999Scientometrics (2019) 120:975–1003
1 3
q, there will be pq such scatter plots, and it may be difficult, if not impossible, to look at all of these graphs together and interpret the results. Similarly, one could compute all correla-tions between variables from the first set and variables in the second set, however interpre-tation is difficult when pq is large. Canonical correlation analysis allows us to summarize the relationships into a lesser number of statistics while preserving the main facets of the relationships. In a way, the motivation for canonical correlation is very similar to principal component analysis. It is another dimension reduction technique.
There are two types of canonical correlations: linear and non-linear. The mathematics behind both approaches (and their relationships) and the results obtained are described in details in a working paper, Georges and Nguyen (2019), for the interested reader. Here we just provide some basic results. In order to perform linear canonical data analysis all eco-logical variables had to be converted to dummy variables. For example, the variable Period was converted into binary variables (Baroque, Classical, Romantic, and twentieth century) that take value of 1 or 0 whether a composer belongs or not to such a specific period. The results described in Georges and Nguyen (2019) show that the twentieth century has the most discrimination power with the second (M2) dimension generated by the MDS meth-odology. The romantic period has the most discrimination power with the first dimension (M1) and classical period has the most discriminatory power with the third dimension (M3). The limit of this analysis, however, is that inference that can be deduced based on linear canonical analysis might not show us the discrimination power of the ecological variables, say, Period, with regards to the MDS dimension/axis, but instead it shows the discrimina-tion power of the dummy variable ‘20-century’ (whether or not a composer belongs to the twentieth century era of music) and the MDS dimension/axis. To some researchers it would be best if the results could be obtained for categorical variables such as Period, Country, or City as a whole. This indeed motivates the use of nonlinear canonical analysis.
A first difference with linear canonical correlation is that the non-linear method can be used to explore the relationship among more than two sets of variables. In Georges and Nguyen (2019) we have considered three sets of variables. Set 1 includes the three MDS dimensions, Set 2 includes City and Country, and Set 3 includes Period and Rank. A sec-ond difference between linear and non-linear canonical correlation is that linear canoni-cal correlation works on original (numerical) variables while non-linear canonical correla-tion works on rescaled variables. Thus, non-linear canonical correlation analysis does not assume an interval level of measurement (in which distances between attributes/categories/scores are meaningful) or assume that the relationships are linear. Variables can be scaled as either nominal (taking a name or label), ordinal, or numerical. In our example, City, Country, and Period are treated as nominal. Ordinal variables are variables with natural, ordered categories.39 By choosing the measurement levels (nominal, ordinal or numerical) of the variables, one tells the non-linear canonical correlation algorithm to choose an opti-mal transformation of a variable among a set of admissible transformations for that meas-urement level.40
39 For example, we can rate quality of customer services as poor, reasonable, good, or excellent. In this case, there is a natural order of different categories for quality of customer services.40 If nominal level is chosen, observations from one category remain in the same category under rescaling. If ordinal level is chosen, the order of the original (coded) values should be maintained (i.e., if an observa-tion belongs to a ‘lower’ category before transformation, it should not be in ‘higher’ categories after trans-formation). Note that the measurement level chosen is not necessarily ‘apparent’, based on the meaning of the original variables. Instead, it can be chosen to improve the fit of the model (IBM SPSS Categories 22, 2013). For this reason, in our example, Rank is also treated as nominal.
1000 Scientometrics (2019) 120:975–1003
1 3
Results in Georges and Nguyen (2019) show that the second MDS dimension is mainly discriminated by Period. Also, the first and third MDS dimensions are mainly discrimi-nated by rank. Besides Ranking and Period, City and Country also contribute some dis-crimination power but the fact that these variables are correlating to more than one MDS dimension makes it difficult to identify ‘grouping’ of composers in the MDS space. In other words, the placements of composers on a MDS map are determined by a combina-tion of different variables. This is completely understandable for a complicated and large dataset as the one we consider here. When considering the localisation/proximity of com-posers on a MDS map on the basis of personal musical influences, it is natural to expect that many factors must have had an impact on why they felt this influence, including the period of music, the country or city, the type of music, the types of instruments, the impor-tance (ranking) of composers, etc. These various factors contribute to so much variability/noise in the data that it is somewhat impossible to find a simple one-to-one pairing of MDS variables and ecological variables. One possibility left for future research is to increase the MDS dimensions beyond the three dimensions used here, to add further ecological vari-ables, and to increase the number of composers in the canonical analysis from 200 to 500. Perhaps this might improve the identification of the MDS dimensions.
Table 2 Best stress obtained for each dimension
Dimensions 1 2 3 4 5 6
Kruskal’stress 0.672 0.385 0.277 0.219 0.181 0.156Iterations 31 205 356 415 369 437Convergence 0.000 0.000 0.000 0.000 0.000 0.000
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 1 2 3 4 5 6 7 8
Raw
str
ess
Dimensions
Fig. 10 MDS stress values for dimensions 1 to 6
1001Scientometrics (2019) 120:975–1003
1 3
Conclusion
This paper applies clustering techniques and MDS analysis to a medium size (500 × 500) composers’ similarity matrix built on the basis of personal music influences and ecological data. We provide dendrograms and maps for the Baroque, Classical, and Romantic periods and a map (of 200 ‘Top’ composers) that represents seven centuries of European art music in one single graph. Dendrograms and maps for other periods (Medieval and Renaissance, Impressionism and twentieth century classical music (serial, neo-Classical, neo-Romantic composers, ‘avant-garde’, etc.), and for all 500 composers are available from the authors, but their analysis is left for a subsequent musicological study.
Comparing our results with those in Weiss (2017) shows that using context-based data, instead of content (audio files) data, remains a useful approach for clustering and visualis-ing similarities across composers. We have shown that grouping well-known composers into either a fixed number of clusters or into hierarchical clusters roughly coincides with common musicological knowledge. In other words, statistical methods can cluster and map composers in a way a musicologist could perhaps organize and recount the history of music on the basis of his/her specific knowledge. In this sense, the maps and dendrograms devel-oped here might also enrich the musicologists’ toolkit for basic music education. In this perspective, these graphs (especially those for the twentieth century not included here as they are too large) might benefit from the setting of a web page with a search option. One can envisage ‘directed’ paths or ‘interactive’ searches that would lead the user to explore the evolution of Western classical music across centuries or during a specific period, with audio and video files hyperlinked to composers’ names. A drop-down menu could also identify/confirm the 5 or 10 most similar composers to a subject composer based on Georges (2017). This is left for a future project.
Acknowledgements The authors thank Charles H. Smith for providing access to the data collected in the Classical Music Navigator for this project. We also benefited from discussions and comments by C.H. Smith on the initial stage of the project.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna-tional License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
Addinsoft. (2019). XLSTAT statistical and data analysis solution. Long Island, NY, USA. https ://www.xlsta t.com. Accessed 31 Aug 2018.
Baccigalupo, C., Plaza, E., & Donaldson, J. (2008). Uncovering affinity of artists to multiple genres from social behaviour data. In Proceedings of the 9th international conference on music information retrieval, Philadelphia, PA, USA.
Cassedy, S. (2010). Beethoven the Romantic: How ETA Hoffmann got it right. Journal of the History of Ideas., 71(1), 1–37.
de Saint-Foix, G., & Herter Norton, M. D. (1931). Clementi, forerunner of Beethoven. The Musical Quarterly., 17(1), 84–92.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26, 297–302.
Finscher, L. (1983). Weber’s Freischütz: conception and misconceptions. Proceedings of the Royal Musical Association, 110(1), 79–90.
1002 Scientometrics (2019) 120:975–1003
1 3
Foote, J. (1999). Visualising music and audio using self-similarity. In MULTIMEDIA ’99 Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77–80). https ://doi.acm.org/10.1145/31946 3.31947 2.
Georges, P. (2017). Western classical music development: A statistical analysis of composers similarity, differentiation and evolution. Scientometrics, 112(1), 21–53.
Georges, P., & Nguyen, N. (2019). Visualizing music similarity: Clustering and mapping 500 classical music composers. Mimeo: University of Ottawa.
Giller, G. L. (2012). The statistical properties of random bitstreams and the sampling distribution of cosine similarity. http://dx.doi.org/10.2139/ssrn.21670 44. Accessed 17 January 2015.
Glänzel, W., & Czerwon, H. J. (1996). A new methodological approach to bibliographic coupling and its application to the national, regional, and institutional level. Scientometrics, 37(2), 195–221.
Gray, W. (1971). The classical nature of Schubert’s lieder. The Musical Quarterly, 57(1), 62–72.Gray, W. (1978). Schubert the Instrumental composer. The Musical Quarterly, 64(4), 483–494.Griffiths, P. (1978). A concise history of modern music: From Debussy to Boulez. London: Thames &
Hudson.IBM Corp. (2013). IBM SPSS categories 22, version 22.0. Armonk, NY: IBM Corp.Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et du Jura.
Bull. de la Societé Vaudoise de la Science Naturelle, 37, 547–579.Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis.
Psychometrika, 29(1), 1–27.Kulczynski, S. (1927). Zespoly roslin w Pieninach. Bulletin International de l’Académie Polonaise des
Sciences et des Lettres. ser. B: Sciences Naturelles, 1(suppl. 2), 57–203.Mauch, M., MacCallum, R., Levy, M., & Leroi, A. (2015). The evolution of popular music: USA 1960–
2010. Royal Society Open Science, 2, 150081.NCSS Statistical Software. Documentation available at https ://www.ncss.com/. Accessed 31 Aug 2018.Orio, N. (2006). Music retrieval: A tutorial and review. Foundations and Trends in Information
Retrieval, 1(1), 1–90.Pachet, F.,Westerman, G., & Laigre, D. (2001). Music data mining for electronic music distribution. In
Proceedings of the 1st international conference on web delivering of music, Florence, Italy.Pampalk, E. (2006). Audio-based music similarity and retrieval: Combining a spectral similarity model
with information extracted from fluctuation patterns. In Proceedings of the international sympo-sium on music information retrieval.
Pauls, H. (2014). Two centuries in one. Musical Romanticism and the twentieth century. Ph.D. thesis, Rostock University of Music and Theatre.
Potthoff, R. F., & Whittinghill, M. (1966). Testing for homogeneity. I. The binomial and multinomial distributions. Biometrika, 53, 167–182. https ://doi.org/10.1093/biome t/53.1-2.167.
Ritter, M. (1994). Telemann: master of the ‘German style’. Early Music, 22(4), 696–698.Rodriguez Zivic, P. H., Shifres, F., & Cecchi, G. A. (2013). Perceptual basis of evolving Western musi-
cal styles. Proceedings of the National Academy of Sciences, 110(24), 10034–10038.Rutherford-Johnson, T. (2017). Music after the fall. Modern composition and culture since 1989. Uni-
versity of California Press, Oakland, CA.Schedl, M., Gomez, E., & Urbano, J. (2014). Music information retrieval: recent developments and
applications. Foundations and Trends in Information Retrieval, 8(2–3), 127–261.Sen, S. K., & Gan, S. K. (1983). A mathematical extension of the idea of bibliographic coupling and its
applications. Annals of Library and Information Studies, 30(2), 78–823.Sheldon, D. A. (1989). The concept galant in the 18th century. Journal of Musicological Research, 9(2–
3), 89–108.Silbiger, A. (1980). The roman Frescobaldi tradition, c. 1640-1670. Journal of the American Musico-
logical Society, 33(1), 42–87.Simpson, G. G. (1943). Mammals and the nature of continents. American Journal of Science, 241, 1–31.
https ://doi.org/10.2475/ajs.241.1.1.Smith, C. H. (1983). A system of world mammal faunal regions. I. Logical and statistical derivation of the
regions. Journal of Biogeography, 10, 455–466.Smith, C. H. (2000). The classical music navigator. http://peopl e.wku.edu/charl es.smith /music /. Accessed
30 August 2018.Smith, C. H., & Georges, P. (2014). Composer similarities through ‘the Classical Music Navigator’: Simi-
larity inference from composer influences. Empirical Studies of the Arts, 32(2), 205–229.Smith, C. H., & Georges, P. (2015). Similarity indices for 500 classical music composers: Inferences from
personal musical influences and ‘ecological’ measures. Empirical Studies of the Arts, 33(1), 61–94.
1003Scientometrics (2019) 120:975–1003
1 3
Smith, C. H., Georges, P., & Nguyen, N. (2015). Statistical tests for ‘related records’ search results. Sciento-metrics, 105(3), 1665–1677.
Sneath, P. H. A. (1968). Vigour and pattern in taxonomy. Journal of General Microbiology, 54, 1–11. https ://doi.org/10.1099/00221 287-54-1-1.
Steinberg, M. P. (2006). The politics and aesthetics of operatic modernism. The Journal of Interdisciplinary History., 36(4), 629–648.
Sutherland, D. (1995). Domenico Scarlatti and the Florentine piano. Early Music, 23(2), 243–246 and 249–256.
Taruskin, R. (2010). The Oxford dictionary of western music. Five-volume edition, Oxford, New York: Oxford University Press.
Taruskin, R., & Gibbs, C. (2013). The Oxford history of western music. New York, Oxford: Oxford Univer-sity Press.
Todd, L. R. (2008). Mendelssohn essays. Chapter 7: The Chamber Music of Mendelssohn. New York: Routledge.
Vehkalahti, K., & Everitt, B. S. (2019). Multivariate analysis for the behavioral sciences. Boca Raton, FL: CRC Press, Francis & Taylor Group.
Weiss, C. (2017). Computational methods for tonality-based style analysis of classical music audio record-ings (Doctoral dissertation). Ilmenau University of Technology. Ilmenau, Germany.
Weiss, C., Mauch, M., Dixon, S., & Müller, M. (2018). Investigating style evolution of Western classical music: A computational approach. Musicae Scientiae. https ://doi.org/10.1177/10298 64918 75759 5.
White, C. W. (2013). Some statistical properties of tonality, 1650–1900 (Doctoral dissertation). Yale Uni-versity, New Haven, CT.