


© Michael R. W. Dawson 2014

CHAPTER 4:

DISTINGUISHING MAJOR SCALES FROM MINOR SCALES

What distinguishes a major scale from a minor scale? The traditional answer to this question from music theory involves creating a musical scale for a particular key, examining the pattern of distances between the pitch-classes in this scale, and using this pattern to decide whether the key is major or not. This chapter provides a case study in which a very different answer is provided. It begins by introducing a new artificial neural network that is required for this problem, the multilayer perceptron. The multilayer perceptron is related to the perceptron that was the topic of Chapters 2 and 3, but is more powerful because it includes processors called hidden units. This chapter illustrates the use of the multilayer perceptron by training one to distinguish major scales from harmonic minor scales. An interpretation of the multilayer perceptron’s internal structure reveals a quite different account of the formal difference between major and minor keys: one that focuses not upon the relationship between adjacent pitch-classes in a scale, but instead upon the relationship between pitch-classes a tritone apart. Further analyses of this network relate its structure to geometric representations of notes, scales and chords, to spatial models of chord progressions and voice-leading, and to the tritone substitution used in jazz.

4.1 The Multilayer Perceptron ............................................ 1 

4.2 Training the Multilayer Perceptron .............................. 4 

4.3 Distinguishing Major from Minor Keys ....................... 7 

4.4 Interpreting the Scale Type Network ........................... 8 

4.5 Tritone Imbalance and Major Keys ............................ 11 

4.6 Further Network Analysis ........................................... 14 

4.7 References ................................................................... 25 


4.1 The Multilayer Perceptron

4.1.1 Carving Pattern Spaces

At the end of Chapter 3 it was noted that perceptrons cannot learn to solve every problem. What is the source of this inadequacy, and how can it be overcome? The limitations of perceptrons become evident when we describe these networks as classifying stimuli by carving pattern spaces into decision regions.

A pattern space represents a set of patterns that need to be categorized. Each pattern is represented as a single point located in a multidimensional space. The dimensionality of this space equals the number of input units used to present patterns as stimuli to a neural network. The coordinates of each point are the input unit activities used to represent each pattern to the network.

For example, consider Figure 4-1. It depicts two versions of the same pattern space for a perceptron that has one output unit and two input units. With two input units, the pattern space is two-dimensional. If the input units can only be turned on or off, then only four patterns can be presented to this perceptron: (0, 0), (1, 0), (0, 1) and (1, 1). If we let the activity of Input Unit 1 represent the x-coordinate of a pattern, and let the activity of Input Unit 2 represent its y-coordinate, then these pairs provide the locations of four different points in the space. The resulting pattern space is graphed at the top and again at the bottom of Figure 4-1. The four circles (filled or not) show the positions of the four input patterns.

Figure 4-1. A pattern space for a two input unit perceptron is carved with a single straight cut by an integration device, and is carved with two parallel straight cuts by a value unit.

Artificial neural networks categorize patterns by carving the pattern space into different decision regions. Perceptron responses to a pattern depend upon which decision region contains it. For the simple perceptron facing a pattern space like the one illustrated in Figure 4-1, if the pattern is contained in one region, then the output unit will turn on (black dot). If the pattern resides in a different region, then the output unit will turn off (empty dot).

The activation function of a perceptron’s output unit determines how the unit can carve the pattern space into decision regions. If the output unit is an integration device that uses the logistic activation function (Figure 2-1), then it can make a single straight cut that divides the pattern space into two different decision regions. In a two-dimensional pattern space, this cut is a straight line, as shown at the top of Figure 4-1. This cut separates the input pattern (1, 1) from all of the other input patterns. Carving the space in this way permits the perceptron to compute the logical operation AND: the output unit only turns on (‘true’) if both of its inputs are also on (‘true’). Otherwise, the output unit turns off (‘false’).

To translate the carved pattern space at the top of Figure 4-1 into network operations, the net input produced by pattern (1, 1) will be far enough above the output integration device’s θ to turn the unit on. In contrast, the net inputs for the other patterns that fall in the other decision region will be far enough below the integration device’s θ to turn the output unit off.
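This carving of the AND pattern space can be sketched in a few lines of code. The weights and threshold below are hand-chosen, illustrative values, not ones taken from a trained network; any setting that places the cut between (1, 1) and the other three patterns would do.

```python
import math

def logistic(net):
    # Logistic activation of an integration device.
    return 1.0 / (1.0 + math.exp(-net))

# Hand-chosen (illustrative) weights and threshold: the cut
# 10*x1 + 10*x2 = 15 separates (1, 1) from the other three patterns.
w1, w2, theta = 10.0, 10.0, 15.0

def and_unit(x1, x2):
    net = w1 * x1 + w2 * x2
    return logistic(net - theta)
```

The unit’s activity is near 1 only for the pattern (1, 1), and near 0 for the remaining three patterns, which is exactly the AND mapping described above.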

When a higher-dimensional pattern space is encountered, an integration device carves it into two regions with a single hyperplane (a higher-dimensional straight cut). A three-dimensional pattern space will be carved in two by a two-dimensional plane; a four-dimensional pattern space will be carved in two by a three-dimensional hyperplane, and so on.

If the output unit is a value unit that uses the Gaussian activation function (Figure 2-2), then it can make two parallel cuts that divide the pattern space into three different decision regions. In a two-dimensional pattern space, this is accomplished with two parallel straight lines that are close together, as shown at the bottom of Figure 4-1. This separates the input patterns (1, 0) and (0, 1) from the other two input patterns. This perceptron can be interpreted as computing the logical operation XOR: this operation is true (the output unit turns on) only when one (and only one) of its input units is true.

To translate the carved pattern space at the bottom of Figure 4-1 into network operations, the net inputs for the patterns (1, 0) and (0, 1) will be close enough to the value unit’s µ to turn the output unit on. In contrast, the net inputs for the other two patterns that fall in the other two decision regions will be far enough away from µ to turn the output unit off. When a higher-dimensional pattern space is encountered, a value unit carves it into three regions with two parallel hyperplanes.
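The two-cut solution to XOR can be sketched similarly. The Gaussian form used here, exp(−π(net − µ)²), follows the value unit of Dawson and Schopflocher (1992); the weights are hand-chosen for illustration rather than learned.

```python
import math

def gaussian(net, mu):
    # Value unit activation: peaks at 1 when net input equals mu,
    # and falls off on either side (the two parallel cuts).
    return math.exp(-math.pi * (net - mu) ** 2)

# Hand-chosen (illustrative) weights: with w1 = w2 = 1 and mu = 1,
# only the patterns (1, 0) and (0, 1) produce a net input of exactly mu.
w1, w2, mu = 1.0, 1.0, 1.0

def xor_unit(x1, x2):
    net = w1 * x1 + w2 * x2
    return gaussian(net, mu)
```

The net inputs for (0, 0) and (1, 1) are 0 and 2, both a full unit away from µ, so the unit’s activity for them (exp(−π) ≈ 0.04) falls well below threshold, while (1, 0) and (0, 1) activate the unit fully.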

For either type of output unit, the position of its cut through the pattern space is determined by learning. That is, the location and orientation of a cut is dictated by the output unit’s bias and by the values of the connection weights. Learning is the process that attempts to find the best positions of cuts through the pattern space.

The activation function of a perceptron’s output unit dictates the kinds of classification problems that can be learned, as well as those that can never be learned. For example, a linearly separable problem is one that can be solved by a single straight cut through a pattern space. Such a problem can be solved by a perceptron that has a single integration device as its output unit. One example of a linearly separable problem is the logical AND illustrated at the top of Figure 4-1.

In contrast, the logical XOR illustrated at the bottom of Figure 4-1 is not a linearly separable problem. This is because a single straight cut is not sufficient to carve decision regions that separate all of the black points (the ‘on’ or ‘true’ patterns) from all of the empty points (the ‘off’ or ‘false’ patterns). Two cuts are required. Therefore this problem cannot be solved by an integration device perceptron; it can, however, be solved by a perceptron whose output unit is a value unit.

As pattern classification problems become more complex, they require a more complicated carving of the pattern space. In many cases the required carving is beyond the capacity of a perceptron, regardless of the nature of its output unit. At the end of Chapter 3, we noted that one such problem was identifying a musical scale as being either major or minor.

How are the limitations of a perceptron overcome? Perceptron power is increased by adding one or more layers of intermediate processors between the input and the output units of the perceptron. These additional units are called hidden units because they do not have any direct connection to the external world; they reside completely within the network. An artificial neural network that incorporates hidden units is called a multilayer perceptron. This chapter illustrates the use of a multilayer perceptron by training and interpreting the one illustrated in Figure 4-2, a network designed to detect whether an input scale is major or minor.


Figure 4-2. A multilayer perceptron that uses two hidden units to detect whether an input scale is major or minor. The details of this network are the focus of Chapter 4.

4.1.2 What Do Hidden Units Do?

Why do hidden units make a multilayer perceptron capable of solving problems that cannot be solved by a perceptron?

One answer to this question is that each hidden unit in a multilayer perceptron adds its own ability to carve a pattern space. That is, a network with hidden units can carve a pattern space up in more complex ways than can a network without such units. It has been proved that a network with no more than two layers of hidden integration devices can carve a pattern space into arbitrarily shaped decision regions, and therefore is capable of solving any pattern classification problem (Lippmann, 1989).

A related answer to this question is that hidden units detect higher-order features. A higher-order feature involves relationships amongst different input units considered simultaneously. When these features are detected, input patterns can be mapped into a different space called a hidden unit space.

A hidden unit space is like a pattern space, but in a hidden unit space the coordinates of input patterns are given by the activities of the hidden units instead of by the input units. For example, the multilayer perceptron illustrated in Figure 4-2 confronts a 12-dimensional pattern space. However, the hidden units can map this pattern space into a simpler, two-dimensional hidden unit space by representing inputs in terms of the presence of higher-order features. The key idea here is that the output unit may find carving the hidden unit space much more tractable than carving the pattern space directly.

These two accounts of what hidden units do are related, but focus on different aspects of solving a categorization problem. When we interpret the internal structure of a network like the multilayer perceptron of Figure 4-2, we will often consider how hidden units carve the pattern space, as well as how output units carve the hidden unit space.

Importantly, adding hidden units makes artificial neural networks extremely powerful: indeed, as powerful as any universal computing machine of interest to cognitive science (Dawson, 2013). Thus we should never be surprised when such a network learns to solve a musical problem. Surprises emerge instead from our investigation of the solution that the network has discovered, and what it might tell us about music (Dawson, 2004).


4.2 Training the Multilayer Perceptron

4.2.1 Credit Assignment

Chapter 2 described a variety of supervised learning rules for modifying the weights of a perceptron. These weights are changed in such a way as to reduce output unit error as learning proceeds. In general, the value of a weight change for any perceptron connection is equal to the product of three different numbers: the learning rate, the activity at the input end of the connection, and the error at the output end of the connection. The different learning rules that were described in Chapter 2 all follow this general format. The specific differences between the learning rules involve technical details of how error is mathematically defined.

A multilayer perceptron is also typically trained using a supervised learning rule that minimizes output unit error. One can see how the rules described earlier for the perceptron can be readily applied to the outer layer of connections from the two hidden units in Figure 4-2 to the output unit: the activity at the input end of one of these connections is the activity of a hidden unit, and the error at the output end is the computed error for the output unit (i.e. the difference between desired and actual output unit activity).

However, a problem is encountered when we attempt to define weight changes for any of the inner connections from an input unit in Figure 4-2 to either of the hidden units in this network. The activity at the input end of one of these connections is input unit activity. However, we have no idea what the error at the output end of the connection is. This is because the hidden units are literally hidden from feedback; we do not know the responses these units should make, and therefore cannot define error for them in the same fashion as we do for the output unit.

The difficulty in defining hidden unit error is an example of the credit assignment problem (Minsky, 1963). When the output unit in Figure 4-2 generates error, the source of this error is the signals being sent to it from the hidden units. The problem is that we do not know how much of the error is caused by one hidden unit, and how much is caused by the other. The credit assignment problem is the inability to assign the appropriate credit (or, more to the point, the appropriate blame) to each hidden unit for its contribution to output unit error. The connectionist revolution in cognitive science in the 1980s (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986) occurred when researchers discovered a solution to this credit assignment problem.

Researchers were able to solve the credit assignment problem for the multilayer perceptron when the continuous logistic activation function was incorporated into processing units. This permitted changes in network behavior to be explored with calculus (Rumelhart, Hinton, & Williams, 1986b). Researchers used calculus to determine how network error responds to changes in an inner connection between an input unit and a hidden unit.

This mathematical investigation of learning revealed that a hidden unit’s error can be specified as the sum of signals sent to it by each of the output units to which it is connected. Each of these signals is an output unit’s error, scaled by the weight of the connection between the output unit and the hidden unit. In other words, output units send error signals backwards through the network, and these signals determine hidden unit error, solving the credit assignment problem. Not surprisingly, the learning rule for multilayer perceptrons is often called backpropagation of error, or backprop for short. Because this rule is a generalization of the supervised learning rules for perceptrons, it is also known as the generalized delta rule.
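In code, the error signal arriving at a hidden unit is just this weighted sum. The function name and the numbers below are illustrative, not taken from the book.

```python
def hidden_unit_error(output_errors, weights_to_outputs):
    # Each output unit's error, scaled by the weight of its connection
    # to this hidden unit, is summed to give the hidden unit's error.
    return sum(e * w for e, w in zip(output_errors, weights_to_outputs))

# One hidden unit feeding two output units:
errors = [0.5, -0.25]   # computed output unit errors
weights = [0.8, 0.4]    # hidden-to-output connection weights
print(round(hidden_unit_error(errors, weights), 3))  # 0.5*0.8 + (-0.25)*0.4 = 0.3
```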

Interestingly, there is a different kind of credit assignment problem associated with the generalized delta rule: whom to credit with its discovery. Rumelhart, Hinton and Williams (1986a, 1986b) are its most famous discoverers and popularizers. It was also discovered by David Parker in 1985 and by Yann LeCun in 1986 (Anderson, 1995). Error backpropagation was also reported in Paul Werbos’ 1974 doctoral thesis (Werbos, 1994), and its mathematical basis was reported even earlier by Shun-Ichi Amari (Amari, 1967).

When backprop is used to define hidden unit error, all of the connections in a multilayer perceptron can be modified in such a way that output unit error is reduced. In particular, the weight change for a connection between an input unit and a hidden unit is now definable as the product of the learning rate, the activity at the input end of the connection, and the error at the output end of the connection. This error is the error of a hidden unit, which is the sum of all of the weighted error signals that have been sent backwards through the network to this particular unit.

The mathematical derivation and expression of the generalized delta rule is readily available (Rumelhart, Hinton, & Williams, 1986a; Rumelhart et al., 1986b), but is not required for our purposes. We only need to know the general algorithm for modifying the different layers of connections in a multilayer perceptron. The next section provides a generic description of error backpropagation.

4.2.2 Backpropagating Error

The generalized delta rule is used to train a multilayer perceptron to mediate a desired input-output mapping. Before training begins, a network is a “blank slate”: all of its connection weights, and all of the biases of its activation functions, are initialized as small, random numbers. The generalized delta rule involves repeatedly presenting training patterns (input-output pairs) and then modifying weights, as was the case for perceptron training in Chapters 2 and 3.

In the generalized delta rule a single presentation of an input-output pair proceeds as follows. The first step is to feed signals forward through the network. The input pattern is presented by activating the multilayer perceptron’s input units. This causes signals to be sent to the hidden units, which compute their net input and then their activity. Next, the hidden units send signals to the network’s output units, which then compute their net input and activity. Output unit activities represent the network’s response to the input pattern.

Second, now that the output units have activated, their response error can be measured. Error is computed for each output unit by taking the difference between its desired activity and its observed activity. This procedure is identical to that used to train perceptrons.

Third, an output unit’s error is used to modify the weights of its immediate connections. As was the case for perceptron training, a weight change is added to the existing weight. The weight change is computed by multiplying three different numbers together: a learning rate, the output unit’s error, and the current activity at the input end of the connection. (Output unit error will also be scaled by the derivative of the output unit’s activity; backprop is gradient descent learning.) In other words, up to this point there is no essential difference between the supervised learning of a multilayer perceptron and the supervised learning of a perceptron.

The fourth step differentiates backprop from the gradient descent training of a perceptron. In this step, each hidden unit computes its error. This is done by treating an output unit’s error as if it were activity, and sending it backwards as a signal through a connection to the hidden units. As this signal is sent, it is multiplied by the weight of the output unit’s connection. Each hidden unit computes its error by summing together all of the error signals that it receives from all of the output units to which it is connected. This is analogous to computing net input when activity is fed forward through the network.

Fifth, once hidden unit error has been computed, the weights that feed into the hidden units can be modified using the same equation that was used to alter the weights of each of the output units. Once this step has been accomplished, all of the network’s connections will have been modified. Training continues by presenting the next input-output pair in the training set, and repeating the procedure that has just been described.
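The five steps just described can be sketched in miniature. This is a generic illustration for a tiny network of logistic integration devices with hypothetical starting weights; it is not the value unit variant used later in the chapter, and biases are omitted for brevity.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_step(x, target, w_ih, w_ho, lr=0.5):
    # Step 1: feed signals forward through hidden units to the output unit.
    h = [logistic(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
    o = logistic(sum(w * hi for w, hi in zip(w_ho, h)))
    # Step 2: measure output unit error, scaled by the logistic derivative.
    delta_o = (target - o) * o * (1.0 - o)
    # Step 3: modify the hidden-to-output weights.
    new_w_ho = [w + lr * delta_o * hi for w, hi in zip(w_ho, h)]
    # Step 4: backpropagate. Each hidden unit's error is the output unit's
    # error scaled by the connecting weight (and the hidden derivative).
    delta_h = [delta_o * w * hi * (1.0 - hi) for w, hi in zip(w_ho, h)]
    # Step 5: modify the input-to-hidden weights with the same rule.
    new_w_ih = [[w + lr * dh * xi for w, xi in zip(ws, x)]
                for ws, dh in zip(w_ih, delta_h)]
    return new_w_ih, new_w_ho, (target - o) ** 2

# Repeated presentations of a single pattern steadily reduce its error:
w_ih, w_ho = [[0.05, -0.02], [0.03, 0.04]], [0.01, -0.06]
for _ in range(500):
    w_ih, w_ho, err = train_step([1.0, 0.0], 1.0, w_ih, w_ho)
```

With more than one training pattern, the same `train_step` is simply applied to each input-output pair in turn, once per epoch.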

The generalized delta rule is generally defined for multilayer perceptrons that are composed of integration devices. However, as was the case for perceptrons, variations of the algorithm exist for training multilayer perceptrons that use value units. This involves using the elaborated error term for the output units that was described in Chapter 2. Importantly, the mathematics underlying the training of networks of value units is essentially the same as in the original generalized delta rule (Dawson & Schopflocher, 1992).

4.2.3 Design Decisions

At this point, we have four different neural network architectures to choose from: two types of networks (perceptron vs. multilayer perceptron) crossed with two types of processors (integration device vs. value unit). This means that when we are interested in training an artificial neural network on a particular musical problem, we will have to explore what kind of architecture is required, and will also have to make design decisions concerning how it is to be trained.

One theme of this book is that artificial neural networks can only inform the cognitive science of music after their internal structure has been interpreted, as was illustrated in our analysis of the scale tonic perceptron in Chapter 3. With this goal in mind, design decisions should be geared towards discovering the simplest network capable of solving a musical problem. This is because a simpler network will in most cases be easier to interpret than a more complex network.

Our first step in achieving this goal is to determine whether the simplest network, the perceptron, can solve the problem. After building a training set for the problem, we proceed to present it to a perceptron. This involves deciding, or empirically exploring, the nature of the output unit for the network (integration device vs. value unit). It also involves investigating a number of different design decisions that control the learning: the initial values of the connection weights and biases, the learning rate, and so on.

The basic question addressed in this first step is whether a perceptron, governed by particular design decisions, is capable of converging on a solution to the problem. In Chapter 3 we found that a value unit perceptron with µs held to 0 could identify scale tonics. However, at the end of that chapter it was also discovered that a perceptron (under a variety of different design decisions) was not able to classify an input scale as being either major or minor.

In this second case, when a perceptron is not up to the task, we next proceed to exploring a more powerful network, the multilayer perceptron. This exploration involves the same sorts of design decisions used when the perceptron was investigated (learning rate, initialization, activation function, etc.). However, multilayer perceptrons present additional design decisions. For example, how many hidden units should be included?

Determining the number of hidden units required to solve a problem requires exploring the behavior of a number of different sized networks. Typically one begins with an educated guess about how many hidden units to include. On the one hand, if the multilayer perceptron performs poorly on the problem (e.g. fails to learn it, stabilizes at very high error, etc.), then a different network that uses more hidden units is explored. On the other hand, if the network solves the problem quickly, then a network that uses fewer hidden units should be explored. This is because we are seeking the simplest possible network, which in this case is the one with the fewest hidden units. In other words, we run many different simulations as we search for the simplest multilayer perceptron (that is, the smallest number of hidden units) that can be reliably trained to solve the problem of interest.

Once the simplest network for solving a musical problem of interest has been discovered, we proceed to analyze its internal structure. The purpose of this analysis is to discover how the network maps input patterns into output responses. For the kinds of problems discussed in this book, our hope is that network interpretation will reveal some interesting properties of music.


4.3 Distinguishing Major from Minor Keys

4.3.1 Task

Our goal was to train an artificial neural network to distinguish major keys from minor keys. The network was presented the same input stimuli that were used to train the scale tonic perceptron in Chapter 3: either a major scale or a harmonic minor scale, presented using a pitch-class representation. The network was trained to turn a single output unit ‘on’ if the input pattern defined a major key, and to turn it ‘off’ if the input pattern defined a minor key.

4.3.2 Network Architecture

Our search for the simplest network for accomplishing this musical task proceeded along the lines described in Section 4.2.3. As mentioned at the end of Chapter 3, we were unable to successfully train a perceptron to identify key type for these input patterns. This surprised us, because it indicates that identifying scale tonics is a simpler problem than identifying scale type: the former can be learned by a perceptron while the latter cannot.

In accordance with Section 4.2.3, our next step was to discover the simplest multilayer perceptron that was capable of identifying scale type. After exploring a variety of different networks, we settled upon the multilayer perceptron illustrated in Figure 4-2.

This multilayer perceptron uses 12 input units to represent the presence or absence of pitch-classes, using the same encoding that was described in Chapter 3 for the scale tonic perceptron. (This was because both networks were trained with exactly the same set of input patterns.) The network uses a value unit as its single output unit. The output unit was trained to turn ‘on’ if a stimulus represented a major scale, and to turn ‘off’ if a stimulus represented a harmonic minor scale. Finally, the network uses two hidden value units as intermediate processors between its input and output units. There were no direct connections between the input units and the output unit.

4.3.3 Training Set

The training set consisted of 12 different major and 12 different harmonic minor scales represented in pitch-class format (i.e. each input stimulus was one of the rows of numbers presented in Chapter 3 as Table 3-1). As was the case for the scale tonic perceptron, each input pattern was used to turn the network’s 12 input units either ‘on’ or ‘off’ to indicate the presence or absence of the various pitch-classes. The desired response for an input pattern that represented a major scale was 1, and the desired response for an input pattern that represented a harmonic minor scale was 0.
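Patterns of this kind can be reconstructed from the standard interval structure of the two scale types. This is a sketch that rebuilds the sort of vectors tabulated in Table 3-1 (not reproduced here) from first principles; the interval patterns are ordinary music theory, not values taken from the book.

```python
# Whole- and half-step patterns up from the tonic (standard music theory):
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]            # e.g. C D E F G A B
HARMONIC_MINOR_STEPS = [2, 1, 2, 2, 1, 3]   # e.g. C D Eb F G Ab B

def scale_vector(tonic, steps):
    """12-element binary pitch-class vector for a scale on tonic 0-11."""
    vec = [0] * 12
    pc = tonic
    vec[pc] = 1
    for step in steps:
        pc = (pc + step) % 12
        vec[pc] = 1
    return vec

# 12 major scales (desired response 1) and 12 harmonic minor scales (0):
training_set = ([(scale_vector(t, MAJOR_STEPS), 1) for t in range(12)] +
                [(scale_vector(t, HARMONIC_MINOR_STEPS), 0) for t in range(12)])
```

Each vector turns on the seven input units for the pitch-classes present in the scale and leaves the remaining five off.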

4.3.4 Training

The network was trained using the generalized delta rule developed for networks of value units (Dawson & Schopflocher, 1992), using the Rumelhart software program (Dawson, 2005). This program is available as freeware from the author’s website. During a single epoch of training each pattern was presented to the network once; the order of pattern presentation was randomized before each epoch.

All connection weights in the network were set to random values between -0.1 and 0.1 before training began. The µs of the output and hidden units were set to 0 throughout training. A learning rate of 0.01 was employed. Training proceeded until the network generated a ‘hit’ for each of the 24 patterns in the training set. A ‘hit’ was defined as activity of 0.9 or higher when the desired response was 1, or as activity of 0.1 or lower when the desired response was 0.
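The hit criterion and stopping rule are easy to state as functions; the names below are ours, chosen for illustration.

```python
def is_hit(activity, desired):
    # A 'hit': activity of 0.9 or higher when the desired response is 1,
    # or activity of 0.1 or lower when the desired response is 0.
    return activity >= 0.9 if desired == 1 else activity <= 0.1

def converged(activities, desired_responses):
    """Training stops once every pattern in the training set is a hit."""
    return all(is_hit(a, d) for a, d in zip(activities, desired_responses))
```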

The multilayer perceptron in Figure 4-2 quickly learned to solve this problem, typically converging after between 350 and 475 epochs of training. The network described in more detail in the next section learned to solve the problem after 390 epochs of training.


4.4 Interpreting the Scale Type Network

How does the multilayer perceptron in Figure 4-2 detect the difference between a major scale and a harmonic minor scale, regardless of the scale’s tonic? In order to answer this question we examined the hidden unit space that confronted the output unit, as well as the input pattern features detected by the two hidden units of the trained network.

4.4.1 Hidden Unit Space

The two hidden units in the network must detect stimulus properties that permit the output unit to distinguish major keys from minor keys. In order to gain some insight into what features the hidden units detect, we plotted the network’s hidden unit space. Because the multilayer perceptron uses only two hidden units, its hidden unit space can be illustrated with a two-dimensional scatterplot, as shown in Figure 4-3. (Note that in this figure more than one point falls in exactly the same location.)

Figure 4-3. The hidden unit space for the scale type network. Filled circles indicate the locations of major scale patterns in this space; unfilled circles indicate the locations of harmonic minor scale patterns. Note that there are 24 different patterns illustrated in this space, but some patterns occupy the identical position.

There are a number of regularities that are evident in Figure 4-3. First, all of the patterns that represent major scales fall near the origin of this space. This means that major key stimuli tend to produce near-zero activity in both hidden units. This further suggests that the role of each hidden unit is to detect (turn on to) some property that indicates that a stimulus is not related to a major key.

Second, all of the stimuli related to major scales appear to be highly similar to one another, because all are clustered closely together in Figure 4-3. In contrast, the stimuli related to minor scales are spread a wide distance apart. The minor scale stimuli seem to all fall roughly along a diagonal line that descends from left to right, and there also seem to be distinct clusters of different stimuli along this line. An account of the features detected by the two hidden units should explain such regularities.

Figure 4-3 does suggest how the output unit can carve the hidden unit space to separate all of the major key stimuli from all of the minor key stimuli, solving the problem. All that is required is that the two parallel cuts available to the output value unit are placed in such a way that they contain all of the patterns near the origin, and do not contain any of the minor scale patterns that are further away. For instance, this could be accomplished by aligning the value unit cuts to be parallel with the line of minor scale patterns, but translated downwards to capture all of the filled circles near the origin.

4.4.2 Hidden Unit Weights

What features do the hidden units detect to determine that a stimulus represents a minor key and not a major key? In other words, what are the two hidden units detecting that permits them to position the different stimuli in the hidden unit space of Figure 4-3? In order to explore this issue we examined the values of the weights connecting each of the input units to the hidden units.


Figure 4-4. Connection weights between input units and each hidden unit.

Figure 4-4 presents two bar plots, one illustrating the weights of the connections between the 12 input units and Hidden Unit 1, the other illustrating the same information for Hidden Unit 2. These two patterns of connectivity determine when particular input patterns will cause either hidden unit to generate high activity, representing the detection of a minor key (as revealed in the Figure 4-3 scatterplot). Interpreting these connection weights may reveal what ‘minor scale features’ are being detected by each of these hidden units.

There is a high degree of regularity in both of the graphs illustrated in Figure 4-4. Exactly half of the pitch-classes in each plot have negative connection weights, and the other half have positive connection weights. Furthermore, the overall shape of each plot is identical, but the pattern for Hidden Unit 1 is ‘phase shifted’ three pitch-classes to the left in the plot for Hidden Unit 2. Consider the pattern of bars for Hidden Unit 1 from D# onwards to the right. The identical pattern is evident for Hidden Unit 2, but only if one begins with pitch-class C instead of D#.

Our Chapter 3 account of the difference between major scales and harmonic minor scales focused on the pattern of distances between adjacent pitch-classes in the scale. Major scales were defined as a specific sequence of whole tone and semitone distances; harmonic minor scales were defined as a different sequence involving whole tone, semitone, and augmented second distances. Perhaps the connectivity patterns illustrated in Figure 4-4 capture some aspect related to adjacent pitch-class differences.

For instance, are the hidden units detecting the presence of adjacent pitch-classes that are an augmented second apart? The presence of this feature, if detected, would uniquely define a set of pitch-classes as being related to a minor key. However, examining pairs of weights coming from pitch-classes that are particular distances apart (i.e. one, two, or three semitones) does not indicate that either hidden unit is detecting this property.

For example, consider the weights for Hidden Unit 1. A and C are an augmented second apart; if both of these input units were turned on, a strong negative signal would be sent to the hidden unit. However, D# and F# are also separated by an augmented second; if both of these input units were turned on, a strong positive signal would result. There does not seem to be any property consistently true of any pair of pitch-classes that are separated by any distance central to the definition of major or minor scales.

However, if one considers relations between pitch-classes separated by a much larger musical distance, then the patterns of weights in Figure 4-4 reveal a striking property. Consider the two most extreme weights for Hidden Unit 1, the ones connecting this unit to incoming signals from A and D#. Not only are these two weights the most extreme, but they seem almost equal in magnitude, though they point in opposite directions (i.e. one weight is positive while the other is negative). These two pitch-classes are a musical interval of a tritone (six semitones) apart on a piano keyboard. If we compare other weights for this unit


coming from input pitch-classes that are a tritone apart, then we see this pattern repeated: these pairs of weights are roughly equal in magnitude but opposite in sign.

This same regularity is also evident in the Figure 4-4 plot of Hidden Unit 2 weights. For instance, the two most extreme weights (from C and from F#) are roughly equal in magnitude but point in opposite directions, and again are separated by a tritone.

Figure 4-5 presents exactly the same data that is shown in Figure 4-4, but in Figure 4-5 weights from pitch-classes that are a tritone apart are graphed directly on top of one another. Plotting the weights in this fashion produces two graphs in which the upper bars are a mirror image of the lower bars. This clearly indicates that weights from any two pitch-classes that are a tritone apart have the same magnitude, but are opposite in sign. This means that, for either hidden unit, if two pitch-classes a tritone apart are both turned on, their signals will cancel each other out, producing a net signal of zero.

Figure 4-5. Connection weights between input units and each hidden unit. The data is identical to that presented in Figure 4-4, but bars are arranged so that pitch-classes a tritone apart are stacked on top of each other in each plot.

The notion that signals from different input units cancel each other out when received by a hidden unit is of particular note given that the multilayer perceptron in Figure 4-2 uses value units for hidden units. Recall that a value unit’s Gaussian activation function is defined with one parameter, µ, and that for a value unit to generate maximum activity its net input must equal µ.

In the multilayer perceptron trained to distinguish major from harmonic minor scales, the value of µ for each hidden unit, and for the output unit, is 0. Thus for either hidden unit to generate a maximum response, the incoming signal from the input units must also equal 0. This is why the notion of two signals that cancel one another is of such import.

Specifically, the connection weight pattern evident in Figure 4-5 suggests that both hidden units detect tritone balance. That is, the ideal stimulus (i.e. the input pattern that produces a maximum response) is one in which pairs of pitch-classes a tritone apart are balanced. In such a stimulus two pitch-classes a tritone apart are in the same state: they are either both off or both on. When both pitch-classes are off, their input units send a zero signal through the two connections. When both pitch-classes are on, both send signals through the two connections. However, because the weights of the two connections are equal in magnitude but opposite in sign, their signals will again cancel out, contributing nothing to a hidden unit’s net input.
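This cancellation is easy to demonstrate numerically. The weight values below are hypothetical stand-ins (the actual trained weights appear only graphically in Figure 4-4); all that matters for the demonstration is the antisymmetry, in which the weight from pitch-class p + 6 is the negative of the weight from pitch-class p.

```python
import numpy as np

# Hypothetical hidden-unit weights obeying the tritone antisymmetry seen
# in Figure 4-5: the weight from pitch-class p+6 mirrors the weight from
# pitch-class p with opposite sign. These six values are illustrative only.
half = np.array([1.6, -0.4, 0.9, -2.0, 0.2, -1.1])   # weights for pcs 0..5
w = np.concatenate([half, -half])                     # pcs 6..11, sign-flipped

def net_input(pc_set, w):
    """Net input to a hidden unit for a binary pitch-class input pattern."""
    x = np.zeros(12)
    x[list(pc_set)] = 1.0
    return float(x @ w)

# A balanced tritone pair (both pitch-classes on) contributes nothing:
pair_net = net_input({3, 9}, w)          # pcs 3 and 9 are a tritone apart

# An imbalanced pair shifts net input away from mu = 0, pushing the
# value unit's Gaussian activity toward zero:
imbalanced_net = net_input({3}, w)
```

Because µ = 0 for these value units, any input pattern whose tritone pairs are all balanced yields a net input of 0 and hence maximum hidden unit activity.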

However, it is possible for pairs of pitch-classes a tritone apart to be imbalanced. This occurs when one pitch-class is present, but the corresponding pitch-class is absent. In this situation a definite positive or negative contribution will be added to a hidden unit’s net input. This is because when the tritone is not balanced, the signals from the two pitch-classes will not cancel out. As a result, the imbalanced tritone will shift net input away from µ, making it much more likely that a hidden unit will not respond.

Figure 4-3 indicated that the two hidden units detect features true of minor keys, and not true of major keys. Our analysis of connection weights indicates that this feature is ‘tritone balance’. How is this feature related to the musical definition of major or minor keys?


4.5 Tritone Imbalance and Major Keys

Figure 4-6. Spokes in a circle of minor seconds can be used to represent the pitch-classes that define major and minor keys. See text for details.

In Western music, there are six possible pairs of pitch-classes that are a tritone apart from each other: [A, D#], [A#, E], [B, F], [C, F#], [C#, G], and [D, G#]. In any stimulus presented to the network analyzed in the previous section, as more and more of these pairs are balanced (i.e. their pitch-classes are both in the same on or off state), the greater will be the response of either hidden unit.

Why does the network use tritone balance to distinguish harmonic minor scales from major scales? Figure 4-6 provides an answer to this question. This figure is an alternative version of Figure 3-2, and depicts the pitch-class content of the A major and the A harmonic minor scales. Figure 4-6 differs from Figure 3-2 by indicating not only which pitch-classes are present in a scale (solid spokes), but also which pitch-classes are absent from a scale (dashed spokes).

Although Figure 4-6 provides only two example scales (A major and A harmonic minor), we saw in Chapter 3 that the two spoke patterns that it depicts can represent any major or harmonic minor scale respectively. This is because one can transpose one of the depicted scales into any other musical key by rigidly rotating the spoke pattern to a new orientation within the circle.

Figure 4-6 is designed to highlight the tritone relationships between corresponding pairs of pitch-classes. Two pitch-classes that are a tritone apart are directly opposite one another in the circle of pitch-classes. As a result, one can quickly inspect Figure 4-6 for tritone balance. If a tritone pair is balanced, then the diameter through the wheel that connects its two pitch-classes will be constant in appearance. For instance, in the A major diagram, the pair [D, G#] is balanced because there is a single solid line connecting these two pitch-classes. In the A minor diagram, [D, G#] and [B, F] are both balanced with a solid line, while [C#, G] is balanced with a dashed line.

Recognizing that the two spoke diagrams in Figure 4-6 apply to any major or harmonic minor key respectively, two general properties are now apparent. First, major scales have almost no tritone balance. For any major scale there will be one and only one balanced tritone.

To relate this point to some of the set-theoretic material from Chapter 3, the Forte number for a major scale is 7-35, which has the corresponding ic vector 254361. The last digit in this ic vector indicates the presence of a single tritone (i.e. the presence of two pitch-classes a tritone apart) in this musical entity. In a major scale, the tritone is the rarest musical interval.

Second, for any harmonic minor scale, there will be three and only three balanced tritones. In set-theoretic terms, the Forte number for a harmonic minor scale is 7-32,


and is associated with the ic vector 335442. The last digit in this ic vector indicates the presence of two tritone intervals (i.e. in terms of Figure 4-6, two balanced pairs of tones present in the scale). Importantly, what the ic vector for pc set 7-32 fails to capture is the absence of two additional pitch-classes that represent the third balanced tritone (e.g. the absence of both C# and G on the right side of Figure 4-6).
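Both balanced-tritone counts, and both interval-class vectors, can be checked directly. The sketch below enumerates the six tritone pairs and the interval classes for every major and harmonic minor scale; the helper names are mine, not the chapter's.

```python
MAJOR = [0, 2, 4, 5, 7, 9, 11]    # interval pattern of a major scale
HMINOR = [0, 2, 3, 5, 7, 8, 11]   # interval pattern of a harmonic minor scale

def pcs(root, intervals):
    """Pitch-class set of the scale built on the given root."""
    return {(root + i) % 12 for i in intervals}

def balanced_tritones(s):
    """Count tritone pairs whose two pitch-classes are in the same state
    (both present in the scale, or both absent from it)."""
    return sum(1 for p in range(6) if (p in s) == ((p + 6) in s))

def ic_vector(s):
    """Interval-class vector of a pitch-class set."""
    v = [0] * 6
    notes = sorted(s)
    for i in range(len(notes)):
        for j in range(i + 1, len(notes)):
            d = (notes[j] - notes[i]) % 12
            v[min(d, 12 - d) - 1] += 1
    return v

# Balanced-tritone counts across all 12 roots of each scale type.
major_counts = {balanced_tritones(pcs(r, MAJOR)) for r in range(12)}
minor_counts = {balanced_tritones(pcs(r, HMINOR)) for r in range(12)}
```

Every major scale balances exactly one tritone pair and every harmonic minor scale exactly three, and the computed ic vectors come out as 254361 (Forte 7-35) and 335442 (Forte 7-32).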

It is not surprising that musical analyses describe regularities that are present. However, the behavior of the hidden units in the scale type network is also governed by the absence of important information. For example, when the multilayer perceptron is presented the pc set for the A harmonic minor scale, its identification of this pc set as being ‘minor’ depends crucially on the fact that both C# and G are missing. While this might be an odd way for a human musical analyst to think about musical scales, it is perfectly natural for an artificial neural network.

Of course, the mere fact that the multilayer perceptron distinguishes major from harmonic minor scales by detecting tritone balance also seems odd at first glance. A more careful examination of the literature, however, reveals the importance of tritones as sources of information about musical keys.

On the one hand, the fact that tritones can be used to define key properties is not a completely novel discovery. For instance, when theory books define tritones, they often note that the pitch-classes in a major scale will only include one tritone interval, a point made explicit in the ic vector for Forte number 7-35.

On the other hand, apart from noting its rarity in the major scale, and recommending that it be avoided because of its dissonant roughness (Piston, 1962), little more is typically said about the tritone. A detailed formal analysis of the relationship between tritones and keys is in fact a fairly modern discovery (Browne, 1981).

Richmond Browne was the first theorist to use pitch-class set theory to argue that rare musical intervals should play an important role in establishing tonality. “When one hears a tritone, or a minor second, one’s tonal ‘knowledge’ offers a greater sense of the possible ‘places one may be in’ than when one hears a relatively common interval (like a fourth or a major second) which could hold any one of a number of places in the diatonic field” (Browne, 1981, p. 7).

Browne’s (1981) theory inspired experimental investigations of the ability of human listeners to use rare intervals to identify musical tonality (Brown & Butler, 1981; Butler, 1989). Brown and Butler presented minimal musical stimuli (e.g. three notes in succession) to listeners who were required to identify their tonal centers. When these stimuli contained a rare interval like the tritone, listeners could accomplish this task, and there was high agreement in judgments between different listeners. However, task performance and inter-listener agreement declined when rare intervals were not present. Butler (1989, p. 238) explains this performance as follows: “Any tone will suffice as a perceptual anchor – a tonal center – until a better candidate defeats it. The listener makes the perceptual choice of most-plausible tonic on the basis of style-bound conventions in the time ordering of intervals that occur only rarely in the diatonic set.”

Interestingly, the multilayer perceptron deviates from Browne’s (1981) theory and the experimental evidence that supports it in two important ways. First, both theory and evidence emphasize the rareness of intervals like the tritone. Browne notes that the ic vector for Forte number 7-32 (the harmonic minor scale) is 335442. This vector reflects the fact that in this scale “the rare diatonic intervals are less rare, the common less common” (p. 8). For Browne, this means that the intervallic information available is more ambiguous, and less likely to establish tonality.

In contrast, the multilayer perceptron does not separate major from minor scales by detecting the rareness of the tritone in the former, but instead detects the commonness of the tritone in the latter. As more and more balanced tritones are detected, hidden unit activity increases. A rare tritone hardly elicits any hidden unit activity at all.


A second important difference between Browne’s (1981) theory and the methods exploited by the network is that the former emphasizes the presence of the tritone. As noted above, the network detects minor scales by noting not only that two tritones are present, but also by noting that a third tritone is absent.

The use of tritones to distinguish the mode of a scale (major vs minor), in spite of the existence of Browne’s (1981) theory, is quite atypical. A more standard algorithm is to measure the frequency of occurrence of different pitch-classes in a stimulus (Krumhansl, 1990a). Carol Krumhansl has established beyond doubt that different pitch-classes have different levels of importance, and different frequencies of occurrence, depending upon musical key. This has led to distributional algorithms that, after gathering pitch-class frequencies, match these to known distributions to determine the tonic and the mode of a musical pattern. Butler (1989) has proposed rare intervallic content as an alternative to this distributional approach, but Butler’s perspective has not gone unchallenged (Krumhansl, 1990b).

It might be supposed that the fact that the multilayer perceptron uses tritone balance to distinguish major from minor keys provides support to Butler’s (1989) position, and weighs against Krumhansl’s (1990b) rebuttal. However, this is not the case. The neural network has no access to any information concerning pitch-class distributions associated with keys, and so it is not surprising that it does not develop this type of algorithm. The fact that it quickly learns to focus on tritone balance demonstrates the viability of the positions of Butler and of Browne (1981), but does little more.

There is, however, a great deal more interesting musical structure that can be obtained by examining the internal structure of the multilayer perceptron in more detail. The next section of this chapter pursues this information by focusing on the nature of a particular geometric representation created by the multilayer perceptron, its hidden unit space.


4.5 Further Network Analysis

Now that we have interpreted the network’s hidden units as tritone balance detectors, let us return to understanding the hidden unit space of Figure 4-3. In particular, let us explore the arrangement of the 12 harmonic minor scales in this space.

4.5.1 Linear Arrangement

One of the regularities in Figure 4-3 was that the minor scales were arranged along a straight line. The second and third columns of Table 4-1 list the activity produced in each hidden unit by each of the minor scale stimuli. Taken together these columns provide the coordinates of the minor scale locations in Figure 4-3. In order to objectively confirm the linear arrangement of these points, linear regression was used to predict Hidden Unit 2 activity from Hidden Unit 1 activity. A line with a slope of -0.9913 and an intercept of 0.9036 produced a near perfect fit to all twelve points representing minor keys (R2 = 0.983). The representation of the harmonic minor scales by the two hidden units is such that it positions each scale almost perfectly along a straight line through the pattern space.

Scale  Hidden Unit 1  Hidden Unit 2  Balanced Tritone 1  Balanced Tritone 2  Balanced Tritone 3  Unbalanced      Hidden Unit 1       Hidden Unit 2
Root   Activity       Activity       (Present)           (Present)           (Absent)            Pitch-classes   Unbalanced Signal   Unbalanced Signal

G#     0              0.93           [A#, E]             [C#, G]             [C, F#]             B, D#, G#        1.58               -0.16
D      0              0.92           [A#, E]             [C#, G]             [C, F#]             A, D, F         -1.59                0.16
A      0              0.84           [B, F]              [D, G#]             [C#, G]             A, C, E         -2.03                0.22
D#     0              0.84           [B, F]              [D, G#]             [C#, G]             A#, D#, F#       2.06               -0.25
A#     0.19           0.76           [C, F#]             [A, D#]             [D, G#]             A#, C#, F        0.69                0.31
E      0.24           0.72           [C, F#]             [A, D#]             [D, G#]             B, E, G         -0.70               -0.31
C#     0.72           0.23           [A, D#]             [C, F#]             [B, F]              C#, E, G#        0.29               -0.67
G      0.76           0.22           [A, D#]             [C, F#]             [B, F]              A#, D, G        -0.32                0.70
C      0.84           0              [D, G#]             [B, F]              [A#, E]             C, D#, G         0.24                2.05
F#     0.84           0              [D, G#]             [B, F]              [A#, E]             A, C#, F#       -0.24               -2.05
B      0.92           0              [C#, G]             [A#, E]             [A, D#]             B, D, F#        -0.14               -1.55
F      0.92           0              [C#, G]             [A#, E]             [A, D#]             C, F, G#         0.18                1.57

Table 4-1. Properties of the twelve harmonic minor scales and their position in hidden unit space. Adjacent pairs of rows (e.g. G# and D) occupy nearly identical positions in the hidden unit space. See text for details.
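The regression reported in Section 4.5.1 can be reproduced from the two activity columns of Table 4-1:

```python
import numpy as np

# Hidden unit activities for the 12 harmonic minor scales (Table 4-1),
# in row order G#, D, A, D#, A#, E, C#, G, C, F#, B, F.
hu1 = np.array([0, 0, 0, 0, 0.19, 0.24, 0.72, 0.76, 0.84, 0.84, 0.92, 0.92])
hu2 = np.array([0.93, 0.92, 0.84, 0.84, 0.76, 0.72, 0.23, 0.22, 0, 0, 0, 0])

# Least-squares line predicting Hidden Unit 2 activity from Hidden Unit 1.
slope, intercept = np.polyfit(hu1, hu2, 1)
r_squared = np.corrcoef(hu1, hu2)[0, 1] ** 2
```

This reproduces the reported fit: a slope of about -0.99, an intercept of about 0.90, and an R2 of about 0.98.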

4.5.2 Scale Geometries

What can be said about the locations of individual harmonic minor scales along the line through the hidden unit space? Answering this question requires some discussion of how the network encodes the pitch-class structure of these scales. However, prior to this discussion we need to provide some brief context about using geometry to explore the relationships between musical entities.

In the 18th century Leonhard Euler proposed a two-dimensional lattice for representing relationships between musical notes. This lattice was popularized and modified in the 19th century by Hugo Riemann (Krumhansl, 2005; Tymoczko, 2012), who arranged notes in the lattice according to established notions of musical consonance. Riemann’s lattice, called the Tonnetz, arranges notes so that each note’s nearest neighbors are those notes to which it is closest in terms of musical consonance. That is, it connects together notes that are separated by musical intervals of a perfect fifth, a major third, or a minor third.

Figure 4-7 illustrates one small part of the Tonnetz, the part of the lattice centered about the note A. Its two nearest neighbors in the horizontal direction are a perfect fifth (7 semitones) below and above A on a piano keyboard. Its two nearest neighbors along one diagonal are a major third (4 semitones) below and above A on a piano. Its two


nearest neighbors along the other diagonal are a minor third (3 semitones) below and above A on a piano.
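The six Tonnetz neighbors of any note can be computed with simple modular arithmetic (pitch-classes numbered with C = 0, so A = 9); the function name is mine.

```python
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def tonnetz_neighbors(pc):
    """Neighbors a perfect fifth (7 semitones), major third (4), and
    minor third (3) above and below the given pitch-class, mod 12."""
    return {interval: (NAMES[(pc + semis) % 12], NAMES[(pc - semis) % 12])
            for interval, semis in
            [('perfect fifth', 7), ('major third', 4), ('minor third', 3)]}

neighbors_of_a = tonnetz_neighbors(9)   # the note A
```

For A this yields E and D along the fifths axis, C# and F along the major-third diagonal, and C and F# along the minor-third diagonal, matching the neighbors described for Figure 4-7.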

Figure 4-7. A small section of Riemann’s Tonnetz, centered upon the musical note A. See text for details.

The segment of the Tonnetz illustrated in Figure 4-7 can be extended in any direction in the plane by adding links from any of the terminal circles in the figure to additional notes that are the required musical intervals away. The Tonnetz is useful because geometric patterns drawn upon it can represent more complex musical entities. For instance, triangles whose vertices are connected notes represent triads, which are chords constructed from three pitch-classes. For example, the A-C-E triangle represents the A minor chord (Am), while the A-C#-E triangle represents the A major triad (A).

The Tonnetz is a geometric construction that relates individual notes. Other diagrams have been proposed to relate larger musical entities, such as chords (Hook, 2006; Tymoczko, 2006, 2011) or scales (Schoenberg, 1969).

For example, Figure 4-8 presents Schoenberg’s (1969) regions of similarity between different musical keys. This particular example is centered upon the key of A major. Direct links are made between this key and those keys to which it is most similar. In the vertical dimension, A major is linked to the two keys that are a perfect fifth above or below it. In the horizontal direction it is linked to its relative minor on the left; a major key and its relative minor share the same key signature. It is also linked to its parallel minor on the right; a major key and its parallel minor share the same root (in this case, A).

Figure 4-8. A map of key signatures about the key of A major, following the rules of Schoenberg (1969).

Why does Schoenberg use these particular relationships to determine the maximum similarity between one key and four others? The reason is that these relationships identify scales that are maximally similar to one another in terms of the number of shared pitch-classes. Consider Figure 4-8. Both the E major and the D major scales differ from the A major scale by only one pitch-class. The same is true between F# minor and A major. Interestingly, the parallel minor is slightly less similar to its major key: A minor differs from A major by two pitch-classes instead of one.

The fact that Schoenberg’s (1969) map links maximally similar keys together suggests an alternative, objective approach to creating a geometric representation of the relations amongst musical entities. First, one can define a measure of the similarity between two musical keys. A typical approach for doing this is to represent a musical key with its scale (e.g. major scale, harmonic minor scale), where the scale is given as the set of seven pitch-classes that it includes. The similarity between two keys can then be measured by counting the number of pitch-classes shared by the two scales. This measure will range from a maximum of 7 (when the two scales are identical) to a minimum of 2 (e.g. A major and A# major only share the pitch-classes A and D).


Second, one can convert this measure of similarity into one that is distance-like. In a distance-like measure of similarity, a smaller number indicates greater similarity, because the two items being compared are closer to one another. One can convert pitch-class overlap into a distance-like metric by simply subtracting the number of shared pitch-classes from 7. With this metric two identical scales will produce a minimum value of 0; a maximum distance of 5 can be obtained when only major or harmonic minor scales are compared. We computed this metric for all possible pairs of major and harmonic minor scales, producing a 24 x 24 matrix of measured distances between each pair.
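The 24 x 24 matrix of distance-like values can be built as follows (a sketch; the ordering of the 24 scales is my choice, not necessarily the one used in the original analysis):

```python
import numpy as np

MAJOR = [0, 2, 4, 5, 7, 9, 11]
HMINOR = [0, 2, 3, 5, 7, 8, 11]

def pcs(root, intervals):
    return {(root + i) % 12 for i in intervals}

# 24 scales: 12 major followed by 12 harmonic minor.
SCALES = [pcs(r, MAJOR) for r in range(12)] + [pcs(r, HMINOR) for r in range(12)]

# Distance-like measure: 7 minus the number of shared pitch-classes.
D = np.array([[7 - len(a & b) for b in SCALES] for a in SCALES])
```

The result is a symmetric matrix with zeros on the diagonal and off-diagonal entries between 1 and 5 for these seven-note scales.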

Third, once a matrix of distances has been created it can be analyzed using a statistical technique called multidimensional scaling (Kruskal & Wish, 1978; Shepard, Romney, & Nerlove, 1972). Multidimensional scaling (MDS) is a technique that is related to factor analysis. It generates a set of coordinates for each object that was used to create the distance matrix. These coordinates are such that if one measures the distances between them, then one can reconstruct the original distance matrix. In other words, MDS produces a map of objects from which the original set of distances could be derived.

The ability of MDS to capture the structure of a distance matrix is measured by the fit between the original distances and the distances taken from the derived map. The quality of the fit depends upon the number of dimensions (i.e. the number of coordinates) that MDS uses for the map that it creates. We explored two different MDS solutions for the distances between scales, one that used two dimensions, and a second that used three.
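A two-dimensional solution can be sketched with classical (metric) MDS: double-center the squared distances and take the top eigenvectors of the result. The chapter's solution may well have been computed with a different MDS variant or software, so exact coordinates and fit percentages will not match; the sketch only illustrates the procedure.

```python
import numpy as np

MAJOR = [0, 2, 4, 5, 7, 9, 11]
HMINOR = [0, 2, 3, 5, 7, 8, 11]
SCALES = [{(r + i) % 12 for i in ivls}
          for ivls in (MAJOR, HMINOR) for r in range(12)]
D = np.array([[7 - len(a & b) for b in SCALES] for a in SCALES], dtype=float)

def classical_mds(D, dims=2):
    """Classical MDS: double-center the squared distance matrix, then
    scale the top eigenvectors by the square roots of their eigenvalues."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)             # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:dims]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

coords = classical_mds(D, dims=2)              # one (x, y) point per scale
```

Plotting `coords` gives one point per scale, the kind of map shown in Figure 4-9; passing `dims=3` gives the three-dimensional analogue of Figure 4-10.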

Figure 4-9 is a plot of the two-dimensional MDS solution for the matrix of scale distances. This solution accounted for 58.1% of the variance in the original distance matrix; a better fit is generated with a higher dimensional solution, but Figure 4-9 is more easily compared with the two-dimensional structures presented above in Figures 4-7 and 4-8.

Figure 4-9. The two-dimensional MDS solution for the matrix of distances between pairs of major or harmonic minor scales. See text for details.


Figure 4-9 represents the 24 scales in a musically elegant and satisfying map. It arranges all of the major scales in a well-known musical pattern: the circle of perfect fifths. In this circle, adjacent scales are a perfect fifth apart. For instance, note that the nearest neighbors in this circle to the A major scale at the bottom are E major and D major. These two are also A’s nearest neighbors in the Tonnetz (Figure 4-7) and in Schoenberg’s map (Figure 4-8). The MDS map also arranges the minor scales around a separate circle of perfect fifths, and places this circle inside the circle of major scales.

The two different circles of perfect fifths in Figure 4-9 have different orientations; for instance, the inner circle of harmonic minor scales is oriented so that the two scales closest to A major are F# minor and B minor. F# minor is also a nearest neighbor to A major in Schoenberg’s map. The positioning of B minor in the MDS solution deviates from Schoenberg’s map, but is not surprising. First, B minor and A minor are equally distant from A major (i.e. both share 5 pitch-classes with A major). Second, MDS positions objects to provide the best fit to all of the distances in the original matrix, which in this case requires that B minor be closer to A major than is A minor, contrary to Schoenberg’s map.

Figure 4-10 presents the three-dimensional MDS solution for the scale distances matrix. The added dimension improves the fit to the data; it accounts for 69.2% of the variance in the distance matrix. The improved fit is accomplished, for instance, by pulling different sets of minor scales away from one another in the vertical direction. At the top of the cube one finds D#m, Cm, Am, and F#m; in the middle of the cube G#m, Fm, Dm, and Bm; at the bottom of the cube C#m, A#m, Gm, and Em. There is also some vertical separation between major scales, but all are generally positioned around the middle of the cube.

Figure 4-10. The three-dimensional MDS solution for the matrix of distances between pairs of major or harmonic minor scales. See text for details.


What is the relationship between the two MDS solutions in Figures 4-10 and 4-9? It appears that if one were to project the positions of the scales in Figure 4-10 down so that they all fell on the bottom plane of the cube, the result would be the two circles of perfect fifths that are apparent in Figure 4-9.

To sum up the last few paragraphs, we have considered some classic geometric representations of musical elements, the Tonnetz and Schoenberg’s (1969) map of regions, which use proximity to reflect the similarity between notes and keys respectively. We have also taken a standard notion of musical similarity, the number of shared pitch-classes, and used multidimensional scaling to convert it into maps that arrange similar musical scales close to one another. Let us now return to another geometric representation of scales, the hidden unit space of Figure 4-3, and consider it in light of these more traditional geometric constructs.

An obvious regularity of Figure 4-3 is that it clearly separates major scales from harmonic minor scales. However, this separation is quite different from the MDS solutions in Figure 4-9 and Figure 4-10. Instead of surrounding minor scales with major scales, the hidden unit space pulls minor scales away from the origin. Such a clear difference between the hidden unit space and other traditional geometric representations is what makes the hidden unit space interesting.

A second regularity in Figure 4-3 was not only that harmonic minor scales were arranged along a line through the hidden unit space (see Section 4.5.1), but that in addition there seemed to be a definite grouping of points along this line. In particular, points clustered together in pairs. This grouping is less evident in Figure 4-3 because in some cases paired points fall exactly on top of one another. However, the pairing of points is quite evident if one examines the coordinates of the minor scales in hidden unit space, coordinates that were provided earlier in Table 4-1.

Table 4-1 provides some additional information to shed light on the pairing of minor scales. It provides three columns (Balanced Tritone 1, 2, and 3) that list the three balanced tritones for each minor scale. (Note that balanced tritones 1 and 3 are pairs of pitch-classes that are both present in a stimulus, while balanced tritone 2 is a pair of pitch-classes that are both missing from the stimulus.)

Examining these three columns in Table 4-1 reveals that pairs of points with identical or near identical coordinates represent two minor scales that have three identical balanced tritones. For instance, in the hidden unit space the inputs for the F harmonic minor scale and the B harmonic minor scale are both located at position (0.92, 0). Both of these minor scales have the same balanced tritones: [C#, G], [A, D#], and [A#, E].

The row shading in Table 4-1 is used to highlight the pairing of minor scales. If two minor scales are adjacent in the table, and are given the same shading, then they are paired in the sense that they are located nearly on top of each other in the hidden unit space. Examining the scale roots (Column 1) of rows paired in this way reveals an interesting musical property that emerges from hidden unit activities: minor scales that are paired have roots that are a tritone apart. For instance, the harmonic minor scales for G# and D are both located at (0, 0.93) in the hidden unit space, and G# and D are separated by a tritone interval (i.e. they sit opposite one another in Figure 4-6).
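
This pairing can be checked directly. The sketch below is a minimal illustration under two assumptions not drawn from the network simulations themselves: pitch-classes are numbered 0 through 11 (C = 0, C# = 1, …, B = 11), and a scale is represented as the set of its pitch-classes. It computes the balanced tritones of a harmonic minor scale, confirming both the F/B example and the general rule that scales whose roots are a tritone apart share all three balanced tritones:

```python
# Assumed conventions: pitch-classes numbered 0-11 (C = 0, ..., B = 11);
# a scale is the set of its pitch-classes.

HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)  # semitones above the root

def harmonic_minor(root):
    """Pitch-class set of the harmonic minor scale built on the given root."""
    return {(root + step) % 12 for step in HARMONIC_MINOR_STEPS}

def balanced_tritones(scale):
    """Tritone pairs whose two pitch-classes are both present or both absent."""
    return {frozenset({pc, pc + 6})
            for pc in range(6)
            if (pc in scale) == ((pc + 6) in scale)}

# F minor (root 5) and B minor (root 11) share the same three balanced
# tritones: [C#, G] = {1, 7}, [A, D#] = {9, 3}, and [A#, E] = {10, 4}.
assert balanced_tritones(harmonic_minor(5)) \
       == balanced_tritones(harmonic_minor(11)) \
       == {frozenset({1, 7}), frozenset({3, 9}), frozenset({4, 10})}

# The pairing is fully general: transposing any scale by a tritone leaves
# its balanced-tritone structure unchanged.
assert all(balanced_tritones(harmonic_minor(r))
           == balanced_tritones(harmonic_minor((r + 6) % 12))
           for r in range(12))
```

The second assertion holds for any scale, not just harmonic minors: the tritone pair {pc, pc + 6} is unchanged when both of its members are shifted by six semitones, so transposition by a tritone preserves which pairs are balanced.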

The proximity relations amongst minor scales in the hidden unit space are based on a definite musical property (shared balanced tritones), and produce a very regular pairing of scales. However, the balanced tritones property is an atypical musical regularity. As a result, the proximity relations in the hidden unit space are markedly different from those in more traditional spaces like Figures 4-7, 4-8, 4-9, and 4-10.

Consider the following set of harmonic minor scales: G#m, Dm, Bm, and Fm. In Figure 4-10, these four scales are fairly close to one another, forming the four vertices of a square hovering in the middle of the cube. Table 4-2 provides the distances between each pair of these scales, distances that are reflected in either MDS solution.


        G#m   Dm   Bm   Fm
G#m      0
Dm       3     0
Bm       2     2    0
Fm       2     2    3    0

Table 4-2. Distances between four specific harmonic minor scales based on shared pitch-classes.

In the Table 4-2 matrix, any pair of scales separated by a distance of 2 have roots that are a minor third (three semitones) apart, and any pair separated by a distance of 3 have roots that are a tritone (six semitones) apart. As a result, G#m is more similar to Bm and Fm than it is to Dm. This is reflected in both of the MDS solutions as well.
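
The distances in Table 4-2 follow from a simple computation. As a rough sketch (assuming the same pitch-class set representation as above, with C = 0 through B = 11), the distance between two seven-note scales is the number of pitch-classes of one scale that the other lacks:

```python
# Assumed conventions: pitch-classes numbered 0-11; a scale is a set.
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def harmonic_minor(root):
    """Pitch-class set of the harmonic minor scale on the given root."""
    return {(root + step) % 12 for step in HARMONIC_MINOR_STEPS}

def scale_distance(a, b):
    """Number of pitch-classes in scale a that scale b does not contain."""
    return len(a - b)

roots = {"G#m": 8, "Dm": 2, "Bm": 11, "Fm": 5}
scales = {name: harmonic_minor(r) for name, r in roots.items()}

# Reproduces Table 4-2: tritone-related roots give distance 3,
# minor-third-related roots give distance 2.
assert scale_distance(scales["G#m"], scales["Dm"]) == 3
assert scale_distance(scales["Bm"], scales["Fm"]) == 3
assert scale_distance(scales["G#m"], scales["Bm"]) == 2
assert scale_distance(scales["Dm"], scales["Fm"]) == 2
```

Since every scale here has seven pitch-classes, this distance equals 7 minus the number of shared pitch-classes, matching the shared-pitch-class similarity used throughout this section.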

However, the relationships between these four harmonic minor scales are quite different in the hidden unit space. First, rather than being further apart, scales whose roots are a tritone apart have nearly identical locations in Figure 4-3. Second, rather than being closest to scales a minor third away, scales that differ by this amount are very far apart in the hidden unit space. In particular, the points for G#m and Dm are the two that are furthest away from the points for Fm and Bm in Figure 4-3. Table 4-1 shows the coordinates for the latter two scales are the reflection of the coordinates of the former ((0.92, 0) vs. (0, 0.92)).

The long distance between these two pairs of points is somewhat surprising. An examination of these four scales in Table 4-1 reveals that they all share two balanced tritones ([C#, G] and [A#, E]). Why are different scales with similar balanced tritone structure so far apart in Figure 4-3? The three pitch-classes that are not part of a balanced tritone must be responsible.

When a harmonic minor scale is presented to our multilayer perceptron, the three unbalanced pitch-classes are in essence the only source of net input to either hidden unit. This is because all other pitch-classes are balanced, and therefore contribute near zero net input. The final three columns of Table 4-1 provide information to be used to consider the effects of unbalanced pitch-classes on a scale’s position in the hidden unit space.
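
This cancellation can be made concrete with a small sketch. The weight values below are hypothetical (they are not taken from the trained network); the one structural assumption, suggested by the interpretation in the text, is that a hidden unit's connection weights for tritone-paired pitch-classes are equal in magnitude and opposite in sign. Under that assumption, every balanced tritone cancels, and the net input reduces to the contribution of the three unbalanced pitch-classes:

```python
# Assumed conventions: pitch-classes numbered 0-11; a scale is a set.
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def harmonic_minor(root):
    return {(root + step) % 12 for step in HARMONIC_MINOR_STEPS}

# Hypothetical weights chosen so that w[pc + 6] = -w[pc] for every
# tritone pair (illustrative values only, not the network's weights).
half = [0.9, -0.4, 0.7, 0.2, -0.8, 0.5]
w = half + [-x for x in half]

def net_input(scale):
    """Net input to the hidden unit from a presented scale."""
    return sum(w[pc] for pc in scale)

def unbalanced(scale):
    """Pitch-classes in the scale whose tritone partner is absent."""
    return {pc for pc in scale if ((pc + 6) % 12) not in scale}

fm = harmonic_minor(5)                 # F harmonic minor
assert unbalanced(fm) == {5, 8, 0}     # F, G#, and C, as in Table 4-1
assert abs(net_input(fm) - sum(w[pc] for pc in unbalanced(fm))) < 1e-9
```

The final assertion holds for any input scale: a balanced tritone contributes either w[pc] + w[pc + 6] = 0 (both pitch-classes present) or nothing at all (both absent), so only the unbalanced pitch-classes remain.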

The first of these three columns simply lists, for each harmonic minor scale, the three pitch-classes that are unbalanced. The remaining two columns provide the net input to each hidden unit that is due only to the three unbalanced pitch-classes, and reveal some interesting properties concerning the spatial arrangement of the hidden unit space.

First, consider two scales that have identical balanced tritones (e.g. Fm and Bm). These two scales differ from one another in terms of their three unbalanced pitch-classes: for Fm these are F, G#, and C, and for Bm these are B, D, and F#. Look in Table 4-1 at the net inputs that each of these unbalanced sets produces in each hidden unit. One of these sets produces net inputs that are essentially equal in magnitude, but opposite in sign, to the net inputs produced by the other set. Remember that the symmetric form of the Gaussian activation function means that it in essence ignores the sign of net inputs. Thus, while these two different sets of unbalanced pitch-classes send different net inputs to the hidden units, the hidden units generate the same activity to either set, so the two patterns wind up in the same position in the hidden unit space. This account holds for any of the paired scales in Table 4-1.

Now consider a different pair of scales, G#m and Dm, whose balanced tritone structure is similar to that of Fm and Bm. The unbalanced pitch-classes for G#m and Dm produce the same magnitude of net input (ignoring sign) to the hidden units as do the unbalanced pitch-classes for Fm and Bm. However, they send this magnitude to the opposite hidden units! As a result, the position of these two scales is reflected in the hidden unit space (the coordinates (a, b) of Fm and Bm become the coordinates (b, a) of G#m and Dm).
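
Both of these cases hinge on the symmetry of the Gaussian activation function. The sketch below uses the value-unit activation G(net) = exp(−π(net − μ)²) associated with Dawson and Schopflocher (1992); the specific net input values are hypothetical, chosen only to illustrate the two cases:

```python
import math

def gaussian(net, mu=0.0):
    """Value-unit activation: symmetric about mu, so with mu = 0 the sign
    of the net input is ignored."""
    return math.exp(-math.pi * (net - mu) ** 2)

# Hypothetical net inputs for one scale's unbalanced pitch-classes.
net_h1, net_h2 = 1.3, -0.2

# Case 1 (e.g. Fm vs. Bm): equal-magnitude, opposite-sign net inputs to
# the SAME hidden units produce identical activities, so the two scales
# land on the same point in the hidden unit space.
point_a = (gaussian(net_h1), gaussian(net_h2))
point_b = (gaussian(-net_h1), gaussian(-net_h2))   # signs flipped
assert point_a == point_b

# Case 2 (e.g. Fm vs. G#m): the same magnitudes sent to the OPPOSITE
# hidden units produce a reflected position: (a, b) becomes (b, a).
swapped = (gaussian(net_h2), gaussian(net_h1))
assert swapped == (point_a[1], point_a[0])
```
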

This analysis leads to two general musical statements about the positioning of harmonic minor scales along the line through hidden unit space. First, two scales whose roots are a tritone apart will have the same position on this line. Second, two scales whose roots are a minor third apart will fall at positions reflected across the center of this line.

The discussion of the geometry of Figure 4-3 in this section leads to two important notions. First, while the proximity relations in Figure 4-3 (made clear in the Table 4-1 coordinates) are quite different than traditional ones, they are based upon musical structure. The atypical arrangement of scales in the hidden unit space is a consequence of the multilayer perceptron detecting a particular musical feature, balanced tritones. On the basis of this musical regularity, scales that typically are viewed as being distantly related become highly similar, and scales that are typically viewed as being highly similar become distantly related. In short, paying attention to the hidden unit space can reveal novel properties that are still completely consistent with formal music theory.

Second, and as a consequence of the first, the novel features exploited by an artificial neural network may point the way to new approaches to composing. One fundamental principle of composition is modulation, in which a rational structure is used to change from one musical key to another midway through a piece. The map of regions of keys (Schoenberg, 1969) introduced in Figure 4-8 is a guide to such modulation. Keys that are a perfect fifth away from the current key, or the relative minor of the current key, share a majority of notes with the current key. As a result, one can move from the current key to any of these three new keys without musical disruption. In the 19th century composers also took advantage of the slightly more distant relationship between a major key and its parallel minor to accomplish another kind of key modulation (Piston, 1962). This too is captured in Schoenberg’s map of regions.

The proximity relationships between minor scales in the hidden unit space suggest an alternative approach to modulating between minor keys. In particular, the hidden unit space suggests that one should be able to modulate between one minor key and another that is a full tritone away, not because of shared pitch-classes, but instead because of identical balanced tritones.

Furthermore, the hidden unit space suggests the possibility of successful modulation to keys other distances away, because these scales are close to one another in the hidden unit space. For instance, G#m and Dm are closest to D#m and Am in the space. Modulating from G#m to D#m is part of common practice (these two keys are a perfect fifth apart), as is modulating from Dm to Am (these two keys are a perfect fourth apart). However, the same geometry recommends less typical modulations a minor second apart: from G#m to Am, or from Dm to D#m. The hidden unit space is clearly a source of compositional ideas that are worthy of exploration.

4.5.3 Chord Progressions

The previous section made use of the fact that each harmonic minor scale includes three unbalanced pitch-classes. It was shown that the contribution of these pitch-classes to hidden unit net input explains the distribution of minor scales along a line that crosses the hidden unit space of Figure 4-3.

Another reason that the unbalanced pitch-classes are important is that they offer an elegant and simple musical interpretation: they are the three pitch-classes that define the minor triad of the harmonic scale. However, the positioning of these minor triads in the hidden unit space is again surprising, for reasons similar to those explored in Section 4.5.2.

In this section, we will explore the relations among the minor triads to consider, from a slightly different perspective than above, how the hidden unit space differs from traditional geometric representations of music. To do so, let us first introduce some basic properties of minor triads, chord progressions, and voice leading.


Figure 4-11. Triads built upon each pitch-class in the C major scale. Each triad only contains pitch-classes that belong to this scale. Note the use of lower-case Roman numerals to reflect the naming of three minor and one diminished triad in this set.

A minor triad is a chord (a combination of notes) that is created from three pitch-classes taken from a minor scale. A minor triad is typically associated with the root (the first note) of a minor scale. The root of the scale is one note in the minor triad. The second note in the triad is the third note of the minor scale, which is a minor third (three semitones) higher than the root. The third note in the triad is the fifth note of the minor scale, which is a perfect fifth (seven semitones) higher than the root. For example, the harmonic minor scale of A has a minor triad (named Am) constructed from the pitch-classes A, C, and E.

Minor triads, and other chords, are important in Western music because they establish tonality. One can build triads upon any pitch-class that belongs to a major scale, as is shown in Figure 4-11. This is accomplished by taking a scale pitch-class as root, and then by adding two additional pitch-classes that also belong to the scale. These added pitch-classes ‘skip’ over available pitch-classes. For instance, the first chord in Figure 4-11, the C major triad, uses the first, third, and fifth pitch-classes of the C major scale. The second chord in Figure 4-11, the D minor triad, uses the second, fourth, and sixth pitch-classes of the C major scale. This pattern is continued for the remaining triads in Figure 4-11.
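
The ‘skip’ construction can be sketched in a few lines. Assuming the usual pitch-class numbering (C = 0 through B = 11), the code below builds a triad on each degree of a major scale and classifies its quality from the two stacked thirds:

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)   # semitones above the tonic

# A triad's quality follows from its stacked thirds, in semitones:
# (major third, minor third) = major triad, and so on.
QUALITY = {(4, 3): "major", (3, 4): "minor", (3, 3): "diminished"}

def diatonic_triads(tonic=0):
    """Build a triad on each scale degree from degrees i, i+2, i+4
    of the major scale, skipping over the intervening pitch-classes."""
    scale = [(tonic + step) % 12 for step in MAJOR_STEPS]
    triads = []
    for i in range(7):
        root, third, fifth = (scale[(i + j) % 7] for j in (0, 2, 4))
        lower = (third - root) % 12
        upper = (fifth - third) % 12
        triads.append((root, QUALITY[(lower, upper)]))
    return triads

# The key of C major yields the triad qualities of Figure 4-11:
# C, Dm, Em, F, G, Am, and the diminished triad on B.
assert [q for _, q in diatonic_triads(0)] == [
    "major", "minor", "minor", "major", "major", "minor", "diminished"]
```

The same function applied to any other tonic produces the same sequence of qualities, which is why the Roman-numeral labels of Figure 4-11 carry over from key to key.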

Once triads have been constructed in this fashion it becomes clear that different types of triads belong to one musical key. For instance, Figure 4-11 shows that the key of C major is associated with three major triads (C, F, and G), three minor triads (Dm, Em, and Am) and one diminished triad (Bº). If one only uses these triads in a composition, then this establishes its tonality (i.e. the presence of these chords implies that a composition is written in the key of C major, and not in some other key).

Tonality is not merely established by the presence of harmonic elements like triads or other chords. These harmonic elements are also presented in a particular order. A sequence of chords is called a progression; jazz musicians call a chord progression the changes.

A harmonic progression specifies the order of chord presentation. In choral music, different singers provide the different notes of a chord; for one of these performers the changes involve singing one note that is part of one chord, and then proceeding to sing a different note that is part of the next chord. For a composer, the movement of one voice from one chord part to the next is called voice leading (Piston, 1962). In other words, voice leading maps specific notes between one chord and the next, and therefore complements the notion of a chord progression, which defines transformations between sets of notes (Tymoczko, 2008).

Chord progressions are central to Western music, and are particularly evident in popular music. For instance, one popular chord sequence in the 1950s was the doo-wop progression, which presented chords in the order I – vi – IV – V (where the Roman numerals are associated with the triad labels in Figure 4-11). This progression is central to Ben E. King’s “Stand By Me” and Gene Chandler’s “Duke Of Earl”. A modern variation of this progression changes the order of these chords to vi – IV – I – V, which provides the foundation for the chorus of Sarah McLachlan’s “Building a Mystery”.
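
Spelling out a Roman-numeral progression is then just a matter of selecting triads by scale degree. A minimal sketch, with the same pitch-class conventions as above (the note names are for readability only):

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def degree_triad(tonic, degree):
    """Triad built on a 1-based scale degree of the major scale on tonic,
    stacking scale thirds (degrees d, d+2, d+4)."""
    scale = [(tonic + step) % 12 for step in MAJOR_STEPS]
    return tuple(scale[(degree - 1 + j) % 7] for j in (0, 2, 4))

# The doo-wop progression I - vi - IV - V in the key of C major
# comes out as the triads C, Am, F, and G.
doo_wop = [degree_triad(0, d) for d in (1, 6, 4, 5)]
assert [NAMES[t[0]] for t in doo_wop] == ["C", "A", "F", "G"]
```
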

For any musical key, huge numbers of different chord progressions are possible, particularly when chords composed of more than three notes are added to the compositional toolbox. However, only a small subset of these possible progressions is popular or prevalent.


One reason for the prevalence of a particular progression is the relationship between the sonority of one chord and that of the next chord in a progression. For instance, one chord in a progression may produce a palpable tension in the listener, tension that is resolved when the next chord is played. For example, the leading-tone triad associated with the seventh degree of a major scale (viiº) produces tension that is resolved by, and therefore leads into, the tonic triad (I) (Piston, 1962). Another way of describing this relationship is that the first chord establishes an expectation in the listener about the next chord to be heard (Huron, 2006; Meyer, 1956; Temperley, 2007). Progressions that manipulate tension or expectations in this way are preferable.

There is a well-established practical understanding of the principles of harmonic progression or voice leading (Piston, 1962; Schoenberg, 1969). For instance, Schoenberg (1969, p. 4) advises that “when connecting chords it is advisable that each of the four voices (soprano, alto, tenor and bass, generally used to present harmonic successions) should move no more than necessary.” Piston discusses two general procedures for voice leading designed “to insure the smoothest possible connection of two chords, so that one seems to flow into the next” (p. 21).

Several researchers have been interested in using mathematics to provide a more rigorous formulation of the principles of voice leading. Some of this work has explored transformations between the pc sets that were introduced in Chapter 3 (Lewin, 1998; Morris, 1998; Straus, 2005). Some of this work attempts to ground principles of voice leading in the organizational principles of auditory perception (Huron, 2001; Krumhansl, 1998).

Other approaches to formalizing principles of harmonic progression and voice leading are geometric in nature (Brower, 2008; Douthett & Steinbach, 1998; Gollin, 1998; Hook, 2002, 2006; Krumhansl, 1990a, 1998, 2005; Tymoczko, 2006, 2008, 2011, 2012). That is, individual chords are represented as points in a musical space, or as nodes in a Tonnetz-like lattice. The positions of different chords in this space are such that a transition from one chord to a neighboring chord represents efficient voice leading: that is, a least-energy transformation from the first chord to the second. In general, efficient voice leading minimizes the distance (in a musical space) between one chord and the next (Tymoczko, 2006, 2008, 2011). This formalizes Piston’s (1962) techniques for maximizing the smoothness of the transition from one chord to another.

If efficient voice leading is a minimum distance movement from one chord to the next in a musical space, then a critical property is the nature of the space in which chords are represented. That is, this space must be structured so that chords that are near one another offer smooth transitions for a harmonic progression. Chords that do not offer smooth transitions will be further apart in the musical space.
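
One simple way to make ‘minimum distance’ concrete, in the spirit of Tymoczko’s (2006) formalization (the metric below is a simplified sketch, not his full model): measure the total semitone motion over the best one-to-one assignment of one chord’s notes to the next chord’s notes:

```python
from itertools import permutations

def pc_distance(a, b):
    """Shortest distance between two pitch-classes on the 12-note circle."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def voice_leading_distance(chord_a, chord_b):
    """Smallest total semitone motion over all one-to-one assignments of
    the notes of chord_a to the notes of chord_b (brute force)."""
    return min(sum(pc_distance(x, y) for x, y in zip(chord_a, perm))
               for perm in permutations(chord_b))

# C major (C, E, G) moves smoothly to A minor (A, C, E): two voices hold
# their notes, and the third moves only two semitones (G to A).
assert voice_leading_distance((0, 4, 7), (9, 0, 4)) == 2
```

Chords separated by a small value of this measure sit close together in such a space; chords requiring more total motion sit further apart.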

One such spatial representation is an elaboration of the Tonnetz that was introduced in Figure 4-7 (Hook, 2006). Recall that triangular shapes connecting Tonnetz nodes define major or minor triads. Hook superimposes a second Tonnetz upon the note-based Tonnetz. In Hook’s second Tonnetz each node represents a triad constructed from the triangle below. Connections are now created between triad nodes and their immediate neighbors. These connections list possible changes from one triad to another.

There are three different kinds of connections emanating from a node, each representing a different type of transformation from one triad to another. The nature of these transformations is not relevant to the point to be made below. What is important is that in this system each direct link from one node to another represents an efficient voice leading. This is because each connection links two triads that share two notes, even though they are opposite in mode (i.e. minor vs. major). In other words, Hook’s (2006) representation is similar to the scale representations that we considered earlier, because nodes that are closer together in his lattice are similar in the sense that they share pitch-classes. Figure 4-12 gives a sense of the structure of this triad Tonnetz, emphasizing the positions of minor triads.


Figure 4-12. The structure of Hook’s (2006) Tonnetz for triads. This illustration highlights the positions of minor triads in the Tonnetz; each solid grey circle represents the position of a different major triad.

The distance relationships amongst the minor triads in Figure 4-12 are reminiscent of the distance relationships amongst minor keys in the three-dimensional MDS solution that was presented in Figure 4-10. In that figure there were three different squares, each containing four different harmonic minor scales whose roots were either a minor third or a tritone apart (e.g. Table 4-2). In Figure 4-12 each of these squares is transformed into a horizontal row of minor triads.

The key property of triads within a row in Figure 4-12 is that nearest neighbors are related by a short musical interval (a minor third); the nearest neighbors of Bm are G#m and Dm. Furthermore, triads that are related by a longer musical interval are not nearest neighbors. For instance, Bm and Fm are further apart in the triad Tonnetz.

We are now in a position to contrast the spatial arrangement of triads in Figure 4-12 with the hidden unit space of Figure 4-3. That hidden unit space obviously represents the positions of musical scales. However, for harmonic minor scales we saw in Section 4.5.2 that their position in the hidden unit space was dictated by the three unbalanced pitch-classes. But the three unbalanced pitch-classes for a harmonic minor scale define the scale’s minor triad. This means that the positions of harmonic minor scales in the hidden unit space can also be interpreted as representing the positions of minor triads. In other words, the proximity relations amongst minor scales in Figure 4-3 can be compared to the proximity relations amongst minor triads in Figure 4-12.

The proximity relations amongst minor triads in the hidden unit space are quite different than those in the triad Tonnetz. In particular, minor triads that are further apart in Figure 4-12 occupy identical locations in the hidden unit space (e.g. Bm and Fm). In addition, nearest neighbors in the triad Tonnetz are quite far apart in the hidden unit space. For instance, G#m and Dm, the two triads closest to Bm in Hook’s space, are the furthest away from Bm in the hidden unit space. Of course, the fact that the hidden unit space arranges minor triads in a line, while Hook’s Tonnetz arranges them in a grid, is another crucial difference between the two.

None of these differences between the two spaces should be particularly surprising, because they mirror the differences that we have already explored between the hidden unit space and the various geometric representations of scales and keys that were detailed in Section 4.5.2. The reasons for these differences remain the same as well: hidden unit space locations reflect tritone structure, while Hook’s (2006) triad Tonnetz bases proximity on shared pitch-classes.

If the hidden unit space can be interpreted as establishing certain proximity relations amongst minor triads, and these differ from those in traditional geometric models of harmonic progression or voice leading, then what does the hidden unit space imply about chord changes?

The key feature of the hidden unit space is that two minor triads that are a tritone apart have the same location, and therefore are equivalent. In terms of progressions, this suggests that one can easily change from one triad to another that is a tritone away. Interestingly, this implication of the hidden unit space parallels the tritone’s relevance to jazz.

One common technique to add chord variety to jazz changes is to use chord substitutions. In this practice one chord in the changes is replaced by another chord that is musically related to it. For example, in tritone substitution a dominant 7th chord in one key is replaced with the dominant 7th chord from a key that is a tritone away. This is possible because both dominant 7th chords contain the same two notes a tritone apart, making the original changes harmonically similar to the changes created by the tritone substitution (Tymoczko, 2008).

Figure 4-13 illustrates tritone substitution for one of the most popular jazz changes, the ii-V-I progression (Levine, 1989). (This particular example uses tetrachords instead of triads to set the stage for later chapters.) The example presents the three chords for this progression in the key of C major. The first two bars provide the standard chords. The dominant 7th chord in this first sequence (G7) contains two pitch-classes a tritone apart (B and F). The last two bars provide the same progression, but with a tritone substitution. The G7 chord from the original progression has been replaced with D♭7, the dominant 7th chord from the G♭ major scale, a scale whose root is a tritone away from the root of C major. D♭7 also contains B and F, the same tritone as that contained by G7 (in Figure 4-13 the B is written as the enharmonically equivalent C♭ to be consistent with notation for the G♭ major scale).

Tritone substitution is possible (i.e. sounds musically correct) in jazz because the two dominant 7th chords contain exactly the same tritone, and the two keys a tritone apart share enough pitch-classes to make them harmonically similar (Tymoczko, 2008). In certain respects the multilayer perceptron is detecting analogous structure in harmonic minor scales, organizing them in such a way that the nearest neighbor to a minor scale in the hidden unit space will be a scale with identical underlying tritone structure. Interpreting these scale positions as the positions of minor triads reveals that the hidden unit space of the multilayer perceptron may have a special affinity for tritone substitution!
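
The shared-tritone fact behind the substitution is easy to verify. A brief sketch, treating a dominant 7th chord on a root as the pitch-class set {root, major third, perfect fifth, minor seventh}, with C = 0 through B = 11:

```python
def dominant7(root):
    """Pitch-class set of the dominant 7th chord on the given root."""
    return {(root + step) % 12 for step in (0, 4, 7, 10)}

g7 = dominant7(7)    # G7:  G, B, D, F
db7 = dominant7(1)   # Db7: Db, F, Ab, Cb (enharmonically B)

shared = g7 & db7
assert shared == {5, 11}     # the shared pitch-classes are F and B...
assert (11 - 5) % 12 == 6    # ...which lie a tritone apart
```

The same calculation for any pair of roots a tritone apart yields the same result, because the third of one dominant 7th chord is the seventh of the other.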

Figure 4-13. Tritone substitution in the ii-V-I progression. See text for details.


4.6 References

Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299-307.

Anderson, J. A. (1995). An Introduction to Neural Networks. Cambridge, Mass.: MIT Press.

Brower, C. (2008). Paradoxes of pitch space. Music Analysis, 27(1), 51-106. doi: 10.1111/j.1468-2249.2008.00268.x

Brown, H., & Butler, D. (1981). Diatonic trichords as minimal tonal cue-cells. In Theory Only, 5(6 & 7), 37-55.

Browne, R. (1981). Tonal implications of the diatonic set. In Theory Only, 5(6 & 7), 3-21.

Butler, D. (1989). Describing the perception of tonality in music: A critique of the tonal hierarchy theory and a proposal for a theory of intervallic rivalry. Music Perception, 6(3), 219-242.

Dawson, M. R. W. (2004). Minds And Machines: Connectionism And Psychological Modeling. Malden, MA: Blackwell Pub.

Dawson, M. R. W. (2005). Connectionism : A Hands-on Approach (1st ed.). Oxford, UK ; Malden, MA: Blackwell Pub.

Dawson, M. R. W. (2013). Mind, Body, World: Foundations Of Cognitive Science. Edmonton, AB: Athabasca University Press.

Dawson, M. R. W., & Schopflocher, D. P. (1992). Modifying the generalized delta rule to train networks of nonmonotonic processors for pattern classification. Connection Science, 4, 19-31.

Douthett, J., & Steinbach, P. (1998). Parsimonious graphs: A study in parsimony, contextual transformations, and modes of limited transposition. Journal of Music Theory, 42(2), 241-263. doi: 10.2307/843877

Gollin, E. (1998). Some aspects of three-dimensional Tonnetze. Journal of Music Theory, 42(2), 195-206. doi: 10.2307/843873

Hook, J. L. (2002). Hearing with our eyes: The geometry of tonal space. Paper presented at the Bridges: Mathematical Connections in Art, Music, and Science, Towson, Maryland.

Hook, J. L. (2006). Exploring musical space. Science, 313(5783), 49-50. doi: 10.1126/science.1129300

Huron, D. B. (2001). Tone and voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception, 19(1), 1-64. doi: 10.1525/mp.2001.19.1.1

Huron, D. B. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, Mass.: MIT Press.

Krumhansl, C. L. (1990a). Cognitive Foundations Of Musical Pitch. New York: Oxford University Press.

Krumhansl, C. L. (1990b). Tonal hierarchies and rare intervals in music cognition. Music Perception, 7(3), 309-324.

Krumhansl, C. L. (1998). Perceived triad distance: Evidence supporting the psychological reality of Neo-Riemannian transformations. Journal of Music Theory, 42(2), 265-281. doi: 10.2307/843878

Krumhansl, C. L. (2005). The geometry of musical structure: A brief introduction and history. ACM Computers In Entertainment, 3(4), 1-14.

Kruskal, J. B., & Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Levine, M. (1989). The Jazz Piano Book. Petaluma, CA: Sher Music Co.

Lewin, D. (1998). Some ideas about voice-leading between Pcsets. Journal of Music Theory, 42(1), 15-72. doi: 10.2307/843852

Lippmann, R. P. (1989). Pattern classification using neural networks. IEEE Communications Magazine, November, 47-64.

McClelland, J. L., & Rumelhart, D. E. (1986). Parallel Distributed Processing, V.2. Cambridge, MA: MIT Press.


Meyer, L. B. (1956). Emotion and Meaning in Music. [Chicago]: University of Chicago Press.

Morris, R. D. (1998). Voice-leading spaces (Networks of pitch-classes that model voice-leading). Music Theory Spectrum, 20(2), 175-208. doi: 10.1525/mts.1998.20.2.02a00010

Piston, W. (1962). Harmony (3rd ed.). New York: W. W. Norton.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature, 323, 533-536.

Rumelhart, D. E., & McClelland, J. L. (1986). Parallel Distributed Processing, V.1. Cambridge, MA: MIT Press.

Schoenberg, A. (1969). Structural Functions Of Harmony (Rev. ed.). New York: W. W. Norton.

Shepard, R. N., Romney, A. K., & Nerlove, S. B. (1972). Multidimensional Scaling: Theory And Applications In The Behavioral Sciences. Volume I: Theory. New York, NY: Seminar Press.

Straus, J. N. (2005). Voice leading in set-class space. Journal of Music Theory, 49(1), 45-108. doi: 10.1215/00222909-2007-002

Temperley, D. (2007). Music and Probability. Cambridge, Mass.: MIT Press.

Tymoczko, D. (2006). The geometry of musical chords. Science, 313(5783), 72-74.

Tymoczko, D. (2008). Scale theory, serial theory and voice leading. Music Analysis, 27(1), 1-49. doi: 10.1111/j.1468-2249.2008.00257.x

Tymoczko, D. (2011). A Geometry Of Music: Harmony And Counterpoint In The Extended Common Practice (E-pub ed.). New York: Oxford University Press.

Tymoczko, D. (2012). The generalized Tonnetz. Journal of Music Theory, 56(1), 1-52. doi: 10.1215/00222909-1546958

Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: Wiley.