TOWARDS A HYBRID ASSESSMENT MODEL FOR MUSIC CONSERVATORY ENTRANCE EXAMS

Ozan Baysal¹, Barış Bozkurt², Turan Sağer³, Nilgün Doğrusöz¹

¹ Istanbul Technical University, Turkey

² Universitat Pompeu Fabra, Barcelona, Spain

³ Yıldız Technical University, Turkey

Abstract

This paper discusses the need to employ Music Information Retrieval (MIR) technologies in music conservatory entrance examinations. In Turkey, acceptance to a music conservatory is determined through a musical aptitude examination that is usually conducted by a jury committee. While the content of this exam has become standard (mostly questions on pitch recognition and melody/rhythm repetition), factors such as the amount of time and energy devoted to the exam, differences in assessment criteria between jury members, and the use of a limited set of manually constructed question packages (to prevent the exam from leaking) present shortcomings for a standardized evaluation of applicants. Although a good deal of research has been carried out on this issue, these studies investigate solely the reliability scores of jury committees, without analyzing the applicants' performance recordings and comparing them with the jury scores. Our talk will present the findings of a research project that compares jury scores with performance recordings. At the end we propose a hybrid assessment model, MAST (Musical Aptitude Standard Test), which we believe would significantly contribute to the quality of measurement and evaluation while consuming fewer resources in music conservatory entrance exams.

Keywords: Conservatory Entrance Exams, Musical Aptitude Tests, Musical Competence, MAST, Musical Aptitude Standard Test, Music Performance Assessment

Introduction

A musical aptitude examination is a general requirement when applying to a music conservatory. Such examinations aim to test and measure the musical proficiency of an applicant, and there are various approaches to measuring musical competence. This paper presents the potential benefits of employing Music Information Retrieval (MIR) technologies in music conservatory entrance examinations. In the first part, a brief overview of the two main approaches to measuring musical proficiency is presented: (i) the standardized test format, and (ii) jury committee evaluations. Both approaches have their own advantages and shortcomings, but in Turkey it is usually jury committee evaluations that are preferred in music conservatory exams. The content of this kind of examination has become standard (mostly questions on pitch recognition and melody/rhythm repetition), yet since it involves a jury committee, it may present shortcomings for a standardized evaluation of applicants. Although a good deal of research has been carried out on this issue, these studies investigate solely the reliability scores of jury committees, without analyzing the applicants' performance recordings through music/sound technologies and comparing them with jury scores. The second part of the paper presents the findings of a research project that compares the jury scores with analyses of the performance recordings obtained via sound engineering tools. This part reveals the existence of different assessment criteria between jury members. In addition, it demonstrates the problem of using a limited set of question packages for different applicants (to prevent the exam from leaking). The scope of this essay is thus limited to the need for new technological tools as an aid to jury committees. At the end we propose a hybrid assessment model, MAST (Musical Aptitude Standard Test), which we believe would significantly contribute to the quality of measurement and evaluation while consuming less time and energy in the musical hearing portion of the conservatory entrance exam. Our goal is to present supporting and practical mechanisms in order to make the exams as efficient as possible.

Musical Aptitude Tests & Music Conservatory School Examinations

The methods of measuring musical proficiency in a music conservatory examination can be categorized under two main headings:

(i) Standardized tests that are used to determine various dimensions of aural ability,

(ii) Audition processes in which abilities in musical perception and musical expression are evaluated by a jury commission.

Standardized Tests

Standardized tests are designed to measure aural abilities in the perception of various musical elements. These exams are in multiple-choice format: the applicants listen to sounds played through speakers (or headphones) and are expected to answer questions regarding abstracted musical elements, such as volume, dynamics, musical interval, timbre, texture, tempo, rhythm, melody and harmony, by making certain comparisons and discriminations. The best-known examples of this method are the Seashore test, the Wing test, the Bentley test and the Gordon tests (including Gordon MAP, Gordon PMMA and Gordon IMMA). In his chapter on musical aptitude tests, Tarman gives a detailed discussion of these test designs (Tarman, 2016: 103-113). Similar designs have also been implemented in Turkey, such as DYT ("Deneme Yetenek Testi", Aptitude Trial Test) (Göğüş, 1994); MÖZYES ("Merkezi Özel Yetenek Sınavı", Central Special Talent Exam), which according to Tarman was implemented only twice, during the exams of 1994-1995 and 1995-1996 (Tarman, 2016: 113); OMÜ-MAT ("Ondokuz Mayıs Üniversitesi Müziksel Algılama Testi", Ondokuz Mayıs University Musical Aptitude Test) (Ibid. 114); and MAÖ ("Müziksel Algılama Ölçeği", Measure of Musical Perception) (Atak Yayla, 2009: 372-377). There are two main advantages of this type of multiple-choice test. First, since each applicant is asked the same questions in the same way and is evaluated equally, the results are much more objective than those of jury committee evaluations. Second, they use much less time and energy: an assigned exam superintendent can run the exam in a room or conference hall with as many applicants as possible at the same time, and the multiple-choice answer sheets can be processed quickly afterwards with an optical reader. Yet, despite these two important advantages, the usefulness of these test designs is open to debate. The first problem is their multiple-choice nature: some of these exams have questions with only two choices, so the applicant has a 50% chance of answering correctly even if (s)he has no idea about the answer. The processed sounds played back during the exam and the acoustics of the exam space are further issues: some of these questions use unnatural sounds (such as an oscillator or MIDI playback), which distances them from musicality, and the speaker system placed in the room or hall might cause individual differences in the perception of sounds depending on the acoustics of the space. However, probably the most important issue is that, although these types of exams may measure the individual aural abilities of a person to some degree, it remains a question whether these abstracted abilities correspond to a potential for musicality. (To give an example, Atak Yayla and Yayla (2009) investigated how well the results of their test predicted the musical talent of those who took it. The results, although positively correlated, showed a medium-low relationship (r = 0.483, r² = 0.234).)
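The guessing problem noted above for two-choice questions can be made concrete with a short calculation. The sketch below is illustrative only: the function name `guess_pass_probability`, the binomial model, and the example of ten questions with six correct answers required are assumptions made here, not properties of any of the tests cited.

```python
from math import comb


def guess_pass_probability(n_questions: int, n_choices: int, min_correct: int) -> float:
    """Probability of getting at least `min_correct` answers right by pure guessing.

    Assumes every question has `n_choices` equally likely options and that
    the guesses are independent (a binomial model).
    """
    p = 1.0 / n_choices
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_correct, n_questions + 1)
    )


# Hypothetical example: 10 questions, 6 correct answers needed.
two_choice = guess_pass_probability(10, 2, 6)   # two options per question
four_choice = guess_pass_probability(10, 4, 6)  # four options per question
print(f"two choices:  {two_choice:.3f}")
print(f"four choices: {four_choice:.3f}")
```

Under these assumptions a blind guesser has better than a one-in-three chance of reaching six out of ten on two-choice items, which illustrates why such items are weak evidence of aural ability on their own.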

That is why these tests, if they are used at all, serve mostly as a qualification exam in Turkey and have a filtering function: once an applicant passes these exams, (s)he is entitled to enter the final entrance exam, which is held by a jury committee.

Jury Committee Based Exams

Although the design of jury exams, in which the applicant's musical talents are evaluated by assigned jury committees, varies according to the preferences of the respective institution, applicants are usually evaluated on two main criteria: pitch recognition (including single pitches, intervals and chords) and musical memory (both melodic and rhythmic). In each of these, the candidate is required to sing or play back what has been played for her on the piano. There can also be additional questions such as melodic/rhythmic dictation and/or melodic/rhythmic sight singing; however, as these questions require musical knowledge in addition to talent, they are usually not encountered in the qualification (first) exams that fulfill a filtering function (if the entrance exams have two tiers). Jury-based examination systems are much more widely preferred in Turkey than standardized multiple-choice tests. A nationwide survey among music department teachers of Fine Arts High Schools, carried out by Yağcı (Yağcı, 2010: 228) during the 2006-2007 school year, showed that 9.2% of the respondents totally agreed that the jury-based system was effective, while 44.6% agreed to a large extent and 40% partially agreed. The remaining 6.2% thought that its effectiveness was very low. Thus, one can say that most teachers nationwide believed in the effectiveness of this system. Nevertheless, held in a limited time with numerous applicants, these jury-based exams also bear many difficulties, as they require each candidate to be evaluated separately. To give an example, in the 2015 musical entrance exams of ITU Turkish Music State Conservatoire, 5 different jury commissions, each consisting of 3 people, separately evaluated 507 candidates in 3 full days. As can be seen, the amount of human resources, as well as the time and energy devoted to this process, is significantly high. Some of the shortcomings of this exam type are also related to this aspect, since a person may not be able to maintain the same efficiency throughout such a long and tiring process. There is also the possibility that different jury committees develop different assessment criteria during the exam period; that their reference performances (exam questions) differ (in terms of volume, tempo and accentuation); and that the jury members influence each other. In addition, the use of a limited number of manually created question packages in some cases (to prevent the exam from leaking) may raise doubts about whether the difficulty level of the exam is equal for all applicants. Such potential obstacles to an objective and standardized measurement are the main drawbacks of this system. Testing jury reliability from the jury score sheets at first seems to offer a control mechanism (as seen in Atılgan (2008), Ece & Kaplan (2008), Tarman (2016: 90), etc.), yet, as Tarman also underlines, a high reliability score does not necessarily mean that the jury members acted independently and/or evaluated objectively or consistently (Tarman, 2016: 118).

(Surely one can avoid such pitfalls through improvements such as increasing the number of jury members in a committee, isolating the jury members from each other so that they do not know the scores given by other members, allowing longer intervals for the jury to rest between sessions, etc. A similar improved system has been used in the music entrance exams of the Yıldız University Department of Music and Performing Arts since the 2016-2017 academic year. Here, except for the head of the jury committee, the jury members are isolated from each other and enter their scores into a computer that each uses individually. When the scoring of an applicant is finished, the head of the jury committee checks the variance between the jury members and, if there are large differences, asks them to reconsider their scoring by playing back the recording of the applicant's performance. Yet, as is clear from this example, any such improvement comes with additional costs.)

In order to check these points, one also needs to analyze the applicants' performance recordings through music/sound technologies and compare them with the jury scores. At this point the use of Music Information Retrieval (MIR) technologies, which offer many approaches for the automatic analysis of recorded sounds, might be a solution to overcome such disadvantages. The second part of the paper presents the findings of a research project which investigated the potential of using sound engineering tools in the musical hearing portion of musical aptitude exams.

Research Findings Concerning the Standardness of the Jury Based Exams

This part presents two important findings of a two-year research project (May 2016 – May 2018) that investigated the potential of using sound engineering tools in the musical hearing portion of musical aptitude exams. In general, the project tested how well such technological tools could evaluate the recorded sounds of the candidates, by comparing the jury evaluations with computational analyses of the candidates' exam performance recordings. The jury evaluation reports (of the qualification exams of 2015, 2016 and 2017) and the exam recordings (of 2015 and 2017) were provided by the Music Theory department of Istanbul Technical University Turkish Music State Conservatory with the permission of the conservatory directorate. As the main goal was to make the qualification exams as efficient as possible, the project team also diagnosed some previously unobserved flaws in the question packages and offered some improvements to the exam preparation committee. Besides this, the most noticeable finding was that, although the individual reliability scores of the jury committees were high (based on the jury reports), our computational analyses showed that each jury committee was developing different criteria, especially when evaluating the melodic memory sections, which brings to mind Tarman's doubts about the independence of the jury members in a jury committee (Ibid). Below we share these two main findings, which may compromise the standardness of jury-based exams.

Problems about Different Question Packages

As stated earlier, nearly all jury-based exams in Turkey share two main criteria: pitch recognition (including single pitches, intervals, triads) and musical memory (melodic and rhythmic), although there may also be some extensions (sight singing, dictation or musical performance). Due to the high number of applicants, some institutions prefer a two-tier entrance exam, in which the first exam tests solely the abilities mentioned above and functions as a qualification for the final entrance exam. Thus the first (qualification) exam, although it takes less time per applicant, is a long process conducted by different jury committees working simultaneously over multiple days. Such a setting requires additional precautions regarding the confidentiality of the questions asked in the exam. One of these precautions is designing the exam with various question packages, each package having its own set of distinct questions on pitch recognition, melody and rhythm, thus minimizing the chance of the questions leaking (e.g. a more experienced applicant memorizing a melody and singing it outside to her friends who are waiting for their turn). Yet such a precaution may also create other problems, such as differences between the question packages in terms of their difficulty level. It is important to note that qualification in these exams is not determined by ranking: applicants must score above a certain percentage, so the exam preparation committee takes this success percentage, not the ranking, into consideration and prepares the questions accordingly. The applicants are thus expected to score above such a predetermined threshold regardless of which question package is used. However, even a mild variation between two question packages may produce amplified and significant differences in applicant performance due to unpredictable factors (applicant background, exam anxiety, individual capabilities, etc.). Bringing the exam closer to an ideal standard starts with an equal distribution of question difficulty among the various question packages.

The number of applicants whose jury evaluations we analyzed is as follows: 365 people from the qualification exam of 2015, 456 people from 2016 and 451 people from 2017. The reliability scores of the jury committees were in general very high, as can be seen from Table 1, which will be discussed in the next section. The contents of these exams are as follows:

Pitch Recognition

Single Pitch Recognition (x5)

Interval Recognition (x5)

Triads (x4)

Musical Memory

Melodic Memory (Tonal & Modal; one question for each)

Rhythmic Memory (Straight & Aksak; one question for each)

Table 1: 2015-2017 Analyzed Reports: Jury Reliability Scores Using Various Measurement Tests

Table 2: 2015 & 2016 Exams ANOVA Tests – Success vs. Question Packages

Table 2 presents the ANOVA results obtained from the 2015 and 2016 exams, considering the possible effect of using 10 different question packages on the success of the applicants. Generally speaking, both the F values and the p values suggest that, for each category, the mean success percentage is significantly different for at least one of the question packages. The melodic and rhythmic memory categories were the most problematic in this sense. Thus, considering the "Total_Success" category, which is the exam score of the applicants, one can conclude that the test score of an applicant also depended on which question package she was evaluated with. However, this does not necessarily mean that passing or failing the exam depended on the question package. Table 3 presents the same effect on the exam qualifications (the qualification score was 60% for the 2015 exam and 50% for the 2016 exam). We observe that, in both 2015 and 2016, there was no significant relationship between an applicant passing or failing and her assigned question package (p > 0.05 for both).
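For readers unfamiliar with the test, the F value of a one-way ANOVA of success against question package can be computed as in the following minimal sketch. The scores are invented toy numbers and the function name `one_way_anova_f` is an assumption for illustration; this is not the analysis pipeline used in the project.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each inner list holds the success scores of the applicants who were
    examined with one question package.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the package means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own package mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data (hypothetical): success scores under three question packages.
packages = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(packages)
print(f"F({df_b}, {df_w}) = {f_value:.3f}")
```

A large F means the package means differ more than the spread within packages would lead one to expect. Turning F into a p value requires the F distribution; a statistics package such as SciPy (`scipy.stats.f_oneway`) returns both values directly.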

Table 3: 2015 & 2016 Exams ANOVA Tests – Pass/Fail vs. Question Packages

With this information and data, our research team designed and conducted an experiment following the 2016 exam. The experiment was modeled on the questions of the 2016 exam, and its aim was to test the degree of variation between the question packages among music conservatory students, that is, among people for whom we assumed the choice of question package would not play a role in success. The information obtained from this experiment, in addition to the previous data, would not only help us understand the difficulty levels and the ease of perception of the questions but might also suggest improvements for our question designs. We conducted the experiment with 26 students from the Musicology and Music Theory departments. The questions were taken from three question packages used in the 2016 qualification exams: those with the average (assigned as package #1), lowest (#2) and highest (#3) success in each category; thus the experiment also served as a check on the results of the 2016 qualification exams. The MAST (Musical Aptitude Standard Test) experiment was conducted individually in the Musicology lab using computers, headphones and microphones. Similar to a TOEFL exam, the participants were asked to follow the instructions appearing on the screen in front of them. The questions were played back in MIDI format, and the participants were asked to sing/perform what they heard on the headphones into the microphone on their desks. Meanwhile, one of the project researchers recorded the responses of the participants on a different computer. Thus there was no jury committee present in the room; the research team later compiled the recordings and sent them to a jury committee for evaluation. After the experiment, the participants also filled out a survey regarding the relative efficiency of this exam system compared to a live jury committee system. Out of 26 people, 6 preferred the jury system (23%), 6 were indifferent between the two systems (23%), while 14 people (54%) found this system better than the jury-based system and wrote that such an environment had a positive effect on their efficiency. We should also note that the Musicology Lab in which the experiment was conducted had poor sound isolation, and that the 6 participants who preferred the jury system also wrote in their comments that they were distracted by noise coming from outside the room, or else that they felt strange in such an isolated exam environment. The whole MAST experiment process, including the introduction, the experiment and the survey, took around 15 to 20 minutes per participant.

Table 4: 2016 MAST Experiment ANOVA Test – Success vs. Question Packages

Table 4 presents the ANOVA test results of the MAST experiment, which was conducted using only three packages from the qualification exams of 2016. What is noticeable is that in the pitch recognition part (single pitch, interval and triad identification) there is no relationship between the assigned question package and the degree of success. In other words, on our assumption that the experiment participants, already being conservatory students, are potential qualifiers of the exam, the difficulty level of the 2016 pitch recognition questions was appropriate for qualifiers regardless of package. On the other hand, the results of the melodic and rhythmic memory sections came out parallel to the 2016 qualification exam results. As can be seen from Figure 1 and Figure 2, the questions of package #2, which were selected from the question packages with the lowest success in 2016, received the lowest scores in all four categories (melodies 1 & 2, rhythms 1 & 2) here as well.

Figure 1: 2016 MAST Experiment: Means Plots for Melody Questions vs. Question Packages

Figure 2: 2016 MAST Experiment: Means Plots for Rhythm Questions vs. Question Packages

As these results confirmed the results for the 2016 melody and rhythm question packages, our research team analyzed the possible factors that may cause such a difference in the level of difficulty. We realized that, although all the melody questions were two measures long and all the rhythm questions one measure long, factors such as the number of notes, the range/ambitus, the shape of the melody, the proportion of melodic steps to melodic leaps, and the periodicity and familiarity of the passage may also be contributing to these different levels of difficulty. Thus, we designed a rubric for preparing melody and rhythm questions and proposed it to the exam preparation committee before the preparation of the 2017 qualification exams. The guidelines we proposed were as follows:

Melody Questions

Each melody group should use the same number of notes,

The same note should not be used consecutively,

Each melody group should have the same rhythmic values,

Each melody group should have the same range/ambitus,

Each melody group should have the same time signature,

Each melody should be two measures long,

Each melody group should have the same tonality,

The melodies in each group should start and end with the tonic,

(For tonal melodies) The melodies should imply a similar harmonic background (such as I – ii – V7 – I),

(For modal melodies) The melodies should have a similar modal progression.
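Several of the melody guidelines above are mechanical enough to be checked automatically when a group of questions is drafted. The following sketch is one possible checker under stated assumptions: the `Melody` representation (MIDI note numbers plus durations in beats), the function name `rubric_violations`, and the reading of "same rhythmic values" as "the same multiset of durations" are all choices made here for illustration. It does not check time signature, measure count, tonality beyond the tonic, or harmonic background.

```python
from dataclasses import dataclass


@dataclass
class Melody:
    pitches: list[int]      # MIDI note numbers, one per note
    durations: list[float]  # note lengths in beats, same length as pitches


def rubric_violations(group: list[Melody], tonic_pitch_class: int) -> list[str]:
    """Return human-readable rubric violations for one melody group.

    The first melody of the group is used as the reference for note count,
    rhythmic values and range.
    """
    problems = []
    ref = group[0]
    ref_range = max(ref.pitches) - min(ref.pitches)
    for i, m in enumerate(group, start=1):
        if len(m.pitches) != len(ref.pitches):
            problems.append(f"melody {i}: different number of notes")
        if any(a == b for a, b in zip(m.pitches, m.pitches[1:])):
            problems.append(f"melody {i}: same note used consecutively")
        if sorted(m.durations) != sorted(ref.durations):
            problems.append(f"melody {i}: different rhythmic values")
        if max(m.pitches) - min(m.pitches) != ref_range:
            problems.append(f"melody {i}: different range")
        if m.pitches[0] % 12 != tonic_pitch_class or m.pitches[-1] % 12 != tonic_pitch_class:
            problems.append(f"melody {i}: does not start and end on the tonic")
    return problems


# Hypothetical C major examples (pitch class 0 = C).
first = Melody([60, 62, 64, 65, 67, 64, 62, 60], [1.0] * 8)
second = Melody([60, 64, 62, 65, 67, 65, 62, 60], [1.0] * 8)
flawed = Melody([60, 60, 64, 65, 67, 64, 62, 62], [1.0] * 8)

print(rubric_violations([first, second], tonic_pitch_class=0))
print(rubric_violations([first, flawed], tonic_pitch_class=0))
```

A checker of this kind only enforces surface uniformity; as the 2017 results discussed below suggest, uniformity on these dimensions does not by itself guarantee equal perceptual difficulty.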

Rhythm Questions

Each rhythm should use the same rhythmic motifs arranged in different orders (like a-b-c-d; b-a-c-d; c-a-b-d; d-a-b-c; a-c-b-d, etc.),

The rhythms should not contain or imply a periodic structure (like a-b-a-c),

Each rhythm group should have the same time signature,

Each rhythm should be two measures long.
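The first two rhythm guidelines amount to enumerating orderings of a fixed set of distinct motifs. A minimal sketch of that idea follows; the function names, the motif labels and the fixed random seed are assumptions for illustration and do not describe the committee's actual procedure.

```python
from itertools import permutations
import random


def motif_orderings(motifs: list[str]) -> list[tuple[str, ...]]:
    """All orderings of a set of distinct rhythmic motifs."""
    if len(set(motifs)) != len(motifs):
        # A repeated motif could produce a periodic pattern such as a-b-a-c.
        raise ValueError("motifs must be distinct")
    return list(permutations(motifs))


def pick_packages(motifs: list[str], n_packages: int, seed: int) -> list[str]:
    """Choose `n_packages` different orderings, one per question package."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    chosen = rng.sample(motif_orderings(motifs), n_packages)
    return ["-".join(order) for order in chosen]


# Four motifs give 24 possible orderings, enough for ten packages.
print(pick_packages(["a", "b", "c", "d"], n_packages=10, seed=2017))
```

Because every package uses exactly the same motifs, any remaining difference in difficulty between packages must come from the ordering itself, which is the kind of effect the ANOVA tests in this section are meant to detect.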

Besides these, we also suggested that the interval and triad questions be designed in such a way that not only their content but also their order is arranged similarly across packages. Table 5 presents the results of the 2017 qualification exam regarding the relationship between success and question packages.

Table 5: 2017 Exam ANOVA Test – Success vs. Question Packages

The observable decrease in the F values and increase in the p values in the pitch recognition section (single pitches, intervals and triads), which now suggest no relationship between the question packages and success in these categories, demonstrate the benefit of using the same content in the same order (that is, transposed versions of the same question) in the interval and triad sections. The guidelines we had proposed for the preparation of melody and rhythm questions, however, do not seem to have had any positive effect on those sections, as success in all four categories (melody 1, melody 2, rhythm 1 and rhythm 2) still shows a dependency on at least one of the question packages (p ≤ 0.001 in each). From the Bonferroni post-hoc tests we found that the means of one of the melodies from the melody 1 package, two of the melodies from the melody 2 package and more than two of the rhythms in the rhythm 1 and rhythm 2 packages were significantly different from the means of the corresponding questions in the other packages. However, for the first time, the total exam score showed no significant relationship with the question package, with the values F = 1.538 and p = 0.132. In addition, as can be observed from Table 6, there was no significant relationship between passing or failing the exam and the question package (the qualification score for the 2017 exam was 50%), with the values F = 0.951 and p = 0.480, the lowest F and the highest p values in the research so far. The reason for this is probably that the questions that differed significantly in difficulty were dispersed among different question packages in each category; e.g. a package whose melody 1 had the lowest mean might contain a melody 2 with a high mean. Thus, these results of the 2017 qualification exams may be the outcome of a coincidence.

Table 6: 2017 Exam ANOVA Tests – Pass/Fail vs. Question Packages

It seems that more research is needed in the area of music perception for the standardization of such melody and rhythm questions. The possible flaws of using different question packages could be significantly reduced by using such question preparation guidelines together with generative rubrics. In addition, in order to increase the amount of dispersion, multiple questions (at least three) should be asked for each category rather than only one; e.g. three tonal melody questions, three modal melody questions, etc. However, this will also increase the amount of time used per applicant. The solution we propose is discussed in the final section.

Reconsidering Jury Reliabilities

As the main aim of our research was to investigate the design and applicability of automatic assessment tools to support the qualifying exams, the recorded sounds of the candidates' exam performances were analyzed in comparison with the evaluations of the jury members. In other words, the algorithms were designed not to decide whether an applicant's performance matched the reference piano sound, but to imitate the jury responses and evaluate the applicant's performance accordingly. For the pitch recognition section, we managed to obtain reliable information on the acceptable pitch ranges and on thresholds for interval intonation (which are investigated and discussed separately in Köker et al. and Güner et al.). The hardest category for automatic assessment was the melody sections, since the evaluation of a "successful" melodic recall may involve multiple factors, including the completeness of the melody, pitch sequence, rhythm, intonation, melodic shape, vibrato, etc., and the importance of these factors may change from person to person. Using the dataset derived from the 2015 and 2016 qualification exams, two different assessment systems were designed (as discussed in Bozkurt et al. (2017) and Gültekin et al.), with average accuracies of 0.74 and 0.856. During this process we also had the chance to analyze the samples in which the automatic assessment tool and the jury evaluations disagreed. Except for a few cases, all the disagreements were ones in which the applicants were favored by the jury committee; that is, the jury committees (of the 2015 and 2016 exams) turned out to be more lenient than the algorithm, evaluating applicants as successful in cases that might be considered unsuccessful. This analysis also provided us with information about the different evaluation criteria of the different jury committees: intonation, rhythm, the place (and the function) of the missed note, etc. In fact, the existence of different criteria between jury committees was probably one of the reasons why the automatic assessment systems, which "learn" how to evaluate from these different committees, did not reach higher accuracies.
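The accuracy figures and the direction of the disagreements described above can both be read off a simple comparison of paired decisions. The sketch below shows the bookkeeping on invented toy labels; the function name `compare_with_jury` and the data are assumptions for illustration and are unrelated to the systems of Bozkurt et al. or Gültekin et al.

```python
def compare_with_jury(jury: list[bool], algorithm: list[bool]) -> dict[str, float]:
    """Summarize agreement between jury and algorithm pass/fail decisions.

    `jury[i]` and `algorithm[i]` are the decisions for the same recording;
    True means the performance was judged successful.
    """
    if len(jury) != len(algorithm):
        raise ValueError("both lists must cover the same recordings")
    pairs = list(zip(jury, algorithm))
    agree = sum(j == a for j, a in pairs)
    jury_lenient = sum(j and not a for j, a in pairs)  # jury passed, algorithm failed
    jury_strict = sum(a and not j for j, a in pairs)   # algorithm passed, jury failed
    return {
        "accuracy": agree / len(pairs),
        "jury_more_lenient": jury_lenient,
        "jury_more_strict": jury_strict,
    }


# Toy labels (hypothetical) for eight recordings.
jury_labels = [True, True, True, False, True, False, True, True]
algo_labels = [True, False, True, False, False, False, True, True]
print(compare_with_jury(jury_labels, algo_labels))
```

Splitting the disagreements by direction is what makes a pattern such as "the jury is almost always the more lenient party" visible; an accuracy figure alone would hide it.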

Figure 3 presents the overall distribution of the 2016 and 2017 qualification exam scores. The base score needed to qualify in these years was 50%. The bar marked in bold in both histograms refers to the range 48%-51.9%. From the frequency distributions we observe that most of the applicants who fall in this range actually passed the exam (19/24 passed in 2016 and 20/22 passed in 2017, as can be seen from Table 7). Also notice in the two histograms how drastically the neighbouring bars differ on the left (fail) and on the right (pass). It is as if a "positive" transfer has been made from the left to the right, which requires only 5 points from each jury member, namely the difference between a "totally successful melody" and an "averagely successful melody" on the jury evaluation sheets. Thus, it seems that the jury committee acts as a group and, taking the initiative together, decides to give the applicant a second chance in the final entrance exams. This "jury-induced" positive effect might also explain the cases mentioned above in which the jury score and the designed algorithm disagreed.

Figure 3: 2016 & 2017 Qualification Exams – Overall Distribution of Scores

Table 7: 2016 & 2017 Qualification Exams: 48%-51.9% Score Area
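The kind of count reported in Table 7 (how many applicants in the band around the base score ended up on the passing side) is straightforward to reproduce from a list of final scores. The sketch below uses invented scores; the function name `band_pass_rate` and the data are assumptions for illustration, not the exam data.

```python
def band_pass_rate(scores: list[float], low: float, high: float, pass_mark: float) -> tuple[int, int]:
    """Count (passed, total) among scores that fall within [low, high]."""
    in_band = [s for s in scores if low <= s <= high]
    passed = sum(s >= pass_mark for s in in_band)
    return passed, len(in_band)


# Hypothetical final scores in percent.
toy_scores = [47.5, 48.0, 49.0, 50.0, 50.0, 50.5, 51.0, 51.9, 52.0, 60.0]
passed, total = band_pass_rate(toy_scores, low=48.0, high=51.9, pass_mark=50.0)
print(f"{passed}/{total} applicants in the band passed")
```

If scores near the threshold were unaffected by the threshold itself, one would expect the band to be split more evenly between the two sides than the 19/24 and 20/22 reported for 2016 and 2017.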

At first this may not seem a negative thing, especially when considered from the perspective of the applicants. However, it also shows that communication between the jury members is present, which may occur in other cases as well and thus contribute to the high reliability scores presented in the previous section. In addition, bear in mind that in Turkey, applicants who graduated from Fine Arts High Schools (Güzel Sanatlar Lisesi) receive a significant number of extra points compared with ordinary high school graduates at the final entrance exams. Such applicants might appear in the top portion of the conservatory acceptance list even if their final entrance exam performances alone would have placed them on the waiting list. It is thus open to discussion whether lifting a failing applicant with a Fine Arts High School background above the base score is a "positive" act, especially when considered from the perspective of those from ordinary school backgrounds who passed the qualification exams without any outside influence.

Conclusion: Towards a Hybrid Assessment Model

In conclusion, we propose a qualification exam design that is similar to the MAST experiment described above. Similar to the TOEFL or Pearson English exams, the applicants would register and take their qualification examination individually in isolated booths, using computer screens, headphones and microphones through which the questions would be asked and their live responses recorded, assessed using state-of-the-art MIR technologies, and further screened by instructors, especially for cases close to the pass boundary. Recall from the survey results of the MAST experiment that such a system was preferred by the majority of the participants, and that the 23% who preferred the jury system based their complaints on the poor isolation of the exam environment, which can be remedied by using a better room. Regarding the question packages discussed in the previous section, the system could draw the questions for each category at random from a large pool of different packages, thus generating a different exam each time. It is also possible to create such questions by computer through generative algorithms (a similar algorithm is currently in the design and testing phase). The test could also incorporate hearing-based multiple-choice questions similar to those in the standardized tests of Seashore, Wing, Bentley and Gordon. In such an exam system the applicants could also receive a detailed evaluation report of their exam performance and, if they have scored above a base score, use this report to apply for the final entrance exams of the music conservatories, in which they can show their musical performance skills to the jury committee in more detail. This would not only save a great deal of time, energy, infrastructure and capital, but would also increase the quality of the final entrance exams (the second-tier exams) and result in a much more efficient examination process.
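Two mechanical parts of the proposed design, drawing a different exam from a question pool and routing borderline results to human screening, can be sketched as follows. Everything named here is hypothetical: the question IDs, the pool size, the per-applicant seed and the two-point review margin are assumptions for illustration, while the number of questions per category follows the exam contents listed earlier.

```python
import random

# Questions per category, following the qualification exam contents above.
QUESTIONS_PER_CATEGORY = {
    "single_pitch": 5,
    "interval": 5,
    "triad": 4,
    "melody": 2,
    "rhythm": 2,
}


def draw_exam(pool: dict[str, list[str]], seed: int) -> dict[str, list[str]]:
    """Draw a fresh set of question IDs for one applicant from the pool."""
    rng = random.Random(seed)  # e.g. one seed per applicant, kept for auditing
    return {
        category: rng.sample(pool[category], count)
        for category, count in QUESTIONS_PER_CATEGORY.items()
    }


def needs_human_review(score: float, pass_mark: float = 50.0, margin: float = 2.0) -> bool:
    """Flag automatically assessed scores that lie close to the pass mark."""
    return abs(score - pass_mark) <= margin


# Hypothetical pool with 20 question IDs per category.
pool = {cat: [f"{cat}_{i:03d}" for i in range(20)] for cat in QUESTIONS_PER_CATEGORY}
exam = draw_exam(pool, seed=12345)
print(exam["melody"])
print(needs_human_review(49.0), needs_human_review(60.0))
```

Random drawing only yields comparable exams if the pooled questions are of similar difficulty, so a scheme like this presupposes the kind of question standardization discussed in the previous section.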

Acknowledgements: This work is supported by the Scientific and Technological Research Council of Turkey, TUBITAK, Grant #215K017.

References

Atak Yayla, A., Yayla, F. 2009. "Müziksel Algılama Ölçeği". 8. Ulusal Müzik Eğitimi Sempozyumu: Türkiye'de Müzik Eğitiminin Sorunları ve Çözüm Önerileri – Bildiriler Kitabı. Ondokuz Mayıs Üniversitesi Yayınları. Samsun. 372-378.

Atılgan, H. 2008. "Using Generalizability theory to assess the score reliability of the Special Ability Selection Examinations for music education programmes in higher education". International Journal of Research & Method in Education, 31:1, 63-76.

Bozkurt, B., Baysal, O., Yüret, D. 2017. "A Dataset and Baseline System for Singing Voice Assessment". CMMR 2017, 13th International Symposium on Computer Music Multidisciplinary Research: Music Technology with Swing. 25-28 September 2017.

Ece, A. S., Kaplan, S. 2008. "Müzik Özel Yetenek Seçme Sınavı'nın Puanlayıcılar Arası Güvenilirlik Çalışması". National Education, 36-49.

Göğüş, G. 1999. "Müzik Yeteneğinin Tanımı, Ölçümü ve Deneme Yetenek Testi". Uludağ Üniversitesi Eğitim Fakültesi Dergisi, Cilt: 12, Sayı: 1. 79-89.

Güner, B. B., Baysal, O., Bozkurt, B. (in preparation). "Müzik Yetenek Sınavları Çift Ses Soru Değerlendirmelerinde Kabul Edilebilir Aralıklar".

Gültekin, C., Bozkurt, B., Baysal, O. (in preparation). "Singing Assessment Using Chroma Features".

Köker, O., Baysal, O., Bozkurt, B. (forthcoming). "Müzik Yetenek Sınavlarında Tek Ses Tekrarları İçin Kabul Edilebilir Perde Aralığı (Aralıkları)". Hacettepe Üniversitesi Ankara Devlet Konservatuarı Ulusal Müzik ve Sahne Sanatları II. Sempozyumu - Bildiri Kitabı – 21 Aralık 2017, Ankara.

Tarman, S. 2016 (2006). Müzik Eğitiminin Temelleri – Geliştirilmiş 2. Basım. Müzik Eğitimi Yayınları. Ankara.

Yağcı, U. 2010. "AGSL Müzik Bölümleri Yetenek Sınavları ve Bu Sınavlara Yönelik Öğretmen Görüşleri". Pamukkale Üniversitesi Eğitim Fakültesi Dergisi, Sayı 27, 223-231.