Reproducing the feature outputs of common programs in Matlab using melfcc.m

25/3/2015 http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/mfccs.html

When I decided to implement my own version of warped-frequency cepstral features (such as MFCC) in Matlab, I wanted to be able to duplicate the output of the common programs used for these features, as well as to be able to invert the outputs of those programs. This page gives some examples of how cepstra can be calculated by three common programs (HTK's HCopy [http://htk.eng.cam.ac.uk/], feacalc from SPRACHcore [http://www.icsi.berkeley.edu/~dpwe/projects/sprach/sprachcore.html], and mfcc.m from Malcolm Slaney's Auditory Toolbox for Matlab [http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/]), and how to duplicate the results (or very nearly) using my melfcc.m routine. This also automatically shows you how to invert cepstra calculated by either path into spectrograms or waveforms using invmelfcc.m, since its arguments are the same.

HTK MFCC

2013-02-26: For an emulation of HTK's MFCC calculation accurate to the 3rd decimal place, see the modified rastamat code in calc_mfcc (http://labrosa.ee.columbia.edu/projects/calc_mfcc/). The main differences were that HTK applies preemphasis independently on each window, and also removes the mean on each window.

Calculating features in HTK is done via HCopy, which can convert between a wide range of representations including waveform to cepstra. HCopy takes its options from a config file. Thus, to convert 16 kHz sampled soundfiles to standard Mel-frequency cepstral coefficients (MFCCs), you would have a file config.mfcc containing:

    SOURCEKIND   = WAVEFORM
    SOURCEFORMAT = WAVE
    SOURCERATE   = 625
    TARGETKIND   = MFCC_0
    TARGETRATE   = 100000.0
    WINDOWSIZE   = 250000.0
    USEHAMMING   = T
    PREEMCOEF    = 0.97
    NUMCHANS     = 20
    CEPLIFTER    = 22
    NUMCEPS      = 12

(The SOURCEFORMAT option specifies that the wavefiles are in MS-WAVE format.)
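HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, which is why a 16 kHz file shows up as 625. A minimal sketch of the conversion follows; the variable names are illustrative only and are not part of HTK or melfcc.m.

```matlab
% HTK config times are given in units of 100 ns.
htk_unit = 100e-9;                 % seconds per HTK time unit

source_rate = 625;                 % SOURCERATE from config.mfcc
target_rate = 100000.0;            % TARGETRATE from config.mfcc
window_size = 250000.0;            % WINDOWSIZE from config.mfcc

sr      = 1 / (source_rate * htk_unit);   % sampling rate in Hz
hoptime = target_rate * htk_unit;         % frame step in seconds
wintime = window_size * htk_unit;         % window length in seconds

fprintf('sr = %g Hz, hop = %g ms, window = %g ms\n', ...
        sr, 1000*hoptime, 1000*wintime);
```

So this config describes 16 kHz input analyzed with a 25 ms window every 10 ms.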
Then to calculate the features, you simply run HCopy from the Unix command line:

    $ HCopy -C config.mfcc sa1.wav sa1-mfcc.htk

We can emulate this processing in Matlab, and compare the results, as below:

    % Load a speech waveform
    % (wavread is absent from recent Matlab releases; audioread takes its place there)
    [d,sr] = wavread('sa1.wav');
    % Calculate HTK-style MFCCs
    mfc = melfcc(d, sr, 'lifterexp', -22, 'nbands', 20, ...
                 'dcttype', 3, 'maxfreq', 8000, 'fbtype', 'htkmel', 'sumpower', 0);
    % Load the features from HCopy and compare:
    htkmfc = readhtk('sa1-mfcc.htk');
    % Reorder and scale to be like melfcc output
    htkmfc = 2*htkmfc(:, [13 [1:12]])';
    % (melfcc.m is 2x HCopy because it deals in power, not magnitude, spectra)
    subplot(311)
    imagesc(htkmfc); axis xy; colorbar
    title('HTK MFCC');
    subplot(312)
    imagesc(mfc); axis xy; colorbar
    title('melfcc MFCC');
    subplot(313)
    imagesc(htkmfc - mfc); axis xy; colorbar
    title('difference HTK - melfcc');
    % Difference occasionally peaks at as much as a few percent (unexplained),
    % but is basically negligible

    % Invert the HTK features back to waveform, auditory spectrogram,
    % regular spectrogram (same args as melfcc())
    [dr,aspec,spec] = invmelfcc(htkmfc, sr, 'lifterexp', -22, 'nbands', 20, ...
                 'dcttype', 3, 'maxfreq', 8000, 'fbtype', 'htkmel', 'sumpower', 0);
    subplot(311)
    imagesc(10*log10(spec)); axis xy; colorbar
    title('Short-time power spectrum inverted from HTK MFCCs')
    subplot(312)
    specgram(dr,512,sr); colorbar
    title('Spectrogram of reconstructed (noise-excited) waveform');

    subplot(313)
    specgram(d,512,sr); colorbar
    title('Original signal spectrogram');
    % Spectrograms look pretty close, although noise-excitation
    % of reconstruction gives it a weird 'whispering crowd' sound
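The 'lifterexp', -22 argument in the calls above mirrors CEPLIFTER = 22 in the HTK config. As a sketch only, assuming the sinusoidal lifter given in the HTK Book, w(n) = 1 + (L/2) sin(pi n / L), and assuming that a negative lifterexp selects this form with L = -lifterexp, the weights can be computed directly. The variable names here are illustrative.

```matlab
L    = 22;                 % CEPLIFTER from config.mfcc
ncep = 13;                 % c0 plus NUMCEPS = 12 cepstra
n    = 1:(ncep-1);         % cepstral indices 1..12

% Weight for c0 is 1 (assumed unliftered); the rest follow the HTK formula.
liftwts = [1, 1 + (L/2) * sin(pi * n / L)];

% Liftering is a per-coefficient scaling, so undoing it is a division.
c      = ones(ncep, 1);            % placeholder cepstral vector
c_lift = liftwts(:) .* c;          % apply lifter
c_back = c_lift ./ liftwts(:);     % remove lifter again

disp(liftwts);
```

Because every weight is positive, the lifter can be undone exactly, which is one reason inversion can reuse the same arguments.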

HTK PLP

HTK can also calculate PLP features. It turns out that these are somewhat different from the MFCC features because the cepstra are calculated by a different algorithm. However, we can still emulate and invert them with different parameters. To calculate PLP features with HCopy, we need a new config file, config.plp:

    SOURCEKIND   = WAVEFORM
    SOURCEFORMAT = WAVE
    SOURCERATE   = 625
    TARGETKIND   = PLP_0
    TARGETRATE   = 100000.0
    WINDOWSIZE   = 250000.0
    USEHAMMING   = T
    PREEMCOEF    = 0.97
    NUMCHANS     = 20
    CEPLIFTER    = 22
    NUMCEPS      = 12
    USEPOWER     = T
    LPCORDER     = 12

(TARGETKIND is changed, and USEPOWER and LPCORDER are added). Then we calculate the features:

    $ HCopy -C config.plp sa1.wav sa1-plp.htk

..and compare to the Matlab version:

    [d,sr] = wavread('sa1.wav');
    % Calculate HTK-style PLPs
    plp = melfcc(d, sr, 'lifterexp', -22, 'nbands', 20, ...
                 'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'htkmel', ...
                 'modelorder', 12, 'usecmp', 1);
    % Load the HCopy features
    htkplp = readhtk('sa1-plp.htk');
    % Reorder (no scaling in this case)
    htkplp = htkplp(:, [13 [1:12]])';
    subplot(311)
    imagesc(htkplp); axis xy; colorbar
    title('HTK PLP');
    subplot(312)
    imagesc(plp); axis xy; colorbar
    title('melfcc PLP');
    subplot(313)
    imagesc(htkplp - plp); axis xy; colorbar
    title('difference HTK - melfcc');
    % Unexplained differences can be up to 20% for higher-order
    % cepstra, but essentially the same

    % Invert the HTK features back again by mirroring args to melfcc
    [dr,aspec,spec] = invmelfcc(htkplp, sr, 'lifterexp', -22, 'nbands', 20, ...
                 'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'htkmel', ...
                 'modelorder', 12, 'usecmp', 1);
    subplot(311)
    imagesc(10*log10(spec)); axis xy; colorbar
    title('Short-time power spectrum inverted from HTK PLPs')
    subplot(312)
    specgram(dr,512,sr); colorbar
    title('Spectrogram of reconstructed (noise-excited) waveform');
    subplot(313)
    specgram(d,512,sr); colorbar
    title('Original signal spectrogram');
    % Pretty close
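To put a number on differences like those seen in the difference plots, one option is a relative RMS difference per cepstral coefficient. The sketch below uses small synthetic matrices as stand-ins for htkplp and plp; rel_rms is an illustrative name and not a rastamat function.

```matlab
% Stand-ins for two cepstral matrices (rows = coefficients, cols = frames).
ref  = [1 2 3 4; 2 2 2 2; -1 1 -1 1];
test = ref;
test(3,:) = 1.2 * ref(3,:);      % make the last coefficient differ by 20%

% Relative RMS difference for each coefficient (row), taken over frames.
rel_rms = sqrt(mean((ref - test).^2, 2)) ./ sqrt(mean(ref.^2, 2));

disp(rel_rms');
```

With real features, the two matrices would first need the same number of rows and frames.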

feacalc MFCC

feacalc is the main feature calculation program from ICSI's SPRACHcore package. It's actually a wrapper around the older rasta, which was the original C-language implementation of RASTA and PLP feature calculation. feacalc has been expanded to be able to calculate (its own version of) MFCC features, so to parallel the HTK examples above, we'll start with feacalc's MFCC feature. They can be calculated with the following command line:

    $ feacalc -sr 16000 -nyq 8000 -delta 0 -ras no -plp no \
        -dom cep -com no -frq mel -filt tri -cep 13 -opf htk \
        sa1.wav -o sa1-fcmfc.htk

and we duplicate this in Matlab as follows:

    [d,sr] = wavread('sa1.wav');
    % Calculate feacalc-style MFCCs
    % (scale to match normalization of Mel filters)
    mfc2 = melfcc(d*5.5289, sr, 'lifterexp', 0.6, 'nbands', 19, ...
                  'dcttype', 4, 'maxfreq', 8000, 'fbtype', 'fcmel', 'preemph', 0);
    % Load the feacalc features
    fcmfc = readhtk('sa1-fcmfc.htk');
    % No need to reorder or scale, just transpose
    fcmfc = fcmfc';
    subplot(311)
    imagesc(fcmfc(2:13,:)); axis xy; colorbar
    title('feacalc MFCC');
    subplot(312)
    imagesc(mfc2(2:13,:)); axis xy; colorbar
    title('melfcc MFCC (feacalc-style)');
    subplot(313)
    imagesc(fcmfc - mfc2); axis xy; colorbar
    title('difference feacalc - melfcc');
    % Small differences in high-order cepstra due to
    % cumulative errors in Mel filter shapes

..and inverting works just the same as above.

feacalc PLP

feacalc was originally designed to calculate PLP (and Rasta) features, so this is its more 'native' invocation:

    $ feacalc -sr 16000 -nyq 8000 -delta 0 -ras no -dom cep -plp 12 \
        -opf htk sa1.wav -o sa1-fcplp.htk

..which we duplicate in Matlab as follows:

    [d,sr] = wavread('sa1.wav');
    % Calculate feacalc-style PLPs
    plp2 = melfcc(d, sr, 'lifterexp', 0.6, 'nbands', 21, ...
                  'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'bark', 'preemph', 0, ...
                  'numcep', 13, 'modelorder', 12, 'usecmp', 1);
    % Load the feacalc features
    fcplp = readhtk('sa1-fcplp.htk');
    % just transpose
    fcplp = fcplp';
    subplot(311)
    imagesc(fcplp(2:13,:)); axis xy; colorbar
    title('feacalc PLP');
    subplot(312)
    imagesc(plp2(2:13,:)); axis xy; colorbar
    title('melfcc PLP (feacalc-style)');
    subplot(313)
    imagesc(fcplp - plp2); axis xy; colorbar
    title('difference feacalc - melfcc');
    % A few localized differences due to windows etc.

..and once again inverting works just the same as above.

Auditory Toolbox mfcc.m

The most popular tool for calculating MFCCs in Matlab is mfcc.m from Malcolm Slaney's Auditory Toolbox. This is what I used for a long time, until I needed something with more flexibility. That flexibility includes being able to duplicate mfcc.m. Here's how we can compare them in Matlab.

    [d,sr] = wavread('sa1.wav');
    % Calculate MFCCs using mfcc.m from the Auditory Toolbox
    % (gain should be 2^15 because melfcc scales by that amount,
    % but in this case mfcc uses 2 x FFT len)
    ce = mfcc(d*(2^14), sr);
    % Scale them to match (log_10 and power)
    ce = log(10)*2*ce;

    % Duplicate with melfcc.m
    mfc3 = melfcc(d, sr, 'lifterexp', 0, 'minfreq', 133.33, ...
                  'maxfreq', 6855.6, 'wintime', 0.016, 'sumpower', 0);
    % .. and compare:
    subplot(311)
    imagesc(ce(2:13,:)); axis xy; colorbar
    title('Auditory Toolbox MFCC');
    subplot(312)
    imagesc(mfc3(2:13,:)); axis xy; colorbar
    title('melfcc MFCC (AudToolbox-style)');
    subplot(313)
    imagesc(ce - mfc3); axis xy; colorbar
    title('difference AudTBox - melfcc');
    % Small differences mainly due to hanning vs. hamming
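The rescaling ce = log(10)*2*ce can be checked in isolation. Assuming, as the "log_10 and power" comment suggests, that one path takes log10 of spectral magnitude and the other takes the natural log of power, the two differ by a constant factor of 2 ln(10), and that factor carries through the (linear) DCT. A small numeric check with arbitrary values:

```matlab
mag = [0.5 1 2 10 250];            % arbitrary positive magnitudes

lhs = log(mag.^2);                 % natural log of power
rhs = log(10) * 2 * log10(mag);    % rescaled log10 of magnitude

% The two should agree to within floating-point rounding.
disp(max(abs(lhs - rhs)));
```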

Notes on the differences between different MFCCs

- Mel mapping function
- Mel filter normalization
- DCT used to calculate cepstrum
- Number of Mel bands (and hence their width)
- Frequency span of Mel bands
- Liftering (rasta, htk, none)
- Details of initial STFT (odd/even hann/hamm, fft length, window length)
- Mel integration in linear or power domain
- Dither and DC removal
- Preemphasis
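As an illustration of the first item, the HTK Book gives its Mel mapping as mel = 2595 log10(1 + f/700); programs that use a different curve will place their filters at different frequencies even with the same number of bands. The sketch below defines that mapping and its inverse, then lays out band edges for 20 bands up to 8000 Hz under the common assumption that edges are equally spaced on the Mel axis. The names hz2mel_htk and mel2hz_htk are defined here for illustration only.

```matlab
% HTK-style Mel mapping and its inverse (anonymous functions).
hz2mel_htk = @(f) 2595 * log10(1 + f / 700);
mel2hz_htk = @(m) 700 * (10 .^ (m / 2595) - 1);

nbands  = 20;        % NUMCHANS in the HTK configs
maxfreq = 8000;      % upper limit in Hz

% nbands triangular filters need nbands + 2 edge points.
mel_edges = linspace(hz2mel_htk(0), hz2mel_htk(maxfreq), nbands + 2);
hz_edges  = mel2hz_htk(mel_edges);

disp(round(hz_edges));
```

The edges come out closely spaced at low frequencies and progressively wider toward 8000 Hz.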

Last updated: $Date: 2013/02/26 17:00:16 $

Dan Ellis
http://www.ee.columbia.edu/~dpwe/