Posted on 24-Aug-2020
Exploring Data
CENG499 Introduction to Data Science
Erdoğan Doğdu
Content
• Ch. 10 Working with Data
Exploring Data
• Before you start building models and making predictions, know your data
  – Explore your data first
One-dimensional data
• Example: a collection of numbers
  – The number of minutes each user spends on your website
• How to explore?
  – Summary statistics
    • Number of items, smallest, largest, mean, standard deviation
  – Histograms
    • Group the data into buckets
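The summary statistics listed above are all available in Python's standard library. A minimal sketch, assuming the data is a plain list of numbers; the helper `summarize` and the values in `minutes` are hypothetical, made up for illustration.

```python
import statistics

def summarize(xs):
    """Return basic one-dimensional summary statistics for a list of numbers."""
    return {
        "count": len(xs),
        "min": min(xs),
        "max": max(xs),
        "mean": statistics.mean(xs),
        # sample standard deviation (divides by n - 1); needs at least two values
        "stdev": statistics.stdev(xs),
    }

# Hypothetical minutes-on-site values, for illustration only.
minutes = [12, 5, 30, 7, 45, 12, 3]
print(summarize(minutes))
```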
Histograms
• Mean ≈ 0, std. dev. ≈ 58 for both distributions
• Are the distributions the same? Plot them:
  – plot_histogram(uniform, 10, "Uniform Histogram")
  – plot_histogram(normal, 10, "Normal Histogram")
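`plot_histogram` is called on the slide but not defined there. A minimal sketch of one way to write it, assuming matplotlib for plotting: each value is floored to the lower edge of its bucket, bucket counts are tallied with a `Counter`, and the counts are drawn as bars. The helper names `bucketize` and `make_histogram`, and the way `uniform` and `normal` are generated (uniform on [-100, 100), whose standard deviation is 200/√12 ≈ 57.7, and `random.gauss` with that same standard deviation), are assumptions, not taken from the slides.

```python
import math
import random
import statistics
from collections import Counter

def bucketize(point, bucket_size):
    """Floor the point to the lower edge of its bucket."""
    return bucket_size * math.floor(point / bucket_size)

def make_histogram(points, bucket_size):
    """Count how many points fall into each bucket."""
    return Counter(bucketize(p, bucket_size) for p in points)

def plot_histogram(points, bucket_size, title=""):
    # imported here so the counting helpers above work without matplotlib
    import matplotlib.pyplot as plt
    histogram = make_histogram(points, bucket_size)
    # align="edge" makes each bar span [bucket, bucket + bucket_size)
    plt.bar(list(histogram.keys()), list(histogram.values()),
            width=bucket_size, align="edge")
    plt.title(title)
    plt.show()

random.seed(0)
# Hypothetical samples: uniform on [-100, 100) and normal with the same spread.
uniform = [200 * random.random() - 100 for _ in range(10000)]
normal = [random.gauss(0, 200 / math.sqrt(12)) for _ in range(10000)]

# Nearly identical summary statistics, very different shapes.
for name, data in [("uniform", uniform), ("normal", normal)]:
    print(name, round(statistics.mean(data), 1), round(statistics.stdev(data), 1))
```

With matplotlib installed, the two calls on the slide then draw a flat histogram and a bell-shaped one.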
Two dimensions
• Example:
  – Users' daily minutes on the website (dim 1)
  – Users' experience in data science, in years (dim 2)
  – How do they vary together?
• plot_histogram(ys1, 10, "ys1")
• plot_histogram(ys2, 10, "ys2")
• Same mean and std. dev., both normally distributed
• But each has a very different joint distribution with xs
print(correlation(xs, ys1))  # about 0.9
print(correlation(xs, ys2))  # about -0.9
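`correlation` is also used without a definition. One sketch, assuming Pearson's correlation coefficient and a hypothetical noise model for building `xs`, `ys1`, and `ys2`: `ys1` tracks `xs` plus noise, and `ys2` tracks `-xs` plus the same amount of noise, so both have the same mean and standard deviation but opposite correlations with `xs`.

```python
import math
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # raises ZeroDivisionError if either list is constant
    return cov / (sx * sy)

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(1000)]
# Hypothetical construction: same noise level, opposite direction.
ys1 = [x + random.gauss(0, 1) / 2 for x in xs]
ys2 = [-x + random.gauss(0, 1) / 2 for x in xs]

print(correlation(xs, ys1))  # near 1 / sqrt(1.25), about 0.89, in expectation
print(correlation(xs, ys2))  # near -0.89 in expectation
```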
Many Dimensions
• How do all the dimensions relate to one another?
• Correlation matrix
  – Row i, column j: correlation of dimension i and dimension j
• Scatterplot matrix: plt.subplots()
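A minimal sketch of a scatterplot matrix, assuming matplotlib: `plt.subplots(n, n)` creates an n × n grid of axes, and panel (i, j) plots dimension j (x-axis) against dimension i (y-axis). The diagonal, where a dimension would be plotted against itself, just shows its name. `scatterplot_matrix` and the data are hypothetical.

```python
import random
import matplotlib.pyplot as plt

def scatterplot_matrix(columns, labels):
    """Plot every pair of dimensions against each other on an n x n grid."""
    n = len(columns)
    # squeeze=False keeps axes two-dimensional even when n == 1
    fig, axes = plt.subplots(n, n, figsize=(2.5 * n, 2.5 * n), squeeze=False)
    for i in range(n):
        for j in range(n):
            ax = axes[i][j]
            if i == j:
                # diagonal: label the dimension instead of plotting it against itself
                ax.annotate(labels[i], (0.5, 0.5), xycoords="axes fraction",
                            ha="center", va="center")
            else:
                # x-axis: dimension j, y-axis: dimension i
                ax.scatter(columns[j], columns[i], s=4)
            # keep tick labels only on the bottom row and the left column
            if i < n - 1:
                ax.xaxis.set_visible(False)
            if j > 0:
                ax.yaxis.set_visible(False)
    return fig, axes

# Hypothetical data: three related dimensions.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(200)]
b = [x + random.gauss(0, 0.5) for x in a]
c = [-x + random.gauss(0, 0.5) for x in a]
fig, axes = scatterplot_matrix([a, b, c], ["a", "b", "c"])
plt.show()
```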
Cleaning and Munging
• Real-world data is dirty
• Convert strings to numbers (e.g., float(s))
• What if a value cannot be converted?
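One common answer to "what if it cannot be converted?" is to record a missing value instead of crashing. A minimal sketch, assuming `None` marks a missing value and each row arrives as a list of strings; `try_or_none`, `parse_row`, `parse_date`, and the month/day/year format are hypothetical choices. Skipping the bad row or stopping with an error are other reasonable policies.

```python
from datetime import datetime

def try_or_none(parse):
    """Wrap a parser so it returns None instead of raising on bad input."""
    def parse_or_none(value):
        try:
            return parse(value)
        except (ValueError, TypeError):
            return None
    return parse_or_none

def parse_row(row, parsers):
    """Apply parsers[i] to row[i]; a parser of None leaves that field as is."""
    return [value if parser is None else try_or_none(parser)(value)
            for value, parser in zip(row, parsers)]

def parse_date(s):
    return datetime.strptime(s, "%m/%d/%Y")

# Hypothetical CSV fields: date, symbol, closing price.
parsers = [parse_date, None, float]
print(parse_row(["6/20/2014", "AAPL", "90.91"], parsers))
print(parse_row(["6/20/2014", "AAPL", "n/a"], parsers))  # bad price becomes None
```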
Manipulating Data
• Stock prices data
• The highest-ever closing price for AAPL?
  – Restrict ourselves to AAPL rows.
  – Grab the closing_price from each row.
  – Take the max of those prices.
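The three steps map directly onto a generator expression inside `max()`. A sketch, assuming each row is a dict with `symbol`, `date`, and `closing_price` keys; the rows and prices are hypothetical, not real market data.

```python
from datetime import date

# Hypothetical rows with made-up prices, for illustration only.
data = [
    {"symbol": "AAPL", "date": date(2014, 6, 19), "closing_price": 100.0},
    {"symbol": "AAPL", "date": date(2014, 6, 20), "closing_price": 102.5},
    {"symbol": "MSFT", "date": date(2014, 6, 19), "closing_price": 50.0},
    {"symbol": "MSFT", "date": date(2014, 6, 20), "closing_price": 48.0},
]

max_aapl_price = max(
    row["closing_price"]          # 2. grab the closing price from each row
    for row in data
    if row["symbol"] == "AAPL"    # 1. restrict ourselves to AAPL rows
)                                 # 3. take the max of those prices
print(max_aapl_price)  # 102.5
```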
• The highest-ever closing price for each stock in our dataset?
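For every stock at once, one approach is to group the rows by symbol and then apply the same max to each group. A sketch, assuming rows are dicts with `symbol` and `closing_price` keys; the rows are hypothetical, with made-up prices.

```python
from collections import defaultdict

# Hypothetical rows with made-up prices, for illustration only.
data = [
    {"symbol": "AAPL", "closing_price": 100.0},
    {"symbol": "AAPL", "closing_price": 102.5},
    {"symbol": "MSFT", "closing_price": 50.0},
    {"symbol": "MSFT", "closing_price": 48.0},
]

# 1. group the rows by symbol
rows_by_symbol = defaultdict(list)
for row in data:
    rows_by_symbol[row["symbol"]].append(row)

# 2. within each group, take the max closing price
max_price_by_symbol = {
    symbol: max(row["closing_price"] for row in rows)
    for symbol, rows in rows_by_symbol.items()
}
print(max_price_by_symbol)  # {'AAPL': 102.5, 'MSFT': 50.0}
```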
Rescaling
• Clustering body sizes?
  – Euclidean distance between (height, weight) pairs
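Euclidean distance depends on the units of each dimension, so changing units can change which points count as "close". A sketch with three hypothetical people: measuring height in centimeters instead of inches changes person B's nearest neighbor. One common remedy, assumed here, is to rescale each dimension to z-scores (mean 0, standard deviation 1), which makes the result independent of the unit. `distance`, `nearest`, and `rescale` are hypothetical helper names.

```python
import math
import statistics

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: distance(points[i], points[j]))

def rescale(points):
    """Standardize each dimension to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*points))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return [
        # a constant dimension (stdev 0) carries no information, so map it to 0
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(p, means, stdevs))
        for p in points
    ]

# Hypothetical people A, B, C as (height, weight) in inches and pounds.
inches = [(63, 150), (67, 160), (70, 171)]
# The same people with height converted to centimeters.
cm = [(h * 2.54, w) for h, w in inches]

print(nearest(inches, 1))  # 0: in inches, B is closest to A
print(nearest(cm, 1))      # 2: in centimeters, B is closest to C
# After rescaling, the unit of height no longer matters.
print(nearest(rescale(inches), 1), nearest(rescale(cm), 1))
```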