Install the JDK. The JDK is required for Spark. We will be using Spark 1.4.1. Download Spark from http://spark.apache.org/downloads.html (screenshot on the next slide).
Extract the archive and cd to the newly created directory. Invoke the shell using ./bin/spark-shell
Scala is object-oriented, functional, and statically typed. It provides a higher level of abstraction. Recommended book: Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition, by Martin Odersky, Lex Spoon, and Bill Venners. Martin Odersky is the creator of Scala.
Functions are first-class citizens. A function is a value just like an integer or a string, unlike function pointers in C/C++. Pass functions as arguments to other functions and return them as results. Define a function inside another function. Define functions without giving them a name. Example: x => x + 1; a more concise way is (_ + 1). Sprinkle code with function literals.
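The points on this slide can be tried directly in the Scala REPL (or spark-shell). The following is a minimal sketch; the names inc, incShort, applyTwice, adder, and doubled are made up for illustration and do not come from the slides.

```scala
// A function literal stored in a value, like any other value.
val inc: Int => Int = x => x + 1

// The same function written with placeholder syntax.
val incShort: Int => Int = _ + 1

// A function that takes another function as an argument.
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

// A function that returns a function; `add` is defined inside `adder`.
def adder(n: Int): Int => Int = {
  def add(x: Int): Int = x + n
  add
}

// An anonymous function passed straight to a library method.
val doubled = List(1, 2, 3).map(_ * 2)
```

For example, applyTwice(inc, 0) applies inc two times, and adder(10) yields a function that adds 10 to its argument.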
Functions map input values to output values rather than change data in place.
s.replace('a', 'b') yields a new string object. Immutable data structures are a cornerstone of functional programming. Methods shouldn't have any side effects. Some functional programming languages even prohibit side effects, but Scala does allow them.
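A short sketch of what "yields a new object" means in practice. The value names (s, t, xs, ys) are illustrative only.

```scala
val s = "banana"
val t = s.replace('a', 'b')   // returns a new String; s itself is not modified

val xs = List(1, 2, 3)
val ys = 0 :: xs              // prepending builds a new list; xs is not modified
```

After these lines run, s still holds "banana" and xs still holds List(1, 2, 3); only t and ys carry the changed values.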
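Static typing is desirable but annoying when you have to specify types redundantly. Ex: int inc(int y) { return y + 1; } If the compiler were smart, this could be written as inc(int y) (y + 1), with the return type inferred. We shall see later how even the input type can be inferred, so that it can be written simply as (_ + 1).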
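In Scala the same progression might look like the sketch below. The names incExplicit, incInferred, and incremented are hypothetical and chosen only to show the three stages.

```scala
// Everything spelled out, close to the Java/C style on the slide.
def incExplicit(y: Int): Int = { return y + 1 }

// Return type left out; the compiler infers Int from the body.
def incInferred(y: Int) = y + 1

// Parameter type left out too; it is inferred from List[Int].
val incremented = List(1, 2, 3).map(_ + 1)
```

The parameter type can only be dropped where the compiler has an expected type to work from, as it does here from the element type of the list.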
Cluster computing frameworks let you write parallel computations using a high-level set of operators. Hadoop MapReduce, Dryad, and Spark are cluster computing frameworks. Word count is used to demonstrate the higher-level operations.
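A word count in Spark could be sketched as follows. This assumes the code is typed into spark-shell, where sc (a SparkContext) is already defined; the two input sentences are made-up sample data, not from the slides.

```scala
// Assumption: running inside spark-shell, so `sc` already exists.
val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

val words  = lines.flatMap(line => line.split(" "))  // one element per word
val pairs  = words.map(word => (word, 1))            // pair each word with a count of 1
val counts = pairs.reduceByKey(_ + _)                // sum the counts per word

// Bring the (small) result back to the driver as a Map.
val result = counts.collect().toMap
```

The whole computation is three operators (flatMap, map, reduceByKey); how the work is split across machines is left to the framework.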
Spark is more efficient when you reuse intermediate data across multiple computations. Data reuse is common in machine learning and graph algorithms, and in interactive data mining. Handles batch, interactive, and streaming applications within one framework. Supports Java, Scala, and Python. Higher level of abstraction.
Data: a collection of elements. Distributed: divided into partitions, and the partitions can be spread out to reside on different machines in the cluster. Resilient: maintains lineage, so if you lose data it can be recomputed. Can be persisted on disk or can reside in memory.
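These properties can be inspected from spark-shell. The sketch below assumes sc is predefined by the shell; the partition count of 4 and the names numbers, squares, partitionCount, and lineage are arbitrary choices for illustration.

```scala
// Assumption: running inside spark-shell, so `sc` already exists.
import org.apache.spark.storage.StorageLevel

// Distributed: ask for the data to be split into 4 partitions.
val numbers = sc.parallelize(1 to 100, 4)
val partitionCount = numbers.partitions.length

// Persisted: keep the derived dataset in memory, spilling to disk if needed.
val squares = numbers.map(n => n * n)
squares.persist(StorageLevel.MEMORY_AND_DISK)

// Resilient: the lineage Spark would use to recompute lost partitions.
val lineage = squares.toDebugString
println(lineage)
```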
A transformation is lazy and creates a new dataset from an existing one. Examples: filter, map.
Actions, such as count and take, compute all RDDs in the lineage.
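The split between lazy transformations and actions might be seen in spark-shell like this. It assumes sc is predefined by the shell; the names data, evens, scaled, howMany, and firstTwo are illustrative.

```scala
// Assumption: running inside spark-shell, so `sc` already exists.
val data = sc.parallelize(1 to 10)

// Transformations: each line only records a step in the lineage.
val evens  = data.filter(_ % 2 == 0)
val scaled = evens.map(_ * 10)

// Actions: each one triggers computation of the RDDs it depends on.
val howMany  = scaled.count()
val firstTwo = scaled.take(2)
```

No job runs until count() is called. Because scaled is not persisted here, take(2) walks the lineage again rather than reusing the result of count().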