![Page 1: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/1.jpg)
Exploring Word2vec in Scala
Gary Sieling, @garysieling
Wingspan, an IQVIA Company
Jan 11, 2018
PHASE
![Page 2: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/2.jpg)
FindLectures.com: A case study on natural language search
• Demo
• Crawling
• Search Use Cases
• Machine Learning
![Page 3: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/3.jpg)
Goals
• Using machine learning on text
• Practical examples of Word2Vec in Scala
• Show uses of CUDA
![Page 4: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/4.jpg)
Agenda
• Proof of Concept: Email alerts
• Concept Search
• CUDA
• Demo
• Crawling
• Search Use Cases
• Machine Learning
![Page 5: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/5.jpg)
Papers
An empirical study of semantic similarity in WordNet and Word2Vec
http://scholarworks.uno.edu/cgi/viewcontent.cgi?article=3003&context=td
A Dual Embedding Space Model for Document Ranking
https://arxiv.org/pdf/1602.01137v1.pdf
![Page 6: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/6.jpg)
• Demo
• Crawling
• Search Use Cases
• Machine Learning
![Page 7: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/7.jpg)
Email Alerts
![Page 8: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/8.jpg)
Concept Search
• Writing, NOT Code
• Excludes "writing css", "writing php"
• Implies "poetry", "fiction", "copyediting"
![Page 9: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/9.jpg)
Concept Search
• Recipes, Vegetarian Food
• NOT Dairy
• All three might include "vegan cooking"
• Implies no milk, cheese
![Page 10: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/10.jpg)
Requirements
• Talks "about" the chosen topic
• Incorporate meaning: "Scala" + "Machine Learning" -> Dl4j
• Maybe a concept hierarchy
• Don't combine meaning if nothing in common (hiking, art)
• Don't send duplicate talks/articles (e.g. announcement from different publications)
• Choose a wide variety of talks (not 5 on type systems, etc)
• Bonus points for "negative" meanings (scala, but not monads)
![Page 11: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/11.jpg)
This is a "search" problem
• Tokenize text
• Maybe mark known "entities"
• Filter / de-emphasize common terms / meanings
• Find the terms we should have searched for
• Search for those terms
• Re-rank / filter results
![Page 12: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/12.jpg)
Solution: Word2Vec
https://github.com/idio/wiki2vec
![Page 13: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/13.jpg)
Terms in context: Political Coding
http://findlectures.com/?q=liberation
![Page 14: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/14.jpg)
Terms in context: Context definitions
http://findlectures.com/?q=quaker
![Page 15: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/15.jpg)
Training Vectors
"Was raised a Quaker"
["was", "raised", "a", "religious", "since", "the", "whose", "patience"]
[1, 1, 1, 0, 0, 0, 0, 0]
"The Quaker whose patience was"
["was", "raised", "a", "religious", "since", "the", "whose", "patience"]
[1, 0, 0, 0, 0, 1, 1, 1]
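The two training rows above can be reproduced with a small sketch. It assumes the indicator vector simply marks which vocabulary words share a sentence with the target term; the object name, the regex tokenizer, and the function names are illustrative and not taken from the talk.

```scala
object ContextVectors {
  // Vocabulary of context words, in the order used on the slide.
  val vocabulary: Vector[String] =
    Vector("was", "raised", "a", "religious", "since", "the", "whose", "patience")

  // Lowercase and split on non-letters; a stand-in for a real tokenizer.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toVector

  // 1 where the vocabulary word appears in the sentence alongside the target, 0 otherwise.
  def contextVector(sentence: String, target: String): Vector[Int] = {
    val context = tokenize(sentence).filterNot(_ == target.toLowerCase).toSet
    vocabulary.map(word => if (context.contains(word)) 1 else 0)
  }
}
```

Calling `ContextVectors.contextVector("Was raised a Quaker", "Quaker")` yields the first row shown on the slide.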
![Page 16: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/16.jpg)
Word2Vec Output
P(Term | Context)
or
P(Context | Term)
![Page 17: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/17.jpg)
Example: Vector Addition
Gloria Steinem - Person + Ideology ~=
1. Marxist Feminism
2. Radical Feminism
3. Feminist Movement
4. Feminist Theory
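A toy sketch of the vector arithmetic, assuming made-up 3-dimensional embeddings (axes: person, ideology, feminism). The vectors, names, and the `nearest` helper are invented for illustration; a real model would use learned vectors with many more dimensions.

```scala
object VectorArithmetic {
  type Vec = Array[Double]

  def add(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x + y }
  def subtract(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x - y }

  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // Hypothetical embeddings, chosen by hand so the arithmetic is easy to follow.
  val toyVectors: Map[String, Vec] = Map(
    "gloria_steinem"  -> Array(1.0, 0.0, 1.0),
    "person"          -> Array(1.0, 0.0, 0.0),
    "ideology"        -> Array(0.0, 1.0, 0.0),
    "feminist_theory" -> Array(0.0, 1.0, 1.0),
    "marxism"         -> Array(0.0, 1.0, 0.0),
    "chef"            -> Array(1.0, 0.0, 0.0)
  )

  // Rank every word except the inputs by cosine similarity to the target vector.
  def nearest(target: Vec, exclude: Set[String], topN: Int): Seq[String] =
    toyVectors.toSeq
      .filterNot { case (word, _) => exclude.contains(word) }
      .map { case (word, vec) => (word, cosine(target, vec)) }
      .sortBy { case (_, score) => -score }
      .take(topN)
      .map(_._1)
}
```

Subtracting `person` from `gloria_steinem` and adding `ideology` gives a vector that sits closest to `feminist_theory` in this toy space.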
![Page 18: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/18.jpg)
Suggested Search
![Page 19: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/19.jpg)
Example: Data Format
{
  "word": "zulus",
  "count": 30,
  "syn0": [
    -0.064,0.118,0.031,0.163,0.019,0.197,0.097,-0.139,-0.055,0.155,
    -0.033,-0.252,-0.029,0.119,0.007,-0.017,0.187,0.017,0.058,-0.097,
    -0.255,-0.159,-0.053,-0.090,-0.118,0.119,0.068,0.025,0.160,-0.035,
    -0.216,0.065,0.017,0.038,-0.068,0.101,0.090,0.089,-0.023,0.265,
    -0.161,-0.178,-0.362,0.016,0.226,-0.070,-0.079,0.040,0.368,-0.150
  ],
  "syn1": [
    0.312,0.379,0.168,-0.371,-0.094,0.218,-0.022,-0.051,0.003,-0.010,
    0.233,-0.005,-0.037,0.105,0.025,-0.040,-0.127,0.201,0.175,0.277,
    0.185,-0.219,-0.504,-0.187,0.069,0.041,0.237,-0.245,0.067,-0.186,
    0.127,0.235,-0.262,-0.020,-0.152,0.007,-0.346,0.008,-0.173,-0.267,
    -0.049,0.051,0.087,0.046,-0.059,0.147,0.024,0.032,-0.403,0.019
  ]
}
![Page 20: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/20.jpg)
Example: Similarity
Cosine similarity: a number from [-1, 1] (restricted to [0, 1] only when the vectors have no negative components)
Image credit: https://engineering.aweber.com/cosine-similarity/
![Page 21: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/21.jpg)
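Operation 1: "Similarity"
    import org.nd4j.linalg.api.ndarray.INDArray
    import org.nd4j.linalg.ops.transforms.Transforms

    def cosineSimilarity(a: INDArray, b: INDArray): Double = {
      Transforms.cosineSim(a, b)
    }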
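For readers without ND4J on the classpath, the same quantity can be sketched with plain Scala arrays. This is a dependency-free illustration of cosine similarity, not the ND4J implementation; returning 0.0 for a zero-length vector is a convention chosen here.

```scala
object Cosine {
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

  // cos(theta) = (a . b) / (|a| * |b|); defined as 0.0 here when either vector is all zeros.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    val denominator = norm(a) * norm(b)
    if (denominator == 0.0) 0.0 else dot(a, b) / denominator
  }
}
```

Vectors pointing the same way score 1, perpendicular vectors score 0, and opposite vectors score -1.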
![Page 22: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/22.jpg)
INDArray
- Similar to numpy array
- Implementation depends on dependency:
    libraryDependencies += "org.nd4j" % "nd4j-cuda-8.0-platform" % nd4jVersion
    libraryDependencies += "org.nd4j" % "nd4j-native" % nd4jVersion
![Page 23: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/23.jpg)
CUDA
• Specialized instruction set in video cards / GPUs
• Requires NVIDIA SDK and a recent card ($100 - $xx,xxx)
• Available on AWS
• Deeplearning4j: JVM libraries for machine learning
• Nd4j / nd4s: matrix algebra on large arrays
![Page 24: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/24.jpg)
CUDA: example C code
    __global__ void coalescedMultiply(float *a, float *c, int M)
    {
        __shared__ float aTile[TILE_DIM][TILE_DIM],
                         transposedTile[TILE_DIM][TILE_DIM];
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        float sum = 0.0f;
        aTile[threadIdx.y][threadIdx.x] = a[row * TILE_DIM + threadIdx.x];
        transposedTile[threadIdx.x][threadIdx.y] =
            a[(blockIdx.x * blockDim.x + threadIdx.y) * TILE_DIM +
              threadIdx.x];
        __syncthreads();
        for (int i = 0; i < TILE_DIM; i++)
            sum += aTile[threadIdx.y][i] * transposedTile[i][threadIdx.x];
        c[row * M + col] = sum;
    }
![Page 25: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/25.jpg)
Ways to obtain GPUs
• Buying
• Renting
  • AWS ($0.90/hr)

| Name | GPUs | vCPUs | RAM (GiB) | Network Bandwidth | Price/Hour* | RI Price/Hour** |
|---|---|---|---|---|---|---|
| p2.xlarge | 1 | 4 | 61 | High | $0.900 | $0.425 |
| p2.8xlarge | 8 | 32 | 488 | 10 Gbps | $7.200 | $3.400 |
| p2.16xlarge | 16 | 64 | 732 | 20 Gbps | $14.400 | $6.800 |
![Page 26: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/26.jpg)
Training Word2Vec
    import org.deeplearning4j.models.word2vec.Word2Vec

    val vec = new Word2Vec.Builder()
      .minWordFrequency(5)
      .iterations(1)
      .layerSize(100)
      .seed(42)
      .windowSize(5)
      .iterate(sentenceIterator)
      .tokenizerFactory(tokenizer)
      .build()

    vec.fit()
![Page 27: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/27.jpg)
How do you tell if your code is running on a GPU?
![Page 28: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/28.jpg)
How does this affect word2vec
• Dl4j Demo project: 72 minutes (CPU)
• Dl4j Demo project: 41 minutes (GPU)
![Page 29: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/29.jpg)
Most Similar…
Defining ops we can use (should this be sooner?)
![Page 30: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/30.jpg)
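Operation 2: Compute a document mean
    import scala.collection.JavaConverters._

    def getWordVectorsMean(tokens: List[String]): INDArray = {
      // Keep only tokens the model has a vector for
      val words = tokens.filter(model.getWordVector(_) != null).sorted
      model.getWordVectorsMean(words.asJavaCollection)
    }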
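What the document mean computes can be sketched without Deeplearning4j. This version assumes the embeddings are held in a plain `Map` (a stand-in for the trained model) and, like the slide's code, skips tokens with no vector; the names are illustrative.

```scala
object DocumentMean {
  type Embeddings = Map[String, Array[Double]]

  // Element-wise average of the vectors for all known tokens; None if no token is known.
  def mean(tokens: Seq[String], vectors: Embeddings): Option[Array[Double]] = {
    val known = tokens.flatMap(vectors.get)
    if (known.isEmpty) None
    else {
      val width = known.head.length
      val sums = new Array[Double](width)
      for (v <- known; i <- 0 until width) sums(i) += v(i)
      Some(sums.map(_ / known.size))
    }
  }
}
```

Returning `Option` makes the "no known words" case explicit instead of producing a vector of NaNs.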
![Page 31: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/31.jpg)
Nd4s/Nd4j
- Everything is one long array, with dimensions (like numpy)
- Create one with a big iterator
- Easy to reshape
- Parallelism: min 32 cores, all following same path
![Page 32: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/32.jpg)
Problem: Suggestions
By the next search?
![Page 33: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/33.jpg)
Problem: Noise
![Page 34: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/34.jpg)
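Nd4s - Make an array
    // Three "modes" laid out end to end: word vectors, term frequencies, document frequencies.
    // Each frequency is repeated widthOfWordVector times so every mode has the same length.
    // Seq.fill replaces Seq.iterate(1, n)(...), which would emit 1 as the first element.
    val data: Seq[Double] =
      Seq(
        words.flatMap((w) => wordVectors(w)),
        words.flatMap((w) => Seq.fill(widthOfWordVector)(termFrequencies(w).toDouble)),
        words.flatMap((w) => Seq.fill(widthOfWordVector)(documentFrequencies(w).toDouble))
      ).flatten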
![Page 35: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/35.jpg)
Nd4s - Computation of TF*IDF average
    val modeVectors = arr.reshape(modes, widthOfWordVector * numWords)
    val scores = modeVectors(0 -> 1)
    val tf = modeVectors(1 -> 2)
    val df = modeVectors(2 -> 3)
    val weighted = scores * tf / df
    val wordVects = weighted.reshape(numWords, widthOfWordVector)
    // this is the weighted average
    wordVects.sum(0) / numWords
    // TODO is this any better?
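The same weighting can be written out with plain collections to make each step visible. This is a sketch under the assumption that each word vector is scaled by tf / df and the scaled vectors are then averaged over the number of words, mirroring the Nd4s code above; the `Map` inputs and names are hypothetical.

```scala
object WeightedMean {
  // Scale each word vector by termFrequency / documentFrequency, then average over all words.
  def tfIdfMean(
      words: Seq[String],
      vectors: Map[String, Array[Double]],
      termFrequencies: Map[String, Int],
      documentFrequencies: Map[String, Int]
  ): Array[Double] = {
    require(words.nonEmpty, "need at least one word")
    val width = vectors(words.head).length
    val sums = new Array[Double](width)
    for (w <- words) {
      val weight = termFrequencies(w).toDouble / documentFrequencies(w)
      val v = vectors(w)
      for (i <- 0 until width) sums(i) += v(i) * weight
    }
    sums.map(_ / words.size)
  }
}
```

Words that are frequent in the document but rare in the corpus pull the average toward their vector, which is the intended de-emphasis of common terms.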
![Page 36: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/36.jpg)
"Synonym" Discovery Example
Figure labels: "Code", "Coat"
Image credit: https://engineering.aweber.com/cosine-similarity/
Word2Vec - Build a Full Text Query
    List("python", "machine", "learning").map((queryTerm) =>
      "(" + model.wordsNearest(
        List(queryTerm), // positive terms
        List(),          // negative terms
        25
      ).map((nearWord) =>
        // boost each neighbor by its similarity to the original query term
        "transcript:" + nearWord + "^" + model.similarity(queryTerm, nearWord)
      ).mkString(" OR ") + ")"
    ).mkString(" AND ")
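The query-building step can be tried without a trained model by passing the neighbor lookup in as a function. In this sketch `nearest` is a hypothetical stand-in for the model's nearest-word and similarity calls, and the boost is rounded to two decimals as in the example query on the following slide.

```scala
import java.util.Locale

object QueryBuilder {
  // Expand each query term into an OR-group of boosted neighbors, then AND the groups together.
  def expand(
      field: String,
      queryTerms: Seq[String],
      nearest: String => Seq[(String, Double)]
  ): String =
    queryTerms.map { term =>
      nearest(term).map { case (nearWord, similarity) =>
        // Locale.ROOT keeps the decimal separator a dot regardless of system settings.
        field + ":" + nearWord + "^" + "%.2f".formatLocal(Locale.ROOT, similarity)
      }.mkString("(", " OR ", ")")
    }.mkString(" AND ")
}
```

A `Map[String, Seq[(String, Double)]]` can be passed directly as `nearest` when experimenting with hand-written neighbor lists.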
![Page 38: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/38.jpg)
Visual - Nearest terms
Figure labels: Query Term, Top N closest
Image credit: https://engineering.aweber.com/cosine-similarity/
![Page 39: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/39.jpg)
Example - Query ("Python + Machine Learning")
title_s:python^10 OR title_s:"machine learning"^10 …
(title_s:software^1.21 OR title_s:database^1.20 OR title_s:format^1.18
title_s:applications^1.14 OR title_s:browser^1.14 OR title_s:setup^1.13
title_s:bootstrap^1.13 OR title_s:in-class^1.13 OR title_s:campesina^1.12 OR title_s:excel^1.12 OR
title_s:hardware^1.11 OR title_s:programming^1.11 OR title_s:api^1.11 OR title_s:prototype^1.11 OR
title_s:middleware^1.11 OR title_s:openstreetmap^1.10 OR title_s:product^1.10 OR title_s:app^1.09 OR
title_s:hbp^1.09 OR title_s:programmers^1.09 OR title_s:application^1.09 OR title_s:databases^1.09 OR
title_s:idiomatic^1.09 OR title_s:spreadsheet^1.09 OR title_s:java^1.09 …
AND (…)
![Page 40: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/40.jpg)
Results (Python + Machine Learning + BM25)
Python for Data Analysis
How To Get Started With Machine Learning? | Two Minute Papers
The /r/playrust Classifier: Real World Rust Data Science
Andreas Mueller - Commodity Machine Learning
A Gentle Introduction To Machine Learning
A full Machine learning pipeline in Scikit-learn vs in scala-Spark
Hello World - Machine Learning Recipes #1
Visual diagnostics for more informed machine learning
Lab to Factory: Robust Machine Learning Systems
Machine Learning with Scala on Spark by Jose Quesada
![Page 41: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/41.jpg)
Word2Vec - "Writing"
Issues Related to the Teaching of Creative Writing
Is Nonfiction Literature?
"Oh, you liar, you storyteller": On Fibbing, Fact and Fabulation
The Value of the Essay in the 21st Century
Rewriting Rereading Rethinking - Web Design in Words
Aspen New York Book Series: The Art of the Memoir
Cheryl Strayed: "Wild"
Siri Hustvedt in Conversation with Paul Auster
Mary Karr: The 2016 Diana and Simon Raab Writer-in-Residence
History, Memory, and the Novel
![Page 42: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/42.jpg)
Aboutness: Re-sorting top 100 documents
    val queryMean = model.getWordVectorsMean(List("writing"))
    val mean = model.getWordVectorsMean(NLP.getWords(document._1))
    val distance = Transforms.cosineSim(mean, queryMean)
5 min 45 seconds @ 16 parallel threads
![Page 43: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/43.jpg)
Visual - Aboutness
Figure labels: Query Average, Document Average
Image credit: https://engineering.aweber.com/cosine-similarity/
![Page 44: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/44.jpg)
Aboutness - Results
Issues Related to the Teaching of Creative Writing: 0.43
Autobiography: 0.41
Contemporary Indian Writers: The Search for Creativity: 0.41
Marjorie Welish: Lecture: 0.40
History and Literature: The State of Play: A Roundtable Discussion: 0.40
Critical Reading of Great Writers: Albert Camus: 0.40
Daniel Schwarz: In Defense of Reading: 0.39
The Journey To The West by Professor Anthony C. Yu: 0.39
Blogs, Twitter, the Kindle: The Future of Reading: 0.39
![Page 45: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/45.jpg)
Word2Vec + Overlapping Search Terms
Python, Programming vs Art, Hiking
    terms.map((term1) =>
      terms.map((term2) => (term1, term2))
    ).flatten.filter(
      (tuple) => tuple._1 < tuple._2
    ).map(
      (tuple) => (tuple._1, tuple._2, w2v.model.get.similarity(tuple._1, tuple._2))
    )
![Page 46: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/46.jpg)
Visual - Overlapping Search Terms
Figure labels: Query Term 1, Query Term 2
Image credit: https://engineering.aweber.com/cosine-similarity/
![Page 47: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/47.jpg)
Word2Vec + Overlapping Search Terms
programming <--> python: 0.61
art <--> hiking: 0.10
Python, Programming -> (python AND programming)
Hiking, Art -> (hiking OR art)
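The AND-versus-OR decision can be sketched as a threshold on pairwise similarity. The 0.5 cutoff below is an assumption chosen only because it separates the two examples on the slide (0.61 and 0.10); the talk does not state the actual value, and the names are illustrative.

```scala
object QueryCombiner {
  // All unordered pairs of distinct terms, in alphabetical order.
  def pairs(terms: Seq[String]): List[(String, String)] =
    terms.distinct.sorted.combinations(2).collect { case Seq(a, b) => (a, b) }.toList

  // Related terms narrow the search (AND); unrelated terms are searched separately (OR).
  // The default threshold is a hypothetical value, not one given in the talk.
  def combine(term1: String, term2: String, similarity: Double, threshold: Double = 0.5): String = {
    val operator = if (similarity >= threshold) "AND" else "OR"
    s"($term1 $operator $term2)"
  }
}
```

`combinations(2)` yields the same set of pairs as the nested `map` plus `tuple._1 < tuple._2` filter above, without building the full cross product first.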
![Page 48: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/48.jpg)
Topic Diversity
Writing
• A Conversation with David Gerrold, Writer of Star Trek: The Trouble with Tribbles - Teletalk (58 minutes)
• Star Trek: Science Fiction to Science Fact - STEM in 30 (28 minutes)
Python
• Pythons Positive Press Pumps Pandas
• Why is Python Growing So Quickly? - Stack Overflow Blog
• Python explosion blamed on pandas
![Page 49: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/49.jpg)
Visual - Topic Diversity
Figure labels: Document 1 - Average, Document 2 - Average
Image credit: https://engineering.aweber.com/cosine-similarity/
![Page 50: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/50.jpg)
Pick one, find the least related (Python + Pandas)
Python explosion blamed on pandas: 1.0
Considering Python's Target Audience: 0.97
Animated routes with QGIS and Python: 0.97
I can't get some SQL to commit reading data from a database: 0.97
Using Python to build an AI Twitter bot people trust: 0.96
Getting a Job as a Self-Taught Python Developer: 0.96
Download and Process DEMs in Python: 0.96
How to mine newsfeed data and extract interactive insights in Python: 0.94
Differential Equation Solver In MATLAB, R, Julia, Python, C, Mathematica, Maple, and Fortran: 0.86
My personal data science toolbox written in Python: 0.75
1 min 30 seconds @ 16 parallel threads
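A minimal sketch of the "pick one, find the least related" step, assuming each candidate document has already been reduced to a mean word vector. The names and the tuple layout are illustrative, not from the talk.

```scala
object TopicDiversity {
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // Score every candidate against the picked document and return the lowest-scoring one.
  def leastRelated(
      picked: Array[Double],
      candidates: Seq[(String, Array[Double])]
  ): Option[(String, Double)] =
    if (candidates.isEmpty) None
    else Some(
      candidates
        .map { case (title, mean) => (title, cosine(picked, mean)) }
        .minBy { case (_, score) => score }
    )
}
```

Repeating the step against the documents chosen so far would give a greedy way to spread an email's suggestions across topics.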
![Page 51: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/51.jpg)
Technique - Summary
• Get top X results, re-shuffle
• More computing resources + data -> higher relevance
![Page 52: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/52.jpg)
Where Word2Vec Works
• Synonym generation
• Improve recall
• Search suggestions
• Incorporate secondary dataset (e.g. for enterprise search, privacy)
![Page 53: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/53.jpg)
Why Scala?
• Ecosystem: Lucene, Spark
• Dependency Management
![Page 54: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/54.jpg)
Performance
• Models take 1-2 weeks to train
• Some computations take minutes, which would not work in a search engine
• Changes:
  • Pre-compute tokens (e.g. use Lucene)
  • Pre-compute averages (don't naturally store in Lucene)
  • Hazelcast
![Page 55: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/55.jpg)
How do you tell if your code is running on a GPU (Spark + Deeplearning4j)
• 15:17:27,828 INFO ~ Loaded [CpuBackend] backend
• 15:17:28,008 INFO ~ Number of threads used for NativeOps: 4
• 15:17:29,182 INFO ~ Number of threads used for BLAS: 4
• 15:17:29,185 INFO ~ Backend used: [CPU]; OS: [Windows 10]
• 15:17:29,185 INFO ~ Cores: [8]; Memory: [3.6GB];
• 15:17:29,185 INFO ~ Blas vendor: [MKL]
• 15:17:34,546 INFO ~ Using Spark Local
![Page 56: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/56.jpg)
CUDA
• Switch between CPU and GPU by changing sbt configuration:
• Threading resources. Execution pipelines on host systems can support a limited number of concurrent threads. Servers that have four hex-core processors today can run only 24 threads concurrently (or 48 if the CPUs support HyperThreading.) By comparison, the smallest executable unit of parallelism on a CUDA device comprises 32 threads (termed a warp of threads). Modern NVIDIA GPUs can support up to 1536 active threads concurrently per multiprocessor (see Section F.1 of the CUDA C Programming Guide). On GPUs with 16 multiprocessors, this leads to more than 24,000 concurrently active threads.
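The thread counts in the quoted passage follow from simple multiplication, spelled out here as a check. The numbers are the ones quoted above; the object and value names are only for illustration.

```scala
object ThreadCounts {
  // Host: four hex-core CPUs, one thread per core, doubled with Hyper-Threading.
  val cpuThreads: Int = 4 * 6
  val cpuThreadsHyperThreaded: Int = cpuThreads * 2

  // Device: 16 multiprocessors, each with up to 1536 active threads.
  val gpuThreads: Int = 16 * 1536

  // Threads are scheduled in warps of 32, so each multiprocessor holds this many warps.
  val warpsPerMultiprocessor: Int = 1536 / 32
}
```

The GPU figure works out to 24,576, which is the "more than 24,000" in the quote.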
![Page 57: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/57.jpg)
Hazelcast
• Just videos - 241.8 minutes
• Nothing cached, but hazelcast - 76 minutes
• On query combos - 234 minutes
• Adding Hazelcast on queries - 62.091
• After all cached - 2.38
• Move word2vec model from spinner to SSD:
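The caching idea behind these timings can be sketched with an in-process map. This uses the standard library's `TrieMap` purely as a local stand-in; Hazelcast itself provides a distributed map, and nothing here reflects how the talk's code called it. The names are hypothetical.

```scala
import scala.collection.concurrent.TrieMap

object MeanCache {
  // documentId -> pre-computed mean vector; a local stand-in for a distributed cache.
  private val cache = TrieMap.empty[String, Array[Double]]

  // Return the cached vector if present; otherwise compute it once and store it.
  def getOrCompute(documentId: String)(compute: => Array[Double]): Array[Double] =
    cache.getOrElseUpdate(documentId, compute)
}
```

Because `compute` is passed by name, the expensive averaging only runs on a cache miss.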
![Page 58: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/58.jpg)
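jCuda
    import jcuda.driver.{CUcontext, CUdevice}
    import jcuda.driver.JCudaDriver._

    // Returns (total, free) device memory in bytes
    def memory = {
      cuInit(0)
      val device = new CUdevice
      cuDeviceGet(device, deviceId)

      val total = Array(0L)
      val free = Array(0L)

      // cuMemGetInfo needs an active context
      val context = new CUcontext
      cuCtxCreate(context, 0, device)
      cuMemGetInfo(free, total)
      cuCtxDestroy(context)

      (total(0), free(0))
    }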
![Page 59: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/59.jpg)
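Tokenize - Lucene
    import java.io.StringReader
    import java.util
    import org.apache.lucene.analysis.{Analyzer, TokenStream}
    import org.apache.lucene.analysis.standard.StandardAnalyzer
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
    import scala.collection.JavaConverters._

    def getTokens(text: String): List[String] = {
      val result = new util.ArrayList[String]()
      val analyzer: Analyzer = new StandardAnalyzer()
      val stream: TokenStream = analyzer.tokenStream(null, new StringReader(text))
      stream.reset()
      while (stream.incrementToken) {
        result.add(stream.getAttribute(classOf[CharTermAttribute]).toString())
      }
      // finish and release the stream once all tokens are consumed
      stream.end()
      stream.close()
      result.asScala.toList
    }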
![Page 60: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/60.jpg)
Other Lessons
- Inventing your own math does not work
  - High-dimensional "objects" do not follow your intuition like 2D/3D
  - Floating point math not associative
- Math in papers is untyped
  - "Distance" between two vectors: cosine, euclidean, manhattan?
  - vs. Probability curves
  - Unlike Physics (types naturally compose, kg⋅m²⋅s⁻²)
- Follow a paper
  - Nearly impossible to test on your own
  - Almost no one publishes code
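The point about floating point associativity is easy to demonstrate on the JVM. The sketch below only shows that grouping changes the result for doubles; the connection to GPUs (parallel reductions may add values in a different order than a sequential loop) is offered as the likely reason it matters here, not something the slides spell out.

```scala
object FloatAssociativity {
  // Same three numbers, two groupings.
  val leftToRight: Double = (0.1 + 0.2) + 0.3
  val rightToLeft: Double = 0.1 + (0.2 + 0.3)

  // True: rounding after each addition makes the two groupings differ in the last bit.
  def sumsDiffer: Boolean = leftToRight != rightToLeft
}
```

Comparing results with a small tolerance rather than `==` avoids false alarms when CPU and GPU runs disagree in the last digits.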
![Page 61: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/61.jpg)
Next Idea…
![Page 62: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/62.jpg)
CUDA Surprises
• High end GPUs don't do video
• A ton of people are using these for bitcoin mining (see local craigslist)
• CUDA uses a lot of CPU
• Floating-Point Math Is Not Associative
• "…the peak theoretical memory bandwidth of the NVIDIA Tesla M2090 is 177.6 GB/sec: (1.85 × 10^9 × (384/8) × 2) / 10^9 = 177.6 GB/sec"
• "…the peak theoretical bandwidth between host memory and device memory (8 GB/s on the PCIe ×16 Gen2)."
• "…if, switch, do, for, while significantly affect throughput... The different execution paths must be serialized, since all threads of a warp share a program counter; this increases the total number of instructions executed for this warp"
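The bandwidth figure in the quote can be re-derived in a few lines. The reading of the three factors (memory clock in Hz, bus width in bits, two transfers per clock for double data rate) is an interpretation of the quoted formula; the function and parameter names are illustrative.

```scala
object PeakBandwidth {
  // clock (Hz) * bus width (bytes) * transfers per clock, expressed in GB/s (10^9 bytes).
  def gbPerSecond(memoryClockHz: Double, busWidthBits: Int, transfersPerClock: Int): Double =
    memoryClockHz * (busWidthBits / 8.0) * transfersPerClock / 1e9

  // Values quoted for the Tesla M2090.
  val teslaM2090: Double = gbPerSecond(1.85e9, 384, 2)
}
```

Set against the 8 GB/s quoted for PCIe x16 Gen2, device memory is roughly 22 times faster than the host-to-device link, which is why keeping data on the card matters.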
![Page 63: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/63.jpg)
Resources
• "Relevant Search"
• "Deep Learning - A Practitioner's Approach"
• Deeplearning4j
• Gensim
• https://github.com/DiceTechJobs/ConceptualSearch
• https://www.reddit.com/r/datasets/comments/3mg812/full_reddit_submission_corpus_now_available_2006/
![Page 64: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/64.jpg)
FindLectures.com
Weekly Emails with Lunch and Learn Suggestions
http://findlectures.com/emails
![Page 65: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/65.jpg)
Next installment:
Java Users Group in February 2018
"GPU Programming for Java Developers"
![Page 66: Exploring Word2vec in ScalaA Gentle Introduction To Machine Learning A full Machine learning pipeline in Scikit-learn vs in scala-Spark Hello World -Machine Learning Recipes #1 Visual](https://reader034.vdocuments.net/reader034/viewer/2022042406/5f20a0ad176a8f35ee4ed6ce/html5/thumbnails/66.jpg)
Contact: @garysieling@[email protected]
https://www.findlectures.com
https://www.garysieling.com
https://github.com/garysieling/