spaCy lightning talk for KyivPy #21
spaCy to the rescue, or why NLTK is not cool anymore
Anton Kasyanov | DataRobot
What is spaCy
• Natural language processing library
• Industrial strength - based on latest research
• Fast - written using Cython
Usage
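import spacy

# 'en' is the model shortcut used in the talk; spaCy 3+ expects a full
# package name such as 'en_core_web_sm' instead.
nlp = spacy.load('en')
# Adjacent string literals are concatenated, so the first one ends with a space.
doc = nlp(
    'Hello, world. '
    'Here are two sentences.'
)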
Tokeniser
token = doc[0]
sentence = next(doc.sents)
assert token is sentence[0]
assert sentence.text == 'Hello, world.'
Word Vectors
doc = nlp("Apples and oranges are similar. "
          "Boots and hippos aren't.")

apples = doc[0]
oranges = doc[2]
boots = doc[6]
hippos = doc[8]

assert apples.similarity(oranges) > boots.similarity(hippos)
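The comparison above rests on cosine similarity between word vectors. Here is a minimal sketch of that measure in plain Python, using made-up 3-dimensional vectors. The `cosine_similarity` helper and the `toy_vectors` values are assumptions for illustration only, not spaCy's code or its real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors: real word vectors have hundreds of dimensions.
toy_vectors = {
    "apples": [0.9, 0.8, 0.1],
    "oranges": [0.8, 0.9, 0.2],
    "boots": [0.1, 0.2, 0.9],
    "hippos": [0.7, 0.1, 0.3],
}

fruit = cosine_similarity(toy_vectors["apples"], toy_vectors["oranges"])
other = cosine_similarity(toy_vectors["boots"], toy_vectors["hippos"])
assert fruit > other
```

Vectors that point in nearly the same direction score close to 1, which is why the two fruit words come out as more similar than the unrelated pair.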
Syntactic Parser
Speed
Other features
• Part-of-Speech tagger
• Named entity recognition
• Integer IDs for words
• Multi-threading support
• Deep learning
• German, English, French (so far)
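The "integer IDs for words" idea from the list above can be sketched with a toy vocabulary that stores each distinct string once and hands out an integer for it. The `ToyVocab` class is hypothetical and only illustrates the concept; it is not spaCy's implementation, which manages its string-to-ID mapping differently internally.

```python
class ToyVocab:
    """Toy string interner: each distinct word gets one integer ID."""

    def __init__(self):
        self._ids = {}       # word -> integer ID
        self._strings = []   # integer ID -> word

    def add(self, word):
        """Return the ID for word, assigning a new one on first sight."""
        if word not in self._ids:
            self._ids[word] = len(self._strings)
            self._strings.append(word)
        return self._ids[word]

    def __getitem__(self, word_id):
        """Look the original string back up from its ID."""
        return self._strings[word_id]


vocab = ToyVocab()
ids = [vocab.add(word) for word in "to be or not to be".split()]
assert ids == [0, 1, 2, 3, 0, 1]
assert vocab[ids[4]] == "to"
```

Comparing or hashing small integers is cheaper than comparing strings, and repeated words cost no extra string storage.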
Thanks!
https://spacy.io
antonkasyanov.com