Topic Modeling with Spark

OPEN DATA SCIENCE CONFERENCE Boston | May 20-22, 2016 Natural Language Processing and Topic Modeling with Spark Frank D. Evans

Upload: frank-evans

Post on 08-Jan-2017

TRANSCRIPT

Page 1: Topic Modeling with Spark

OPEN DATA SCIENCE CONFERENCE

Boston | May 20-22, 2016

Natural Language Processing and Topic Modeling with Spark

Frank D. Evans

Page 2: Topic Modeling with Spark
Page 3: Topic Modeling with Spark
Page 4: Topic Modeling with Spark

Raw Text | Wrangle | Model | Extract | Visualize

Page 5: Topic Modeling with Spark
Page 6: Topic Modeling with Spark

America has enjoyed twenty-two months of uninterrupted economic recovery. But recovery is not enough. If we are to prevail in the long run, we must expand the long-run strength of our economy.

america enjoy twenty-two month uninterrupted economy recovery recovery not enough prevail long run expand long-run strength economy
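The wrangled line above can be approximated in a few lines of plain Python. This is a minimal sketch, not the speaker's Spark code: the stop-word set and the lemma table are hypothetical and hand-picked so that this one passage reduces to the output shown on the slide. In Spark itself, `pyspark.ml.feature` offers `RegexTokenizer` and `StopWordsRemover` for the first two steps.

```python
import re

# Hypothetical, hand-picked stop words; a real list would be much longer.
STOP_WORDS = {"has", "of", "but", "is", "if", "we", "are", "to",
              "in", "the", "must", "our"}

# Hypothetical lemma table covering only the words in this passage.
LEMMAS = {"enjoyed": "enjoy", "months": "month", "economic": "economy"}

RAW = ("America has enjoyed twenty-two months of uninterrupted economic "
       "recovery. But recovery is not enough. If we are to prevail in the "
       "long run, we must expand the long-run strength of our economy.")


def wrangle(text):
    """Lowercase, tokenize (keeping hyphenated words), drop stop words, lemmatize."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in kept]


print(" ".join(wrangle(RAW)))
```

A lookup table is used for lemmatization only to keep the sketch self-contained; a production pipeline would rely on a proper lemmatizer or stemmer.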

Page 7: Topic Modeling with Spark
Page 8: Topic Modeling with Spark
Page 9: Topic Modeling with Spark
Page 10: Topic Modeling with Spark
Page 11: Topic Modeling with Spark
Page 12: Topic Modeling with Spark
Page 13: Topic Modeling with Spark
Page 14: Topic Modeling with Spark
Page 15: Topic Modeling with Spark
Page 16: Topic Modeling with Spark
Page 17: Topic Modeling with Spark
Page 18: Topic Modeling with Spark
Page 19: Topic Modeling with Spark
Page 20: Topic Modeling with Spark

Page 21: Topic Modeling with Spark
Page 22: Topic Modeling with Spark
Page 23: Topic Modeling with Spark

America has enjoyed twenty-two months of uninterrupted economic recovery in car sales.

After the car accident, he made a heroic full recovery.

america enjoy twenty-two months uninterrupted economic recovery car sales

after car accident made heroic full recovery
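One possible reading of this slide is that single words stay ambiguous after stop-word removal: the two reduced sentences above are about different subjects yet still share tokens. The short sketch below, which simply intersects the two token lists as printed on the slide, makes that overlap explicit. The interpretation is an assumption, not something the transcript states.

```python
# Token lists copied from the reduced sentences on the slide (lowercased).
doc_a = "america enjoy twenty-two months uninterrupted economic recovery car sales".split()
doc_b = "after car accident made heroic full recovery".split()

# Words that appear in both documents despite their different subjects.
shared = sorted(set(doc_a) & set(doc_b))
print(shared)  # ['car', 'recovery']
```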

Page 24: Topic Modeling with Spark

America has enjoyed twenty-two months of uninterrupted economic recovery in car sales.

After the car accident, he made a heroic full recovery.

(america has), (has enjoyed), (enjoyed twenty-two), (twenty-two months), (months of), (of uninterrupted), (uninterrupted economic), (economic recovery), (recovery in), (in car), (car sales)

(after the), (the car), (car accident), (accident he), (he made), (made a), (a heroic), (heroic full), (full recovery)
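The bigram lists above can be generated by pairing each token with its successor. The following is a minimal plain-Python sketch; the function names `tokenize` and `bigrams` are illustrative only. In Spark, `pyspark.ml.feature.NGram` with `n=2` plays a similar role on an already tokenized column.

```python
import re


def tokenize(text):
    """Lowercase and split into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def bigrams(tokens):
    """Return every pair of adjacent tokens, in order."""
    return list(zip(tokens, tokens[1:]))


s1 = ("America has enjoyed twenty-two months of uninterrupted "
      "economic recovery in car sales.")
s2 = "After the car accident, he made a heroic full recovery."

for pair in bigrams(tokenize(s2)):
    print(pair)
```

A sentence of n tokens yields n - 1 bigrams, which is why the first sentence (12 tokens) produces 11 pairs and the second (10 tokens) produces 9.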

Page 25: Topic Modeling with Spark
Page 26: Topic Modeling with Spark
Page 27: Topic Modeling with Spark

Page 28: Topic Modeling with Spark

America has enjoyed twenty-two months of uninterrupted economic recovery in car sales.

After the car accident, he made a heroic full recovery.

america enjoy twenty-two months uninterrupted [economic recovery] car sales

after [car accident] made heroic full recovery

Frequency Merge
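The bracketed tokens above suggest that bigrams occurring often enough across the corpus are fused into single tokens. The transcript does not say how the merge is done, so the sketch below is only one plausible approach: count adjacent pairs over all documents, keep those at or above a threshold, then merge them greedily from left to right. The last two documents and the `min_count` value are hypothetical, added so that two bigrams become frequent.

```python
from collections import Counter


def frequent_bigrams(docs, min_count):
    """Collect bigrams that occur at least min_count times across all docs."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {bg for bg, n in counts.items() if n >= min_count}


def merge_bigrams(tokens, frequent):
    """Greedily fuse frequent adjacent pairs into one underscore-joined token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


docs = [
    "america enjoy twenty-two months uninterrupted economic recovery car sales".split(),
    "after car accident made heroic full recovery".split(),
    "economic recovery lifted wages".split(),  # hypothetical extra document
    "car accident closed highway".split(),     # hypothetical extra document
]

frequent = frequent_bigrams(docs, min_count=2)
print(" ".join(merge_bigrams(docs[0], frequent)))
print(" ".join(merge_bigrams(docs[1], frequent)))
```

After merging, "economic_recovery" and "car_accident" are distinct vocabulary entries, so a topic model no longer has to treat the two uses of "recovery" and "car" as the same word.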

Page 29: Topic Modeling with Spark

Fed Chairman Ben Bernanke led from Washington with the help of the bank's current $3.6T balance sheet. He's accompanied by Mario Draghi at the European Central Bank.

Page 30: Topic Modeling with Spark

Fed Chairman Ben Bernanke led from Washington with the help of the bank's current $3.6T balance sheet. He's accompanied by Mario Draghi at the European Central Bank.

E: "Ben Bernanke" (Person)
E: "Washington" (City)
E: "Mario Draghi" (Person, Politician)
E: "Federal Reserve" (Organization)
E: "United States" (Country)
E: "European Central Bank" (Organization)
E: "central banks" (terminology)
C: "monetary policy"
C: "economics"

Entity/Concept Recognition
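The simplest form of entity recognition is a dictionary (gazetteer) lookup, sketched below. This is an illustration only and an assumption about how to approximate the slide: the output shown there includes entities and concepts that never appear literally in the text ("Federal Reserve", "United States", "monetary policy"), which points to a richer model or external knowledge base that a plain lookup cannot reproduce. The `GAZETTEER` table is hypothetical and covers only this passage.

```python
# Hypothetical gazetteer: surface form -> entity type (labels taken from the slide).
GAZETTEER = {
    "Ben Bernanke": "Person",
    "Washington": "City",
    "Mario Draghi": "Person, Politician",
    "European Central Bank": "Organization",
}

TEXT = ("Fed Chairman Ben Bernanke led from Washington with the help of the "
        "bank's current $3.6T balance sheet. He's accompanied by Mario Draghi "
        "at the European Central Bank.")


def find_entities(text, gazetteer):
    """Return (name, type) pairs for gazetteer entries found in text, in text order."""
    hits = []
    for name, kind in gazetteer.items():
        pos = text.find(name)
        if pos != -1:
            hits.append((pos, name, kind))
    return [(name, kind) for _, name, kind in sorted(hits)]


for name, kind in find_entities(TEXT, GAZETTEER):
    print(f'E: "{name}" ({kind})')
```

Recognized entities can then replace or supplement raw tokens, so that "Ben Bernanke" is treated as one term rather than two unrelated words.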

Page 31: Topic Modeling with Spark
Page 32: Topic Modeling with Spark

exaptive.com/blog

Frank D. Evans

@frankdevans

@exaptive

slideshare.net/frankdevans

github.com/frankdevans

Page 33: Topic Modeling with Spark