Zemanta Tech Talk at Audible

Audible Tech Talk 23. April 2012 Andraz Tori [email protected] @andraz

Upload: andraz-tori

Post on 12-Jan-2015


DESCRIPTION

Tech talk about Zemanta's st

TRANSCRIPT

Page 1: Zemanta Tech Talk at Audible

Audible Tech Talk

23. April 2012

Andraz Tori [email protected]

@andraz

Page 2: Zemanta Tech Talk at Audible

Today's plan

• Short story of Zemanta

• The Zemanta technology

Page 3: Zemanta Tech Talk at Audible

Where am I right now?

Page 4: Zemanta Tech Talk at Audible

Wonders of modern communication

Page 5: Zemanta Tech Talk at Audible

Ljubljana

Page 6: Zemanta Tech Talk at Audible

Strip mine

• A system for Slovenian National television in 2006

• Closed captioning web page for each episode of each show

• Natural Language Processing, Information Retrieval...

Page 7: Zemanta Tech Talk at Audible

Start-up? Why not?


Page 8: Zemanta Tech Talk at Audible

Tour de Slovénie

Page 9: Zemanta Tech Talk at Audible

Sales

Page 10: Zemanta Tech Talk at Audible
Page 11: Zemanta Tech Talk at Audible

Seedcamp

• First European program inspired by YC (2007)

• London based

• 3 months, 50.000 EUR / 10%

Page 12: Zemanta Tech Talk at Audible
Page 13: Zemanta Tech Talk at Audible

Roller coaster

• 12. August: Deadline

• 20. August: Shortlist

• 23. August: Phone interview

• 24. August: Results

• 3. September: London week start

• 7. September: London week end

• 16. September: ==> London

Page 14: Zemanta Tech Talk at Audible

3 months in London

Page 15: Zemanta Tech Talk at Audible
Page 16: Zemanta Tech Talk at Audible
Page 17: Zemanta Tech Talk at Audible

Back to Ljubljana

Page 18: Zemanta Tech Talk at Audible

Back to Ljubljana

Page 19: Zemanta Tech Talk at Audible
Page 20: Zemanta Tech Talk at Audible

And then ...

• Figuring out that the US is our target market

• Figuring out where in the US to be and whom to have there

• Partnerships

• And naturally the business model

Page 21: Zemanta Tech Talk at Audible

Technology

Page 22: Zemanta Tech Talk at Audible

What do we do?

• Zemanta – Personal Writing Assistant

- on your current platform

• While bloggers write we suggest:

- images

- related articles

- in-text links

- tags

Page 23: Zemanta Tech Talk at Audible
Page 24: Zemanta Tech Talk at Audible
Page 25: Zemanta Tech Talk at Audible
Page 26: Zemanta Tech Talk at Audible

Some stats

• 80k bloggers monthly

• 1.3 million posts enhanced in 2011

Page 27: Zemanta Tech Talk at Audible

How does it work

• Natural Language Processing

• Big database of “meanings” (entities, concepts, topics)

• Word Sense Disambiguation

• Linking out to Wikipedia, Freebase, …

• Categorization, Named Entity Recognition

• Information Retrieval

• Solr based, using features from NLP

• With some twists
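To make the "Word Sense Disambiguation" bullet concrete, here is a minimal sketch in Python. It is not Zemanta's implementation: the `MEANINGS` table, the entity names, and the overlap scoring are all assumptions chosen for illustration. It only shows the general idea of choosing one meaning for an ambiguous phrase by comparing the surrounding text against a database of meanings.

```python
from collections import Counter

# Hypothetical toy "database of meanings": surface form -> candidate entities,
# each with a small bag of context words. A real system would derive this
# from sources such as Wikipedia or Freebase.
MEANINGS = {
    "jaguar": {
        "Jaguar_(animal)": {"cat", "jungle", "predator", "species"},
        "Jaguar_Cars": {"car", "engine", "luxury", "british"},
    },
}


def disambiguate(surface_form, text):
    """Return the candidate whose context words overlap most with the text.

    Returns None when the surface form is unknown or no candidate
    shares any word with the text.
    """
    words = Counter(text.lower().split())
    candidates = MEANINGS.get(surface_form.lower(), {})
    best, best_score = None, 0
    for entity, context in candidates.items():
        # Score = number of occurrences of the candidate's context words.
        score = sum(words[w] for w in context)
        if score > best_score:
            best, best_score = entity, score
    return best


if __name__ == "__main__":
    print(disambiguate("Jaguar", "The new Jaguar has a powerful engine and a luxury interior"))
```

The chosen entity can then be linked out to its Wikipedia or Freebase page, and used as a feature for retrieval.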

Page 28: Zemanta Tech Talk at Audible

[Architecture diagram with boxes: Plain text (article), Analysis, Background knowledge, Semantic search, Indexed content, Content suggestions]

Page 29: Zemanta Tech Talk at Audible

“Text Understanding”

- Input is a meaningful chunk of text (not a keyword or a phrase)

- Input is (semi-)English language

- Has to work across all domains in the open world: music, celebrities, finance, entertainment, politics, gardening, parenting, …

Page 30: Zemanta Tech Talk at Audible

[Architecture diagram with boxes: Background knowledge, Content suggestions, Plain text (article), Analysis, Semantic search, Indexed content]

Page 31: Zemanta Tech Talk at Audible

Background knowledge

- Data from Wikipedia, MusicBrainz, Freebase… and the world wild web

- Includes linguistic and semantic properties and unstructured data

- Present in two forms:

- in “original” custom-built triple store on top of MySQL (150 GB)

- processed into 7 GB optimized “memory mapped dump”
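A triple store on top of a relational database can be sketched in a few lines. The example below is an assumption-laden illustration, not the schema described in the talk: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name, column names, and predicates (`capital_of`, `surface_form`) are hypothetical.

```python
import sqlite3


def make_store():
    """Create an in-memory triple store (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    # Index the most common lookup pattern: all objects for (subject, predicate).
    conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")
    return conn


def add_triple(conn, subject, predicate, obj):
    """Store one (subject, predicate, object) fact."""
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def objects(conn, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = make_store()
    add_triple(store, "Ljubljana", "capital_of", "Slovenia")
    add_triple(store, "Ljubljana", "surface_form", "Ljubljana")
    add_triple(store, "Ljubljana", "surface_form", "Laibach")
    print(objects(store, "Ljubljana", "surface_form"))
```

A store like this is convenient for editing and inspecting facts. The talk's second form, the memory-mapped dump, trades that flexibility for fast read-only access at analysis time.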

Page 32: Zemanta Tech Talk at Audible

Analysis pipeline

- Named Entity Extraction

- Known phrases extraction (Aho-Corasick)

- Triple store

- Surface form features evaluation

- Statistical comparison to background knowledge

- Semantic coherence and hand-tuned heuristics

- Disambiguated entities, etc.
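The known-phrases step names Aho-Corasick, an algorithm that finds every occurrence of many phrases in a single pass over the text. The sketch below is a plain textbook version in Python, written for illustration only: the function names `build_automaton` and `find_phrases` are made up here, matching is character-based and case-sensitive, and nothing about Zemanta's actual implementation is implied.

```python
from collections import deque


def build_automaton(phrases):
    """Build an Aho-Corasick automaton as (goto, fail, out) lists indexed by node."""
    goto, fail, out = [{}], [0], [[]]
    # 1. Insert every phrase into a trie.
    for phrase in phrases:
        node = 0
        for ch in phrase:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(phrase)
    # 2. Breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Phrases ending at the failure target also end at this node.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_phrases(text, automaton):
    """Return (start_index, phrase) for every known phrase found in text."""
    goto, fail, out = automaton
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for phrase in out[node]:
            matches.append((i - len(phrase) + 1, phrase))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["new york", "york", "new york times"])
    print(find_phrases("the new york times", automaton))
```

Overlapping matches are all reported, so a later stage (such as the disambiguation and coherence steps in the list above) still has to decide which surface forms to keep.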

Page 33: Zemanta Tech Talk at Audible

[Architecture diagram with boxes: Background knowledge, Content suggestions, Plain text (article), Analysis, Semantic search, Indexed content]

Page 34: Zemanta Tech Talk at Audible

Connecting content

• Indexing blogosphere and mediasphere

• Solr based index

• Twist: complicated queries – 50 terms

• Filtering out spam is “fun”

• Probably best “related content” in terms of accuracy

• Coming soon: social signal
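The "complicated queries – 50 terms" twist can be pictured as turning the weighted entities and phrases from analysis into one large boosted query. The sketch below only builds the query string and request parameters. The field name `text`, the weights, and the function names are assumptions for illustration; `q`, `rows`, `wt` and the `term^boost` syntax are standard Solr query features, but how Zemanta actually composed its queries is not stated in the slides.

```python
from urllib.parse import urlencode


def build_query_string(weighted_terms, field="text", max_terms=50):
    """Combine the highest-weighted terms into one boosted OR query."""
    top = sorted(weighted_terms.items(), key=lambda kv: kv[1], reverse=True)[:max_terms]
    clauses = []
    for term, weight in top:
        # Escape backslashes and double quotes so the term is a valid quoted phrase.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"^{weight:.2f}')
    return " OR ".join(clauses)


def build_select_params(weighted_terms, rows=10):
    """URL-encode parameters for a Solr select request."""
    return urlencode({"q": build_query_string(weighted_terms), "rows": rows, "wt": "json"})


if __name__ == "__main__":
    terms = {"Ljubljana": 2.0, "Seedcamp": 1.5}
    print(build_query_string(terms))
    print(build_select_params(terms))
```

Capping the number of terms keeps the query size bounded while still letting many weak signals contribute to the ranking.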

Page 35: Zemanta Tech Talk at Audible

But why just for bloggers?

Let's open up the API!

Page 36: Zemanta Tech Talk at Audible

Some API users

Page 37: Zemanta Tech Talk at Audible

Back to reality.

Page 38: Zemanta Tech Talk at Audible

Age of “smart”

Page 39: Zemanta Tech Talk at Audible

Blog me up, Scotty!

23. April 2012

Page 40: Zemanta Tech Talk at Audible

Some takeaways

• Accelerators are good

• World is getting flatter

But it will never be flat

• Start monetizing soon – to learn, not to earn

• Be where your market is

• Many markets left to innovate in

Page 41: Zemanta Tech Talk at Audible

Thank you!