Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
DESCRIPTION
Presented by Mark Davis, CTO, Kitenga. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Kitenga's Analyst system uses the LucidWorks Enterprise REST API in a variety of ways, including configuring collections and managing Solr schema. As part of the Kitenga platform, the ZettaSearch Designer lets the end user drag and drop search widgets to create a specialized search interface. To design search UIs that meet their needs, users must be able to understand the schema fields available in a given collection. ZettaSearch Designer interrogates the Solr infrastructure using the Lucid REST API to provide an overview of the available metadata. It is then easy for the user to build rich, facetted search experiences around the metadata library indexed into the collection. In this implementation overview, I will describe the design of ZettaSearch Designer, how it interacts with big data technologies like Hadoop as part of the indexing pipeline, and how it uses the LucidWorks API to enable user discovery of the metadata needed to create novel search user interfaces on the fly.

TRANSCRIPT
![Page 1: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/1.jpg)
Kitenga reinventing information
Mark Davis Founder/CTO
![Page 2: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/2.jpg)
Enabling Big Data Search via the Lucid ReST API
![Page 3: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/3.jpg)
Big Data
• Enormous transactional data
• Enormous unstructured information
• Too big for databases
• New tools are needed
![Page 4: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/4.jpg)
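| Decimal unit | Decimal value | Binary usage | Binary unit | Binary value |
| --- | --- | --- | --- | --- |
| kilobyte (kB) | 10^3 | 2^10 | kibibyte (KiB) | 2^10 |
| megabyte (MB) | 10^6 | 2^20 | mebibyte (MiB) | 2^20 |
| gigabyte (GB) | 10^9 | 2^30 | gibibyte (GiB) | 2^30 |
| terabyte (TB) | 10^12 | 2^40 | tebibyte (TiB) | 2^40 |
| petabyte (PB) | 10^15 | 2^50 | pebibyte (PiB) | 2^50 |
| exabyte (EB) | 10^18 | 2^60 | exbibyte (EiB) | 2^60 |
| zettabyte (ZB) | 10^21 | 2^70 | zebibyte (ZiB) | 2^70 |
| yottabyte (YB) | 10^24 | 2^80 | yobibyte (YiB) | 2^80 |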
Volume Velocity Variety
![Page 5: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/5.jpg)
Gather Resources
• Crawl
• Crack formats
Extract Metadata
• Named entities
• Categories
• Machine learning
• Semantic analysis
Index
• Schema definition
• Collection management
Indexing Challenges
• Complex, varied data
• Compute-intensive metadata generation
• Schema and collection management
![Page 6: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/6.jpg)
Initial Query
• Keyword guesses
• Category guidance
Refine Query
• Analytic tools
• Facetted guidance
Evaluate Relevance
• Read KWIC
• Read metadata
• Read document
Search Experience Challenges
• Complex, varied data
• Resource discovery
• Facetted search experience management
![Page 7: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/7.jpg)
The Solution
Enable fast metadata generation:
Hadoop, Mahout, GPUs
Manage and control collections and schema:
LucidWorks Enterprise API
![Page 8: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/8.jpg)
SQL RDBMS
Transactional Data BI Tools
Search Documents Text Classification Taxonomies Ontologies
![Page 9: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/9.jpg)
![Page 10: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/10.jpg)
Parts-of-Speech Tagging
Tokenization
Lemmatization
Finite State Transducer Finite State Transducer
Finite State Transducer
Machine-Learning
![Page 11: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/11.jpg)
Query Language
Metadata Extraction
Indexing
Facet Browsing Facet Charting
Resource Integration
Autosuggest Spellcheck
![Page 12: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/12.jpg)
• Start to POC in a week
• Open source intelligence problems
![Page 13: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/13.jpg)
GOAL: Be more competitive
SOURCES: Patents, PR announcements, legal documents, whitepapers, crawled websites
ANALYSIS: Extract named entities and relationships, classify and label; visually understand relationships and trends
ACTION: Change R&D priorities and improve marketing approaches
[Diagram labels: Sources, ZettaVox, metadata, relationships, data entities, ZettaSearch, Facetted Search and Analytics]
![Page 14: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/14.jpg)
• Understand IP among competitors
• Assist legal team with litigation
• Custom search experience
• Custom extractors:
  • Electronic parts
  • Memory types
  • Flash memory
5/15/12 . 14
![Page 15: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/15.jpg)
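| Company | Documents | Size |
| --- | --- | --- |
| Dell | 102,508 | 9 GB |
| EMC | 303,678 | 14 GB |
| Huawei | 11,912 | 890 MB |
| Kingston | 2,534 | 134 MB |
| Lenovo | 8,305 | 542 MB |
| NEC | 3,900 | 252 MB |
| Nokia | 174,681 | 22 GB |
| Panasonic | 5,804 | 473 MB |
| RIM | 181 | 8 MB |
| Sharp USA | 31,918 | 4.9 GB |
| Total | 645,421 | 60.2 GB |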
![Page 16: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/16.jpg)
GOAL: Discover new drugs, detect side-effects, speed R&D
SOURCES: Published research reports, patents, adverse effects databases, genomics and proteomics databases
ANALYSIS: Extract named entities and relationships, classify and label; visually discover trends and relationships
ACTION: Change R&D priorities
[Diagram labels: Sources, ZettaVox, relationships, data entities, pathways, sequences, ZettaSearch, Facetted Search and Analytics]
![Page 17: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/17.jpg)
• Lousy search (Google Search Appliance)
• Internal regulators can’t find by accession number
• Custom extractors:
  • Accession number
  • Ontology of active ingredients
  • Drug names
© 2012 Kitenga Proprietary 17
![Page 18: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/18.jpg)
GOAL: Build “second screen experiences”
SOURCES: Wikipedia, IMDB, blogs
ANALYSIS: Extract named entities and relationships, preserve existing structural metadata
ACTION: Enable new media experiences
[Diagram labels: Sources, ZettaVox, metadata, relationships, data entities, ZettaSearch, Facetted Search and Analytics]
![Page 19: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/19.jpg)
• Crawlers on Hadoop
• Document format crackers on Hadoop
• Extractors on Hadoop
• Filters on Hadoop
• HTTP documents to Solr sharded cluster
• Intermediary files remain on HDFS for reprocessing
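The last pipeline step, sending documents over HTTP to a sharded Solr cluster, might look roughly like the sketch below. It is not Kitenga's implementation: the hash-based routing, the shard URLs, and the use of Solr's JSON update handler at `/update` are assumptions made for illustration (the handler path differs between Solr releases, so check the version in use). The code only builds the requests; nothing is sent.

```python
import json
import zlib
from urllib.request import Request


def pick_shard(doc_id, shard_urls):
    """Choose a shard deterministically from the document id (assumed routing rule)."""
    index = zlib.crc32(doc_id.encode("utf-8")) % len(shard_urls)
    return shard_urls[index]


def build_update_requests(docs, shard_urls):
    """Group documents by shard and build one JSON POST request per shard."""
    batches = {}
    for doc in docs:
        batches.setdefault(pick_shard(doc["id"], shard_urls), []).append(doc)

    requests = []
    for shard_url, batch in batches.items():
        requests.append(
            Request(
                url=f"{shard_url}/update?commit=true",  # assumed handler path
                data=json.dumps(batch).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


if __name__ == "__main__":
    # Hypothetical shard URLs and documents.
    shards = ["http://solr1:8983/solr/patents", "http://solr2:8983/solr/patents"]
    docs = [{"id": f"doc-{i}", "company": "Dell"} for i in range(5)]
    for request in build_update_requests(docs, shards):
        print(request.get_method(), request.full_url, len(json.loads(request.data)))
```

Each request could then be passed to `urllib.request.urlopen` from within a Hadoop task. Because the intermediary files stay on HDFS, a failed batch can be rebuilt and resent without re-crawling.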
![Page 20: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/20.jpg)
• Missing piece of the puzzle
• Addresses the impedance mismatch between Big Data technologies and Solr search
• Manage collections
• Manage schema
![Page 21: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/21.jpg)
![Page 22: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/22.jpg)
![Page 23: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/23.jpg)
• Create collections
• Delete collections
• Update collection properties
• Create schema
• Modify schema
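The collection operations listed above can be sketched as plain HTTP requests. The slides do not show endpoints, so the base URL, the `/collections` and `/collections/<name>/settings` paths, and the JSON payload shapes below are assumptions for illustration; verify them against the LucidWorks Enterprise REST API documentation for your release. The helpers only construct requests and do not contact a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL of the REST API; adjust host and port for a real install.
API_ROOT = "http://localhost:8888/api"


def _json_request(method, path, payload=None):
    """Build a JSON request against the (assumed) REST API root."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(
        API_ROOT + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_collection(name):
    """Request that would create a collection (assumed endpoint and payload)."""
    return _json_request("POST", "/collections", {"name": name})


def delete_collection(name):
    """Request that would delete a collection (assumed endpoint)."""
    return _json_request("DELETE", "/collections/" + quote(name, safe=""))


def update_collection_settings(name, settings):
    """Request that would update collection properties (assumed endpoint)."""
    path = "/collections/" + quote(name, safe="") + "/settings"
    return _json_request("PUT", path, settings)


if __name__ == "__main__":
    request = create_collection("patents")  # hypothetical collection name
    print(request.get_method(), request.full_url, request.data)
```

Keeping request construction separate from sending makes the calls easy to log or replay from an indexing job.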
![Page 24: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/24.jpg)
• Schema interrogation
• Schema binding to user experience
• Facetted search
• Embedded analytics
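One way to picture binding the schema to the user experience: take the field descriptions returned by schema interrogation, keep the ones suitable for faceting, and turn them into a facetted Solr query. In this minimal sketch the field dictionary shape (`name`, `indexed`, `facet`) is hypothetical; the query parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def facetable_fields(fields):
    """Return names of fields that are indexed and flagged for faceting."""
    return [f["name"] for f in fields if f.get("indexed") and f.get("facet")]


def build_facet_query(user_query, fields, limit=10):
    """Build a Solr query string with one facet.field per facetable field."""
    params = [
        ("q", user_query),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),
    ]
    params += [("facet.field", name) for name in facetable_fields(fields)]
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical result of interrogating a collection's schema.
    schema_fields = [
        {"name": "company", "indexed": True, "facet": True},
        {"name": "memory_type", "indexed": True, "facet": True},
        {"name": "body", "indexed": True, "facet": False},
    ]
    print(build_facet_query("flash memory", schema_fields))
```

A designer tool could show the output of `facetable_fields` as the palette of widgets a user may drag into a search page, then issue the generated query when the page is used.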
![Page 25: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/25.jpg)
![Page 26: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/26.jpg)
![Page 27: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/27.jpg)
• Big Data search and analytics has many challenges:
  • Volume of data
  • Variety of data
  • Velocity of data
  • Extracting structure from unstructured information
• Hadoop processing addresses each of these challenges
• Controlling indexing and search is enabled by the Lucid Imagination search API
• We can enable complex user interactions with Big Data on a self-serve basis
![Page 28: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/28.jpg)
[Architecture diagram. Clients: ZettaVox Author RIA, Analyst Browser. Enterprise servers: Tomcat App Server, Tomcat Web Services, ZettaVox Services Manager (XML + JSON), GPU Services Manager, Hadoop Services Manager, RDBMS. Enterprise cloud and cloud services: Amazon S3, GPU MR Service Manager with GPUs, Hadoop Server (Job Tracker, Name node), Hadoop Task Managers, Quantum4D. Functions: Crawling, Entity Extraction, Mahout, Search Indexing. Interfaces: ReST, JSON.]
![Page 29: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/29.jpg)
[Architecture diagram, indexing path. Clients: ZettaVox Author RIA, Analyst Browser. Enterprise servers: Hadoop Server (Job Tracker, Name node), Hadoop Task Managers. Functions: Crawling, Entity Extraction, Mahout, Search Indexing. Interfaces: ReST, JSON.]
Indexing
• Get collection information
• Create new collection
• Create fields
• Delete fields
• Edit fields
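The field operations in this list lend themselves to a small planning step: compare the fields a collection already has with the fields the extractors are about to produce, and derive the create, edit, and delete calls needed. The sketch below is illustrative only. The `/api/collections/<name>/fields` paths and the property names (`field_type`, `facet`) are assumptions, not confirmed API details, and the function returns planned operations rather than sending anything.

```python
def plan_field_changes(collection, existing, desired):
    """Return (method, path, payload) tuples that turn `existing` into `desired`.

    Both arguments map a field name to a dict of field properties.
    """
    base = f"/api/collections/{collection}/fields"  # assumed endpoint
    operations = []

    for name in sorted(desired):
        properties = desired[name]
        if name not in existing:
            # New field: create it.
            operations.append(("POST", base, {"name": name, **properties}))
        elif existing[name] != properties:
            # Field exists with different properties: edit it.
            operations.append(("PUT", f"{base}/{name}", properties))

    for name in sorted(set(existing) - set(desired)):
        # Field no longer wanted: delete it.
        operations.append(("DELETE", f"{base}/{name}", None))

    return operations


if __name__ == "__main__":
    # Hypothetical current and target schemas for a "patents" collection.
    current = {
        "title": {"field_type": "text_en", "facet": False},
        "legacy_code": {"field_type": "string", "facet": False},
    }
    target = {
        "title": {"field_type": "text_en", "facet": False},
        "company": {"field_type": "string", "facet": True},
    }
    for operation in plan_field_changes("patents", current, target):
        print(operation)
```

Planning the changes first keeps schema edits idempotent: rerunning an indexing job against an already-updated collection yields an empty plan.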
![Page 30: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/30.jpg)
Questions?
![Page 31: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience](https://reader033.vdocuments.net/reader033/viewer/2022060122/559607c21a28abac6e8b45ac/html5/thumbnails/31.jpg)