![Page 1: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/1.jpg)
© 2008 Palantir Technologies Inc. All rights reserved.
Architecture & Scalability
An overview of the Palantir Server Architecture
Akash Jain, Director of Engineering
![Page 2: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/2.jpg)
Overview
Palantir Server Architecture
– A fully-featured, enterprise-grade analytic platform
– Robust, scalable, open and maintainable
In this talk
– Dispatch Server
– Oracle DB
– Search Server
– Job Server
– Raptor Server
![Page 3: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/3.jpg)
Server Architecture
[Diagram: Dispatch Server; Revisioning DB (Oracle Database Storage) over JDBC 3.0 w/ SSL; Search Server (Lucene Index Storage) over HTTPS; Job Server (Shared Storage: Job Data and Specs, Job Logs and Results) over HTTPS]
![Page 4: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/4.jpg)
Dispatch Server
Clients connect here
– “Gateway to Palantir”
– Clients can only connect here
Connects to database
– Access control
– Revisioning database
Connects to search and federated search
Responsible for job creation and scheduling
![Page 5: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/5.jpg)
Roadmap: Revisioning DB
![Page 6: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/6.jpg)
Revisioning DB
Persistence store
Oracle 10g RDBMS
Enterprise-grade
– Scalability
– Backup and Maintenance
– Industry Standard
– Large DBA community
JDBC 3.0 with SSL
[Diagram: Dispatch Server, Revisioning DB (Oracle Database Storage), JDBC 3.0 w/ SSL]
![Page 7: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/7.jpg)
Roadmap: Search Server
![Page 8: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/8.jpg)
Search Server
Built on Apache Lucene
– Leverage text processing capability
– IR Library -> Enterprise Server
– Full-text search capability
– Custom fuzzy search using approxes
Why build our own?
– Flexibility: database agnostic
– Security: built into indexes
– Scalability
[Diagram: Search Server, Lucene Index Storage]
![Page 9: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/9.jpg)
Clustered Search Scale Parameters
Palantir Search Server scales horizontally
User scale
– Number of concurrent requests
Data scale
– Additional corpora/data sources
– Also includes manually entered data
[Diagram: Search Server, Lucene Index Storage]
![Page 10: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/10.jpg)
Clustered Search Mirroring
Mirroring for User Scalability
– Redundancy across machines
– Index write requests go to all mirrors
– Search requests go to one mirror
– More mirrors -> more concurrent queries
[Diagram: three Search Mirrors, each with its own Lucene Index Storage. Index Request A goes to every mirror; Search Requests 1, 2 and 3 each go to a single mirror.]
[Diagram: "Increased Throughput". Six Search Mirrors, each with its own Lucene Index Storage, serving Search Requests 1 through 6.]
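The mirroring rules above (writes to every mirror, each search to one mirror) can be sketched in a few lines of Python. This is an illustration only, not Palantir's implementation: the `Mirror` and `MirroredCluster` classes, the in-memory substring "index", and the round-robin choice of mirror are all assumptions made for the example.

```python
from itertools import cycle


class Mirror:
    """Hypothetical in-memory stand-in for one search mirror."""

    def __init__(self, name):
        self.name = name
        self.docs = {}             # doc_id -> text
        self.searches_served = 0

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        self.searches_served += 1
        return sorted(d for d, t in self.docs.items() if term.lower() in t.lower())


class MirroredCluster:
    """Index writes go to all mirrors; each search goes to one mirror."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        self._next = cycle(self.mirrors)   # round-robin is an assumption

    def index(self, doc_id, text):
        for mirror in self.mirrors:        # every mirror holds a full copy
            mirror.index(doc_id, text)

    def search(self, term):
        return next(self._next).search(term)   # any single mirror can answer


cluster = MirroredCluster([Mirror("m1"), Mirror("m2"), Mirror("m3")])
cluster.index("A", "quarterly report")
results = [cluster.search("report") for _ in range(3)]
```

Because every mirror holds the full index, adding a mirror adds query capacity but also adds one more copy that each write must reach.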
![Page 11: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/11.jpg)
Clustered Search Partitioning
Partitioning for Data Scale
– Split data across many machines
– Search requests go to all partitions
– Index write requests go to one partition
– More partitions -> more data with constant index size
[Diagram: three Search Partitions, each with its own Lucene Index Storage. Index Requests 1, 2 and 3 each go to a single partition; Search Request A goes to every partition.]
[Diagram: "Increased Index Capacity". Six Search Partitions, each with its own Lucene Index Storage, receiving Index Requests 1 through 6.]
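Partitioning inverts the mirroring rules: each write lands on one partition, and each search fans out to all of them. The sketch below illustrates that under stated assumptions; the `Partition` and `PartitionedCluster` classes and the CRC32-based placement of documents are hypothetical choices for the example, since the slides do not say how a partition is chosen.

```python
import zlib


class Partition:
    """Hypothetical in-memory stand-in for one search partition."""

    def __init__(self, name):
        self.name = name
        self.docs = {}   # doc_id -> text

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return [d for d, t in self.docs.items() if term.lower() in t.lower()]


class PartitionedCluster:
    """Each index write goes to one partition; searches go to all partitions."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def _owner(self, doc_id):
        # Assumed placement rule: a stable hash of the document id.
        slot = zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)
        return self.partitions[slot]

    def index(self, doc_id, text):
        self._owner(doc_id).index(doc_id, text)

    def search(self, term):
        hits = []
        for partition in self.partitions:   # fan out, then merge
            hits.extend(partition.search(term))
        return sorted(hits)


pcluster = PartitionedCluster([Partition(f"p{i}") for i in range(3)])
for i in range(6):
    pcluster.index(f"doc-{i}", f"report number {i}")
all_hits = pcluster.search("report")
```

Each document is stored exactly once, so total capacity grows with the number of partitions while the index on any one machine stays bounded.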
![Page 12: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/12.jpg)
Roadmap: Job Server
![Page 13: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/13.jpg)
Job Server
The job server runs asynchronous jobs on behalf of clients
– Bulk data imports
– Persistent searches
– LDAP auth syncs
Many job servers
[Diagram: Dispatch Server, HTTPS, Job Server, Shared Storage (Job Data and Specs, Job Logs and Results)]
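One way to picture the hand-off through shared storage is a spec file written by the dispatch side and a result file written by a job server. The sketch below is a guess at the general pattern, not the actual Palantir protocol: the `specs`/`results` directory layout, the JSON format, and the function names `submit_job` and `run_pending_jobs` are all hypothetical, and it omits the locking that several concurrent job servers would need.

```python
import json
import tempfile
from pathlib import Path


def submit_job(shared, job_id, kind, params):
    """Dispatch side: write a job spec into shared storage."""
    spec_dir = shared / "specs"
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {"id": job_id, "kind": kind, "params": params}
    (spec_dir / f"{job_id}.json").write_text(json.dumps(spec))


def run_pending_jobs(shared, handlers):
    """Job server side: run every spec that has no stored result yet."""
    result_dir = shared / "results"
    result_dir.mkdir(parents=True, exist_ok=True)
    completed = []
    for spec_path in sorted((shared / "specs").glob("*.json")):
        out_path = result_dir / spec_path.name
        if out_path.exists():              # already processed
            continue
        spec = json.loads(spec_path.read_text())
        result = handlers[spec["kind"]](spec["params"])
        out_path.write_text(json.dumps({"id": spec["id"], "result": result}))
        completed.append(spec["id"])
    return completed


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    submit_job(shared, "job-1", "bulk_import", {"rows": [1, 2, 3]})
    handlers = {"bulk_import": lambda p: {"imported": len(p["rows"])}}
    first_pass = run_pending_jobs(shared, handlers)
    second_pass = run_pending_jobs(shared, handlers)   # nothing left to do
    stored = json.loads((shared / "results" / "job-1.json").read_text())
```

The client never waits on the job itself: it only needs the spec to be written, and can read the result from shared storage later.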
![Page 14: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/14.jpg)
Systems Diagram
[Diagram: network zones labeled External Network, DMZ and Internal Network. Components shown: Client; Dispatch Server; Rev DB (Oracle Database Storage) over JDBC 3.0 w/ SSL; Search Server (Lucene Index Storage); Job Server; Shared Storage (Job Data and Specs, Job Logs and Results). Links between components are labeled HTTPS.]
![Page 15: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/15.jpg)
Raptor Overview
Raptor sits in front of data sources
Raptor indexes the data source and answers search queries
Raptor monitors changes in your data source and sends them to Palantir
![Page 16: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/16.jpg)
Federated Search
Raptor is Palantir’s federated search server
– Rich data modeling
– Extensible searching
– Highly scalable indexing and search capabilities
Leverages
– Palantir Data Import Pipeline
– Palantir Clustered Search Server
With Raptor:
– Data owners control data
– You control performance characteristics
![Page 17: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/17.jpg)
Raptor Query Process
Search Query
• Hits Palantir Search Server
• Federated to Raptor Instances if applicable
• Supports both keyword search and structured queries
Results Collection
• Results are sorted using relevance from each search
Import to Palantir
• On-The-Fly (OTF) Import
• Sourcing information retained for each attribute imported
• Enables full Palantir functionality
[Diagram: a search query fans out to Raptor A, Raptor B and Raptor C, each searching its own data source; their results are combined into the Palantir Query Result.]
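The query and results-collection steps can be sketched as a fan-out followed by a relevance-ordered merge. This is a minimal illustration under assumptions: the `Hit` record, the `federated_search` function, and the sample scores are hypothetical, and it treats relevance scores from different instances as directly comparable, which the slides do not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str        # which Raptor instance produced the hit (sourcing info)
    doc_id: str
    relevance: float


def federated_search(query, instances):
    """Send the query to every instance and merge hits by relevance."""
    hits = []
    for source, search_fn in instances.items():
        for doc_id, relevance in search_fn(query):
            hits.append(Hit(source, doc_id, relevance))
    # Highest relevance first; the source stays attached to each hit.
    return sorted(hits, key=lambda h: h.relevance, reverse=True)


# Hypothetical stand-ins for three Raptor instances.
instances = {
    "Raptor A": lambda q: [("a-1", 0.9), ("a-2", 0.4)],
    "Raptor B": lambda q: [("b-1", 0.7)],
    "Raptor C": lambda q: [],
}
merged = federated_search("example", instances)
```

Keeping the `source` field on every hit mirrors the slide's point that sourcing information is retained when results are imported.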
![Page 18: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/18.jpg)
Raptor Scale Characteristics
Data Scale
– 100 million row Netflix dataset
– 10 million document usenet corpus
– 1.5 million entity extracted Wikipedia corpus
Indexing Performance
– 1m rows/hour structured indexing
– 500k docs/hour unstructured document indexing
– 100k docs/hour entity-extracted document indexing
Searching Performance
– Sub-second search processing
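As a rough back-of-the-envelope check, the dataset sizes can be divided by the quoted indexing rates. The pairing of each dataset with a rate (Netflix with structured, usenet with unstructured, Wikipedia with entity-extracted) is an assumption, as is the idea that a single instance sustains the quoted rate for the whole run.

```python
# (dataset, item count, assumed indexing rate in items per hour)
WORKLOADS = [
    ("100 million row Netflix dataset", 100_000_000, 1_000_000),
    ("10 million document usenet corpus", 10_000_000, 500_000),
    ("1.5 million entity extracted Wikipedia corpus", 1_500_000, 100_000),
]


def hours_to_index(items, items_per_hour):
    """Time to index a dataset at a constant rate."""
    return items / items_per_hour


estimates = {name: hours_to_index(n, rate) for name, n, rate in WORKLOADS}
for name, hours in estimates.items():
    print(f"{name}: about {hours:g} hours")
```

Under those assumptions the three datasets would take about 100, 20 and 15 hours respectively to index from scratch.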
![Page 19: Architecture](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b72c8f4a79591b2d8b4626/html5/thumbnails/19.jpg)
Summary
Palantir server components support a robust, scalable platform for analysis
Leverage enterprise-grade infrastructure
Raptor provides further scalability