bright planetm - wikileaksbrightplanet' a - in harvesting h.-i!:' access data haweskd from...

5
Harvest "unknown and hidden" content from the Deep Web. For over ten years, BrightPlanet has been the pioneer in harvesting high quality unstructured content from the Deep Web and then making it accessible for those who need the valuable, untapped resources that lie beneath the Surface Web for deep research and analysis. The company has more than six years of experience working with the United States Intelligence Community (IC) in its 'War on Terror' to target and access data hidden beyond the reach of typical Web search engines. Now, BrightPlanet is bringing its patented Deep Web Hdng Technologies and Deep Analytics Sdutions to the commercial community. Bright Planetm Pioneerrink m - w - ' ,"" v Deep Web Deep Web Content Silos BrightPknet harvests )opic-spedfic content R Deep Web Silos for research, +, monitoring and tracking. Visit www.brightplanet.com and learn more about: Deep Web Silos - Find, harvest and normalize "unknown and hidden", unstructured content from the Deep Web for research and analytics. Deep Web Monitor - Target, Monitor, Track & get Alerts from targeted web sites, blogs, tweets, reports, newsfeeds and more. open~lanet@ Dashboard - Employ 3rd party technology options from the OpenRanet Platform for enriched visualization, entity extraction, storage, etc., to create full scope analytic solutions. ~ee~~atvester~ Workbench - Integrate our standalone, lightweight DeepHarvester Workbench behind your firewall as part of an enterprise solution for high security harvesting.

Upload: others

Post on 10-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bright Planetm - WikiLeaksBrightplanet' A - in Harvesting h.-I!:' Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific

Harvest "unknown and hidden" content from the Deep Web.

For over ten years, BrightPlanet has been the pioneer in harvesting high quality unstructured content from the Deep Web and then making it accessible for those who need the valuable, untapped resources that lie beneath the Surface Web for deep research and analysis.

The company has more than six years of experience working with the United States Intelligence Community (IC) in its 'War on Terror' to target and access data hidden beyond the reach of typical Web search engines. Now, BrightPlanet is bringing its patented Deep Web Hdng Technologies and Deep Analytics Sdutions to the commercial community.

Bright Planetm Pioneerrink m - w - ' ,""

v Deep Web

Deep Web Content Silos

BrightPknet harvests )opic-spedfic content R Deep Web Silos for research, +, monitoring and tracking.

Visit www.brightplanet.com and learn more about:

Deep Web Silos - Find, harvest and normalize "unknown and hidden", unstructured content from the Deep Web for research and analytics.

Deep Web Monitor - Target, Monitor, Track & get Alerts from targeted web sites, blogs, tweets, reports, newsfeeds and more.

open~lanet@ Dashboard - Employ 3rd party technology options from the OpenRanet Platform for enriched visualization, entity extraction, storage, etc., to create full scope analytic solutions.

~ e e ~ ~ a t v e s t e r ~ Workbench - Integrate our standalone, lightweight DeepHarvester Workbench behind your firewall as part of an enterprise solution for high security harvesting.

Page 2: Bright Planetm - WikiLeaksBrightplanet' A - in Harvesting h.-I!:' Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific

I License a Workbench for Hands-on Control of your Deep Web ~awestini. BrightPlanet's ~ e e ~ ~ a r v e s t e r ~ Workbench provides the most comprehensive and content normalization system on the market today-at the scale of the internet. I

BrightPlanet offers the DeepHarvester Workbench for those who need to be in control of their own harvesting needs within their own infrastructure.

The Workbench can be used as a standalone lightweight Web user interface or can be tightly integrated into a custom or enterprise solution through the open~lanet@ Dashboard.

7 Deep Web

- I

I Proprietary Silos

For added security, Eccnse the Workbench for use behind your firmmu.

BrightPlanet has developed a patented, heuristic, rule-based expert system for automatically communicating with Deep Web sources that does not require one-off scripts to be built by hand. The DeepHawester Workbench is wrapped around Brightplanet's flagship DeepHarvester Platform, which has been developed and refined over the past 10 years. The DeepHarvester features include:

+ Harvesting from sites that require the use of a query + Harvesting inline content options: + Integrating with internally Deep Web sources images, CSS and JavaScript files + Leveraging existing surface web search engines + Supporting proxy servers: + Harvesting using traditional crawl or surface web techniques anonymization through 3rd party solutions + Harvesting links through RSS feeds + Providing a multi-thread harvest engine

+ Supporting standard crawler features: built on a distributed platform

Optionally honoring Robots.txt rules + Accessing the openplanet@ Platform:

Customizable user-agent tags custom normalization, andytics and storage

Timeout and redirect limit settings + Working with Brightplanet's Deep Web

Support for session cookies Source Repository

+ Integrating with internal Deep Web soarces + Harvesting and profile management:

+ Scripting custom source option Java, Web Services or RMI API

BrighPhnet has provided thee-year licenses ofthe ~cep~arves ter@ to US. Government agencies, &allowing them to harvest behind their$rewaILr. While BnjghtPhnct highly recommmdr using its Content Navigators, our experienced personnel, to navigate and harvest the Deep Web as well as optimia Deep Web Content Silos, i t is nmu phased to make the DeepHarvester Workbench option avaiLble to the commercial market.

Page 3: Bright Planetm - WikiLeaksBrightplanet' A - in Harvesting h.-I!:' Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific

Brightplanet' A - in Harvesting h. -I!:'

Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific content harvested horn the Deep Web, available for either lease or purchase for Deep analytics. Each Silo is organized to hold documents relevant to a specific -

topic area and all unstructured content is normalized, tagged (enriched) and or@ for Deep Web analytics. Using the Brightplanet developed standard portal interface, you can dccm volumes of new datafir research purposes or integration with your existing analytic technology solutions. Unlike a traditional search engine that undermines its results with links merely based on

Deep Web Content Sil

popularity, each Deep Web Silo is filled with high-quality ~ r i ~ h ~ h ~ ~ ~ ham* topicspecific to web silos content from topic-specific Deep Web sources that lie for research, analysis, monitoring and tracking.

beyond the reach of traditional search engines.

Content Navigators BrightPlanet's experienced Content Navigators can create, manage, and maintain Deep Web Silos as a service, thereby leveraging for its customers the complexity of harvesting, normalizing, enriching and qualifying content. Working with your subject matter experts, BrightManet's Content Navigators will identify and configure the necessary Deep Web sources, construct all harvest profiles, schedule automated harvests, and tune analytic enrichment technologies to your specific project needs. Our refined process makes the best quality content readily available for your analysts and analytic technologies as our Brightplanet's Content Navigators move quickly to stand up a custom Deep Web Silo specific to your business needs. Afrer creating Deep Web Silos, BrightPlanet offers two options for accessing enriched content:

l) lease of Silo access, or 2) purchase of Silo as a proprietary resource

Brightplanet will host content for all Deep Web Silos or install the Silo at the customer's facility.

Multiple Uses Deep Web Silo usages vary by customer. Some customers may want a simple enterprise search interface to easily leverage harvested Web content. Some may choose to integrate the Silo with proprietary content to offer an even richer analytic opportunity. Still others may need hrther enriched content to rune or train analytic technologies on a topic-specific training set. No matter what the use, BrightPlanet will build the right Deep Web Silo for your content requirements.

flexible, Scalable Technology Deep Web Silos can be accessed using industry standard protocols, making them easy to integrate with existing content or custom solutions. Common solutions include a standard enterprise search portal interface, JDBC access, Apache Solr interface, direct access through MySQL Schema, MarkLogic@, Saffron Sierram, and even standard XML.

For many customers, the standard enterprise search portal interface will offer enough flexibility to begin leveraging the value of Deep Web content without extensive integration or IT costs. Our standard search enterprise portal interface indudes keyword search, faceted searching, clustering and various optional modules through a simple-to-use, Google-like interface. Situations that require a more robust storage, enriched visualization, analytics, or non-standard protocols can always be implemented through options within Brightplanet's ~ ~ e n ~ & n e t @ Dasbboardsolution.

Page 4: Bright Planetm - WikiLeaksBrightplanet' A - in Harvesting h.-I!:' Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific

t DEEP WEB-

PO 'OR

e$ b4d

Bright Planet' ' Pioneers in Harvehg -'-L =m

The Deep Web Monitor is a user-driven solution that allows you to set up your own monitoring harvests through a simple-to-use Web interface. If your research requires constant attention, the Deep Web Monitor will harvest content from sources YOU target. You'll always be assured of updates to the latest information from websites, message boards, blogs, tweets, reports, and news feeds from wherever it is on the Web.

Deep Web Monitor will run your profiles on an automated schedule to make sure you have constant access to new information for deep intelligence, competitive analysis, intellectual property, reputation management, company activities, people profiles, organizational intrigue and the latest trends. Set up your own Web monitoring

throughasimpleuscrinkrface.

Target Organizations or Companies - Stay on top of any company, organization, or group - what they're doing, what they're saying and what's being said about them. Know the biases of all the players by deploying additional analytics to detect their sentiments, relationships, and associations.

Target Products, Technologies or Competitive Analysis - Using product names, models or other unique characteristics to scrutinize their trend-lines, market share, enhancements, R&D and pitfalls. Additional analytics are available to help identify the sentiments or relationships of the relevant sources.

Target Intellectual Property Research - The Monitor can be a sentinel for monitoring the competition, product and technology updates, patent applications, invalidity reports, prior art, inventors, news, blogs, third party threats, trade rags, and more.

K F Y B F N F F I T S

Set-it-and-forget-it scheduling Harvest from specific sites of interest Focus on mission-critical business topics View content trends over time Filter only new or modified content Display statistical tracking reports for changes over time

Page 5: Bright Planetm - WikiLeaksBrightplanet' A - in Harvesting h.-I!:' Access Data Haweskd from the Deep Web for Research and Analysis. Deep Web Silos are repositories for topic-specific

Harvesting the Deep Web for Analytics

Deep Web

L

Beyond the Surface Any Content Type Any Source Any Language Big-Data

Tag, Tune, Normalize & Enrich "Unstructured"

Content to "Semi-Structured" for

Analytics _r

Faceted Search Topic Clustering Semantic Categorization Link Analysis (3d Party Options)