Trend Detection and Visualization and Custom Search Applications
DESCRIPTION
This seminar deals with trend detection in numbers and text and its visualization. In the second part, it focuses on custom search applications, Apache Solr, semantic search and the Linked Data approach.
TRANSCRIPT
Trend Detection and Visualization and
Custom Search Applications
Seminar for
PG PUSHPIN
January 12, 2012
Pranav Kadam (6641525)
Universität Paderborn
Overview
• Trend Detection
- Trend Detection in Numbers
- Trend Detection in Text
- Trend Visualization
• Custom Search Applications
- Apache Solr
- Semantic Search
- Linked Data Approach
• Prototypes
• Q&A
Trend Detection
What is a trend?
• A general direction in which something is changing
• An inclination
• A pattern of gradual change in a condition over time
• A trend is
- always associated with time
- often described using a time series
• A long-term change in the mean level of a time series
Trend Analysis
• The practice of collecting information and trying to detect
a trend in it
• The process of identifying a pattern in the behavior of a time
series by minimizing noise
• Useful in forecasting future events
• The science of studying changes in social patterns
E.g. Google Trends, YouTube Trends, trendwatching.com,
Facebook Insights, tag cloud (on the PG PUSHPIN blog)
Trend Detection in Numbers
Time series and statistical methods
• Time series: an ordered sequence of values at equally
spaced time intervals
• Trend detection in numbers: statistical methods to
interpret a time series and determine its behavior
• Assumption: patterns in past data can be used to forecast
future data points
• Models: AutoRegressive (AR), Integrated (I), Moving
Average (MA)
Moving Average
• Average of time series data taken over consecutive periods
• New data in, old data out as the series progresses
E.g. MA of temperature over six months: temperatures from January
to June, February to July, March to August, and so on.
• Minimizes temporal fluctuations
• Establishes the trend and distinguishes any value above or
below the trendline
• Applications in the fields of financial analysis, trade,
economics and mathematics
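The "new data in, old data out" behavior can be sketched in a few lines of Python. The monthly temperatures are made-up illustrative values, and `moving_average` is a hypothetical helper name, not something taken from the seminar material.

```python
from collections import deque

def moving_average(values, period):
    """Yield the mean of each run of `period` consecutive values."""
    window = deque(maxlen=period)  # the oldest value drops out automatically
    for value in values:
        window.append(value)
        if len(window) == period:
            yield sum(window) / period

# Illustrative (made-up) monthly mean temperatures in °C, January to August.
temps = [1, 2, 6, 10, 15, 18, 20, 19]

# Three six-month windows: Jan-Jun, Feb-Jul, Mar-Aug.
six_month_ma = list(moving_average(temps, 6))
```

Eight monthly values give three six-month averages, one for each window named in the example above.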
• Simple Moving Average (SMA): plain average of the data points
over a specific number of periods
• The period selected can be short, medium or long according
to interest (e.g. standard SMA periods for stock
market analysis are 50 days or 200 days)
• A longer period gives a smoother curve but increases
the lag
• SMA always lags behind the latest data point
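The trade-off between period length and lag can be checked on a series that rises by one unit per step. This is a minimal sketch: `sma` is a hypothetical helper, and the ramp is synthetic data chosen so that the lag is easy to read off.

```python
def sma(values, period):
    """Simple moving average: plain mean of the last `period` values."""
    return [
        sum(values[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(values))
    ]

ramp = list(range(1, 21))          # synthetic series 1, 2, ..., 20

short = sma(ramp, 3)               # short period
long_ = sma(ramp, 9)               # long period

# Distance between the latest data point and the latest SMA value.
lag_short = ramp[-1] - short[-1]
lag_long = ramp[-1] - long_[-1]
```

On such a steadily rising series, an SMA over n periods trails the latest value by (n - 1) / 2 steps, so the 9-period average lags further behind than the 3-period one.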
• Exponential Moving Average (EMA): weights are applied to the data
points to reduce the lag
• The weight decreases exponentially for older data points and never reaches zero
• EMA has less lag and is more sensitive to changes in the
data points
• SMA vs EMA: though the difference is apparent, neither
can be stated as better than the other
MA preference depends on objectives and time horizon
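A minimal EMA sketch follows. The smoothing factor 2 / (period + 1) and seeding with the first data point are common conventions assumed here; the slides do not prescribe them. The step-shaped series is synthetic.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2 / (period + 1)       # common convention; an assumption here
    result = [values[0]]           # seed with the first data point
    for value in values[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result

def ema_weights(period, count):
    """Weights given to the newest, second newest, ... data point."""
    alpha = 2 / (period + 1)
    return [alpha * (1 - alpha) ** age for age in range(count)]

series = [10, 10, 10, 10, 20, 20, 20, 20]   # synthetic step change
smoothed = ema(series, 3)
weights = ema_weights(3, 5)
```

The weights shrink by a constant factor with each step back in time, which is why they get arbitrarily small but never reach zero.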
Trend Detection in Text
Trend detection system
• Emerging trend: a topic area growing in interest and
utility over time
• The study of emerging trends depends on an automated
process
• A trend detection (TD) system processes a collection of textual data and
identifies an upward (growing), downward (falling) or
sideways (constant) tendency
• The TD system then highlights the emerging topics in the trial period
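One simple way to label a tendency is to fit a straight line through the number of documents that mention a topic in each period. The sketch below is only an illustration of the idea: the least-squares slope, the `tolerance` threshold and the counts are assumptions, not the method of any particular TD system.

```python
def classify_tendency(counts, tolerance=0.1):
    """Label a sequence of per-period topic counts as a tendency.

    Fits a least-squares line through the counts and compares its slope,
    relative to the mean count, with `tolerance` (an assumed threshold).
    Expects at least two periods.
    """
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    relative_slope = slope / mean_y if mean_y else 0.0
    if relative_slope > tolerance:
        return "upward"
    if relative_slope < -tolerance:
        return "downward"
    return "sideways"

# Made-up counts of documents mentioning a topic in five consecutive periods.
growing = classify_tendency([2, 4, 7, 11, 16])
falling = classify_tendency([15, 12, 8, 5, 3])
constant = classify_tendency([9, 10, 9, 10, 9])
```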
• Trend detection methods can be classified as:
- Fully-automatic
- Semi-automatic
• Fully-automatic systems:
- The system generates a list of emerging topics from the
input (a collection of textual data)
- A reviewer examines the data and evidence provided to identify
the actual emerging trends
- Results are supported with graphical visualization
• Semi-automatic systems:
- The user inputs a topic
- The system outputs evidence that helps to determine whether
the topic is emerging or not
- Evidence is provided either as a summary or as a descriptive
report
Useful models, schemes and tools
• Term-Document Matrix
• Scheme: Term Frequency – Inverse Document
Frequency (tf-idf)
• Latent Semantic Analysis
• Science Citation Index or Web of Science database
• Inspec and Compendex databases
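The first two items can be illustrated on a tiny corpus. The sketch below builds a term-document matrix and weights it with one common tf-idf variant (raw term frequency times log(N / df)); other variants exist, and the three documents are made up.

```python
import math
from collections import Counter

# Tiny made-up corpus; each document is already tokenized.
docs = [
    ["trend", "detection", "in", "text"],
    ["trend", "visualization"],
    ["semantic", "search", "in", "text"],
]

vocabulary = sorted({term for doc in docs for term in doc})

# Term-document matrix: one row per term, one column per document.
counts = [Counter(doc) for doc in docs]
term_document = {term: [c[term] for c in counts] for term in vocabulary}

def tf_idf(term, doc_index):
    """tf-idf with raw term frequency and idf = log(N / df)."""
    tf = term_document[term][doc_index]
    df = sum(1 for count in term_document[term] if count > 0)
    return tf * math.log(len(docs) / df)

weights = {term: [tf_idf(term, i) for i in range(len(docs))] for term in vocabulary}
```

A term that occurs in only one document ("visualization") gets a higher weight there than a term shared by several documents ("trend").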
Approaches for Trend Detection
1. Tracing a trend via citation linkages:
- Determine a potential trend or select a topic of interest
- Find recent documents on the topic
- Examine whether they really discuss the topic
- Extract keywords
- Fetch the abstracts of the documents that are frequently
referenced, using citation information
- Examine the abstracts to verify their relation to the topic
- Examine the references used above and make a subset
where author names are referenced in more than, say, 3
documents
- As an improvement, query the repositories of citation
linkage information and other sources
- Graph document frequency, repeated authors and number of
venues by year
- Years with an overall higher document frequency are likely
to contain the points where the trend is emerging
Finally, to determine the trend, apply a series of thresholds,
such as at least one repeated author, at least 10 venues
present, etc.
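The final thresholding step can be sketched as follows. The records, the author and venue names, and the helper `passes_thresholds` are all hypothetical; only the two example thresholds (at least one repeated author, at least 10 venues) come from the text above.

```python
from collections import Counter

# Made-up records: (year, author, venue) for documents on one topic.
documents = [
    (2008, "A. Author", "Venue 1"),
    (2009, "A. Author", "Venue 2"),
    (2009, "B. Author", "Venue 3"),
    (2010, "A. Author", "Venue 1"),
    (2010, "C. Author", "Venue 4"),
    (2010, "B. Author", "Venue 5"),
]

def passes_thresholds(records, min_repeated_authors=1, min_venues=10):
    """Apply the thresholds named above; the default values mirror the text."""
    author_counts = Counter(author for _, author, _ in records)
    repeated_authors = [a for a, n in author_counts.items() if n > 1]
    venues = {venue for _, _, venue in records}
    return len(repeated_authors) >= min_repeated_authors and len(venues) >= min_venues

# Values that would be graphed by year.
documents_per_year = Counter(year for year, _, _ in documents)

strict = passes_thresholds(documents)                 # needs at least 10 venues
relaxed = passes_thresholds(documents, min_venues=5)  # hypothetical lower bar
```

With only five distinct venues the sample fails the 10-venue threshold, although it has repeated authors and a rising document count per year.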
2. Using web resources:
- Select a main topic area first
- Knowledge in this area is essential to identify trends in
later stages
- Validate it as a possible research area using sources like
Inspec database
- Search workshop websites and technical papers for
discussions on the main topic area
- Search the web using helper terms like
"most recent contribution", "hot topic", "cutting edge strategy", etc.
- Search an indexing database again with the query
main topic AND newly found candidate trend,
from the year of origin to the current year
- If the document frequency increases over the years, the
candidate trend is a genuine trend
- If documents from the same author appear in different years,
it is not a trend
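The two checks above can be written down as a small function. This is a sketch under simplifying assumptions: "increases over the years" is read as a strict rise from each year to the next, and "same author" as there being only one distinct author. The search results are made up.

```python
from collections import Counter

def is_genuine_trend(records):
    """Check a candidate trend from (year, author) pairs of matching documents.

    Requires the yearly document frequency to rise from one year to the next
    and more than one distinct author; both rules are simplifying assumptions.
    """
    per_year = Counter(year for year, _ in records)
    years = sorted(per_year)
    frequencies = [per_year[year] for year in years]
    rising = all(a < b for a, b in zip(frequencies, frequencies[1:]))
    authors = {author for _, author in records}
    return len(years) > 1 and rising and len(authors) > 1

# Made-up search results for "main topic AND candidate trend".
growing = [(2009, "A"), (2010, "A"), (2010, "B"), (2011, "B"), (2011, "C"), (2011, "D")]
same_author = [(2009, "A"), (2010, "A"), (2011, "A")]
```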
Trend Visualization
Trend visualization techniques
• Trends can be visualized using
- Line graphs
- Bar graphs
- Word clouds
- Frequency tables
- Sparklines
- Histograms
Other ways to visualize trends
• ThemeRiver
- Visualizes thematic variations over time
- Changing widths depict changes in thematic strength of
the associated documents
- Flow represents time
- Colors represent themes
- Vertical section represents an ordered time slice
- Assigning the same color group to related themes simplifies
their tracking
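The geometry behind such a river can be sketched without any plotting library: for every time slice, the themes are stacked on top of each other, and each layer's width equals the theme's strength. Centring the stack on a horizontal axis at 0 is an assumed layout rule for this sketch, and the strengths are made up.

```python
def river_layers(strengths_by_theme):
    """Compute (lower, upper) boundaries per theme for each time slice.

    `strengths_by_theme` maps a theme name to its strength in each time
    slice. Layers are stacked and centred on a horizontal axis at 0,
    which is an assumed layout rule for this sketch.
    """
    themes = list(strengths_by_theme)
    slices = len(next(iter(strengths_by_theme.values())))
    layers = {theme: [] for theme in themes}
    for t in range(slices):
        total = sum(strengths_by_theme[theme][t] for theme in themes)
        lower = -total / 2
        for theme in themes:
            upper = lower + strengths_by_theme[theme][t]
            layers[theme].append((lower, upper))
            lower = upper
    return layers

# Made-up thematic strengths for three time slices.
strengths = {"solr": [2, 4, 6], "rdf": [4, 4, 2]}
layers = river_layers(strengths)
```

Each (lower, upper) pair is one vertical section of a colored band; drawing the bands slice by slice and smoothing between slices gives the river shape.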
• SparkClouds
- SparkClouds = Sparklines + Tag Clouds
- Sparklines, characterized by small size and high data density,
visualize trends and variations in a simple, condensed way
- Tag clouds are text-based visualizations showing the
frequency, popularity or importance of words
- Sparklines are added to tag clouds to represent trends across
a series of tag clouds
- An overview of trends is provided in limited space
- It is compact and aesthetic
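A plain-text approximation of the idea: each tag is shown with its latest count and a sparkline of its history across a series of tag clouds. In a graphical SparkCloud the count would be encoded as font size; here it is printed in parentheses. The tag frequencies and both helper names are made up for illustration.

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a sequence of integers as a one-line bar sparkline."""
    low, high = min(values), max(values)
    span = (high - low) or 1          # avoid division by zero for flat series
    return "".join(BARS[(v - low) * (len(BARS) - 1) // span] for v in values)

def spark_cloud(tag_counts):
    """Pair each tag with its latest count and a sparkline of its history."""
    rows = []
    for tag, history in sorted(tag_counts.items(), key=lambda kv: -kv[1][-1]):
        rows.append(f"{tag} ({history[-1]}) {sparkline(history)}")
    return rows

# Made-up tag frequencies from four consecutive tag clouds.
tags = {"solr": [1, 3, 5, 8], "rdf": [6, 4, 3, 1], "trend": [2, 2, 2, 2]}
for row in spark_cloud(tags):
    print(row)
```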
Custom Search Applications
Apache Solr
• Open source search platform from the Apache Lucene
project
• Provides full-text search, faceted search, dynamic
clustering, database integration, rich document handling,
geospatial search
• High scalability, distributed search
• The core of the search and navigation engine of some of the
world's largest internet sites
• Written in Java, runs as a standalone search server
within a servlet container like Jetty or Tomcat
• REST-like API eases its use from any programming language
• Input: documents as XML, JSON or binary over HTTP; queries via HTTP GET
• Output: XML, JSON or binary
• Highly customizable
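Because queries are plain HTTP GET requests, a client only has to build a URL. The sketch below assembles one for the select handler with a JSON response; the host, port and core name are placeholders for a local test setup, and the exact URL layout depends on the installation.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, query, rows=10):
    """Build a GET URL for Solr's select handler with a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Host, port and core name below are placeholders for a local test setup.
url = solr_select_url("http://localhost:8983/solr", "pushpin", "title:trend")
```

The resulting URL can be opened with any HTTP client; `wt` selects the output format.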
• Operations:
- Indexing data
- Updating data
- Deleting data
- Querying data
- Sorting
- Highlighting
- Faceted search
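The operations above map onto small messages and request parameters. The sketch below shows them as Python dictionaries in the shape of Solr's JSON update commands and common query parameters, as far as the author of this example understands them; field names such as `title` are hypothetical and must match the schema, and the details should be checked against the Solr version in use.

```python
import json

# Indexing: add a document. Field names are hypothetical and must match the schema.
add_command = {"add": {"doc": {"id": "doc-1", "title": "Trend detection"}}}

# Updating: re-add a document with the same unique key to replace the old version.
update_command = {"add": {"doc": {"id": "doc-1", "title": "Trend detection in text"}}}

# Deleting: remove a document by its unique key.
delete_command = {"delete": {"id": "doc-1"}}

# Querying with sorting, highlighting and faceting switched on.
query_params = {
    "q": "title:trend",
    "sort": "id asc",
    "hl": "true",
    "hl.fl": "title",
    "facet": "true",
    "facet.field": "title",
}

payloads = [json.dumps(c) for c in (add_command, update_command, delete_command)]
```

The JSON payloads would be sent to the update handler over HTTP, and the query parameters appended to a select URL.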
Semantic Web
• An extension to current Web
• Information is given well-defined meaning
• Goes beyond media objects to link people, places, events,
organizations, etc.
• Resources connected by multiple relations
• Data modeled using directed labeled graph
• Based on W3C's RDF; instance data in RDF is queried and
exchanged using SOAP
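A directed labeled graph can be written down as a list of (subject, predicate, object) triples. The facts below reuse the node and edge labels of the Steve Jobs example graph on this slide; which edge connects which pair of nodes is partly an assumption, since only the labels survive in the transcript. The helper names are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple, i.e. a labeled, directed edge.
triples = [
    ("Steve Jobs", "type", "Businessman"),
    ("Steve Jobs", "born on", "February 24, 1955"),
    ("Steve Jobs", "died on", "October 5, 2011"),
    ("Steve Jobs", "birthplace", "San Francisco"),
    ("Steve Jobs", "co-founder", "Apple Inc."),
    ("Steve Jobs", "co-founder", "Pixar"),
    ("Apple Inc.", "type", "Company"),
    ("Pixar", "type", "Company"),
    ("San Francisco", "type", "City"),
    ("San Francisco", "located in", "USA"),
    ("San Francisco", "temp", "9°C"),
]

def outgoing(subject):
    """Return the labeled edges that leave the given node."""
    return [(p, o) for s, p, o in triples if s == subject]

def objects(subject, predicate):
    """Follow one edge label from a node."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

For example, `objects("Steve Jobs", "co-founder")` follows both co-founder edges, and `outgoing("San Francisco")` lists everything stated about the city.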
[Figure: directed labeled graph describing Steve Jobs. Nodes: Steve Jobs, Businessman, February 24, 1955, October 5, 2011, San Francisco, City, USA, 9°C, Apple Inc., Pixar, Company. Edge labels: type, co-founder, birthplace, located in, born on, died on, temp.]
Semantic Search
• Context-based search results
• Can possibly enhance, but cannot replace the traditional
navigational search
• Disambiguation
• Data divided as ontological data and instance data
• Determines the meaning of every word and establishes a
context between them to achieve coherence for a
sentence
• Search Methodologies:
- RDF Path Traversal
- Keyword Concept Mapping
- Graph Patterns
- Logics
- Fuzzy Concepts, Fuzzy Relations, Fuzzy Logics
• Examples
- Hakia, SenseBot, DeepDyve
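Two of the methodologies, graph patterns and RDF path traversal, can be illustrated on a handful of triples. This is a toy sketch in plain Python, not how any of the listed search engines work; the triples and the helper names `match` and `traverse` are made up for illustration.

```python
triples = {
    ("Steve Jobs", "birthplace", "San Francisco"),
    ("San Francisco", "located in", "USA"),
    ("Steve Jobs", "co-founder", "Pixar"),
    ("Pixar", "type", "Company"),
}

def match(pattern):
    """Match one triple pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )

def traverse(start, path):
    """Follow a sequence of edge labels starting from one node."""
    nodes = {start}
    for label in path:
        nodes = {o for s, p, o in triples if s in nodes and p == label}
    return nodes

companies = match((None, "type", "Company"))
country_of_birth = traverse("Steve Jobs", ["birthplace", "located in"])
```

The pattern query finds every resource typed as a company, while the path query answers "in which country was Steve Jobs born?" by chaining two edges.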
Linked Data Approach
• Linked data: method of publishing structured data that
can be interlinked
• Based on HTTP and URIs, extended to be read by
computers
• Components:
- URIs
- HTTP
- RDF
- Serialization formats (RDFa, RDF/XML, N3)
• KiWi – a Linked Media Framework
• Easy-to-set-up server application bundling Semantic Web
technologies
• Consists of LMF core and LMF modules
• KiWi LMF core:
- Use URIs as names for things.
- Use HTTP URIs, so that people can look up those names.
- When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL).
- Include links to other URIs, so that they can discover more
things.
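Looking up an HTTP URI and asking for RDF usually comes down to a GET request with an Accept header. The sketch below only prepares such a request and does not send it; the URI is a placeholder, and the helper name and the default media type are assumptions made for illustration.

```python
from urllib.request import Request

def rdf_lookup_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET asking for RDF about `uri`."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

# The URI is a placeholder, not a real resource.
request = rdf_lookup_request("http://example.org/resource/SteveJobs")
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup; the RDF in the response can then contain links to further URIs.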
• KiWi LMF modules:
- LMF Semantic Search (a highly configurable semantic search
service based on Apache Solr)
- LMF Linked Data Cache (implements a cache to the Linked
Data Cloud)
- LMF Reasoner (implements a rule-based reasoner that
can process Datalog-style rules over RDF triples)
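What rule-based reasoning over triples means can be shown with a toy forward-chaining loop. This is plain Python and not the LMF Reasoner's rule syntax or implementation; the rule, the facts and the function names are made up for illustration.

```python
def apply_rules(facts, rules):
    """Forward-chain until no rule derives a new triple."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts |= new
                changed = True
    return facts

def located_in_is_transitive(facts):
    """(x located in y), (y located in z) -> (x located in z)."""
    return {
        (x, "located in", z)
        for x, p1, y in facts if p1 == "located in"
        for y2, p2, z in facts if p2 == "located in" and y2 == y
    }

facts = {
    ("San Francisco", "located in", "California"),
    ("California", "located in", "USA"),
}
inferred = apply_rules(facts, [located_in_is_transitive])
```

The rule derives the triple (San Francisco, located in, USA), which was not stated explicitly in the input.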
Prototypes
Questions and Answers
Thank you!