Peter Fox
Xinformatics 4400/6400
Week 11, April 15, 2014
Unstructured Information, Information Audit / Workflow and Discovery
Contents
• Information Audit
• Unstructured Information
Businessdictionary.com
• Analysis and evaluation of a firm's information system (whether manual or computerized) to detect and rectify blockages, duplication, and leakage of information.
Objective?
• The objectives of this audit are to improve accuracy, relevance, security, and timeliness of the recorded information.
What is an information audit?
• An information audit is a process that effectively determines the current information environment within an organization by identifying and mapping:
– What information is currently available?
– Where does the information live?
Results/ format (e.g.)
• The results of an information audit are twofold. First, there is a detailed report, which covers:
– What information do staff acquire? Where from? At what cost? How is it used?
– What information do staff create? What happens to it? Where does it go?
– What information is stored and why? What purpose will it serve?
– What information is passed on or delivered? To whom? For what purpose? In what form?
– Is there a gap, or a match, between that which is available and that which is needed?
– What are the skills and responsibilities of the people who carry out these tasks?
– What equipment and tools do they have available (hardware, software, filing cabinets, web sites, etc.)?
– Are there any control documents, such as policy statements, guidelines, service level agreements, procedures, manuals?
– Is any of the information (produced, acquired, processed, re-delivered, or stored) superfluous to needs?
– Are any of the information-handling activities non-productive?
• There is also a detailed flow chart:
– A visual map that shows the areas, processes, functions and activities through which information passes, clarifying gaps or fault-lines that need to be plugged, or bottlenecks and overflows that need to be unblocked
• Sound familiar?
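The gap, match, and superfluous questions from the audit report can be sketched as simple set comparisons. This is a minimal illustration, assuming the audit has already produced two inventories (what is held, what staff need); the function `compare_inventories` and the sample item names are hypothetical.

```python
from typing import Dict, List, Set


def compare_inventories(available: Set[str], needed: Set[str]) -> Dict[str, List[str]]:
    """Compare what an organization holds with what its staff need."""
    return {
        "gap": sorted(needed - available),          # needed but not held
        "match": sorted(needed & available),        # held and needed
        "superfluous": sorted(available - needed),  # held but not needed
    }


# Hypothetical audit findings, for illustration only.
available = {"budget reports", "old fax logs", "project plans"}
needed = {"budget reports", "project plans", "customer feedback"}
report = compare_inventories(available, needed)
```

A real audit would attach owners, costs, and locations to each item; the set view only answers the "gap or match" question.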
How to use?
• An information audit can be used as a baseline for making major improvements to the business process of an organization.
• It is extremely helpful in identifying, buying, and implementing enterprise systems:
– finance systems, portfolio management systems, document management systems, learning and knowledge management systems, etc.
Developed for NASA TIWG
Remember the use case doc?
Developed for NASA TIWG
Event/application
Remember
• It never hurts to know what you have
• Build it into the routine and do not leave it as an after-thought (yep, just like documenting your code!)
Sources and uses of unstructured information
- audio, video, graphics, social media messages, etc. – anything that falls outside the purview of traditional databases
Data<->Information<->Knowledge
• Where is the structure?
[Figure: Data, Information, Knowledge; labels: Context, Presentation, Organization, Integration, Conversation, Creation, Gathering, Experience]
Informatics
• Oh, wait – people structure information!
• Cognitive processes
– Semiotics
– Mental representation
– Intuition
– Expertise
• But not in the same way computers can!
So what happens?
• If a structured representation of fundamentally unstructured information is useless?
– Why would it be?
• What role does visual representation play in structuring information? Hint:
More than 10 years ago…
• Unstructured Information Management Architecture (UIMA) from IBM
– “Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, and so on) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies.
– IBM's Unstructured Information Management Architecture (UIMA) is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines.
– The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications.”
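The plug-in idea in the quoted description can be illustrated with a toy pipeline. This is a minimal sketch in plain Python and is not the UIMA API: `Document`, `Annotation`, `tokenize`, `find_years`, and `run_pipeline` are hypothetical names chosen for illustration. Each independently written component reads the same unstructured text and adds structured annotations to it.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str    # e.g. "token" or "year"
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    text: str    # the covered text


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


def tokenize(doc: Document) -> None:
    """Rule-based component: mark every run of word characters as a token."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), m.group()))


def find_years(doc: Document) -> None:
    """Rule-based component: mark four-digit numbers that look like years."""
    for m in re.finditer(r"\b(?:19|20)\d{2}\b", doc.text):
        doc.annotations.append(Annotation("year", m.start(), m.end(), m.group()))


def run_pipeline(text: str, components: List[Callable[[Document], None]]) -> Document:
    """Run each plugged-in component, in order, over the same document."""
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline("Week 11, April 15, 2014", [tokenize, find_years])
years = [a.text for a in doc.annotations if a.kind == "year"]
tokens = [a.text for a in doc.annotations if a.kind == "token"]
```

The resulting annotations are structured records (kind, offsets, text), which is the form that could then be handed to a database or search index.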
From way back…
Data<->Information<->Knowledge
• Future?
[Figure: Data, Information, Knowledge; labels: Context, Presentation, Organization, Integration, Conversation, Creation, Gathering, Experience]
Reading for this week
• http://en.wikipedia.org/wiki/Information_audit
• http://www.librijournal.org/pdf/2003-1pp23-38.pdf
• UIMA - http://www.ibm.com/developerworks/data/downloads/uima/
• SPAR - http://tw.rpi.edu/web/inside/ideas/SPAREvaluation
Logical Collections
• The primary goal of a Management system is to abstract the physical collection into logical collections. The resulting view is a uniform homogeneous collection.
• Note the analogy with logical models and information integration: so EARLY ON
– Identifying naming conventions and organization
– Aligning cataloguing and naming to facilitate search, access, use (who uses?)
– Provision of **contextual** information
Physical Handling
• Map between physical and logical.
• Where and from whom does it come?
– Is there a transfer into a physical form?
– Is it backed-up, archived, cached? …
– What formats?
– Naming conventions – do they change?
• Note analogy to physical models
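The map between logical and physical can be sketched as a small catalog. This is an illustration only, assuming a single in-memory dictionary; the class `PhysicalCopy`, the functions `resolve` and `formats`, and every path shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalCopy:
    location: str   # where the bytes live (hypothetical paths)
    fmt: str        # file format of this copy
    role: str       # "primary", "backup", "archive", or "cache"


# Logical name -> every physical copy that realizes it.
catalog: Dict[str, List[PhysicalCopy]] = {
    "lectures/week11/slides": [
        PhysicalCopy("/srv/courses/xinfo/wk11.ppt", "ppt", "primary"),
        PhysicalCopy("/mnt/backup/xinfo/wk11.ppt", "ppt", "backup"),
        PhysicalCopy("/srv/courses/xinfo/wk11.pdf", "pdf", "archive"),
    ],
}


def resolve(logical_name: str, role: str = "primary") -> List[str]:
    """Return the physical locations for a logical name, filtered by role."""
    return [c.location for c in catalog.get(logical_name, []) if c.role == role]


def formats(logical_name: str) -> List[str]:
    """List the distinct formats in which a logical item is held."""
    return sorted({c.fmt for c in catalog.get(logical_name, [])})
```

Users only ever see the logical name; if a file moves or a naming convention changes, only the catalog entry is updated.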
Interoperability Support
Security
• Access authorization and change verification. This is the basis of trusting your information.
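Change verification is often done by recording a cryptographic digest and comparing it later. The sketch below uses SHA-256 from Python's standard `hashlib`; the helper names `fingerprint` and `is_unchanged` are hypothetical, and access authorization is not covered here.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded earlier."""
    return fingerprint(content) == recorded_digest


# Record a digest when the item enters the collection...
original = b"Week 11 lecture notes"
recorded = fingerprint(original)

# ...and verify it whenever the item is read back.
still_ok = is_unchanged(original, recorded)
tampered = is_unchanged(b"Week 11 lecture notes (edited)", recorded)
```

The recorded digest has to be stored somewhere the content's editors cannot silently rewrite, otherwise the check proves little.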
Ownership
• Who is responsible for quality and meaning?
Metadata
• Recall metadata are data about data.
• Metainformation?
Persistence
• Deployment of mechanisms to counteract technology obsolescence.
Discovery
• Ability to identify useful relations and information inside the collection
• More on this later in this class
Dissemination
• Mechanisms to make interested parties aware of changes and additions to the collections.
• Do you rely on information retrieval? The Web?
Summary of Information Management
• Creation of logical collections
• Physical handling
• Interoperability support
• Security support
• Ownership
• Metadata collection, management and access.
• Persistence
• Knowledge and information discovery
• Dissemination and publication
Note for your project writeup!
• Information management! Cover the 9 areas.
Information Workflow
• What is a workflow?
• Why would you use it?
• Key considerations for information, cf. data
• Some pointers to workflow systems
What is a workflow?
• General definition: “series of tasks performed to produce a final outcome” (taxes?)
• Information workflow – involves people, but we potentially want to
– Automate jobs that a person traditionally performed manually
– Process large volumes of information faster than one could do by hand
• NB difference from data workflows – it reaches out to encompass the user (e.g. ‘unrecorded actions’)
Background: Business Workflows
• Example: planning a trip
• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
• Each task may depend on outcome of previous task
– Days you reserve the hotel depend on days of the flight
– If hotel has shuttle service, may not need to rent a car
• Prior information, experience, preferences…
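The trip example can be written down as tasks plus dependencies and ordered automatically. This is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function `plan_trip` and the exact task names are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_trip(hotel_has_shuttle: bool) -> List[str]:
    """Return the trip tasks in an order that respects their dependencies."""
    # Each task maps to the set of tasks that must be finished first.
    depends_on: Dict[str, Set[str]] = {
        "book flight": set(),
        "reserve hotel": {"book flight"},  # hotel days depend on flight days
    }
    # The outcome of an earlier task can remove a later one entirely.
    if not hotel_has_shuttle:
        depends_on["rent car"] = {"reserve hotel"}
    return list(TopologicalSorter(depends_on).static_order())


with_car = plan_trip(hotel_has_shuttle=False)
without_car = plan_trip(hotel_has_shuttle=True)
```

Prior information, experience, and preferences would enter as extra inputs to `plan_trip`, which is exactly the part that is hard to record in a workflow.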
Tripit.com?
What about information workflows?
• Perform a set of transformations/ operations on information source(s)
• Examples
– Generating images from raw data
– Identifying areas of interest from a large information source (e.g. word cloud)
– Classifying a set of objects
– Querying a web service for more information on a set of objects
– Many others…
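As one concrete case, the word cloud example is a short chain of transformations on a text source. The sketch below computes only the term weights a word cloud would be drawn from; the step names (`extract_words`, `drop_stopwords`, `top_terms`) and the tiny stopword list are hypothetical choices, not part of any particular tool.

```python
import re
from collections import Counter
from typing import List, Tuple

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def extract_words(text: str) -> List[str]:
    """Step 1: lower-case the text and pull out alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def drop_stopwords(words: List[str]) -> List[str]:
    """Step 2: remove words that carry little meaning."""
    return [w for w in words if w not in STOPWORDS]


def top_terms(words: List[str], n: int) -> List[Tuple[str, int]]:
    """Step 3: count occurrences and keep the n most frequent terms."""
    return Counter(words).most_common(n)


def word_cloud_weights(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Chain the three steps into one workflow."""
    return top_terms(drop_stopwords(extract_words(text)), n)


weights = word_cloud_weights(
    "Information audit: the audit of information and the flow of information.", n=2
)
```

Each step takes the previous step's output as input, so steps can be tested, swapped, or reused separately.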
More on Workflows
• Can process many information types:
– Archives
– Web pages
– Streaming/ real time
– Images
– Semiotic systems
• Robust workflows depend on formal (conceptual and logical) models of the flow of information among components
• May be simple and linear or very complex
Challenges
• Questions:
– What are some challenges for users in implementing workflows?
– What are some challenges to executing these workflows?
– What are limitations of writing a program?
• Mastering a programming language
• Visualizing workflow
• Sharing/exchanging workflow
• Formatting issues
• Locating datasets, services, or functions
Workflow Management Systems
Benefits of Workflows
• Documentation of aspects of analysis
• Visual communication of analytical steps
• Ease of testing/debugging
• Reproducibility
• Reuse of part or all of workflow in a different project
Additional Benefits
• Integration of and between multiple computing environments
• ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies
• System functionality to assist with information integration of heterogeneous components and sources
Why not just use a script?
• Script does not specify low-level task scheduling and communication
• May be platform-dependent
• Can’t be easily reused
• May not have sufficient documentation to be adapted for another purpose
Why can a GUI be useful?
• No need to learn a programming language
• Visual representation of what workflow does
• Allows you to monitor workflow execution
• Enables user interaction (though not necessarily collaboration)
• Facilitates sharing of workflows
Some workflow systems
• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
– Windows Workflow Foundation
– Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08TutorialSlides/
• See reading for this week
Discovery
• How does someone find your information?
• How would you provide discovery of
– collections
– files
– ‘bits’
• How would you find ->
Discovery
o Search (Federated Search)
o Helped by
  o Folksonomies (user contributed)
  o Intelligent Agents
  o Search Engines
  o Taxonomies
o Find photos of Kim
  o Boy or girl?
Use cases
• Find a sound recording of a swallow.
• Excuse me?
• Find a sound recording of an African Swallow
• Find a sound recording of a bird that sounds like an African Swallow
• Media types – how can you discover them?
• Find the movie that Jeanne Tripplehorn first starred in/ that was her most successful/ in which she was lead actress?
• Has anyone gene sequenced a mouse?
• Find images of primary productivity in the North Atlantic
• Discovery can often involve information integration (or is it *almost always*?)
Three level ‘metadata’ solution for DATA
Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
[Diagram boxes: Ontology based Data Integration using scientific workflows; Earth Sciences Virtual Database, a Data Warehouse where the schema heterogeneity problem is solved, schema based integration; Data Discovery; Data Integration]
A.K.Sinha, Virginia Tech, 2006
Three level ‘metadata’ solution?
Level 1: Registration at the Discovery Level, e.g. find the upper level entry point to a source
Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
[Diagram boxes: Integration using mapping management; Catalog/ Index, schema based integration; Information Discovery; Information Integration]
A.K.Sinha, Virginia Tech, 2006
Information discovery
• What makes discovery work?
– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– It turns out that file types are a key enabler or inhibitor to discovery
– Result ranking using *tuned* algorithm
• What does not work?
– Result ranking algorithms that depend on unconventional information types (icon, index, symbol)
Federated search
• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” wikipedia
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
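The "simultaneous search" and "consistent metadata fields" points can be sketched with two in-memory stand-ins for remote catalogues. This is an illustration under stated assumptions: it does not speak Z39.50 or call any real service, and `make_source`, `federated_search`, and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Source = Callable[[str], List[Record]]

# Hypothetical catalogues that share the same metadata fields.
CATALOGUE_A: List[Record] = [
    {"title": "Volcano activity map", "keywords": "volcano activity", "source": "A"},
    {"title": "Ocean colour images", "keywords": "ocean productivity", "source": "A"},
]
CATALOGUE_B: List[Record] = [
    {"title": "Eruption inventory", "keywords": "volcano inventory", "source": "B"},
]


def make_source(records: List[Record]) -> Source:
    """Wrap a list of records as a searchable source keyed on 'keywords'."""
    def search(keyword: str) -> List[Record]:
        return [r for r in records if keyword.lower() in r["keywords"].split()]
    return search


def federated_search(keyword: str, sources: List[Source]) -> List[Record]:
    """Query every source at the same time and merge the results by title."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        result_lists = list(pool.map(lambda search: search(keyword), sources))
    merged = [record for results in result_lists for record in results]
    return sorted(merged, key=lambda r: r["title"])


hits = federated_search("volcano", [make_source(CATALOGUE_A), make_source(CATALOGUE_B)])
titles = [h["title"] for h in hits]
```

The merge step only works because both catalogues use the same `keywords` and `title` fields; with inconsistent fields, a mapping layer would be needed first.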
Smart search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu , http://eie.cos.gmu.edu (Water -> Semantic Search)
• Faceted search, e.g. mspace (http://mspace.fm ), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s )
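Faceted search in general narrows a result set by metadata values while showing how many items each remaining value would return. The sketch below is a generic illustration, not how mspace, Exhibit, or S2S are implemented; `ITEMS`, `filter_items`, and `facet_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical collection with two facets: media type and region.
ITEMS: List[Item] = [
    {"name": "swallow call", "media": "audio", "region": "Africa"},
    {"name": "swallow photo", "media": "image", "region": "Europe"},
    {"name": "chlorophyll map", "media": "image", "region": "North Atlantic"},
]


def filter_items(items: List[Item], **selected: str) -> List[Item]:
    """Keep the items whose facet values match every selected value."""
    return [i for i in items if all(i.get(f) == v for f, v in selected.items())]


def facet_counts(items: List[Item], facet: str) -> Dict[str, int]:
    """Count how many items carry each value of the given facet."""
    return dict(Counter(i[facet] for i in items if facet in i))


media_counts = facet_counts(ITEMS, "media")
images = filter_items(ITEMS, media="image")
regions_for_images = facet_counts(images, "region")
```

After each selection the counts are recomputed on the filtered set, so the user never sees a facet value that leads to zero results.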
NOESIS
Faceted search
logd.tw.rpi.edu
Summary - discovery
• Useful to write a few discovery use cases to drive how your design is developed
• Evolution of your role in facilitating discovery and what/ how others implement access to your information
Reading for this week
• Is retrospective
Check in for Project Assignment
• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment
• Or a new use case, development, etc.
What is next
•Today – project group meetings/ check in
•April 22 – Information Quality, Uncertainty and Bias
•April 29 – course summary (written part of group project due)
•May 6 – final project presentations (BE ON TIME, i.e. 5-10 mins BEFORE 9AM)
– Be prepared to be asked (and answer) questions