

Enhancing Enterprise Knowledge Processes via Cross-Media Extraction

José Iria
The University of Sheffield

211 Portobello Street
Sheffield, S1 4DP, UK

j.iria@sheffield.ac.uk

Victoria Uren
Knowledge Media Institute

The Open University
Milton Keynes, MK7 6AA, UK

[email protected]

ABSTRACT

In large organizations the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and also in different media. However, there is still the need, in knowledge environments, for extraction methods able to combine evidence for a fact from across different media. In many cases the whole is more than the sum of its parts: only when considering the different media simultaneously can enough evidence be obtained to derive facts otherwise inaccessible to the knowledge worker via traditional methods that work on each single medium separately. In this paper, we present a cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of three types of media – text, images and raw data – and to exploit the evidence across the media. Our goal is to improve the quality and depth of automatically extracted knowledge.

Categories and Subject Descriptors

H.2.4 [Systems]: Multimedia databases; H.3.3 [Information Search and Retrieval]; I.2.1 [Applications and Expert Systems]: Office automation.

General Terms

Algorithms, Design, Human Factors

Keywords

Cross-media knowledge extraction, Large-scale datasets, Industrial applications

Copyright ACM ...$5.00

1. INTRODUCTION

In large organizations the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and also in different media. For example, to diagnose the cause of failure of a component, engineers may need to gather together images of similar components, the reports that summarize past solutions, raw data obtained from experiments on the materials, and so on. The effort required to gather, analyze and share this information is considerable. In the X-Media project1 we are investigating the potential of rich semantic metadata as a lingua franca to connect up dispersed resources across media and support knowledge reuse and sharing.

Automatic capture of semantic metadata is an economic imperative for the widespread deployment of such systems and is already available for single medium scenarios: named entity recognition and information extraction for text, scene analysis and object recognition for images, and pattern detection and time series methods for raw data. However, there is still the need, in knowledge environments, for extraction methods able to combine evidence for a fact from across different media. In many cases, as we will exemplify in this paper, the whole is more than the sum of its parts: only when considering the different media simultaneously can enough evidence be obtained to derive facts otherwise inaccessible to the knowledge worker via traditional methods that work on each single medium separately. Our goal is to improve the quality and depth of the extracted knowledge while providing users with joined-up views over dispersed resources.

In this paper, we first motivate the cross-media extraction problem by presenting two real world use cases, one for fault diagnosis in aero-engines and the other for competitor analysis in car manufacture. We then summarize the requirements posed by the use cases on the design of knowledge extraction systems. The core of the paper follows, which describes a framework specifically designed to perform cross-media extraction on a large scale. We finish with a brief review of related work, followed by the conclusions and future work.

1 http://www.x-media-project.org


2. MOTIVATION

Knowledge workers face four key challenges. The first is to gather the knowledge relevant to a task or problem, which may be dispersed across different storage systems and different media. The second is to analyze the knowledge they have gathered and make sense of it. The third is to share the knowledge with their colleagues. These three challenges are contextualized by knowledge workers' tasks and the processes they follow to accomplish them. Keeping track of the process, by being aware of what one is doing, what one needs to do next, and what others are doing, is the fourth challenge. What to search for, what analysis is needed and who to share with, all depend on the task in hand and the current stage of the process.

The X-Media project is designing and implementing innovative knowledge extraction systems to tackle the first challenge, and knowledge sharing and reuse tools to tackle the remaining challenges (but the latter fall outside the scope of this paper). We have gathered user requirements at our industrial partners' sites using a user centred design process [15]. The following subsections ground our vision and motivate the requirements for the knowledge extraction systems being developed by presenting selected aspects of two use cases: problem resolution for aero-engines and competitors scenario forecast for car manufacture. Such systems are capable of extracting information from compound documents2 containing text, images and raw data (usually in the form of tabular data with numeric fields).

2.1 Rolls-Royce: Problem Resolution

This use case, defined in cooperation with Rolls-Royce plc (RR), deals with collaborative information retrieval and analysis to determine the root cause of problems discovered during routine maintenance of aircraft engines. This is a very important process for the company, as it helps in understanding the real cause(s) behind in-service or maintenance events of an aircraft, contributing to the more general goal of improving engine design and minimizing disruptions to the fleet.

A very simplified description of the Problem Resolution process is as follows. Currently, the process involves the work of a team of specialized engineers, who are recruited according to their experience in dealing with similar problems in the past. They first manually search and collect as much information relevant to the problem as possible. Subsequently, they formulate hypotheses about potential root causes, some of which are selected for verification. The process cycles until a satisfactory explanation is found. Its duration naturally depends on several circumstances, but it can be extremely costly for harder problems.

2 We refer to any document that contains mixed media types as a "compound document".

X-Media aims to provide end-user systems that monitor and support the process, so as to maximize efficiency and cooperation between the team members. One of the enhancements to the current process consists in automating the extraction of knowledge from various distributed sources and media, so as to make that knowledge available for the team to search and browse in a more efficient way. A vast repository of "dormant" knowledge is to be found in the form of large amounts of documents on the intranet, such as technical reports and event reports about a given engine. These documents consist of a mix of inter-related text, images of components, and raw data from lab experiments, which present complementary information about the engine in question. We aim to automate the extraction of knowledge from this repository to enable on-demand retrieval of knowledge about similar problems from the past.

The documents dealt with by the team during the process also mix text, images and tabular data to convey the message intended by the author of the document. For example, emails exchanged between team members very often contain a textual description of a problem or new finding together with images and/or tabular data attached to better illustrate what is said in the email body. Another common example consists of presentation slides showing text and images which, again, contain complementary information.

2.2 FIAT: Competitors Scenario Forecast

This use case, defined in cooperation with Fiat S.p.A (FIAT), concerns forecasting the launch of competitors' models. It comprises collecting information about the features of competitors' vehicles from various data sources and producing a calendar that illustrates the prospective launches. The information needed to achieve that is to be found scattered throughout the Internet, including in blogs and forums, and covered by international automotive magazines as well as by a long tail of national automotive magazines. The collected information is used in the Set up stage of new FIAT vehicles (the development stage where a first assessment of the future vehicle's features is carried out). This process is of great value to the company because it contributes to keeping vehicle design up to date with the always evolving competitors scenario.

End-user systems are being developed within X-Media that are able to track knowledge changes and to be proactive in supporting knowledge workers during the Set up stage. To enable that, the underlying knowledge extraction systems are required to be able to handle such rapidly evolving multimedia data sources on a large scale.

As in the previous use case, documents contain complementary information across the media. Here we give a concrete example.

Figure 1: Example of a compound document in Fiat's Competitors Scenario Forecast use case.

The compound document illustrated in Figure 1 is an example of a prototypical document collected by knowledge workers in this use case. The document contains photographs of the front part of the interior of a Toyota Yaris car along with text describing the depicted car components. End-user systems are being built that support issuing queries over the extracted knowledge, e.g. "find competitor car models with ergonomic air ducts". The desired output of such systems for this query would be to present the Yaris as a potentially interesting model and provide the worker with a set of images and text snippets, including the ones in the document shown. In order to achieve that, knowledge extraction systems must gather evidence from across the media: on the one hand, identification of the car model depicted in the images can only be done using the text, which explicitly mentions "Yaris"; on the other hand, identification of some of the car model components such as air ducts, steering wheel and gear lever can only be done using the images, since the text only mentions glove box, tray, pockets, bins and cup-holders. In Section 4 we present a cross-media extraction framework that enables capturing knowledge from documents such as the one in Figure 1.

3. REQUIREMENTS

In this section we list the requirements for knowledge extraction systems, identified through the analysis of the use cases presented in Section 2. The major requirements identified were the ability to exploit evidence for a fact across several media, and the ability to perform the extraction on a large scale. It is also worth mentioning a few of the other requirements identified, which complement the aforementioned ones and also have strong implications for design decisions: the ability to exploit background knowledge, portability, and the ability to report uncertainty.

3.1 Ability to Exploit Evidence Across Media

As illustrated by the examples in the previous section, with the wide availability of devices and software capable of acquiring, generating and presenting multimedia data, the shape of the information landscape in enterprises has radically changed – multimedia documents now abound, inside which evidence for the knowledge is not only to be found confined to a single medium, but very often across two or more media. The new requirement for knowledge extraction methodologies and systems is therefore to be able to exploit such evidence, with a real potential to improve the quality and depth of the automatically extracted knowledge and, consequently, enhance enterprise knowledge processes.

3.2 Ability to Extract on a Large Scale

Large companies' intranets, such as the ones maintained by FIAT and RR, nowadays contain dozens of millions of documents and are soon expected to reach hundreds of millions, a dimension comparable to the Internet at the end of the 90s. Moreover, the increased use of the World Wide Web (WWW) as a source of information has made the boundary between intranet and Internet very thin, which dramatically increases the size of the search space.

For the purposes of this work, the following aspects of large scale are considered:

• Amount of content

• Domain complexity

• Amount of background knowledge available

The core basic requirement is for knowledge extraction methods to be able to cope with the large number of documents provided by the use cases. Domain complexity here simply refers to the number of concepts, and relations between those concepts, that define the problem domain – in practical terms, because we employ domain ontologies to represent a conceptualization of the domain, the complexity of the domain corresponds, for our purposes, to the size of the ontologies. We will define what we mean by "background knowledge" in Section 3.3.

Figure 2: The architectural view of the proposed knowledge extraction framework.

3.3 Other Requirements

Ability to Exploit Background Knowledge. Enterprise environments are rich in domain expertise and untapped resources, often overlooked by knowledge extraction systems even when accessible in digital form. Examples of background knowledge include, in general, media-independent resources such as domain ontologies and previously existing knowledge bases. Most importantly, background knowledge also comprises media-specific information. For example, in the Problem Resolution use case, text extraction methods can benefit from the use of external resources such as gazetteers – jet engines typically have 300,000 parts whose names can be compiled by simple methods and used during extraction; image analysis methods, for instance, can make use of topological descriptors expressing relations between regions of photographs of jet engine parts; and raw data extractors can use information about the different frequency ranges of the several engine materials/components tested in the lab. Our framework should be able to easily incorporate these and other external domain-specific resources.
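The gazetteer idea above can be sketched in a few lines. This is an illustrative sketch only, not X-Media code: the part names and the report sentence are invented for the example.

```python
# Hypothetical sketch: matching compiled jet-engine part names against text.
# Part names and report text are invented for illustration.

def build_gazetteer(part_names):
    """Normalize part names into a lookup set for fast matching."""
    return {name.lower() for name in part_names}

def find_parts(text, gazetteer):
    """Return gazetteer entries mentioned in the text (longest-first scan)."""
    found = []
    lowered = text.lower()
    for entry in sorted(gazetteer, key=len, reverse=True):
        if entry in lowered:
            found.append(entry)
    return found

gazetteer = build_gazetteer(["HP turbine blade", "combustor liner", "fan disc"])
report = "Inspection revealed cracking on the HP turbine blade near the root."
print(find_parts(report, gazetteer))  # ['hp turbine blade']
```

A production gazetteer over 300,000 part names would use a trie or a hash of token n-grams rather than a linear scan, but the principle is the same.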

Portability. To maximize reuse, knowledge management systems need to be portable across subject domains, languages and tasks. For example, Fiat's competitors scenario forecast is a process likely to be revised frequently due to the nature of the task. This has strong implications for the choice of knowledge representation formalisms and for the selection of and research on knowledge extraction methods, e.g. the adoption of machine learning approaches. Portability is a pervasive requirement for the work presented here.

Ability to Report Uncertainty. Uncertainty is inherent to the knowledge extraction process. It can arise from limitations of the models used or from the very nature of the data. For example, in the Fiat use case, since the scenario concerns forecasts and (sometimes) rumours, there is an inherent uncertainty about the knowledge extracted. As another example, many machine learning classification methods (such as those discussed in Section 4.2) output predictive models able to report a confidence value for a prediction, which constitutes another (distinct) source of uncertainty. Thus, our framework should be able to report uncertainty in the extracted knowledge.
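One simple way to carry such uncertainty through the pipeline is to attach a confidence value to each extracted fact and combine independent evidence for the same fact. The fact representation, the scores and the noisy-OR combination rule below are our own illustrative assumptions, not the framework's actual mechanism.

```python
# Illustrative sketch (names and scores invented): attaching a confidence
# value to each extracted fact and combining independent evidence.

from dataclasses import dataclass

@dataclass
class ExtractedFact:
    subject: str
    predicate: str
    obj: str
    confidence: float  # in [0, 1], as reported by the extractor

def combine_confidence(scores):
    """Noisy-OR: the fact fails only if every independent source fails."""
    p_none = 1.0
    for s in scores:
        p_none *= (1.0 - s)
    return 1.0 - p_none

text_evidence = ExtractedFact("Yaris", "hasComponent", "air duct", 0.6)
image_evidence = ExtractedFact("Yaris", "hasComponent", "air duct", 0.7)
combined = combine_confidence([text_evidence.confidence, image_evidence.confidence])
print(round(combined, 2))  # 0.88
```

The noisy-OR assumes the text and image extractors err independently; correlated extractors would need a joint model instead.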

4. KNOWLEDGE EXTRACTION FRAMEWORK

In X-Media the analysis and extraction of knowledge from documents plays a crucial role in the quality of the knowledge made available to the user. We have designed a knowledge extraction framework adequate for domains characterized by media heterogeneity and high volumes of data. We drew requirements from the use cases presented in Section 2 and, taking into account those requirements, sought to identify a way to put together extraction methods and technologies, both existing in the literature as well as our own novel approaches, so as to arrive at a framework capable of satisfying all of them. In designing the framework, we had to consider the tradeoff between several opposing forces, the most important of which being extraction accuracy vs. the capability of processing high volumes of data, but also other tradeoffs such as extraction accuracy vs. the need for user supervision, or portability vs. the need for external resources.

This section describes the architectural and functional elements of the framework, explaining how the framework provides for cross-media and large-scale extraction.

4.1 Architectural View

The proposed framework consists of three main components: (i) a multimedia daemon that handles content-related tasks such as dismantling compound documents, enabling fast access to indexed data and making the variety of data formats transparent to the rest of the framework, (ii) a knowledge extraction processor that operates on the output of the multimedia daemon with the aim of providing an interpretation of the content semantics, and (iii) a knowledge base that facilitates the storage, retrieval and inference of knowledge. A graphical representation of the framework that illustrates the inner and inter dependencies between the components is depicted in Figure 2.

Figure 3: The functional view of cross-media knowledge extraction.

The multimedia daemon is the functional component that retains direct access to the source content. Its primary role is to fetch and deliver content in the appropriate format and structure upon request of the other functional components. The implications of large scale cross-media extraction for the multimedia daemon component include the need to incorporate a layout-aware feature extraction and storage mechanism for documents, as well as an indexing scheme able to handle all the different types of modalities (e.g., text, image and raw data). The layout mechanism is vital for applying cross-media extraction since it is the point where spatial relations between different modalities are captured and retained in a machine understandable format. On the other hand, indexing is a well known technique for efficiently dealing with large volumes of data.

The knowledge extraction processing component resides at the core of the cross-media framework since it is the place where the actual content processing and knowledge extraction take place. The functionalities of this component tend to require considerable resources in terms of memory consumption and computational power, as well as bandwidth for data exchange. The employment of low complexity, fast algorithms for single media extraction and concept modeling is mandated by the large scale requirement and is the leading force driving the specification of this component. Section 4.2 gives details on the cross-media extractor subcomponent, and justifies some of the decisions in light of the large scale requirement.

The knowledge base component accommodates the knowledge repository of the framework and is responsible for storing and providing access to the extracted knowledge and to pre-existing background knowledge. As mentioned in Section 3.2, the scale resistance of this component is affected both by the domain complexity and by the amount of content. Thus, the large scale requirement here too mandates strict limitations on the level of expressiveness allowed for the knowledge representation language and the reasoning mechanisms supported by the knowledge base, and, thus, on the output of knowledge extraction systems. We developed a structural model that allows representing the output of the extraction methods in RDF and OWL, based on the Core Ontology of Multimedia [3], an ontology that serves as a basis for representing media objects, in particular to describe decompositions of media objects and to describe media annotations. To deal with background knowledge on a large scale, we batch-preprocess external resources and produce RDF triples into the knowledge base. Current RDF store technologies can store up to billions of RDF triples [8], which is suitable for the use cases in X-Media.
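The batch-preprocessing of external resources into RDF triples can be sketched as follows. The namespace and the part data are invented for illustration, and the sketch emits N-Triples strings directly; a real deployment would load the triples into a dedicated RDF store.

```python
# Minimal sketch of batch-preprocessing background knowledge into RDF triples.
# The namespace and part data are hypothetical.

XM = "http://example.org/x-media#"  # invented namespace for the example

def to_ntriples(subject, predicate, obj, literal=False):
    """Serialize one triple in N-Triples syntax."""
    o = f'"{obj}"' if literal else f"<{XM}{obj}>"
    return f"<{XM}{subject}> <{XM}{predicate}> {o} ."

parts = [("blade_001", "partOf", "hp_turbine"),
         ("blade_001", "label", "HP turbine blade")]
triples = [to_ntriples(s, p, o, literal=(p == "label")) for s, p, o in parts]
for t in triples:
    print(t)
```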

As an overall development methodology, we have adopted a strategy similar to KnowItAll [5], where the core system, no more than a sophisticated web harvester, is gradually extended with more and more complex knowledge extraction modules. This allows us to better study the performance issues that arise from working on a large scale as development progresses.

4.2 Cross-Media Extraction

To handle extraction across the media, we have designed the machine learning-based framework presented in Figure 3. The framework receives as input a multimedia document (e.g. a failure report) and produces semantic annotations with a set of inferred concepts. It is divided into the following steps: multimedia document processing, the integration of single-media and cross-media information, and the exploitation of background knowledge.

4.2.1 Multimedia Document Processing

It is the task of the Multimedia Document Processing step to extract single-media elements and their relations from the compound document. The document processing literature discusses several approaches to extract layout information from PDF, HTML and other structured documents; see [9] for an overview. Single-media KA algorithms process the content of the corresponding modality, and cross-media KA algorithms process both the content from the different modalities and the layout information.

Single-Media Features. After extracting single-media content from compound documents, features are extracted from each single-media element. For image content, MPEG-7 low-level visual features [11] provide a rich description of the content in terms of colour, shape, texture, and histograms. From text content, we extract not only the traditional bag-of-words, but we also perform fast entity [7] and relation extraction [6]. Raw data features are simple statistics, such as statistical moments of the data, or the explicit detection of certain data patterns known to have some relevant meaning (e.g. sensor data indicating a component malfunction).
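The statistical-moment features for raw data can be sketched as below; the sensor readings are invented, and the paper does not specify which moments are used, so this shows only the generic idea.

```python
# Sketch of raw-data feature extraction: simple statistical moments of a
# (hypothetical) sensor reading series.

def moments(xs):
    """Return mean, variance and standardized third moment (skewness)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = var ** 0.5
    skew = sum((x - mean) ** 3 for x in xs) / n / (std ** 3) if std else 0.0
    return mean, var, skew

readings = [0.1, 0.1, 0.2, 0.1, 3.5]  # the spike might indicate a malfunction
mean, var, skew = moments(readings)
print(mean)  # 0.8
```

A strongly positive skewness here flags the asymmetric spike, which is exactly the kind of pattern the text describes as "known to have some relevant meaning".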

Cross-Media Features. As mentioned, a multimedia document may contain evidence for a fact to be extracted across different media. However, it is not straightforward to know which media elements refer to the same fact. The document layout and extracted cross-references (e.g. captions) can suggest how each text paragraph/segment relates to each image or raw data element [2, 4, 14]. The approaches of Arasu and Garcia-Molina [2], Crescenzi et al. [4] and Rosenfeld et al. [14] are based on templates that characterize each part of the document. These templates are either extracted manually or semi-automatically. Rosenfeld et al. implemented a learning algorithm to extract information (author, title, date, etc.). They ignored text content and only used features such as fonts, physical positioning and other graphical characteristics to provide additional context to the information. X-Media follows an approach similar to the one proposed by Rosenfeld et al.: we extract a set of cross-media features for the types of documents we need to process. These cross-media features include: layout structure, distance between segments, cross-references, same type of font, font colour, and background colour/pattern. All such features can be extracted from PDF or HTML documents, providing the following steps with essential information about how media elements relate to one another.
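Such pairwise features might be computed along the following lines. This is a sketch with invented field names, not the actual X-Media feature extractor.

```python
# Hypothetical sketch of cross-media features for a (text segment, image)
# pair from a compound document; the element dictionaries are invented.

def cross_media_features(text_seg, image_el):
    """Pairwise layout features hinting that two elements refer to the same fact."""
    return {
        "distance": abs(text_seg["y"] - image_el["y"]),        # vertical page distance
        "same_font": text_seg["font"] == image_el.get("caption_font"),
        "is_caption": image_el.get("caption_for") == text_seg["id"],
    }

text_seg = {"id": "p7", "y": 420, "font": "Helvetica-9"}
image_el = {"y": 400, "caption_font": "Helvetica-9", "caption_for": "p7"}
print(cross_media_features(text_seg, image_el))
```

In a learned setting these features would feed the classifier alongside the single-media features, rather than being thresholded by hand.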

4.2.2 Feature Processing

Sparse feature data such as text and dense feature data such as images have very different characteristics. In cross-media knowledge extraction, the high diversity of data types raises the need to pre-process the data to produce a single common representation for all of it. The Feature Processing step aims at estimating a representation that will ease the task of the learning algorithm. We follow Magalhaes and Ruger [10] and process text and image independently with probabilistic latent semantic indexing to produce a canonical representation of both the text feature space and the image/raw data feature space. This allows statistical learning algorithms to handle different types of data simultaneously more easily.
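The idea of mapping raw features into a shared latent space can be illustrated with a heavily simplified stand-in: rank-1 latent semantic analysis by power iteration on a tiny feature-by-document matrix. The actual framework uses probabilistic latent semantic indexing per [10]; this sketch and its toy matrix only illustrate the projection step.

```python
# Rank-1 LSA via power iteration: a simplified stand-in for the latent
# indexing step. The matrix A (rows: features, columns: documents) is toy data.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def top_latent_direction(A, iters=50):
    """Power iteration on A^T A: leading right singular vector of A."""
    v = [1.0] * len(A[0])
    At = transpose(A)
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        v = [x / n for x in w]
    return v

A = [[2.0, 0.0, 1.0],
     [1.0, 0.0, 1.0],
     [0.0, 3.0, 0.0]]
v = top_latent_direction(A)
print([round(x, 2) for x in v])  # -> [0.0, 1.0, 0.0]
```

Projecting each document onto the leading latent directions yields the "canonical representation" the text refers to; PLSA replaces the SVD with a probabilistic topic decomposition.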

Figure 4: Example of dependencies in a probabilistic network.

4.2.3 Cross-Media Data Models

Once the feature data has been processed, a modelling algorithm can be used to create the knowledge models of all concepts. Special care must be taken when designing the algorithm to model each concept: it must support high-dimensional data and hundreds of thousands of examples, with low computational complexity. Several approaches have addressed similar problems, e.g. [10, 16]. The maximum entropy framework described in [10] fully addresses these issues by using a Gaussian prior (or a Laplacian prior) and a well known quasi-Newton optimization procedure. Wu et al. [16] have also deployed a two-step framework that uses several standard feature processing algorithms (e.g. ICA, PCA) and then fuses modalities with a support vector machine.
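A binary maximum entropy classifier with a Gaussian prior is equivalent to L2-regularized logistic regression. The toy sketch below trains one by plain gradient descent rather than the quasi-Newton procedure of [10]; the fused feature values and labels are invented.

```python
# Sketch: maximum entropy (logistic regression) with a Gaussian prior,
# i.e. L2 regularization, trained by gradient descent on toy fused features.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(X, y, l2=0.1, lr=0.5, epochs=500):
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]  # Gaussian prior -> L2 penalty gradient
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wi * f for wi, f in zip(w, xi))) - yi
            for j, f in enumerate(xi):
                grad[j] += err * f
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

# Two fused features per example (e.g. a text score and an image score).
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w = train_maxent(X, y)
p = sigmoid(sum(wi * f for wi, f in zip(w, [0.85, 0.85])))
print(p > 0.5)  # True
```

The Gaussian prior keeps the weights small, which is what makes the model viable on the high-dimensional feature spaces the text requires.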

4.2.4 Cross-Media Dependencies Models

The previous step estimates the model of a single concept exclusively by using the concept's own examples. However, semantic metadata provide information about concept co-occurrence and how concepts co-occur across different modalities. This type of background knowledge describes the semantic structure of the problem, which the cross-media extraction algorithm can exploit to enhance the model of each individual concept. Approaches like the ones proposed by Naphade and Huang [12], Preisach and Schmidt-Thieme [13] and Magalhaes and Ruger [10] can capture the semantic structure of the problem and improve the accuracy of the systems.

Figure 4 illustrates the flexibility offered by probabilistic networks to address the X-Media specific requirements. Concept A is initially modelled in terms of its concept data: only text (AT), only visual (AV), or text, visual and cross-media features (AM). The best representation of concept A is then a combination of all these representations and of its dependency relations with concepts B and C. These approaches produce a richer semantic representation than a single model for each concept, thus allowing the capture of knowledge otherwise inaccessible.
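The combination sketched in Figure 4 might be approximated by a simple interpolation. The weights and scores below are invented for illustration; they are a crude stand-in, not the probabilistic-network inference actually used.

```python
# Illustrative sketch (weights invented): combining concept A's modality-
# specific models (text A_T, visual A_V, cross-media A_M) with dependency
# information from related concepts B and C.

def concept_score(a_t, a_v, a_m, related, weights=(0.3, 0.3, 0.4)):
    """Interpolate A's per-modality scores, then mix in a co-occurrence term."""
    own = weights[0] * a_t + weights[1] * a_v + weights[2] * a_m
    # Dependency term: average score of concepts known to co-occur with A.
    dep = sum(related.values()) / len(related) if related else own
    return 0.8 * own + 0.2 * dep

score = concept_score(a_t=0.4, a_v=0.7, a_m=0.6, related={"B": 0.9, "C": 0.5})
print(round(score, 3))  # 0.596
```

In a real probabilistic network the mixing weights would be learned and the dependency term would come from inference over the graph, not a fixed average.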

5. RELATED WORK

The technology focus in Knowledge Management has moved from simple keyword-based search towards more advanced solutions for the extraction and sharing of knowledge [1]. The focus is still very much on providing more advanced text-based solutions, though image and video are considered by some industry players. There are prospects for cross-media extraction and knowledge fusion entering this market.

Recently, many projects and Networks of Excellence dealing with knowledge extraction and sharing have been sponsored by European funds. Most address the problem of knowledge extraction over a single medium, but a few do address extraction over multimedia data, e.g., MUMIS3, Reveal-This4, MUSCLE5. However, most of the research is themed around video retrieval applications, which typically consider video, caption and speech analysis, differing quite substantially from X-Media's need to analyze and mine documents comprising text, static images and raw data. In fact, X-Media's knowledge-rich environments such as those presented in Sections 2.1 and 2.2 set it apart from other projects in the area.

6. CONCLUSIONS

We have identified the need for cross-media knowledge extraction in technical domains. In the Problem Resolution use case, we saw how cross-media extraction could support on-demand retrieval of knowledge about similar problems from the past – a task which at present is hampered by the dispersal of evidence in different media. The Competitors Scenario Forecast use case motivates the need for cross-media extraction from public resources on the Web, to support the task of producing the new models' launch calendar – a task with similar requirements to the previous one. The major requirements for knowledge extraction systems to support such use cases were identified to be the ability to exploit evidence for a fact across several media, and the ability to perform the extraction on a large scale.

The major contribution of this paper is in presenting a machine learning-based cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of text, images and raw data, with a high level of automation. The framework provides a structured approach in which multimedia documents are processed for single-medium and cross-media features such as layout structure; cross-media data models are learned from both kinds of features (after a feature processing step); and dependencies between domain ontology concepts are captured via a probabilistic network approach.
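The staged structure of that approach can be sketched as follows. Every name and feature here is an illustrative assumption, not the project's actual API; the point is the ordering: per-medium analysis first, then features that span media.

```python
def single_medium_features(doc):
    """Stage 1: per-medium analysis (e.g. entity mentions for text,
    region detection for images) -- placeholders stand in for real analysers."""
    return {
        "text": {"mentions_component": "blade" in doc["text"]},
        "image": {"n_regions": len(doc["image_regions"])},
    }

def cross_media_features(doc, feats):
    """Stage 2: features spanning media, e.g. layout structure that
    links a caption in the text to an image on the same page."""
    feats["cross"] = {"caption_matches_image": doc["caption"] in doc["text"]}
    return feats

# A toy document with text, an image (as detected regions) and a caption.
doc = {
    "text": "Report: blade wear observed. Figure 1: worn blade.",
    "caption": "Figure 1: worn blade.",
    "image_regions": ["r1", "r2"],
}
feats = cross_media_features(doc, single_medium_features(doc))
```

In the framework itself, the output of this feature-processing step would feed the learning of cross-media data models rather than be inspected directly.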

Future work concerns two different aspects. First, the instantiation and evaluation of the proposed cross-media knowledge extraction framework in the two use cases described in this paper, together with the real users. Second, continuing research on improving the accuracy vs. scalability characteristics of both our single-medium and cross-media methods.

3 http://www.dcs.shef.ac.uk/nlp/mumis/
4 http://www.reveal-this.org/
5 http://www.muscle-noe.org/

7. ACKNOWLEDGMENTS
This work was funded by the X-Media project (www.x-media-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978.

8. ADDITIONAL AUTHORS
Alberto Lavelli (Fondazione Bruno Kessler, [email protected]), Sebastian Blohm (Universitat Karlsruhe, [email protected]), Aba-sah Dadzie (University of Sheffield, [email protected]), Thomas Franz (Universitat Koblenz-Landau, [email protected]), Joao Magalhaes (University of Sheffield, [email protected]), Spiros Nikolopoulos (Centre for Research & Technology Hellas, [email protected]), Christine Preisach (Uni Hildesheim, [email protected]), Piercarlo Slavazza (Quinary, [email protected]).

9. REFERENCES
[1] W. Andrews and R. E. Knox. Magic quadrant for information access technology. Technical report, Gartner Research (G00131678), October 2005.

[2] A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 2003.

[3] R. Arndt, R. Troncy, S. Staab, and L. Hardman. Adding formal semantics to MPEG-7: Designing a well-founded multimedia ontology for the web. Technical report, Department of Computer Science, Univ. Koblenz-Landau, 2007.

[4] V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards automatic data extraction from large web sites. In 27th International Conference on Very Large Databases (VLDB), 2001.

[5] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. Web-scale information extraction in KnowItAll. In Proceedings of the Thirteenth International World Wide Web Conference. ACM Press, 2004.

[6] C. Giuliano, A. Lavelli, and L. Romano. Exploiting shallow linguistic information for relation extraction from biomedical literature. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, 2006.

[7] J. Iria, N. Ireson, and F. Ciravegna. An experimental study on boundary classification algorithms for information extraction using SVM. In Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining, Trento, Italy, 2006.

[8] A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web, 2(1):49–79, December 2004.

[9] A. Laender, B. Ribeiro-Neto, A. Silva, and J. Teixeira. A brief survey of web data extraction tools. In SIGMOD Record, volume 31, June 2002.

[10] J. Magalhaes and S. Ruger. Information-theoretic semantic multimedia indexing. In ACM Conference on Image and Video Retrieval (CIVR), Amsterdam, Holland, 2007.

[11] B. S. Manjunath, P. Salembier, and T. Sikora. Introduction to MPEG-7: multimedia content description language. John Wiley & Sons, 2002.

[12] M. R. Naphade and T. S. Huang. A probabilistic framework for semantic video indexing, filtering and retrieval. In IEEE Transactions on Multimedia, volume 3, 2001.

[13] C. Preisach and L. Schmidt-Thieme. Relationalensemble classification. pages 499–509, 2006.

[14] B. Rosenfeld, R. Feldman, and J. Aumann. Structural extraction from visual layout of documents. In ACM Conference on Information and Knowledge Management (CIKM), 2002.

[15] M. Rosson and J. Carroll. Usability Engineering: scenario-based development of HCI. Morgan Kaufmann, 2002.

[16] Y. Wu, E. Y. Chang, K. C.-C. Chang, and J. R. Smith. Optimal multimodal fusion for multimedia data analysis. In MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia, pages 572–579, New York, NY, USA, 2004. ACM Press.