Post on 21-Jan-2021

Combining GATE and UIMA

Ian Roberts

University of Sheffield NLP

Overview

• Introduction to UIMA

• Comparison with GATE

• Mapping annotations between GATE and UIMA

• Examples and demo

What is UIMA?

• Language processing framework developed by IBM

• Similar document processing pipeline architecture to GATE

• Concentrates on performance and scalability

• Supports components written in different programming languages (currently Java and C++)

• Native support for distributed processing via web services

UIMA Terminology

• Processing tasks in UIMA are encapsulated in Analysis Engines (AEs)

• Text-specific processing is done by Text Analysis Engines (TAEs)

• In UIMA, AEs can be primitive (~ a single PR, or processing resource, in GATE terms), or aggregate (~ a GATE controller)

  - An aggregate AE can include other primitive or aggregate AEs

• GATE includes an interoperability layer to run

  - a GATE controller as a primitive TAE in UIMA

  - a UIMA TAE (primitive or aggregate) as a GATE PR
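
The primitive/aggregate distinction above is essentially a composite structure. The following is a minimal sketch of that idea in plain Java; `Engine`, `Primitive` and `Aggregate` are hypothetical stand-ins invented for illustration and are not the UIMA or GATE interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EngineSketch {
    /** Hypothetical stand-in for an analysis engine (not the UIMA API). */
    interface Engine {
        void process(List<String> log);
    }

    /** A primitive engine does one job; here it only records its name. */
    static final class Primitive implements Engine {
        final String name;
        Primitive(String name) { this.name = name; }
        public void process(List<String> log) { log.add(name); }
    }

    /** An aggregate runs its delegates in order; a delegate may itself be an aggregate. */
    static final class Aggregate implements Engine {
        final List<Engine> delegates;
        Aggregate(Engine... delegates) { this.delegates = Arrays.asList(delegates); }
        public void process(List<String> log) {
            for (Engine e : delegates) {
                e.process(log);
            }
        }
    }

    /** Runs an engine and returns the order in which primitives were invoked. */
    static List<String> run(Engine engine) {
        List<String> log = new ArrayList<>();
        engine.process(log);
        return log;
    }

    public static void main(String[] args) {
        Engine pipeline = new Aggregate(
            new Primitive("tokeniser"),
            new Aggregate(new Primitive("sentence splitter"),
                          new Primitive("goldfish annotator")));
        System.out.println(run(pipeline));
    }
}
```

Because an aggregate is itself an engine, a nested pipeline can be used anywhere a single component is expected, which is what lets either framework wrap the other's pipeline as one component.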

UIMA and GATE

• In GATE, the unit of processing is the Document

  - Text, plus features, plus annotations

  - Annotations can have arbitrary features, with any Java object as value

• In UIMA, the unit of processing is the CAS (common analysis structure)

  - Text, plus Feature Structures (FSs)

  - Annotations are just a special kind of FS, which includes start and end offset features

Key Differences

• In GATE, annotations can have any features, with any values

• In UIMA, feature structures are strongly typed

  - Must declare what types of annotations are supported by each analysis engine

  - Must specify what features each annotation type supports

  - Must specify what type feature values may take

      • Primitive types: string, integer, float

      • Reference types: reference to another FS in the CAS

      • Arrays of the above

  - All defined in the XML descriptor for the AE
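
To make the contrast concrete, here is a minimal sketch in plain Java. A plain `Map<Object, Object>` stands in for a loosely typed GATE-style feature map, while the hypothetical `TypeDecl` and `TypedFS` classes (invented for this illustration, not part of UIMA) mimic a type whose features and value types must be declared up front. The type and feature names are borrowed from the examples later in the talk.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class TypingSketch {
    /** Hypothetical declared type: feature name -> permitted value class. */
    static final class TypeDecl {
        final String name;
        final Map<String, Class<?>> features = new LinkedHashMap<>();
        TypeDecl(String name) { this.name = name; }
        TypeDecl feature(String featureName, Class<?> valueType) {
            features.put(featureName, valueType);
            return this;
        }
    }

    /** Hypothetical strongly typed feature structure. */
    static final class TypedFS {
        final TypeDecl type;
        final Map<String, Object> values = new HashMap<>();
        TypedFS(TypeDecl type) { this.type = type; }

        /** Rejects features that were not declared, and values of the wrong type. */
        void set(String featureName, Object value) {
            Class<?> declared = type.features.get(featureName);
            if (declared == null) {
                throw new IllegalArgumentException(
                    type.name + " has no feature " + featureName);
            }
            if (!declared.isInstance(value)) {
                throw new IllegalArgumentException(
                    featureName + " must be a " + declared.getSimpleName());
            }
            values.put(featureName, value);
        }
    }

    /** Convenience wrapper: true if the typed structure accepts the value. */
    static boolean accepts(TypedFS fs, String featureName, Object value) {
        try {
            fs.set(featureName, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Loosely typed: any key, any Java object as value.
        Map<Object, Object> looseFeatures = new HashMap<>();
        looseFeatures.put("GoldfishCount", 2);
        looseFeatures.put("note", new StringBuilder("any Java object"));

        // Strongly typed: only declared features, only declared value types.
        TypeDecl sentence = new TypeDecl("gate.example.Sentence")
            .feature("GoldfishCount", Integer.class);
        TypedFS fs = new TypedFS(sentence);
        System.out.println(accepts(fs, "GoldfishCount", 2));     // declared, right type
        System.out.println(accepts(fs, "GoldfishCount", "two")); // declared, wrong type
        System.out.println(accepts(fs, "colour", "orange"));     // not declared
    }
}
```

In UIMA itself these declarations live in the XML descriptor rather than in code; the sketch only shows why a loosely typed feature map cannot be copied across without knowing the declared types.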

Integrating GATE and UIMA

• So the problem is to map between the loosely-typed GATE world and the strongly-typed UIMA world

• Best explained by example…

Example 1

• Simple UIMA annotator that annotates each instance of the word “Goldfish” in a document.

• Does not need any input annotations

• Produces output annotations of type gate.example.Goldfish
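
The annotator's core logic can be sketched in plain Java as below. This is not the UIMA annotator from the talk: the `Span` class and `annotate` method are hypothetical stand-ins that only show the offset computation, and matching is assumed to be case-sensitive on the exact word “Goldfish”.

```java
import java.util.ArrayList;
import java.util.List;

class GoldfishSketch {
    /** Hypothetical stand-in for an annotation: a type name plus start/end offsets. */
    static final class Span {
        final String type;
        final int start;
        final int end;
        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
        @Override public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Returns one gate.example.Goldfish span per occurrence of "Goldfish". */
    static List<Span> annotate(String text) {
        final String word = "Goldfish";
        List<Span> spans = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(word, from);
            if (start < 0) {
                break; // no further occurrences
            }
            int end = start + word.length();
            spans.add(new Span("gate.example.Goldfish", start, end));
            from = end; // continue searching after this match
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : annotate("Goldfish are fish. Goldfish swim.")) {
            System.out.println(s);
        }
    }
}
```

Only the document text is needed as input, which matches the point that this annotator requires no input annotations.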

Example 2

• We may want to copy annotations, as well as text, from the original GATE document.

• Consider a UIMA annotator that

  - takes gate.example.Sentence annotations as input

  - annotates “Goldfish” as before

  - also adds a feature GoldfishCount to each Sentence giving the number of Goldfish annotations in that sentence
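
The counting step can be sketched in plain Java as follows. As before, this is an illustration under stated assumptions rather than the annotator from the talk: sentences are passed in as hypothetical `{start, end}` offset pairs, and a Goldfish match is counted for a sentence only when it lies entirely within that sentence's offsets.

```java
import java.util.ArrayList;
import java.util.List;

class GoldfishCountSketch {
    static final String WORD = "Goldfish";

    /** Start offsets of every occurrence of "Goldfish" in the text. */
    static List<Integer> goldfishStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(WORD, from);
            if (start < 0) {
                break;
            }
            starts.add(start);
            from = start + WORD.length();
        }
        return starts;
    }

    /**
     * For each input sentence, given as {start, end} offsets, returns the value
     * its GoldfishCount feature would take.
     */
    static int[] goldfishCounts(String text, int[][] sentences) {
        List<Integer> starts = goldfishStarts(text);
        int[] counts = new int[sentences.length];
        for (int s = 0; s < sentences.length; s++) {
            for (int start : starts) {
                int end = start + WORD.length();
                // Count only matches fully contained in the sentence span.
                if (start >= sentences[s][0] && end <= sentences[s][1]) {
                    counts[s]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Goldfish like Goldfish. Cats do not.";
        int[][] sentences = { {0, 23}, {24, 36} };
        int[] counts = goldfishCounts(text, sentences);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("Sentence " + i + " GoldfishCount = " + counts[i]);
        }
    }
}
```

The sentence offsets play the role of the input annotations that must be copied from the GATE document before the annotator runs, and the returned counts are what would be written back as the GoldfishCount feature.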
