AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO DIFFERENT ATR METHODS

María José Marín, Camino Rea



1. INTRODUCTION

- Importance of identifying the terms in a specialised corpus.

- Terms are “textual realisations of specialised concepts” (Spasic et al., 2005).

- They are employed to communicate amongst specialists (Rea, 2008).

- They are mono-referential and have a univocal character (Cabré, 1993): form ↔ content.

- Potential applications of automatic term recognition (ATR): building dictionaries and glossaries, machine translation, ontology building, etc.


1. INTRODUCTION

- This paper evaluates the efficiency of two ATR methods, Keywords (Scott, 2008) and Chung’s (2003), on a 2.6-million-word legal corpus (UKSCC).

- Both are validated in terms of precision and recall.


2. UKSCC and LACELL: the study and reference corpora

- UKSCC (United Kingdom Supreme Court Corpus):

- Legal corpus of 192 judicial decisions issued by the Supreme Court of the United Kingdom (2008-2010).

- Compiled ad hoc according to corpus linguistics (CL) standards (Sánchez, 1995; Wynne, 2005; Pearson, 1998; Rea, 2010).

- Monolingual and synchronic.

- The Supreme Court was chosen as the source of texts because of its importance as a judicial institution: it touches upon all branches of law and has a greater geographical scope than lower courts.

- Judicial decisions are the main source of law in common law countries.


2. UKSCC and LACELL: the study and reference corpora

- LACELL (Lingüística Aplicada Computacional, Enseñanza de Lenguas y Lexicografía; i.e. Computational Applied Linguistics, Language Teaching and Lexicography):

- Balanced general English corpus of 20 million words, containing written samples (newspapers, books, magazines, brochures, letters, etc.) and oral language samples.

- Compiled by the LACELL research group at the University of Murcia (English Department).


3. ATR method description

Automatic term recognition (ATR) methods date back to the 1980s. They make it possible to handle large amounts of data automatically by spotting the most relevant terms in a specialised corpus. They have been extensively reviewed (Maynard and Ananiadou, 2000; Cabré et al., 2001; Drouin, 2003; Lemay et al., 2005; Vivaldi et al., 2012, etc.).

- Keywords (Scott, 2008):

- Not an ATR method proper. However, it has proved more efficient at identifying legal terms than other methods designed for that purpose.

- Implemented automatically using WordSmith Tools 5.0, with settings adjusted to use Dunning’s (1993) log-likelihood statistic.
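The following Python sketch illustrates the keyness score. It computes Dunning's (1993) log-likelihood (G2) from a 2x2 contingency table for each word (this word vs. all other tokens, study corpus vs. reference corpus) and then ranks the positive keywords. This is not WordSmith's own code, and WordSmith's internal computation may differ in detail. The function names, the dictionary-of-counts input and the `top_n` cut-off are illustrative assumptions.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Dunning's (1993) G2 for one word, from its 2x2 contingency table."""
    # Rows: study corpus, reference corpus; columns: this word, all other tokens
    observed = [
        [freq_study, size_study - freq_study],
        [freq_ref, size_ref - freq_ref],
    ]
    total = size_study + size_ref
    row_totals = [size_study, size_ref]
    col_totals = [freq_study + freq_ref, total - (freq_study + freq_ref)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing (0 * ln 0 is taken as 0)
                e = row_totals[i] * col_totals[j] / total  # expected frequency
                g2 += o * math.log(o / e)
    return 2.0 * g2


def keywords(study_counts, study_size, ref_counts, ref_size, top_n=25):
    """Rank positive keywords (relatively more frequent in the study corpus) by G2."""
    scored = []
    for word, freq in study_counts.items():
        ref_freq = ref_counts.get(word, 0)
        if freq / study_size > ref_freq / ref_size:  # keep positive keyness only
            scored.append((word, log_likelihood(freq, study_size, ref_freq, ref_size)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

For example, suppose word-frequency dictionaries for the two corpora were available under the hypothetical names `ukscc_counts` and `lacell_counts`. Then `keywords(ukscc_counts, 2_600_000, lacell_counts, 20_000_000)` would return the 25 highest-scoring candidates.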


3. ATR method description

- Chung’s method (2003)

- Singled out because of the high success rate reported by the author (86% precision on average).

- Chung compares a qualitative term recognition method, the rating scale approach, with her own quantitative one. She concludes that words with a ratio of occurrence above 50 are terms.

- How to calculate it:

$$W_r = \frac{SF(w)}{RF(w)}$$

where SF(w) and RF(w) are the frequencies of word w in the specialised corpus and in the reference corpus, respectively (frequency counts must be normalised).
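The ratio can be sketched in Python as follows. The sketch assumes per-million normalisation. A word absent from the reference corpus is treated as having an infinite ratio, which matches the ∞ values in Table 1. `chung_ratio`, `chung_terms` and the `per` parameter are hypothetical names. The threshold of 50 follows the description above.

```python
import math


def chung_ratio(freq_spec, size_spec, freq_ref, size_ref, per=1_000_000):
    """Chung's ratio W_r = SF(w) / RF(w), computed on normalised frequencies."""
    sf = freq_spec / size_spec * per  # SF(w): normalised frequency, specialised corpus
    rf = freq_ref / size_ref * per    # RF(w): normalised frequency, reference corpus
    if rf == 0:
        # Word absent from the reference corpus: the ratio is unbounded
        return math.inf
    return sf / rf


def chung_terms(spec_counts, spec_size, ref_counts, ref_size, threshold=50):
    """Return (word, ratio) pairs whose ratio exceeds the threshold, highest first."""
    terms = []
    for word, freq in spec_counts.items():
        ratio = chung_ratio(freq, spec_size, ref_counts.get(word, 0), ref_size)
        if ratio > threshold:
            terms.append((word, ratio))
    terms.sort(key=lambda item: item[1], reverse=True)
    return terms
```

Both frequencies are scaled by the same `per` factor, so the normalisation base cancels out in the ratio. What matters is dividing each raw count by its own corpus size.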


4. Validation process and results

4.1. Validation process

- Precision: percentage of true terms out of the candidate terms extracted.

- Recall: percentage of true terms out of the total number of terms in the whole corpus.

- We resorted to automatic validation. A 10,000-entry electronic legal glossary compiled by the authors was used as the gold standard, since human validation poses problems due to subjectivity.

- Candidate terms were confirmed as true if they were found in the glossary.

- Keywords was implemented automatically. Chung’s method was applied using a spreadsheet, with the frequency data also obtained with WordSmith.
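The two measures defined above can be sketched as follows. The sketch makes two assumptions that are not the authors' stated procedure. First, candidates are matched against glossary entries case-insensitively. Second, the "total number of terms in the whole corpus" is approximated as the glossary entries that occur in the corpus vocabulary. The function name is illustrative.

```python
def precision_recall(candidates, glossary, corpus_vocabulary):
    """Precision and recall of extracted candidates against a gold-standard glossary."""
    gold = {entry.lower() for entry in glossary}
    cands = {c.lower() for c in candidates}
    # Assumption: the terms "in the whole corpus" are glossary entries found in it
    terms_in_corpus = gold & {w.lower() for w in corpus_vocabulary}
    true_terms = cands & gold  # a candidate is true if it is found in the glossary
    precision = len(true_terms) / len(cands) if cands else 0.0
    found = true_terms & terms_in_corpus
    recall = len(found) / len(terms_in_corpus) if terms_in_corpus else 0.0
    return precision, recall
```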


4. Validation process and results

4.2. Results

Fig. 1. Overall precision and recall achieved by each method (bar chart; methods: Keywords, Chung; series: Precision, Recall; data labels: 62.00%, 42.25%, 31.00%, 11.75%)


4. Validation process and results

4.2. Results

Fig. 2. Cumulative precision attained on the top 2,000 candidates (chart; series: Keywords, Chung; precision axis from 0% to 90%)
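A cumulative precision curve like the one in Fig. 2 can be computed by checking each ranked candidate against the gold-standard glossary and recording precision over the top k candidates. Below is a minimal sketch. It assumes case-insensitive matching and a sampling interval of 100 ranks; the `step` parameter and the function name are hypothetical.

```python
def cumulative_precision(ranked_candidates, glossary, step=100):
    """Precision over the top-k candidates, sampled every `step` ranks."""
    gold = {entry.lower() for entry in glossary}
    hits = 0
    curve = []
    for rank, candidate in enumerate(ranked_candidates, start=1):
        if candidate.lower() in gold:
            hits += 1
        if rank % step == 0:
            curve.append((rank, hits / rank))  # (k, precision of the top k)
    return curve
```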


4. Validation process and results

4.2. Results

- Keywords outperforms Chung’s method in terms of both precision and recall.

- The efficiency of Keywords decreases smoothly and steadily, whereas the performance of Chung’s method is much more irregular.

- Chung’s method only performs better within candidates 1,600 to 1,800, after which its precision drops sharply to 3.5%.

- Chung’s poor results may be due to the automatic inclusion in the term group of words absent from the reference corpus, for which RF(w) = 0 and the ratio is therefore infinite. This particularly affects the proper names that are so typical of judicial decisions.
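The observation that Chung's method performs better only within candidates 1,600 to 1,800 concerns precision within a band of ranks, not cumulative precision. Below is a sketch of band-wise precision. It assumes 200-rank bands and the same case-insensitive glossary lookup; `band_precision` is a hypothetical helper.

```python
def band_precision(ranked_candidates, glossary, band=200):
    """Precision within consecutive rank bands: (first rank, last rank, precision)."""
    gold = {entry.lower() for entry in glossary}
    bands = []
    for start in range(0, len(ranked_candidates), band):
        chunk = ranked_candidates[start:start + band]
        hits = sum(1 for c in chunk if c.lower() in gold)
        bands.append((start + 1, start + len(chunk), hits / len(chunk)))
    return bands
```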


4. Validation process and results

4.2. Results

Chung’s method (2003)   Ratio   Keywords (2008)     Keyness
EHRR                    ∞       COURT               28955.793
EWCA                    ∞       SECTION             27627.5586
UKHL                    ∞       PARA (paragraph)    25311.1152
MANCE                   ∞       LORD                25155.4434
SIAC                    ∞       V (versus)          22486.0918
ECHR                    ∞       APPEAL              21236.8652
EWHC                    ∞       ARTICLE             19301.6328
BAILII                  ∞       ACT                 18577.8652
GESTINGTHORPE           ∞       CASE                18328.9512
FOSCOTE                 ∞       LAW                 10458.0918
EARLSFERRY              ∞       JUDGMENT            9297.75
JFS                     ∞       APPELLANT           8048.33496
ECTHR                   ∞       PROCEEDINGS         7787.61963
STOJEVIC                ∞       CONVENTION          7764.64355
TURPI                   ∞       WHETHER             7716.16992
LJ                      ∞       LJ                  7707.0918
DALLAH                  ∞       RIGHTS              7023.53613
SUMPTION                ∞       DECISION            6950.50488
SEISED                  ∞       ORDER               6632.18164
BANKOVIC                ∞       JURISDICTION        6374.33105

Table 1. Top 25 candidate terms extracted by each method


5. Conclusion

- Evaluating the efficiency of ATR methods is highly advisable in order to select those that best suit our corpus, especially since some of them, like Chung’s, are domain-dependent.

- Nevertheless, as put forward by Lemay et al. (2005: 245), “much still remains on the terminologist’s ability to differentiate a relevant unit from a non-relevant one. Lists must be scanned to remove irrelevant units”. Indeed, “fine-grained semantic distinctions still rely ... on terminologists”.


References

Alcaraz Varó, E. (2000). El inglés profesional y académico. Madrid: Alianza Editorial.
Cabré, M. T. (1993). La terminología. Teoría, metodología, aplicaciones. Barcelona: Antártida/Empúries.
Cabré, M. T., Estopà, R., Vivaldi, J. (2001). Automatic term detection: a review of current systems. In Bourigault, D., Jacquemin, C., L’Homme, M. C. (Eds.), Recent Advances in Computational Terminology 2, 53-87. Amsterdam: John Benjamins, Natural Language Processing.
Chung, T. M. (2003). A corpus comparison approach for terminology extraction. Terminology, 9(2), 221-246.
Heatley, A., Nation, I. S. P. (1996). Range (computer software). Wellington: Victoria University of Wellington.
Kit, C., Liu, X. (2008). Measuring mono-word termhood by rank difference via corpus comparison. Terminology, 14(2), 204-229.
Lemay, C., L’Homme, M. C., Drouin, P. (2005). Two methods for extracting "specific" single-word terms from specialized corpora: experimentation and evaluation. International Journal of Corpus Linguistics, 10(2), 227-255.
Marín, M. J., Rea, C. (2011). Design and compilation of a legal English corpus based on UK law reports: the process of making decisions. In Carrió Pastor, M. L., Candel Mora, M. A. (Eds.), Las Tecnologías de la Información y las Comunicaciones: Presente y Futuro en el Análisis de Córpora. Actas del III Congreso Internacional de Lingüística de Corpus (pp. 101-110). Valencia: Universitat Politècnica de València.
Maynard, D., Ananiadou, S. (2000). TRUCKS: A model for automatic multi-word term recognition. Journal of Natural Language Processing, 8(1), 101-125.
Pearson, J. (1998). Terms in Context. Amsterdam: John Benjamins Publishing Company.
Rea, C. (2008). El Inglés de las Telecomunicaciones: Estudio Léxico Basado en un Corpus Específico. Doctoral thesis. Murcia: Universidad de Murcia.
Rondeau, G. (1983). Introduction à la terminologie. Québec: Gaëtan Morin Editeur.
Sánchez, A., Cantos, P., Sarmiento, R., Simón, J. (1995). Cumbre. Corpus lingüístico del español contemporáneo. Fundamentos, metodología y análisis. Madrid: SGEL.
Scott, M. (2008). WordSmith Tools version 5. Liverpool: Lexical Analysis Software.
Spasic, I., Ananiadou, S., McNaught, J., Kumar, A. (2005). Text mining and ontologies in biomedicine: Making sense of raw text. Brief Bioinform, 6(3), 239-251.
Vivaldi, J., Cabrera-Diego, L. A., Sierra, G., Pozzi, M. (2012). Using Wikipedia to Validate the Terminology Found in a Corpus of Basic Textbooks. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, May 2012.
Wynne, M. (Ed.) (2005). Developing Linguistic Corpora: a Guide to Good Practice. AHDS Literature, Languages and Linguistics. Oxford.