
Post on 23-Mar-2021


AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO DIFFERENT ATR METHODS

MARÍA JOSÉ MARÍN, CAMINO REA

1. INTRODUCTION

- Importance of identifying the terms in a specialised corpus.

- Terms are “textual realisations of specialised concepts” (Spasic et al., 2005).

- They are employed to communicate amongst specialists (Rea, 2008).

- They are mono-referential and have a univocal character (Cabré, 1993): one form corresponds to one content.

- Potential applications of automatic term recognition (ATR): building dictionaries and glossaries, machine translation, ontology building, etc.

- This paper evaluates the efficiency of two ATR methods, Keywords (Scott, 2008) and Chung’s (2003), on a 2.6-million-word legal corpus (UKSCC).

- Both will be validated in terms of precision and recall.

2. UKSCC and LACELL: the study and reference corpora

- UKSCC (United Kingdom Supreme Court Corpus):

- Legal corpus of 192 judicial decisions issued by the Supreme Court of the United Kingdom (2008-2010).

- Compiled ad hoc according to corpus linguistics (CL) standards (Sánchez, 1995; Wynne, 2005; Pearson, 1998; Rea, 2010).

- Monolingual and synchronic.

- The Supreme Court was chosen as the source of texts because of its importance as a judicial institution: it touches upon all branches of law and has a greater geographical scope.

- Judicial decisions are the main source of law in common law countries.

- LACELL (Lingüística Aplicada Computacional, Enseñanza de Lenguas y Lexicografía):

- Balanced general English corpus of 20 million words: written (newspapers, books, magazines, brochures, letters, etc.) and oral language samples.

- Compiled by the LACELL research group at Murcia University (English Dept.).

3. ATR method description

Automatic term recognition (ATR) methods date back to the 1980s. They make it possible to handle large amounts of data, automatically spotting the most relevant terms in a specialised corpus. They have been extensively reviewed (Maynard and Ananiadou, 2000; Cabré et al., 2001; Drouin, 2003; Lemay et al., 2005; Vivaldi et al., 2012, etc.).

- Keywords (Scott, 2008):

- Not an ATR method proper; however, it has proved to identify legal terms more efficiently than other methods designed for that purpose.

- Automatically implemented using WordSmith 5.0. Settings adjusted to use Dunning’s (1993) log-likelihood algorithm.
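
As an illustration of the keyness score behind the Keywords method, the following is a minimal sketch of a log-likelihood (G2) statistic computed over a 2x2 contingency table, in the spirit of Dunning (1993). The function name, its parameters and the frequencies in the usage lines are assumptions made for this example; WordSmith’s internal computation may differ in detail.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """G2 over a 2x2 table: (word vs. all other words) x (study vs. reference)."""
    observed = [
        [freq_study, freq_ref],
        [size_study - freq_study, size_ref - freq_ref],
    ]
    total = size_study + size_ref
    row_totals = [sum(row) for row in observed]
    col_totals = [size_study, size_ref]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / total
            g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical frequencies (not taken from the study); the corpus sizes
# follow the 2.6m-word study corpus and 20m-word reference corpus.
score = log_likelihood(5000, 2000, 2_600_000, 20_000_000)
print(f"keyness (G2): {score:.2f}")
```

A word with the same relative frequency in both corpora scores 0; the score grows as the two relative frequencies diverge, which is why the candidates are ranked by it.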

- Chung’s method (2003):

- Singled out due to the high rate of success recorded by the author (86% precision on average).

- Chung compares a qualitative term recognition method, the rating scale approach, with her own quantitative one. She concludes that words displaying a ratio of occurrence above 50 are terms.

- How to calculate it:

Wr = SF(w) / RF(w), where SF(w) and RF(w) are the frequencies of word w in the study and reference corpora (frequency counts must be normalised).
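
The ratio can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the per-million normalisation base and the example counts are hypothetical, and treating a word absent from the reference corpus as having an infinite ratio is an assumption about how the division by zero is handled.

```python
import math


def chung_ratio(freq_study, freq_ref, size_study, size_ref, per=1_000_000):
    """Ratio of normalised study frequency to normalised reference frequency."""
    sf = freq_study / size_study * per  # normalised frequency in study corpus
    rf = freq_ref / size_ref * per      # normalised frequency in reference corpus
    if rf == 0:
        return math.inf  # assumption: word not attested in the reference corpus
    return sf / rf


def is_term(ratio, threshold=50):
    """A word counts as a term when its ratio exceeds the threshold."""
    return ratio > threshold


# Hypothetical counts: 260 hits in a 2.6m-word study corpus,
# 20 hits in a 20m-word reference corpus (ratio close to 100).
ratio = chung_ratio(260, 20, 2_600_000, 20_000_000)
print(ratio, is_term(ratio))
```

Normalising both counts to the same base is what makes a 2.6-million-word corpus comparable with a 20-million-word one; without it, raw counts from the larger corpus would dominate the ratio.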

4. Validation process and results

4.1. Validation process

- Precision: percentage of true terms out of the candidate terms extracted.

- Recall: percentage of true terms out of the total number of terms in the whole corpus.

- We resorted to automatic validation: a 10,000-entry electronic legal glossary compiled by the authors was used as the gold standard (human validation poses problems due to subjectivity).

- Terms were confirmed as true if found in the glossary.

- Keywords was implemented automatically; Chung’s method was applied using a spreadsheet (data also obtained with WordSmith).
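
The validation step can be sketched as a set comparison between the candidate list and the glossary. The function and variable names and the toy word lists are hypothetical. In particular, approximating "the total number of terms in the whole corpus" by the glossary entries attested in the corpus vocabulary is an assumption of this sketch, not something the slides state.

```python
def precision_recall(candidates, glossary, corpus_vocabulary):
    """Return (precision, recall) of a candidate list against a gold-standard glossary."""
    candidates = {c.lower() for c in candidates}
    glossary = {g.lower() for g in glossary}
    vocabulary = {w.lower() for w in corpus_vocabulary}

    # Assumption: the terms "in the whole corpus" are the glossary entries
    # that actually occur in the corpus vocabulary.
    terms_in_corpus = glossary & vocabulary
    true_terms = candidates & terms_in_corpus

    precision = len(true_terms) / len(candidates) if candidates else 0.0
    recall = len(true_terms) / len(terms_in_corpus) if terms_in_corpus else 0.0
    return precision, recall


# Toy data for illustration only.
candidates = ["COURT", "APPEAL", "MANCE", "WHETHER"]
glossary = ["court", "appeal", "tort", "estoppel", "jurisdiction"]
vocabulary = ["court", "appeal", "mance", "whether", "tort", "jurisdiction", "the"]

p, r = precision_recall(candidates, glossary, vocabulary)
print(f"precision={p:.2%} recall={r:.2%}")
```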

4.2. Results

Fig. 1. Overall precision and recall achieved by each method (bars for Keywords and Chung; series: Precision, Recall; vertical axis from 0% to 70%). Data labels shown in the chart: 62.00%, 42.25%, 31.00%, 11.75%.

Fig. 2. Cumulative precision attained on the top 2000 candidates (series: Keywords, Chung; vertical axis from 0% to 90%).

- Keywords outperforms Chung’s method in terms of both precision and recall.

- The efficiency of Keywords decreases smoothly and steadily, whereas the performance of Chung’s method is much more irregular.

- Chung’s method only performs better within candidates 1600-1800, dropping sharply to 3.5% precision afterwards.

- Chung’s poor results may be due to the automatic inclusion in the term group of words that do not occur in the reference corpus, especially the proper names so typical of judicial decisions.
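
Cumulative precision over a ranked candidate list can be sketched as follows. The function name, the toy data and the step of 200 candidates are assumptions chosen for illustration; the slides do not state how the curve was sampled.

```python
def cumulative_precision(ranked_candidates, glossary, step=200):
    """Return (rank, precision over the top `rank` candidates) at every `step` ranks."""
    glossary = {g.lower() for g in glossary}
    hits = 0
    points = []
    for rank, candidate in enumerate(ranked_candidates, start=1):
        if candidate.lower() in glossary:
            hits += 1  # candidate confirmed by the gold standard
        if rank % step == 0:
            points.append((rank, hits / rank))
    return points


# Toy ranked list for illustration only; step=2 keeps the example short.
ranked = ["COURT", "MANCE", "APPEAL", "SIAC"]
gold = ["court", "appeal"]
curve = cumulative_precision(ranked, gold, step=2)
print(curve)
```

Plotting such points for each method makes it easy to see whether precision decays smoothly with rank or fluctuates from one band of candidates to the next.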

Chung’s method (2003)   Ratio   Keywords (2008)    Keyness
EHRR                    ∞       COURT              28955.793
EWCA                    ∞       SECTION            27627.5586
UKHL                    ∞       PARA (paragraph)   25311.1152
MANCE                   ∞       LORD               25155.4434
SIAC                    ∞       V (versus)         22486.0918
ECHR                    ∞       APPEAL             21236.8652
EWHC                    ∞       ARTICLE            19301.6328
BAILII                  ∞       ACT                18577.8652
GESTINGTHORPE           ∞       CASE               18328.9512
FOSCOTE                 ∞       LAW                10458.0918
EARLSFERRY              ∞       JUDGMENT           9297.75
JFS                     ∞       APPELLANT          8048.33496
ECTHR                   ∞       PROCEEDINGS        7787.61963
STOJEVIC                ∞       CONVENTION         7764.64355
TURPI                   ∞       WHETHER            7716.16992
LJ                      ∞       LJ                 7707.0918
DALLAH                  ∞       RIGHTS             7023.53613
SUMPTION                ∞       DECISION           6950.50488
SEISED                  ∞       ORDER              6632.18164
BANKOVIC                ∞       JURISDICTION       6374.33105

Table 1. Top 25 candidate terms extracted by each method

5. Conclusion

- Evaluating the efficiency of ATR methods is highly advisable in order to select the ones that suit our corpus best, especially since some of them, like Chung’s, are domain-dependent.

- Nevertheless, as put forward by Lemay et al. (2005: 245), “much still remains on the terminologist’s ability to differentiate a relevant unit from a non-relevant one. Lists must be scanned to remove irrelevant units”. Actually, “fine-grained semantic distinctions still rely ... on terminologists”.

References

Alcaraz Varó, E. (2000). El inglés profesional y académico. Madrid: Alianza Editorial.

Cabré, M.T. (1993). La terminología. Teoría, metodología, aplicaciones. Barcelona: Antártida/Empúries.

Cabré, M.T., Estopà, R., Vivaldi, J. (2001). Automatic term detection: a review of current systems. In Bourigault, D., Jacquemin, C., L’Homme, M.C. (Eds.), Recent Advances in Computational Terminology 2, 53-87. Amsterdam: John Benjamins, Natural Language Processing.

Chung, T.M. (2003). A corpus comparison approach for terminology extraction. Terminology 9(2), 221-246.

Heatley, A., Nation, I.S.P. (1996). Range (computer software). Wellington: Victoria University of Wellington.

Kit, C., Liu, X. (2008). Measuring mono-word termhood by rank difference via corpus comparison. Terminology 14(2), 204-229.

Lemay, C., L’Homme, M.C., Drouin, P. (2005). Two methods for extracting "specific" single-word terms from specialized corpora: experimentation and evaluation. International Journal of Corpus Linguistics 10(2), 227-255.

Marín, M.J., Rea, C. (2011). Design and compilation of a legal English corpus based on UK law reports: the process of making decisions. In Carrió Pastor, M.L., Candel Mora, M.A. (Eds.), Las Tecnologías de la Información y las Comunicaciones: Presente y Futuro en el Análisis de Córpora. Actas del III Congreso Internacional de Lingüística de Corpus (101-110). Valencia: Universitat Politècnica de València.

Maynard, D., Ananiadou, S. (2000). TRUCKS: A model for automatic multi-word term recognition. Journal of Natural Language Processing 8(1), 101-125.

Pearson, J. (1998). Terms in Context. Amsterdam: John Benjamins Publishing Company.

Rea, C. (2008). El Inglés de las Telecomunicaciones: Estudio Léxico Basado en un Corpus Específico. Doctoral thesis. Murcia: Universidad de Murcia.

Rondeau, G. (1983). Introduction à la terminologie. Québec: Gaëtan Morin Editeur.

Sánchez, A., Cantos, P., Sarmiento, R., Simón, J. (1995). Cumbre. Corpus lingüístico del español contemporáneo. Fundamentos, metodología y análisis. Madrid: SGEL.

Scott, M. (2008). WordSmith Tools version 5. Liverpool: Lexical Analysis Software.

Spasic, I., Ananiadou, S., McNaught, J., Kumar, A. (2005). Text mining and ontologies in biomedicine: Making sense of raw text. Brief Bioinform 6(3), 239-251.

Vivaldi, J., Cabrera-Diego, L.A., Sierra, G., Pozzi, M. (2012). Using Wikipedia to Validate the Terminology Found in a Corpus of Basic Textbooks. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, May 2012.

Wynne, M. (Ed.) (2005). Developing Linguistic Corpora: a Guide to Good Practice. AHDS Literature, Languages and Linguistics. Oxford.
