

Contribution to research new models of knowledge extraction on BigData systems

Héctor Cerezo-Costas, Advisor: F. Javier González-Castaño¹

¹ Department of Telematics Engineering, University of Vigo

Motivation

Natural Language Processing (NLP) has a wide range of applications such as:

Human performance exceeds that of computers in many complex NLP tasks:

Nonetheless, computers are faster and are able to solve problems at web scale.

Thesis Objectives

• Objective 1: Research on new unsupervised algorithms for NLP tasks.

• Objective 2: Development of big data algorithms to solve NLP problems at the terabyte scale.

• Objective 3: Research on new technologies for fast adaptation of text mining models to different contexts.

Ongoing Work

Participation in a Sentiment Analysis Competition (SemEval 2015)

We have taken part in SemEval-2015 Task 10, Subtask B: Sentiment Analysis in Twitter [1].

Goal:

• Given a message from Twitter, classify it as positive, negative or neutral.

General Approach

• Supervised strategy with Logistic Regression

• Ensemble of classifiers with majority voting strategy

• CRFs for complex feature extraction: negation, comparison, adversative clauses, etc. [2, 3]
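
The majority voting step of the ensemble can be illustrated with a minimal sketch. The poster does not say how ties are resolved or how many classifiers are combined, so the `tie_break` fallback to "neutral", the three classifiers and their outputs are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, tie_break="neutral"):
    """Return the most frequent label; use tie_break if the top two labels tie.

    The tie_break default is an assumption, not taken from the poster.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# Hypothetical outputs of three classifiers for the same four tweets.
classifier_outputs = [
    ["positive", "negative", "neutral", "positive"],
    ["positive", "negative", "negative", "neutral"],
    ["neutral", "negative", "neutral", "negative"],
]

# zip(*...) groups the votes per tweet before voting.
ensemble = [majority_vote(votes) for votes in zip(*classifier_outputs)]
print(ensemble)
```

With an odd number of classifiers and three labels, a three-way tie is still possible (as in the last tweet), which is why an explicit fallback is needed.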

The steps performed by the system are:

1 Preprocessing step: emoticon substitution, multiword hashtag splitting, mention and URL substitution, etc.

2 Data tagging: polarity dictionaries, verb reversal detection, etc.

3 PoS data extraction

4 Syntactic information extraction: detection of negation, adversative or polarity reversal scopes using CRFs

5 Feature extraction and classification of sentences
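
The preprocessing step (step 1) can be sketched as follows. This is only an illustration under stated assumptions: the placeholder tokens (`URL`, `MENTION`, `EMO_POS`, `EMO_NEG`), the small emoticon table and the camel-case rule for splitting hashtags are hypothetical and are not taken from the poster.

```python
import re

# Hypothetical emoticon table; a real system would use a larger lexicon.
EMOTICONS = {
    ":)": "EMO_POS", ":-)": "EMO_POS", ":D": "EMO_POS",
    ":(": "EMO_NEG", ":-(": "EMO_NEG",
}
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Zero-width boundary between a lowercase and an uppercase letter.
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")


def preprocess(tweet):
    """Normalize a tweet: URLs, mentions, hashtags, then emoticons."""
    text = URL_RE.sub("URL", tweet)
    text = MENTION_RE.sub("MENTION", text)
    # Drop the '#' and split camel-case hashtags into separate words.
    text = HASHTAG_RE.sub(lambda m: CAMEL_RE.sub(" ", m.group(1)), text)
    # Replace whitespace-delimited emoticons with polarity placeholders.
    tokens = [EMOTICONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(preprocess("@bob I love this :) #SoHappy http://t.co/abc"))
```

URLs are replaced first so that a `#` fragment inside a link is not mistaken for a hashtag.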

Figure 1: Architecture of the system.

Results

Test                  F-score
LiveJournal 2014      72.63
SMS 2013              61.97
Twitter 2013          65.29
Twitter 2014          66.87
Twitter 2014 sarcasm  59.11
Twitter 2015          60.62
Twitter 2015 sarcasm  56.45

Table 1: Performance on the progress and input tests.
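
The poster does not define the F-score it reports. Assuming it is the official measure of SemEval-2015 Task 10 Subtask B, that is, the average of the F1 scores of the positive and negative classes (the neutral class only affects the score through misclassifications), it could be computed as in the sketch below. The gold and predicted labels are invented for illustration.

```python
def f1_for_label(gold, predicted, label):
    """F1 of a single class, computed from paired gold/predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def semeval_f_score(gold, predicted):
    """Average of positive and negative F1, scaled to 0-100 (assumed measure)."""
    f1_pos = f1_for_label(gold, predicted, "positive")
    f1_neg = f1_for_label(gold, predicted, "negative")
    return 100 * (f1_pos + f1_neg) / 2


# Invented labels for four tweets.
gold = ["positive", "positive", "negative", "neutral"]
predicted = ["positive", "neutral", "negative", "negative"]
score = semeval_f_score(gold, predicted)
print(round(score, 2))
```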

• 16th position out of 40 competitors on both the regular and sarcasm 2015 datasets.

• 1st position on the Twitter 2014 sarcasm dataset.

• General degradation in performance between the 2014 and 2015 datasets.

Research Plan (Next Year)

References

[1] S. Rosenthal, P. Nakov, S. Kiritchenko, S. M. Mohammad, A. Ritter, and V. Stoyanov. 2015. SemEval-2015 Task 10: Sentiment Analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval 2015, Denver, Colorado, June.

[2] J. Lafferty, A. McCallum, and F. C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.

[3] E. Lapponi, E. Velldal, L. Øvrelid, and J. Read. 2012. UiO 2: Sequence-Labeling Negation Using Dependency Features. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, Volume 1, pages 319–327.

Workshop on Monitoring PhD Student Progress, 16 June 2015, Vigo, Spain
