In the name of Allah, the Most Gracious, the Most Merciful · 2019-11-22 · dua- take...
TRANSCRIPT
In the name of Allah, the Most Gracious, the Most Merciful
A Template-based Approach to Design an Experiment
Date: November 07, 2019
Venue: Seminar Room, A-block, COMSATS University Islamabad, Lahore Campus
Workshop on Designing an Experiment
Technical Motivational Workshop
Target Audience: MPhil/PhD Students + Interested Faculty Members
Duration: 1 Session
Type of Session: Interactive
Instructor: Dr. Rao Muhammad Adeel Nawab
Organizers
Dr. Rao Muhammad Adeel Nawab
Ms. Muntaha Iqbal
Ms. Muntaha Iqbal
MS (CS): COMSATS University Islamabad (CUI), Lahore Campus
Visiting Lecturer: COMSATS University Islamabad (CUI), Lahore Campus
Group Member: NLP Research Group at CUI, Lahore
Ms. Muntaha Iqbal - Publications
Impact Factor Journal Papers: 02
International Conference / Workshop Papers: 01
H-Index: 01
Citations: 08
Ms. Muntaha Iqbal - Research Profile on Google Scholar
Dr. Rao Muhammad Adeel Nawab: About Me
PhD: University of Sheffield, UK
Assistant Professor: COMSATS University Islamabad (CUI), Lahore Campus
Group Lead: NLP Research Group at CUI, Lahore Campus
Dr. Rao Muhammad Adeel Nawab - Publications
Impact Factor Journal Papers: 15
International Conference / Workshop Papers: 21
H-Index: 10
Citations: 238
M-Phil Students Supervised: 40+
PhD Students Under Supervision: 10+
About Me - Research Profile on Google Scholar
Dedication
I dedicate this workshop to my beloved and most respected PhD supervisor,
Dr. Mark Stevenson, Department of Computer Science, University of Sheffield, UK
Goal of Life - [Urdu text not recoverable]
Success vs Happiness
Success - what you Get
Happiness - what you Give
Key to Success
Key to Success Cont.
Time Management is Life Management
How to Work
[Urdu text not recoverable]
"You alone we worship, and You alone we ask for help." (Holy Quran, Surah Al-Fatihah, Ayat # 5)
Power of Effort and Dua
"Guide us to the straight path, the path of those upon whom You have bestowed favor."
Holy Quran, Surah Al-Fatihah, Ayat # 6-7
Power of Neyat
The Messenger of Allah (peace be upon him) said: "Actions are judged by intentions, and every person will have only what they intended."
Sahih Bukhari, Book of Revelation, Hadith # 1
Dua - Take Help from Allah before Starting any Task
"O Allah, choose for me and select for me."
"Glory be to You; we have no knowledge except what You have taught us. Indeed, You are the All-Knowing, the All-Wise."
"My Lord, expand my chest for me, ease my task for me, and untie the knot from my tongue, so that they may understand my speech."
To Learn
There is a tradeoff between Accuracy and Speed
First focus on Accuracy, then Speed
To Learn any Task
1. Plan in mind
2. Design on paper
3. Execute in real world
4. Feedback from audience and domain expert
Make a Template to do any Task
1. Make a template
   Use divide and conquer approach
   Break down into attribute-value pairs
2. In template, ensure
   Order and Flow
   Independence and Connectivity
Research Cycle – Three Main Phases
Reading
Experimentation
Writing
Workshop Outline
Research Focus and Expected Outcome(s)
Main Components of an Experiment
Template-based Approach to Design an Experiment
Workshop Aims
To introduce the main concepts that are essential to become a great human being and a great researcher
To develop skills to make a template-based detailed outline of a research paper
To develop skills to design an experiment (or set of experiments) using a template-based approach
Workshop Learning Outcomes
By the end of this workshop, the participants should
Understand which daily tasks are important for a healthy personality (physically, mentally and socially)
Be able to understand the essential components of an experiment
Be able to design an experiment using a template-based approach and execute the experiment step by step in the real world
Research Focus and Expected Outcome(s)
Research Focus
Problem: Paraphrased Text Reuse Detection
Exact Matching Approach: an existing approach
Query Expansion Approach: you have proposed this approach to further improve the performance for the paraphrased text reuse detection problem
Expected Outcome(s) of Experiment
Proposed approach will outperform the baseline approach for the same problem on the same dataset / corpus
Expected Outcome(s) of Experiment Cont.
Very Important
If you can’t theoretically prove that your proposed approach will outperform the baseline approach, then it will be very risky to carry out the experiment(s)
Expected Outcome(s) of Experiment Cont.
Example: Baseline Approach vs Proposed Approach
Problem: Paraphrased Text Reuse Detection
Dataset
Original Text: A dog bites a man
Reused Text: A hound bites a person
Expected Outcome(s) of Experiment Cont.
Evaluation Methodology
Quantify the degree of similarity between original text and reused text using the following formula,
$$\text{Similarity Score} = \frac{\text{No. of Common Words}}{\text{Total Number of Words in Reused Text}}$$
Expected Outcome(s) of Experiment Cont.
Exact Matching Approach (Baseline Approach)
Original Text: A dog bites a man
Reused Text: A hound bites a person
$$\text{Similarity Score} = \frac{\text{No. of Common Words}}{\text{Total Number of Words in Reused Text}}$$
$$\text{Similarity Score} = \frac{3}{5} = 0.6$$
Expected Outcome(s) of Experiment Cont.
Query Expansion Approach (Proposed Approach)
Original Text: A dog bites a man
Reused Text: A hound (dog) bites (cuts) a person (human, man)
$$\text{Similarity Score} = \frac{\text{No. of Common Words}}{\text{Total Number of Words in Reused Text}}$$
$$\text{Similarity Score} = \frac{5}{5} = 1.0$$
Expected Outcome(s) of Experiment Cont.
Proposed approach outperformed the baseline approach on the same dataset using the same evaluation methodology
Approach                                        Similarity Score
Exact Matching Approach (Baseline Approach)     0.6
Query Expansion Approach (Proposed Approach)    1.0
Original Text and Reused Text are semantically the same; therefore, similarity score should be 1.0
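The two similarity scores in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are illustrative, the tokenizer is a plain lower-case whitespace split, and the expansion terms are copied by hand from the example rather than looked up in a lexical resource.

```python
def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def similarity_score(original, reused, expansions=None):
    """No. of common words / total number of words in the reused text.

    A reused word counts as common if the word itself, or any of its
    expansion terms (assumed to be supplied by the caller), occurs in
    the original text.
    """
    expansions = expansions or {}
    original_words = set(tokenize(original))
    reused_words = tokenize(reused)
    common = 0
    for word in reused_words:
        candidates = {word} | set(expansions.get(word, []))
        if candidates & original_words:
            common += 1
    return common / len(reused_words)


original = "A dog bites a man"
reused = "A hound bites a person"

# Expansion terms taken from the example; hard-coded for illustration only.
expansions = {
    "hound": ["dog"],
    "bites": ["cuts"],
    "person": ["human", "man"],
}

baseline = similarity_score(original, reused)              # exact matching
proposed = similarity_score(original, reused, expansions)  # query expansion
print(baseline, proposed)
```

With exact matching only "a", "bites" and "a" are found in the original text (3 of 5 words); with expansion, "hound" and "person" also match through "dog" and "man" (5 of 5 words).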
Expected Outcome(s) of Experiment Cont.
Conclusion
Proposed Approach outperforms the Baseline Approach on a sample dataset
Based on this finding, it is likely that the Proposed Approach will outperform the Baseline Approach on a large benchmark dataset
Main Components of an Experiment
Main Components of an Experiment
Dataset(s) / Corpora
Techniques / Approaches / Methods
Evaluation Methodology
Evaluation Measures
Dataset(s) / Corpora
For each dataset / corpus mention
1. Name of Corpus
2. Main Characteristics of Corpus
Techniques / Approaches / Methods
Two Options
1. Write Detailed Description of Techniques in the Experimental Setup Section
2. Just Mention Names of Techniques
   Refer to previous Section(s) which contain detailed description of techniques
Detailed Description of Technique
For each technique describe the following main points
1. In which category does it fall?
2. How does it work?
3. In which previous studies has it been used?
4. How have you applied it in your study?
Evaluation Methodology
Main Points
Problem
How is the Problem Treated?
Tools / Toolkits
How were the tools / toolkits used in your study?
Important – Always use standard tools and methods mentioned in high quality research papers
Evaluation Methodology Cont.
Example
Problem – Paraphrased Text Reuse Detection
Treated as – Supervised Text Classification Task
Toolkit – WEKA
Evaluation Measures
Use standard evaluation measures
HINT: See what evaluation measures are used in previous studies which addressed the same problem
For each evaluation measure
Write Definition
Write Equation
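As an illustration of pairing a definition with an equation, Precision, Recall and F1 can be written as below. The symbols TP, FP and FN (true positives, false positives and false negatives for a given class) are notation assumed here, not taken from the slides. Precision is the fraction of items assigned to a class that truly belong to it, Recall is the fraction of items of a class that were correctly assigned to it, and F1 is the harmonic mean of the two.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```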
Template-based Approach to Design an Experiment
Template-based Approach to Design an Experiment
1. Shortlist 2 – 5 research papers very closely related to your research work
2. Extract information from the shortlisted papers in the form of attribute-value pairs to fill the template
3. Combine the experimental designs of existing papers to make your template-based experimental design
Example - Template-based Approach to Design an Experiment Cont.
Paper Title – Detecting Text Reuse with Modified and Weighted N-grams
Template – Experimental Design
1. Dataset
2. Approaches
3. Evaluation Methodology
4. Evaluation Measures
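One way to picture the template as attribute-value pairs is a plain Python dictionary with one entry per component. This is a hedged sketch: the helper names `new_template` and `missing_attributes` are hypothetical, and only the four attribute names, the paper title and the corpus name come from the slides.

```python
# The four attributes of the experimental-design template.
TEMPLATE_ATTRIBUTES = [
    "Dataset",
    "Approaches",
    "Evaluation Methodology",
    "Evaluation Measures",
]


def new_template(paper_title):
    """Create an empty template (all values unfilled) for one shortlisted paper."""
    template = {"Paper Title": paper_title}
    for attribute in TEMPLATE_ATTRIBUTES:
        template[attribute] = None
    return template


def missing_attributes(template):
    """List the attributes that still have no value."""
    return [a for a in TEMPLATE_ATTRIBUTES if not template.get(a)]


template = new_template("Detecting Text Reuse with Modified and Weighted N-grams")
template["Dataset"] = {"Name": "METER Corpus"}  # first value extracted from the paper

print(missing_attributes(template))
```

Filling one dictionary per shortlisted paper makes it easy to see which attributes are still empty before the designs are combined.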
Example - Template-based Approach to Design an Experiment Cont.
1. Dataset
METER Corpus
Original (or Source) Documents = 771
Reused (or Derived) Documents = 768
Example - Template-based Approach to Design an Experiment Cont.
Levels of Text Reuse
   Wholly Derived = 285
   Partially Derived = 300
   Non-Derived = 183
Pre-processing
   Lower case
   Remove all punctuation marks
Example - Template-based Approach to Design an Experiment Cont.
2. Approaches
N-gram Overlap Approach (Baseline Approach)
   Word N-grams (N = 1 - 5)
   Containment Similarity Measure
   Similarity Score Range is [0 – 1]
Modified and Weighted N-grams Approach (Proposed Approach)
   Modified N-grams
      Deletion Approach
      Substitution Approach (WordNet, Paraphrase Lexicon)
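The baseline can be sketched as a set-based containment score over word n-grams. The slides do not give the exact formula, so the version below is an assumption: shared n-grams divided by the n-grams of the derived text, with the pre-processing listed earlier (lower case, punctuation removed). Function names are illustrative.

```python
import string


def preprocess(text):
    """Lower-case the text and remove all punctuation marks."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in the text."""
    tokens = preprocess(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(source, derived, n):
    """Share of the derived text's n-grams that also occur in the source.

    Normalising by the derived text is an assumption of this sketch.
    The result always lies in [0, 1].
    """
    derived_ngrams = word_ngrams(derived, n)
    if not derived_ngrams:
        return 0.0
    shared = word_ngrams(source, n) & derived_ngrams
    return len(shared) / len(derived_ngrams)


source = "A dog bites a man."
derived = "A hound bites a person."

# One similarity score per n-gram length, N = 1 - 5.
scores = [containment(source, derived, n) for n in range(1, 6)]
print(scores)
```

Because n-grams are treated as a set here, the repeated word "a" is counted once, so the unigram score is 2/4 rather than the 3/5 obtained by counting tokens.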
Example - Template-based Approach to Design an Experiment Cont.
Weighted N-grams
   Weight of N-gram = -log(P), where P is the probability of the N-gram
   Probability of N-gram computed using a Language Model
N-gram Weights Computed using Language Model
   Trained on Reuters Corpus
   Using SRILM Language Modeling Toolkit
Containment Similarity Measure (with Clipped Count)
   Similarity Score Range is [0 – 1]
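A rough sketch of how -log(P) weights and clipped counts might combine into a containment score is given below. The slides do not state the paper's exact formula, so this is one plausible form and should be read as an assumption. The probabilities come from a made-up lookup table (`toy_probability`), not from an SRILM model trained on the Reuters Corpus.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams (as tuples) in lower-cased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_containment(source, derived, n, probability):
    """Weighted containment with clipped counts (assumed form).

    Every n-gram occurrence is weighted by -log(P). A shared n-gram is
    credited at most as many times as it occurs in the source (clipping).
    """
    source_counts = ngram_counts(source, n)
    derived_counts = ngram_counts(derived, n)

    def weight(ngram):
        return -math.log(probability(ngram))

    total = sum(count * weight(g) for g, count in derived_counts.items())
    if total == 0:
        return 0.0
    shared = sum(min(count, source_counts[g]) * weight(g)
                 for g, count in derived_counts.items())
    return shared / total


# Hypothetical probabilities for illustration only.
TOY_UNIGRAM_PROBABILITY = {("a",): 0.5}


def toy_probability(ngram):
    """Return a made-up probability; unseen n-grams get a small default."""
    return TOY_UNIGRAM_PROBABILITY.get(ngram, 0.01)


score = weighted_containment("a dog bites a man", "a hound bites a person",
                             1, toy_probability)
print(round(score, 3))
```

Frequent n-grams such as "a" receive a low weight, so matching them contributes less to the score than matching a rare n-gram such as "bites".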
Example - Template-based Approach to Design an Experiment Cont.
3. Evaluation Methodology
Problem = Paraphrased Text Reuse Detection
Solution = Treated as Supervised Document Classification Task
Two Versions of Classification
Ternary Classification – Differentiate between 3 Classes
   Wholly Derived
   Partially Derived
   Non-Derived
Example - Template-based Approach to Design an Experiment Cont.
Binary Classification – Differentiate between 2 Classes
   Derived (Wholly Derived + Partially Derived)
   Non-Derived
Machine Learning Toolkit
   WEKA 3.8
   Implemented in Java Programming Language
Classifiers / Machine Learning Algorithms
   Naïve Bayes
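The step from the ternary to the binary task is only a relabelling, which the short sketch below illustrates using the METER class counts listed earlier. The function name `to_binary` is illustrative.

```python
from collections import Counter

# Class counts for the reused (derived) documents, as listed for the METER Corpus.
TERNARY_COUNTS = {
    "Wholly Derived": 285,
    "Partially Derived": 300,
    "Non-Derived": 183,
}


def to_binary(label):
    """Merge Wholly Derived and Partially Derived into a single Derived class."""
    return "Non-Derived" if label == "Non-Derived" else "Derived"


binary_counts = Counter()
for label, count in TERNARY_COUNTS.items():
    binary_counts[to_binary(label)] += count

print(dict(binary_counts))
```

The binary task therefore has 585 Derived and 183 Non-Derived documents, 768 in total.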
Example - Template-based Approach to Design an Experiment Cont.
Classifier Training / Testing
   K-fold cross validation approach (K = 10)
Features used as input to Naïve Bayes
   Similarity scores generated for Word N-grams (N = 1 - 5) using
      N-gram Overlap Approach
      Modified and Weighted N-grams Approach
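K-fold cross validation can be pictured with the following plain-Python sketch. It is not how WEKA implements it; it only shows the idea that each document is used for testing exactly once and for training in the other K - 1 rounds. The function name and the fixed seed are assumptions.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 768 = number of reused documents in the corpus described earlier.
splits = list(k_fold_indices(768, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

In each round a classifier would be trained on `train` and evaluated on `test`, and the scores would then be averaged over the 10 rounds.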
Example - Template-based Approach to Design an Experiment Cont.
4. Evaluation Measures
Approaches' performance measured using three evaluation measures
   Precision
   Recall
   F1
Macro-averaged Precision, Recall and F1 scores across all classes
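Macro-averaging computes each measure per class and then takes the unweighted mean over classes. The sketch below assumes that macro F1 is the mean of the per-class F1 scores, which is one common convention; the slides do not say which convention was used. The gold and predicted labels are toy data invented for illustration.

```python
def macro_scores(gold, predicted):
    """Return macro-averaged (precision, recall, F1) over all classes."""
    classes = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, not results from any experiment.
gold = ["Derived", "Derived", "Derived", "Non-Derived"]
predicted = ["Derived", "Derived", "Non-Derived", "Non-Derived"]

macro_p, macro_r, macro_f1 = macro_scores(gold, predicted)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Each class counts equally in the average, so a small class such as Non-Derived influences the macro scores as much as a large one.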