A self-training framework for exploratory discourse detection (final)

Upload: zhongyu-wei

Posted on 01-Jul-2015

TRANSCRIPT

1. A self-training framework for exploratory discourse detection. Zhongyu Wei, SoLAR symposium, Open University, UK, 26 June 2012. PhD student, SEEM, The Chinese University of Hong Kong, Hong Kong; SocialLearn intern, Open University, [email protected]

2. Outline: Exploratory dialogue analysis; A self-training framework; Datasets and experiments; Applications

3. Online learning resources explosion
[Figure: online learning forums, online seminars, online conferences, distance education platforms]

4. The critical, knowledge-building discourse?...

5. How many points in the webinar triggered learning/knowledge-building?
[Figure callouts: "This person contributes a lot during the chat." / "This part appears to have very good content that will provoke deeper learning."]
Data in this study is taken from a 2-day OU conference in Elluminate & Cloudworks.

6. Exploratory dialogue analysis
Exploratory dialogue represents a joint, coordinated form of co-reasoning in language, with speakers sharing knowledge, challenging ideas, evaluating evidence and considering options.
Category | Description | Example
Challenge | Identifies that something may be wrong and in need of correction | I disagree. Freemind is a superb piece of software to use...
Evaluation | Has a descriptive quality | That's a really interesting approach
Extension | Builds on or provides resources that support discussion | I've embedded Helen's slideshare over in cloudworks http://link.com
Reasoning | The process of thinking an idea through | Why intranet only? What ... meaning CLOSED in ...
Source: Mercer, N. (2004). Sociocultural discourse analysis: analysing classroom talk as a social mode of thinking. Journal of Applied Linguistics, 1(2), 137-168.

7. Low exploratory dialogue
Time | Contribution
3:12 PM | LOL
3:12 PM | It's not looking good.
3:13 PM | Sorry, had to do that.
3:13 PM | jaaa
3:13 PM | Ouch!
3:13 PM | It was a vuvuzela.
3:13 PM | I though that was you @Alistair
3:13 PM | I've taken away the vuvuzela from you now!
3:13 PM | LOL

8. Higher exploratory dialogue
Time | Contribution
2:42 PM | I hate talking. :-P My question was whether "gadgets" were just basically widgets and we could embed them in various web sites, like Netvibes, Google Desktop, etc.
2:42 PM | Thanks, that's great! I am sure I understood everything, but looks inspiring!
2:43 PM | Yes why OU tools not generic tools?
2:43 PM | Issues of interoperability
2:43 PM | The "new" SocialLearn site looks a lot like a corkboard where you can add various widgets, similar to those existing web start pages.
2:43 PM | What if we end up with as many apps/gadgets as we have social networks and then we need a recommender for the apps!
2:43 PM | My question was on the definition of the crowd in the wisdom of crowds we acsess in the service model?

9. Exploratory dialogue detection
Problem statement: given an online chat session S = {d_0, d_1, ..., d_n}, where d_k stands for the kth dialogue, classify each d_k as exploratory or non-exploratory.
Solution from learning analytics: the sociocultural discourse analysis method (manual; high precision and low recall), which relies on cue phrases:
Category | Cue phrases
Challenge | but if, have to respond, my view
Evaluation | good example, good point
Extension | more links, for example
Reasoning | that is why, next step

10. Exploratory dialogue classification
[Diagram: dialogue → exploratory discourse classifier → exploratory / non-exploratory]
Each dialogue is represented by a feature vector of word n-grams, e.g. {I think she is right} → {I, think, she, is, right, I-think, think-she, she-is, is-right, I-think-she, think-she-is, she-is-right}.

11. Exploratory dialogue classification
Instance-based supervised classifier training: [Diagram: exploratory and non-exploratory training instances → classifier training → exploratory discourse classifier]
Feature-based supervised classifier training: [Diagram: exploratory and non-exploratory instances → feature generation → feature list → classifier training → exploratory discourse classifier]

12. An example of a feature list
Feature | Exploratory | Non-exploratory
what-is | 0.9992 | 0.0008
good-point | 0.9995 | 0.0005
your-audio-should | 0.001 | 0.999
thank-you | 0.004 | 0.996
my-name | 0.07 | 0.93

13. A self-training framework
[Diagram: annotated data → classifier training → classifier]
Step 1: Train an initial classifier on the annotated data.
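The n-gram representation of slide 10 and the cue-phrase baseline of slide 9 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes whitespace tokenization with no punctuation handling, n-grams up to length 3 (the longest shown on slide 10), and it uses only the example cue phrases listed on slide 9 with a plain case-insensitive substring match. The names `CUE_PHRASES`, `ngram_features` and `cue_phrase_label` are hypothetical.

```python
from typing import Dict, List

# Hypothetical table: only the example cue phrases shown on slide 9,
# not the full cue-phrase list used in the study.
CUE_PHRASES: Dict[str, List[str]] = {
    "challenge": ["but if", "have to respond", "my view"],
    "evaluation": ["good example", "good point"],
    "extension": ["more links", "for example"],
    "reasoning": ["that is why", "next step"],
}


def ngram_features(text: str, max_n: int = 3) -> List[str]:
    """Return all word n-grams for n = 1..max_n, joined with '-' as on slide 10.

    Assumes simple whitespace tokenization (punctuation stays attached).
    """
    tokens = text.split()
    features: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append("-".join(tokens[i:i + n]))
    return features


def cue_phrase_label(text: str) -> str:
    """Label a dialogue exploratory if it contains any cue phrase.

    Case-insensitive substring matching: precise on clear cues but it misses
    any exploratory post that happens not to use one (hence low recall).
    """
    lowered = text.lower()
    for phrases in CUE_PHRASES.values():
        if any(phrase in lowered for phrase in phrases):
            return "exploratory"
    return "non-exploratory"
```

Calling `ngram_features("I think she is right")` returns the same twelve features as slide 10, in the same order. The substring match is the simplest possible choice; a word-boundary match would avoid spurious hits inside longer words.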
Annotated data is time-consuming to obtain.

14. A self-training framework
[Diagram: unlabeled data → classifier → pseudo-annotated data → instance selection → combined with annotated data → classifier training]
Step 2: Classify the unlabeled data, select high-confidence instances and combine them with the annotated data.
Step 3: Re-train the classifier on the augmented training data.

15. A self-training framework
[Diagram: as in slide 14, with the final classifier applied to the test data for exploratory discourse detection → results]
Step 4: Obtain the final classifier. Stopping criteria: no improvement on the validation dataset; a certain number of iterations reached; or no class labels change.
Step 5: Detect exploratory dialogues in the test data.

16. A self-training framework
[Diagram: as in slide 15]
Self-training will introduce noisy instances.

17. KNN-based instance selection approach
K-nearest-neighbors classification. [Figure: blue stands for exploratory, gray for non-exploratory instances.] For the example point, with 1 nearest neighbor the label is exploratory, with 2 nearest neighbors it is exploratory, and with 5 nearest neighbors it is non-exploratory.

18. KNN-based instance selection approach
Pseudo-annotated instances P = {p_1, p_2, ..., p_n}, where p_k = (l_k, c_k), l_k is the pseudo label and c_k is the confidence value.
Form a candidate list: choose the instances with c_k > r.
For each p_k in the candidate list, identify its K nearest neighbors and update the pseudo label of p_k by KNN.
Obtain the new pseudo-annotated instances, P-updated.

19. Outline: Exploratory dialogue analysis; A self-training framework; Datasets and experiments; Applications

20. Data source: OU online conference, 4 sessions including 2634 posts. Data in this study is taken from a 2-day OU conference in Elluminate & Cloudworks.

21. Annotation
2 annotators, with one morning of training.
Four categories are given.
The kappa value (binary) is 0.5978 (moderate).
Only posts with consistent labels from both annotators are collected.
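Slide 21 reports a binary kappa of 0.5978 between the two annotators and keeps only the posts on which they agree. The slides do not name the kappa variant; assuming it is Cohen's kappa (the usual choice for two annotators), it compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes that and also filters to agreed posts; `cohens_kappa` and `keep_agreed` are hypothetical helper names, not part of the original study.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, and we return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def keep_agreed(posts: Sequence[str], a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Keep only (post, label) pairs where both annotators gave the same label."""
    return [(post, x) for post, x, y in zip(posts, a, b) if x == y]
```

For example, labels `["E", "E", "N", "N"]` and `["E", "N", "N", "N"]` give observed agreement 0.75 and chance agreement 0.5, so kappa is 0.5.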
Session | # Total | # Exploratory | # Non-exploratory
OU_22AM | 529 | 380 | 149
OU_22PM | 661 | 508 | 153
OU_23AM | 456 | 310 | 146

22. Experiment Setup
Baseline:
CP: cue-phrase-based method
MEGE: supervised Max Entropy GE (Generalized Expectation) approach (feature-based)
ME: supervised Max Entropy approach (instance-based)
SMEGE: self-training Max Entropy GE approach (feature-based)
SME: self-training Max Entropy approach (instance-based)
Experiment setup:
One session is used for training, one for testing and one for validation.
During the self-training process, examples that include cue phrases are added to the training dataset at the first stage.
Pseudo-labelled samples are added with the same ratio of exploratory to non-exploratory instances as in the training dataset.
Confidence value: 0.8. Feature threshold: 0.65.

23. Evaluation Criterion
[Figure: confusion matrix of actual vs. predicted labels over Exp and NonExp]

24. Experiment Result
Approach | Accuracy | Precision | Recall | F1
Cue-Phrase | 0.5389 | 0.9523 | 0.4241 | 0.5865
MaxEnt | 0.8099 | 0.8526 | 0.8675 | 0.8499
MaxEntGE | 0.7932 | 0.8817 | 0.8078 | 0.8292
Self-training MaxEnt | 0.8088 | 0.8331 | 0.9011 | 0.8574
Self-training MaxEntGE | 0.8181 | 0.8818 | 0.8406 | 0.8554
The cue-phrase method gives high precision but low accuracy.
The feature-based self-training approach (last row) improves on MaxEntGE on all criteria.
The instance-based self-training algorithm (4th row) performs worse than supervised MaxEnt on accuracy and precision.

25. Experiment Result (per session)
Session | MaxEnt | MaxEnt-Selftrain | MaxEntGE | MaxEntGE-Selftrain
OU_22AM | 0.8190 | 0.8467 | 0.7887 | 0.8270
OU_22PM | 0.8034 | 0.8311 | 0.7738 | 0.8116
OU_23AM | 0.8268 | 0.8282 | 0.8114 | 0.8297
OU_23PM | 0.8042 | 0.7906 | 0.7294 | 0.7989
The instance-based self-training algorithm (MaxEnt-Selftrain column) is sensitive to the initial classifier's performance.
The feature-based self-training approach gives more stable results (last column).

26. Outline: Exploratory dialogue analysis; A self-training framework; Datasets and experiments; Applications

27. Transcript level visualization
[Figure]

28. Time line visualization
[Figure: Average Exploratory (y-axis, -60 to 80) over the session time line (x-axis, 9:28 to 12:05), annotated with three groups of example posts:
(a) "anybody else with poor audio?", "is anyone else having difficulty hearing this?", "background noise makes it difficult to hear";
(b) "Sheffield, UK not as sunny as yesterday - still warm", "Greetings from Hong Kong", "Morning from Wiltshire, sunny here!";
(c) "See you!", "bye for now!", "bye, and thank you", "Bye all for now"]

29. Time line visualization
[Figure: the same time line, with an excerpt of posts]
Time | User Id | Content
11:46 AM | User_2 | added to which 2M often drops to 10% of that in peak times
11:47 AM | User_3 | I really disagree - ECDL was the starting point for many many first time users
11:47 AM | User_1 | online basics won't load in final third first
11:47 AM | User_1 | mobile won't work round her
11:47 AM | User_1 | and satlellite costs 40 a month for 1 gig data transfer
11:47 AM | User_4 | I think the issue about the skills needed to really embrace technologies is a huge one and with web 2.0 technologies things are becoming more complicated, as I say often you don't just get this stuff by attending a workshop, you have to participate and appropriate them to your interests and context and network of others.
11:47 AM | User_5 | We use myguide on mobile broadband for outreach. Works OK, but not great and that's in city centre boardering 3G/GPS.

30. User visualization
[Figure: "Contribution Distribution of Users", a scatter plot of Exploratory Message Count (y-axis, 0 to 50) against Total Message Count (x-axis, 0 to 60), with an excerpt of posts]
Time | User Id | Content
11:42 AM | User_1 | because although some people can get online the feed is so poor that many pages won't load. eg myguide
11:42 AM | User_1 | how much time and money was spent getting everyone to use a mobile phone?
11:43 AM | User_1 | nothing. because it was perceived to be useful, therefore there is no need to spend time and money on digital inclusion, until the access to the internet works
11:44 AM | User_1 | in order to get a 2meg connection to everyone we need fibre to the final third

31. User visualization
[Figure: the same "Contribution Distribution of Users" scatter plot, with an excerpt of posts]
Time | User Id | Content
9:51 AM | User_6 | Hello I'm a tutor at Saudi arabia branch
9:51 AM | Moderator | hello Saudi Arabia!
9:51 AM | User_6 | hi
9:52 AM | Moderator | Welcome Ashawa - did we meet in Kuwait a couple of years ago?
9:52 AM | User_6 | no actually
9:52 AM | Moderator | @ashawa - maybe next time
9:52 AM | User_6 | yes I wish

32. [Figure: three feedback indicators]
This step appears to have very good content that will provoke deeper learning.
This step appears to have some content that will provoke deeper learning.
This step appears to have little content that will provoke deeper learning.

33. Conclusion
We have extended our previously proposed self-training framework for exploratory discourse detection in synchronous text chat (Elluminate conference sessions).
We propose a K-nearest-neighbors-based instance selection method.
We applied the proposed approach to the SocialLearn platform.

34. Future Work: text analytics
Integrate the KNN instance selection method into the self-training framework.
Explore other features for exploratory dialogue classification: inter-dialogue features and global features.
Build a more reliable dataset for sub-category classification (challenge, evaluation, reasoning, extension).

35. Future Work: visual analytics
Investigate how these can be rendered most usefully for educators and learners.
Investigate user feedback when deployed.
Different users will appreciate different levels of detail.
The Purdue Signals experience suggests that complex underlying analytics should be usefully distilled into very simple feedback.
But as analytics literacy grows, will users value more powerful insights?

36. Acknowledgments
Thanks to Dr. He Yulan, Dr. Simon and Dr. Rebecca for their guidance and consideration, and to all the other colleagues in the Knowledge Media Institute.

37. Zhongyu Wei, The Chinese University of Hong Kong, Hong Kong, http://www.se.cuhk.edu.hk/~zywei/
Yulan He, The Open University, UK, http://people.kmi.open.ac.uk/yulan/
Simon Buckingham Shum, The Open University, UK, http://oro.open.ac.uk/view/person/sjb72.html
Rebecca Ferguson, The Open University, UK, http://oro.open.ac.uk/view/person/rf2656.html