Big Data Curation And Its Application

Post on 25-May-2015

Category: Technology

TRANSCRIPT

  • 1. Big Data Curation & Its Application. Hanmin Jung, Head of Dept. of Computer Intelligence Research, KISTI. Copyright 2013 Hanmin Jung

2. Let Me Introduce Myself :-)
3. Let Me Introduce Myself :-)
     Recent Social Activities (2012~): ITRC; UI/UX; Giga KOREA; IBS; SW+ IT VC; ISO/IEC JTC1/SC32; JTC 1/SC34; ISO 20022 TC68 Advisory Expert; SW
4. Trend Reports
     Weekly IT Trends (NIPA): Vol. 1593, 1573, 1560, 1556, 1550, 1524, 1521, 1486, 1477, 1446, 1439, 1435 (IT HCI), 1431, 1420, 1396, 1372, 1352, 1344 (2.0), 1341, 1296 (Open API), 1276, 1273, 1264
5. Art Curator
     http://www.chrysler.org/files/resources/gallery-talk-3.jpg
6. Press
7. Data Curation
     Definition
       - Active management of data over its lifecycle
       - Ensuring that data is trustworthy, discoverable, accessible, reusable, and fit for use
       - E.g. data preparation for analytics
     Curation Types
       - Manual curation
       - (Semi-)automatic curation, e.g. data cleansing, record deduplication, classification
       - Sheer curation (curation at source): integrated in the workflow for creating and managing data
     E. Curry, A. Freitas, and S. O'Riain, "Data Curation at the New York Times", 2011.
8. Data Curation
     Steps to Set Up the Process
       - Identify what data you need to curate
       - Identify who will curate the data
       - Define the curation workflow
       - Identify appropriate data-in & data-out formats
       - Identify the artifacts, tools, and processes needed to support the process
     E. Curry, A. Freitas, and S. O'Riain, "Data Curation at the New York Times", 2011.
9. Case Studies
     1. Apple Maps
     2. Strategic Foresight
     3. Big Data
10. Apple Maps
     http://cdn0.sbnation.com/entry_photo_images/5667445/20120920-DSC_7192VERGE_large_verge_medium_landscape.jpg
11. Google Maps vs. Apple Maps
     http://www.gadgetreview.com/2013/01/google-maps-vs-apple-maps-ios-comparison.html
12.
Apple Siri
     http://www.apple.com/ios/siri/
13. Google's Financial Table
     "We can see only as much as we know"
     http://investor.google.com/financial/tables.html
14. Nokia's Burning Ships Strategy
     http://www.brightsideofnews.com/Data/2011_5_5/Steven-Elop-Burn-Boats-Strategy-is-a-Cortez-Move-for-Nokia/Hernando_Cortez_BurningBoat.jpg
15. Result of the Strategy
     Worldwide Market Share: worldwide mobile device sales to end users, 2008 ~ 2013
     Each cell: market share in %, sales in million units. "-" marks a quarter with no separate figure on the slide.

     Company        1Q2013        3Q2012        3Q2011        3Q2010        3Q2009        3Q2008
     Nokia          14.8, 61.9    18.7, 82.9    27.1, 106.6   31.6, 110.4   37.8, 108.5   38.6, 117.9
     Samsung        27.5, 115.0   23.7, 105.4   22.3, 87.8    20.5, 71.4    21.0, 60.2    17.0, 52.0
     Apple          8.9, 37.4     6.1, 26.9     4.3, 17.1     4.0, 14.1     -             -
     LG             3.7, 15.4     3.1, 14.0     5.4, 21.1     8.1, 28.4     11.0, 31.6    7.5, 23.0
     ZTE            3.2, 13.5     3.1, 13.7     4.9, 19.1     3.5, 12.1     -             -
     Sony Ericsson  -             -             -             -             4.9, 14.1     8.4, 25.7
     Motorola       -             -             -             -             4.7, 13.6     8.3, 25.4
     Others         41.9, 175.4   45.3, 201.6   36.1, 142     32.2, 112.5   20.6, 59.1    20.1, 61.5
     Total          418.6         444.5         393.7         348.9         287.1         305.4

     Source: Gartner, IDC Worldwide Mobile Phone Tracker
16. Sign of the Result
     Google Insights for Search: http://www.google.com/insights/search/
17. Strategic Foresight
     R. Rohrbeck, H. Arnold, and J. Heuer, "Strategic Foresight in Multimedia Enterprises", 2007.
18. Hype Cycle
     Emerging Technologies Hype Cycle 2012
19. Social Data
     http://bynoy.files.wordpress.com/2011/08/united-noy-weblife-60-seconds.jpg
20. Machine Data
       - Call data records
       - Sensory data
       - Web log files
       - Financial Instrument Trade
     T. Baer, "What is Big Data? The Reality for Analytics", OVUM, 2011.
21.
Crop Yield Estimation
     http://www.geovar.com/data/satellite/rapideye/rapideye_sample_image_bands543.jpg
22. Crop Yield Estimation
     South American soybean production forecasts
     https://www.agriskmanagementforum.org/content/improved-supply-intelligence-global-agriculture-markets
23. Soybeans Futures (CME)
24. Data Mining Methods
     A. Cheung, "Forecast Anything! The Seven Data Mining Models"
       - Decision Trees
       - Naive Bayes
       - Cluster Analysis
       - Sequence Clustering
       - Association Rules
       - Time Series
       - Neural Networks
25. Value Pyramid
     Levels: Search, Clustering, Extracting, Decision Support, Forecasting, Scenario Planning, Advising
     InSciTe Advanced (2011), InSciTe Adaptive (2012), InSciTe Advisory (2013~)
     1st Target of Data Curation
     Modified from D. Bousfield & P. Fooladi, "STM Information: 2009 Final Market Size and Share Report", 2010.
26. Data Curation for Getting Insights & Foresights
     1. Crawling Data
     2. Extracting Information (Text Mining)
     3. Resolving Identities
27. Crawling Data
     Crawling Full Texts from Google Scholar
28. Crawling Data
     Crawling Web Data by RSS & Google API
29.
Crawling Data
     Web Data Statistics in InSciTe Adaptive

     Source/Year          2001     2008     2009     2010       2011       2012      Total
     IDC                     0        1       13       21        313        250        598
     Wikipedia               0  151,942  181,721  408,017  1,555,867  2,490,209  4,975,178
     InformationWeek         0      470      551      536        388         81      2,316
     Gizmag                  0    1,371    2,301    2,970      2,850      2,520     16,731
     Technologyreview       89      646      693      758        873        405      5,330
     IEEE spectrum          64      468      426      385        306        205      3,173
     Technewsworld          31      696      742    1,396      2,332      2,485      9,843
     Discover Magazine       2      104       51       79         57         49        606
     New York Times     20,940    7,979    4,696    3,971      3,669      2,064    124,100
     BBC                 3,155    2,687    3,158    3,318      5,964      4,243     37,073
     Fox News                0    1,577    1,965    1,123      2,015      1,872     10,749
     CNN                 3,594    1,154    1,499    1,071      1,308        814     19,769
     Thomson Reuters         0      450      371      309        338        222      3,408
     USA Today             458    4,616    3,878    4,630      6,579      4,276     38,750
     EtnTws.com            447    1,964    2,241    2,099      1,585        844     14,259
     Total              28,780  176,125  204,306  430,683  1,584,444  2,510,539  5,261,883

30. Text Mining Concept
31. Text Mining Concept
     Mining Targets
       - Named entities: e.g. PLOs (persons, locations, organizations)
       - Pattern-based entities: e.g. e-mail addresses, phone numbers
       - Concepts: abstractions of entities
       - Facts and relationships
       - Concrete and abstract attributes: e.g. 10-year, expensive, comfortable
       - Subjectivity in the forms of opinions, sentiments, and emotions: e.g. positive/negative, angry
     S. Grimes, "Text Analytics Overview: Technology, Solutions, Market", 2011.
32. Text Mining SINDI Architecture
33. Text Mining Example
34. Text Mining Example
35. Text Mining Example
36. Text Mining Example
37. Text Mining Example
38. Text Mining Application
39. Resolving Identities
     Christian Bizer, Tom Heath, and Tim Berners-Lee, "Linked Data - The Story So Far", 2009.
     URI Aliases
       - URIs that refer to the same real-world objects
       - E.g.
         http://dbpedia.org/resource/Berlin (for Berlin in DBpedia)
       - E.g. http://sws.geonames.org/2950159 (for Berlin in Geonames)
       - Information providers can set owl:sameAs links to URI aliases they know about
     Resolution of Data Conflicts in Data Fusion
       - Choosing a value in situations where multiple sources provide different values for the same property of an object
40. Resolving Identities
     Example: Sho Jo Ji Dai, So Nyeo Shi Dae, SoShi, Girls' Generation, SNSD, So Nyuh Shi Dae
41. Resolving Identities
     http://sindice.com/search?q=Hanmin+Jung
42. OntoURIResolver Demonstration
43. LOD Project
     Linked Data: 31 billion RDF triples, 504 million RDF links (2011.9)
     http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.html
44. InSciTe Advanced (2011)
45. InSciTe Adaptive (2012)
     https://play.google.com/store/apps/details?id=net.xenix.inscite&feature=search_result#?t=W10.
46. InSciTe Adaptive (2012)
47. InSciTe Adaptive (2012)
     Data Fact Sheet
       - Articles: 22.6 million (9.8 million papers, 7.6 million patents, 5.3 million Web data items)
       - All technical areas (2001~2011)
       - Named entities: 1.9 million
       - Authority dictionary: 1.5 million entries
       - LOD data: 290 GB (being connected)
48. Press
49.
InSciTe Architecture
       - Analytics Models
       - ETD Model: Emerging Technology Discovery Model
       - TLCD Model: Technology Life Cycle Discovery Model
       - TLC Model: Technology Life Cycle Model
       - OntoRelFinder: Relationship Path Finder
       - OntoReasoner: Reasoning Engine
       - OntoURI: Semantic Knowledge Manager
       - OntoPipeliner: Semantic Service Composer
       - SS&AE: Semantic Search & Analytics Engine
       - OntoURIResolver: Identity Resolver
       - SINDI-CORE/LINK: Entity & Relationship Extractor
       - TUC Model: Terminology Use Cycle Model
       - Ontology
       - Linked Data
       - OntoFrame
       - OntoVerifier: Reasoning Verifier
       - Web Data Crawler: RSS/Google API
       - Web Data
       - Literatures
50. Ontology, DB, Ontology Schema, Ontology Instances, RDF Triples, Portability & Connectibility, For Storing & Managing, For Service, Planning Services, Defining Concepts, Exploiting Relations
51. Big Data Processing in InSciTe
52. InSciTe Homepage
     http://inscite.kisti.re.kr
53. Thank you
     jhm@kisti.re.kr
     "A lot of times, people don't know what they want until you show it to them." by Steve Jobs
     "Many people won't be convinced until they've seen it for themselves." by Jakob Nielsen
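
Slide 31 names pattern-based entities, such as e-mail addresses and phone numbers, among the text mining targets. The following is only a minimal sketch of that idea in Python, not the SINDI implementation shown in the talk. The two regular expressions, the function name extract_pattern_entities, and the phone number in the sample sentence are assumptions made for illustration; real extractors need much more permissive patterns.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not from the slides).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# International-style number in four groups, e.g. +CC-AA-XXX-XXXX.
PHONE_RE = re.compile(r"\+?\d{1,3}[-\s]\d{1,4}[-\s]\d{3,4}[-\s]\d{4}")


def extract_pattern_entities(text):
    """Return the e-mail addresses and phone numbers found in text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }


# The e-mail address is the one on slide 53; the phone number is made up.
sample = "Contact jhm@kisti.re.kr or call +82-42-555-0100 for details."
print(extract_pattern_entities(sample))
```

Named entities, concepts, and sentiments from the same slide cannot be captured this way; they typically need dictionaries or trained models rather than fixed patterns.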

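Slide 39 describes two steps in resolving identities: linking URI aliases with owl:sameAs, and choosing one value when fused sources disagree. A rough sketch of both steps follows, assuming sameAs links are symmetric and transitive and that conflicts are settled by majority vote. The vote is a stand-in chosen for illustration; the slides do not say which fusion policy OntoURIResolver uses. The function names, the example.org URI, and the placeholder property values are all hypothetical.

```python
from collections import Counter


def find(parent, uri):
    """Return the representative URI of the alias cluster containing uri."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
        uri = parent[uri]
    return uri


def cluster_aliases(same_as_links):
    """Group URIs connected by owl:sameAs links into alias clusters."""
    parent = {}
    for left, right in same_as_links:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two clusters
    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(parent, uri), set()).add(uri)
    return list(clusters.values())


def fuse_values(values):
    """Resolve a conflict by majority vote; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


links = [
    # The two Berlin URIs come from slide 39.
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159"),
    # Hypothetical third alias, added only to show transitivity.
    ("http://example.org/city/berlin", "http://dbpedia.org/resource/Berlin"),
]
print(cluster_aliases(links))
print(fuse_values(["value-a", "value-b", "value-a"]))  # placeholder values
```

A union-find structure is used here because alias links arrive as unordered pairs and each new link may merge two existing clusters.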