1
How to teach LOD?
Bettina Berendt
Dept. Computer Science
KU Leuven
2
Who am I?
Privacy, Discrimination
3
Research: One persisting question
5
Research: One specific question – How do blogs and tweets spread, change, create news?
6
Workshop series (with, among others, Markus L.-R.)
[…] synergy between semantics and semantic-web technology on the one hand, and the analysis and mining of usage data on the other hand. […] First, semantics can be used to enhance the analysis of usage data. Second, usage data analysis can enhance semantic resources as well as Semantic Web applications; traces of users can be used to evaluate, adapt or personalize Semantic Web applications. The emerging Web of Data demands a re-evaluation of existing evaluation techniques: the Linked Data community is recognizing that it needs to move beyond triple counts because real value of Web data needs to be measured by real use.
8
Another persisting question: Data Mining for Information Literacy
9
Data Mining for Information Literacy: How?
10
“Knowledge and the Web” course: Curricular context
Based on experiences in HU Master specialisation “Wirtschaftsinformatik” (business informatics)
• Students: Wirtschaftsinformatik, Computer Science, miscellaneous
2007-2012: KUL Master specialisation “Databases”
• Students: mostly Computer Science students NOT specialising in databases
2013+: KUL Master specialisation “Artificial Intelligence” (+ Master AI)
• Students: we'll see!
6 ECTS
Student numbers over the years: between ~ 6 and ~ 20
11
… & a big thanks to the teaching assistants!
Ilija Subašić
Thomas Peetz
12
Concept of the course
3 blocks:
• Web data, integrating Web data
• Mining Web data
• Applications and implications
Lecture + exercise session + mini-workshop at the end
One invited talk
Evaluation based on homeworks
• Progression from “exercise” to “self-defined project”
13
Lecture 2012
& more depth about the other topics
Lecture 2013
14
Homeworks
1. Modelling
2. Populating
3. Integrating
4. (optional) Data mining basics
5. (a non-graded exercise) Reporting on data mining projects / Reading data mining papers
6. Your own project
15
Semantic Web / LOD intro topics
The Semantic Web: Motivation and overview
Very brief recap of XML (& why it’s not semantic)
RDF and RDFS
OWL and ontologies
Linked (Open) Data (LOD)
Storing, accessing and combining SW data
16
Inference topics
Introduction / motivation; kinds of reasoning
Properties of Properties (cf. the Pizza Tutorial)
Class descriptions, cardinality, & value constraints
Does this type of knowledge exist in LOD?
Common problems in using OWL reasoning
17
Schema / ontology matching topics
The match problem & what info to use for matching
(Semi-)automated matching: Example CUPID
(Semi-)automated matching: Example iMAP
Ontology matching, with Example BLOOMS
Evaluating matching
Core ideas of federated databases
Involving the user: Explanations; mass collaboration
18
Identity, inconsistency, provenance topics
Introduction: The promise and risks of openness
Identity crises: owl:sameAs
Inconsistencies and provenance
19
Privacy topics: (1) Preparatory questions
What does privacy mean for you concretely? Can you remember situations where it was important for you to show yourself in a different way than you are? Do you expect such situations in the future?
Privacy also involves the possibility of lying. Is this possibility a right? Give concrete examples and discuss them.
Think of a case where someone would want to not disclose some information and where you would think "this is not right". Does this person claim their privacy? Would your desired outcome be a privacy violation?
Who do you think should be watched most closely when it comes to handling personal information: the government? companies? anyone else? why?
So what does privacy mean for databases and data mining? What problems would you like to see addressed?
(Questions from/inspired by Martens, B., Dierick, G., & Noot, W. (2008). Ethiek en weerbarheid in de informatiesamenleving, Uitgeverij LannooCampus, Leuven & Academic Service, Den Haag, p. 75)
20
Privacy topics (2): Lecture agenda
Societal conventions that allow for secrecy
… and how the law respects them
Surveillance, democracy, and …
Whose privacy? and: when privacy is traded off against other goods
“Data aggregation and record linkage”
Three types of privacy
Trackers and anti-trackers
21
Homework 1: Modelling
22
Homework 2: Populating
23
Homework 3: Integrating
24
Homework 4 (optional): Basic data mining
25
Reading exercise (1) (from Justin Zobel: Writing for Computer Science)
26
Reading exercise (2)
4. Now consider the guidelines for structuring a data-mining exercise from the CRISP-DM model and manual. A good description of a data-mining project will contain sections on each of the main phases in CRISP-DM.
5. Please identify and highlight passages in the paper you have read that correspond to those phases.
27
Homework 6: Your own project (1)
This homework is your final project for this course. It will take you through much of what you learned throughout the semester, and result in a small yet genuine data mining project.
With the proposal you have sent us, and the feedback you got during the discussion, you by now have a clear idea of what you will be doing. If you run into any problems that you cannot solve in a reasonable amount of time, please contact us as soon as possible.
This homework consists of two parts.
28
Homework 6: Your own project (2)
Sharing your data
The first three homework sets [… a reminder of what this was …]
Any scientific work is only as good as its reproducibility. If you report the results of data mining without disclosing the data used, you are asking the reader for blind faith. In order to make data mining meaningful, its data sources must be available for follow-up work. Your first task is to do precisely this. Describe the ontology that you have built, and specifically the subset of it that you are using for this project, in terms of its purpose, its schema, and basic statistics about its entities. Important questions include the following.
Note: You may copy the answers to these questions from your previous homework sets if they are there already. We mention them again here so you can critically check and, if applicable, extend what you have written earlier.
• Where exactly did the data originate from? Are there any problems with these sources? (Example: Do the creators of the source follow a political agenda, only listing Muslims as terrorists? Are you even allowed to redistribute the data?)
• What is the overall schema of the ontology?
• How did you map and match the ontologies/schemas you found? Which strategies did you use? Which problems arose?
• Which attributes are guaranteed to exist for members of the most important classes? Which attributes may exist, but are not always present? How many individuals have these attributes?
• Which decisions did you make for selecting the subset of data you are working with for Homework 6 from the “full” ontology you built? For example, did you select classes, instances, attributes? Did you aggregate attributes? If so, how?
[…]
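The questions about guaranteed and optional attributes can be answered with a simple coverage count. The following is only a minimal sketch in Python, assuming the ontology has been exported as (subject, predicate, object) triples. The sample triples, the `ex:` names, and the function `attribute_coverage` are hypothetical illustrations and not part of the course material.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed prefixed name used for class membership


def attribute_coverage(triples, cls):
    """Return {predicate: fraction of instances of cls that have it}."""
    # Collect all individuals declared to be members of the class.
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    if not instances:
        return {}
    # For each predicate, remember which instances use it at least once.
    having = defaultdict(set)
    for s, p, o in triples:
        if s in instances and p != RDF_TYPE:
            having[p].add(s)
    return {p: len(subjects) / len(instances) for p, subjects in having.items()}


# Hypothetical toy data: two attacks, only one of them has a year.
triples = [
    ("ex:a1", "rdf:type", "ex:Attack"),
    ("ex:a1", "ex:country", "ex:BE"),
    ("ex:a1", "ex:year", "2010"),
    ("ex:a2", "rdf:type", "ex:Attack"),
    ("ex:a2", "ex:country", "ex:NL"),
]

coverage = attribute_coverage(triples, "ex:Attack")
# Attributes present for every instance vs. only for some instances.
guaranteed = sorted(p for p, f in coverage.items() if f == 1.0)
optional = sorted(p for p, f in coverage.items() if f < 1.0)
```

In this toy data, `ex:country` is present for every instance and `ex:year` for half of them. The same counts could be obtained with a query against a triple store; the plain-Python version is used here only to keep the sketch self-contained.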
29
Homework 6: Your own project (3)
Data mining
In the second step, you perform the project that you prepared so far. A good report will include the following:
• A very clear description of the research question you seek to answer.
• A good motivation for this research question.
• A critical review of the data, especially of whether you can expect it to contain the answer to your research question.
• A precise description of your experiments and their validation, with a motivation for the chosen setup. The reader must be able to obtain the exact same result, so they need to know every single parameter.
• A discussion of the results, given the data review and the experiments discussion.
• A conclusion that gives an answer to the research question.
• A list of things you would have liked to do, but didn’t due to time constraints.
Do not forget to carefully evaluate the results of the experiments, using whatever metric is applicable (significance, confidence, accuracy, precision/recall, etc.) in order to supplement the qualitative assessment of the experiments.
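As an illustration of the metrics named above, here is a minimal sketch of accuracy and precision/recall for a binary classification task. It assumes that predictions and ground-truth labels are available as two equally long lists; the function names and the toy labels are hypothetical, and a real project may well use a library implementation instead.

```python
def precision_recall(predicted, actual, positive=True):
    """Precision and recall of the `positive` class for paired label lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(predicted, actual):
    """Fraction of predictions that equal the ground-truth label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


# Hypothetical toy labels: 2 true positives, 1 false positive, 1 false negative.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]

precision, recall = precision_recall(predicted, actual)
acc = accuracy(predicted, actual)
```

Reporting precision and recall next to accuracy matters most when the classes are imbalanced, because accuracy alone can look good for a classifier that never predicts the rare class.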
30
Homework 6: Example topics from 2012 (Terrorism) and 2011 (Twitter)
Relation between oil and war
The relation between a politician, their country, and terrorist attacks
Predict attack type and victim type for new organizations
Converting tweets from mobile speed controls into a historical overview on a map
Where should I go on vacation based on recent tweets?
Seasonal sentiment analysis in tweets (data sets: Libya, Syria)
31
What's good: students …
like the course
… are surprised
participate very actively
get hands-on experience
are creative!
reflect on data and on methods
obtain insights
• E.g. from goal: predicting who's a terrorist
to goal: finding correlations between a country's military expenditure, level of schooling, and incidence of terrorism
32
What's not so good / challenges (1)
Prerequisites
• To be able to interpret the results properly, students would need
o Proper background in statistics (2013+: better, given students with more DM background?)
o Background knowledge about the application area (Idea for 2013+: tailor the Invited Talk more closely to the project)
• To be able to make more of Semantic Web reasoning, students would need
o More background in logics (Idea for 2013+: interface more closely with parallel logics course)
Didactical method: Capacity limitations and „cue-based learning“?!
• Breadth vs. depth …
• Practical learning tends to overtake theory learning
• Difficult to integrate background reading with project (easier for Twitter than for terrorism)
33
What's not so good / challenges (2)
Die Mühen der Ebene (“the difficulties on the ground”) in data handling and analysis
• Sparsity of data and lacking empirical regularities are frustrating
• Preference for mashups vs. data integration?!
• Laborious data preparation is boring and time-intensive
34
Outlook: Next possible student-project field?
ParlBench: an LOD dataset of Dutch parliamentary proceedings
(Tarasova & Marx, Proc. USEWOD/BerSys 2013)
See also (Juric, Hollink, & Houben, Proc. DeRIVE 2012)
OR
Use a similar, but not yet semantified, Flemish dataset
Dutch language: + and –
35
Outlook: curricular changes in 2013+, 2014+(?)
FROM mandatory course in the specialization “Databases”, taken largely by a non-database, heterogeneous audience
TO optional course in the specialization “Artificial Intelligence”, presumably taken by a more homogeneous, largely AI audience
The 6-ECTS course can also be taken, as a 4-ECTS course, by students in the Master of Artificial Intelligence, with
• the Web mining option (focus on modelling and mining)
• the Web data fusion option (focus on modelling and integrating)
To be supplemented by a data course in the Master Digital Humanities (currently under review)
• Chance of joint projects in which expertise can be pooled
36
Outlook: sharing