1

How to teach LOD?

Bettina Berendt

Dept. Computer Science

KU Leuven

2

Who am I?

Privacy, Discrimination

3

Research: One persisting question

5

Research: One specific question – How do blogs and tweets spread, change, create news?

6

Workshop series (with, among others, Markus L.-R.)

[…] synergy between semantics and semantic-web technology on the one hand, and the analysis and mining of usage data on the other hand. […] First, semantics can be used to enhance the analysis of usage data. Second, usage data analysis can enhance semantic resources as well as Semantic Web applications; traces of users can be used to evaluate, adapt or personalize Semantic Web applications. The emerging Web of Data demands a re-evaluation of existing evaluation techniques: the Linked Data community is recognizing that it needs to move beyond triple counts because real value of Web data needs to be measured by real use.

8

Another persisting question: Data Mining for Information Literacy

9

Data Mining for Information Literacy: How?

10

“Knowledge and the Web” course: Curricular context

Based on experiences in HU Master specialisation “Wirtschaftsinformatik” (Business Informatics)

• Students: Wirtschaftsinformatik, Computer Science, miscellaneous

2007-2012: KUL Master specialisation “Databases”

• Students: mostly Computer Science students NOT specialising in databases

2013+: KUL Master specialisation “Artificial Intelligence” (+ Master AI)

• Students: we’ll see!

6 ECTS

Student numbers over the years: between ~ 6 and ~ 20

11

… & a big thanks to the teaching assistants!

Ilija Subašić

Thomas Peetz

12

Concept of the course

3 blocks:

• Web data, integrating Web data

• Mining Web data

• Applications and implications

Lecture + exercise session + mini-workshop at the end

One invited talk

Evaluation based on homeworks

• Progression from “exercise” to “self-defined project”

13

Lecture 2012

& more depth about the other topics

Lecture 2013

14

Homeworks

1. Modelling

2. Populating

3. Integrating

4. (optional) Data mining basics

5. (a non-graded exercise) Reporting on data mining projects / Reading data mining papers

6. Your own project

15

Semantic Web / LOD intro topics

The Semantic Web: Motivation and overview

Very brief recap of XML (& why it’s not semantic)

RDF and RDFS

OWL and ontologies

Linked (Open) Data (LOD)

Storing, accessing and combining SW data

16

Inference topics

Introduction / motivation; kinds of reasoning

Properties of Properties (cf. the Pizza Tutorial)

Class descriptions, cardinality, & value constraints

Does this type of knowledge exist in LOD?

Common problems in using OWL reasoning
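
As an illustration of the “Properties of Properties” topic, here is a minimal sketch in plain Python of what a reasoner might derive once a property is declared symmetric or transitive. The property names (`ex:borders`, `ex:partOf`), the example facts, and the `infer` helper are hypothetical and not taken from the course materials; a real OWL reasoner covers far more than these two characteristics.

```python
def infer(triples, symmetric=(), transitive=()):
    """Return the triples closed under the given symmetric/transitive properties."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triple is derived (fixpoint)
        changed = False
        new = set()
        for s, p, o in closed:
            if p in symmetric:
                new.add((o, p, s))  # (s p o) implies (o p s)
            if p in transitive:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # (s p o), (o p o2) imply (s p o2)
        if not new <= closed:
            closed |= new
            changed = True
    return closed


# Hypothetical example data.
facts = {
    ("ex:Leuven", "ex:partOf", "ex:Flanders"),
    ("ex:Flanders", "ex:partOf", "ex:Belgium"),
    ("ex:Belgium", "ex:borders", "ex:Netherlands"),
}
inferred = infer(facts, symmetric={"ex:borders"}, transitive={"ex:partOf"})
new_facts = inferred - facts
print(sorted(new_facts))
```

The sketch derives two additional statements: that Leuven is part of Belgium, and that the Netherlands borders Belgium.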

17

Schema / ontology matching topics

The match problem & what info to use for matching

(Semi-)automated matching: Example CUPID

(Semi-)automated matching: Example iMAP

Ontology matching, with Example BLOOMS

Evaluating matching

Core ideas of federated databases

Involving the user: Explanations; mass collaboration

18

Identity, inconsistency, provenance topics

Introduction: The promise and risks of openness

Identity crises: owl:sameAs

Inconsistencies and provenance
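
To make the owl:sameAs “identity crisis” concrete, the following sketch groups identifiers into the equivalence classes implied by a set of sameAs links, using a small union-find structure. The identifiers and the `same_as_clusters` helper are made up for illustration; the point is only that one questionable link is enough to merge two otherwise distinct entities.

```python
def same_as_clusters(links):
    """Group identifiers into the equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical links; the third one wrongly equates Paris with Paris, Texas.
links = [
    ("a:Paris", "b:Paris"),
    ("a:Berlin", "b:Berlin"),
    ("b:Paris", "c:Paris_Texas"),
]
clusters = same_as_clusters(links)
print(clusters)
```

Because sameAs is transitive, `a:Paris` ends up in the same class as `c:Paris_Texas` although no link connects them directly.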

19

Privacy topics: (1) Preparatory questions

What does privacy mean for you concretely? Can you remember situations where it was important for you to show yourself in a different way than you are? Do you expect such situations in the future?

Privacy also involves the possibility of lying. Is this possibility a right? Give concrete examples and discuss them.

Think of a case where someone would want to not disclose some information and where you would think "this is not right". Does this person claim their privacy? Would your desired outcome be a privacy violation?

Who do you think should be watched most closely when it comes to handling personal information: the government? companies? anyone else? why?

So what does privacy mean for databases and data mining? What problems would you like to see addressed?

(Questions from/inspired by Martens, B., Dierick, G., & Noot, W. (2008). Ethiek en weerbarheid in de informatiesamenleving, Uitgeverij LannooCampus, Leuven & Academic Service, Den Haag, p. 75)

20

Privacy topics (2): Lecture agenda

Societal conventions that allow for secrecy

… and how the law respects them

Surveillance, democracy, and …

Whose privacy? and: when privacy is traded off against other goods

“Data aggregation and record linkage”

Three types of privacy

Trackers and anti-trackers

21

Homework 1: Modelling

22

Homework 2: populating

23

Homework 3: integrating

24

Homework 4 (optional): Basic data mining

25

Reading exercise (1) (from Justin Zobel: Writing for Computer Science)

26

Reading exercise (2)

4. Now consider the guidelines for structuring a data-mining exercise from the CRISP-DM model and manual. A good description of a data-mining project will contain sections on each of the main phases in CRISP-DM.

5. Please identify and highlight passages in the paper you have read that correspond to those phases.

27

Homework 6: Your own project (1)

This homework is your final project for this course. It will take you through much of what you learned throughout the semester, and result in a small yet genuine data mining project.

With the proposal you have sent us, and the feedback you got during the discussion, you by now have a clear idea of what you will be doing. If you run into any problems that you cannot solve in a reasonable amount of time, please contact us as soon as possible.

This homework consists of two parts.

28

Homework 6: Your own project (2)

Sharing your data

The first three homework sets [… a reminder of what this was …]

Any scientific work is only as good as its reproducibility. If you report the results of data mining without disclosing the data used, you are asking the reader for blind faith. In order to make data mining meaningful, its data sources must be available for follow-up work. Your first task is to do precisely this. Describe the ontology that you have built, and specifically the subset of it that you are using for this project, in terms of its purpose, its schema, and basic statistics about its entities. Important questions include the following.

Note: You may copy the answers to these questions from your previous homework sets if they are there already. We mention them again here so you can critically check and, if applicable, extend what you have written earlier.

• Where exactly did the data originate from? Are there any problems with these sources? (Example: Do the creators of the source follow a political agenda, only listing Muslims as terrorists? Are you even allowed to redistribute the data?)

• What is the overall schema of the ontology?

• How did you map and match the ontologies/schemas you found? Which strategies did you use? Which problems arose?

• Which attributes are guaranteed to exist for members of the most important classes? Which attributes may exist, but are not always present? How many individuals have these attributes?

• Which decisions did you make for selecting the subset of data you are working with for Homework 6 from the “full” ontology you built? For example, did you select classes, instances, attributes? Did you aggregate attributes? If so, how?

[…]
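
The questions about which attributes are guaranteed and how many individuals have them can be answered with simple counts. A minimal sketch, assuming the data is available as (subject, property, object) triples; the `ex:` names, the sample data, and the `attribute_coverage` helper are hypothetical and only illustrate the kind of basic statistics meant here.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def attribute_coverage(triples, cls):
    """For each property, the fraction of instances of `cls` that have a value."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    having = defaultdict(set)
    for s, p, o in triples:
        if s in instances and p != RDF_TYPE:
            having[p].add(s)  # count each individual once per property
    return {p: len(subjects) / len(instances) for p, subjects in having.items()}


# Hypothetical data: four events, all dated, only two with a location.
triples = [
    ("ex:e1", RDF_TYPE, "ex:Event"), ("ex:e1", "ex:date", "2011-03-01"),
    ("ex:e2", RDF_TYPE, "ex:Event"), ("ex:e2", "ex:date", "2011-04-12"),
    ("ex:e3", RDF_TYPE, "ex:Event"), ("ex:e3", "ex:date", "2011-05-30"),
    ("ex:e4", RDF_TYPE, "ex:Event"), ("ex:e4", "ex:date", "2011-06-02"),
    ("ex:e1", "ex:location", "ex:Leuven"),
    ("ex:e3", "ex:location", "ex:Berlin"),
]
coverage = attribute_coverage(triples, "ex:Event")
guaranteed = sorted(p for p, share in coverage.items() if share == 1.0)
print(coverage, guaranteed)
```

In this made-up data set, `ex:date` is present for every event, while `ex:location` is present for half of them.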

29

Homework 6: Your own project (3)

Data mining

In the second step, you perform the project that you prepared so far. A good report will include the following:

• A very clear description of the research question you seek to answer.

• A good motivation for this research question.

• A critical review of the data, especially whether you can expect it to contain the answer to your research question.

• A precise description of your experiments and their validation, with a motivation for the chosen setup. The reader must be able to obtain the exact same result, so they need to know every single parameter.

• A discussion of the results, given the data review and the experiments discussion.

• A conclusion that gives an answer to the research question.

• A list of things you would have liked to do, but didn’t due to time constraints.

Do not forget to carefully evaluate the results of the experiments, using whatever metric is applicable (significance, confidence, accuracy, precision/recall, etc.) in order to supplement the qualitative assessment of the experiments.
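
For a classification-style project, three of the metrics named above can be computed directly from the actual and predicted labels. This is a minimal sketch with made-up labels; the `evaluate` helper and the label names are hypothetical, and which metric is appropriate depends on the research question.

```python
def evaluate(actual, predicted, positive):
    """Accuracy, precision and recall, treating one class as 'positive'."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("need at least one labelled example")
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels, e.g. the sentiment of eight tweets.
actual = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg"]
metrics = evaluate(actual, predicted, positive="pos")
print(metrics)
```

Reporting such numbers alongside the exact parameters of the experiment lets a reader check whether a rerun reproduces the result.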

30

Homework 6: Example topics from 2012 (Terrorism) and 2011 (Twitter)

Relation between oil and war

The relation between a politician, their country, and terrorist attacks

Predict attack type and victim type for new organizations

Converting tweets from mobile speed controls into a historical overview on a map

Where should I go on vacation based on recent tweets?

Seasonal sentiment analysis in tweets (data sets: Libya, Syria)

31

What’s good: students …

like the course

… are surprised

participate very actively

get hands-on experience

are creative!

reflect on data and on methods

obtain insights

• E.g. from goal: predicting who’s a terrorist, to goal: finding correlations between a country’s military expenditure, level of schooling, and incidence of terrorism

32

What’s not so good / challenges (1)

Prerequisites

• To be able to interpret the results properly, would need

o Proper background in statistics. 2013+: better given students with more DM background?

o Background knowledge about the application area. Idea for 2013+: tailor the Invited Talk more closely to the project

• To be able to make more of Semantic Web reasoning, would need

o More background in logics. Idea for 2013+: interface more closely with parallel logics course

Didactical method: Capacity limitations and “cue-based learning”?!

• Breadth vs. depth …

• Practical learning tends to overtake theory learning

• Difficult to integrate background reading with project (easier for Twitter than for terrorism)

33

What’s not so good / challenges (2)

Die Mühen der Ebene (“the difficulties on the ground”) in data handling and analysis

• Sparsity of data and lacking empirical regularities are frustrating

• Preference for mashups vs. data integration?!

• Laborious data preparation is boring and time-intensive

34

Outlook: Next possible student-project field?

ParlBench: An LOD of Dutch parliamentary proceedings

(Tarasova & Marx, Proc. USEWOD/BerSys 2013)

See also (Juric, Hollink, & Houben, Proc. DeRIVE 2012)

OR

Use a similar, but not yet semantified, Flemish dataset

Dutch language: + and –

35

Outlook: curricular changes in 2013+, 2014+(?)

FROM mandatory course in the specialization “Databases”, taken largely by a non-database, heterogeneous audience

TO optional course in the specialization “Artificial Intelligence”, presumably taken by a more homogeneous, largely AI audience

The 6-ECTS course can also be taken, as a 4-ECTS course, by students in the Master of Artificial Intelligence, with

• the Web mining option (focus on modelling and mining)

• the Web data fusion option (focus on modelling and integrating)

To be supplemented by a data course in the Master Digital Humanities (currently under review)

• Chance of joint projects in which expertise can be pooled

36

Outlook: sharing
