linked bibliographic data - society.library.sh.cn
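Contents and objectives
• Using linked bibliographic data as an example, introduce the basic patterns for publishing, querying, and consuming linked data, so that readers gain an intuitive understanding of how linked data works technically.
• The talk focuses on five topics:
1. Overview of linked data
2. From cards to linked bibliographic data: a historical look at the semantic architecture of bibliographic data
3. Semantic expression of linked bibliographic data: models and schemas
4. Querying linked data: SPARQL
5. Programming with linked bibliographic data: publishing, consuming, and mashups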
Developments: linked bibliographic data projects of libraries, with national libraries as examples
National Agricultural Library Thesaurus (USA) http://agclass.nal.usda.gov/agt.shtml
Web NDL Authorities - National Diet Library of Japan (Japan) http://id.ndl.go.jp/auth/ndla
British National Bibliography (BNB) (UK) http://bnb.data.bl.uk
Polythematic Structured Subject Heading System (Czech Republic) http://psh.ntkcz.cz/skos/home/html/en
Library of Congress Subject Headings (USA) http://id.loc.gov/authorities/
B3Kat - Library Union Catalogues of Bavaria, Berlin and Brandenburg (Germany) http://lod.b3kat.de
Deutsche Nationalbibliografie (DNB) (Germany) http://www.dnb.de/EN/datendienste/linkedData
datos.bne.es (Spain) http://datos.bne.es
data.bnf.fr - Bibliothèque nationale de France (France) http://data.bnf.fr
LIBRIS (Sweden) http://libris.kb.se
Hungarian National Library (NSZL) catalog (Hungary) http://nektar.oszk.hu/wiki/Semantic_web
Library of Congress Name Authority File (NAF) (USA) http://id.loc.gov/download/
Rådata nå! Norwegian personal name authorities as linked data (with the participation of the National Library of Norway). BIBSYS is a key supplier of products and services for higher educational institutions, other research institutions in Norway, public administrative institutions and the National Library of Norway. http://data.bibsys.no/data
Triples (number of RDF triples per dataset)
Dataset | Triples
National Agricultural Library Thesaurus | 364996
Web NDL Authorities - National Diet Library of Japan | 15000000
British National Bibliography (BNB) | 84961180
Polythematic Structured Subject Heading System | 100000
Library of Congress Subject Headings | 4151586
B3Kat - Library Union Catalogues of Bavaria, Berlin and Brandenburg | 570000000
Deutsche Nationalbibliografie (DNB) | 12786555
datos.bne.es | 58053215
data.bnf.fr - Bibliothèque nationale de France | 6330000
LIBRIS | 50000000
Hungarian National Library (NSZL) catalog | 19300000
Rådata nå! | 9370074
Total | 830417606
SPARQL endpoints
Dataset | Endpoint
Web NDL Authorities - National Diet Library of Japan | http://id.ndl.go.jp/auth/ndla/
British National Bibliography (BNB) | http://bnb.data.bl.uk/sparql
B3Kat - Library Union Catalogues of Bavaria, Berlin and Brandenburg | http://lod.b3kat.de/sparql
LIBRIS | http://lab3.libris.kb.se/sparql
Hungarian National Library (NSZL) catalog | http://setaria.oszk.hu/sparql
Rådata nå! | http://data.bibsys.no/data/query_authority.html
Library of Congress Subject Headings | http://api.talis.com/stores/lcsh-info/services/sparql
Outgoing links
Dataset | Link target | Links
datos.bne.es | links:dbpedia | 36431
datos.bne.es | links:dnb-gemeinsame-normdatei | 76413
datos.bne.es | links:lexvo | 3112900
datos.bne.es | links:libris | 10884
datos.bne.es | links:sudocfr | 9725
datos.bne.es | links:viaf | 454068
Deutsche Nationalbibliografie (DNB) | links:gnd | 16734298
Deutsche Nationalbibliografie (DNB) | links:iso639-2 | 3263366
Hungarian National Library (NSZL) catalog | links:dbpedia | 6285
Hungarian National Library (NSZL) catalog | links:viaf | 33709
Library of Congress Subject Headings | links:stitch-rameau | 55281
LIBRIS | links:dbpedia | 4669
LIBRIS | links:lcsh | 12586
LIBRIS | links:viaf | 248228
Polythematic Structured Subject Heading System | links:dbpedia | 3000
Polythematic Structured Subject Heading System | links:lcsh | 3000
Rådata nå! | links:dbpedia | 30346
Rådata nå! | links:dnb-gemeinsame-normdatei | 209681
Rådata nå! | links:viaf | 311154
Web NDL Authorities - National Diet Library of Japan | links:lcsh | 4545
Web NDL Authorities - National Diet Library of Japan | links:viaf | 2673
Outgoing link totals by target
Link target | Source datasets | Links
links:gnd | 1 | 16734298
links:iso639-2 | 1 | 3263366
links:lexvo | 1 | 3112900
links:libris | 1 | 10884
links:stitch-rameau | 1 | 55281
links:sudocfr | 1 | 9725
links:dnb-gemeinsame-normdatei | 2 | 286094
links:lcsh | 3 | 20131
links:dbpedia | 5 | 80731
links:viaf | 5 | 1049832
Grand total | | 24623242
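The totals above can be re-derived from the per-dataset table. The following Python sketch groups the link counts by target and counts how many source datasets link to each target; the short dataset labels (DNB, NSZL, LCSH, PSH, NDL) are abbreviations chosen for this example.

```python
from collections import defaultdict

# (source dataset, link target, number of links), copied from the outgoing-links table
LINKS = [
    ("datos.bne.es", "dbpedia", 36431),
    ("datos.bne.es", "dnb-gemeinsame-normdatei", 76413),
    ("datos.bne.es", "lexvo", 3112900),
    ("datos.bne.es", "libris", 10884),
    ("datos.bne.es", "sudocfr", 9725),
    ("datos.bne.es", "viaf", 454068),
    ("DNB", "gnd", 16734298),
    ("DNB", "iso639-2", 3263366),
    ("NSZL", "dbpedia", 6285),
    ("NSZL", "viaf", 33709),
    ("LCSH", "stitch-rameau", 55281),
    ("LIBRIS", "dbpedia", 4669),
    ("LIBRIS", "lcsh", 12586),
    ("LIBRIS", "viaf", 248228),
    ("PSH", "dbpedia", 3000),
    ("PSH", "lcsh", 3000),
    ("Rådata nå!", "dbpedia", 30346),
    ("Rådata nå!", "dnb-gemeinsame-normdatei", 209681),
    ("Rådata nå!", "viaf", 311154),
    ("NDL", "lcsh", 4545),
    ("NDL", "viaf", 2673),
]


def totals_by_target(links):
    """Group link counts by target: {target: (number of source datasets, total links)}."""
    sources = defaultdict(set)
    counts = defaultdict(int)
    for source, target, n in links:
        sources[target].add(source)
        counts[target] += n
    return {target: (len(sources[target]), counts[target]) for target in counts}


totals = totals_by_target(LINKS)
grand_total = sum(n for _, n in totals.values())

for target, (datasets, n) in totals.items():
    print(f"links:{target}\t{datasets}\t{n}")
print("Grand total", grand_total)
```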
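What is linked bibliographic data?
• Linked bibliographic data is bibliographic data that is organized, generated, and published with Semantic Web technologies, following the Linked Data principles.
• Characteristics of linked bibliographic data:
– Use URIs to name the things in bibliographic data
– Use HTTP URIs, so that these names can be looked up (dereferenced)
– When a user (a person or a machine) dereferences a bibliographic object, return useful information in standard formats such as RDF/XML, and make the data queryable with SPARQL
– Include links to URIs in external datasets, to improve the discovery of information on the Web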
Characteristic 2
• From describing information to describing relationships
One of the key concepts of Linked Data is to represent data as a set of interlinked things. These things are referred to as objects of interest. They are things about which we can make statements.
-- Tim Hodson
Hodson, T. (2011, July 22nd). British Library Data Model: Overview. Retrieved June 24th, 2012, from Talis Systems: http://talis-systems.com/2011/07/british-library-data-model-overview/
Linked bibliographic data vs. traditional bibliographic data
Layer | New generation | Traditional catalogue
Conceptual model | FRBR | Paris Principles
Definition layer | RDA | AACR2
Semantic layer (vocabulary) | DC, SKOS, OWL | MARC
Syntax layer | RDF |
Coding layer | XML, JSON, etc. |
Access layer | RESTful | Z39.50
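A historical look at the semantic architecture of bibliographic data
Type of bibliographic data | How meaning is expressed
Catalogue cards | Punctuation (ISBD): slash, comma, plus sign, colon, full stop and dash (. -), parentheses, each set off by a space
Electronic bibliographic data | MARC field tags; a single field cannot express a complete meaning on its own
Linked bibliographic data | RDF triples; subject, predicate, and object form a complete sentence that expresses a complete statement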
000 00808cam a22002658a 450
001 399195
005 20011123074558.0
008 890403s1990 nyuabcf b 00110 eng
020 __ |a 0393027082
035 __ |a 89009241
035 __ |9 BAA7243GL
040 __ |d CU
043 __ |a a-cc---
050 00 |a DS754 |b .S65 1990
082 00 |a 951/.03 |2 20
100 10 |a Spence, Jonathan D.
245 14 |a The search for modern China / |c by Jonathan D. Spence.
250 __ |a 1st ed.
260 0_ |a New York : |b Norton, |c c1990.
300 __ |a xxv, 876 p., [130] p. of plates : |b ill. (some col.), facsims., maps, ports. ; |c 24 cm.
500 __ |a Maps on lining pages.
504 __ |a Includes bibliographies and index.
651 _0 |a China |x History |y Qing dynasty, 1644-1912.
651 _0 |a China |x History |y 20th century.
984 __ |a 3839 |c 951 S74
A statement in MARC data
A statement in linked bibliographic data
Subject: http://bnb.data.bl.uk/id/resource/011954620
Predicate: http://purl.org/dc/terms/creator
Object: http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-
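To make the contrast concrete, here is a minimal Python sketch that turns a few MARC subfields of the record above into subject-predicate-object statements. The field-to-property mapping (MAPPING) and the heading-to-URI lookup (PERSON_URIS) are hypothetical and chosen only for illustration; real conversions such as the British Library's use far richer mappings.

```python
DCT = "http://purl.org/dc/terms/"
BOOK = "http://bnb.data.bl.uk/id/resource/011954620"

# A few fields of the MARC record above, as (tag, {subfield code: value})
RECORD = [
    ("100", {"a": "Spence, Jonathan D."}),
    ("245", {"a": "The search for modern China /", "c": "by Jonathan D. Spence."}),
    ("260", {"a": "New York :", "b": "Norton,", "c": "c1990."}),
]

# Hypothetical mapping from (MARC tag, subfield) to a Dublin Core term
MAPPING = {
    ("100", "a"): DCT + "creator",
    ("245", "a"): DCT + "title",
    ("260", "b"): DCT + "publisher",
}

# Hypothetical authority lookup: heading -> person URI (URI taken from the example above)
PERSON_URIS = {"Spence, Jonathan D.": "http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-"}


def strip_isbd(value):
    """Drop trailing ISBD punctuation (' /', ' :', ',', ';') that only makes sense on a card."""
    return value.rstrip(" /:,;")


def marc_to_triples(subject, record):
    """Turn mapped MARC subfields into (subject, predicate, object) triples."""
    triples = []
    for tag, subfields in record:
        for code, value in subfields.items():
            predicate = MAPPING.get((tag, code))
            if predicate is None:
                continue  # subfield not covered by this toy mapping
            value = strip_isbd(value)
            # Use a URI for the creator when one is known, so the object is a thing, not a string
            obj = PERSON_URIS.get(value, value) if tag == "100" else value
            triples.append((subject, predicate, obj))
    return triples


for s, p, o in marc_to_triples(BOOK, RECORD):
    print(s, p, o)
```

Each printed line is a complete statement on its own, whereas a MARC subfield only has meaning inside its record and field.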
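Functional requirements analysis and data modeling
• Four basic principles of modeling:
– Functional requirements principle: functional requirements are the starting point of data modeling
– Thing principle: things are the core of the model
– Reuse principle: define reusable resources as things, and reuse external resources wherever possible
– Extension principle: build models by extending general-purpose data models, to improve the interoperability of the system
Use these four principles to analyze the following data models:
– The FRBR model
– Europeana Data Model v. 5.2.3
– The British Library data model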
FRBR or Not FRBR
Two data-modeling patterns: FRBR and non-FRBR
The non-FRBR view
Nobody talks about works, expressions and manifestations, so why describe our data that way?
Styles, Rob. Bringing FRBR Down to Earth…. 11 November 2009. http://dynamicorange.com/2009/11/11/bringing-frbr-down-to-earth/ (accessed July 1, 2012).
FRBR types
Koster, L. (2012, January 5th). Local library data in the new global framework. Retrieved June 26th, 2012, from COMMONPLACE.NET: http://commonplace.net/tag/frbr/
Non-FRBR modeling: British Library Data Model
Image source: http://www.bl.uk/bibliographic/datafree.html/pdfs/bldatamodelbook.pdf
Europeana Data Model v. 5.2.3
The EDM class hierarchy. The classes introduced by EDM are shown in light blue rectangles. The classes in the white rectangles are re-used from other schemas; the schema is indicated before the colon.
Image source: http://pro.europeana.eu/edm-documentation
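Vocabulary structure
• Vocabularies are the main carriers of the semantics of linked bibliographic data
• Compared with traditional bibliographic data, the vocabulary of bibliographic data is evolving from a single vocabulary to multiple vocabularies
• Logical consistency is what allows several vocabularies to work together harmoniously in one system
• Vocabulary reuse and semantic interoperability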
URI naming
• Clean URLs are purely structural URLs: they contain only a path to the resource and no query string. A conventional URL: http://example.com/index.php?page=foo
Clean URL: http://example.com/foo
• Use clean URLs to name resources:
– http://bnb.data.bl.uk/id/resource/011954620
– http://bnb.data.bl.uk/id/concept/lcsh/ChinaHistoryQingdynasty1644-1912
– http://dbpedia.org/resource/Shanghai_Library
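• Example of naming resources with URLs in RDF:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dct="http://purl.org/dc/terms/">
<rdf:Description rdf:about="http://bnb.data.bl.uk/id/resource/011954620">
<dct:creator rdf:resource="http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-"/>
</rdf:Description>
</rdf:RDF>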
• Persistent URLs (PURLs) and short URLs
• http://Purl.oclc.org
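Storing linked bibliographic data
• Traditional database model
RDF documents are generated dynamically, while the data are still managed in a relational database at the back end.
• RDF triple store
Bibliographic data are converted to RDF in advance and stored as triples; for example, the British Library data were converted to RDF documents and published through the Talis platform.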
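Querying linked bibliographic data (SPARQL)
• SPARQL is a language for querying RDF data
• SPARQL is based on the RDF data model (triples)
• SPARQL works by pattern matching: any part of a triple (subject, predicate, or object) can be a variable to be matched
• A SPARQL endpoint is a RESTful Web service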
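Pattern matching can be illustrated without a triple store. The Python sketch below treats any term starting with `?` as a variable, matches each triple pattern against a small in-memory list of triples, and joins several patterns on their shared variables (the " . " AND of a basic graph pattern). The triples are a toy sample loosely based on the BNB example, and this is not how a real SPARQL engine is implemented.

```python
DCT = "http://purl.org/dc/terms/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BOOK = "http://bnb.data.bl.uk/id/resource/011954620"
PERSON = "http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-"

# Toy data: three triples loosely based on the BNB example
TRIPLES = [
    (BOOK, DCT + "creator", PERSON),
    (BOOK, DCT + "title", "The search for modern China"),
    (PERSON, RDFS_LABEL, "Spence, Jonathan D."),
]


def is_var(term):
    """In this sketch a variable is any term that starts with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triples, binding):
    """Yield every extension of `binding` under which `pattern` equals some triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant subject, predicate or object does not match
        else:
            yield new


def evaluate(patterns, triples):
    """Evaluate triple patterns joined by ' . ' (AND) over shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match_pattern(pattern, triples, old)]
    return bindings


# SELECT ?b WHERE {BOOK ?b PERSON}
print(evaluate([(BOOK, "?b", PERSON)], TRIPLES))
# SELECT ?b ?c WHERE {BOOK ?b PERSON . PERSON rdfs:label ?c}
print(evaluate([(BOOK, "?b", PERSON), (PERSON, RDFS_LABEL, "?c")], TRIPLES))
```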
A simple SPARQL query
SELECT ?a ?b ?c
WHERE {?a ?b ?c}    # the pattern must be a triple: subject predicate object
LIMIT 10
Query the British Library data:
http://bnb.data.bl.uk/sparql
Query DBpedia:
http://dbpedia.org/sparql
Querying subject, predicate, and object
• Things are identified by URIs
• URIs are written in angle brackets
• Subject, predicate, and object can each be queried (so relationships can be queried too)
SELECT ?b ?c
WHERE {<http://bnb.data.bl.uk/id/resource/011954620> ?b ?c}
LIMIT 10
SELECT ?a ?c
WHERE {?a <http://purl.org/dc/terms/title> ?c}
LIMIT 10
SELECT ?c
WHERE {<http://bnb.data.bl.uk/id/resource/011954620> <http://purl.org/dc/terms/title> ?c}
LIMIT 10
SELECT ?b
WHERE {<http://bnb.data.bl.uk/id/resource/011954620> ?b <http://bnb.data.bl.uk/id/person/SpenceJonathanD1936->}
LIMIT 10
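Multi-condition queries and PREFIX
SPARQL supports queries with several conditions (triple patterns). The patterns are joined with " . ", which acts like AND. PREFIX declares a short name for a namespace, so rdfs:label below stands for <http://www.w3.org/2000/01/rdf-schema#label>.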
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?b ?c
WHERE {<http://bnb.data.bl.uk/id/resource/011954620> ?b <http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-> .
<http://bnb.data.bl.uk/id/person/SpenceJonathanD1936-> rdfs:label ?c .
}
LIMIT 10
OPTIONAL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
SELECT ?b ?v ?c
WHERE {<http://bnb.data.bl.uk/id/resource/011954620> ?b ?v.
OPTIONAL {?c dct:creator ?v}}
LIMIT 10
OPTIONAL
First query: select distinct ?a ?b where {?a ?b
<http://dbpedia.org/class/yago/LibrariesInChina>} LIMIT 100
Then: select distinct ?a ?b ?c where {?a ?b
<http://dbpedia.org/class/yago/LibrariesInChina> . ?a dbpprop:director ?c} LIMIT 100
With OPTIONAL: select distinct ?a ?b ?c where {?a ?b
<http://dbpedia.org/class/yago/LibrariesInChina> . OPTIONAL{?a dbpprop:director ?c}} LIMIT 100
The second query returns only the libraries that have a director. With OPTIONAL, libraries without a director are returned as well, with ?c left unbound.
UNION
• UNION combines two independent query patterns into one query, like a Boolean "OR": the solutions of both patterns are returned together.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
SELECT ?b ?v ?n ?m
WHERE {{<http://bnb.data.bl.uk/id/resource/011954620> ?b ?v} UNION{
<http://bnb.data.bl.uk/id/resource/GB9917627> ?n ?m}}
UNION
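Compare: select distinct ?a ?b ?c where {{?a ?b
<http://dbpedia.org/class/yago/LibrariesInChina>}OPTIONAL{?a dbpprop:director ?c .?a ?b <http://dbpedia.org/class/yago/LibrariesInChina>}} LIMIT 100
with: select distinct ?a ?b ?c where {{?a ?b
<http://dbpedia.org/class/yago/LibrariesInChina>}UNION{?a dbpprop:director ?c .?a ?b <http://dbpedia.org/class/yago/LibrariesInChina>}} LIMIT 100
In the results of the latter, Shanghai Library and the National Library of China appear twice. UNION returns the solutions of each branch separately, so a library that has a director appears once with ?c unbound (first branch) and once with ?c bound (second branch); DISTINCT does not merge these two different rows. OPTIONAL instead extends a library's row with ?c only when a director exists.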
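The duplicate rows can be reproduced on a small, made-up dataset. In this Python sketch, `ex:LibraryA` to `ex:LibraryC` and their directors are hypothetical, and `None` stands for an unbound ?c. OPTIONAL behaves as a left join, while UNION simply concatenates the solutions of its two branches.

```python
CLASS = "yago:LibrariesInChina"
DIRECTOR = "dbpprop:director"

# Made-up data: three libraries of the class, two of which have a director
TRIPLES = [
    ("ex:LibraryA", "rdf:type", CLASS), ("ex:LibraryA", DIRECTOR, "ex:PersonA"),
    ("ex:LibraryB", "rdf:type", CLASS),
    ("ex:LibraryC", "rdf:type", CLASS), ("ex:LibraryC", DIRECTOR, "ex:PersonC"),
]


def libraries():
    """Solutions of {?a ?b yago:LibrariesInChina} as (a, b) pairs."""
    return [(s, p) for s, p, o in TRIPLES if o == CLASS]


def directors(a):
    """Values of ?c in {a dbpprop:director ?c}."""
    return [o for s, p, o in TRIPLES if s == a and p == DIRECTOR]


def distinct(rows):
    """SELECT DISTINCT: drop repeated rows, keep first-seen order."""
    return list(dict.fromkeys(rows))


def optional_query():
    rows = []
    for a, b in libraries():
        found = directors(a)
        if found:
            rows.extend((a, b, c) for c in found)
        else:
            rows.append((a, b, None))  # left join keeps the library with ?c unbound
    return distinct(rows)


def union_query():
    left = [(a, b, None) for a, b in libraries()]                       # first branch: no ?c
    right = [(a, b, c) for a, b in libraries() for c in directors(a)]  # second branch
    return distinct(left + right)


print(len(optional_query()), "rows with OPTIONAL")
print(len(union_query()), "rows with UNION")
```

With UNION, every library that has a director shows up twice, which is the duplication observed in the DBpedia results.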
FILTER
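• FILTER restricts the results. It is followed by a condition (a Boolean test), which can be an expression or a function call; only the solutions for which the condition is true are kept.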
select distinct ?a ?b ?c where {?a ?b <http://dbpedia.org/class/yago/LibrariesInChina> ; geo:lat ?c filter(?c>30) }LIMIT 100
select distinct ?a ?b ?c where {?a ?b <http://dbpedia.org/class/yago/LibrariesInChina> OPTIONAL {?a dbpprop:established ?c filter (?c<1949)}}LIMIT 100
FILTER with functions
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:<http://purl.org/dc/terms/>
SELECT ?a ?b
WHERE {?a dct:subject ?b filter(regex(str(?b),"China"))}
limit 10
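Conceptually, FILTER is applied to the solutions produced by the graph patterns and discards those whose test is false. A minimal Python sketch over made-up solutions (the `ex:` names and latitude values are hypothetical); `re.search` stands in for SPARQL's `regex()`, which uses XPath regular expressions, so the two agree only for simple patterns like this one.

```python
import re

# Made-up solutions standing in for the bindings produced by the graph patterns
LAT_ROWS = [
    {"?a": "ex:LibraryA", "?c": 31.0},
    {"?a": "ex:LibraryB", "?c": 23.0},
    {"?a": "ex:LibraryC", "?c": 40.0},
]
SUBJECT_ROWS = [
    {"?a": "ex:Book1", "?b": "http://example.org/subject/ChinaHistory"},
    {"?a": "ex:Book2", "?b": "http://example.org/subject/JapanHistory"},
]


def apply_filter(rows, test):
    """FILTER keeps only the solutions for which the test evaluates to true."""
    return [row for row in rows if test(row)]


# filter(?c > 30)
north = apply_filter(LAT_ROWS, lambda row: row["?c"] > 30)
# filter(regex(str(?b), "China")): str() turns the URI into a string before matching
china = apply_filter(SUBJECT_ROWS, lambda row: re.search("China", str(row["?b"])) is not None)

print([row["?a"] for row in north])
print([row["?a"] for row in china])
```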
DESCRIBE and CONSTRUCT
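• DESCRIBE and CONSTRUCT both return an RDF graph rather than a table of variable bindings
• DESCRIBE returns a description of a resource (what it contains is decided by the endpoint), whereas CONSTRUCT builds a new RDF graph from a template specified by the user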
SPARQL and knowledge discovery?
Example 1: what Ba Jin and Mao Dun have in common
construct {dbpedia:Ba_Jin_and_Mao_Dun ?a ?x} where{<http://dbpedia.org/resource/Ba_Jin> ?a ?x.<http://dbpedia.org/resource/Mao_Dun> ?a ?x }
Example 2: the relationship between Xiao Hong and Nikolai Gogol
construct { <http://dbpedia.org/resource/Xiao_Hong> ?x <http://dbpedia.org/resource/Nikolai_Gogol>} where {<http://dbpedia.org/resource/Xiao_Hong> ?a ?x. {<http://dbpedia.org/resource/Nikolai_Gogol> ?a ?x } union {?x ?a <http://dbpedia.org/resource/Nikolai_Gogol>}}
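Example 1 works because both triple patterns share the variables ?a and ?x, so only (predicate, object) pairs present for both writers survive, and the CONSTRUCT template then attaches them to a new subject. A Python sketch with made-up triples (the `ex:` terms are hypothetical, not DBpedia data):

```python
# Made-up triples; real DBpedia data would be much larger
TRIPLES = {
    ("dbpedia:Ba_Jin", "rdf:type", "ex:Writer"),
    ("dbpedia:Ba_Jin", "ex:nationality", "ex:China"),
    ("dbpedia:Ba_Jin", "ex:birthPlace", "ex:PlaceA"),
    ("dbpedia:Mao_Dun", "rdf:type", "ex:Writer"),
    ("dbpedia:Mao_Dun", "ex:nationality", "ex:China"),
    ("dbpedia:Mao_Dun", "ex:birthPlace", "ex:PlaceB"),
}


def construct_common(triples, s1, s2, new_subject):
    """CONSTRUCT {new_subject ?a ?x} WHERE {s1 ?a ?x . s2 ?a ?x}"""
    pairs1 = {(p, o) for s, p, o in triples if s == s1}
    pairs2 = {(p, o) for s, p, o in triples if s == s2}
    # The template is instantiated once per shared (?a, ?x) solution; the result is a graph (a set)
    return {(new_subject, p, o) for p, o in pairs1 & pairs2}


graph = construct_common(TRIPLES, "dbpedia:Ba_Jin", "dbpedia:Mao_Dun", "dbpedia:Ba_Jin_and_Mao_Dun")
for triple in sorted(graph):
    print(triple)
```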
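System architecture of linked bibliographic data: RESTful
• RESTful is a style of Web service. Its main principles:
– Use the HTTP methods as intended:
• POST: create a resource
• GET: retrieve a resource
• PUT: change or update a resource
• DELETE: remove a resource
– Stateless interaction: every request is self-contained and the server returns the complete resource; no session state needs to be passed between the requests that transfer a resource.
– Structured URIs (clean URIs)
– Return different representations of the same resource on demand, such as HTML, XML, or JSON.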
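The last principle, returning different formats of one resource, is usually implemented by HTTP content negotiation on the Accept header. A simplified Python sketch, assuming a server that can produce the three media types listed in the hypothetical AVAILABLE list; it honours q-values but ignores the finer precedence rules of the HTTP specification.

```python
AVAILABLE = ["text/html", "application/rdf+xml", "application/json"]


def parse_accept(header):
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so equal q values keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(header, available=AVAILABLE, default="text/html"):
    """Pick the representation to send; None means 406 Not Acceptable."""
    if not header:
        return default
    for media, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for offer in available:
            if media in ("*/*", offer) or (media.endswith("/*") and offer.startswith(media[:-1])):
                return offer
    return None


print(negotiate("text/html;q=0.5, application/rdf+xml"))
```

A Linked Data client asking for `application/rdf+xml` gets RDF, while a browser sending `text/html` gets the HTML page for the same URI.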
Structured URIs
Setting up clean URLs (Apache .htaccess as an example)
RewriteEngine on
# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
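What these rules do can be simulated in a few lines. This Python sketch is a hypothetical helper, not part of Apache: `existing_paths` stands in for the `-f`/`-d` file-system checks, and the function returns the internal target that `RewriteRule` would produce, including the `[QSA]` query-string append. It assumes the .htaccess file sits in the document root.

```python
def rewrite(request_path, query_string, existing_paths):
    """Mimic the .htaccess rules above for a request path such as '/id/resource/1'."""
    # RewriteCond !-f / !-d: real files and directories are served unchanged
    if request_path in existing_paths:
        return request_path
    # RewriteCond %{REQUEST_URI} !=/favicon.ico
    if request_path == "/favicon.ico":
        return request_path
    # In per-directory (.htaccess) context the leading slash is not part of the match
    x = request_path.lstrip("/")
    target = "index.php?q=" + x
    if query_string:  # [QSA]: append the original query string
        target += "&" + query_string
    return target


print(rewrite("/id/resource/011954620", "", set()))
```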
Sample RESTful program for publishing linked bibliographic data
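<?php
// Pick the representation from the Accept header (simplified substring test; q-values ignored)
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$type = (strpos($accept, 'application/rdf+xml') !== false) ? 'application/rdf+xml' : 'text/html';
// Send the chosen media type, not the raw Accept header (which may list several types)
header('Content-Type: ' . $type . '; charset=UTF-8');
header('Vary: Accept');
switch ($type) {
case 'text/html':
    echo '<html xmlns="http://www.w3.org/1999/xhtml">';
    // …. ….
    break;
case 'application/rdf+xml':
    echo '<?xml version="1.0" encoding="UTF-8"?>';
    echo '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"……..</rdf:RDF>';
    break;
}
?>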
http://cloudlibrary.info/lodbib/lodbib_restful.php
http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fcloudlibrary.info%2Flodbib%2Flodbib_restful.php&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_TRIPLES&FORMAT=PNG_EMBED
Requesting an RDF document: an example
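<?php
// Resource URL and desired media type come from the request form, e.g. Accept: application/rdf+xml
$type = $_GET["RadioGroup1"];
$url = $_GET["URL"];
echo 'Resource URL: ' . htmlspecialchars($url) . '<br>';
echo 'Accept: ' . htmlspecialchars($type) . '<br>';
// Dereference the URI, asking for the chosen format through the Accept header
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: ' . $type));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // linked data URIs often answer with a 303 redirect
$result = curl_exec($ch);
curl_close($ch);
// Escape the RDF so the browser shows it as text instead of parsing it as markup
echo '<pre>' . htmlspecialchars($result) . '</pre>';
?>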
http://cloudlibrary.info/lodbib/lodrequest.html
Accessing a SPARQL endpoint: an example
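// $mashterm must be an IRI in angle brackets, e.g. <http://www.example.org/>
public function checkdbpedia($mashterm){
    // Find the DBpedia resource whose foaf:homepage is $mashterm, plus its English label and abstract if present
    $searchterm = 'select ?ab ?la where {{?x foaf:homepage ' . $mashterm . '}'
        . ' optional {?x rdfs:label ?la filter(lang(?la)="en")}.'
        . ' optional {?x dbpedia-owl:abstract ?ab filter(lang(?ab)="en")}}';
    $librarylist = $this->dbpediasearch($searchterm);
    if (!($librarylist)) return false;
    // Decode the SPARQL JSON results into an associative array
    return json_decode($librarylist, true);
}

public function dbpediasearch($term){
    $format = 'json';
    // Declare the prefixes the query uses instead of relying on the endpoint's defaults
    $query = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        . "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        . "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        . $term;
    $searchUrl = 'http://dbpedia.org/sparql?'
        . 'query=' . urlencode($query)
        . '&format=' . $format;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $searchUrl);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);     // give up connecting after 2 seconds
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    $response = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat an empty body or a non-2xx status code as failure
    if (empty($response)) return false;
    if ($httpcode >= 200 && $httpcode < 300) return $response;
    return false;
}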
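The same request can be sketched in Python with the standard library only. `build_search_url` mirrors `dbpediasearch` (the `format=json` parameter is a DBpedia/Virtuoso convenience; the SPARQL protocol itself negotiates the result format through the Accept header), and `rows_from_json` reads the standard SPARQL JSON results format. The sample response is invented for illustration, and no network request is made.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"


def build_search_url(query, endpoint=ENDPOINT, fmt="json"):
    """Build a GET URL carrying the SPARQL query, like dbpediasearch() does in PHP."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})


def rows_from_json(text):
    """Turn a SPARQL JSON result document into a list of {variable: value} dicts."""
    data = json.loads(text)
    variables = data["head"]["vars"]
    # Variables left unbound by OPTIONAL are simply absent from a binding
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in data["results"]["bindings"]]


# Invented response: ?la is bound, the optional ?ab is not
SAMPLE = ('{"head": {"vars": ["ab", "la"]}, "results": {"bindings": ['
          '{"la": {"type": "literal", "xml:lang": "en", "value": "Example Library"}}]}}')

print(build_search_url("SELECT * WHERE {?s ?p ?o} LIMIT 1"))
print(rows_from_json(SAMPLE))
```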
Parsing an RDF/XML document
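function xml_to_array($file) {
    /* Request an RDF document (get_RDF_File is a helper method of this class, not shown) */
    if (!($lcsh_str = $this->get_RDF_File($file, 'application/rdf+xml'))) {
        $this->is_success = false;
        return;
    }
    $this->is_success = true;
    /* Create a new XML parser; keep tag names in their original case (e.g. rdf:RDF) */
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    /* Flatten the document into an array of open/complete/close tag entries */
    xml_parse_into_struct($parser, $lcsh_str, $tags);
    xml_parser_free($parser);  /* free the XML parser */
    return $tags;
}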
Parsing result
0: /* namespace declarations */ Array ( [tag] => rdf:RDF [type] => open [level] => 1 [attributes] => Array ( [xmlns:rdf] => http://www.w3.org/1999/02/22-rdf-syntax-ns# [xmlns:rdfs] => http://www.w3.org/2000/01/rdf-schema# [xmlns:owl] => http://www.w3.org/2002/07/owl# [xmlns:skos] => http://www.w3.org/2004/02/skos/core# [xmlns:xl] => http://www.w3.org/2008/05/skos-xl# [xmlns:rda] => http://RDVocab.info/ElementsGr2/ [xmlns:frbrent] => http://RDVocab.info/uri/schema/FRBRentitiesRDA/ [xmlns:foaf] => http://xmlns.com/foaf/0.1/ [xmlns:ndl] => http://ndl.go.jp/dcndl/terms/ [xmlns:dct] => http://purl.org/dc/terms/ ) )
1: /* RDF tree structure */ Array ( [tag] => rdf:Description [type] => open [level] => 2 [attributes] => Array ( [rdf:about] => http://id.ndl.go.jp/auth/ndlna/00288347 ) )
2: Array ( [tag] => foaf:primaryTopic [type] => open [level] => 3 )
3: Array ( [tag] => foaf:Organization [type] => open [level] => 4 [attributes] => Array ( [rdf:about] => http://id.ndl.go.jp/auth/entity/00288347 ) )
4: Array ( [tag] => foaf:name [type] => complete [level] => 5 [value] => 国立国会図書館 )
5: Array ( [tag] => rda:dateOfEstablishment [type] => complete [level] => 5 [value] => 1948 )
6: Array ( [tag] => foaf:Organization [type] => close [level] => 4 )
7: Array ( [tag] => foaf:primaryTopic [type] => close [level] => 3 )
8: Array ( [tag] => dct:modified [type] => complete [level] => 3 [value] => 2012-02-03T14:08:15 )
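RDF triples
• Serialize the tree structure of the RDF/XML data into triples:
• [rdf:Description] + [rdf:about] → subject (S)
• [Tag] → predicate (P)
• [Value] or [rdf:resource] → object (O)
• A typed element with rdf:about (here foaf:Organization) is itself a subject, and its element name gives an rdf:type triple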
There are five triples in total:
S | P | O
http://id.ndl.go.jp/auth/ndlna/00288347 | foaf:primaryTopic | http://id.ndl.go.jp/auth/entity/00288347
http://id.ndl.go.jp/auth/ndlna/00288347 | dct:modified | 2012-02-03T14:08:15
http://id.ndl.go.jp/auth/entity/00288347 | rdf:type (is a) | foaf:Organization
http://id.ndl.go.jp/auth/entity/00288347 | foaf:name | 国立国会図書館
http://id.ndl.go.jp/auth/entity/00288347 | rda:dateOfEstablishment | 1948
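The serialization rules above can be sketched with Python's `xml.etree.ElementTree`. The RDF/XML input is reconstructed from the parse result shown earlier, and the parser handles only the simple striped syntax used here (node elements with rdf:about; property elements with text, rdf:resource, or one nested node). It is not a complete RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PREFIXES = {
    RDF: "rdf",
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://RDVocab.info/ElementsGr2/": "rda",
    "http://purl.org/dc/terms/": "dct",
}

# Reconstructed from the xml_parse_into_struct output above
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rda="http://RDVocab.info/ElementsGr2/"
    xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://id.ndl.go.jp/auth/ndlna/00288347">
    <foaf:primaryTopic>
      <foaf:Organization rdf:about="http://id.ndl.go.jp/auth/entity/00288347">
        <foaf:name>国立国会図書館</foaf:name>
        <rda:dateOfEstablishment>1948</rda:dateOfEstablishment>
      </foaf:Organization>
    </foaf:primaryTopic>
    <dct:modified>2012-02-03T14:08:15</dct:modified>
  </rdf:Description>
</rdf:RDF>"""


def qname(tag):
    """'{namespace}local' -> 'prefix:local'."""
    ns, local = tag[1:].split("}")
    return f"{PREFIXES.get(ns, ns)}:{local}"


def node_triples(node, triples):
    """Emit the triples of one node element and return its subject URI."""
    subject = node.get(f"{{{RDF}}}about")      # [rdf:about] -> S
    if node.tag != f"{{{RDF}}}Description":    # typed node -> rdf:type triple
        triples.append((subject, "rdf:type", qname(node.tag)))
    for prop in node:
        predicate = qname(prop.tag)             # [Tag] -> P
        resource = prop.get(f"{{{RDF}}}resource")
        if resource is not None:                # [rdf:resource] -> O
            triples.append((subject, predicate, resource))
        elif len(prop):                         # nested node: its subject is the object
            triples.append((subject, predicate, node_triples(prop[0], triples)))
        else:                                   # [Value] -> O
            triples.append((subject, predicate, (prop.text or "").strip()))
    return subject


def rdfxml_to_triples(xml_text):
    triples = []
    for node in ET.fromstring(xml_text):
        node_triples(node, triples)
    return triples


for s, p, o in rdfxml_to_triples(SAMPLE):
    print(s, p, o)
```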