people.uwplatt.edupeople.uwplatt.edu/.../f13/klubertanzm_semanticweb.docx · web viewtransfer...

The Semantic Web

Matt Klubertanz
Department of Computer Science
University of Wisconsin-Platteville

[email protected]

Abstract

The web today is written in a way that only people can really understand. Since computers are not able to understand these documents, they are unable to see relationships between data on the web. The Semantic Web is trying to change that. The goal is to put data on the web in a form that computers can understand, so that people get exactly the data they need. Tim Berners-Lee, inventor of the World Wide Web, is now one of the leaders in pushing this idea. His goal is to get everyone to share their data and put it on the web, which links data together and allows for powerful searching. People in scientific fields are already using this idea, allowing others in their fields to more easily find the data they need. The more people who contribute to the Semantic Web, the more easily data will be found.

What is the Semantic Web?

According to Tim Berners-Lee, the Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. Essentially, it takes the web to the next step.

The Semantic Web is built on the stack in Figure 1. SPARQL is the query language, while RDF and OWL are the languages used to describe data on the Semantic Web. Both RDF and OWL are commonly written in XML, and all of this is based on unique identifiers called URIs. Together these make up the framework for the Semantic Web, and each is discussed in more depth below.

Figure 1: The Semantic Web Stack [6]


History

In 1989 Tim Berners-Lee proposed the hypertext system that became the World Wide Web, which is built on the Hypertext Transfer Protocol (HTTP), and he is credited with inventing the Web. This was the beginning of the Web, but Berners-Lee did not believe that it had reached its full potential. In 1994 he founded the W3C (World Wide Web Consortium). In 1999 the W3C started to take an interest in extending the current web and creating a new one. Berners-Lee later gave this new web the name "Semantic Web." Ever since, the W3C has been developing the standards involved in the Semantic Web and has been the leading organization pushing its further expansion.

Growth

Over the past few years the Semantic Web has gained momentum, and the amount of linked data has grown rapidly. In 2007 the Semantic Web had only a few linked data sets. Figure 1 shows the extent of the Semantic Web in 2007. Figure 2 shows that two years later it had grown substantially, and Figure 3 shows that by 2011 it had grown even faster.

Figure 1: The size of the Semantic Web in 2007 [7]


DBpedia

DBpedia was created early in the life of the Semantic Web. It contains structured information extracted from Wikipedia and has essentially become the central hub of the Semantic Web. According to DBpedia, its knowledge base currently describes 4 million things and contains about 2.46 billion RDF triples.

RDF

RDF is the standard for describing data on the Semantic Web. RDF stands for Resource Description Framework and is commonly written in XML. RDF describes all the data on a site using statements called triples. The three parts of a triple are the subject, predicate and object. Because of these three parts, a triple can also be thought of as a short sentence about a piece of data.
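The triple structure can be sketched in a few lines of Python. This is only an illustration of the idea, not an RDF library: the `Triple` type, the `as_sentence` helper, and the names "Jane Doe" and "Platteville, WI" are hypothetical and chosen for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical data, used only for illustration.
triples = [
    Triple("http://www.somesite.com/rdf",
           "http://www.somesite.com/rdf/person",
           "Jane Doe"),
    Triple("http://www.somesite.com/rdf",
           "http://www.somesite.com/rdf/location",
           "Platteville, WI"),
]


def as_sentence(t: Triple) -> str:
    """Read a triple as a short subject-predicate-object sentence."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


for t in triples:
    print(as_sentence(t))
```

Each printed line reads like a sentence: the subject, then the attribute being described, then its value.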

Subject

The subject of the triple is a URI that identifies the resource being described. URI stands for Uniform Resource Identifier, and because it is a unique identifier it makes clear which resource the data is about. A URI looks similar to a URL because URLs are a type of URI. Since URLs are a subclass of URIs they can also be used, and more commonly are used, as the subject of a triple. An example of a subject could be: http://www.somesite.com/rdf. This would be an identifier for all the data that is associated with somesite. The idea of using URIs as identifiers is that there is only one unique URI for a set of data. So in the example described, all data about somesite would use that URI, and no other URI should be created to describe its data. File extensions should also be left out of URIs as much as possible. They add no value for the people using the URIs, and long extensions can make it difficult to figure out what a URI is describing. These issues can cause problems later when trying to run SPARQL queries.

Predicate and Object

The predicate part of the triple is also a URI, and it names an attribute of the subject. Using the example subject from before, http://www.somesite.com/rdf, we can have a URI http://www.somesite.com/rdf/person which describes a person associated with somesite. So somesite would be the subject and person would be the predicate. The final part of the triple, the object, would be the actual name of the person. You can also add other attributes, like the location and phone number of someone, or whatever other data you want to associate with the subject. The end result in XML format would be as follows:


Figure 4: RDF Code Example
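As a rough sketch of what such RDF/XML could look like, the following Python builds a small document with the standard library. It is not the code from Figure 4: the `site` prefix, the `location` attribute, and the values "Jane Doe" and "Platteville, WI" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace assumed from the example URIs in the text.
SITE_NS = "http://www.somesite.com/rdf/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("site", SITE_NS)

# <rdf:RDF> is the root element of an RDF/XML document.
root = ET.Element(f"{{{RDF_NS}}}RDF")

# rdf:about carries the subject URI of the triples inside the description.
desc = ET.SubElement(
    root,
    f"{{{RDF_NS}}}Description",
    {f"{{{RDF_NS}}}about": "http://www.somesite.com/rdf"},
)

# Each child element is a predicate; its text is the object (hypothetical values).
ET.SubElement(desc, f"{{{SITE_NS}}}person").text = "Jane Doe"
ET.SubElement(desc, f"{{{SITE_NS}}}location").text = "Platteville, WI"

rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

The element name `site:person` expands to http://www.somesite.com/rdf/person, so the subject, predicate and object of each triple can all be read back out of the XML.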

OWL

OWL is a language built on top of RDF that adds more vocabulary for describing triples. OWL stands for Web Ontology Language and is currently the language endorsed by the W3C as the standard language to use when describing data. OWL uses the same triple model as RDF and is usually still written in XML, but it has more vocabulary than RDF. OWL has six different class descriptions that can be used when describing triples: class identifier, enumeration, property restriction, intersection, union and complement. Some of these class descriptions can be used to combine other class descriptions to help describe data more easily.
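The class descriptions behave much like set operations, which can be sketched with Python sets. This is an analogy only and not OWL syntax: all names and ages below are hypothetical, and the complement is taken against a fixed list of known individuals, whereas OWL reasons under an open-world assumption.

```python
# All individuals known in this toy example (hypothetical).
universe = {"alice", "bob", "carol", "dave"}

# Enumeration: a class defined by listing its members explicitly.
student = {"alice", "bob"}
employee = {"bob", "carol"}

# A property of each individual, used for the restriction below.
age = {"alice": 20, "bob": 34, "carol": 41, "dave": 17}

# Intersection: individuals that belong to both classes.
working_student = student & employee

# Union: individuals that belong to at least one class.
student_or_employee = student | employee

# Complement: individuals that are not in the class (closed-world simplification).
non_student = universe - student

# Property restriction: individuals whose property satisfies a condition.
adult = {person for person in universe if age[person] >= 18}

print(sorted(working_student))
print(sorted(student_or_employee))
print(sorted(non_student))
print(sorted(adult))
```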

SPARQL

SPARQL is a language used to perform RDF queries and get data from the Semantic Web. It uses the triples described in RDF to perform these queries. SPARQL 1.0 became an official W3C recommendation in 2008. As of March 2013, the official W3C recommendation is SPARQL 1.1, which added features like sub-queries and negation and expanded the set of functions and operators. The basic structure of a SPARQL query looks as follows:

Figure 5: SPARQL Query Format


The prefix declarations define short names for the URIs used in the query, so that the long URIs from the RDF triples do not have to be written out in full each time. The data set being queried is named separately, either by the endpoint the query is sent to or by a FROM clause, which is similar to "FROM" in SQL. SPARQL also has functions similar to those in SQL for filtering queries. Table 1 shows the different function types and some examples of each.
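The overall shape of a query can be sketched by assembling one from its parts in Python. The `build_query` helper and the `site` prefix are hypothetical and exist only to show the order of the clauses: prefix declarations, result clause, query pattern, then optional modifiers.

```python
def build_query(prefixes, variables, patterns, order_by=None):
    """Assemble a SPARQL SELECT query string from its main parts."""
    # Prefix declarations: short names for long URIs.
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    # Result clause: the variables to return.
    lines.append("SELECT " + " ".join(variables))
    # Query pattern: triple patterns to match against the data.
    lines.append("WHERE {")
    lines.extend(f"    {pattern} ." for pattern in patterns)
    lines.append("}")
    # Query modifiers such as ordering are optional.
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    return "\n".join(lines)


query = build_query(
    prefixes={"site": "http://www.somesite.com/rdf/"},  # assumed namespace
    variables=["?name"],
    patterns=["<http://www.somesite.com/rdf> site:person ?name"],
    order_by="?name",
)
print(query)
```

The triple pattern in the WHERE clause has the same subject, predicate, object shape as an RDF triple, with a variable (`?name`) standing in for the part to be found.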

Table 1: Filter Functions in SPARQL

Type                Examples
Logical             !, &&, ||
Math                +, -, *, /
Comparison          =, !=, >, <, IN, NOT IN, etc.
SPARQL tests        isURI, isBlank, isLiteral, isNumeric, bound
SPARQL accessors    str, lang, datatype
Other               sameTerm, langMatches, regex, REPLACE

Table 2: Filter Functions added in SPARQL 1.1

Type                Examples
Conditionals        IF, COALESCE, EXISTS, NOT EXISTS
Constructors        URI, BNODE, STRDT, STRLANG, UUID, STRUUID
Strings             STRLEN, SUBSTR, UCASE, LCASE, STRSTARTS, STRENDS, CONTAINS, STRBEFORE, STRAFTER, CONCAT, ENCODE_FOR_URI
More Math           abs, round, ceil, floor, RAND
Date/time           now, year, month, day, hours, minutes, seconds, timezone, tz
Hashing             MD5, SHA1, SHA256, SHA384, SHA512
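To give a feel for how filter functions from Tables 1 and 2 narrow a result set, here is a loose Python analogy. It does not run SPARQL: the rows are the values from Table 3 later in this paper, the two filter expressions in the comments are made up for illustration, and each Python condition only mimics the SPARQL function named beside it.

```python
import re

# (country_name, population) rows taken from Table 3.
rows = [
    ("Ethiopia", 91195675),
    ("Afghanistan", 30419928),
    ("Uzbekistan", 29559100),
    ("Kazakhstan", 16911900),
    ("Niger", 16274738),
    ("Burkina Faso", 15730977),
]

# Analogy for: FILTER (?population > 20000000 && regex(?country_name, "stan$"))
# Comparison (>), logical (&&) and regex combined.
large_stans = [
    name for name, population in rows
    if population > 20_000_000 and re.search(r"stan$", name)
]
print(large_stans)  # ['Afghanistan', 'Uzbekistan']

# Analogy for: FILTER (STRSTARTS(UCASE(?country_name), "B") || ?population > 90000000)
# String functions (UCASE, STRSTARTS) combined with logical (||).
b_or_huge = [
    name for name, population in rows
    if name.upper().startswith("B") or population > 90_000_000
]
print(b_or_huge)  # ['Ethiopia', 'Burkina Faso']
```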

An example of a SPARQL query is below. The query searches DBpedia for all landlocked countries that have a population greater than 15,000,000 and displays only the English name of each country. If the language filter, "EN", were not added, the results would contain a repeated row for every language in which DBpedia has a name for the country.


Figure 6: SPARQL Query Example
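A query of the kind described, written along the lines of the SPARQL by Example tutorial [2], could be prepared for the endpoint as sketched below. This is not necessarily the exact query from Figure 6: the class name `type:LandlockedCountries` and the property `prop:populationEstimate` are assumptions and may not match what DBpedia currently exposes. The sketch only builds the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

# Endpoint named in the text.
ENDPOINT = "http://www.dbpedia.org/sparql"

# Class and property names below are assumptions for illustration.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000 &&
            langMatches(lang(?country_name), "EN"))
}
ORDER BY DESC(?population)
"""


def request_url(endpoint: str, query: str) -> str:
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = request_url(ENDPOINT, QUERY)
print(url)
```

The FILTER line carries both conditions from the description: the population threshold and the restriction to English labels.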

The result of running the above query in the SPARQL editor at http://www.dbpedia.org/sparql is shown in Table 3 below:

Table 3: Result of Example Query

country_name        population
"Ethiopia"@en       91195675
"Afghanistan"@en    30419928
"Uzbekistan"@en     29559100
"Kazakhstan"@en     16911900
"Niger"@en          16274738
"Burkina Faso"@en   15730977

Conclusion

To summarize, the Semantic Web is the linking of data on the internet. Using RDF and OWL, data can be described using triples, which are somewhat like sentences. Each triple has a subject, which is the unique identifier for the thing being described. The subject is a URI, which is usually a URL. URLs are just a type of URI, and since they are what we use to identify sites on the internet, they are what we usually use for the Semantic Web. Each triple also has a predicate and an object, which together state something about the subject. These triples can be used to describe pretty much anything on the internet. DBpedia does just this and has about 2.46 billion RDF triples. These triples allow us to use SPARQL, which is the real power of the Semantic Web. With it we can run queries across the internet, which allows us to find and share all sorts of data. This could be used in government to share data between agencies, and the biggest area in which it could be used is medicine. In medical fields data could be shared more easily, which would let people find what they are looking for more quickly. This could allow medicines and other technologies to be discovered sooner. So in all, the Semantic Web is the future of the internet, but more people will have to adopt it before it really takes off and becomes the standard way of sharing information.

References

[1] Berners-Lee, Tim. "Tim Berners-Lee: The next Web." Lecture. TED. Mar. 2009. Web. <http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html>.

[2] Feigenbaum, Lee, and Eric Prud'hommeaux. "SPARQL by Example - Cambridge Semantics." Cambridge Semantics. N.p., 30 May 2013. Web. 23 Sept. 2013. <http://www.cambridgesemantics.com/semantic-university/sparql-by-example>.

[3] Hori, Masahiro, Jérôme Euzenat, and Peter F. Patel-Schneider. "OWL XML Syntax: OWL Examples in XML Syntax." W3C. N.p., 11 June 2003. Web. 23 Sept. 2013. <http://www.w3.org/TR/owl-xmlsyntax/apd-example.html>.

[4] Prud'hommeaux, Eric, and Andy Seaborne. "SPARQL Query Language for RDF." SPARQL Query Language for RDF. N.p., 15 Jan. 2008. Web. 23 Sept. 2013. <http://www.w3.org/TR/rdf-sparql-query/>.

[5] Wang, Xia, and Wolfgang A. Halang. Discovery and Selection of Semantic Web Services. Heidelberg: Springer, 2013. Print.

[6] Crowther, Rob. "Planning a Semantic Web Site." Planning a Semantic Web Site. N.p., 10 Apr. 2008. Web. 11 Nov. 2013.

[7] Cyganiak, Richard, and Anja Jentzsch. "Linking Open Data Cloud Diagram." Web. 28 Sept. 2013. <http://lod-cloud.net/>.
