1
FIGIS’ML hands-on
creating XML documents for FIRMS factsheets
2
CONTENTS
AN INTRODUCTION TO XML
• What is XML
• An XML document
• What is a DTD, XSD
• How to create XML documents
FIRMS FACTSHEETS
• Marine Resource fact sheets: FIGIS and FIRMS
• How FIGIS is organised
• FIGIS main structure
• Creating Marine Resource fact sheets
1) Creating objects
2) Referencing objects
3) Advanced tagging: topic
4) Advanced tagging: formatting
3
An introduction to XML
• What is XML?
– XML stands for eXtensible Markup Language
– XML is not itself a markup language; it is a set of rules for building markup languages
– XML is extensible: you can create your own tags. XML does not define any markup elements, so every user makes up a markup language that expresses the information in the best way possible. It is nevertheless important to follow the XML syntax strictly.
– XML deals with content and structure: it holds and manages information by means of markup
4
Presentation: the main difference between XML and HTML
Separating style from meaning is a central principle of XML.
• Presentation is the way a document looks; it should not be part of an XML document.
• The layout of an XML document is assigned through another document, called a stylesheet.
• Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value"). While HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, if you see "<p>" in an XML file, do not assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person...
• Keeping style out of the document widens your presentation possibilities: many different stylesheets can be applied to a single XML document, and different versions of the same information can be created on the fly.
• A document that embeds stylistic information, such as an HTML document, is difficult to repurpose, update or convert into other forms.
5
XML and HTML
HTML
• pros: widely used, simple to use
• cons: mixes data and formatting, non-extensible
XML
• handles only the content; formatting is done by stylesheets
• the tags are defined by the authors
• tag definitions and structure are handled by dictionaries (DTD)
• XML is platform independent
• it allows viewing, reuse, multiple uses of the same document, and loading into a database
6
Information dealing with presentation is stored elsewhere. This has many benefits:
1) the same style settings can be used for many documents (e.g. FIGIS species fact sheets);
2) changing the layout of a set of documents may require updating only one file;
3) stylesheets can be swapped for different purposes, for example one for web display and one for printing (e.g. Sidp, FIGIS species fact sheets);
4) the document structure and content cannot be messed up by changing its layout. The purity of the information structure does not get in the way of format conversions.
[Figure: one XML document for Thunnus thynnus, shown as a Sidp fact sheet and as a Sidp fact sheet on FIGIS]
7
Other important characteristics of XML
- XML is an open standard managed by the World Wide Web Consortium (W3C). This means that it is not tied to the fortunes of any single company: it is considered a platform-independent technology. http://www.w3c.org
- Rather than creating new tags in every environment, people working in information systems are trying to set standard tags to be used for exchanging data. The goal of writing an unambiguous structure makes writing XML markup more difficult, but the result is still easy to read and parse for humans and programs alike. One recent application is in Internet media, through RSS feeds.
- XML is text-based
8
An introduction to XML
• What is XML? : an example
<?xml version="1.0" encoding="UTF-8"?>
<fi:FIGISDoc xmlns:fi="http://www.fao.org/fi/figis/devcon/" xsi:schemaLocation="http://www.fao.org/fi/figis/devcon/ http://figis01/Dtd/Beta/3.5/firms_schema/editor/aqres_editor.xsd" xml:lang="en">
  <fi:AqRes>
    <fi:AqResIdent Status="1" Factsheet="true" RefObservation="false">
      <fi:FigisID>10008</fi:FigisID>
      <!-- this is the firms name -->
      <dc:Title>Albacore - Atlantic and Mediterranean Sea</dc:Title>
      <fi:SpeciesList Type="Target">
        <fi:SpeciesRef Taxonomy="Species">
          <fi:ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/>
          <dc:Title Type="FIRMS">Albacore</dc:Title>
        </fi:SpeciesRef>
      </fi:SpeciesList>
      <fi:ReportingYear>2003</fi:ReportingYear>
      <fi:AdditionalRefData>
        <dcterms:Alternative> "Albacore"</dcterms:Alternative>
      </fi:AdditionalRefData>
    </fi:AqResIdent>
9
• an XML document
The example document from the previous slide, with its parts labelled:
XML declaration
root element
Document Type declaration
element
attribute
value of the attribute
entity
10
• an XML document
– Prolog
– Elements
• they define the document’s content, dividing it into its constituent parts
• they can contain other elements, text or both
• they can be empty
– Attributes
• add information about one element
• one attribute can only appear once in one element
• attributes can only contain text
– Entities
• use an entity in place of characters that are not allowed (e.g. “&amp;” for “&”; “&lt;” for “<”...)
• can be used as “shortcuts”
• are similar to variables
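As a small illustration of the entity rule above, the sketch below uses Python's standard library to replace the characters that are not allowed in XML text with their predefined entities, and to turn them back. The sample string is made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# Illustrative text containing characters that cannot appear literally in XML content.
raw = "Dubious & P. Ano <publisher>"

# escape() replaces & with &amp;, < with &lt; and > with &gt;
escaped = escape(raw)

# unescape() reverses the substitution.
restored = unescape(escaped)

print(escaped)
print(restored)
```

An XML editor normally performs this substitution for you; doing it by hand matters mostly when text is pasted into a document from another source.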
11
• an XML document : the prolog
The top of an XML document contains special information called the document prolog. The prolog says that this is an XML document and declares the version of XML being used. It can hold additional information such as the document type definition being used, the text encoding and instructions to XML processors.
– XML declaration: it tells the processor that it needs an XML parser to interpret the document.
• always the first line of an XML document
• can be omitted (not recommended)
• the simplest form is: <?xml version="1.0"?>
• can contain encoding information: <?xml version="1.0" encoding="UTF-8"?>
– Document type declaration: it describes the root element type and designates a Document Type Definition or an XML Schema Definition, e.g.: <fi:FIGISDoc xmlns:fi="http://www.fao.org/fi/figis/devcon/"
fi:FIGISDoc in this case is the root element. The root element is the first XML element to appear in the document, and it is the one that contains the rest of the document. A SYSTEM identifier specifies the location of the DTD (Document Type Definition); for an XSD (XML Schema Definition), the location is given by the xsi:schemaLocation attribute on the root element.
12
An introduction to XML
• Creating well-formed XML documents
– well-formed means that the document respects the XML rules
– the main rules are:
• there must be exactly one root element
• all other elements must be inside the root
• all elements must be properly nested and closed
• elements, attributes and entities are case-sensitive
• attribute values have to be quoted
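The rules above can be checked mechanically by any XML parser. The following is a minimal sketch using Python's standard library; the function name is_well_formed and the sample strings are illustrative, not part of FIGIS.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<ORDER orderref="120136"><ITEM type="CD"/></ORDER>'
bad_nesting = "<ORDER><ITEM></ORDER></ITEM>"      # elements overlap instead of nesting
bad_case = "<ORDER></order>"                      # closing tag differs in case
bad_quotes = "<ORDER orderref=120136></ORDER>"    # attribute value is not quoted

for sample in (good, bad_nesting, bad_case, bad_quotes):
    print(is_well_formed(sample), sample)
```

A check like this only tests syntax; it says nothing about whether the document uses the elements a DTD or XSD expects.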
13
An introduction to XML
• Valid XML documents
– an XML document is valid when:
• it follows the structure that has been defined
• all the elements respect the rules set in the XSD or in the DTD
– valid elements
– valid attributes
– optional and mandatory elements/attributes
– the validation rules are defined in a “dictionary”:
• a Document Type Definition (DTD)
• an XML Schema Definition (XSD)
14
An introduction to XML
• Valid XML documents
– well-formed doesn’t imply valid
• well-formed means syntactically correct
• valid is similar to grammatically correct
– but a valid document is always well-formed
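To make the distinction concrete, here is a rough sketch in Python. It does not use a real DTD or XSD: the REQUIRED_ATTRS table is a hand-written, hypothetical stand-in for a dictionary of rules, and the element names are invented for the example. Parsing checks well-formedness; the rule table then checks a very reduced notion of validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a DTD/XSD: mandatory attributes per element.
REQUIRED_ATTRS = {
    "ORDER": {"orderref"},
    "ITEM": {"type", "quantity"},
}


def validity_errors(text: str) -> list:
    """Parse the text (raises ParseError if it is not well-formed),
    then list every mandatory attribute that is missing."""
    root = ET.fromstring(text)
    errors = []
    for elem in root.iter():
        missing = REQUIRED_ATTRS.get(elem.tag, set()) - set(elem.attrib)
        for name in sorted(missing):
            errors.append(f"<{elem.tag}> is missing attribute '{name}'")
    return errors


# Well-formed, but not valid under the rules above: ITEM has no quantity.
well_formed_only = '<ORDER orderref="1"><ITEM type="CD"/></ORDER>'
# Well-formed and valid under the rules above.
valid = '<ORDER orderref="1"><ITEM type="CD" quantity="1"/></ORDER>'

print(validity_errors(well_formed_only))
print(validity_errors(valid))
```

Real validation against a DTD or XSD covers much more (allowed children, order, data types) and is normally done by the XML editor or a validating parser.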
15
An introduction to XML
• Valid XML documents : schema
– The purpose of a schema is to define a class of XML documents. Like a DTD, it sets out what names are to be used for the different types of element, where they may occur, and how they all fit together.
– schemas are written in XML
• they don’t require learning a new language
• they can be handled and transformed as normal XML documents
– schemas can contain more complex rules than DTDs
• conditions
• tests
16
An introduction to XML
• Creating XML documents
– XML is pure text
– XML documents can be created manually
– XML editing software makes the task easier
• checks for well-formedness and validity
• provides a graphical user interface
– Various solutions exist:
• XMLSpy by Altova (Windows)
• XMetaL by SoftQuad (Windows)
• Morphon XML Editor Suite by Morphon (MacOS)
• ElfData XML Editor by ElfData (MacOS)
17
An introduction to XML
• Creating XML documents :
– hands-on : creating a document with XMLSpy
• example of an order form :
– 1 order : reference 120136
– two items in the order :
» 1 CD, “Groovy beats” by Howie C., ed. “Average Records”, available with 2-day delivery
» 2 copies of the book “Piano for dummies - vol1” by K. Board, published by “Dubious & P. Ano”, available with 1-week delivery
18
An introduction to XML
• Creating XML documents :
– hands-on : creating a document with XMLSpy
<?xml version="1.0" encoding="iso-8859-1"?>
<ORDER orderref="120136">
  <ITEM type="CD" quantity="1">
    <AVAILABLE status="yes" time="2days"/>
    <TITLE>Groovy beats</TITLE>
    <AUTHOR>Howie C.</AUTHOR>
    <EDITOR>Average Records</EDITOR>
  </ITEM>
  <ITEM type="book" quantity="2">
    <AVAILABLE status="yes" time="1week"/>
    <TITLE>Piano for Dummies - vol1</TITLE>
    <AUTHOR>K. Board</AUTHOR>
    <EDITOR>Dubious &amp; P. Ano</EDITOR>
  </ITEM>
</ORDER>
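Once the order document exists, any program can read it back. The sketch below is one possible way to do so with Python's standard library; it is not part of the XMLSpy exercise. Note that the ampersand in the publisher name has to be written as the entity &amp; inside the XML, otherwise the parser rejects the document.

```python
import xml.etree.ElementTree as ET

# The order form as bytes, so the encoding declaration is honoured by the parser.
ORDER_XML = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<ORDER orderref="120136">
  <ITEM type="CD" quantity="1">
    <AVAILABLE status="yes" time="2days"/>
    <TITLE>Groovy beats</TITLE>
    <AUTHOR>Howie C.</AUTHOR>
    <EDITOR>Average Records</EDITOR>
  </ITEM>
  <ITEM type="book" quantity="2">
    <AVAILABLE status="yes" time="1week"/>
    <TITLE>Piano for Dummies - vol1</TITLE>
    <AUTHOR>K. Board</AUTHOR>
    <EDITOR>Dubious &amp; P. Ano</EDITOR>
  </ITEM>
</ORDER>
"""

root = ET.fromstring(ORDER_XML)
order_ref = root.get("orderref")

# Collect one dictionary per ITEM: attributes come from get(), text from findtext().
items = [
    {
        "type": item.get("type"),
        "quantity": int(item.get("quantity")),
        "title": item.findtext("TITLE"),
        "editor": item.findtext("EDITOR"),
        "delivery": item.find("AVAILABLE").get("time"),
    }
    for item in root.findall("ITEM")
]
total_quantity = sum(entry["quantity"] for entry in items)

print(order_ref, total_quantity)
for entry in items:
    print(entry)
```

The parser returns the publisher name with a plain "&": the entity exists only in the stored document, not in the data the application sees.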
19
An introduction to XML
• Using XML documents
– exchanging data between systems
• from one DB to another DB
• data manipulation by software
– submitting data to a system
• automatically (software) (e.g. conversion from the FIRMS Excel inventory to XML)
• manually (creation of content)
– presenting data to the user
• web pages
• data export in any format (text, pdf, etc.)
20
• using XML documents : within FIGIS
[Diagram. Labels: original data (text documents, DB); XML documents; FIGIS loader (data load); FIGIS DB; FIGIS application (or FIGIS-like); XML temp files; end user (user’s query, query result)]
21
• using XML documents : FIGIS and the world
[Diagram. Labels: PARTNERS; text documents; XML documents; DB; DTD; FIGIS; FAO; user; graphical user interface: website]
22
• using XML documents : FIRMS and the world
[Diagram. Labels: PARTNERS; reports (.xls; .doc); XML converter; On Line Editor; XML documents; Load XML; FIRMS FACT SHEET; user]
23
Creating XML content for FIGIS, FIRMS
• Objectives :
– Organisation of data in FIGIS
– Creating “FIGISML” documents
– Creating “FIGISML” objects
– Referencing “FIGISML” objects
– Creating documents : hands-on
– Detailed structure
– Advanced tagging
24
Creating XML content for FIGIS
• Organisation of data in FIGIS
– data is organised by “domains”
• e.g. : aquatic species, marine fisheries, fishing technologies, aquaculture...
– each domain handles “objects”
• e.g. : species, gear types, marine resource...
– complex objects can be defined using simple objects
• e.g. : a fishing technique is defined by a gear, a target species and a vessel
25
Creating XML content for FIGIS
• Organisation of data in FIGIS
– generic structure of objects :
• the OBJECT SOURCE block
– it contains information about the origin of the data
• the IDENTITY block
– it contains the definition of the object
• the TOPIC block
– it contains a description of the object : what makes it different from another of the same type
– it contains any other information on the object : factual data
26
Creating XML content for FIGIS
• Organisation of data in FIGIS
– the XML documents reflect the structure of the objects they contain
– a document can contain one or more objects
• e.g. : two species
– an object can contain other objects
• e.g. : a resource composed of three stocks
27
Creating XML content for FIGIS
• High level structure of “FIGISML documents”
– an XML document created for FIGIS always starts with :
• fi:FIGISDoc as the root element
– followed by
• an OBJECT SOURCE block (fi:ObjectSource)
• one or more OBJECT blocks (fi:AqSpecies (FIGIS Species fact sheet), fi:GearType (FIGIS Gear fact sheet), fi:AqRes (FIRMS Marine Resource fact sheet) ...)
28
Creating XML content for FIGIS
Creating documents : FIGISDOC
Domains currently available in FIGIS
The root of the document
29
Creating XML content for FIGIS
Elements that can be used within the Object Source (fi:ObjectSource)
Creating documents : Object Source
30
Creating XML content for FIGIS
Elements that can be used within the Source of Information (fi:Sources)
Creating documents : Source of Information
31
Creating XML content for FIGIS
• Creating documents : OBJECT SOURCE
– the Object Source is important :
• for quality assurance
• for ownership of the data
• for version management
• for observations management
32
Creating XML content for FIGIS
• Creating objects
– generic structure of objects : example for AqRes
• the OBJECT SOURCE block
– Collection Ref, Cover Page, Corporate Cover Page
• the IDENTITY block
– FigisIdentifier (FIGISId), Title, Alternative Title, SpeciesList, WaterAreaList, Reporting Year, Foreign Id
• the TOPIC block
– History, Habitat and Biology, Geographical Distribution, Water Area Overview, Resource Structure, Exploitation, Statistics, Assessment, Management, Biological State and Trend
– Source of Information, Bibliography, Related Resources
33
Creating XML content for FIGIS
• Creating objects : the Object Source
– each fact sheet has its own Object Source
– 3 components: Data Collection Owner, Cover Page, Corporate Cover Page
<fi:ObjectSource>
  <fi:Owner>
    <fi:CollectionRef>
      <fi:FigisID>6</fi:FigisID> <!--ICCAT SCRS Reports-->
    </fi:CollectionRef>
  </fi:Owner>
  <fi:CorporateCoverPage>
    <fi:FigisID>6</fi:FigisID> <!--Stock status report -->
  </fi:CorporateCoverPage>
  <fi:CoverPage>
    <dcterms:Created>2005-08-05</dcterms:Created>
  </fi:CoverPage>
</fi:ObjectSource>
34
Creating XML content for FIGIS
• Creating objects : the Object Source
Data Collection Owner : A Data collection is a set of homogeneous data handled over time by a data owner according to agreed and consistent processes and dissemination formats; as such it may cover data types from different domains. A Data collection is also the primary level of definition of user rights, hence is systematically associated with data owner institutional name.
35
Creating XML content for FIGIS
• Creating objects : the Object Source
Cover Page: The cover page is composed of a set of public, bibliographic-like information. It is modelled to adapt the traditional paper publishing logic (a cover page wrapping a thick intellectual content) to the internet publishing logic (fact sheets can be considered short electronic pages that are part of a broader virtual book). Most of the information used to build the citation of a domain object observation comes from the cover page attributes.
dcterms:Created is the date of creation of the intellectual content.
dcterms:Modified is the date of modification of the intellectual content.
dc:Language: one element for each language in which the resource is available.
36
Creating XML content for FIGIS
• Creating objects : the Object Source
Corporate Cover Page : A cover page is attached to each observation made on a domain object (e.g. marine resource, fishery, etc.). In general, a set of observations issued by the same data owner under the same data collection will have the same cover page; more precisely, at least part of their cover page attributes will be the same. This group of shared attributes is called the “Corporate Cover Page”.
A Corporate Cover Page is defined either by a FIGIS reference (fi:FigisID) or by a set of elements defining an unreferenced Corporate Cover Page, but not both.
For an unreferenced Corporate Cover Page, the Title and the Corporate Author are mandatory.
The Publisher element is used only for output purposes.
The elements Title, Series and CreatorCorporate may be provided in 3 languages (English, French and Spanish).
The Data collection module is used to indicate which Corporate Cover Pages can be served within a given collection. This module will display all the attributes of the referenced Corporate Cover Pages.
37
Creating XML content for FIGIS
• Creating objects : the Object Source
Conceptual data model
Corporate Cover Pages might be referenced in the FIGIS system and managed by the Reference Tables Management System (RTMS).
Each observation made on a domain object (e.g. on a marine resource) has a Corporate Cover Page, but this cover page might not be a FIGIS referenced Corporate Cover Page.
The FIGIS system manages a relationship between Referenced Corporate Cover pages and Data Collection. This relation indicates which Corporate Cover Pages may be used according to the Data Collection under which an observation on a domain object is published.
38
Creating XML content for FIGIS
• Creating objects : the Object Source
What is the Reference Tables Management System (RTMS)
The RTMS (http://www.fao.org/figis/servlet/RefServlet) is a graphical interface for managing the reference data.
Reference data is a set of static values used by FIGIS applications to identify unambiguously all the objects involved in each operational context. Every FIGIS application that refers to a reference object queries the RTMS to obtain all the attributes of that object (countries, species, areas, stocks...).
For example: an application requesting a reference object such as a country specifies a unique ID and gets back a list of related attributes (e.g. UN code, ISO 2-alpha code, ISO 3-alpha code...); for a species, these would be the 3-alpha code, taxonomic code, scientific name...
39
Creating XML content for FIGIS
• Creating objects : the IDENTITY
– each type of object has its own identity element
– they are all named using the object name and the suffix Ident
– e.g. : for a marine resource, the element is AqRes and the matching identity element is AqResIdent
41
Creating XML content for FIGIS
• Creating objects : the Topic block
example : marine resource topic block
42
Creating XML content for FIGIS
• Referencing objects :
– FIGIS can draw links between existing objects based on criteria
• retrieving data from objects and embedding them in the XML document
– e.g. : when describing a yellowfin tuna stock, get the standard image and names of the species from the species identification sheet and include them in the marine resource fact sheet
• creating hyperlinks in web pages that point to existing objects
– e.g. : in the same fact sheet page, hyperlink all the gear and species names to point to their respective description pages
43
Creating XML content for FIGIS
• Referencing objects :
– retrieving data from existing objects
• it can be done with any object defined in FIGIS
• it can be used to define an object using other objects
– e.g. : the bigeye tuna resource of the Indian Ocean
• is defined by the SPECIES (bigeye tuna) and the AREA (Indian Ocean)
• bigeye tuna is already described in a species fact sheet
• Indian Ocean is defined in the reference tables
• -> the definition of the resource will only need to be done using REFERENCES to those two objects
44
Creating XML content for FIGIS
• Referencing objects :
– how to reference existing objects :
– each object type has a REFERENCE tag
– the tag is built using the object tag and adding the suffix Ref
– e.g. : AqRes → AqResRef
<fi:SpeciesRef Taxonomy="Species">
  <fi:ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/>
</fi:SpeciesRef>
• this will reference the species Albacore using the “Scientific name” Code System.
• the output document will contain additional info about that species : picture, scientific and FAO official names, standard codes...
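A reference element like the one above can also be generated by a program, for instance when converting an inventory to XML. The sketch below uses Python's standard library; the namespace URI is the one shown in the FIGISDoc example, while the helper names ref_tag and make_species_ref are illustrative assumptions and not part of any FIGIS tool.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the FIGISDoc example; the "fi" prefix is registered for output.
FI = "http://www.fao.org/fi/figis/devcon/"
ET.register_namespace("fi", FI)


def ref_tag(object_tag: str) -> str:
    """Build a reference tag name from an object tag: AqRes -> AqResRef."""
    return object_tag + "Ref"


def make_species_ref(scientific_name: str) -> ET.Element:
    """Build a fi:SpeciesRef that points to a species by its scientific name."""
    ref = ET.Element(f"{{{FI}}}SpeciesRef", {"Taxonomy": "Species"})
    ET.SubElement(
        ref,
        f"{{{FI}}}ForeignID",
        {"CodeSystem": "Scientific_name", "Code": scientific_name},
    )
    return ref


species_ref_xml = ET.tostring(make_species_ref("Thunnus alalunga"), encoding="unicode")
print(ref_tag("AqRes"))
print(species_ref_xml)
```

Because the element is serialised on its own here, the output carries its own xmlns:fi declaration; inside a full document that declaration would sit on the root element instead.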
45
Creating XML content for FIGIS
• Referencing objects :
– e.g. : WaterAreaRef
• this will reference the area “ICCAT SMU” using :
– the references (ALB_N, ALB_S..) to the ICCAT Statistical Management Unit where the resource is “located”
<fi:AqRes>
  <fi:AqResIdent Status="1" Factsheet="true" RefObservation="false">
    --------------------------------------------------
    <fi:WaterAreaList>
      <fi:WaterAreaRef> <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_N"/></fi:WaterAreaRef>
      <fi:WaterAreaRef> <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_S"/></fi:WaterAreaRef>
      <fi:WaterAreaRef> <fi:ForeignID CodeSystem="iccat_smu" Code="ALB_M"/></fi:WaterAreaRef>
    </fi:WaterAreaList>
    ------------------------------------------------
  </fi:AqResIdent>
46
Creating XML content for FIGIS
• Referencing objects :
– behaviour of Ref elements
• when the document is loaded and processed, “Ref” tags are interpreted this way :
– if a matching object is found in the database, the “Ref” tag is replaced by the matching object’s “Ident” block
– if no matching object is found in the database, the tag is left alone
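The behaviour described above can be imitated in a few lines. This is only a sketch of the replacement rule, not the FIGIS loader: the DATABASE dictionary, the SpeciesIdent element and the resolve_refs function are hypothetical, and namespaces are left out to keep the example short.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the database: (CodeSystem, Code) -> "Ident" block.
DATABASE = {
    ("Scientific_name", "Thunnus alalunga"): ET.fromstring(
        "<SpeciesIdent><Title>Albacore</Title></SpeciesIdent>"
    ),
}


def resolve_refs(parent: ET.Element) -> None:
    """Replace each child whose tag ends in 'Ref' by the matching Ident block.
    A Ref with no match in DATABASE is left untouched."""
    for index, child in enumerate(list(parent)):
        if child.tag.endswith("Ref"):
            foreign = child.find("ForeignID")
            key = None
            if foreign is not None:
                key = (foreign.get("CodeSystem"), foreign.get("Code"))
            ident = DATABASE.get(key)
            if ident is not None:
                # Insert a copy so the database entry itself is never shared or modified.
                parent[index] = copy.deepcopy(ident)
        else:
            resolve_refs(child)


doc = ET.fromstring(
    "<SpeciesList>"
    '<SpeciesRef><ForeignID CodeSystem="Scientific_name" Code="Thunnus alalunga"/></SpeciesRef>'
    '<SpeciesRef><ForeignID CodeSystem="Scientific_name" Code="Unknown species"/></SpeciesRef>'
    "</SpeciesList>"
)
resolve_refs(doc)
resolved_tags = [child.tag for child in doc]
print(resolved_tags)
```

The first reference is replaced by the stored Ident block, while the second, which has no match, stays in the document as a Ref element.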
47
Creating XML content for FIGIS
• Referencing objects :
– creating hyperlinks
• to objects available on the internet (web page, pdf file, ftp...)
– e.g. : link the word ICCAT to “http://www.iccat.es/”; this is done by using the HTML “a” tag :
<a href="http://www.iccat.es" target="_blank">ICCAT</a>
48
Creating XML content for FIGIS
• Creating documents : a 3-step method
– high-level tagging
• overall structure of document and objects
– mid-level tagging
• specifying the thematic content of the document
• referencing objects
– low-level tagging
• formatting
• keywords, links, biblio etc.
49
Creating XML content for FIGIS
• Creating objects : hands-on
• pre-requisite: understanding of the XSD, DTD structure
– starting from a sample document
– analysis and structure of the paper document
– creation of the XML document
– high-level tagging of the document
– mid-level tagging of the document
– low-level tagging of the document
50
Creating XML content for FIGIS
• Creating objects : hands-on
– analysis and structure of the paper document :
• identify the high-level elements in the document :
– Object Source
– Objects
• for each object, identify the following :
– Ident block
– the various “topics” for each object, which will be used for the tagging
– match each “topic” with a Schema/DTD element
51
Creating XML content for FIGIS
• Creating objects : hands-on
– creation of the XML document
• XML editors only accept pure TEXT
• you need to avoid text containing UNICODE characters
• creation of the document :
– open the XML editor and create a new document
– assign the FIGIS DTD/XSD to the new document
– insert the ROOT element FIGISDOC
– insert the Object Source information
– insert the Ident information
– start copying the content from the original text document into the elements that you have defined during the previous step
52
Creating XML content for FIGIS
• Detailed structure: high level tagging
– example : marine resource
53
Creating XML content for FIGIS
• Mid-level tagging of the document
ELEMENT Assessment
54
Creating XML content for FIGIS
• Advanced tagging
– document formatting : HEADER, LIST...
– detailed thematic tagging (indicators, values, wrappers etc.)
– images, tables, hyperlinks...
55
Advanced tagging example
Creating XML content for FIGIS