the convergence of publishing and the web

71
Ivan Herman, W3C <markup forum/> 2015, Stuttgart, Germany 2015-11-20 THE CONVERGENCE OF DIGITAL PUBLISHING AND THE WEB This work is licensed under a Creative Commons Attribution 3.0 License, with attribution to W3C. Copyright 2015 W3C (MIT, ERCIM, Keio, Beihang) © ® 1

Upload: ivan-herman

Post on 17-Feb-2017

522 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: The convergence of Publishing and the Web

Ivan Herman, W3C<markup forum/> 2015, Stuttgart, Germany

2015-11-20

THE CONVERGENCEOF DIGITAL

PUBLISHING AND THEWEB

This work is licensed under a Creative Commons Attribution 3.0 License, with attribution to W3C.Copyright 2015 W3C (MIT, ERCIM, Keio, Beihang)© ®

1

Page 2: The convergence of Publishing and the Web

See: http://w3c.github.io/dpub/markup-forum-2015-11/index.html/

(Slides are in HTML)

THESE SLIDES ARE AVAILABLEON THE WEB

2

Page 3: The convergence of Publishing and the Web

• The publishing industry is, probably, the most important userof W3C’s Web technologies after (traditional) browsers:

• almost all journals, magazines, etc., have an online version these days• scholarly publishing cannot exist without the Web any more• EPUB is, essentially, a frozen and packaged Web site

• The quality requirements of this industry are very high:• high quality typesetting, graphics, etc.• new forms of publishing will be based on high level of interactions, rich

media, …• common document and data publishing comes to the fore

DPUB IG ORIGINS3

Page 4: The convergence of Publishing and the Web

• But… the publishing industry had been in an entirely“passive” mode v.a.v. Web technologies

• no participation in the development of fundamental Web technologies• W3C (and many other standard bodies) hardly know about the

requirements that this industry may have• the potential synergy between Web developers and publishers is missed

out

• Consequence: Working Groups at W3C set their prioritieswithout knowing about, and considering, the publishingindustry

DPUB IG ORIGINS (CONT.)4

Page 5: The convergence of Publishing and the Web

• W3C and IDPF organized a series of exploratory workshops in2012 to create a missing synergy among communities

• The W3C Digital Publishing Interest Group was formallycreated in May 2013

• DPUB IG has weekly teleconferences and bi-annual face toface meetings

DPUB IG ORIGINS (CONT.)5

Page 6: The convergence of Publishing and the Web

• Experts familiar with the ins and outs of digital publishingand its associated industry groups identify issues that are notaddressed by the Open Web Platform

• Goal is to raise issues to W3C working groups who canupdate or develop specs based on the needs of thepublishing community.

• Work on a future vision of Digital Publishing called “PortableWeb Publications (PWP)”

See our website for more detail.

DPUB IG MISSION6

Page 7: The convergence of Publishing and the Web

IDPF W3CStandards for the ElectronicPublishing and ContentConsumption (EPUB)

Standards for the General WebTechnologies

Builds on lower level Web (e.g.,W3C) Standards

Builds on lower level Internet(e.g., IETF, ECMA) Standards

Does not develop standardsbeyond publishing

Does not develop industryspecific standards if there isanother home for those

The key is strong collaboration.

IDPF AND W3C7

Page 8: The convergence of Publishing and the Web

SOME RESULTS OF THE PASTTWO YEARS

Page 9: The convergence of Publishing and the Web

• An evolving document:“Requirements for Latin TextLayout and Pagination”

• Describes issues like hyphenation,spreads and bleeds, drop caps,pagination, etc.

• Has greatly influenced somecurrent CSS Work, e.g. “CSS InlineLayout Module Level 3” (handling initial letters, dropcaps), or“CSS Generated Content for Paged Media Module” (handlingrunning heads and footers)

LAYOUT AND STYLING9

Page 10: The convergence of Publishing and the Web

• Another evolving document:“Priorities for CSS from the DPUBIG”

• Provides a list of the top CSSpriorities, and their currentavailability

• Also influences the work of theCSS Working Group

PRIORITIES FOR CSS10

Page 11: The convergence of Publishing and the Web

• Goal: identify the semantics of the HTML elements• “abstract”, “indexed term”, “footnote”, "chapter", …

• Express structural information (“where can that element beused”)

• Do it in a forward looking way in terms of W3C standards.• i.e., move away from epub:type used in EPUB 3• the resulting HTML should be valid

• These terms may be useful for the Web at large!

CONTENT AND MARKUP11

Page 12: The convergence of Publishing and the Web

• Use “Accessible Rich Internet Applications (WAI-ARIA)” as abasic mechanism:

• use specific attributes in HTML• attribute values convey a specific semantics

These semantics are designed to allow an author to properlyconvey user interface behaviors and structural information toassistive technologies in document-level markup

CONTENT AND MARKUP:APPROACH CHOSEN

12

Page 13: The convergence of Publishing and the Web

• A Digital Publishing ARIA moduleis in development

• Publishing terms become part ofARIA

• Extra bonus: these terms directlymapped on Assistive Technologiesinterfaces!

<section role="doc-appendix" > <h1>Appendix A. Historical Timeline</h1> …</section>

CONTENT AND MARKUP: DPUBARIA MODULE

13

Page 14: The convergence of Publishing and the Web

• Published an Annotation UseCases

• Activity and work has shifted tothe Web Annotations WorkingGroup

• the work aims at annotation for all formsof Web Documents, whether in a browser or an eBook

ANNOTATIONS14

Page 15: The convergence of Publishing and the Web

MAJOR WORK COMING UP:PORTABLE WEB

PUBLICATIONS (PWP)

Page 16: The convergence of Publishing and the Web

THE MAIN MESSAGE:

Page 17: The convergence of Publishing and the Web

WEB = PUBLISHING!

Page 18: The convergence of Publishing and the Web

PUT IT ANOTHER WAY…

Page 19: The convergence of Publishing and the Web

PUBLISHING = WEB!

Page 20: The convergence of Publishing and the Web

• Separation between publishing“online”, as Web sites, and offlineand/or packaged is diminished tozero

• This means:• publication content on the Web can be loaded into a browser or a

specialized reader, whatever the user prefers• a publication on a local disc can be pushed onto the Web and used without

any change• content are authored regardless of where they are used• these are done without any user interaction (or only very minimal one)

WHAT DOES THIS MEAN?20

Page 21: The convergence of Publishing and the Web

Credit: ibta arabia

Page 22: The convergence of Publishing and the Web

Credit: Extract of Joseph Reagle’s Book as ePUB

• On a desktop I may want to read abook just like a Web page:

• easily follow a link “out” of the book• create bookmarks “into” a page in a book• use useful plugins and tools that my

browser may have• create annotations

FOR EXAMPLE: BOOK IN ABROWSER

22

Page 23: The convergence of Publishing and the Web

Credit: Extract of Joseph Reagle’s Book as ePUB

• But:• sometimes I may also want to use a

small, dedicated reader device to readthe book on the beach…

• All these on the same book (notconversions from one format tothe other)!

FOR EXAMPLE: BOOK IN ABROWSER (CONT.)

23

Page 24: The convergence of Publishing and the Web

Credit: Bryan Ong, Flickr

• I may find an article on the Webthat I want to review, annotate,etc., while commuting home on atrain

• I want the results of theannotations to be back online,when I am back on the Internet

• Note: some browsers have an “archiving” possibility, but they are notinteroperable

• the content can definitely not be read on a dedicated reader

FOR EXAMPLE: I MAY NOT BEONLINE…

24

Page 25: The convergence of Publishing and the Web

Credit: Screen dump of an article “Sub-strains of Drosophila Canton-S…” on F1000

• My paper is published, primarily,on-line, but people may want todownload it for offline use

• The format of the paper should beadaptable to my readingenvironment

• do not want a two column, fixed layoutfile that I cannot handle on my iPad…

• My “paper” may also contain video,audio, data, programs…

• scholarly publishing is not text only any more!

FOR EXAMPLE: SCHOLARLYPUBLISHING

25

Page 26: The convergence of Publishing and the Web

Credit: Merrill College of Journalism, Flickr

• What is an educationalpublication?

• a book that requires offline access?• a packaged application with built-in

interactive tests, animated examples?• a Web client reaching out to Web services

for assessing test results, toencyclopedia, …?

• an interactive data container storing various data for, e.g., demonstrations?• The borderline between a “book” and a “(Web) Application”

are becoming blurred!

FOR EXAMPLE: EDUCATIONALMATERIALS

26

Page 27: The convergence of Publishing and the Web

SYNERGY EFFECTS OFCONVERGENCE

Page 28: The convergence of Publishing and the Web

Credit: Nathan Smith, Flickr

• Publishers want to concentrate onwhat they know better: how toproduce, edit, curate, etc, greatcontent

• Publishers are not technologycompanies, nor do they intend tobe; they want instead to rely onthe vibrant Web community!

ADVANTAGE FOR PUBLISHERS‘COMMUNITY

28

Page 29: The convergence of Publishing and the Web

• OWP is more than “just” HTML, CSS, MathML, etc.• It also defines a large number of facilities that provide

access to, e.g., system resources or utilities• index database, Web storage, battery status API, real-time communication,

geolocation,…

• Aligning more on OWP means that publishing orienteddevices, software, services, etc, can rely on those

• instead of possibly re-inventing the wheel…

ADVANTAGE FOR PUBLISHERS‘COMMUNITY (CONT.)

29

Page 30: The convergence of Publishing and the Web

Credit: e-codices, Flickr

• Publishers have a long experiencein ergonomics, typography,paging, …

• Publishing long texts, with theright aesthetics, readability,structure, etc., is an expertise theWeb community can profit from

• Experience of publishers in thecomplete workflow for producingcontent may become important for Web design

ADVANTAGE FOR THE WEBCOMMUNITY

30

Page 31: The convergence of Publishing and the Web

BUT… WHY NOT RELY ONLYON THE WEB?

(I.E., FORGET ABOUTDOWNLOADED CONTENT!)

Page 32: The convergence of Publishing and the Web

• The future may be that everyone is always connected… butthe reality is different

• slow connections, e.g., or on a plane or bus or even in some areas• huge roaming prices among countries

• Current publishing business models rely on distributableentities

• Privacy or security issues may require off-line access• e.g., in a plane cockpit

• Archiving considerations

SEVERAL REASONS…32

Page 33: The convergence of Publishing and the Web

HOW DO WE GET THERE?(TECHNICALLY)

Credit: Moyan Brenn, Flickr

Page 34: The convergence of Publishing and the Web

• A strong cooperation between the different communitiesshould be ensured

• Technical challenges must be identified• note that some of the challenges are not PWP specific, but Digital

Publishing in general (e.g., pagination control)

• Some examples follow…

34

Page 35: The convergence of Publishing and the Web

WARNING: EVERYTHING I SAYIS SUBJECT TO CHANGE!

Credit: Catherine Kolodziej, Flickr

Page 36: The convergence of Publishing and the Web

TECHNICAL CHALLENGE:FUNDAMENTALTERMINOLOGY

Page 37: The convergence of Publishing and the Web

• On the current Web one has the notion of a “page”:• conceptually, a single entity that displays some content• has its own URL

• But publishers need the concept of a (Web) Publication:• a collection of pages, CSS files, images, video, etc.• it is the collection that has a distinct identity, not its constituents

WEB PUBLICATIONS37

Page 38: The convergence of Publishing and the Web

• A Web Publication is an aggregated set of interrelated WebResources, and which is intended to be considered as asingle, and which can be addressed on the Web as a unit (isitself a Web Resource)

FORMALLY38

Page 39: The convergence of Publishing and the Web

• A Web Publication may consist of resources spread all overthe place (HTML on one site, CSS somewhere else)

• the owner of the Web Publication is only a “user” and not necessarily theowner of all resources!

• But a publishers may want to, create, curate, move the wholepublication, as a single unit

• The Web Publication should be, in some sense, “selfconsistent”, not relying on external entities.

• A “self-consistent” Web Publication is Portable

PORTABLE WEB PUBLICATIONS39

Page 40: The convergence of Publishing and the Web

• A Portable Web Publication is such that a user agent canrender its essential content by relying on the Web Resourceswithin the same Web Publication

MORE FORMALLY40

Page 41: The convergence of Publishing and the Web

• A journal or magazine article, including the relevant CSS filesand images

• An educational article, including the JavaScript to dointeractive exercises

• A novel or a poem on the Web, including the necessary fonts,CSS files, etc, to provide the required aesthetics

WHAT KINDS OF DOCUMENTSARE WE TALKING ABOUT?

41

Page 42: The convergence of Publishing and the Web

• A Web mail application• A social Web site like Facebook, Renren, or Twitter• A dynamic page that depends on, say, a Javascript library

hosted somewhere on the cloud

WHAT KINDS OF DOCUMENTSARE WE NOT TALKING ABOUT?

42

Page 43: The convergence of Publishing and the Web

Protocol Access File AccessPacked PWP as one archive

on a serverPWP as one archiveon a local disc

Unpacked PWP spread overseveral files on aserver

PWP spread overseveral files on a localdisc

ENVISIONED “STATES” OF APORTABLE WEB PUBLICATION

43

Page 44: The convergence of Publishing and the Web

TECHNICAL CHALLENGE:OVERALL ARCHITECTURE

Page 45: The convergence of Publishing and the Web

• Web Worker: a truly parallel thread within the browser• A Service Worker is a special type of Web Worker, with

additional features:• it is a programmable network proxy: the main thread’s network calls are

caught and the request/answer can be modified on-the-fly behind thescenes

• it has an interface to handle a local cache for networked data• it will stay alive even if the user moves away from the main page, and can

be accessed later if he/she returns to it

ADVANCES IN MODERNBROWSERS: WEB AND SERVICE

WORKERS

45

Page 46: The convergence of Publishing and the Web

• Web Worker: a truly parallel thread within the browser• A Service Worker is a special type of Web Worker, with

additional features:• it is a programmable network proxy: the renderer’s network calls are caught

and the request/answer can be modified on-the-fly behind the scenes• it has an interface to handle a local cache for networked data• it will stay alive even if the user moves away from the main page, and can

be accessed later if he/she returns to it

ADVANCES IN MODERNBROWSERS: WEB AND SERVICE

WORKERS

Work in progress

46

Page 47: The convergence of Publishing and the Web

ENVISIONED ARCHITECTURE: UNPACKED STATE

47

Page 48: The convergence of Publishing and the Web

ENVISIONED ARCHITECTURE: CACHED STATE

48

Page 49: The convergence of Publishing and the Web

ENVISIONED ARCHITECTURE: PACKED STATE

49

Page 50: The convergence of Publishing and the Web

ENVISIONED ARCHITECTURE: PACKED STATE

Draft…50

Page 51: The convergence of Publishing and the Web

• Some prior art exists (e.g., experimentation by the ReadiumConsortium with Service Workers)

• An early mock-up of the current architecture has also bedone

• caveat for now: current Service Worker specification does not allow fordirect, local file access

• some extra tricks have to be found

DRAFT INDEED, BUT…51

Page 52: The convergence of Publishing and the Web

TECHNICAL CHALLENGE:ARCHIVAL FORMAT

Page 53: The convergence of Publishing and the Web

ROUGH STRUCTURE OF ANEPUB3 FILE

53

Page 54: The convergence of Publishing and the Web

• There is an interest among some W3C members for a Webfriendly packaging format:

• should be streamable• should rely, as much as possible, on existing Web technologies (e.g., HTTP)

• Use cases include:• retrieve an HTML file with related CSS files, images• access Web Applications (“Widgets”) with all libraries involved• Portable Web Publications are a clear use case

• But: current Web Packaging proposal is not OPF based• this may lead to a different packaging in future for Digital Publishing

ARCHIVAL FORMAT54

Page 55: The convergence of Publishing and the Web

PWP PACKAGING STRUCTURE55

Page 56: The convergence of Publishing and the Web

• There isn‘t yet a full agreement to develop such WebPackaging format

• for some a caching architecture based on Service Workers is enough forthe use cases

• If that happens, the Publishing Community may not moveaway from OPF

• technical advantages of a new format must be weighted against existingdeployment

HOWEVER…56

Page 57: The convergence of Publishing and the Web

TECHNICAL CHALLENGE:ADDRESSING,

IDENTIFICATION

Page 58: The convergence of Publishing and the Web

LOTS OF QUESTIONS,WORKING ON THE

ANSWERS…

Page 59: The convergence of Publishing and the Web

• These a two “roles” are different• The usual situation is that:

• an HTTP(S) URL is used to address a resource on the Web• some form of a URI is used to (uniquely) identify a resource

• In many cases the two roles coincide, but not always• E.g., for a Book Publication:

• URN:ISBN:1-56592-521-1 identifies the publication• http://www.ex.org/ex.pwp addresses my particular copy

IS IT "ADDRESSING" OR IS IT"IDENTIFICATION"?

59

Page 60: The convergence of Publishing and the Web

• Possibilities may be• some sort of a manifest describing the PWP as a whole (e.g., metadata,

content, etc.); or• some content with a link to a manifest through a LINK: HTTP response

header entry; or• some HTML content with a link to a manifest through a <link> element

• Details of what a manifest contains should be worked out• that may become a crucial constituent of a PWP

WHAT DOES AN HTTP GETRETURN?

60

Page 61: The convergence of Publishing and the Web

• Several possibilities should be considered:• based on some sort of a fragment identifier:

http://www.ex.org/doc.pwp#pwp(…)

• explicit separator between the URL for the publication and the rest:http://www.ex.org/doc.pwp!chapter1.html

• simulate “tree” view of the publication’s content:http://www.ex.org/doc.pwp/chapter1.html

• The third case is the most “webby”• it may need some extra information (“virtual redirection”) in, e.g., a

manifest if the resources are spread all over the place

• Decomposing such URLs would happen in the dedicatedService Worker

WHAT IS THE URL OF ARESOURCE WITHIN A PWP?

61

Page 62: The convergence of Publishing and the Web

• This is exactly what fragment identifiers do on the Web• PWP-s should not define a different mechanism, but should

rely on what is widely deployed• note that this pretty much excludes http://www.ex.org/doc.pwp#pwp(…)

as an answer to the previous question

• Although… new types of fragment identifers may beproposed by the publishing community to the Webcommunity at large

WHAT ABOUT ADDRESSINGWITHIN A RESOURCE?

62

Page 63: The convergence of Publishing and the Web

TECHNICAL CHALLENGE:PRESENTATION

CONTROL

Page 64: The convergence of Publishing and the Web

• What is the level of user control of the presentation?• The Web and eBook traditions are vastly different:

• in a browser, the Web designer is in full control• CSS alternate style sheets are hardly in use• some user interface aspects can be controlled but only for the browser as a whole

• in an eBook reader, there is more user control• foreground/background color• choice of fonts

• There is a need to reconcile these traditions

64

Page 65: The convergence of Publishing and the Web

HOW DO WE GET THERE?(PRACTICALLY)

Credit: Moyan Brenn, Flickr

Page 66: The convergence of Publishing and the Web

• “Portable Web Publications” was,originally, a separate “vision”document

• Was adopted, formally, as part ofthe group’s work in September2015, and is now published as an IG document

• The group will contribute to the formulation of the PWPtechnical challenges, to a better understanding of therequirements

• PWP is the guiding principle for the group’s further work

DPUB IG AND PORTABLE WEBPUBLICATIONS

66

Page 67: The convergence of Publishing and the Web

• On long term, some PWP related standard-track specificationwork may have to be done

• this requires a consensus and agreement of different communities

• IDPF and W3C (and maybe others?) may create the necessarygroups, eventually

IDPF, W3C, AND OTHERS67

Page 68: The convergence of Publishing and the Web

• PWP does not replace EPUB 3 (and upcoming EPUB 3.1) atthis moment

• Many of the new features may also be part of EPUB 3.1 (e.g.,structural semantics)

• The vision is a convergence of the EPUB 3.* specificationsand PWP, eventually

HOWEVER…68

Page 69: The convergence of Publishing and the Web

• There is a great potential in a convergence between theOpen Web Platform and Portable Web Publications

• It will require a common effort and cooperation of bothcommunities

• But it is an exciting prospect!

CONCLUSION69

Page 70: The convergence of Publishing and the Web

DPUB IG Wikihttps://www.w3.org/dpub/IG/wiki/Main_Page

Latest PWP Draft:http://www.w3.org/TR/pwp/

PWP Issue list:https://github.com/w3c/dpub-pwp/issues

This presentation:http://w3c.github.io/dpub/markup-forum-2015-

11/index.html (PDF is also available for download)Contact me:

[email protected]

SOME REFERENCES70

Page 71: The convergence of Publishing and the Web

THANK YOU FOR YOURATTENTION!