Ripping your PDFs apart

RIPPING YOUR PDF FILES APART: What you need to know about what goes on inside your PDF files. Mark Stephens, Thursday, 29 March 12

Posted on 21-Oct-2014

DESCRIPTION

Talk given at iText Summit 2012 by Mark Stephens, CEO/Developer at IDRsolutions, explaining "What you really need to know about the guts of your PDF files".

TRANSCRIPT

Pages 1-2: Ripping your PDFs apart

RIPPING YOUR PDF FILES APART

What you need to know about what goes on inside your PDF files

Mark Stephens

Thursday, 29 March 12

Pages 3-11

Mark’s Bio

Working with Java and PDF since 1997
Founded IDRsolutions in 1999
Speaker at Seybold, JavaOne, Business of Software
MA degree in Mediaeval History from St Andrews (how useless is that?)

Ask me about Java, PDF, business or anything which happened before 1500 AD

Page 12

BUT FIRST SOME KITTENS...

The support team at IDRsolutions are waiting for your call (maybe)

Page 13

The PDF reference guide

Pages 14-16

Loading page 1124 of a file

Word: read pages 1-1123 (time passes, the scroll bar shrinks). Found it (eventually).

PDF: read the cross-reference (xref) table(s), which say where to find all the objects, then skip to page 1124.

PDF (in detail):
Read the cross-reference table(s): where do I find all the objects?
Read the Root object, which points to the Pages object
Read the object for page 1124 (it tells me the linked font, image and content objects)
Draw it
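The "PDF (in detail)" route can be sketched in Python. This is a minimal sketch, not a production parser: it assumes a classic, uncompressed cross-reference table (files written for PDF 1.5 and later may use cross-reference streams instead), and `build_pdf`, `parse_xref` and `read_object` are hypothetical helpers written for this example. What it shows is that once the table has been read, any object is one jump away at its recorded byte offset.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Assemble a tiny PDF: header, objects 1..n, xref table, trailer, startxref."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where object `num` starts
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


def parse_xref(data: bytes, xref_at: int) -> dict[int, int]:
    """Map object number -> byte offset for one classic xref section."""
    lines = data[xref_at:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("no classic xref table at this offset")
    offsets, i = {}, 1
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not header[0].isdigit():
            break  # reached the 'trailer' keyword
        first, count = int(header[0]), int(header[1])
        for n in range(count):
            offset, _generation, kind = lines[i + 1 + n].split()
            if kind == b"n":  # 'n' = in use, 'f' = free
                offsets[first + n] = int(offset)
        i += 1 + count
    return offsets


def read_object(data: bytes, offsets: dict[int, int], num: int) -> bytes:
    """Jump straight to object `num` without reading the objects before it."""
    start = offsets[num]
    end = data.index(b"endobj", start) + len(b"endobj")
    return data[start:end]


pdf = build_pdf([b"<< /Type /Catalog /Pages 2 0 R >>",
                 b"<< /Type /Pages /Kids [] /Count 0 >>"])
xref_at = int(pdf.rsplit(b"startxref", 1)[1].split()[0])
print(read_object(pdf, parse_xref(pdf, xref_at), 2))
```

In a real reader the same idea is applied with a file seek rather than slicing an in-memory byte string.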

Page 17

Your PDF file is a Tree

A root linked to all the branches
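The tree shape is what makes "skip to page 1124" cheap. In the sketch below, parsed objects are stood in for by plain Python dicts keyed by object number, with references reduced to integers (a simplification: real references look like `2 0 R`). Each intermediate `Pages` node carries a `Count` of the pages beneath it, so the hypothetical `find_page` helper can skip whole branches without opening them.

```python
def find_page(objects: dict[int, dict], root_num: int, index: int) -> int:
    """Return the object number of page `index` (0-based), starting at the catalog."""
    node = objects[objects[root_num]["Pages"]]
    while True:
        for kid_num in node["Kids"]:
            kid = objects[kid_num]
            if kid["Type"] == "Page":
                if index == 0:
                    return kid_num
                index -= 1
            elif index < kid["Count"]:
                node = kid  # the wanted page is somewhere in this branch
                break
            else:
                index -= kid["Count"]  # skip the whole branch unopened
        else:
            raise IndexError("page index out of range")


# A hypothetical five-page document: catalog 1 -> Pages 2 -> two branches.
objects = {
    1: {"Type": "Catalog", "Pages": 2},
    2: {"Type": "Pages", "Kids": [3, 4], "Count": 5},
    3: {"Type": "Pages", "Kids": [5, 6, 7], "Count": 3},
    4: {"Type": "Pages", "Kids": [8, 9], "Count": 2},
    **{n: {"Type": "Page"} for n in range(5, 10)},
}
print(find_page(objects, 1, 3))  # 4th page -> object 8
```

Finding the fourth page never looks inside the first branch; only its `Count` is read.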

Pages 18-25

The PDF reference guide, like you have never seen it before...

You can use vi or emacs if you prefer

The PDF reference guide: the end of the file
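The end of the file is where a reader starts: the `startxref` keyword near the end gives the byte offset of the last cross-reference section. A minimal sketch of reading only the tail of a file; the 1024-byte window and both function names are assumptions made for illustration.

```python
import os


def find_startxref(tail: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword near the end of the file")
    return int(tail[pos + len(b"startxref"):].split()[0])


def startxref_from_file(path: str, tail_size: int = 1024) -> int:
    """Read only the last `tail_size` bytes of the file, never the whole thing."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_size))
        return find_startxref(f.read())


tail = b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n9876\n%%EOF\n"
print(find_startxref(tail))  # 9876
```

Using `rfind` means that if several updates have been appended, the newest `startxref` wins.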

Pages 26-27

The PDF root object, like you have never seen it before...
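The trailer dictionary next to `startxref` names the root object (the document catalog) in its `/Root` entry. A minimal regex-based sketch, assuming a classic `trailer` keyword is present; files that use cross-reference streams keep these entries in the stream dictionary instead and are not handled. `find_root_ref` is a hypothetical name.

```python
import re

ROOT_REF = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")


def find_root_ref(data: bytes) -> tuple[int, int]:
    """Return (object number, generation) of the catalog named in the last trailer."""
    trailer_at = data.rfind(b"trailer")
    if trailer_at == -1:
        raise ValueError("no trailer keyword (the file may use an xref stream)")
    match = ROOT_REF.search(data, trailer_at)
    if match is None:
        raise ValueError("the trailer has no /Root entry")
    return int(match.group(1)), int(match.group(2))


sample = b"...\ntrailer\n<< /Size 5 /Root 4 0 R /Info 3 0 R >>\nstartxref\n123\n%%EOF\n"
print(find_root_ref(sample))  # (4, 0)
```

From the catalog, its `/Pages` entry leads into the page tree.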

Pages 28-29

PDF files on the web: isn’t having the marker at the end a problem?

Not if you create it properly
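A file created for the web is normally linearized: it begins with a linearization dictionary whose `/L` entry gives the length of the whole file, so a viewer can show the first page before the rest has downloaded. The check below is a heuristic sketch under those assumptions, not a full validation of the linearization hint tables; `is_linearized` is a hypothetical helper.

```python
import re

LINEARIZED = re.compile(rb"/Linearized\s+[\d.]+")
FILE_LENGTH = re.compile(rb"/L\s+(\d+)")


def is_linearized(data: bytes) -> bool:
    """Heuristic check for a linearized ('Fast Web View') PDF."""
    head = data[:1024]  # the linearization dictionary sits at the start
    lin = LINEARIZED.search(head)
    if lin is None:
        return False
    start = head.rfind(b"<<", 0, lin.start())
    end = head.find(b">>", lin.end())
    if start == -1 or end == -1:
        return False
    length = FILE_LENGTH.search(head, start, end)
    # /L must equal the real file length; appending an update breaks this.
    return length is not None and int(length.group(1)) == len(data)


print(is_linearized(b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"))  # False
```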

Pages 30-37

Key takeaways from the PDF structure

We do not need to load the whole file.
It is equally fast to load any part of it.
It is very easy to replace objects with new versions (an incremental update appends them to the end of the file).
There are certain key locations, like the end of the file.
You should not edit it in a text editor: the cross-reference table records byte offsets, and any edit that shifts them breaks the file.
If you want to use PDF files across the Internet, there is a special mode (linearization, shown as "Fast Web View" in Acrobat) to make these load the most important parts first.
Lots of features need you to set up the PDF file correctly.
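Replacing an object with a new version is done with an incremental update: the new object, a small cross-reference section and a new trailer are appended, and the trailer's `/Prev` entry points back to the previous cross-reference section, so every earlier byte offset stays valid. A minimal sketch, assuming the caller already knows the previous `startxref` value, the trailer `/Size` and the catalog object number; `append_update` is hypothetical, and a real writer would also carry over trailer entries such as `/Info` and `/ID`.

```python
def append_update(data: bytes, num: int, new_body: bytes,
                  prev_xref: int, size: int, root: int) -> bytes:
    """Return `data` with a new version of object `num` appended as an incremental update."""
    out = bytearray(data)
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_at = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (num, new_body)  # same number, same generation
    xref_at = len(out)
    # A one-entry xref subsection: "first object number, count", then the entry.
    out += b"xref\n%d 1\n%010d 00000 n \n" % (num, obj_at)
    # /Prev chains back to the previous xref section, which stays untouched.
    out += b"trailer\n<< /Size %d /Root %d 0 R /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


original = b"%PDF-1.4\n% ...original objects...\nstartxref\n9\n%%EOF\n"
updated = append_update(original, 2, b"<< /Type /Pages /Kids [] /Count 0 >>",
                        prev_xref=9, size=3, root=1)
print(updated[len(original):].decode())  # only the appended update
```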

Pages 38-39

Those PDF objects in more detail

All PDF objects have:
1. An ID number (an object number plus a generation number)
2. (Optional) A set of dictionary key-value pairs
3. (Optional) A block of binary data (a stream)
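To make the three parts concrete, here is a sketch that splits one raw indirect object into its ID, its dictionary and its stream data. Assumptions: the object has already been located (for example through the cross-reference table), `/Length` is a direct integer rather than a reference, and the dictionary never contains the word `stream`; `split_object` is a hypothetical helper.

```python
import re
from typing import Optional

OBJ_HEADER = re.compile(rb"\s*(\d+)\s+(\d+)\s+obj\b")
LENGTH = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")


def split_object(raw: bytes) -> tuple[tuple[int, int], bytes, Optional[bytes]]:
    """Split one indirect object into (ID, dictionary text, stream data or None)."""
    head = OBJ_HEADER.match(raw)
    if head is None:
        raise ValueError("not an indirect object")
    obj_id = (int(head.group(1)), int(head.group(2)))  # object and generation number
    rest = raw[head.end():]
    stream_at = rest.find(b"stream")
    if stream_at == -1:  # no binary data: just the dictionary (or other value)
        return obj_id, rest.rsplit(b"endobj", 1)[0].strip(), None
    dictionary = rest[:stream_at].strip()
    length = LENGTH.search(dictionary)
    if length is None or length.group(2):
        raise ValueError("this sketch needs a direct /Length value")
    data_at = stream_at + len(b"stream")
    # The keyword 'stream' is followed by CRLF or LF before the data begins.
    if rest.startswith(b"\r\n", data_at):
        data_at += 2
    elif rest.startswith(b"\n", data_at):
        data_at += 1
    return obj_id, dictionary, rest[data_at:data_at + int(length.group(1))]


raw = b"12 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj"
print(split_object(raw))  # ((12, 0), b'<< /Length 5 >>', b'hello')
```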

Pages 40-41

PDF images are not TIFF, PNG or JPEG
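An image in PDF is an image XObject: a stream of raw samples described by `/Width`, `/Height`, `/BitsPerComponent` and `/ColorSpace`, compressed with a filter such as `/FlateDecode` or `/DCTDecode`. With `/DCTDecode` the stream data is usually a complete JPEG, but Flate-compressed samples have no PNG or TIFF wrapper at all, so extracting them means adding a container yourself. A minimal sketch that writes 8-bit DeviceRGB samples as a binary PPM file, assuming no `/DecodeParms` predictor is in use; the function names are hypothetical.

```python
import zlib


def samples_to_ppm(samples: bytes, width: int, height: int,
                   bits_per_component: int = 8, colour_space: str = "DeviceRGB") -> bytes:
    """Wrap decoded image samples in a binary PPM (P6) header."""
    if bits_per_component != 8 or colour_space != "DeviceRGB":
        raise NotImplementedError("this sketch only handles 8-bit DeviceRGB")
    expected = width * height * 3  # three 8-bit components per pixel
    if len(samples) < expected:
        raise ValueError("not enough samples for the stated /Width and /Height")
    return b"P6\n%d %d\n255\n" % (width, height) + samples[:expected]


def flate_image_to_ppm(stream_data: bytes, width: int, height: int) -> bytes:
    """Decode a /FlateDecode image stream (no predictor) and wrap it as PPM."""
    return samples_to_ppm(zlib.decompress(stream_data), width, height)


# A hypothetical 2x2 image: red, green / blue, white.
pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])
ppm = flate_image_to_ppm(zlib.compress(pixels), 2, 2)
print(ppm[:11])  # b'P6\n2 2\n255\n'
```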

Pages 42-43

A word on colour

PDF colour spaces: DeviceRGB, CalRGB, DeviceCMYK, ICCBased, Separation, DeviceN, DeviceGray, CalGray, Lab, Pattern
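As one example of why the colour space matters, the PDF specification describes a simple conversion from DeviceCMYK to DeviceRGB: each RGB component is 1 minus the sum of the matching ink and black, with that sum clamped to 1. The sketch below implements that formula; treat it as an approximation, since colour-managed viewers using ICC profiles will produce different values. The function names are hypothetical.

```python
def cmyk_to_rgb(c: float, m: float, y: float, k: float) -> tuple[float, float, float]:
    """Naive DeviceCMYK -> DeviceRGB conversion; all components in 0.0-1.0."""
    return (1.0 - min(1.0, c + k),
            1.0 - min(1.0, m + k),
            1.0 - min(1.0, y + k))


def to_8bit(rgb: tuple[float, float, float]) -> tuple[int, ...]:
    """Scale 0.0-1.0 components to 0-255 integers."""
    return tuple(round(255 * v) for v in rgb)


print(to_8bit(cmyk_to_rgb(0, 0, 0, 1)))  # (0, 0, 0): the "0 0 0 1 k" black
print(to_8bit(cmyk_to_rgb(1, 0, 0, 0)))  # (0, 255, 255): pure cyan
```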

Pages 44-56

PDF pages are ‘drawn’

A page’s content stream lists operands followed by the operator that uses them:

    0 0 0 1 k                                 set CMYK colour of text to black
    BT                                        start of some text
    /T1_0 1 Tf                                use the font defined as T1_0 elsewhere, at size 1
    0 Tc 0 Tw 0 Ts 100 Tz 0 Tr                set other text properties (character spacing, word spacing, rise, horizontal scaling, render mode)
    7.5003 0 0 7.5003 272.1643 540.2979 Tm    position on the page (scale 7.5003, origin at 272.1643, 540.2979)
    (L*) Tj                                   draw the text L*
    /T1_1 1 Tf                                change font
    0.856 0 Td                                move to a different location on the page
    ( = 100) Tj                               draw the text " = 100"
    -0.324 -1.133 Td                          move to a different location on the page
    [(whit)6(e)] TJ                           draw the text "white" (the 6 moves the e slightly closer to the t)
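Because the stream is just operands followed by operators, a reader can be built as a small stack machine. The sketch below handles only the text operators from the slide (`BT`, `Tf`, `Tm`, `Td`, `Tj`, `TJ`) and records where each string starts. Assumptions: no nested parentheses or escapes inside strings, an identity current transformation matrix, and no glyph widths (those need the font), so a position is exact only for a `Tj` or `TJ` that directly follows `Tm` or `Td`. `show_text` is a hypothetical helper, and `ET` was added to close the text object. Note that `Td` is measured from the start of the current line and scaled by the text matrix: `0.856 0 Td` moves 0.856 × 7.5003 ≈ 6.42 units right.

```python
import re

# Strings (no nested parentheses), array brackets, names, numbers, operators.
TOKEN = re.compile(r"\((?:\\.|[^\\)])*\)|\[|\]|/[^\s/\[\]()<>]+|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z*]+")

STREAM = """
0 0 0 1 k
BT
/T1_0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr
7.5003 0 0 7.5003 272.1643 540.2979 Tm
(L*) Tj
/T1_1 1 Tf
0.856 0 Td
( = 100) Tj
-0.324 -1.133 Td
[(whit)6(e)] TJ
ET
"""


def parse_operand(tok: str):
    if tok.startswith("("):
        return tok[1:-1]  # literal string, escapes left as-is
    if tok.startswith("/"):
        return tok        # name, kept with its slash
    return float(tok)


def show_text(stream: str) -> list[tuple[str, str, float, float]]:
    """Return (text, font, x, y) for each Tj/TJ, ignoring glyph advances."""
    identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    tlm = identity  # text line matrix: a b c d e f
    font, operands, array, shown = None, [], None, []
    for tok in TOKEN.findall(stream):
        if tok == "[":
            array = []
        elif tok == "]":
            operands.append(array)
            array = None
        elif tok[0] in "(/+-." or tok[0].isdigit():
            value = parse_operand(tok)
            (array if array is not None else operands).append(value)
        else:  # an operator consumes the operands collected before it
            if tok == "BT":
                tlm = identity
            elif tok == "Tf":
                font = operands[-2]
            elif tok == "Tm":
                tlm = tuple(operands[-6:])
            elif tok == "Td":  # translate relative to the start of the current line
                tx, ty = operands[-2:]
                a, b, c, d, e, f = tlm
                tlm = (a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f)
            elif tok == "Tj":
                shown.append((operands[-1], font, tlm[4], tlm[5]))
            elif tok == "TJ":  # keep the strings, drop the spacing numbers
                text = "".join(x for x in operands[-1] if isinstance(x, str))
                shown.append((text, font, tlm[4], tlm[5]))
            operands = []
    return shown


for text, font, x, y in show_text(STREAM):
    print(f"{text!r} in {font} at ({x:.2f}, {y:.2f})")
# 'L*' in /T1_0 at (272.16, 540.30)
# ' = 100' in /T1_1 at (278.58, 540.30)
# 'white' in /T1_1 at (276.15, 531.80)
```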

Page 58

PDF myth: files are cross-platform

Only if you create them properly...
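One frequent reason a PDF looks different on another machine is a font that was not embedded, so the viewer has to substitute one; that is our example, since the slide itself does not list causes. A font counts as embedded when its font descriptor has a `/FontFile`, `/FontFile2` or `/FontFile3` entry. The sketch assumes the font dictionaries are already parsed into Python dicts; `is_embedded` and `non_embedded` are hypothetical helpers, and the 14 standard fonts, which viewers are expected to supply, are not special-cased.

```python
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")


def is_embedded(font: dict) -> bool:
    """True if the font program travels inside the PDF file."""
    if font.get("Subtype") == "Type3":
        return True  # Type 3 glyphs are drawn by PDF content procedures
    if font.get("Subtype") == "Type0":
        font = font["DescendantFonts"][0]  # the descriptor lives on the CIDFont
    descriptor = font.get("FontDescriptor", {})
    return any(key in descriptor for key in EMBED_KEYS)


def non_embedded(fonts: dict[str, dict]) -> list[str]:
    """BaseFont names a viewer would have to find (or fake) on its own system."""
    return sorted(font.get("BaseFont", name)
                  for name, font in fonts.items() if not is_embedded(font))


# Hypothetical page /Font resources, already parsed into dicts.
fonts = {
    "T1_0": {"Subtype": "Type1", "BaseFont": "ABCDEF+Minion",
             "FontDescriptor": {"FontFile3": 7}},
    "T1_1": {"Subtype": "TrueType", "BaseFont": "Arial",
             "FontDescriptor": {"Flags": 32}},
}
print(non_embedded(fonts))  # ['Arial']
```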

Page 59

Obfuscation for idiots!

No-one will be able to guess the secret password

Page 60

20 seconds later...

And the password is...

Page 61

Lastly, a plea

Not all PDF creation tools are equal

Page 62

In summary