Ripping your PDF files apart
Posted on 21-Oct-2014
DESCRIPTION
iText Summit 2012 talk by Mark Stephens, CEO/Developer at IDRsolutions, explaining "What you really need to know about the guts of your PDF files".
TRANSCRIPT
RIPPING YOUR PDF FILES APART
What you need to know about what goes on inside your PDF files
Mark Stephens
Thursday, 29 March 12
Mark’s Bio
Working with Java and PDF since 1997
Founded IDRsolutions in 1999
Speaker at Seybold, JavaOne and Business of Software
MA degree in Mediaeval History from St Andrews (how useless is that?)
Ask me about Java, PDF, business or anything which happened before 1500 AD
BUT FIRST SOME KITTENS...
The support team at IDRsolutions are waiting for your call (maybe)
The PDF reference guide
Loading page 1124 of a file
Word: read pages 1-1123 (time passes, the scroll bar shrinks). Found it (eventually).
PDF: read the cross-reference (xref) table(s), which say where to find all the objects, then skip straight to page 1124.
PDF (in detail):
1. Read the xref table(s): where do I find all the objects?
2. Read the Root object, which points to the Pages object.
3. Read the object for page 1124 (it tells me the linked font, image and content objects).
4. Draw it.
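To make the "skip to page 1124" step concrete, here is a minimal Python sketch, assuming an uncompressed file with a classic `xref` table (newer files may use cross-reference streams instead, which this does not handle). It reads the `startxref` offset near the end of the file, turns the table into a map from object number to byte offset, and then jumps straight to one object. `find_startxref`, `parse_xref_table`, `read_object` and the `build_tiny_pdf` test helper are hypothetical names for illustration, not part of iText or any other library.

```python
from __future__ import annotations

import re

ENTRY = re.compile(rb"\s*(\d{10})\s+(\d{5})\s+([nf])")  # one 20-byte xref entry
SUBSECTION = re.compile(rb"\s*(\d+)\s+(\d+)")          # "first count" header line


def find_startxref(data: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    tail = data[-1024:]  # the keyword lives near the very end of the file
    pos = tail.rfind(b"startxref")
    match = re.match(rb"startxref\s+(\d+)", tail[pos:]) if pos >= 0 else None
    if match is None:
        raise ValueError("no startxref found")
    return int(match.group(1))


def parse_xref_table(data: bytes, offset: int) -> dict[int, int]:
    """Map object number -> byte offset for every in-use ('n') entry."""
    if not data.startswith(b"xref", offset):
        raise ValueError("not a classic xref table (maybe an xref stream)")
    pos, offsets = offset + len(b"xref"), {}
    while (header := SUBSECTION.match(data, pos)) is not None:
        first, count = int(header.group(1)), int(header.group(2))
        pos = header.end()
        for i in range(count):
            entry = ENTRY.match(data, pos)
            if entry is None:
                raise ValueError(f"bad xref entry at byte {pos}")
            pos = entry.end()
            if entry.group(3) == b"n":
                offsets[first + i] = int(entry.group(1))
    return offsets  # the loop stops when it reaches the 'trailer' keyword


def read_object(data: bytes, offsets: dict[int, int], number: int) -> bytes:
    """Jump straight to one object without reading anything before it."""
    start = offsets[number]
    return data[start:data.index(b"endobj", start) + len(b"endobj")]


def build_tiny_pdf(bodies: list[bytes]) -> bytes:
    """Test helper: write objects 1..n plus a matching xref table and trailer."""
    out, offsets = bytearray(b"%PDF-1.4\n"), []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


pdf = build_tiny_pdf([b"<< /Type /Catalog /Pages 2 0 R >>",
                      b"<< /Type /Pages /Kids [] /Count 0 >>"])
xref = parse_xref_table(pdf, find_startxref(pdf))
print(read_object(pdf, xref, 2))
```

The cost of `read_object` does not depend on where the object sits in the file, which is the point of the slide: the table tells you the offset, so page 1124 is as cheap to reach as page 1.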
Your PDF file is a Tree
A root linked to all the branches
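One way to picture "a root linked to all the branches": the trailer names the Root (the document catalog), the catalog points to the Pages tree, and each intermediate Pages node carries a /Count of the leaf pages beneath it, so a reader looking for page N can skip whole branches. The sketch below assumes the objects have already been parsed into Python dicts keyed by object number; the `Ref` class, that dict layout (including the string stand-in for /Contents) and `find_page` are illustrative assumptions, not a real parser's data model, and inherited attributes and malformed trees are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '3 0 R' (generation number ignored here)."""
    num: int


def find_page(objects: dict[int, dict], root: Ref, index: int) -> dict:
    """Return the page dictionary for 0-based page `index`."""
    catalog = objects[root.num]
    node = objects[catalog["Pages"].num]   # Root -> Pages tree
    while node["Type"] == "Pages":
        for kid_ref in node["Kids"]:
            kid = objects[kid_ref.num]
            # /Count says how many pages sit under a Pages node; a Page counts as 1
            size = kid["Count"] if kid["Type"] == "Pages" else 1
            if index < size:
                node = kid        # the page is in this branch: descend
                break
            index -= size         # skip the whole branch without opening it
        else:
            raise IndexError("page index out of range")
    return node


objects = {
    1: {"Type": "Catalog", "Pages": Ref(2)},
    2: {"Type": "Pages", "Kids": [Ref(3), Ref(4)], "Count": 3},
    3: {"Type": "Page", "Contents": "first"},
    4: {"Type": "Pages", "Kids": [Ref(5), Ref(6)], "Count": 2},
    5: {"Type": "Page", "Contents": "second"},
    6: {"Type": "Page", "Contents": "third"},
}
print(find_page(objects, Ref(1), 2)["Contents"])  # prints: third
```

Page 1124 would be `index=1123`; with a balanced tree only a handful of Pages nodes are ever opened.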
The PDF reference guide
Like you have never seen it before...
You can use vi or emacs if you prefer
The PDF reference guide
End of the file
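The end of the file is where a reader starts: the last trailer dictionary names the Root object and the /Size of the xref table, and `startxref` gives the byte offset of the table itself. A minimal sketch, assuming a classic (uncompressed) trailer within the last 1 KB of the file; `read_trailer` is a hypothetical helper that pulls out a few keys with regular expressions rather than parsing the dictionary properly.

```python
from __future__ import annotations

import re


def read_trailer(data: bytes) -> dict[str, object]:
    """Extract /Size, /Root, /Prev and the startxref offset from the end of a file."""
    tail = data[-1024:]                        # key location: the end of the file
    start = tail.rfind(b"trailer")
    end = tail.find(b"startxref", start) if start >= 0 else -1
    if end < 0:
        raise ValueError("no classic trailer (the file may use an xref stream)")
    trailer = tail[start:end]
    size = re.search(rb"/Size\s+(\d+)", trailer)
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
    prev = re.search(rb"/Prev\s+(\d+)", trailer)
    offset = re.search(rb"startxref\s+(\d+)", tail[end:])
    if size is None or root is None or offset is None:
        raise ValueError("incomplete trailer")
    return {
        "Size": int(size.group(1)),
        "Root": (int(root.group(1)), int(root.group(2))),  # (object, generation)
        "Prev": int(prev.group(1)) if prev else None,       # earlier xref, if updated
        "startxref": int(offset.group(1)),
    }


ending = b"trailer\n<< /Size 7 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
print(read_trailer(b"%PDF-1.4\n... objects and xref ...\n" + ending))
```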
The PDF root object
Like you have never seen it before...
PDF files on the web
Isn’t having the marker at the end of the file a problem?
PDF files on the web
Not if you create it properly
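Creating it properly here means linearization (Acrobat calls it "Fast Web View"): the file starts with a linearization parameter dictionary and the objects needed for the first page, so a viewer can start drawing before the rest arrives. The sketch below is a rough check, assuming the dictionary sits in the first 1024 bytes, as the PDF specification requires; `linearization_info` is a hypothetical helper, and a real reader would also validate the hint tables. If the file has since been changed by an incremental update, /L no longer matches the file length and the linearization should be treated as broken.

```python
from __future__ import annotations

import re


def linearization_info(data: bytes) -> dict[str, int | bool] | None:
    """Return the main linearization parameters, or None if the file is not linearized."""
    head = data[:1024]                     # the parameter dictionary must start the file
    marker = re.search(rb"/Linearized\s+[\d.]+", head)
    if marker is None:
        return None
    end = head.find(b">>", marker.end())
    body = head[marker.end():end if end >= 0 else len(head)]
    # /L file length, /N page count, /O first page object, /E end of first page,
    # /T offset of the main xref table (the /H hint array is skipped here)
    params: dict[str, int | bool] = {
        key.decode(): int(value)
        for key, value in re.findall(rb"/([LNOET])\s+(\d+)", body)
    }
    params["still_valid"] = params.get("L") == len(data)
    return params
```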
Key takeaways from the PDF structure
We do not need to load the whole file.
It is equally fast to load any part of it.
It is very easy to replace objects with new versions.
There are certain key locations, such as the end of the file.
You should not edit it in a text editor (adding or removing even one byte shifts the byte offsets the xref table relies on).
If you want to use PDF files across the Internet, there is a special mode (linearization) that makes them load the most important parts first.
Lots of features need you to set up the PDF file correctly.
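"Very easy to replace objects" refers to incremental updates: instead of rewriting the file, you append the new version of the object, a small xref section pointing at it, and a new trailer whose /Prev points back to the previous xref table. Readers start from the last `startxref`, so the new version wins. A minimal sketch, assuming a classic xref table and that the caller already knows the trailer's /Size and /Root; `append_incremental_update` is a hypothetical helper, not an iText method, and the `original` bytes below are a placeholder rather than a complete PDF.

```python
from __future__ import annotations

import re


def append_incremental_update(data: bytes, obj_num: int, new_body: bytes,
                              size: int, root: bytes) -> bytes:
    """Append a new version of object `obj_num` without touching existing bytes."""
    prev = int(re.findall(rb"startxref\s+(\d+)", data)[-1])  # previous xref offset
    out = bytearray(data)
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, new_body)
    xref_offset = len(out)
    # one subsection containing just the replaced object
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        max(size, obj_num + 1), root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


original = b"%PDF-1.4\n% ...objects, xref table and trailer...\nstartxref\n9\n%%EOF\n"
updated = append_incremental_update(original, 3, b"<< /Type /Page /Rotate 90 >>",
                                    size=4, root=b"1 0 R")
```

Because the original bytes are left untouched, every old offset stays valid; this is also why hand-editing in a text editor is so much riskier than appending.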
Those PDF objects in more detail
All PDF objects have:
1. An ID number (the object number, used together with a generation number)
2. (Optional) A set of dictionary key/value pairs
3. (Optional) A block of binary data (a stream)
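Those three parts map directly onto the bytes in the file. A minimal sketch, assuming dictionary values are already written in PDF syntax (names as `/Name`, references as `2 0 R`); `RawPdfObject` is a hypothetical class for illustration, not iText's own object model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RawPdfObject:
    number: int                                           # 1. the ID number
    generation: int = 0
    entries: dict[str, str] = field(default_factory=dict)  # 2. key/value pairs
    stream: bytes | None = None                           # 3. optional binary data

    def to_bytes(self) -> bytes:
        entries = dict(self.entries)
        if self.stream is not None:
            # /Length tells readers where the binary data ends
            entries["Length"] = str(len(self.stream))
        pairs = b" ".join(f"/{key} {value}".encode("ascii")
                          for key, value in entries.items())
        out = b"%d %d obj\n<< %s >>" % (self.number, self.generation, pairs)
        if self.stream is not None:
            out += b"\nstream\n" + self.stream + b"\nendstream"
        return out + b"\nendobj\n"


catalog = RawPdfObject(1, entries={"Type": "/Catalog", "Pages": "2 0 R"})
print(catalog.to_bytes().decode("ascii"))
```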
PDF images are not TIFF, PNG or JPEG
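Inside the file an image is an image XObject: a dictionary giving the width, height, colour space and bits per component, followed by the raw samples, usually compressed with a PDF filter such as FlateDecode. (JPEG data can be carried almost unchanged with the DCTDecode filter, but it is still wrapped in this dictionary rather than stored as a .jpg file.) A minimal sketch, assuming 8-bit RGB pixels; `rgb_image_xobject` is a hypothetical helper.

```python
from __future__ import annotations

import zlib


def rgb_image_xobject(width: int, height: int,
                      pixels: bytes) -> tuple[dict[str, str], bytes]:
    """Build the dictionary entries and compressed data for an 8-bit DeviceRGB image."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected 3 bytes (R, G, B) per pixel")
    data = zlib.compress(pixels)  # FlateDecode uses the zlib/deflate format
    entries = {
        "Type": "/XObject",
        "Subtype": "/Image",
        "Width": str(width),
        "Height": str(height),
        "ColorSpace": "/DeviceRGB",
        "BitsPerComponent": "8",
        "Filter": "/FlateDecode",
        "Length": str(len(data)),
    }
    return entries, data


red_pixels = bytes([255, 0, 0]) * 2     # a 2 x 1 image, both pixels red
entries, data = rgb_image_xobject(2, 1, red_pixels)
```

There is no PNG signature or chunk structure anywhere in the result: just samples that a page's content stream can paint with the `Do` operator.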
A word on colour
Device colour spaces: DeviceGray, DeviceRGB, DeviceCMYK
CIE-based colour spaces: CalGray, CalRGB, Lab, ICC (ICCBased)
Special colour spaces: Separation, DeviceN, Pattern
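Why does it matter which one a file uses? Device colour spaces say nothing about how the colour should actually look, so a viewer has to improvise. The PDF Reference gives a simple formula for turning DeviceCMYK into DeviceRGB when no colour management is applied, sketched below (the function name is ours). For a CMYK black such as `0 0 0 1 k` it gives pure RGB black, but other colours can come out noticeably different from an ICC-managed conversion.

```python
from __future__ import annotations


def cmyk_to_rgb(c: float, m: float, y: float, k: float) -> tuple[float, float, float]:
    """Naive DeviceCMYK -> DeviceRGB conversion (components in the range 0.0-1.0)."""
    return (
        1.0 - min(1.0, c + k),
        1.0 - min(1.0, m + k),
        1.0 - min(1.0, y + k),
    )


print(cmyk_to_rgb(0, 0, 0, 1))    # prints: (0.0, 0.0, 0.0)
```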
PDF pages are ‘drawn’
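0 0 0 1 k    set the CMYK fill colour (used for the text) to black
BT    start of some text
/T1_0 1 Tf    use the font defined as T1_0 elsewhere, at size 1
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr    set other text properties (character spacing, word spacing, rise, horizontal scaling, rendering mode)
7.5003 0 0 7.5003 272.1643 540.2979 Tm    position onscreen (the text matrix scales the size-1 font by 7.5003 and places it at 272.1643, 540.2979)
(L*) Tj    draw the text L*
/T1_1 1 Tf    change font
0.856 0 Td    move to a different location onscreen
( = 100) Tj    draw the text " = 100"
-0.324 -1.133 Td    move to a different location onscreen
[(whit)6(e)] TJ    draw the text "white", tightening the gap between t and e by 6/1000 of the font size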
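A content stream is just a list of operands followed by an operator, which is why it can be read like a little program. The sketch below splits the stream above, written out with spaces between operands and operators, into (operator, operands) pairs. It assumes an uncompressed stream and handles only numbers, names, simple literal strings (no nested parentheses, escapes left undecoded) and arrays; hex strings, dictionaries and inline images are out of scope, and `parse_content` is a hypothetical helper. Note that an array of strings and spacing adjustments is drawn with `TJ` (`Tj` takes a single string), and the closing `ET` is added here to end the text object.

```python
from __future__ import annotations

import re

TOKEN = re.compile(
    rb"\s*(?:"
    rb"(?P<string>\((?:\\.|[^\\()])*\))"      # literal string, no nested parens
    rb"|(?P<name>/[^\s/\[\]()<>]+)"            # name such as /T1_0
    rb"|(?P<number>[+-]?(?:\d+\.?\d*|\.\d+))"  # integer or real
    rb"|(?P<open>\[)|(?P<close>\])"            # array delimiters (e.g. for TJ)
    rb"|(?P<op>[A-Za-z'\"][A-Za-z0-9*]*)"      # operator such as Tf, TJ, T*
    rb")"
)


def parse_content(stream: bytes) -> list[tuple[str, list]]:
    """Split a content stream into (operator, operands) pairs."""
    ops: list[tuple[str, list]] = []
    operands: list = []
    stack: list[list] = []                     # saved operand lists while inside [ ]
    pos = 0
    while pos < len(stream):
        match = TOKEN.match(stream, pos)
        if match is None:
            if stream[pos:].strip():
                raise ValueError(f"cannot parse content at byte {pos}")
            break                              # only trailing whitespace left
        pos = match.end()
        kind, value = match.lastgroup, match.group(match.lastgroup)
        if kind == "open":
            stack.append(operands)
            operands = []
        elif kind == "close":
            array, operands = operands, stack.pop()
            operands.append(array)
        elif kind == "op":
            ops.append((value.decode("ascii"), operands))  # operator consumes operands
            operands = []
        elif kind == "number":
            operands.append(float(value))
        elif kind == "name":
            operands.append(value.decode("ascii"))
        else:                                  # string: strip the parentheses
            operands.append(value[1:-1].decode("latin-1"))
    return ops


sample = b"""0 0 0 1 k
BT
/T1_0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr
7.5003 0 0 7.5003 272.1643 540.2979 Tm
(L*) Tj
/T1_1 1 Tf
0.856 0 Td
( = 100) Tj
-0.324 -1.133 Td
[(whit)6(e)] TJ
ET"""
for operator, args in parse_content(sample):
    print(operator, args)
```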
PDF myth: files are cross-platform
Only if you create them properly...
Obfuscation for idiots!
No-one will be able to guess the secret password
20 seconds later...
And the password is...
Lastly a plea
Not all PDF creation tools are equal
In summary