Tim Keefe - DRI Training Series Day, UCC: Digitising Your Collection
TRANSCRIPT
DRI Training: Preparing Your Collection for DRI
2. Digitising Your Collection
Digital Imaging – Introduction, components, process.
Tim Keefe, Head of Digital Resources and Imaging Services, Trinity College Dublin
Questions we all need to ask
When beginning a digitization project, it is easy to ignore the basic questions, the ones we all assume we know the answers to… however, these questions are often the most important, and they need to be addressed formally.
Questions to ask
What is the purpose of this project?
What is the scope of the digitization activity?
What is the intended lifetime of the digital files?
Who is the intended audience?
Purpose
What is the purpose of this project?
Why are we digitizing the material? (Need/Trend; Access, Research, Education)
Who are the champions for this project? (Local, External)
Who or what are the barriers to the implementation of this project? (Human Resource, Procedural/Political)
Scope
What is the scope of the digitization activity?
What is to be digitized? What is not to be digitized? Why?
Who is likely to demand operation outside of these criteria?
Intended Audience
Who is the intended audience for the digital resources?
What are their needs? How will they access the material? Who else will be interested?
Are you prepared for a new audience (known or unknown) to self-select and become the primary audience?
Do you wish to prevent any audience from having access to the resources?
Image Lifetime
What is the intended lifetime for the digital records?
This question is critical to the appropriate development of the digitization activity:
Significant resource implications
Significant planning implications
Significant digitization process implications
So Why Digitize?
Access
Electronic media provide the most dynamic access. Digital data structures offer the opportunity for truly dynamic new research and educational models, adding unique new capabilities to existing methodologies.
Preservation
Digital files created to proper specifications can be true surrogates for delicate source materials for all but a handful of advanced research needs.
Manipulation
Non-linear digital resources allow easy modification of image characteristics. Digital files easily cross medium boundaries, providing opportunities for new use models.
Problems with digitization
The pace of technological change is constantly raising the bar for digital attributes.
Not human readable.
Lack of best practices / attribute recommendations.
Long-term digital preservation is a newly emerging field; solutions are just beginning to emerge. It is much more complex than having IS Services make a backup copy.
Extremely costly activity.
TCO (total cost of ownership) is not well understood, and there are few models.
Capture for What?
In TCD we designate the capture activity according to its intent:
Capturing for Content: speed and cost are most important; quality is less important.
Capturing the Object: quality is most important, meeting the needs of the researcher… researching anything.
Components
The primary components of an average imaging system:
Digital capture device
Light source (if not included in the capture system)
Optics (if not included in the capture system)
Color calibration system
Image capture/image processing computer system(s)
Software packages
Data storage systems
Digital Capture Systems: Scanners
Digital Capture Systems: Flatbed
Reflective/transmissive capabilities
Infrared dust and scratch removal systems (ICE)
Linear/trilinear or CCD systems
Low productivity
Software included
Digital Capture Systems: Flatbed (limitations)
Works best with two-dimensional materials.
Not recommended for use with fragile or tightly bound material.
Limited scan area.
Very slow.
Digital Capture Systems: 35mm Photographic
Digital photographic systems
35mm format CCD/CMOS digital capture sensors
Full-frame or reduced-frame sensors (reduced-frame sensors have average magnification, or crop, factors of 1.33 to 1.5)
High productivity
Limited resolution
Limited bit depth (8-14 bit)
Cost effective
Good starting solution
Digital Capture Systems: Medium Format (MF digital back)
CCD sensors, 6 x 4.5cm to 6 x 7cm sensor size
With or without micro-lenses
High bit depth (16 bit)
High productivity
High cost
Requires a high level of studio photography experience
Additional software needs
Associated equipment is also expensive
Digital Capture Systems: Dedicated Book Scanning Systems
One size fits all… and all its limitations
Limited source material input
Material handling and support
Possible automation: page turning, image management
Linear CCD based or digital camera based
High to very high productivity
Digital Capture Systems: Dedicated Book Scanning Systems (Linear CCD based)
Generally with included software (a flatbed in a different form factor).
Digital Capture Systems: Dedicated Book Scanning Systems (Digital camera based)
Robotic scanners
Robots…Really?
Computer Technology: What to Buy
Image processing is one of the more intensive computing tasks.
The recommendation is to buy the fastest, most modern computer that you can afford right now.
Memory requirements are often more critical than processor speed (multi-core technology is not yet fully exploited by software).
The graphics card is often more important than the processor.
Have a minimum RAM of 4x your largest file size; 8x is recommended.
Expect it to cost 2-5x more than a normal office computer.
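The 4x to 8x RAM rule of thumb turns into a quick calculation once you know the size of your largest file. A minimal sketch, assuming an uncompressed RGB master; the 10,000 x 8,000 pixel, 16-bit capture is a hypothetical example, not a figure from this session.

```python
def uncompressed_size_bytes(width_px: int, height_px: int,
                            channels: int = 3, bits_per_channel: int = 16) -> int:
    """Size of the raw pixel data only (no file header, no compression)."""
    return width_px * height_px * channels * bits_per_channel // 8


def ram_rule_of_thumb(largest_file_bytes: int) -> tuple[int, int]:
    """Return (minimum, recommended) RAM in bytes: 4x and 8x the largest file."""
    return 4 * largest_file_bytes, 8 * largest_file_bytes


if __name__ == "__main__":
    # Hypothetical capture: 10,000 x 8,000 pixels, RGB, 16 bits per channel.
    size = uncompressed_size_bytes(10_000, 8_000)
    minimum, recommended = ram_rule_of_thumb(size)
    gib = 1024 ** 3
    print(f"largest file: {size / 1024 ** 2:.0f} MiB")
    print(f"RAM: minimum {minimum / gib:.1f} GiB, recommended {recommended / gib:.1f} GiB")
```

The uncompressed size is the right input even if masters are stored with LZW compression, because editing software works on the decompressed pixels in memory.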
Computer Technology
Consider the software needs of the digital capture system you have chosen. Is dedicated software required to generate the files needed for your project scenario or device type? Some MF camera systems require unique software.
Will it be necessary to purchase additional image editing software packages (e.g. Adobe Creative Suite/Photoshop) or file management software (Lightroom, Bridge, etc.)? Many of these software packages are now subscription-based.
Storage Technology
RAID (Redundant Array of Inexpensive Disks)
Level 0 (striped): speed and performance increases. Data is broken up and written across several disks, taking advantage of multiple write heads to improve data throughput (often used for video processing). Level 0 provides no redundancy: if any one disk fails, the data on the array is lost.
Level 1 (mirrored): security through redundancy. Data is identically written to more than one disk, allowing for backup protection should any single disk fail. The usable storage capacity of the system is halved when a two-disk level 1 RAID is used.
Local hard drive (under-the-desk solution)
Low cost, lowest preservation (use only when required)
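The difference between the two RAID levels can be made concrete with a toy model: RAID 0 deals fixed-size chunks round-robin across disks, while RAID 1 writes an identical copy to each disk. This is a sketch only; the in-memory byte strings standing in for disks and the 4-byte chunk size are illustrative assumptions, and real arrays are managed by a RAID controller or the operating system, not application code.

```python
def raid0_write(data: bytes, n_disks: int, chunk: int = 4) -> list[bytes]:
    """Striping: deal fixed-size chunks round-robin across the disks (no redundancy)."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks].extend(data[i:i + chunk])
    return [bytes(d) for d in disks]


def raid0_read(disks: list[bytes], chunk: int = 4) -> bytes:
    """Reassemble striped data by reading chunks back in the same round-robin order."""
    out = bytearray()
    offsets = [0] * len(disks)
    d = 0
    while offsets[d] < len(disks[d]):
        out.extend(disks[d][offsets[d]:offsets[d] + chunk])
        offsets[d] += chunk
        d = (d + 1) % len(disks)
    return bytes(out)


def raid1_write(data: bytes, n_disks: int = 2) -> list[bytes]:
    """Mirroring: every disk receives an identical copy."""
    return [bytes(data) for _ in range(n_disks)]


def usable_capacity(level: int, n_disks: int, disk_size: int) -> int:
    """Usable space for equal-sized disks: RAID 0 sums them, RAID 1 keeps one copy's worth."""
    if level == 0:
        return n_disks * disk_size
    if level == 1:
        return disk_size
    raise ValueError("only levels 0 and 1 are modelled here")


if __name__ == "__main__":
    page = b"bytes of a digitised page image"
    striped = raid0_write(page, n_disks=3)
    assert raid0_read(striped) == page
    # Lose any one striped disk and the file can no longer be rebuilt;
    # lose one mirrored disk and the other still holds a complete copy.
    mirrored = raid1_write(page)
    assert mirrored[1] == page
    print(usable_capacity(0, 2, 4), usable_capacity(1, 2, 4))  # 8 4 (e.g. TB with two 4 TB disks)
```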
Digital Vocabulary
File structure: file types, compression, spatial resolution, bit depth, dynamic range, color mode
File Types
Tiff (Tagged Image File Format): large file size; standard format; lossless LZW compression (lossy options also available).
Jpeg (Joint Photographic Experts Group): smaller file sizes; lossy compression in most cases, although the newest versions support lossless compression (rarely supported); standard format.
Jpeg 2000: lossless and/or lossy; multiple file sizes embedded within a single digital record; emerging format (adoption very slow, use with caution).
File Types cont.
PDF (Portable Document Format, Adobe Acrobat)
Advanced cross-platform compatibility
Ability to support complex document generation: text, images, notes, embedded graphics, etc.
Support for advanced printing
Support for sharing and dissemination
Standard file type, but use caution, as there is a wide variety of versions and variants
PDF/A (ISO 19005) is the ISO standard Acrobat type for digital preservation. Its adoption rate is very low, and some believe that political/corporate influence drove its recommendation.
GIF
Dying file format, not recommended
File Compression
There are two basic types of compression: lossy and lossless.
Lossy
The image structure is changed (damaged) by the compression, ideally not in a perceptible way.
Jpeg is the most common format using lossy compression.
Every re-save can add further damage, so conversion/saving into a lossy format should always be the final step in the digitization and image processing workflow.
Large reduction in file size.
File Saving
Save Order
When working with files that use, or will use, lossy compression (Jpeg), it is important that the very last step in the process is the file save.
Each save recompresses the data and causes further image degradation. Best practice is to work in a lossless format such as Tiff and save out the final Jpeg as the last step. This workflow minimizes the impact of compression artifacts.
Compression cont.
Lossless
The image file structure is not changed in any way by the compression.
The Tiff file format with LZW compression is the most widely used lossless compression format.
Note that Tiff files can also be saved with no compression or with lossy compression.
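What makes LZW lossless can be shown with a round trip: decompressing the codes must return the original bytes exactly. The sketch below is a textbook LZW coder over a byte string that emits a list of integer codes; it is not the exact TIFF variant, which adds variable code widths, Clear and EndOfInformation codes, and bit packing.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: replace repeated byte sequences with dictionary codes."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes: list[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            codes.append(table[w])      # emit the longest known sequence
            table[wc] = next_code       # learn the new, one-byte-longer sequence
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and expand the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.extend(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return bytes(out)


if __name__ == "__main__":
    # A row of pixel values with long runs, e.g. a blank margin then a dark border.
    row = bytes([255] * 200 + [0] * 56)
    codes = lzw_compress(row)
    print(len(row), "bytes ->", len(codes), "codes")
    assert lzw_decompress(codes) == row   # lossless: identical after the round trip
```

Runs and repeated patterns collapse into few codes; every byte still comes back unchanged, which is the property that makes LZW Tiff suitable for masters.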
Compression examples
Resolution
This metric is generally stated as pixels per inch (ppi): the number of individual picture elements sampled along one linear inch of the original.
It is sometimes confused with dots per inch (dpi), which is a printing-specific metric.
Stating spatial resolution requires both the dimensions of the original and the ppi sample rate.
Screen resolution is traditionally taken as 72 ppi (the newest screens now exceed 125 ppi).
High-resolution commercial printing requires 300-650 ppi image files.
General internet jpg files: 72-150 ppi.
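Because ppi is a linear measure, the pixel dimensions of a capture follow directly from the physical size of the original and the sampling rate. A minimal sketch; the A4 page (210 x 297 mm), the 400 ppi rate and the 420 mm map are example values only, not recommendations from this session.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Pixels needed to sample an original of the given physical size at `ppi`."""
    return (round(width_mm / MM_PER_INCH * ppi),
            round(height_mm / MM_PER_INCH * ppi))


def effective_ppi(pixels: int, length_mm: float) -> float:
    """Sampling rate actually achieved when `pixels` span `length_mm` of the original."""
    return pixels / (length_mm / MM_PER_INCH)


if __name__ == "__main__":
    # Example only: an A4 page (210 x 297 mm) captured at 400 ppi.
    print(pixel_dimensions(210, 297, 400))   # (3307, 4677)
    # A 4000-pixel-wide capture of a 420 mm wide map gives roughly 242 ppi.
    print(round(effective_ppi(4000, 420)))
```

Since ppi is linear, doubling it quadruples the pixel count, and the uncompressed file size with it.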
Bit Depth
Bit depth is the number of bits used to record each sample within each image channel (RGB, CMYK); it determines the number of discrete steps between black and white.
This term is often confused with dynamic range. They are not the same, although there is an interaction between them.
Bit Depth
Bit depth is stated as the number of bits of data per channel. The number of tonal values per channel is 2 (a binary measure) raised to the power of the bit depth, so 4-bit color has 2^4 = 16 steps between the black and white values.
** Note that bit depth is stated either as the number of bits per channel (as in 8-bit color) or as the sum of all the channels combined (R+G+B = 24-bit color)… this can be confusing.
Bit Depth
8 bits per channel (or 24-bit color): 256 value steps in each channel, 16.8 million possible colors
16 bits per channel (or 48-bit color): 65,536 value steps in each channel, 281.5 trillion possible colors
Many manufacturers talk about interim bit depths (12-14 bit), but the final output is often reduced to 8 bits per channel. You cannot add missing data by moving to a higher bit depth.
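The figures above can be checked with a few lines of arithmetic, along with the final point: promoting 8-bit data to 16 bits spreads the same 256 values across a wider range but creates no new tonal steps. The scaling factor of 257 (which maps 255 to 65535) is one common convention, assumed here for illustration.

```python
def levels_per_channel(bits: int) -> int:
    """Discrete tonal values one channel can hold: 2 raised to the bit depth."""
    return 2 ** bits


def total_colors(bits_per_channel: int, channels: int = 3) -> int:
    """Distinct colors across all channels (values per channel ** channels)."""
    return levels_per_channel(bits_per_channel) ** channels


def promote_8_to_16(values: list[int]) -> list[int]:
    """Rescale 8-bit values (0-255) to the 16-bit range (0-65535); 255 * 257 == 65535."""
    return [v * 257 for v in values]


if __name__ == "__main__":
    print(levels_per_channel(4))   # 16
    print(total_colors(8))         # 16777216, about 16.8 million
    print(total_colors(16))        # 281474976710656, about 281.5 trillion
    widened = promote_8_to_16(list(range(256)))
    print(len(set(widened)))       # 256: still only 256 distinct tonal values
```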
Dynamic Range
Dynamic range is the ability of a sensor to simultaneously capture dark detail and light detail. This is an inherent weakness of digital capture.
Decisions are made to set the device to favor either a greater tonal range in the dark densities (more common) or in the light densities.
Dynamic range is commonly confused with bit depth. They are separate characteristics, despite all the contrary information out there (much of it from reputable sources)… I promise.
Greater bit depth will not automatically provide greater dynamic range (however, improvements in bit depth often accompany other sensor improvements that include increased DR).
Dynamic Range: Clipping
Clipping is a failure state of a digital image in which the limited dynamic range of the device cannot correctly capture very light or very dark tones; those tones are recorded at the maximum or minimum value, and the detail in them is lost.
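Clipping is easy to flag automatically during quality control: count how many pixels in a channel sit at the minimum or maximum possible value, since a large pile-up at either extreme is the usual symptom. A sketch assuming the channel is available as a flat list of integers; the 0.1% tolerance is a hypothetical placeholder that each project would set for itself.

```python
def clipping_report(channel: list[int], bits: int = 8, tolerance: float = 0.001) -> dict:
    """Fraction of pixels pinned at the minimum and maximum value of one channel.

    `tolerance` is the fraction of pixels allowed at either extreme before the
    image is flagged (a hypothetical project setting).
    """
    if not channel:
        raise ValueError("channel is empty")
    max_value = 2 ** bits - 1
    n = len(channel)
    shadow = sum(1 for v in channel if v == 0) / n
    highlight = sum(1 for v in channel if v == max_value) / n
    return {
        "shadow_fraction": shadow,
        "highlight_fraction": highlight,
        "clipped": shadow > tolerance or highlight > tolerance,
    }


if __name__ == "__main__":
    # A hypothetical row from an overexposed capture: many values pinned at 255.
    row = [250, 255, 255, 255, 240, 255, 128, 255]
    print(clipping_report(row))
```

A genuinely black or white area can also produce extreme values, so a flag is a prompt for a human check rather than an automatic rejection.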
Color Mode
RGB (Red/Green/Blue color channels)
Additive color
Most common color mode for digital images
Mimics the human visual system
Color Mode: CMYK (Cyan/Magenta/Yellow/Black)
Subtractive color
Commercial printing standard
Most desktop color printers support RGB color files (CMYK conversion is managed internally)
Limited color gamut
Color Mode: Lab Color
A single luminance (greyscale) channel and 2 opposing color channels
Loosely represents the range of human vision
Good for transforms
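To show what "one luminance channel and two opposing color channels" means numerically, the sketch below converts an 8-bit sRGB value to CIE L*a*b* using the standard sRGB transfer curve, the sRGB-to-XYZ matrix and a D65 reference white. It is a pure-Python illustration; in production this conversion would normally be performed by the color management system using ICC profiles.

```python
def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB color to CIE L*a*b* (D65 white point)."""

    def linearize(c8: int) -> float:
        # Undo the sRGB transfer curve ("gamma") to get linear light.
        c = c8 / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB to CIE XYZ (standard sRGB/D65 matrix).
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16        # L*: the luminance (greyscale) channel
    a_star = 500 * (fx - fy)         # a*: green (-) to red (+)
    b_star = 200 * (fy - fz)         # b*: blue (-) to yellow (+)
    return lightness, a_star, b_star


if __name__ == "__main__":
    for rgb in [(255, 255, 255), (128, 128, 128), (255, 0, 0)]:
        print(rgb, tuple(round(v, 1) for v in srgb_to_lab(*rgb)))
```

Neutral greys come out with a* and b* close to zero, so all of their tonal information sits in L*; this is why luminance-only operations such as sharpening can be applied in Lab without shifting colors.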
Color Profile Standards
The color profile assigned to the image files is user defined; several informal standard configurations are supported.
sRGB
Profile developed by HP and Microsoft in 1996
Represents the gamut of an average CRT monitor
Very limited color palette
New output devices are currently capable of exceeding this space
Most commonly used profile (usually the default if none is stated)
Color Profile Standards
Adobe RGB (1998)
Newer profile designed to support a wider palette of colors for higher quality printing
Lower use than sRGB, but well recognized
Maintains a color appearance consistent with sRGB devices
ProPhoto RGB
A wide-gamut color space designed for very high quality printing of photographic images
Color appearance is highly inconsistent when used with devices that are not color managed or are set to sRGB standards
Despite the benefits of this color space, its use is quite limited due to its setup and management requirements
Use with caution, as inaccurate color characteristics can occur with improperly managed devices
Image Processing
Post-capture modifications and manipulations to the original digital image file structure
The Controversy
Two primary schools of thought:
The digital master image files should remain untouched as they emerge from the capture device, and all subsequent processing should occur only on the surrogates.
Image processing will occur on the master capture file, with the intent of matching the original source material as closely as possible at the time of capture.
Color Mode
RGB: standard image space for files; common, not likely to change
CMYK: avoid this space for all but specific commercial printing activities (even then, try to ignore it)
Lab: great for processing transforms that can benefit from a luminance channel, such as sharpening and noise removal; no color profile
File Formats
Master: the high quality, large image generated from the capture device
Surrogates: secondary files generated from the master file to be used for specific purposes
File Format Sets
Master: Tiff
Intended to be the highest quality image; represents the asset derived from the € spent. Lossless compression recommended.
Surrogates: compressed Jpgs
File size reduced for easier management and dissemination, and to manage costs. Lossy compression is acceptable within the use cases. Often several sizes (large, small, thumbnail). Used for public display.
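Deriving a set of surrogates from the master is largely a matter of fitting each derivative within a maximum long edge while keeping the master's aspect ratio. In the sketch below the sizes (2000, 800 and 150 pixels) are hypothetical placeholders; each project defines its own, and the actual resampling would be done in imaging software.

```python
# Hypothetical long-edge limits for a surrogate set; each project sets its own.
SURROGATE_LONG_EDGES = {"large": 2000, "small": 800, "thumbnail": 150}


def fit_within(width: int, height: int, long_edge: int) -> tuple[int, int]:
    """Scale so the longer side equals long_edge, keeping the aspect ratio. Never upscales."""
    longest = max(width, height)
    if longest <= long_edge:
        return width, height
    scale = long_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def surrogate_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Pixel dimensions for every surrogate derived from a master of the given size."""
    return {name: fit_within(width, height, edge)
            for name, edge in SURROGATE_LONG_EDGES.items()}


if __name__ == "__main__":
    print(surrogate_sizes(3000, 4000))
```

Refusing to upscale is a deliberate choice: a surrogate enlarged beyond its master adds file size but no detail.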
Image Manipulations
Tone Scale
To adjust tone scale, you need to push or pull predetermined black and white values to defined positions on the histogram. This requires the use of a calibrated reference target placed within the image.
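A levels adjustment of this kind is a linear remapping: the measured values of the target's black and white patches are moved to the positions chosen for them, and everything in between is stretched proportionally. A sketch for one 8-bit channel; the measured values (12 and 240) and target positions (20 and 245) are hypothetical, not values prescribed in this session.

```python
def levels(value: int, in_black: int, in_white: int,
           out_black: int, out_white: int) -> int:
    """Map measured black/white points to chosen output positions (one 8-bit channel).

    Values darker than in_black or lighter than in_white are clipped, as in a
    typical levels tool; everything in between is stretched linearly.
    """
    if in_white <= in_black:
        raise ValueError("in_white must be greater than in_black")
    t = (value - in_black) / (in_white - in_black)
    t = min(1.0, max(0.0, t))
    return round(out_black + t * (out_white - out_black))


def apply_levels(channel: list[int], in_black: int, in_white: int,
                 out_black: int, out_white: int) -> list[int]:
    """Apply the same levels mapping to every value in a channel."""
    return [levels(v, in_black, in_white, out_black, out_white) for v in channel]


if __name__ == "__main__":
    # Hypothetical: the target's black patch reads 12 and its white patch 240;
    # the project wants them placed at 20 and 245.
    print(apply_levels([12, 60, 128, 240], 12, 240, 20, 245))
```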
Image Manipulations: Sharpening
Sharpening works by increasing the contrast along edges in an image. This local change in contrast fools the human visual system into perceiving the image as sharper.
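Unsharp masking, the most common form of sharpening, makes the edge-contrast idea explicit: blur the image, subtract the blur from the original to isolate the edges, and add a scaled copy of that difference back. A one-dimensional sketch over a row of 8-bit values; the 3-pixel box blur is an assumed stand-in for the Gaussian blur most software uses, and `amount` is a hypothetical strength parameter.

```python
def box_blur(row: list[float]) -> list[float]:
    """3-pixel moving average; the end pixels are repeated to pad the edges."""
    padded = [row[0]] + row + [row[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


def unsharp_mask(row: list[int], amount: float = 1.0) -> list[int]:
    """original + amount * (original - blurred), clamped to the 8-bit range."""
    if not row:
        return []
    blurred = box_blur([float(v) for v in row])
    sharpened = [v + amount * (v - b) for v, b in zip(row, blurred)]
    return [max(0, min(255, round(s))) for s in sharpened]


if __name__ == "__main__":
    edge = [50, 50, 50, 200, 200, 200]   # a dark-to-light edge
    print(unsharp_mask(edge))            # [50, 50, 0, 250, 200, 200]
```

Flat areas are untouched; only the pixels beside the edge move, the dark side darker and the light side lighter, which is exactly the extra edge contrast the eye reads as sharpness.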
Cropping
Cropping is the permanent removal of unwanted parts of the image. Formally determine where the borders of your images should be.
For research purposes, the entire page should be represented.
For access and content-related scanning, cropping to the textual areas of the page may be desired.
Failure modes
What determines a crop or image capture that is unacceptable, requiring reprocessing or a new capture? Formalize this.
Skew/Rotation
Skew/rotation occurs when the source material is not square to the edges of the digital image.
Failure mode
Determine what percentage of skew is unacceptable, and formalize this criterion.
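Formalizing a skew criterion usually means measuring the tilt of a straight feature, such as the page edge or a text baseline, and comparing it with a threshold. The sketch takes two points picked along an edge that should be horizontal and reports the tilt both as an angle and as a percentage slope; the 0.5 degree threshold is a hypothetical value each project would replace with its own.

```python
import math

MAX_SKEW_DEGREES = 0.5  # hypothetical project threshold


def skew_degrees(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle between the edge (x1, y1)-(x2, y2) and the horizontal, in degrees."""
    if x2 <= x1:
        raise ValueError("pick the second point to the right of the first")
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def skew_percent(x1: float, y1: float, x2: float, y2: float) -> float:
    """The same tilt as a slope: vertical drift per 100 pixels of width."""
    if x2 <= x1:
        raise ValueError("pick the second point to the right of the first")
    return 100 * (y2 - y1) / (x2 - x1)


def fails_skew_check(x1: float, y1: float, x2: float, y2: float,
                     limit: float = MAX_SKEW_DEGREES) -> bool:
    """True when the absolute tilt exceeds the project's limit."""
    return abs(skew_degrees(x1, y1, x2, y2)) > limit


if __name__ == "__main__":
    # Top edge of a page drifts 10 px down over 1000 px of width.
    print(round(skew_degrees(0, 0, 1000, 10), 3))  # 0.573
    print(skew_percent(0, 0, 1000, 10))            # 1.0
    print(fails_skew_check(0, 0, 1000, 10))        # True
```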
White Balance
White balance is a color balancing function used to address the color differences imparted by varying light sources. The human visual system does this automatically in the brain, removing the real color cast imparted by the source illuminant and giving us the perception that most lights are white.
Think of the differences evident when you have an incandescent desk lamp in a room lit by fluorescent lights. This is also important in the environment where your image processing occurs.
White Balance
Most white balance is preset within the capture system; however, fine tuning or custom profiles can be applied in the processing stage.
Neutral 18% grey references are used to generate a custom balance.
When adjusting tone scale in Photoshop, neutral grey adjustment can be used to correct white balance inconsistencies.
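The grey-reference method boils down to per-channel gains that make a neutral patch come out neutral. The sketch assumes you already have the average R, G and B values sampled from the grey card in the image, and scales red and blue to match green (a common convention, not the only one). Raw converters apply such gains to linear data before gamma encoding, so this illustrates the principle rather than a drop-in workflow.

```python
def grey_card_gains(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Per-channel multipliers that make the sampled grey patch neutral (green held fixed)."""
    if min(r, g, b) <= 0:
        raise ValueError("patch values must be positive")
    return g / r, 1.0, g / b


def apply_gains(pixel: tuple[int, int, int],
                gains: tuple[float, float, float]) -> tuple[int, ...]:
    """Scale each channel and clamp to the 8-bit range."""
    return tuple(max(0, min(255, round(c * k))) for c, k in zip(pixel, gains))


if __name__ == "__main__":
    # Hypothetical grey-card average under warm tungsten light: too much red, too little blue.
    patch = (150, 120, 100)
    gains = grey_card_gains(*patch)
    print(apply_gains(patch, gains))   # (120, 120, 120): the patch is now neutral
```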
Quality Control/Assurance
Imaging and image processing are a highly repetitive, human-dependent set of processes, and are therefore highly susceptible to recurring errors.
Control vs. Assurance
Control is in-process activity to ensure quality in the creation of the products (digital images).
Assurance focuses on evaluating the processes used, and generally takes place outside the creation process.
Quality Control
Processes built into the imaging workflow to ensure that the creation of digital images is consistent, accurate, and repeatable.
Often automated, these processes are an inherent part of the imaging workflow.
Quality Assurance
The Quality Assurance audit must be formal; informal just does not work.
Existing toolsets developed for a variety of manufacturing-based industries are highly effective (TQM, Six Sigma, etc.).
Takes place fully outside of the imaging processes.
Quality Assurance Testing
What to test for:
Imaging
File structure metrics: naming, page counts
System/Network (positioning, backup, etc.)
Metadata: structure, accuracy, completeness
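Naming and page-count checks are among the easiest QA tests to automate. The sketch assumes a hypothetical convention, `<collection>_<item>_<page>.tif` with four-digit item and page numbers, and a hypothetical manifest giving each item's expected page count; it reports badly named files, missing or duplicated pages, and unexpected items. Substitute your project's actual convention and manifest.

```python
import re
from collections import defaultdict

# Hypothetical naming convention, e.g. MS1234_0007_0001.tif
# (collection id, four-digit item number, four-digit page number).
NAME_PATTERN = re.compile(
    r"^(?P<collection>[A-Z]+\d+)_(?P<item>\d{4})_(?P<page>\d{4})\.tif$")


def check_batch(filenames: list[str], expected_pages: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the batch passed.

    expected_pages maps an item number (e.g. "0007") to its page count,
    as recorded in a hypothetical project manifest.
    """
    problems = []
    pages = defaultdict(list)
    for name in filenames:
        m = NAME_PATTERN.match(name)
        if m is None:
            problems.append(f"bad name: {name}")
            continue
        pages[m["item"]].append(int(m["page"]))

    for item, count in expected_pages.items():
        found = sorted(pages.get(item, []))
        if found != list(range(1, count + 1)):
            missing = sorted(set(range(1, count + 1)) - set(found))
            dupes = sorted({p for p in found if found.count(p) > 1})
            problems.append(f"item {item}: expected {count} pages, found {len(found)}; "
                            f"missing {missing}; duplicates {dupes}")

    for item in sorted(pages.keys() - expected_pages.keys()):
        problems.append(f"unexpected item: {item}")
    return problems


if __name__ == "__main__":
    batch = ["MS1234_0001_0001.tif", "MS1234_0001_0003.tif", "scan 5.tif"]
    for problem in check_batch(batch, {"0001": 3}):
        print(problem)
```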
Color Management
One of the most critical, and often ignored, components of a successful digitization project is a well-planned color management strategy.
Color Management
Within any imaging and processing system, you need to ensure that consistent color is displayed from device to device, and that a file's color metrics are electronically recognized.
Technology required: capture reference targets; color profiles (ICC)
Color Reference Targets
Allows a formal, measured reference to be associated with the image (future-proofing)
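Because the patches on a reference target have known, measured values, every capture can be checked numerically against them. The simplest metric is CIE76 ΔE*ab, the straight-line distance between the reference and captured colors in L*a*b* space. The patch names, values and the tolerance of 5 below are hypothetical; many projects use the newer CIEDE2000 formula and tolerances taken from their own imaging guidelines instead.

```python
import math

Lab = tuple[float, float, float]


def delta_e_76(lab1: Lab, lab2: Lab) -> float:
    """CIE76 color difference: straight-line distance between two L*a*b* colors."""
    return math.dist(lab1, lab2)


def check_target(reference: dict[str, Lab], measured: dict[str, Lab],
                 tolerance: float = 5.0) -> dict[str, float]:
    """Return {patch: delta E} for every patch that misses its reference by more than tolerance."""
    failures = {}
    for patch, ref in reference.items():
        de = delta_e_76(ref, measured[patch])
        if de > tolerance:
            failures[patch] = round(de, 2)
    return failures


if __name__ == "__main__":
    # Hypothetical reference values for two patches and what one capture recorded.
    reference = {"neutral_5": (50.0, 0.0, 0.0), "red": (53.0, 80.0, 67.0)}
    captured = {"neutral_5": (51.0, 0.5, -0.5), "red": (50.0, 72.0, 60.0)}
    print(check_target(reference, captured))   # {'red': 11.05}
```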
Color Management Technology
Color meters (basic screen calibration)
Absorptive measurements
Less dynamic than spectrophotometers
Spectrophotometers (advanced CM)
Can measure the intensity of light as a function of its wavelength
Light absorption; diffuse; specular
CM Standards
ICC (International Color Consortium)
Works through a standardized Profile Connection Space (PCS); a Color Management Module (CMM) performs the conversions between device profiles
Not an ideal solution, but one that has been very well adopted by most imaging-related hardware and software vendors
ColorSync (Apple Computer)
Apple's solution to color management
Part of the Macintosh system software
Generally plays well with others; occasionally some fiddling is necessary (ICC integrated)
Hands-off approach
Further Reading and Resources
DRI and Digital File Format Choices Factsheet: http://dri.ie/sites/default/files/files/dri-factsheets-file-formats.pdf
DRI Long-Term Digital Preservation Factsheet: http://tinyurl.com/hbp28xe
Online Resources for Digitisation Projects: http://dri.ie/digitisation-resources (includes resources for Project Planning, File Formats, Audio & Audiovisual, Hardware, Metadata & Vocabularies and Policy)
Trinity College Dublin Digital Collections Repository: https://www.tcd.ie/Library/dris/digital.php