Raymond A. Clarke Enterprise Storage Consultant Oracle Corporation Technology Trends in Preservation and Archiving Part 2 PASIG Spring 2011

Post on 19-Dec-2015


Raymond A. Clarke
Enterprise Storage Consultant
Oracle Corporation

Technology Trends in Preservation and Archiving, Part 2

PASIG Spring 2011


Agenda

Issues and Industry Observations

Bitrot

To Cloud or Not to Cloud

Logical Migration

Archive, Tiered Storage, Backup?

Key Take-Aways


Top Use Cases

• According to a recent study (*), the top archiving use cases are:

• Archive for compliance.
• Apply litigation hold on archived data.
• Early case assessment (ECA) on archived data.
• eDiscovery on archived data.
• Archiving for application efficiency.
• Storage cost mitigation by archiving historical data.
• Storage cost mitigation by archiving infrequently accessed data.
• Data reuse.
• Business continuity.

(*) IDC, Worldwide Archival Storage Solutions Taxonomy, 2011



What Causes Bit Rot?

• Lots of things:

• System (NICs, HBAs, etc.)/software interface changes in active code that is called from the dormant code

• Online aging databases may also suffer from data loss due to update errors, media failure, incomplete backup and restore operations, user error, changes in the database structure, and other related maintenance issues.

• Cosmic rays and sun spots?
• Data transmission noise
• File system errors
• Insider attacks
• Natural disasters


BER & Storage Infrastructure Components

● NIC/Link/HBA: 10^-10 (1 bit in ~1.1 GB)
  – Check-summed, retransmit if necessary
● Memory: 10^-12 (1 bit in ~116 GB)
  – ECC
● Desktop Disk: 10^-14 (1 bit in ~11.3 TB)
  – Various error correction codes
● Enterprise Disk: 10^-15 (1 bit in ~113 TB)
  – Various error correction codes
● Tape: 10^-19 (1 bit in ~1.11 PB)
  – Various error correction codes

Note: Data may be encoded five or more times as it travels from memory to the physical disk!
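The "1 bit in ~N" figures follow from simple arithmetic: a bit error rate of 10^-k means one expected bad bit per 10^k bits, that is, per 10^k / 8 bytes. The sketch below reproduces that conversion. The helper names are illustrative only, and it assumes the slide's GB/TB figures are binary units (GiB/TiB), which is what the printed numbers appear to match.

```python
def bytes_per_expected_error(ber_exponent: int) -> float:
    """Bytes transferred, on average, per one expected bit error at BER = 10**-ber_exponent."""
    bits_per_error = 10 ** ber_exponent
    return bits_per_error / 8


def human(n_bytes: float) -> str:
    """Format a byte count using binary units (assumption: the slide uses binary units)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# Exponents as listed on the slide for the non-tape components.
COMPONENTS = {
    "NIC/Link/HBA": 10,
    "Memory": 12,
    "Desktop disk": 14,
    "Enterprise disk": 15,
}

for name, exponent in COMPONENTS.items():
    print(f"{name}: 1 bit in ~{human(bytes_per_expected_error(exponent))}")
```

By the same arithmetic, an exponent of 16 gives about 1.11 PiB and an exponent of 19 gives about 1.08 EiB, so the tape entry's exponent and its "~1.11 PB" figure do not correspond to each other as printed.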


BER & Types of Corruption – cont.

● Type 1 Corruption

● Usually persistent
● Bit(s) have flipped in a byte
  – Single Bit Error (SBE)
  – Double Bit Error (DBE)
● DBEs are 3x more common than SBEs
● 1→0 transition more frequent than 0→1
● Strong correlation with bad memory
● Happens with expensive ECC memory too

Source: Silent Corruptions, Peter Kelemen, CERN After C5, June 1st, 2007
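To make the SBE/DBE terminology concrete, here is a minimal sketch that compares a known-good buffer with an observed one and reports, per damaged byte, how many bits flipped and in which direction. The function name and report layout are assumptions made for illustration; this is not the tooling used in the cited CERN study.

```python
def classify_flips(original: bytes, observed: bytes):
    """Return (offset, kind, one_to_zero, zero_to_one) for every byte that differs."""
    if len(original) != len(observed):
        raise ValueError("buffers must have the same length")
    report = []
    for offset, (good, seen) in enumerate(zip(original, observed)):
        diff = good ^ seen                 # set bits mark the flipped positions
        if diff == 0:
            continue
        flipped = bin(diff).count("1")
        # A flipped bit that was 1 in the good copy is a 1->0 transition.
        one_to_zero = bin(good & diff).count("1")
        zero_to_one = flipped - one_to_zero
        kind = {1: "SBE", 2: "DBE"}.get(flipped, "multi-bit")
        report.append((offset, kind, one_to_zero, zero_to_one))
    return report


good = bytes([0b11110000, 0b00001111, 0xAA])
bad = bytes([0b11100000, 0b00111111, 0xAA])
for entry in classify_flips(good, bad):
    print(entry)
```

A comparison like this only works when a trusted copy or checksum exists, which is why silent corruption is hard to notice without deliberate verification.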


BER & Types of Corruption – cont.

● Type 2 Corruption

● Usually transient
● Small chunks of “random” looking data
● ...but can go up to 128K
● Sometimes identifiable user data

Source: Silent Corruptions, Peter Kelemen, CERN After C5, June 1st, 2007


BER & Types of Corruption – cont.

● Type 3 Corruption

● Usually persistent, comes in bursts
● Strong correlation: I/O command timeouts
● Observed on plain SATA systems
● ...sometimes with failed READ commands!
● Appears to match RAID stripe size (64K)
● Observed on 16K chunk RAID arrays as well

Source: Silent Corruptions, Peter Kelemen, CERN After C5, June 1st, 2007


BER & Types of Corruption – cont.

● Type 4 Corruption

● Usually persistent
● Still pretty much unexplained
● Not as clearly defined ...not sure yet this warrants another category

Source: Silent Corruptions, Peter Kelemen, CERN After C5, June 1st, 2007



What is Cloud Storage?

• The use of the term cloud in describing these new models arose from architecture drawings that typically used a cloud as the dominant networking icon.

• The cloud conceptually represents any-to-any connectivity in a network, but also an abstraction of concerns such as the actual connectivity and the services running in the network that accomplish that connectivity with little manual intervention.


Cloud – Adoption is tempered by uncertainty

• LOB/IT, CIO: “Am I really doing what’s right for the business?”
• Areas of concern: Security, Performance, Availability, Reliability, Scalability, Service levels, Data security & protection, Compliance, Auditing, Cost, Governance, Control
• Security or a related component is the #1 concern/issue for most customers. Not always the same at cloud providers, and not part of the definitions!


Forecast

• Forecasts for 2010-2014 indicate that Cloud Archiving revenue will grow between 28-36% per year, the fastest growing service segment after basic cloud storage services (38%).

• Medical Image Archive to become one of the drivers of Cloud Archiving.

– Over 1 billion diagnostic imaging procedures will be performed in the US during year 2014, generating about 100 petabytes of data (*).

• Consumer segment also a target market for Cloud Archive vendors in the near future:

– Archival of digital assets (backup/archive): Photos, videos, personal data.

• A Forrester research report (**) recommends cloud archiving to reduce costs, taking into account data protection and security.

(*) IEEE. A Medical Image Archive Solution in the Cloud, 2011

(**) Forrester. Your Enterprise Data Archiving Strategy, 2011
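As a rough worked example of what these figures imply, the sketch below compounds the quoted annual growth range and divides the imaging volume by the procedure count. It assumes that 2010-2014 means four year-over-year steps, treats "over 1 billion" as exactly 1 billion, and uses decimal units; these are simplifications for illustration, not figures from the cited reports.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total growth multiplier after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years


years = 2014 - 2010                      # assumption: four compounding steps
low = cumulative_growth(0.28, years)     # lower end of the quoted range
high = cumulative_growth(0.36, years)    # upper end of the quoted range
print(f"Revenue multiplier over {years} years: {low:.2f}x to {high:.2f}x")

procedures = 10**9                       # assumption: exactly 1 billion procedures
total_bytes = 100 * 10**15               # 100 PB in decimal units
avg_mb = total_bytes / procedures / 10**6
print(f"Average data per imaging procedure: ~{avg_mb:.0f} MB")
```

Under these assumptions, revenue grows to roughly 2.7 to 3.4 times its 2010 level by 2014, and an average imaging procedure generates on the order of 100 MB.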


Cloud Issues

• No standard for data validation or audit.
• Long-term relationship between customer and vendor.
• Data migration between vendors could be difficult without standardized interfaces.
• Regulatory and compliance issues may complicate the use of public clouds for data archiving.


Where are the standards?

• Cloud Security Alliance
– Promoting Best Security Practices for the Cloud
• Jericho Forum
– Cloud Cube Model: Recommendations & (Security) Evaluation Framework
• NIST
– Cloud Research
• DMTF
– Cloud Lab + VMware submission of vCloud
• OASIS
– Cloud TC for security standards
• Open Group
• Open Web Application Security Project (OWASP)
• Open Cloud Manifesto
• ... And more

Does it help to apply SAS 70, ISO 27001 and other related security standards to evaluate and audit provider solutions?


Where Can Cloud Storage Standards Help?

• Common method to instantiate and manage data storage resources
– Easier development, faster deployment
• Common Cloud Storage metadata management
– Storage, System, Network, Application, Security, Privacy
• Cloud-to-Cloud data portability
– Migrate not only data but also management requirements
• Easier SLA/SLO management
– Requirements pushed down to the object level
• Easier integration in greater Cloud environments:
– Cloud Storage for Cloud Computing, Federated Services


SNIA Cloud Storage Initiative (CSI)

• Industry’s First Cloud Standard: Cloud Data Management Interface (CDMI)
– Now a SNIA Architecture (standard)
• Completed in record time
– Cloud Storage TWG formed at last Spring’s SNW
• Participation from a broad group of industry players
– 175 participants from over 50 organizations
• Widely Applicable:
– Public Storage Cloud
– Private Storage Cloud
– Storage Vendors
– Cloud Computing



Key Findings

• The problems of logical and physical retention
– Practitioners are struggling – information is at risk long-term
– Problems are real and generally understood
• Long-term generally means over 10-15 years.
– IT can manage to migrate and retain readability for about this long. For longer periods, processes begin failing, become too costly, and the volume of information becomes overwhelming.
• Long-term retention requirements are real.
– Over 80% of organizations reporting have a need to retain information over 50 years and 68% report a need of over 100 years.

“This is the problem with 'Digital Archive', you are not thinking long enough into the future.” (Source: Respondent)


What is the Self-Contained Information Retention Format (SIRF)?

• A logical container format for the storage subsystem, appropriate for the long-term storage of digital information
– A logical data format of a mountable unit, e.g. a filesystem, a block device, a stream device, an object store, a tape, etc.
– Includes a cluster of “interpretable” preservation objects that can be understood in the future
– Self-describing – can be interpreted by different systems
– Self-contained – all data needed for the interpretation of the preservation objects is contained within the preservation objects cluster
• If a mountable unit is damaged or lost, the effect is contained – the information in the other mountable units is still valid!
• Need to define how and when external references are supported
• A work effort by SNIA’s Long Term Retention Technical Work Group
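The two properties named above, self-describing and self-contained, can be illustrated with a toy model. The sketch below is purely hypothetical: the catalog layout, field names, and function names are invented for illustration and are not taken from any SIRF specification. It treats a mountable unit as a set of named objects plus a catalog, and reports any object that is undescribed or that refers to something outside the unit.

```python
def build_catalog(objects, formats, references=None):
    """Build a hypothetical catalog entry for each preservation object in a unit."""
    references = references or {}
    return {
        name: {
            "format": formats[name],                    # how to interpret the object
            "references": list(references.get(name, [])),
        }
        for name in objects
    }


def check_unit(objects, catalog):
    """Return a list of problems; an empty list means the unit passes both checks."""
    problems = []
    for name in objects:
        entry = catalog.get(name)
        if entry is None or not entry.get("format"):
            problems.append(f"{name}: not self-described (no format in catalog)")
            continue
        for ref in entry["references"]:
            if ref not in objects:
                problems.append(f"{name}: external reference to {ref}")
    return problems


unit = {"scan.tiff": b"...image bytes...", "scan.xml": b"<description/>"}
formats = {"scan.tiff": "image/tiff", "scan.xml": "application/xml"}
catalog = build_catalog(unit, formats, {"scan.tiff": ["scan.xml"]})
print(check_unit(unit, catalog))   # every reference resolves inside the unit
```

In this toy model, a reference that cannot be resolved inside the unit is flagged, which mirrors the open question of how and when external references should be supported.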


SIRF Objectives

• Facilitate transparent logical and physical migration and movement in order to support long term preservation
– Media, subsystem or bitstream movement – remove the mountable unit from system A and put it at system B.
– Transparent – System A is not involved. All the information needed for system B to understand the mountable unit is self-described and self-contained within the mountable unit.
– Long term – 15 years and above (according to 100 years archive requirements survey).
– Preservation – sustain the understandability and usability of the data and not just the bits.
• Considering multiple implementations of SIRF to utilize:
– the Open Archival Information System (OAIS) ISO standard
– SNIA’s eXtensible Access Method (XAM)


Problem SIRF is Addressing

With SIRF (pass):

• Can move a cluster of preservation objects between systems by itself
• Any SIRF-compliant application can read and interpret the preservation objects
• No need for export and import processes
• Preservation objects can survive longer

[Figure: Application A and Application B, each with an Interface on top of a Preservation/Retention/Storage Subsystem holding SIRF containers; the SIRF containers pass between the two systems.]

Without SIRF (stop):

• Cannot move a cluster of preservation objects between systems by itself
• Only the original application that wrote the preservation objects can read and interpret them
• Must utilize export and import processes
• Preservation objects cannot be sustained over the long term

[Figure: Application A and Application B, each with an Interface on top of a Preservation/Retention/Storage Subsystem holding its own data types; movement between the two systems is blocked.]



Backup vs. HSM vs. Archive – What are we really talking about?

| Applications | Data Protection | Tiered Storage Management | Information Archive |
|---|---|---|---|
| Purpose | Short term protection of records for system recovery | Efficient management of physical storage assets | Long term preservation of records for business, compliance, libraries |
| Data Type | Dynamic data in production | Should be data type agnostic but some are not | Fixed content with ongoing, long-term integrity needs and value |
| Access Pattern | Entire volume or directory is restored after outage | Does not imply or necessarily provide seamless promotion of files, objects or tables upon request | Individual files, objects or tables are searched/queried and retrieved as needed |
| System Activity | Usually block based | Usually block based but some do support file based activity | Must be file, table and/or object based |
| Security | Strict access policies | Not a function. | Must guarantee information integrity and security over extended periods of time |

Fundamentally different approaches to data availability and storage infrastructure efficiency. All may be required in today’s enterprise environments.


Assessing Archive Requirements

| Backup | Archive |
|---|---|
| Copies Data | Moves Data |
| Supports Operation and Recovery | Supports Business and Compliance |
| Supports Availability | Supports Operational Efficiencies |
| Short Term In Nature | Long Term In Nature |
| Data Typically Overwritten | Data Typically Secured, Not Overwritten |
| No Historic Relevance | For Historic Information |
| Not Easily Searched | Easily Searched |

Source: Addressing the Problem of Inactive Data Filling Up Expensive Active Disk Silos, Floyd Christofferson, SGI

Copyright © 2010 Oracle Corporation – Confidential


Building a Terminology Bridge

• Archive: the report advocates that IT practices adopt a more consistent usage of the term ‘archive’ with other departments within the organization. To the archival, preservation, and records management communities, an “archive” is a specialized repository with preservation services and attributes.

• Preservation: managing information in today’s datacenter with requirements to safeguard information assets for eDiscovery, litigation evidence, security, and regulatory compliance requires that many classes of information be preserved from time of creation. Preservation is a set of services that protect, provide availability, integrity and authenticity controls, include security and confidentiality safeguards, and include an audit log, control of metadata, and other practices for each preservation object. The old IT practice of placing information into an archive when it becomes inactive or expired no longer works for compliance or litigation support, and only adds cost.

• Authenticity: is defined in a digital retention and preservation context as a practice of verifying a digital object has not changed. Authenticity attempts to identify that an object is currently the same genuine object that it was “originally” and verify that it has not changed over time unless that change is known and authorized. Authenticity verification requires the use of metadata. The critical change for IT practices is that metadata is now very important and must be safeguarded with the same priorities the data is. IT practices
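One common way to verify that a digital object has not changed is a fixity check: record a cryptographic digest as metadata at ingest, then recompute and compare it later. The sketch below is a minimal illustration using Python's standard `hashlib`; the function names and the metadata layout are assumptions, not something prescribed by the report.

```python
import hashlib


def ingest(data: bytes) -> dict:
    """Record fixity metadata for an object at the time it enters the archive."""
    return {
        "algorithm": "sha256",
        "digest": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def verify(data: bytes, metadata: dict) -> bool:
    """Return True if the object still matches the fixity metadata recorded at ingest."""
    if metadata.get("algorithm") != "sha256":
        raise ValueError("unsupported digest algorithm in metadata")
    same_size = len(data) == metadata["size"]
    same_digest = hashlib.sha256(data).hexdigest() == metadata["digest"]
    return same_size and same_digest


record = ingest(b"board minutes, 2011-04-05")
print(verify(b"board minutes, 2011-04-05", record))   # unchanged object
print(verify(b"board minutes, 2011-04-06", record))   # altered object
```

The check is only as trustworthy as the stored metadata, which is why the metadata must be safeguarded with the same priority as the data itself.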


Parting Thoughts…

● Bit Rot is a fact of life – It may never go away!
● Early detection is the first step towards a solution.
● Cloud future bright, but adoption and archive use still evolving
● Logical migration is as big an issue as physical migration
● Terminology – Key to classification and efficient long-term archive design
● Efforts have begun and your help is needed!


Copyright © 2009, Oracle and/or its affiliates. All rights reserved.

Thank you for your time and attention