Frank Bunn, Symantec Corporation, SNIA European Board of Directors. Classifying Information as the Foundation for ILM


Information Classification: The Cornerstone to Information Management © 2007 Storage Networking Industry Association. All Rights Reserved.


SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions:

• Any slide or slides used must be reproduced without modification.
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee.


Agenda

• ILM Overview
• Data Classification: What and why?
• Data Classification: Types, Pros and Cons
• ILM Roadmap and futures
• Summary


ILM Overview


Definition of ILM

The policies, processes, practices, services and tools used to align the business value of information with the most appropriate and cost-effective infrastructure from the time information is created through its final disposition. Information is aligned with business requirements through management policies and service levels associated with applications, metadata and data.

ILM is a standards-based, business-driven management practice


Implementing ILM

1. Data Discovery: Business Group and I.T. classify data
2. Set up Standard Policies: Define SLOs for classes
3. Standardize Practices: Tier storage

Report, Monitor, Analyze

• Information Lifecycle Management: Performance, Availability, Protection, Recovery, Archive
• Information Assurance: Compliance, Retention/Deletion, Confidentiality, Distribution


Value Proposition for ILM

Chart: storage cost per GB over implementation phases 1 to 5, without ILM vs. with ILM (today: more than $20 per GB; chart labels: $10.00, $X). Goal: reduce the TCO of storage.


Global Digital Information Growth

Source: The Expanding Digital Universe, March 2007, IDC; Merrill Lynch 2007-08 storage forecast & views from CIOs; Enterprise Strategy Group's 2006 Digital Archive study

2006: 40.25 M TB, 10,000 T files
2010: 296.4 M TB, 60,000 T files
80% is unstructured

Storage TCO:
• External disk storage purchases are projected to grow 52% annually
• Capacity is the #1 storage issue, driven by email and unstructured data
• Significant transition to disk-based archival storage
• Digital archive capacity will increase nearly tenfold between 2005 and 2010
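The growth figures above imply a compound annual growth rate (CAGR). As a quick check, a minimal sketch assuming the 2006 and 2010 capacity figures are directly comparable, and treating "nearly tenfold" as exactly 10x:

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0

# Digital information, 2006 -> 2010, in million TB (figures from the slide)
info_growth = cagr(40.25, 296.4, 4)

# Digital archive capacity, 2005 -> 2010, approximated as exactly tenfold
archive_growth = cagr(1.0, 10.0, 5)

print(f"Information, 2006-2010: {info_growth:.1%} per year")
print(f"Archive, tenfold over 5 years: {archive_growth:.1%} per year")
```

With the slide's figures this works out to roughly 65% per year for 2006 to 2010 and about 58.5% per year for a tenfold increase over five years.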


Knowledge Management

Improved productivity:
• The average knowledge worker spends six hours per week searching for information
• 50% of all searches fail to locate the desired information
• 15% of the average knowledge worker’s time is spent recreating existing information

Needs:
• Better organization of information
• Accurate search
• Consistent management of information
• Shortened “time-to-information”

Source: IDC


ILM Principles

• Information is data with context
• Information runs the business
• Information is your competitive advantage
• The business unit understands information and its value (or its cost if it is missing)
• The infrastructure will become a utility, so ultimately it’s all about the data
• ILM enables growth of resources & processes: automation of processes & infrastructure, business driven


Data Classification


Pre-Data Classification: Many Infrastructure Silos

Diagram: business applications for each line of business (LOB), each on its own infrastructure silo.

Expensive to manage, inefficient, unsustainable


Why Classify Data

Align data requirements and storage services

Data Classification enables:
• More efficient storage utilization
• Consolidated operational recovery practices
• Consolidated disaster recovery practices
• Archiving to meet compliance & other needs

Build a business case:
• Improved alignment of IT with business priorities
• Reduced hardware costs
• Improved utilization & management
• Reduced footprint
• Improved environmental resource utilization

Practiced by many IT thought leaders today:
• In various phases of definition & implementation
• Cornerstone for ILM


Tier: Tier 1 | Tier 2 | Compliance | Tape vault
Characteristics: High performance, high availability | Medium performance, high storage density | Write-once, secure, immutable | Low performance, removable
Usage: Production data | Reference data | Compliance data | Offsite DR
Relative cost (CAPEX/OPEX*): 10 | 2 | 3 | 1
Share of capacity: 20% | 35% | 30% | 15%
Weighted cost: 3.75 (62.5% reduction vs. all Tier 1)

The business case looks simple...

* CAPEX/OPEX = Capital & Operational Expenditure
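The weighted cost in the table is a capacity-weighted average of the relative tier costs. A minimal sketch that reproduces it, using only the table's numbers; the dictionary layout and function name are illustrative:

```python
# Relative cost and share of capacity per tier, as given in the tier table
tiers = {
    "Tier 1":     {"relative_cost": 10, "capacity_share": 0.20},
    "Tier 2":     {"relative_cost": 2,  "capacity_share": 0.35},
    "Compliance": {"relative_cost": 3,  "capacity_share": 0.30},
    "Tape vault": {"relative_cost": 1,  "capacity_share": 0.15},
}

def weighted_cost(tiers: dict) -> float:
    """Capacity-weighted average of the relative cost per tier."""
    total_share = sum(t["capacity_share"] for t in tiers.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"capacity shares must sum to 1, got {total_share}")
    return sum(t["relative_cost"] * t["capacity_share"] for t in tiers.values())

blended = weighted_cost(tiers)
baseline = tiers["Tier 1"]["relative_cost"]  # cost if all data stayed on Tier 1
reduction = 1 - blended / baseline

print(f"Weighted cost: {blended:.2f}")
print(f"Reduction vs. all Tier 1: {reduction:.1%}")
```

The result matches the slide's 3.75 and 62.5%; the check that the shares sum to 100% guards against a mis-entered table.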


Tiers of Networked Storage

• Few will implement all or even most levels; most will pick & choose.
• “Level” indicates characteristics, not a strict place or value in the hierarchy.
• Innovations will drive changes in characteristics, value, & relative position.

FC (Fibre Channel): for continuous access; highest reliability & performance

SAS (Serial Attached SCSI): for continuous access; highest reliability & excellent performance

SATA (Serial ATA): highest capacity with good online access

MAID (Massive Array of Idle Disks): highest levels of capacity, slightly delayed access; low footprint & power consumption

CAS (Content-Addressable Storage): specialized storage for online access; emphasis on regulatory control, single-instance & reference data

Tape: may be near-line (in a library) or offline (shelf or vault), which means access delays; “unlimited capacity” & off-site custody/safety possible; data is not in a format suited to, or available for, failover


What it might look like...

Diagram: application servers connected over a LAN and an FC/iSCSI SAN to Tier-1 and Tier-2 storage, NAS, a compliance archive, and a tape archive.

So that was easy!

But what about...
• Which data/application goes on which tier?
• When, and under what conditions, should data be moved or copied from tier to tier?
• How should data be moved: by block, by file...?
• What should be kept on the compliance archive?
• ...

That’s why we need Data Classification.


Classification is Difficult

• Stakeholders are numerous
• Huge amounts of information: do we need to classify it all?
• Determining what should be retained and what should be thrown away
• Scope
• Information is hard to find: how much risk are you willing to bear?
• Lack of buy-in from business and senior executives

But compliance is pushing execs to classify


Getting Started: Information Stakeholders

• Line of Biz Application Sponsor
• Business Process Analyst
• Records Manager
• Legal Counsel
• DBA
• I.T. Architect
• Security Officer
• Information Management Executive Committee
• Information Management Architect
• I.T. Admin


What is Data Classification?

Organization of data and information into groups for management purposes.

• Allows IT to create multiple service level offerings
• Allows LOB to select services based on the value of data
• May use software to enable some of the process

Represent corporate requirements:
• Security officer: secret, confidential, proprietary, …
• Records manager: retention time, …
• Compliance officer (HIPAA, SOX, …): authorization, retention, …

Represent LOB requirements:
• Application performance, availability, recoverability, …
• Staff response time, asset reporting, …

IT organization needs data classification:
• A method to rationalize requirements into service level offerings


Sample Class Models

Security classes (Source: U.S. Gov, ISO 17799):
• CLASS-1 Public Information
• CLASS-2 Internal Information
• CLASS-3 Confidential Information
• CLASS-4 Secret Information
• CLASS-5 Hazardous Information

Data classification model (DMF work in progress):
• Class 1: Not important to operations
• Class 2: Important for productivity
• Class 3: Business important information
• Class 4: Business vital information
• Class 5: Mission critical information

Processor model (Source: IBM Mainframe, circa 1990):
• Tier 1: Mission Critical
• Tier 2: Business Critical
• Tier 3: Business Important
• Tier 4: Productivity Important
• Tier 5: Non-Critical


With Data Classification: Standard Configurations

Diagram: the business applications of each line of business (LOB) mapped onto four standard configurations, SLA 1 to SLA 4.

Simplified management, more efficient, scalable


To Achieve Alignment, I.T. Classifies Its Resources Into A Service Catalog

Requirement: Mission Critical (SLA 1) | Business Critical (SLA 2) | Business Important (SLA 3) | Development (SLA 4)
Availability: 99.99% | 99.9% | 99% | 97%
Threshold-based automated provisioning: up to 20% of current file system allocation within 1 business day | up to 20% of current file system allocation within 2 business days | up to 10% of current file system allocation within 4 business days | up to 20% of file system allocation within 1 business week, scratch-based allocation
RTO: 15 minutes | 1 hour | 8 hours | 24 hours
RPO: 1 hour | 12 hours | 48 hours | 96 hours
Restore requests: 100 requests/week | 100 requests/week | 50 requests/week | 50 requests/week
Backup success rates: 97% | 95% | 90% | 90%
Archive policy: no access in 90 days | no access in 30 days | no access in 90 days | no access in 180 days
Archive access time: seconds | seconds | up to 4 hours | 24-48 hours
Forecasting: monthly | quarterly | yearly | yearly
Incident classification and notification: Sev. 1 < 15 minutes, Sev. 2 < 30 minutes, Sev. 3 < 1 day, Sev. 4 < 1 day | Sev. 1 < 25 minutes, Sev. 2 < 40 minutes, Sev. 3 < 1 day, Sev. 4 < 1 day | Sev. 1 < 25 minutes, Sev. 2 < 40 minutes, Sev. 3 < 1 day, Sev. 4 < 1 day | Sev. 1 < 25 minutes, Sev. 2 < 40 minutes, Sev. 3 < 1 day, Sev. 4 < 1 day
Asset reporting to support chargeback: weekly | bi-weekly | monthly | monthly
Compliance categories: HIPAA | SEC 17a-4 | Sarbanes-Oxley | None
Cost: $$$$ | $$$ | $$ | $
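A service catalog like this is essentially a lookup table that a line of business selects from. A minimal sketch, assuming a selection rule the slide does not state (choose the cheapest SLA that meets an application's availability, RTO and RPO requirements) and a 365-day year when converting availability into allowed downtime; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

@dataclass(frozen=True)
class ServiceLevel:
    name: str
    availability: float  # fraction, e.g. 0.9999
    rto_minutes: int     # recovery time objective
    rpo_minutes: int     # recovery point objective
    cost_rank: int       # 1 = cheapest ($) ... 4 = most expensive ($$$$)

    def downtime_minutes_per_year(self) -> float:
        """Unavailability allowed by the availability target."""
        return (1 - self.availability) * MINUTES_PER_YEAR

# Availability, RTO, RPO and cost values from the service catalog table
CATALOG = [
    ServiceLevel("SLA 1 (Mission Critical)", 0.9999, 15, 60, 4),
    ServiceLevel("SLA 2 (Business Critical)", 0.999, 60, 12 * 60, 3),
    ServiceLevel("SLA 3 (Business Important)", 0.99, 8 * 60, 48 * 60, 2),
    ServiceLevel("SLA 4 (Development)", 0.97, 24 * 60, 96 * 60, 1),
]

def cheapest_sla(min_availability: float, max_rto_minutes: int,
                 max_rpo_minutes: int) -> Optional[ServiceLevel]:
    """Lowest-cost service level meeting all three requirements, or None."""
    candidates = [
        s for s in CATALOG
        if s.availability >= min_availability
        and s.rto_minutes <= max_rto_minutes
        and s.rpo_minutes <= max_rpo_minutes
    ]
    return min(candidates, key=lambda s: s.cost_rank, default=None)

# Hypothetical application: 99.5% availability, recover within 4 hours,
# lose at most one day of data
choice = cheapest_sla(0.995, 4 * 60, 24 * 60)
print(choice.name if choice else "no SLA fits")

for s in CATALOG:
    print(f"{s.name}: about {s.downtime_minutes_per_year():.0f} min downtime/year")
```

For the hypothetical application above, SLA 3 fails the availability requirement, so the cheapest fit is SLA 2. A 99.99% target leaves roughly 53 minutes of downtime per year.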


The Classification Process: Types of Classification

• Application-based
• Metadata (and extended metadata)
• Content


Application Classification

Focused on Business Applications

Drivers for application classification:
• Disaster recovery and business continuity
• Server consolidation
• Application performance

Application Classification is fairly “simple”:
• Establishes a ranking of applications
• All information associated with an application is treated the same
• Works best when applications are segmented by server

Application Classification is often “good enough”
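Application classification amounts to a lookup: rank the applications, then let all data inherit its application's rank. The sketch below assumes, hypothetically, that applications are segmented by server (as recommended above) and that servers missing from the inventory fall into the lowest class; all application names, server names and ranks are made up for illustration:

```python
# Hypothetical ranking of business applications (1 = most critical)
APP_RANK = {"order-entry": 1, "email": 2, "intranet": 3, "test-builds": 4}

# Hypothetical server inventory: each application runs on its own servers
SERVER_TO_APP = {
    "srv-oe-01": "order-entry",
    "srv-oe-02": "order-entry",
    "srv-mail-01": "email",
    "srv-web-01": "intranet",
    "srv-dev-01": "test-builds",
}

LOWEST_RANK = max(APP_RANK.values())

def classify_by_application(server: str) -> int:
    """All data on a server inherits the rank of the application on that server.

    Servers not in the inventory fall back to the lowest rank (an assumption).
    """
    return APP_RANK.get(SERVER_TO_APP.get(server), LOWEST_RANK)

files = [
    ("srv-oe-01", "/data/orders.db"),
    ("srv-oe-01", "/tmp/debug.log"),
    ("srv-dev-01", "/builds/nightly.tar"),
]
for server, path in files:
    print(f"{path} on {server} -> class {classify_by_application(server)}")
```

Note that /tmp/debug.log lands in class 1 alongside the order database: every file on the server is treated the same, which is the simplicity, and the limitation, of this approach.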


Metadata-based Classification

Metadata classification is largely based on file attributes and access patterns:

• What is the file named?
• What is the file type?
• Who owns the data?
• Where is it located?
• When was it created?
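These questions map directly onto file-system attributes. A minimal sketch using Python's `pathlib`; the function names and the 90-day idle threshold in `is_archive_candidate` are hypothetical, and access times are only meaningful if the file system records them (mounts using `noatime` or `relatime` may not update them on every read):

```python
import datetime as dt
from pathlib import Path
from typing import Optional

def file_metadata(path: str) -> dict:
    """Collect basic file-system attributes for metadata-based classification."""
    p = Path(path)
    st = p.stat()
    try:
        owner = p.owner()  # user name of the file's owner
    except (NotImplementedError, KeyError):
        # Unsupported on some platforms, or the UID has no user name
        owner = str(st.st_uid)
    return {
        "name": p.name,
        "type": p.suffix.lower() or "(none)",
        "owner": owner,
        "location": str(p.resolve().parent),
        "size_bytes": st.st_size,
        "modified": dt.datetime.fromtimestamp(st.st_mtime),
        "last_accessed": dt.datetime.fromtimestamp(st.st_atime),
        # st_ctime is creation time on Windows but last metadata change on Unix,
        # so a true creation date is not portable
        "ctime": dt.datetime.fromtimestamp(st.st_ctime),
    }

def is_archive_candidate(meta: dict, idle_days: int = 90,
                         now: Optional[dt.datetime] = None) -> bool:
    """Hypothetical access-pattern rule: archive files idle for idle_days or more."""
    now = now or dt.datetime.now()
    return now - meta["last_accessed"] > dt.timedelta(days=idle_days)
```

Rules like this are fast and non-invasive because they never open the file, but they know nothing about what the file contains.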


Metadata-based Classification

File-level metadata offers limited input, therefore limited recognition

• Still useful, but the class of solutions is limited
• Generally useful in optimizing HSM or archiving strategies
• Tends not to meet complex ILM needs (security, retention, etc.)

When combined with valid ownership data (not “System”), it can yield incremental ILM value

Example: Legal dept = {Joe, Mary, Betty, Tom, Matt}; Rule: all legal files are stored in host_legal, retain = 5 yrs

Pros & Cons:
• Pro: fast, lightweight, not invasive
• Con: doesn’t address changing business value over time
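Once ownership data is trustworthy, the legal-department rule is easy to express in code. A sketch, assuming retention is counted from the file's creation date (the slide does not say) and using the hypothetical names `apply_legal_rule` and `Placement`:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Legal department membership, from the slide example
LEGAL_DEPT = {"Joe", "Mary", "Betty", "Tom", "Matt"}
RETAIN_YEARS = 5

@dataclass(frozen=True)
class Placement:
    host: str
    retain_until: date

def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def apply_legal_rule(owner: str, created: date) -> Optional[Placement]:
    """Rule: all legal files are stored in host_legal and retained for 5 years.

    Returns None when the rule does not apply. A generic owner such as
    "System" never matches, which is why valid ownership data matters.
    """
    if owner not in LEGAL_DEPT:
        return None
    return Placement(host="host_legal", retain_until=add_years(created, RETAIN_YEARS))

print(apply_legal_rule("Mary", date(2007, 3, 1)))
print(apply_legal_rule("System", date(2007, 3, 1)))
```

The rule never looks at file content, so it stays fast, but a document's value cannot change unless its owner does.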


Beyond Metadata – Extended Metadata

The Basics: Name, Size, Date, Owner, Creation Date, Modification Date

Ownership: Creator, Last Updated By, Department, Division, Application, Project ID, etc.

Identification: File Format, Version, Related Transaction, Related Content Objects, Parent Object, Child Objects, Bar Code Tracking ID, Radio Frequency ID, etc.

Access Control: Security Clearance, Access Control List, Browse Privileges, Read Privileges, Write Privileges, Sharing Policy, etc.

Compliance: Retention Policy, Expunge Date, Industry Regulation Flag, Corp Governance Flag, Attorney-Client Priv Flag, etc.

Process Control: Approval Status, Lifecycle Phase, Workflow Routing, Send To Rules, Next Approver, etc.


Content Classification

• Classification based on content makes use of indexes, lexicons and taxonomies
• What keywords?
• How is this data related to other data?
• How should data be retained/disposed of for compliance, or otherwise used by the business?


Some Relevant Terms

Data: what I.T. manages (files, volumes, bits and bytes)

Information: data with context. The data lifecycle supports the information lifecycle.

Record: Recorded information, regardless of medium or characteristics, made or received by an organization that is evidence of its operations, and has value requiring its retention for a specific period of time (ARMA)

Taxonomy: A hierarchical structure used for categorizing a body of information or knowledge, allowing an understanding of how that body of knowledge can be broken down into parts, and how its various parts relate to each other. Taxonomies are used to organize information in systems, thereby helping users find it. Related terms: ontology, categories, evidence structures

Lexicon: The vocabulary of a language, an individual speaker or group of speakers, or a subject. Example: a dictionary of over 200,000 medical, pharmaceutical, biomedical & healthcare acronyms and abbreviations is a medical lexicon. Related terms: thesaurus, vocabulary


When Does Content Classification Make Sense?

Automated classification speeds time-to-information and improves effectiveness. Automated content classification makes sense:
• When multiple classification options result in confusion
• When there is an overwhelming volume of items to classify
• When some documents require time-consuming review by subject matter experts
• When there is a large number of non-business documents
• When you don’t want idiosyncratic results

“The highest quality and accuracy occurs when records management is as non-intrusive as possible to the desktop end users and does not interfere with the normal work routines of professional staff in the enterprise”**

** Timothy J. Sprehe and Charles R. McClure, “Lifting the Burden,” Information Management Journal, Vol. 39, Issue 4 (Jul/Aug 2005), 475


Content Classification Algorithms

All content-based classification is based on “natural language”.

Content classification algorithms:
• Keywords
• Term frequency
• Pattern matching
• Latent semantic analysis (synonymy and polysemy)
• Neural networks
• Bayesian
• Rules-based
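Of the algorithms listed, the Bayesian approach is compact enough to sketch. Below is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing; the training snippets and the category names "mortgage" and "hr" are hypothetical, and a real deployment would need far more training data plus handling of stop words, stemming, and similar concerns:

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training documents
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Label with the highest log posterior, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                # Smoothing keeps unseen words from zeroing out the probability
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayesClassifier()
nb.train("loan application mortgage rate escrow", "mortgage")
nb.train("mortgage refinance interest rate appraisal", "mortgage")
nb.train("employee vacation policy benefits", "hr")
nb.train("payroll employee benefits enrollment", "hr")

print(nb.classify("new mortgage rate quote"))
print(nb.classify("employee benefits question"))
```

Log probabilities are summed instead of multiplying raw probabilities to avoid floating-point underflow on long documents.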


Example of Content Classification (Rules-based)

Objective: Find and classify all documents at a mortgage company into category “New Homes Built in Fresno”

Classification rules:
• Document has “Fresno” in the title (metadata)
• Document refers to homeowner/builder “Perry” or “Trendmaker” (keyword)
• Document includes “Fresno” in the text (keyword)
• Document uses abbreviations or “regular expressions” (entity extraction), e.g. “4 bdrm”, “5/2.5/3”

Comprehensive rule: if “Fresno” is true, “Perry Homes” is true, and the document contains a numeric string x such that 2003 < x < 2007 and/or uses the above regular expressions, then classify it as a new home being built by Perry Homes in Fresno.

Secondary objective: relate and group the plot plan, land survey, and deed based on this classification of documents.
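The comprehensive rule translates almost directly into a small rules-based classifier. This is a sketch under stated assumptions: the regular expressions for “4 bdrm” and “5/2.5/3” are illustrative guesses at the patterns, the builder check uses the literal phrase “Perry Homes”, and the function name is hypothetical:

```python
import re

# Illustrative guesses at the entity-extraction patterns named on the slide
ABBREVIATION_PATTERNS = [
    re.compile(r"\b\d+\s*bdrm\b", re.IGNORECASE),  # e.g. "4 bdrm"
    # e.g. "5/2.5/3" (beds/baths/garage); caution: also matches dates like 3/15/2005
    re.compile(r"\b\d+/\d+(?:\.\d+)?/\d+\b"),
]
YEAR_PATTERN = re.compile(r"\b\d{4}\b")


def is_new_perry_home_in_fresno(title: str, text: str) -> bool:
    """Apply the comprehensive rule to one document's title and body text."""
    body = text.lower()
    fresno = "fresno" in title.lower() or "fresno" in body  # metadata or keyword
    perry = "perry homes" in body                            # builder keyword
    in_year_range = any(2003 < int(y) < 2007 for y in YEAR_PATTERN.findall(text))
    has_abbreviation = any(p.search(text) for p in ABBREVIATION_PATTERNS)
    # "and/or": either the year condition or the abbreviation condition suffices
    return fresno and perry and (in_year_range or has_abbreviation)


docs = [
    ("Fresno listing", "Perry Homes: 4 bdrm, completed 2005"),
    ("Fresno listing", "Trendmaker: 4 bdrm"),
    ("Plot plan", "Perry Homes in Fresno, 5/2.5/3"),
]
for title, text in docs:
    print(f"{title!r}: {is_new_perry_home_in_fresno(title, text)}")
```

Each condition is computed separately so that individual rules can be logged or tuned; the comment on the second pattern shows why pattern-based entity extraction usually still needs human review of the results.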


Building Taxonomies

A taxonomy is a classification scheme that makes it easy for users to find information based on familiar hierarchies.
• Industry-specific lexicons and taxonomies are available
• Taxonomies should maintain policies and rules as the industry and business environment change
• Taxonomies should leverage existing thesauri and glossaries

Diagram: biological taxonomy as an example hierarchy: Kingdom > Phylum > Class > Order > Family > Genus > Species
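A taxonomy is naturally a tree. The sketch below builds a small slice of the biological hierarchy from the diagram (using the domestic cat and the gray wolf as example species) and shows two basic taxonomy operations: listing a category's broader terms from the root down, and finding the narrowest category two entries share. `TaxonomyNode` and `common_ancestor` are hypothetical names for illustration:

```python
from typing import List, Optional


class TaxonomyNode:
    """One category in a hierarchical taxonomy, linked to its broader term."""

    def __init__(self, name: str, rank: str, parent: Optional["TaxonomyNode"] = None):
        self.name = name
        self.rank = rank
        self.parent = parent
        self.children: List["TaxonomyNode"] = []

    def add(self, name: str, rank: str) -> "TaxonomyNode":
        """Create a narrower category under this one and return it."""
        child = TaxonomyNode(name, rank, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List["TaxonomyNode"]:
        """Nodes from the root down to this node (broadest to narrowest)."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def find(self, name: str) -> Optional["TaxonomyNode"]:
        """Depth-first search for a category by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def common_ancestor(a: TaxonomyNode, b: TaxonomyNode) -> Optional[TaxonomyNode]:
    """Narrowest category that contains both a and b."""
    shared = None
    for x, y in zip(a.lineage(), b.lineage()):
        if x is not y:
            break
        shared = x
    return shared


root = TaxonomyNode("Animalia", "Kingdom")
carnivora = root.add("Chordata", "Phylum").add("Mammalia", "Class").add("Carnivora", "Order")
carnivora.add("Felidae", "Family").add("Felis", "Genus").add("Felis catus", "Species")
carnivora.add("Canidae", "Family").add("Canis", "Genus").add("Canis lupus", "Species")

cat = root.find("Felis catus")
print(" > ".join(f"{n.rank}: {n.name}" for n in cat.lineage()))
print(common_ancestor(cat, root.find("Canis lupus")).name)
```

The same structure works for a business taxonomy: replace the ranks with levels such as department, record series and document type.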


Classification: What is “Good Enough”?

Challenges of classification (various types):
• Some human intervention is always required to review the results of classification
• Automated tools improve efficiency

Documents with little text (PowerPoint slides, email, etc.): how are these classified?
• Varying document types: metadata classification might be better in this case
• Lack of consistency in naming, structure, format: metadata classification may be best

Factors affecting accuracy:
• Document consistency / naming consistency
• The strength of the taxonomy (content)
• Applicability of classification algorithms to specific content

• What is a reasonable cost per document?
• What is the cost of a document that is incorrectly classified?
• Does the cost outweigh the value to the organization?


Additional Definitions

Indexing = the act of preparing documents to be searched later; specifically, recording where in the document a word/phrase/string occurs

Search = the act of looking for something in documents

Clustering = the act of grouping related documents and using these groups to generate dynamic categories; statistical/semantic analysis generates a vector for each document, and documents are grouped along similar vectors

Classification = the act of organizing/sorting/storing documents into pre-defined categories; includes concept recognition, pattern matching, entity extraction, thesaurus, misspelling, related terms, broad/narrow terms, etc.
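The indexing definition (recording where in a document each word occurs) corresponds to a positional inverted index. A minimal sketch with a hypothetical `PositionalIndex` class, naive lowercase tokenization, and Boolean AND search:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Inverted index: word -> document id -> word positions in that document."""

    def __init__(self) -> None:
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: str, text: str) -> None:
        """Index a document, recording each word's position."""
        for pos, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(pos)

    def search(self, query: str) -> set:
        """Documents containing every query word (Boolean AND)."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self.postings.get(words[0], {}))
        for w in words[1:]:
            results &= set(self.postings.get(w, {}))
        return results

    def positions(self, word: str, doc_id: str) -> list:
        """Where a word occurs in one document (empty if absent)."""
        return list(self.postings.get(word.lower(), {}).get(doc_id, []))


idx = PositionalIndex()
idx.add("d1", "Mortgage rate for new homes in Fresno")
idx.add("d2", "Fresno land survey and plot plan")
idx.add("d3", "Employee benefits policy")

print(sorted(idx.search("fresno")))
print(sorted(idx.search("Fresno mortgage")))
print(idx.positions("fresno", "d1"))
```

Matching is exact, so a query for "Fresmo" finds nothing. That is the limitation listed for indexing versus content classification: no handling of misspellings, query expansion or thesauri.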


Indexing versus Content Classification

Index when:
• Keyword searches alone are sufficient
• You are looking to find information quickly within a particular file or document

Aspects:
• Search may return too many matches
• Can be a security hole if indexed by “system”
• Proprietary formatting issues
• Not all formats can be indexed
• Provides objective analysis of the textual information found
• Does nothing about misspellings, query expansion, thesauri, etc.


ILM Roadmap


• Phase 1, Consolidate Data Services, Implement Network Storage: foundation
• Phase 2, Standardize Data & Storage Services: classify data against SLOs; tier storage & data services into consistent configurations in support of SLOs
• Phase 3, ILM Solution Stacks: operate to ILM practices per SLAs
• Phase 4, Automated ILM Islands: automate with ILM management tools
• Phase 5, Enterprise ILM: heterogeneous interoperability; standard practices across multiple sites


SMI: Technology Road Map

Timeline, 2002 to 2006: Bluefin contribution to SNIA; initial release: SMI-Specification V1.0; broadened coverage, moving up “the stack”: SMI-Specification V1.1 and V1.2; deeper functionality: SMI-S V2.+.

Items shown along the timeline: ICTP tests, CIM storage profiles, SLP discovery, “recipes” for interoperable operations, SMI-S test specification, SMI-Lab validation; arrays, switches, libraries, hosts; NAS, storage security, iSCSI, cascading, ownership, management services, policy; health/fault management, policy improvements, object-based storage, performance, locking; databases, applications, QoS, single sign-on, ILM. CIM versions: 2.7, 2.8, 2.9, 2.x, 3.x, CIM-SOAP.


XAM (eXtensible Access Method)



Summary

Classification: immediate benefits
• Better understanding of your information
• Better deployment and alignment of your I.T. resources (storage/server consolidation, “smart” purchases, etc.)
• Better compliance readiness and eDiscovery

Classification: longer-term benefits
• Service level management improves I.T. service delivery
• Information management automation
• Cost reduction


To Achieve Alignment, I.T. Classifies Its Resources Into A Service Catalog

| Requirement | Mission Critical (SLA 1) | Business Critical (SLA 2) | Business Important (SLA 3) | Development (SLA 4) |
|---|---|---|---|---|
| Availability | 99.99% | 99.9% | 99% | 97% |
| Threshold-based automated provisioning | Up to 20% of current file system allocation within 1 business day | Up to 20% of current file system allocation within 2 business days | Up to 10% of current file system allocation within 4 business days | Up to 20% of file system allocation within 1 business week; scratch-based allocation |
| RTO | 15 minutes | 1 hour | 8 hours | 24 hours |
| RPO | 1 hour | 12 hours | 48 hours | 96 hours |
| Restore requests | 100 requests/week | 100 requests/week | 50 requests/week | 50 requests/week |
| Backup success rate | 97% | 95% | 90% | 90% |
| Archive policy | No access in 90 days | No access in 30 days | No access in 90 days | No access in 180 days |
| Archive access time | Seconds | Seconds | Up to 4 hours | 24-48 hours |
| Forecasting | Monthly | Quarterly | Yearly | Yearly |
| Incident classification and notification | Sev. 1 < 15 minutes; Sev. 2 < 30 minutes; Sev. 3 < 1 day; Sev. 4 < 1 day | Sev. 1 < 25 minutes; Sev. 2 < 40 minutes; Sev. 3 < 1 day; Sev. 4 < 1 day | Sev. 1 < 25 minutes; Sev. 2 < 40 minutes; Sev. 3 < 1 day; Sev. 4 < 1 day | Sev. 1 < 25 minutes; Sev. 2 < 40 minutes; Sev. 3 < 1 day; Sev. 4 < 1 day |
| Asset reporting to support chargeback | Weekly | Bi-weekly | Monthly | Monthly |
| Compliance categories | HIPAA | SEC 17a-4 | Sarbanes-Oxley | None |
| Cost | $$$$ | $$$ | $$ | $ |
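A service catalog like this lends itself to a simple data model. The following is a minimal Python sketch, not part of the SNIA material: the `ServiceLevel` class, the `CATALOG` dictionary and the helper functions are hypothetical names chosen for illustration, and only a few rows of the table are encoded. It shows two things the table implies: the yearly downtime budget behind each availability figure (99.99% of an 8,760-hour year leaves about 0.88 hours, roughly 53 minutes, of downtime), and how the "No access in N days" archive policy could be checked for a file given its last access date. Reading "no access in N days" as "at least N days since the last access" is an assumption.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical encoding of a few rows of the service catalog table.
# Field names are illustrative only; SNIA does not define them.
@dataclass(frozen=True)
class ServiceLevel:
    name: str
    availability_pct: float   # e.g. 99.99 for Mission Critical
    rto_hours: float          # recovery time objective
    rpo_hours: float          # recovery point objective
    archive_after_days: int   # "No access in N days" archive policy


CATALOG = {
    "SLA 1": ServiceLevel("Mission Critical", 99.99, 0.25, 1, 90),
    "SLA 2": ServiceLevel("Business Critical", 99.9, 1, 12, 30),
    "SLA 3": ServiceLevel("Business Important", 99.0, 8, 48, 90),
    "SLA 4": ServiceLevel("Development", 97.0, 24, 96, 180),
}

HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored


def allowed_downtime_hours(level: ServiceLevel) -> float:
    """Yearly downtime budget implied by the tier's availability percentage."""
    return HOURS_PER_YEAR * (1 - level.availability_pct / 100)


def archive_eligible(level: ServiceLevel, last_access: date, today: date) -> bool:
    """True once a file has gone unaccessed for at least the tier's threshold.

    Assumption: "No access in N days" means N or more days since last access.
    """
    return (today - last_access).days >= level.archive_after_days


if __name__ == "__main__":
    # Print the downtime budget for every tier in the catalog.
    for sla, level in CATALOG.items():
        print(f"{sla} ({level.name}): {allowed_downtime_hours(level):.2f} h/year")
```

Keeping the catalog as plain data rather than hard-coding thresholds in the checks means a policy engine could evaluate any tier with the same two functions; a real ILM tool would also need the remaining rows (provisioning, restore limits, incident targets), which this sketch leaves out.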


Storage Networking Industry Association

• SNIA is the trade group for storage networks, “ensuring that storage networks become complete and trusted solutions across the IT community”: http://www.snia.org

• SNIA’s “Dictionary of Storage Networking Terminology” online resource: http://www.snia.org/dictionary

• SNIA’s Data Management Forum, and its ILM Initiative, is an excellent information resource for data and information lifecycle management: http://www.snia.org/dmf


Q&A / Feedback

Please send any questions or comments on this presentation to SNIA: [email protected]