Gilbane 2009 -- How Can Content Management Software Keep Pace?

How Can Content Management Software Keep Pace? San Francisco Gilbane Conference 2009, Content Integration Strategies. Dick Weisinger, June 4, 2009

DESCRIPTION

The amount of data stored is growing at a phenomenal rate. This paper documents the growth and suggests that a new standard, CMIS, may be useful in getting better control over data and data repositories.

TRANSCRIPT

Page 1: Gilbane 2009 -- How Can Content Management Software Keep Pace?

How Can Content Management Software Keep Pace?

San Francisco Gilbane Conference 2009
Content Integration Strategies

Dick Weisinger
June 4, 2009

Page 2: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Dick Weisinger

Vice President and Chief Technologist, Formtek, Inc.
20+ years of experience in Content, Document and Image Management
Regular blogger at http://www.formtek.com/blog

Page 3: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Formtek

An ECM software and services company
– 25-year history
Experts in general ECM and CM space
Depth of experience in engineering data management
Formtek Orion ECM Software
Alfresco Gold Integration Partner

Page 4: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Drowning in Digital Data

Hand-held devices
High-resolution video
E-Discovery / Records Management
High-End Video Games
High-Resolution Graphics and Images
Scientific Data
Digitized Business Data
Financial and Health Records
Business Continuity Backups

Analysts at Gartner Group, Forrester Research, IDC and The 451 Group all predict massive growth in digital data.

Page 5: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Size of the Digital Universe

2003 – 20 exabytes
2006 – 161 exabytes
2007 – 281 exabytes
2008 – 486 exabytes
2010 – 988 exabytes
2011 – 1800 exabytes
2012 – 2500 exabytes

(30% of data is created by enterprises) Source: IDC

One exabyte = 1 billion gigabytes or 1000 petabytes (about 250 million DVDs)

161 exabytes is the equivalent of 12 stacks of books, each extending the 93 million miles from the Earth to the Sun.
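
The growth rate implied by the IDC figures on this slide can be worked out with a little arithmetic. The following is a minimal Python sketch; the function names are illustrative and do not come from the talk, and the only data used are the year/size pairs listed above.

```python
import math

# Digital-universe sizes in exabytes, copied from the IDC figures on this slide.
SIZES_EB = {2003: 20, 2006: 161, 2007: 281, 2008: 486,
            2010: 988, 2011: 1800, 2012: 2500}

GIGABYTES_PER_EXABYTE = 10**9   # slide: one exabyte = 1 billion gigabytes
PETABYTES_PER_EXABYTE = 1000    # slide: one exabyte = 1000 petabytes


def annual_growth(start_year: int, end_year: int) -> float:
    """Compound annual growth rate between two years present in SIZES_EB."""
    years = end_year - start_year
    ratio = SIZES_EB[end_year] / SIZES_EB[start_year]
    return ratio ** (1 / years) - 1


def doubling_time_years(rate: float) -> float:
    """Years needed for the total to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    rate = annual_growth(2006, 2010)
    print(f"2006-2010 growth: {rate:.0%} per year")
    print(f"Doubling time at that rate: {doubling_time_years(rate):.1f} years")
```

On these figures, the 2006 to 2010 interval works out to growth of roughly 57 percent per year, which means the total doubles in well under two years.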

Page 6: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Data in Business and Science

Walmart adds a billion rows of data to its 600 terabyte database every hour
Chevron’s gas and oil exploration collects 2 terabytes of data daily
Large Hadron Collider in Switzerland to collect 300 exabytes per year
Department of Energy has increased their data by a factor of 10 every four years since 1990

Page 7: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Hardware’s Shrinking Cost

Year    Cost/MB
1986    $51.30
1991    $13.00
1994    $1.00
1997    $0.09
2000    $0.07
2003    $0.02
2009    $0.0002

Storage costs are plummeting, but not as fast as the amount of data is growing.
Cheap storage costs also encourage applications to store ever more data.

Page 8: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Can Software Keep Pace? How Can We Find Anything?

Search Algorithms have evolved and improved, but…
Internet Search is only Fair to Good
– Google Page-Rank
– 8+ billion web pages, hundreds of thousands of servers
Enterprise Search is Poor
– Usage patterns are hard to model

Page 9: Gilbane 2009 -- How Can Content Management Software Keep Pace?

The Problem of Search

49 percent of business users say that finding data is difficult and time consuming.
-- AIIM 2008 Market Study

Users have a 50 percent success rate at search.
-- Recommind Survey, March 2009

Page 10: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Scattered Data Repositories

Corporate Applications
– ERP
– PLM/PDM
– Business Intelligence / Knowledge Management
– Content and Document Management
Relational Databases
Local and Shared File Systems
Internet/Intranet HTTP servers
Email Servers
Disk Appliances (digital cameras, cell phone…)

Page 11: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Multiple Repository Challenge

Problem
How to access and search data to achieve:
– Compliance
– eDiscovery
– Business Intelligence

Challenge
Many organizations have multiple repositories from multiple vendors
Lack of standards around API and query language
Each system is different and has very little common reuse

Page 12: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Unstructured Data Search is Hard

80 percent of enterprise data is unstructured
– E.g., emails, PDF, Word and Office docs
No underlying data model or schema
– Emails and IM often lack context and use shorthand and abbreviations that increase the search challenge

Page 13: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Huge Data Sets Bring Huge Problems

Search gets harder as data sets grow
– Longer to index and search
– Harder to determine context
The more systems, the harder to secure
The more systems, the harder to consolidate search
Conflicting or Inconsistent Data
– Which is the system of reference?

Page 14: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Getting Data Under Control

Ultimate goal: Content Intelligence
– Knowledge extraction
– Ability to distill, condense and summarize data
How? Apply more Structure and Reuse
– XML Tags: allow greater access across data sources
– Consolidation of Systems
– Integration of Systems

Page 15: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Creating Structure

Semi-Structured Data
Use a structured native data format
– XML Authoring/Publishing applications: DITA publishing XML
– Microsoft Office 2007 docx, etc. (Office Open XML). Complex: 29 namespaces and 89 schema models
Add Structure
– Append Headers and Embedded Properties, e.g., TIFF and JPEG images, PDF and embedded Microsoft Office files
Associate tags and metadata with unstructured data
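
As an illustration of embedded properties in a structured native format, a docx file is a ZIP package whose document metadata lives in an XML part. The sketch below reads a few of those properties using only the Python standard library. It assumes the metadata part sits at the conventional name docProps/core.xml (a package may in principle locate it elsewhere through its relationships part), and the helper names and sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
CORE_PART = "docProps/core.xml"  # assumed conventional location of the part


def read_core_properties(docx_file) -> dict:
    """Return title, creator and keywords found in the package metadata."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read(CORE_PART))
    wanted = {"title": "dc:title", "creator": "dc:creator",
              "keywords": "cp:keywords"}
    props = {}
    for key, path in wanted.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            props[key] = node.text
    return props


def make_sample_package() -> io.BytesIO:
    """Build an in-memory ZIP holding only the metadata part.

    This is a stand-in for a real file so the sketch runs on its own;
    it is not a complete Word document.
    """
    core_xml = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Quarterly Report</dc:title>"
        "<dc:creator>J. Smith</dc:creator>"
        "<cp:keywords>finance; Q2</cp:keywords>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CORE_PART, core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(make_sample_package()))
```

Passing the path of a real .docx file to read_core_properties in place of the sample package should work the same way, provided the file follows the conventional layout.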

Page 16: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Centralized Repository Efficiency

Management efficiencies of scale
More efficient search
– No need to consolidate search results
Available to users via a single interface

Page 17: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Integration of Repositories

Content-Intelligence Platforms can integrate/unite multiple repositories
XML is the pipeline for integration
Integration via APIs or XML Web services
– REST Web Services have momentum
– Integration with SOA

Page 18: Gilbane 2009 -- How Can Content Management Software Keep Pace?

CMIS -- ECM Integration

ECM vendors have united to create a new interoperability standard: Content Management Interoperability Services (CMIS)
– Web services for sharing information between different content repositories
– “SQL for Document Management”
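
To give a feel for the “SQL for Document Management” idea, the sketch below assembles a CMIS-style query string in Python. The property and type names (cmis:name, cmis:objectId, cmis:document), the CONTAINS full-text predicate and the backslash escaping follow the later CMIS 1.0 query language and are assumptions here; the 0.61 draft current at the time of this talk may differ. The helper functions are hypothetical, and the sketch only builds text without contacting a repository.

```python
from typing import List, Optional


def quote_literal(value: str) -> str:
    """Quote a string literal, backslash-escaping backslashes and single quotes.

    Assumption: escaping rules as in the CMIS 1.0 query language.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_query(columns: List[str], object_type: str,
                name_like: Optional[str] = None,
                full_text: Optional[str] = None) -> str:
    """Assemble a SELECT statement over one object type."""
    query = f"SELECT {', '.join(columns)} FROM {object_type}"
    conditions = []
    if name_like is not None:
        conditions.append(f"cmis:name LIKE {quote_literal(name_like)}")
    if full_text is not None:
        conditions.append(f"CONTAINS({quote_literal(full_text)})")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    print(build_query(["cmis:objectId", "cmis:name"], "cmis:document",
                      name_like="%budget%", full_text="2009"))
```

The point of the design is that the same query text could, in principle, be sent unchanged to any repository that implements the standard.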

Page 19: Gilbane 2009 -- How Can Content Management Software Keep Pace?

What is CMIS?

Content Management Interoperability Services
– Defines a lowest-common-denominator CM capability set
– CM content is accessed as SOAP or AtomPub (REST) web services
– A single application works identically with content from any CMIS vendor

Page 20: Gilbane 2009 -- How Can Content Management Software Keep Pace?

CMIS Timeline

1993 – ODMA (Open Document Management API)
1996 – DMA (AIIM Document Management Alliance)
1996 – WebDAV (Web-based Distributed Authoring and Versioning)
2002 – JSR-170 / Java Content Repository (Day Software)
2005 – iECM (AIIM Interoperable ECM)
October 2006 – CMIS started
August 2008 – Contributing members invited
September 2008 – Draft Specification submitted to OASIS
Possible completion and acceptance in late 2009 or early 2010

Page 21: Gilbane 2009 -- How Can Content Management Software Keep Pace?

JCR versus CMIS

JCR                                   CMIS
Session-based API                     Services Based
Java Only                             Language Agnostic
“Complete” ECM                        Core ECM functions
Infrastructure                        Interoperability
Targets DM, RM, DAM, WCM…             Intended specifically for DM
Complex                               Simple
Prescriptive                          Little or No Change
Connectors by Day                     Vendor Connectors
Version 2.0                           Version .61
Design spearheaded by Day Software    Design Led by Top Tier ECM Vendors

Page 22: Gilbane 2009 -- How Can Content Management Software Keep Pace?

CMIS: Creators and Participants

Founding Companies for the Original Standard
– EMC/Documentum
– IBM/Filenet
– Microsoft
Contributing Members (after August 7, 2008)
– Alfresco
– Open Text
– Oracle
– SAP
– More …

Page 23: Gilbane 2009 -- How Can Content Management Software Keep Pace?
Page 24: Gilbane 2009 -- How Can Content Management Software Keep Pace?

CMIS – The Model

Documents
– E.g., Office document or image
– Content, Metadata and Version History
Folders
– Defines Organization and Hierarchy
– Container, Metadata and Hierarchy/Organization
Object Links and Relations
– Reference between two folders or documents
– Requires a source and target
Policies
– Set of rules that can be applied to control other objects, e.g., ACLs or retention policy
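
The four object kinds on this slide can be pictured as a small data model. The following Python sketch is only an illustration of the slide's description: the class and field names are hypothetical and are not taken from the CMIS specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CmisObject:
    """Fields shared by documents and folders (illustrative names)."""
    object_id: str
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    policy_ids: List[str] = field(default_factory=list)  # policies applied to this object


@dataclass
class Document(CmisObject):
    """Content, metadata and version history."""
    content: bytes = b""
    version_history: List[str] = field(default_factory=list)


@dataclass
class Folder(CmisObject):
    """Container that defines organization and hierarchy."""
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """Reference between two folders or documents; needs a source and a target."""
    source_id: str
    target_id: str


@dataclass
class Policy:
    """Set of rules applied to other objects, e.g. an ACL or retention policy."""
    policy_id: str
    description: str


def folder_path(folder_id: str, folders: Dict[str, Folder]) -> str:
    """Walk parent links upward to build a slash-separated path."""
    names = []
    current = folders.get(folder_id)
    while current is not None:
        names.append(current.name)
        current = folders.get(current.parent_id) if current.parent_id else None
    return "/" + "/".join(reversed(names))


if __name__ == "__main__":
    folders = {
        "f1": Folder("f1", "root"),
        "f2": Folder("f2", "reports", parent_id="f1"),
    }
    print(folder_path("f2", folders))
```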

Page 25: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Benefits of CMIS

Standardized Core ECM functions
Enables Interoperability between repositories
Encourages Flexible Application Development
Encourages ‘mash-up’ composite applications
A single application can consolidate and aggregate content from multiple CMIS repositories
Business Processes/Workflow can span and touch all enterprise content
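
The aggregation benefit can be sketched in a few lines of Python: if every repository exposes the same search operation, one application can query them all and merge the results. The Repository interface, the in-memory stand-in and the repository names below are all hypothetical; a real client would talk to each repository through its CMIS web-service binding rather than a Python dictionary.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class Repository(Protocol):
    """The common operations the application relies on (hypothetical)."""
    name: str

    def search(self, term: str) -> List[str]:
        ...


class InMemoryRepository:
    """Stand-in for a vendor repository: maps document titles to text."""

    def __init__(self, name: str, documents: Dict[str, str]) -> None:
        self.name = name
        self._documents = documents

    def search(self, term: str) -> List[str]:
        """Return titles whose text contains the term, case-insensitively."""
        needle = term.lower()
        return sorted(title for title, text in self._documents.items()
                      if needle in text.lower())


def federated_search(repositories: Iterable[Repository],
                     term: str) -> List[Tuple[str, str]]:
    """Query every repository and return (repository name, title) pairs."""
    hits: List[Tuple[str, str]] = []
    for repo in repositories:
        hits.extend((repo.name, title) for title in repo.search(term))
    return hits


if __name__ == "__main__":
    repos = [
        InMemoryRepository("vendor-a", {"Budget 2009": "Annual budget figures"}),
        InMemoryRepository("vendor-b", {"Memo": "Budget review meeting",
                                        "Policy": "Retention rules"}),
    ]
    for repo_name, title in federated_search(repos, "budget"):
        print(repo_name, title)
```

Because the application depends only on the shared interface, adding another repository means adding one more entry to the list, with no change to the search code.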

Page 26: Gilbane 2009 -- How Can Content Management Software Keep Pace?

CMIS Weak Points

Only Basic Content Functions Available
Does not cover Admin/Management
Does not cover User Authentication
Does not handle Security/Authorization

Page 27: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Applications

Workflow/Business Processes
– Connect work packages from any repository
Portals and Mash-ups
– Aggregated Content from multiple sources
E-Discovery and Compliance

Page 28: Gilbane 2009 -- How Can Content Management Software Keep Pace?

Summary

Massive Growth in Content Creation
Advances in hardware technology are fueling content creation and storage
Search and Retrieval of content grows in complexity with its volume
Content Intelligence is needed to bring understanding to data
Standards like XML and CMIS provide consistent classification and handling of data