©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Twitter hashtag #HPSWU
Speaker Name: Jaap van Kleef
Date: December 2010
Session ID: IM-TH-1000
HP Integrated Archive Platform
A Technical Overview
Agenda
• Information Lifecycle Management
• HP Integrated Archive Platform (IAP)
– Concepts & Architecture
• HP E-Mail Archiving Software for Exchange (EAsE)
HP Information Management solutions
Business outcomes and IM solution sets:
• E-Discovery and compliance: Information retention solution set
– Chief Compliance Officer, VP Risk Management, General Counsel, Dir. of Records/Messaging
• Business continuity and availability: Information availability solution set
– CIO, VP IT Operations, Application Directors, Backup Administration
• Storage optimization: Storage Optimization solution set
– Chief Storage Officer, VP/Director of Storage, Storage Administrator
HP Information Management solution sets
Information Availability solution set
• HP Data Protector software
• HP Database Archiving software
Information Retention solution set
• HP Integrated Archive Platform
• HP Email Archiving software for Microsoft Exchange
• HP Email Archiving software for IBM Lotus Domino
• HP File Archiving software
• HP Medical Archive solution
HP Information Management
ILM is a set of solutions and services to capture, manage, retain, and deliver information according to its business relevance
Reduce the cost of managing ever-increasing amounts of data while simultaneously transforming it into accessible, relevant business information.
− Comply with ever-changing business needs
− Leverage information for better business performance
− Automate the management of your business information
Capitalize on business information for competitive advantage
ILM addresses the three major information management challenges
Reference Information Management
• Reference information (static content) is underutilized, and the ability to tap into it has potential business value
• When you need it, reference information is of great value
Retention Management
• Corporate and government regulations require retention policies
• Companies can be placed under subpoena to produce email and documents in legal actions taken against them
Data Management
• Information growth continues at an accelerated rate
• Need to significantly reduce management costs while maintaining service levels
• Increase performance on file servers
• Reduce backup time
HP Integrated Archive Platform
Concepts and Architecture
Change - Simplicity, Agility, Value
[Figure: Traditional Total Cost of Ownership. Labels: € 5, € 1, Storage, Maintenance, Storage Software, Access Software, Application Middleware, Industry Standard Hardware.]
[Figure: Non-integrated, e.g. the traditional "Lego" approach to archiving: single points of failure, unable to scale. Components: Application, File system, CAS SW, Storage & archiving middleware, HSM, Database, Search engine, Servers, Tape Library, CAS HW, DAS, SAN or NAS storage.]
[Figure: Integrated, the HP Integrated Archive Platform (IAP). Functions: Compliance, Anti Tampering, Retention, Indexing, Search, Policy Management. Other labels: ECM, Information, e-Discovery.]
IAP is:
• Integrated storage
• Built with Grid technology
– provides scalability to billions of objects
– enables mixed HW or eases refresh
• Built on standard HW components
• Full content indexing
• Capacity optimization via single-instancing
• Fast web-search and retrieval
• WORM
• Retention management
• Low TCO via single administration
Connectivity: IP LAN
Protocols: IAP API, HTTP, SMTP
Base Unit: Starts with 5 TB
HP IAP
Large scale storage, access and content retrieval for reference information
IAP Scalability
• Simply start with a base unit
• Grow in one-cell increments (5 TB)
• Up to 250 cells (today)
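The scaling figures above can be turned into a quick sizing calculation. This is a minimal sketch that assumes every cell contributes a nominal 5 TB and that the 250-cell limit applies to the whole system; the slide does not say whether 5 TB is raw or usable capacity, and the effects of mirroring, compression and single-instancing are ignored. The function name is illustrative only.

```python
TB_PER_CELL = 5   # from the slide: growth in one-cell increments of 5 TB
MAX_CELLS = 250   # from the slide: upper limit "today"


def nominal_capacity_tb(cells: int) -> int:
    """Nominal capacity in TB, assuming each cell adds exactly 5 TB."""
    if not 1 <= cells <= MAX_CELLS:
        raise ValueError(f"cell count must be between 1 and {MAX_CELLS}")
    return cells * TB_PER_CELL


print(nominal_capacity_tb(1))          # 5
print(nominal_capacity_tb(MAX_CELLS))  # 1250
```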
IAP advantages
• An “all-in-one” complete solution – reduces complexity of integration & management
• Full content & attribute indexing & search
• Scalable grid computing architecture
• All content stored within the IAP non-tamperable architecture.
• Single instancing – storage optimization
• Disaster Tolerance enabled
IAP concepts
Flexible Grid Approach
• enables technology refresh
• allows performance upgrades
• provides scalability
IAP Grid Computing Architecture
SmartCell (storage) Fabric: distributed computer system of self-contained, all-inclusive data repositories (data grid)
Backbone Fabric
• HTTP portals
• SMTP portals
• Administration System
• Mirroring for fault tolerance
[Figure: six SmartCells (labels: Storage, Content indexing, processing power) with HTTP and SMTP portals and a firewall (FW).]
Smart Cell grid architecture
Domains and repositories
– Domains:
• Subgroups of Smart Cells within a larger grid
• Distinct policies
• Might in large organization
– Repositories
• Logical entity spanning one or more smart cells
• Multiple repositories in a domain
• Might represent a single user’s mailbox
Domains and Repositories
• Domains
– Provide physical isolation of data via physically separate SmartCells
– Consist of repositories and users
– Attributes: retention, replication, backup, audit, audit log
– Note: usually one domain per IAP system
• Repositories
– Logical entities that span SmartCells inside a domain
– Typically correspond to a user’s mailbox
– Routing rules determine which data goes to which repository
– Access is governed by access control lists (ACLs)
– Information pertaining to one repository can never be stored on media belonging to another repository
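The relationship between domains, repositories, routing rules and ACLs can be pictured with a small data model. This is only an illustrative sketch: the class names, the recipient-to-repository routing table and the user names are assumptions, not the IAP's actual data structures or API.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Logical store, e.g. one user's mailbox (hypothetical model)."""
    name: str
    acl: set = field(default_factory=set)       # users allowed to read
    objects: list = field(default_factory=list)


@dataclass
class Domain:
    """Holds repositories plus the routing rules that pick one of them."""
    name: str
    repositories: dict = field(default_factory=dict)  # repository name -> Repository
    routing: dict = field(default_factory=dict)       # recipient -> repository name

    def store(self, recipient: str, obj: str) -> None:
        # Routing rule: the recipient address decides the target repository.
        self.repositories[self.routing[recipient]].objects.append(obj)

    def read(self, user: str, repository_name: str) -> list:
        # Access is granted only to users listed in the repository's ACL.
        repository = self.repositories[repository_name]
        if user not in repository.acl:
            raise PermissionError(f"{user} may not read {repository_name}")
        return list(repository.objects)


domain = Domain("corp")
domain.repositories["alice-mailbox"] = Repository("alice-mailbox", acl={"alice", "auditor"})
domain.routing["alice@example.com"] = "alice-mailbox"
domain.store("alice@example.com", "msg-1")
print(domain.read("auditor", "alice-mailbox"))
```

Keeping the ACL on the repository rather than on individual objects mirrors the slide's statement that access is governed per repository.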
Data flow
Store path
1. Mail message or document enters system via network.
2. SMTP portal creates digital signature (CRC of message plus date/time of receipt).
3. SMTP portal encrypts digital signature using 128-bit triple-DES encryption.
4. Portal contacts Metaserver and addresses mail to Smart Cell associated with recipient’s repository.
5. Router passes message and digital signature on to selected Smart Cells (Primary and Secondary).
6. If both Smart Cells agree, they send acknowledgment back to sending application.
7. Smart Cells index and compress object, store index, digital signature, and compressed object on disk.
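The store path can be sketched in a few lines. The sketch below is a simplification under stated assumptions: the signature is a CRC-32 over the message bytes plus a timestamp string, the triple-DES encryption of the signature (step 3) and the indexing (step 7) are left out, and the `SmartCell` class and `store` function are hypothetical names rather than IAP interfaces.

```python
import zlib
from dataclasses import dataclass, field


def make_signature(message: bytes, received_at: str) -> int:
    """Step 2: CRC-32 over the message plus the date/time of receipt."""
    return zlib.crc32(message + received_at.encode("utf-8"))


@dataclass
class SmartCell:
    name: str
    disk: dict = field(default_factory=dict)  # signature -> compressed object

    def verify(self, message: bytes, received_at: str, signature: int) -> bool:
        # Each cell recomputes the CRC before agreeing to the write.
        return make_signature(message, received_at) == signature

    def commit(self, message: bytes, signature: int) -> None:
        # Step 7 (without indexing): compress and keep the object.
        self.disk[signature] = zlib.compress(message)


def store(message: bytes, received_at: str,
          primary: SmartCell, secondary: SmartCell) -> bool:
    """Return True (the acknowledgment) only if both cells agree."""
    signature = make_signature(message, received_at)
    cells = (primary, secondary)
    if not all(cell.verify(message, received_at, signature) for cell in cells):
        return False
    for cell in cells:
        cell.commit(message, signature)
    return True


primary, secondary = SmartCell("primary"), SmartCell("secondary")
print(store(b"Quarterly report attached.", "2010-12-01T09:00:00", primary, secondary))
```

Verifying on both cells before committing on either keeps the two copies in step, which is the point of the primary/secondary acknowledgment in step 6.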
Data flow
Query and retrieval path
• Query
1. Query submitted from browser through firewall.
2. HTTP portal formats query, uses information from Metaserver to determine which Smart Cells will return data.
3. Smart Cells receive multicast request in parallel; those with data perform local query.
• Retrieval
1. Each Smart Cell returns results from its search, along with results’ digital signature.
2. Metaserver receives results, time-orders them, and passes them back to HTTP portal.
3. HTTP portal computes digital signature and validates against stored digital signature before returning data to querying application.
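The query and retrieval steps amount to a fan-out, a time-ordered merge and a signature check. The following sketch illustrates that flow under assumptions: cells are plain lists of records, the fan-out is a sequential loop rather than a parallel multicast, the signature is a CRC-32 over body plus timestamp, and results that fail validation are simply dropped (the slides do not say what the portal does on a mismatch). All names are hypothetical.

```python
import zlib
from datetime import datetime


def sign(body: bytes, received_at: datetime) -> int:
    """CRC-32 over the body plus the receipt timestamp (assumed format)."""
    return zlib.crc32(body + received_at.isoformat().encode("ascii"))


def make_record(body: bytes, received_at: datetime) -> dict:
    return {"body": body, "received_at": received_at,
            "signature": sign(body, received_at)}


def local_query(cell: list, term: bytes) -> list:
    """Query step 3: a cell searches only its own data."""
    return [record for record in cell if term in record["body"]]


def query(cells: list, term: bytes) -> list:
    hits = []
    for cell in cells:                      # fan-out to every cell
        hits.extend(local_query(cell, term))
    hits.sort(key=lambda record: record["received_at"])  # time-ordering
    # Portal recomputes each signature and keeps only matching results.
    return [r for r in hits if sign(r["body"], r["received_at"]) == r["signature"]]


cell_a = [make_record(b"budget review", datetime(2010, 12, 2, 9, 0))]
cell_b = [make_record(b"budget draft", datetime(2010, 12, 1, 9, 0)),
          make_record(b"lunch", datetime(2010, 12, 3, 9, 0))]
print([r["body"] for r in query([cell_a, cell_b], b"budget")])
```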
Security - Physical
• Domain vs. repository
– Domains provide physical isolation of data
– Repositories are logical entities that span across SmartCells within a domain
– Access to repositories is controlled by ACLs
• Grid architecture
– Allows immediate isolation of information subsets without interruption
• Software installation controlled via MAC address
– All software components “locked down” to authorized hardware via the MAC to prevent unapproved hardware substitution
• Locked racks
– Strongly recommended to secure physical rack components
Security - Network
• Private internal subnet not exposed to the outside
– Built-in NAT for production network (via Firewall)
– Built-in NAT for operations network (via PCC)
• Built-in firewall (limited exposure)
– 80: HTTP (can be disabled by customer)
– 25: SMTP (no relay mechanism and protected via ‘IP to domain map’ preventing unauthorized protocol transport)
– 443: HTTPS (DoD server certificate support)
– SSH (only if technical support access required by customer)
– SNMP (can be enabled/disabled by customer) – Insight Manager
• Other
– Software installation/update controlled via MAC address
– SSL access - 128-bit encryption supported for all external HTTP access
Security - Other
– Firewall Server prevents unauthorized access into the appliance from the customer network
– File access is restricted to authorized users
– All traffic between the end-user and the appliance is protected against ‘snooping’
– URLs for retrieved data are encrypted
– User passwords are encrypted within the appliance to provide internal security
– Digital signatures identify if a file has been tampered with
Data Integrity
• Mirroring
– Motto “Never less than 2 copies of information”
– Synchronous mirroring of all data to Primary & Secondary SmartCells (WRITE ACK is a lock-step process with CRC check)
• Digital signature for detection of tampering
Administration and Management
• Single Web based built-in admin interface
• SNMP traps
• E-Mail notifications
• No need for dedicated database administrator
• SIM Manager (SIM agents in a future release)
IAP Replication Overview
• Building a mirror of the IAP system at a remote location
• Replica site provides same functionalities as primary site (search, data retrieval, …)
• Store can read-failover to replica site when primary down
• Domain based management: each IAP domain can be replicated to a different site
[Figure: replicated data (compressed) flows from Site 1 (primary site) to Site 2 (replica site); both sites serve end-user queries / retrievals.]
Replication - Components
• Primary site:
– Each SmartCell provides a list of files to replicate
– Replication manager polls SmartCells for data to replicate, then retrieves the data and sends it to the secondary as an email attachment
– Database Server is used for controlling the replication flow: access to SmartCells, and suspend or resume replication for a given domain
• Secondary (replica) site:
– SMTP servers receive data
– Replica SmartCells store data and index (optionally mirror)
Replication - Data Transfer
– Uses SMTP to send the data
– Mail envelope carries each stored file (actually a zip file)
– Header provides necessary information for the SMTP server to figure out where to store the file (group, path, file name, etc.)
– Header’s data is encrypted
– Emails are sent individually but batched 100 at a time
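The transfer format described above (one zipped file per mail, placement details in the header, batches of 100) can be mocked up with the Python standard library. This is a sketch under assumptions: the `X-Replica-*` header names are invented for illustration, the header encryption mentioned on the slide is omitted, and nothing is actually sent (in practice each message could be handed to `smtplib.SMTP.send_message`).

```python
import io
import zipfile
from email.message import EmailMessage

BATCH_SIZE = 100  # from the slide: batched 100 at a time


def build_replication_mail(group: str, path: str, file_name: str,
                           data: bytes) -> EmailMessage:
    """Wrap one stored file as a zip attachment with placement headers."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(file_name, data)

    mail = EmailMessage()
    mail["Subject"] = "replication"
    # Hypothetical header names; the real ones are not documented here.
    mail["X-Replica-Group"] = group
    mail["X-Replica-Path"] = path
    mail["X-Replica-File-Name"] = file_name
    mail.set_content("replicated object")
    mail.add_attachment(buffer.getvalue(), maintype="application",
                        subtype="zip", filename=file_name + ".zip")
    return mail


def batches(items: list, size: int = BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


mails = [build_replication_mail("domain1", "2010/12", f"msg{i:04d}.eml", b"payload")
         for i in range(250)]
print([len(batch) for batch in batches(mails)])  # [100, 100, 50]
```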
HP Integrated Archive Platform
E-Mail Archiving
IAP for: Messaging management example with Base IAP
• 2 email modes
– Policy based data movement: messages are archived via policy settings (e.g. message age > 90 days) and are replaced with a reference link
– Compliance: all messages are archived to the IAP (Exchange journaling)
• Results (for policy based archival)
– Increased scalability of individual email servers
– Fewer email servers eases administration
– Reduced storage requirements of email and file servers
Capture & index everything: all to, from, subject; body and attachments
[Figure: Outlook clients with the IAP Outlook Plugin, the Exchange or Domino email environment, and EAsE.]
Two email archive modes
Compliance Archiving
• All messages are automatically forwarded from the mail server (using journal mailbox) to the archive
• Provides a secure copy of all emails for compliance
• IT centrally sets retention policy
• Messages retrieved via Web UI
• Individuals can access their own messages
• Auditors can access the entire archive
• Designed for Legal Discovery & Compliance
Selective Archiving
• Messages are “mined” out of user mailboxes based on policies, e.g. age of message, size of message
• A “stub” is left behind in the user’s mailbox
• User double-clicks the stub to retrieve the archived message
• User can also search for and retrieve messages via Web UI
• Designed for Email Data Management
HP EAsE
Email example
[Figure labels: Mail Infrastructure (Exchange / Domino), Outlook Clients, IAP Outlook Plugin, EAsE, MAPI, IAP Archive, Extended Reference data storage.]
• IAP Archive: content searchable, scalable to billions of objects, retention management
• Outlook plugin provides transparent access to archived messages
• Automatic message archival, e.g. based on size and age, or keep all incoming mails
• Mailbox store can be reduced due to archived messages
Selective Archive Migration Rules
What are the options?
• Bodies and attachments are archived, attachments are replaced by a pointer (optionally email)
• Rules are defined per user or group of users and messaging servers
• Migration criteria: Age, size, contents, to, from, …
Best practices: keep it simple
• Archive messages > 90 days & attachment size > 150 KB: trim attachment
• Archive messages > 1 year & message size > 50 KB: trim attachment + body
• Some conditions are rarely used, such as mailbox status/quota
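The two best-practice rules can be expressed as a small decision function. This is a sketch, not EAsE's rule engine: it assumes "1 year" means 365 days, that 1 KB is 1024 bytes, and that the more aggressive rule (trim attachment + body) is checked first when both match. The `Message` fields and the returned labels are illustrative.

```python
from dataclasses import dataclass

KB = 1024  # assumption: binary kilobytes


@dataclass
class Message:
    age_days: int
    message_size: int     # total size in bytes
    attachment_size: int  # attachment size in bytes


def archive_action(message: Message) -> str:
    """Pick the archiving action for one message under the two sample rules."""
    # Rule 2: older than 1 year and larger than 50 KB overall.
    if message.age_days > 365 and message.message_size > 50 * KB:
        return "trim attachment + body"
    # Rule 1: older than 90 days with an attachment above 150 KB.
    if message.age_days > 90 and message.attachment_size > 150 * KB:
        return "trim attachment"
    return "keep in mailbox"


print(archive_action(Message(age_days=120, message_size=300 * KB, attachment_size=200 * KB)))
print(archive_action(Message(age_days=400, message_size=60 * KB, attachment_size=0)))
```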
EAsE-Outlook Plug-In
• Provides end-user with seamless integration to IAP
• Needed on client PC that uses Outlook to retrieve tombstoned messages
• Not needed when using OWA
• COM Add-In (.msi file) available on the IAP Utilities CD
• Can be automatically deployed with AD Group Policies
• Transparent to end-users (selective archive): messages in the extended storage are “stubbed” in mailbox
[Figure: a “stubbed” message.]
Offline Exchange Users
Offline Cache Synchronization
• Continuously synchronizes messages between IAP & Outlook 2003 cache
• Messages are cached based on Outlook plugin policy settings
• Offline Plugin cache size dynamically configurable
• Cache FIFO managed per retention rules & cache space available
Cached Message Access
• Tombstone message accesses from Outlook always check the local offline cache first
− Cached messages satisfied locally
− Un-cached messages retrieved from IAP
• All retrievals from IAP locally cached, subject to cache space availability
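The cache-first lookup with FIFO eviction can be illustrated as follows. This is a minimal sketch, not the plugin's implementation: capacity is counted in messages rather than bytes, retention rules are ignored, and `fetch_from_iap` is a stand-in callable for whatever retrieval mechanism the plugin uses.

```python
from collections import OrderedDict
from typing import Callable


class OfflineCache:
    """FIFO cache that is consulted before going to the archive."""

    def __init__(self, capacity: int, fetch_from_iap: Callable[[str], bytes]):
        self.capacity = capacity              # assumed unit: number of messages
        self.fetch_from_iap = fetch_from_iap  # hypothetical retrieval hook
        self._entries = OrderedDict()         # insertion order = FIFO order

    def open_message(self, message_id: str) -> bytes:
        # Local cache first: a hit is served without contacting the archive.
        if message_id in self._entries:
            return self._entries[message_id]
        # Miss: retrieve from the archive and cache the result.
        body = self.fetch_from_iap(message_id)
        self._entries[message_id] = body
        # FIFO eviction: drop the oldest inserted entries when over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return body


cache = OfflineCache(capacity=2, fetch_from_iap=lambda mid: f"body of {mid}".encode())
cache.open_message("a")
cache.open_message("b")
cache.open_message("c")  # evicts "a", the oldest entry
print(list(cache._entries))  # ['b', 'c']
```

A hit does not reorder entries, which is what distinguishes FIFO from an LRU policy.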
[Figure: Outlook 2003/2000 with the HP RIM Outlook Plugin and Offline Cache, Exchange 2K/2003, and IAP; flows numbered 1 and 2.]
EAsE Outlook Web Access (OWA) Support
OWA Support for IAP
• No loss of OWA functionality
• Enables OWA users to retrieve archived messages
• Supports Exchange 2000, 2003
• IE is the recommended browser
– Recommended by MS for OWA “Premium Service”
[Figure: OWA client, OWA Server with the IAP OWA Plugin, IAP and Exchange.]
Bringing it all together: Unified, Web-based search
All search functions are available via free API
IBM Lotus Notes & Domino Archiving
• Flexible, robust, & “Lotus-like” interface
• NSF Tools Export / Bulk Import
• Friendly administration using Notes based tools
• O/S platform support: Windows, AIX, Solaris, Linux
HP Integrated Archive Platform
Beyond E-Mail
HP IAP - the central information store
[Figure labels: Mail Servers, Messaging, File Servers, HP FMA, Print-Outs, Sharepoint, Tower Software, any other supported ISV, Clients, IAP APIs, Web-search, Retrieval.]
IAP Archive
• Reduction in TCO
• Content searchable
• Reduced number of file servers
• Eases administration
• Reduced storage requirements
File archival/move to IAP
• Migrate inactive files from Windows file servers to multiple targets
• Migration based on age and/or size of the file
• Cluster support
• Advantages for file servers:
– Faster backup and recovery
– File server capacity does not need to grow
• Failure tolerant data management software (no HSM DB needed)
File migration & recall via LAN and IAP API, CIFS, FTP
[Figure: file migration targets. Labels: IAP, NAS/disk system, FMA, MSCS Cluster, EVA, FSE Archive, FSE Server, Cache disk, E Series, MSA, Windows.]
Continue the conversation with your peers at the HP Software Community hp.com/go/swcommunity