Computational Storage Services (WP7 ForgetIT 1st Year Review)
TRANSCRIPT
Concise Preservation by combining Managed
Forgetting and Contextualized Remembering
Simona Rabinovici-Cohen
IBM Research - Haifa
WP 7 Presentation: Computational Storage Services
ForgetIT 1st Review Meeting, April 29-30, 2014
Kaiserslautern, Germany
WP Objectives
• Increase the value and outcome of preserved information over time
–Provide additional incentive for preservation
–Increase return-on-investment (ROI)
• Transform the generic storage service to a richer service with
potentially higher business value and automated preservation
processes
Focus of Year 1
• Build a consolidated platform for objects and computational
processes (storlets) that will be defined, triggered and executed
close to the data
• Utilize the OpenStack Swift open source for cloud storage
ForgetIT Project GA600826, 1st Review Meeting, Kaiserslautern, April 2014
Objectives of WP and Year 1 Focus
Role in Preserve-or-Forget Architecture
Leveraged PDS and Storlet Engine adding:
Adapt Preservation Engine for ForgetIT
Rules mechanism
Storlets at interface proxy servers and local object servers
Multiple programming languages for storlets
New storlets:
image transformation storlet
fixity storlet
concept detection storlet
Searchable metadata contributions to OpenStack community
Integration with whole ForgetIT framework
Co-chair LTR group in SNIA to develop SIRF
Achievements in Year 1
Preservation DataStores (PDS)
• PDS offloads some archiving functionality to:
–Decrease probability of data loss
–Simplify the applications
–Provide improved performance and robustness
• Supports automation of archiving processes
• Provides computational storage via Storlet Engine
• PDS was also the storage infrastructure of the EU research projects CASPAR and ENSURE, with partners: European Space Agency, Maccabi HMO, Tessella, Philips and more
PDS in OAIS
Functional Model
AIP
• OAIS is the ISO standard reference model for preservation (ISO 14721:2003)
• Provides fundamental ideas, concepts and a reference model for long-term archives
• Archival Information Package (AIP) - a logical structure for the preservation object that needs to be stored to enable future interpretation
• Content Data Object (CDO) –raw data to be preserved
PDS
DSpace and PDS
PDS Data Model
Hierarchical data model
Tenant → Aggregation and Tenant → Docket → Object (AIP)
Flexible organization of assets in collections with varied preservation policies (gold, silver, bronze)
Aggregations support dynamic and transparent configuration of data management
[Figure: two example hierarchies. Tenant "Peter Stainer" with aggregations "Private Photos (gold)" and "Business Photos (silver)", dockets "Costa Rica 2013" and "Edinburgh", and objects (AIPs) tagged Metadata:aggregation=Private or Metadata:aggregation=Business. Tenant "Spielwarenmessen" with aggregation "Press Releases (gold)", docket "Toy Conference 2014", and two objects (AIPs) tagged Metadata:aggregation=Press.]
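As a rough illustration of the hierarchy on this slide, the following Python sketch models tenants, aggregations, dockets and objects. The class and method names (`Tenant`, `Docket`, `PreservationObject`, `policy_for`, `objects_in`) and the object file names are assumptions made for this example; they are not PDS APIs, and the slide does not say which object belongs to which docket.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreservationObject:
    """One stored object (AIP); its aggregation is recorded as metadata."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Docket:
    """A named collection of objects under a tenant."""
    name: str
    objects: List[PreservationObject] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    # aggregation name -> preservation policy (gold, silver, bronze)
    aggregations: Dict[str, str] = field(default_factory=dict)
    dockets: Dict[str, Docket] = field(default_factory=dict)

    def policy_for(self, obj: PreservationObject) -> str:
        """Resolve an object's policy through its aggregation metadata."""
        return self.aggregations[obj.metadata["aggregation"]]

    def objects_in(self, aggregation: str) -> List[PreservationObject]:
        """All objects, across dockets, tagged with the given aggregation."""
        return [
            obj
            for docket in self.dockets.values()
            for obj in docket.objects
            if obj.metadata.get("aggregation") == aggregation
        ]


# Hypothetical content modelled on the first example tenant in the figure.
tenant = Tenant("Peter Stainer",
                aggregations={"Private": "gold", "Business": "silver"})
trip = Docket("Costa Rica 2013")
trip.objects.append(PreservationObject("beach.jpg", {"aggregation": "Private"}))
trip.objects.append(PreservationObject("meeting.jpg", {"aggregation": "Business"}))
tenant.dockets[trip.name] = trip
```

Because each object refers to its aggregation only by name, changing the policy of an aggregation takes effect for all of its objects without touching them, which is one way to read "dynamic and transparent configuration".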
The Need for Computational Object Storage
• “Data is the new oil” (Clive Humby)
–In its raw form, oil has little value
–Once processed and refined, it helps power the world
• Data deluge of content depots and unstructured data
–Documents, medical images, photos, videos, etc.
–The fastest growing type of storage by volume
–Object storage is ideal for this type of data
• Object storage for content depots generally:
–Utilizes large bandwidth to serve big data over the WAN
–Uses server-based storage with underutilized CPUs
• Process and refine the data where it is stored
–Create a computational object storage with storlets
Client Value for Using Storlets
• Reduce bandwidth – reduce the number of bytes transferred over the WAN
–e.g. Analytics storlet
• Enhance security – reduce exposure of sensitive data
–e.g. De-identification storlet
• Save costs – consolidate generic functions that can be used by many applications while saving infrastructure at the client side
–e.g. Curation storlet
• Support compliance – monitor and document the changes to the objects and improve provenance tracking
–e.g. Transformation storlet
Storlet Engine Architecture
Rules Mechanism
• Enables automatic conditional invocation of storlets
• Explicit storlet activation overrides implicit activation
• Rules are kept as a per-tenant editable object with specified access control
• Rules are configured by tenant, user, role, container, object and content_type
• Wildcards (“*”) are allowed in a rule, for high flexibility
• Rules form a prioritized list: the first rule that matches the input is activated
• Examples:
–De-Identification (per role)
–Transformation (per content type)
–Fixity (per docket)
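The first-match behaviour described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Storlet Engine's actual rule format: the rule dictionaries, the storlet names, the `select_storlet` function and the idea that a docket maps to a container named `docket-*` are all hypothetical. Wildcards are matched with the standard-library `fnmatch` module, which also gives `?` and `[...]` a special meaning.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Attributes a rule may constrain, as listed on the slide.
FIELDS = ("tenant", "user", "role", "container", "object", "content_type")

# Prioritized list: earlier rules win. All values are illustrative only.
RULES = [
    {"role": "researcher", "storlet": "deidentification"},     # per role
    {"content_type": "image/*", "storlet": "transformation"},  # per content type
    {"container": "docket-*", "storlet": "fixity"},            # per docket (assumed naming)
]


def matches(rule: dict, request: dict) -> bool:
    """A field missing from the rule is treated as the wildcard "*"."""
    return all(
        fnmatchcase(request.get(name, ""), rule.get(name, "*"))
        for name in FIELDS
    )


def select_storlet(request: dict, explicit: Optional[str] = None) -> Optional[str]:
    """Explicit activation overrides the rules; otherwise the first match wins."""
    if explicit is not None:
        return explicit
    for rule in RULES:
        if matches(rule, request):
            return rule["storlet"]
    return None  # no rule matched, so no storlet is invoked implicitly
```

For example, a request with role `researcher` for an `image/png` object selects the de-identification storlet, because that rule appears first in the list even though the transformation rule also matches.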
Storlets at proxy node and object node
[Figure: Swift cluster diagram with proxy nodes behind a virtual IP, account nodes (SSD), container nodes (SSD) and object nodes (HDD), connected through L2 rack switches (1GB Ethernet) and L3 switches (10GB Ethernet). The Storlet Engine is shown next to the proxy service on the Swift proxy node and next to the object service on the Swift object node.]
Fixity Storlet
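The slide itself gives no detail on how the fixity storlet works. As a general illustration of a fixity check (recompute a digest over the stored bytes and compare it with the digest recorded earlier), here is a minimal Python sketch. The function names `compute_digest` and `check_fixity`, the choice of SHA-256 and the sample data are assumptions for this example, not the storlet's actual interface.

```python
import hashlib
import io
from typing import BinaryIO


def compute_digest(stream: BinaryIO, algorithm: str = "sha256",
                   chunk_size: int = 65536) -> str:
    """Hash a stream in chunks so large objects need not fit in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def check_fixity(stream: BinaryIO, expected_hex: str,
                 algorithm: str = "sha256") -> bool:
    """Return True if the stream's digest equals the recorded digest."""
    return compute_digest(stream, algorithm) == expected_hex.lower()


# Example: a digest recorded at ingest time is verified later.
original = b"preserved content"
recorded = compute_digest(io.BytesIO(original))
intact = check_fixity(io.BytesIO(original), recorded)
corrupted = check_fixity(io.BytesIO(b"preserved c0ntent"), recorded)
```

Running such a check next to the data means only the short digest or the pass/fail result has to leave the storage node, rather than the whole object.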
• Papers
• S. Rabinovici-Cohen, E. Henis, J. Marberg, K. Nagin, “Storlet Engine: Performing
Computations in Cloud Storage”, to be submitted
• S. Rabinovici-Cohen, R. Cummings, S. Fineberg, “Self-contained Information
Retention Format For the, to be submitted
• Posters
• S. Rabinovici-Cohen (IBM), M. Baker (HP), R. Cummings (Antesignanus), S. Fineberg
(HP), E. Henis (IBM), "Self-contained Information Retention Format (SIRF) in
ForgetIT EU Project", 6th International Systems and Storage Conference (SYSTOR),
2013
• Other Dissemination Activities
• The Storage Networking Industry Association (SNIA) published in its March 2013
Newsletter that SNIA Long Term Retention group formed a liaison with ForgetIT
• The tutorial "Combining SNIA Cloud, Tape and Container Format Technologies for
the Long Term Retention of Big Data" is given at several SNIA conferences
• Deliverables
• D7.1: Foundation of Computational Storage Services
• D7.2: Computational Storage Services First Release
Publications
Thank you for your attention!