
INDIGO DataCloud

DESY Cloud, The Scientific Data Cloud
Managed Shared Storage
At the “ownCloud Connects Business” workshop

Dr. Patrick Fuhrmann, Quirin Buchholz, Tigran Mkrtchyan, Peter van der Reest, Lusine Yakovleva

June 1, 2016, Frankfurt · Patrick Fuhrmann et al. · The Scientific Data Cloud @ ownCloud Connects Business

Content

• Storage @ DESY?
• Sync’n Share at DESY
  • Motivation
  • Requirements
  • Implementation
  • Setup
• Requirements from science communities
• dCache for Dummies
• The ownCloud – dCache hybrid system
• Summary and outlook


Storage @ DESY

• PETRA III [Tier 0] (2012…)
  • Synchrotron radiation
  • 14 beamlines
  • Beamline guest scientists
  • 1 PB/year – 5 PB/year
• European XFEL [Tier 0] (2017…)
  • 3.4 km (linear)
  • 2017 (first beamline)
  • Beamline guest scientists
  • 10 – 100 PB/year
• HERA [Tier 0] (1992 – 2007)
  • Particle accelerator (proton – electron)
  • 6.3 km (ring)
  • Some hundred scientists
  • 5 PB in total
• LCG [WLCG Tier 2] (2008, 2009 …)
  • Particle accelerator (proton – proton)
  • 26.7 km (ring)
  • About 10,000 scientists
  • 15 PB/year

[Figure: storage timeline from 1992 to 2020, annotated “100 PBytes” at 2020]


More Storage at DESY

• The DESY data management team has considerable experience in managing huge amounts of data.
• In collaboration with other ‘big data’ sites, we provide a data management system, ‘dCache’, deployed at 70 sites around the world.
  • See later.
• So, why are we running ownCloud?


Motivation

• DESY had no experience in sophisticated data sharing.
  • Data sharing was done in the traditional way, with ACLs and ‘group’ directories.
• However: young scientists start their careers at universities and labs with Sync’n Share in their blood (the Dropbox generation).
• For a very long time, public IT departments did not regard Sync’n Share as their problem, because many commercial solutions were available.
• It essentially became an issue after Snowden.
  • Legal requirement: data had to be stored ‘on site’, or at least in Germany.
• Consequence: the computing centre (CC) needed to provide Sync’n Share-like mechanisms.


Requirements

• Fine-grained sharing of files and directories with individuals and groups
• Sharing via intuitive Web 2.0 mechanisms (apps or browser)
• Sharing with ‘the public’, with or without password protection
• Sharing of space to upload data (protected)
• Expiration of shares
• Automatic bidirectional synchronization of data between mobile devices and the central repository
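The sharing requirements above (shares with users or groups, public links with optional passwords, expiration) can be illustrated with a small data model. This is a minimal sketch under our own assumptions. The `Share` class, its fields and `can_access` are hypothetical and reproduce neither ownCloud's internals nor its Share API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from hashlib import sha256
from hmac import compare_digest
from typing import FrozenSet, Optional


@dataclass
class Share:
    """Hypothetical share record: a path shared with a user, a group, or the public."""
    path: str
    share_with: Optional[str]            # user or group name; None means 'the public'
    password_hash: Optional[str] = None  # optional protection for public links
    expires: Optional[datetime] = None   # None means the share never expires

    def can_access(self, now: datetime, principal: Optional[str] = None,
                   groups: FrozenSet[str] = frozenset(),
                   password: Optional[str] = None) -> bool:
        if self.expires is not None and now >= self.expires:
            return False  # expired shares grant nothing
        if self.share_with is None:  # public link
            if self.password_hash is None:
                return True
            # Plain SHA-256 keeps the sketch short; real systems use a salted, slow hash.
            given = sha256((password or "").encode()).hexdigest()
            return compare_digest(given, self.password_hash)
        # Share with a named user or with a group the caller belongs to.
        return principal == self.share_with or self.share_with in groups


now = datetime(2016, 6, 1, tzinfo=timezone.utc)
link = Share("/data/run42", None, password_hash=sha256(b"secret").hexdigest(),
             expires=datetime(2016, 7, 1, tzinfo=timezone.utc))
print(link.can_access(now, password="secret"))  # True
```

Expiration is checked before anything else, so an expired public link stays closed even when the correct password is supplied.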


Typical Application

[Figure: “Your Cloud Space”, with two “Sync” links and a “File up and download” link]


Steps Taken by DESY

• Evaluated possible solutions in 2013.
• Decided to go for ownCloud:
  • Provides most of the features needed.
  • Open source
  • Was in use by many institutes and universities in Germany
  • Used by colleagues at SURFsara (Amsterdam) and CERN
• The evaluation showed:
  • Very good Sync’n Share feature set
  • Very good at planning ahead (roadmap)
  • Plans for cross-site federated access (now in place)
  • A bit weak in data management
• Started a prototype installation at DESY at the beginning of 2014.


What should the DESY setup look like?

(Actually: what it will look like in July.)


The Infrastructure

• Authentication: Kerberos
• User management registry: LDAP
• Monitoring
• Local and wide area network, load balancing, firewalls
• Virtualization
• Accounting
• ∞ Unlimited persistent storage


Infrastructure Integration

[Figure: four ownCloud instances, a Postgres DB, and an F5 load balancer with automatic failover]
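The load balancer spreads requests over the ownCloud instances and fails over automatically when one of them is down. The slides do not say how the F5 is configured. What follows is only a minimal sketch of the general idea, round-robin selection with a health check. The host names and the `is_healthy` callback are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Round-robin selection that skips back ends failing a health check."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._ring = cycle(self.backends)

    def pick(self) -> str:
        # Try each back end at most once per request; unhealthy ones are skipped (failover).
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy ownCloud instance available")


down = {"oc2"}  # hypothetical: one instance failed its health check
lb = RoundRobinBalancer(["oc1", "oc2", "oc3", "oc4"], lambda host: host not in down)
print([lb.pick() for _ in range(6)])  # ['oc1', 'oc3', 'oc4', 'oc1', 'oc3', 'oc4']
```

Because the ownCloud instances share one database and one storage back end, any healthy instance can serve any request, which is what makes this simple failover possible.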


More Integration

[Figure: ownCloud connected to DESY Kerberos, DESY LDAP, ∞ unlimited central storage, and a data life cycle engine]


Horizontally Scaling Backend

[Figure: web load balancer (F5) → four ownCloud instances → NFS 4.1/pNFS → six pool nodes; storage labeled “200 TBytes RAID6” (three times)]


Some Statistics

[Figure: files in/out over 7 days, in files per hour; axis values 10,000 and 70,000]

Users (total)       490
Users (active)      277
Space available     567 TBytes
Space used          2 × 30 TBytes
Files               10 million

Current default quality: two replicas on different storage nodes.
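The usage figures can be checked with simple arithmetic. This assumes “2 × 30 TBytes” means about 30 TBytes of user data held in the two default replicas, and that “Space available” is the total capacity. Both readings are our assumptions, and the function name is hypothetical.

```python
def replica_usage(logical_tb: float, replicas: int, available_tb: float) -> dict:
    """Raw capacity consumed by `replicas` copies of `logical_tb` of user data."""
    raw_tb = logical_tb * replicas
    return {
        "raw_tb": raw_tb,
        "fraction_used": raw_tb / available_tb,
        # How much more user data fits if every file keeps the same number of replicas.
        "logical_headroom_tb": (available_tb - raw_tb) / replicas,
    }


usage = replica_usage(logical_tb=30, replicas=2, available_tb=567)
print(f"{usage['raw_tb']:.0f} TB raw, {usage['fraction_used']:.1%} of capacity, "
      f"{usage['logical_headroom_tb']:.1f} TB of further user data at 2 replicas")
# → 60 TB raw, 10.6% of capacity, 253.5 TB of further user data at 2 replicas
```

Doubling the replicas halves the usable headroom. This is the durability-versus-capacity trade-off that the QoS discussion in the following slides makes explicit.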


Is that sufficient for scientists?


Typical Workflow

[Figure: workflow stages Raw, Derived and Publication, with Sharing]


Data Categories

Amount        Category      Typical application
1 – 100 PB    Raw           LHC detector data, raw X-ray images, brain scans
10 – 100 TB   Derived       Reconstructed (Ntuples), purified images, brain maps
1 TB          Publication   Papers, presentations, histograms


What do we need to support ‘science workflows’?


More Requirements

• Storage must be manageable: defined QoS and data lifecycle.
• Different types of data must have different QoS attached, regarding access latency (performance) and data durability (how safe is my data?):
  • Spinning disk for streaming
  • SSD for fast random access
  • Tape for archive
  • Multiple copies in different locations on different media for long-term data preservation
• Moving data between different QoS types has to be done
  • without service interruption,
  • transparently to the user,
  • without changes in the namespace.
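The media list above can be read as a small placement policy. The access pattern selects the primary medium, and a long-term preservation requirement adds a copy on a different medium at another location. The sketch below assumes exactly that reading. The `placement` function and the site names are hypothetical and do not reflect dCache configuration syntax.

```python
from typing import List, Tuple

# Primary medium per access pattern, as listed on the slide.
PRIMARY_MEDIUM = {"streaming": "disk", "random": "ssd", "archive": "tape"}


def placement(access: str, long_term: bool) -> List[Tuple[str, str]]:
    """Return hypothetical (medium, site) pairs for the copies of a new file."""
    primary = PRIMARY_MEDIUM[access]
    copies = [(primary, "site-A")]
    if long_term:
        # Long-term preservation: one more copy, on a different medium and at another site.
        secondary = "tape" if primary != "tape" else "disk"
        copies.append((secondary, "site-B"))
    return copies


print(placement("streaming", long_term=True))  # [('disk', 'site-A'), ('tape', 'site-B')]
```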


Quality of Service

• Raw: long-term preservation (legal requirement)
• Derived: SSD, low latency (HPC, analysis)
• Publication: SSD, fast multi-stream access


Even More Requirements

• Different access protocols for different applications:
  • POSIX-mounted file system (NFS 4.1/pNFS) for fast analysis
  • FTP dialects (GridFTP) for wide-area transfers with Globus, WLCG-FTS
  • HTTP/WebDAV, mostly for browser-based applications, visualization, …
• Different authentication mechanisms must be available:
  • Username/password for web applications
  • SAML to support traditional IdPs
  • OpenID Connect for Google/Facebook-like IdPs
  • Certificates for HTTPS or Grid applications
• Different credentials must be mappable to the same identity.
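Mapping different credentials to one identity amounts to a lookup from (mechanism, principal) pairs to a single account. This is a hypothetical illustration. The mechanism labels, the principal strings and the `IdentityMap` class are invented for the example and do not describe how DESY or dCache implement the mapping.

```python
from typing import Dict, Optional, Tuple

Credential = Tuple[str, str]  # (mechanism, principal as presented by that mechanism)


class IdentityMap:
    """Hypothetical many-to-one mapping from credentials to local accounts."""

    def __init__(self) -> None:
        self._map: Dict[Credential, str] = {}

    def link(self, mechanism: str, principal: str, account: str) -> None:
        key = (mechanism, principal)
        existing = self._map.get(key)
        if existing is not None and existing != account:
            # One credential must never resolve to two different identities.
            raise ValueError(f"{key} is already linked to {existing}")
        self._map[key] = account

    def resolve(self, mechanism: str, principal: str) -> Optional[str]:
        return self._map.get((mechanism, principal))


ids = IdentityMap()
ids.link("kerberos", "alice@EXAMPLE.ORG", "alice")
ids.link("x509", "/C=DE/O=Example/CN=Alice", "alice")
ids.link("oidc", "https://idp.example.org|a1b2c3", "alice")
print(ids.resolve("x509", "/C=DE/O=Example/CN=Alice"))  # alice
```

Whichever protocol a user arrives through, permissions and ownership are then evaluated against the same account.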


Scientific Data Cloud

• High-speed data ingest
• Fast analysis: NFS 4.1/pNFS
• Wide-area transfers (Globus Online, FTS) by GridFTP
• Syncing and sharing with ownCloud


What would that look like from the user’s perspective?


My DESY XXL Home: QoS Support

[Figure: “Patrick’s home”]


My DESY XXL Home: Protocol Support

• Multi-protocol: NFS 4.1/pNFS, GridFTP, WebDAV, SRM
• My ownCloud home: Sync, Share, Web 2.0 (ownCloud)


How do we achieve those goals?

Or: choosing dCache as the storage backend for ownCloud!

The scientific data cloud


Side Track

What’s dCache?


dCache in a nutshell (cont.)

• Started in 2000
• International collaboration (DESY, Fermilab, NDGF)
• About 10 members: developers, deployment, support, management
• Software deployed at about 70 sites in Europe, the US, Asia and Russia
• Largest deployments in the order of 20 PBytes on tape and disk
• Total storage close to 200 PBytes
• The geographically largest installation spans 4 countries
• Largely funded by INDIGO-DataCloud, DESY, Fermilab and NDGF



dCache Design

• Protocol and authentication engines: gridFTP, NFS/pNFS, http, WebDAV
• Virtual file-system namespace layer
• Media transfer engine and pool management, with automatic and manual media transition
• Media: SSDs, spinning disks, tape, Blu-ray, …


Namespace Design

[Figure: the namespace (Name) is separate from physical storage (disk, tape, external system); a location manager maps each name to its replicas, e.g. Disk1, Disk2, Tape1]


Design Consequence

• Files are stored as objects on various data back-ends:
  • Random-access devices: hard disk, SSD
  • Removable media: tape
  • Object stores: Ceph
• Back-ends can be highly distributed (even across countries).
• The file namespace engine is independent of the data storage itself.
• Internal and external services can move data around without service interruption.
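The namespace records only a file's identity, and the location manager records where its replicas live. This separation is what lets data move between media without changing the path users see. Below is a minimal sketch of that split, with hypothetical class, path and pool names. The real dCache components are far more involved.

```python
import itertools
from typing import Dict, Set


class Namespace:
    """Maps user-visible paths to stable file IDs; knows nothing about storage."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.entries: Dict[str, int] = {}

    def create(self, path: str) -> int:
        file_id = next(self._ids)
        self.entries[path] = file_id
        return file_id


class LocationManager:
    """Maps file IDs to the set of back-ends currently holding a replica."""

    def __init__(self) -> None:
        self.locations: Dict[int, Set[str]] = {}

    def add(self, file_id: int, pool: str) -> None:
        self.locations.setdefault(file_id, set()).add(pool)

    def migrate(self, file_id: int, src: str, dst: str) -> None:
        # Copy first, then drop the source replica, so the file stays readable throughout.
        self.add(file_id, dst)
        self.locations[file_id].discard(src)


ns, lm = Namespace(), LocationManager()
fid = ns.create("/data/alice/run42.h5")
lm.add(fid, "disk1")
lm.migrate(fid, "disk1", "tape1")
print(ns.entries, lm.locations[fid])  # {'/data/alice/run42.h5': 1} {'tape1'}
```

Only `LocationManager` changes during the migration. The namespace entry, and therefore every protocol's view of the file, stays the same.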


dCache features supporting our idea of a scientific data cloud

• Multi-protocol support (transfer and authentication)
  • Transfer protocols: NFS/pNFS, HTTP, WebDAV
  • Multi-authentication credential support (OpenID Connect, Kerberos, passwd)
• Sophisticated data management
  • Multi-media support (tape, spinning disk, SSD, …)
  • Automatic and manual media transitions
  • Adding and removing data nodes without service interruption
  • Automatic replica management
    • Keeps the number of copies x of each data file between the bounds n and m (n < x < m).
  • External storage support (e.g. tape systems: TSM, HPSS, OSM, DMF)
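Automatic replica management comes down to comparing each file's current number of copies with the configured bounds, then scheduling new copies or removals. The sketch assumes the bounds act as an inclusive minimum and maximum, and that new copies go only to pools that do not already hold the file. The function and pool names are hypothetical, and this is not dCache's replica manager code.

```python
from typing import List, Set, Tuple


def replica_actions(holders: Set[str], pools: List[str],
                    min_copies: int, max_copies: int) -> Tuple[List[str], List[str]]:
    """Return (pools to copy to, pools to remove from) to restore min <= copies <= max."""
    if not 1 <= min_copies <= max_copies:
        raise ValueError("need 1 <= min_copies <= max_copies")
    to_add: List[str] = []
    to_remove: List[str] = []
    if len(holders) < min_copies:
        # New replicas only on pools without a copy; may fall short if too few pools are free.
        free = [p for p in pools if p not in holders]
        to_add = free[: min_copies - len(holders)]
    elif len(holders) > max_copies:
        # Remove surplus copies; a real system would choose by load or location, not by name.
        to_remove = sorted(holders)[: len(holders) - max_copies]
    return to_add, to_remove


pools = ["pool1", "pool2", "pool3", "pool4"]
print(replica_actions({"pool1"}, pools, min_copies=2, max_copies=3))  # (['pool2'], [])
```

Running such a check periodically, or whenever a pool goes offline, is what turns "two replicas on different storage nodes" from a one-time placement into a maintained invariant.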


In particular: the QoS interface


dCache QoS Interfaces

[Figure: web service, CDMI service and cloud clients connected via RESTful interfaces to the dCache QoS module]
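Clients such as a web front end or an ownCloud plugin talk to the QoS module over REST. The request below is only a sketch. The host, the URL path and the JSON fields (`action`, `target`, and the value `"disk+tape"`) are assumptions made for illustration, not a documented endpoint; the dCache documentation describes the actual API. The code builds the request but does not send it.

```python
import json
from urllib.request import Request


def qos_change_request(base_url: str, path: str, target: str) -> Request:
    """Build (but do not send) a hypothetical REST request asking for a new QoS class."""
    body = json.dumps({"action": "qos", "target": target}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/namespace{path}",  # hypothetical URL layout
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = qos_change_request("https://dcache.example.org/api/v1", "/data/alice/run42.h5", "disk+tape")
print(req.get_method(), req.full_url)
# → POST https://dcache.example.org/api/v1/namespace/data/alice/run42.h5
```

Keeping the transition behind a small REST call lets the same QoS module serve a web page, a CDMI service and an ownCloud server-side app alike.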


The QoS Web Interface

[Figure: QoS web interface showing DISK and TAPE]

Click to get the file back from tape.


Putting the pieces together


The Data Path

[Figure: web load balancer (F5) in front of four ownCloud instances, which reach dCache via NFS 4.1/pNFS; dCache stores data on spinning disks, SSDs and tape]


Future Work: The Namespace Path

[Figure: namespace path; labels: Namespace, dCache namespace, Sharing DB, Share API, Namespace proxy]


dCache – ownCloud hybrid

• The data path is the easiest part. It works nicely.
• Namespace synchronization is/was very difficult:
  • It is important to let all protocols see a synchronized namespace.
  • ownCloud didn’t expect the underlying storage system to change the namespace tree.
  • Manually triggered synchronization took too long.
  • ownCloud 9 provides a first attempt at an API for an external namespace.
• Exposing ‘shares’ to external components is not yet possible in ownCloud:
  • It is important to allow all protocols to use ownCloud-defined shares.
  • Prerequisites:
    • ownCloud: needs an API to expose ‘shares’.
    • dCache: needs to have a ‘share’ object implemented.
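The synchronization problem can be pictured with a naive reconciliation. It compares a snapshot of the namespace as ownCloud last saw it with the current namespace in storage. Its cost grows with the total number of files, not with the number of changes. That is one plausible reason a manually triggered synchronization over millions of files takes too long. The snapshot format (path → modification time) is an assumption for illustration.

```python
from typing import Dict, List, NamedTuple

Snapshot = Dict[str, float]  # path -> modification time (assumed format)


class Changes(NamedTuple):
    added: List[str]
    removed: List[str]
    modified: List[str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Changes:
    """Compare two namespace snapshots; every path is visited, changed or not."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return Changes(added, removed, modified)


seen_by_owncloud = {"/a": 1.0, "/b": 2.0, "/c": 3.0}
in_storage = {"/a": 1.0, "/b": 5.0, "/d": 4.0}  # e.g. changed via NFS or GridFTP
print(diff_snapshots(seen_by_owncloud, in_storage))
# → Changes(added=['/d'], removed=['/c'], modified=['/b'])
```

In this sketch a rename appears as one removal plus one addition. An external-namespace API that reports events directly would not need to reconstruct it.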


ownCloud and QoS

[Figure: ownCloud web GUI with a QoS plugin (server-side app) talking to the dCache QoS module via REST services; other labels: I/O (NFS), dCache namespace API, Share API]


Summary

• An ownCloud – dCache hybrid is well suited to providing managed shared storage to scientists.
  • Sync’n Share is provided by ownCloud.
  • The access protocols and authentication mechanisms used in science are provided by dCache.
  • Unlimited storage space (via removable media, e.g. tape)
  • Quality of service support
    • Automatic and manual media transitions
    • Automatic replica management, resulting in high availability and data durability
  • Reduced downtimes thanks to transparent data migration


Outlook

• The current version of the ownCloud – dCache hybrid satisfies the need for
  • Sync’n Share
  • Highly scalable and manageable back-end storage
• For a full integration:
  • The namespaces of the two systems need to be synchronized (OC9).
  • The ownCloud ‘shares’ need to be exposed, so that they are visible in all protocols (NFS, GridFTP, …).
  • We need to provide an ownCloud plugin (server-side app) to make the dCache QoS storage types visible in ownCloud.


The END

Further reading: www.dCache.org