GRAU DataSpace Architecture
DESCRIPTION
GRAU DataSpace provides FileShare & Sync for enterprises and managed service providers.
GRAU Data Space 2.0: The Secure Communication Platform for Businesses and Organizations
YOUR DATA. YOUR CONTROL
7 Dec 2013
Architectural Overview
● The GDS is based on a very robust core that has been available for years
● The architecture scales from SMB (<100 users) to large enterprises and service providers (>100,000 users)
● The key features for scalability are:
– Separation between data and metadata (optional)
– Transactional, scalable storage backend
– Versioning of all file objects (UUID)
– Chunking of large objects (chunk size can be different for each object)
– Hashing of chunked objects (offloading to object store is possible)
– Chunk-level deduplication based on hash (under development)
– Bidirectional master/master replication of all data and metadata on folder level
– Session director allows redirection of sessions to another node
– RESTful APIs
– CMIS (getContentChanges)
– Distributable in-memory cache for metadata
Open interfaces
● Open standard interfaces
– WebDAV
– JSON/SOAP core API
– CIFS
● Gateways
– ownCloud
– CMIS 1.1 (SOAP, AtomPub, JSON)
● Identity Management
– Provisioning Gateway (LDAP, AD, SQL)
– Authentication Gateway (LDAP, AD, RADIUS)
Architecture
[Diagram: GDS core with the GDS2 API (JSON). Storage backend: object store (Caringo, S3, SWIFT), FS/CIFS (NAS, GAM), SQL (DB/2, Oracle, MySQL, Postgres). Metadata: SQL (DB/2, Oracle, MySQL, Postgres). Clients and gateways: CMIS GW, WebGUI, WebDAV, ownCloud GW, CIFS, Adm GW, Admin GUI.]
Storage Backend (1)
● Storage backends:
– Filesystem (ext4, XFS)
– NAS / CIFS
– RDBMS (MySQL, Oracle, Postgres, MSSQL, DB2)
– Object stores (Caringo, S3, SWIFT)
● Plugins:
– Object chunking (size definable on object level, 512k default)
– Hashing (MD5, SHA-1, SHA-256)
– Dedup on chunk level [under development]
– Mirroring (one or many backends) [planned]
– Crypto (symmetrical) [planned]
– HSM [planned]
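The chunking, hashing and chunk-level dedup plugins listed above can be sketched as follows. This is a minimal illustration, not the GDS implementation: `ChunkStore`, `split_into_chunks` and the in-memory dictionary are hypothetical names standing in for a real storage backend, and the only values taken from the slide are the 512k default chunk size and the hash algorithms.

```python
import hashlib

DEFAULT_CHUNK_SIZE = 512 * 1024  # 512k default, as stated on the slide


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Yield consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


class ChunkStore:
    """Hypothetical in-memory stand-in for a storage backend."""

    def __init__(self, algorithm: str = "sha256"):
        # "md5", "sha1" and "sha256" are all valid hashlib algorithm names.
        self.algorithm = algorithm
        self.chunks = {}  # hex digest -> chunk bytes

    def put_object(self, data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
        """Store an object and return its manifest (ordered chunk digests)."""
        manifest = []
        for chunk in split_into_chunks(data, chunk_size):
            digest = hashlib.new(self.algorithm, chunk).hexdigest()
            # Dedup: a chunk whose digest is already known is not stored again.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get_object(self, manifest):
        """Reassemble an object from its manifest."""
        return b"".join(self.chunks[digest] for digest in manifest)
```

Because the chunk size is a per-call parameter, each object can use its own size, which matches the "size definable on object level" bullet. Identical chunks in different objects share one stored copy.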
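Storage Backend (2)
[Diagram: GDS core above the plugin layers Chunking (512 kB), Hashing (optional), Mirroring and Crypto (sym.), in front of the storage backend. Backend types: Filesystem (ext4, XFS), SQL (DB/2, Oracle, MySQL, Postgres), CIFS (NAS, GAM/Archive), object store (Caringo, RADOS, SWIFT/S3).]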
Storage Backend (3)
[Diagram: GDS core nodes with the GDS2 API (JSON), each with a metadata store and an object store on RADOS OSDs. Metadata is replicated between nodes. The object store is reached through librados and the RADOS GW (SWIFT).]
Scalability / High availability
● Master/master replication on folder level
– Data, metadata
– Users, groups
– Access lists
● Shared-nothing architecture
– Horizontal scalability
– High availability
– Users that share a lot of folders can be relocated to the same node
– Nodes can be added or removed dynamically
– Software updates on deactivated nodes
● Distributed metadata cache
– CMIS gateway allows session and metadata caching
● Session redirector (reverse proxy)
– Redirects session to the home node of the user
– If the home node is down, one of the backup nodes will be used
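The session redirector's failover rule can be sketched as a small selection function. This is an assumption-laden illustration: `Account`, `pick_node` and the `is_up` health-check callback are hypothetical names, and choosing the first reachable backup is one possible reading of "one of the backup nodes".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Account:
    """Hypothetical account record holding the user's node assignments."""
    user: str
    home_node: str
    backup_nodes: List[str] = field(default_factory=list)


def pick_node(account: Account, is_up: Callable[[str], bool]) -> str:
    """Return the home node if it is up, else the first reachable backup."""
    if is_up(account.home_node):
        return account.home_node
    for node in account.backup_nodes:
        if is_up(node):
            return node
    raise RuntimeError(f"no reachable node for user {account.user!r}")
```

A reverse proxy would call `pick_node` once per new session and forward the session to the returned node. The health check is injected so the same logic works with any monitoring mechanism.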
High availability
[Diagram: two GDS core nodes, each with the GDS2 API (JSON), storage and metadata. Data and metadata are replicated between the nodes. Load balancers and GDS (Session) Directors sit in front.]
Scalability (1)
[Diagram: load balancers and GDS (Session) Directors in front of two GDS core nodes with the GDS2 API (JSON), each holding data and metadata. Metadata uses master/master replication. Data resides on an object store / cluster filesystem.]
Scalability (2)
[Diagram: load balancers and GDS (Session) Directors in front of several GDS core nodes with the GDS2 API (JSON), each with a CMIS Cache and metadata (MD). Metadata is replicated between nodes. Data resides on an object store / cluster filesystem.]
Multiple Sites - Roaming (1)
● Every user has a home node which is stored in the account data
● Redundancy of file objects is provided by the object store at each site
● Users, groups and ACLs are synchronized between all sites
● File objects are not synchronized between sites
● Synchronization takes place asynchronously
● Load balancer directs client requests to the session director
● Session director redirects requests based on user account to
– Home node of the user [my]
– Node which hosts shared data room [shared]
– Any node [global]
● Session director analyzes the request and forwards to
– CMIS caching layer
– JSON API layer
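The two session director decisions above (which node, then which layer) can be sketched together. Everything named here is hypothetical: the `Request` fields, the lookup dictionaries and the layer labels are illustrative, and random selection for the [global] case is an assumption, since the slide only says "any node".

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Request:
    """Hypothetical view of a client request as seen by the director."""
    user: str
    scope: str      # "my", "shared" or "global"
    protocol: str   # "cmis" or "json"
    data_room: str = ""


def route(request: Request,
          home_nodes: Dict[str, str],
          room_nodes: Dict[str, str],
          all_nodes: List[str],
          rng=random) -> Tuple[str, str]:
    """Return (target node, target layer) for a request."""
    if request.scope == "my":
        node = home_nodes[request.user]        # home node from account data
    elif request.scope == "shared":
        node = room_nodes[request.data_room]   # node hosting the data room
    elif request.scope == "global":
        node = rng.choice(all_nodes)           # any node will do
    else:
        raise ValueError(f"unknown scope: {request.scope!r}")

    if request.protocol == "cmis":
        layer = "cmis-cache"
    elif request.protocol == "json":
        layer = "json-api"
    else:
        raise ValueError(f"unknown protocol: {request.protocol!r}")
    return node, layer
```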
Multiple Sites - Roaming (2)
[Diagram: Site A and Site B, each with load balancers (LB), GDS Directors and GDS core nodes with the GDS2 API, a CMIS Cache, data and metadata (MD). Requests arrive as JSON or CMIS.]
Identity Management (1)
● Separation between user provisioning and authentication
● Multiple instances of gateways are possible
● Multiple directories can be connected in parallel
● Provisioning gateway
– LDAP/AD/SQL crawler
– Users that match a regular expression are created in the GDS
– Users that are deleted in the directory get deactivated in the GDS
– SCIM/SAML module [planned]
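One crawl pass of the provisioning gateway can be sketched as below. This is an illustration under stated assumptions: `sync_users`, the dictionary-based user store and the use of a full-string regex match are all hypothetical, and the directory is modeled as a plain list of names rather than a real LDAP/AD/SQL query.

```python
import re
from typing import Dict, Iterable, List, Tuple


def sync_users(directory_users: Iterable[str],
               gds_users: Dict[str, dict],
               pattern: str) -> Tuple[List[str], List[str]]:
    """Apply one crawl pass to gds_users in place.

    Returns (created, deactivated) user names.
    """
    matcher = re.compile(pattern)
    present = set(directory_users)
    created, deactivated = [], []

    # Create matching directory users that the GDS does not know yet.
    for name in sorted(present):
        if matcher.fullmatch(name) and name not in gds_users:
            gds_users[name] = {"active": True}
            created.append(name)

    # Deactivate (do not delete) users that vanished from the directory.
    for name in sorted(gds_users):
        record = gds_users[name]
        if name not in present and record["active"]:
            record["active"] = False
            deactivated.append(name)

    return created, deactivated
```

Deactivating rather than deleting keeps the user's data in place, which is what the slide describes. Running the pass twice against an unchanged directory creates and deactivates nothing the second time.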
Identity Management (2)
● Authentication gateway
– LDAP/AD/SQL module
– Multilevel authentication
– Google Authenticator [planned]
– RADIUS module [planned]
– mTAN/OTP module [planned]
● Single Sign-On [planned]
– Kerberos module
– OAuth2 module
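The slides do not define "multilevel authentication". One plausible reading, assumed here, is that the authentication gateway runs several modules in sequence and accepts a login only if every module accepts it. The names `authenticate` and `make_secret_check` are hypothetical, and the dictionary lookups stand in for real LDAP/AD/SQL or OTP modules.

```python
import hmac
from typing import Callable, Dict, List

# A module takes (user, credentials) and reports whether it accepts them.
AuthModule = Callable[[str, Dict[str, str]], bool]


def make_secret_check(field: str, known: Dict[str, str]) -> AuthModule:
    """Build a toy module that compares one credential field to a known value."""
    def check(user: str, credentials: Dict[str, str]) -> bool:
        expected = known.get(user)
        supplied = credentials.get(field)
        if expected is None or supplied is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), supplied.encode())
    return check


def authenticate(user: str,
                 credentials: Dict[str, str],
                 modules: List[AuthModule]) -> bool:
    """Accept only if at least one module is configured and all accept."""
    if not modules:
        return False
    return all(module(user, credentials) for module in modules)
```

For example, a first-level password module and a second-level one-time-code module would be passed together as `modules`, and a correct password with a wrong code would be rejected.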
Identity Management (3)
[Diagram: GDS core with the GDS2 API (JSON), storage backend and metadata, with WebGUI, Admin GW and Admin GUI on top. The Provisioning Gateway and the Authentication Gateway connect the core to external sources: LDAP/AD, SAML, RADIUS, SQL.]
Multi Tenancy
● Dedicated hardware
– Highest level of separation and security
– No performance impact of virtualization layer
● Full virtualization (KVM, Hyper-V, VMware, Xen)
– Highest level of separation and security in virtualized environment
– Similar static memory pages can be shared between instances
– GDS version can be different for each tenant
● Linux Containers (LXC)
– Lightweight virtualization
– Memory and program files on disk can be shared between instances
● Single instance
– Same GDS version for all tenants
– Everything gets shared
– Software bugs or operational problems affect all tenants
Distributed Data Space
[Diagram: Sites A, B, C and D, each with a GDS instance behind a firewall (FW) serving the local LAN over CIFS. The sites are connected across the Internet using JSON over HTTPS.]
Corporate CDN
[Diagram: Sites A, B and C plus sub-sites B1 and B2, each running GDS instances with a CMIS Cache and components labeled OS and SD. Clients connect over WebDAV, CIFS and CMIS. Sites are linked over HTTPS.]
Cloud attached Data Space
[Diagram: Sites A and B each run GDS behind a firewall (FW) and serve the LAN over CIFS. They connect across the Internet, using JSON over HTTPS, to GDS instances behind load balancers (LB).]
WWW: HTTP://WWW.GRAUDATA.COM/DATASPACE
E-MAIL: [email protected]
CEL: +49 151 54354373
TWITTER: @graudataspace
YOUR DATA. YOUR CONTROL.