Service and Support for Science IT - Peter Kunszt, University of Zurich
TRANSCRIPT
![Page 1: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/1.jpg)
Service and Support for Science IT
Scientific Cloud Experiences
Dr. Peter Kunszt, Director S3IT
![Page 2: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/2.jpg)
Outline
• Introduction
  – What is Science IT
  – How are we organized
• UZH ScienceCloud Infrastructure and Implementation
• Science Data and Security/Privacy
![Page 3: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/3.jpg)
Challenge : Scale Up
• High Throughput Instruments
  – Much larger data volumes
  – Increased data complexity
• Large Collaborations
  – More people
  – More experiments and measurements
  – More coverage
BIG DATA
![Page 4: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/4.jpg)
Fire and forget...
• Scientists do not want to be bothered with infrastructure details
• IT JUST NEEDS TO WORK!
![Page 5: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/5.jpg)
Widening Complexity Gap: IT-Research
[Diagram: a gap between Local IT Resources and Research Labs / Core Facilities, bridged by "Miracle" and by SCIENCE IT]
![Page 6: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/6.jpg)
What is Science IT ?
FILL THE GAP
Dedicated Support Center for Science IT
• SPEED : faster time to solution
• ACCESS : to infrastructure, software, expertise
• ENABLE : use IT technology and software for new ideas
Speed
Access
Enablement
![Page 7: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/7.jpg)
![Page 8: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/8.jpg)
Supporting Science
• Be a partner to research projects for Science IT
• Provide services to individual researchers, groups and consortia
  – Consultancy for advanced usage of IT in Science
  – Research software development and support
  – Access to competitive IT infrastructure
  – Access to a library of tools and software
  – Project management and collaboration support
  – Training and education on the usage of infrastructure and software
• Collaborate internally, nationally and internationally with partners, suppliers and other Science IT units
• Maintain high level of internal expertise on topics relevant to Science IT
• Advise UZH Governance on evolution of needs, assist in prioritization
![Page 9: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/9.jpg)
Organization Structures are Changing
[Diagram: organizations (Org A, Org B, Org C, ...) shown in two arrangements]
Old world: Hierarchical
New world: Federated
http://www.fedsm.eu/
![Page 10: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/10.jpg)
S3IT Organization
[Diagram: one Core Team, several Site Teams and a number of Embedded Experts (EE)]
EE = Embedded Expert: working directly in projects or on-site in groups on specific tasks
Site Teams: joint teams with other units providing local support and some global services
Core Team: directorate, office, core services, central infrastructure and consultancy, project management
![Page 11: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/11.jpg)
Partner Interactions
[Diagram: Partners / Clients (core facilities, research groups, projects, faculties, institutes, departments) are connected through services and agreements. Partners / Suppliers, internal and external (Central IT, CSCS, vendors), are connected through agreements.]
![Page 12: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/12.jpg)
S3IT Core Business: Project Support
• Infrastructure is important but 'just' a means to an end
• Science IT Support: Applications, access, integration
• Data analysis
• Simulations
• Data Integration
• Application scaling, making use of big infrastructures
• Workflows, automation
• Visualization
• Software design and usage advice, Code Clinic
• Training and education
• ...
![Page 13: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/13.jpg)
Understand the science..
.. to map Science IT services!
![Page 14: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/14.jpg)
Mapping Security and Privacy
• Most science follows 3 stages
  – Conception, preparation, proposition stage – private
  – Project stage (3-5y) – share in group
  – Publication of results – open to all
• Some have additional constraints (regulations)
  – Medicine – patient data records need consent (different per country)
  – Law and business – confidentiality in projects
  – Engineering, pharmacology, etc. – patents
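The three stages and their default visibility can be written down as a small lookup. The following is only a sketch: the names `Stage`, `DEFAULT_AUDIENCE`, `audience` and the `restricted` flag are hypothetical, and the rule that restricted data is not opened automatically at publication is an assumption made for illustration, not something stated on the slide.

```python
from enum import Enum


class Stage(Enum):
    CONCEPTION = "conception"    # conception, preparation, proposition
    PROJECT = "project"          # running project, typically 3-5 years
    PUBLICATION = "publication"  # results are published


# Default audience per stage, as listed on the slide.
DEFAULT_AUDIENCE = {
    Stage.CONCEPTION: "private",
    Stage.PROJECT: "group",
    Stage.PUBLICATION: "public",
}


def audience(stage: Stage, restricted: bool = False) -> str:
    """Return who may see the data at a given stage.

    `restricted` is a hypothetical flag for data under additional
    constraints (patient consent, confidentiality, pending patents).
    Assumption: such data is never opened automatically and stays
    group-visible even after publication.
    """
    default = DEFAULT_AUDIENCE[stage]
    if restricted and default == "public":
        return "group"
    return default
```

For example, `audience(Stage.PROJECT)` gives `"group"`, while restricted data at the publication stage stays at `"group"` until someone lifts the restriction explicitly.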
![Page 15: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/15.jpg)
Infrastructure
• Supercomputing
  – Used as a scientific instrument by
    • theoretical physics, astrophysics, mathematics, computational chemistry, biochemistry, quantum chemistry
    • Continuous usage
• Cluster computing
  – Used as a workhorse by many groups
    • Life science, biochem, geoscience, medicine, digital humanities, banking and finance, art history, ...
    • Data analysis, statistical analysis, parameter studies, etc.
    • Non-continuous usage
• Server computing
  – Used as interactive computers by many groups
    • All groups. Interactive processing, visualization, steering of computation. Commercial and open-source tools.
    • Daily usage, non-continuous.
![Page 16: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/16.jpg)
Storage Classes
• Large, cheap data store for projects O(xPB)
  – No need to be backed up: Easy to regenerate but time-consuming
• Reliable project data store O(1PB)
  – With secondary copy
  – Only addition, no changes
• Working storage O(x100TB)
  – Active data, databases, server-side processes
• Fast storage for streaming analysis O(100TB)
  – Fast changing data, immediate analysis, rare!
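The four storage classes can be read as a simple decision rule. Below is a minimal sketch under stated assumptions: the `StorageClass` record, the constant names and the `pick_storage` function with its three flags are hypothetical, and the order of the checks is an illustrative choice rather than an S3IT policy. Fields the slide does not specify are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageClass:
    name: str
    scale: str                    # order of magnitude quoted on the slide
    second_copy: Optional[bool]   # None = not stated on the slide
    append_only: Optional[bool]   # None = not stated on the slide


BULK = StorageClass("large cheap project store", "O(x PB)", False, None)
RELIABLE = StorageClass("reliable project store", "O(1 PB)", True, True)
WORKING = StorageClass("working storage", "O(x100 TB)", None, None)
FAST = StorageClass("fast streaming storage", "O(100 TB)", None, None)


def pick_storage(regenerable: bool, actively_modified: bool,
                 streaming: bool) -> StorageClass:
    """Pick a storage class from three hypothetical yes/no questions."""
    if streaming:            # fast changing data, immediate analysis
        return FAST
    if actively_modified:    # active data, databases, server-side processes
        return WORKING
    if regenerable:          # can be recomputed, so no backup needed
        return BULK
    return RELIABLE          # everything else gets a secondary copy
```

A dataset that cannot be regenerated and is no longer being changed would land on the reliable store: `pick_storage(False, False, False)` returns `RELIABLE`.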
![Page 17: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/17.jpg)
Datacenter Consolidation
[Diagram: OCI – S3IT, ZMB, BIOC, MATH, PHYS and IMLS / Neuro consolidate into a NEW Central Datacenter]
Aim: Scale and Secure!
![Page 18: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/18.jpg)
UZH ScienceCloud Implementation
• OpenStack – based on Canonical
• Deployment using Ansible
• Vagrant-like system for configuration: Elasticluster (developed at UZH)
• Flexible submission and workflow framework for job control: GC3pie (developed at UZH)
• Database management framework openBIS for data lifecycle management (developed at ETH/SystemsX.ch)
![Page 19: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/19.jpg)
Business Model
• Supercomputing
  – Investment every 4 years into the system
  – Research groups to find 3rd party funding
• Commodity Cloud and Storage
  – Subscription / year : Cores, TB
  – Per use fee
  – Subsidized, not TCO – covering operations
• Servers / Pets
  – Yearly or monthly fee
  – Size matters
• Yearly acquisition / rollover
  – Easy to plan
![Page 20: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/20.jpg)
Experience so far:
• Supercomputing needed only by few groups
  – Can be completely outsourced to national center, done as of 2015
• Cloud is suitable for most Science Workloads
  – User support scales well
  – Can cover very many use cases
  – Build dedicated boxes for exceptions, don't be driven by them
  – Flexibility is key
• Must use local infrastructure for secure, data intensive and memory intensive workloads
  – Data locality needed for COST and (rarely) policy reasons – exception: medical data
  – Hybrid cloud – burst available for CPU intensive jobs
  – Deal with heterogeneity
![Page 21: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/21.jpg)
Future Cloud Strategy: HYBRID
• Run sizeable local cloud infrastructure for internal workloads
• Burst peak loads to public cloud providers
  – For selected workloads coherent with policy and cost
Advantages
• Plannable local infrastructure (plan for full usage)
• Flexibility in scaling, quick provisioning of needed capacity
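The hybrid idea (run locally when there is room, burst only what policy allows) can be sketched as a placement function. Everything here is an assumption for illustration: the function name `place_job`, the workload labels, the `data_cleared` flag and the "queue" fallback are hypothetical, and cost is not modelled at all.

```python
# Workload kinds assumed to be burstable without further checks.
BURSTABLE_KINDS = {"calculation", "simulation"}


def place_job(kind: str, cores: int, free_local_cores: int,
              data_cleared: bool = False) -> str:
    """Decide where one job runs: "local", "public" or "queue".

    kind             -- hypothetical label such as "simulation" or "data_analysis"
    cores            -- cores the job requests
    free_local_cores -- cores currently free in the local cloud
    data_cleared     -- True if policy allows the job's data to leave the site
    """
    # Prefer the local cloud: it is planned for full usage.
    if cores <= free_local_cores:
        return "local"
    # Peak load: burst only workloads that policy permits.
    if kind in BURSTABLE_KINDS:
        return "public"
    if kind == "data_analysis" and data_cleared:
        return "public"
    # Anything else waits for local capacity.
    return "queue"
```

With 16 free local cores, a 64-core simulation would be placed on the public cloud, while a 64-core analysis of data that has not been cleared would wait in the local queue.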
![Page 22: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/22.jpg)
Open Questions
• Policies. What workloads can be burst to public clouds? Under what conditions?
  – Calculations, simulations usually OK
  – Data analysis: depends on data (network issues being resolved)
  – Check compliance of cloud providers. ISO, HIPAA, etc.
  – Adherence to Swiss cantonal data protection regulations
• Cost. How to buy public cloud services?
  – Public procurement of agreements?
  – How not to be bound to a single provider?
  – Is this necessary at all?
• How do I charge my users?
  – For internal and for external use?
  – Aim: consolidate their workload into our cloud. No TCO!
![Page 23: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/23.jpg)
Comments on Security in academia
• Users in academia are super smart. They remove barriers faster than you can erect them.
• Do risk assessment and risk analysis instead of prevention.
• Don't do anything 'for security reasons', always qualify with real risk numbers
• Public Clouds are MUCH MORE secure than our own
  – Amazon, Microsoft, IBM etc have whole teams of security experts – they hired our best students for this
• It is a question of TRUST
  – Regulations by countries
  – Do we trust the US not to do industrial and academic espionage, forcing their own companies to give out our data?
![Page 24: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/24.jpg)
Scientific Requirements
• Know your workload: Data, Privacy, Science, Sharing aspects are tightly connected
• Lots of hidden complexity and contradicting requirements
![Page 25: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/25.jpg)
1. What Data?
• Different kinds of 'BIG' data
• Volume, Variety, Velocity, Veracity
• Understanding is Knowledge is Science
  – Data vs. Information and Knowledge
  – What are the right questions?
  – What should be protected, till when?
  – How to navigate, explore, evolve
WHO OWNS THE DATA?
For science, proprietary data is a hindrance
![Page 26: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/26.jpg)
2. Data Reuse
• Currently a wealth of data is not reused for new discovery
• Lots of potential! Regulators need to be told..
• Data repositories with computing and search capability – perfect for Cloud Model
• Do the computation where the data is – Private, public, hybrid Cloud
IP on TOOLS, ease of data USE, not DATA itself.
![Page 27: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/27.jpg)
3. Motivate to annotate
• Scientists publish what is necessary and prescribed by the journals, not more – mandate better annotation
• Provide more recognition for producing 'good' datasets – Data Citation
• Check Data quality – bad quality or data without annotation has no value
Creation of well annotated, sustained public resources
![Page 28: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/28.jpg)
4. Standard Formats
• Too many 'Standards' or not used
  – Instrument vendors often at fault
• Protection of data by proprietary formats
  – Data is lost to research
• Do not pay for data in nonstandard formats
  – Data value is zero if unusable
Mandate standard formats for domain data
![Page 29: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/29.jpg)
5. Data Sharing/Publishing
• Share in collaborative mode
• Avoid Data Loss
• Motivate and enable data publication
• Establish business model for data publication (reward/career benefit)
• Journals adapt, see Scientific Data http://www.nature.com/scientificdata
New role for Archives and Libraries
![Page 30: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/30.jpg)
6. Patient Data Records
• Legal issues of data privacy
• People are not in control of their own data
• Difficult to get consent
• NSA effect – trust
Put citizens back in control
![Page 31: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/31.jpg)
Patient Data Records
• TRUST
  – Swiss Cooperative: citizen owned
• NEUTRALITY
  – A simple e-Banking system for any personal health data. Same level of security
• TRACTION
  – Volume: it is free, it's rewarded
• IMPACT
  – Request data directly, avoid legal issues
![Page 32: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/32.jpg)
• It is a cooperative, not a business
• Funding by running campaigns to ask people to participate in research & surveys
• Participants are REWARDED for sharing their data or providing new data
• Build tools on top
• Currently seeking funding
  – H2020, foundations
  – Projects with hospitals, clinics
![Page 33: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/33.jpg)
Approach at S3IT
• Early involvement with Research Groups
  – Proposal writing, partnership
  – Advice on Data Management, infrastructure, standards
• Strong cooperation with Libraries
  – Early involvement with publishers, archives
  – Joint information to research groups on data management plans, data citations
• Seeking contact with funding bodies and decision makers
  – Communicate business plan for Science IT 'project consumables'
  – Evaluation of projects based on technology cost and feasibility
  – Usage of public and each other's cloud resources for cash
![Page 34: Service and Support for Science IT-Peter Kunzst, University of Zurich](https://reader036.vdocuments.net/reader036/viewer/2022081603/55877007d8b42af93e8b4650/html5/thumbnails/34.jpg)
Links
• www.s3it.uzh.ch - Science IT at UZH
• www.sybit.net - Systems Biology IT, SystemsX.ch
• www.erasysapp.eu - Systems Biology, DMMCore project
• www.healthbank.ch - Public Cooperative being set up for patient-owned data. Seeking funding (H2020, pending, and other sources)