Big Data Science in the Cloud, from Big Data World Conference 2013
„Big Data Science in the Cloud“
Markus Schmidberger
Big Data Analyst & Cloud Engineer
Big Data gets Political
● New coalition agreement in Germany:
– “We want to further develop the information and communication strategy (ICT strategy) for the digital economy. ...
– ... We will focus research and innovation funding for ‘Big Data’ on developing methods and tools for data analysis ...”
3. December 2013 - 3
Continuous Software Delivery
“We change the rules!”
Curious, playful, agile, experienced, goal-oriented, attentive to detail, thinking differently ...
Big data & polyglot persistence
Lean & agile
Customers and Partners
Big Data
Big Data Science
● Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners.
Cloud Computing
● Wikipedia: “... describes a variety of computing concepts that involve a large number of computers connected through a real-time communication network such as the Internet. ...”
1) Put Apps & Data in the best Place
AWS Zones at the right Place
Example: R and RStudio Server
● R: open-source statistical software
– www.r-project.org
● RStudio IDE
– www.rstudio.org
– IDE + web / server version
2) Choose Cloud Resources carefully
● Instance type
● EBS optimized
● EBS provisioned IOPS
● Load Balancer
● Availability Zones
http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
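The checklist above can be turned into a small selection routine. The following is only a sketch under stated assumptions: the catalog entries, their names, and their prices are invented placeholders (not real AWS instance types or prices), and `cheapest_fit` is a hypothetical helper. It picks the cheapest option that meets a memory and IOPS requirement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceOption:
    name: str               # hypothetical label, not a real AWS instance type
    memory_gib: float
    provisioned_iops: int   # IOPS provisioned on the attached volume
    ebs_optimized: bool
    hourly_cost: float      # placeholder price, not a real quote


def cheapest_fit(
    options: Sequence[InstanceOption],
    min_memory_gib: float,
    min_iops: int,
    require_ebs_optimized: bool = True,
) -> Optional[InstanceOption]:
    """Return the cheapest option meeting all requirements, or None."""
    candidates = [
        o for o in options
        if o.memory_gib >= min_memory_gib
        and o.provisioned_iops >= min_iops
        and (o.ebs_optimized or not require_ebs_optimized)
    ]
    return min(candidates, key=lambda o: o.hourly_cost, default=None)


# Made-up catalog purely for illustration.
CATALOG = [
    InstanceOption("small-general", 4, 500, False, 0.10),
    InstanceOption("medium-io", 16, 2000, True, 0.45),
    InstanceOption("large-io", 64, 4000, True, 1.20),
]

choice = cheapest_fit(CATALOG, min_memory_gib=8, min_iops=1000)
print(choice.name if choice else "no option fits")
```

Real sizing would also weigh load balancing and placement across Availability Zones, which this sketch leaves out.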
● MongoDB hosting on Amazon EC2 (eu-west-1) and in Munich
● 24x7 monitoring and support
● Dedicated instances and shared hosting available
● Replica Sets and Sharding available
● SSL-enabled MongoDB
MongoSoup is the first German-based MongoDB cloud hosting solution!
Supported by a team of experts from comSysto, the first German partner of MongoDB Inc. You can have a running MongoDB database in virtually no time.
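To make "Replica Sets" and "SSL-enabled" concrete, here is a minimal sketch of how a client-side connection string for such a deployment could be assembled. It uses the standard MongoDB connection URI format; the host names, database name, replica set name, and credentials are placeholders, not MongoSoup values, and `replica_set_uri` is a hypothetical helper.

```python
from urllib.parse import quote_plus, urlencode


def replica_set_uri(hosts, database, replica_set, user, password, use_ssl=True):
    """Build a MongoDB connection URI listing every replica set member."""
    # Credentials must be percent-encoded so characters like '@' or '/'
    # do not break the URI structure.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    host_list = ",".join(hosts)
    options = urlencode({
        "replicaSet": replica_set,
        "ssl": "true" if use_ssl else "false",
    })
    return f"mongodb://{credentials}@{host_list}/{database}?{options}"


# Placeholder hosts and credentials, for illustration only.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],
    database="analytics",
    replica_set="rs0",
    user="app",
    password="p@ss/word",
)
print(uri)
```

Listing more than one member lets the driver find the current primary even if one host is down.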
Performance <-> Costs
● scale up & out
● scale down ?
● monitor your resources from the beginning
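The trade-off above can be sketched as a simple threshold rule driven by monitoring data. This is an illustrative assumption, not a policy from the talk: the function name, the thresholds, and the node limits are all hypothetical, and a real setup would read its samples from a monitoring service.

```python
from statistics import mean


def scaling_decision(cpu_samples, current_nodes,
                     min_nodes=1, max_nodes=10, high=0.75, low=0.25):
    """Suggest a node count from recent CPU utilisation samples (0.0 to 1.0)."""
    if not cpu_samples:
        return current_nodes  # no monitoring data: change nothing
    load = mean(cpu_samples)
    if load > high:
        # Scale out to protect performance, up to the cost ceiling.
        return min(current_nodes + 1, max_nodes)
    if load < low:
        # Scale down to save cost, but never below the minimum.
        return max(current_nodes - 1, min_nodes)
    return current_nodes


print(scaling_decision([0.90, 0.80, 0.85], current_nodes=3))
print(scaling_decision([0.10, 0.20], current_nodes=3))
```

Without samples the rule does nothing, which is one reason to monitor from the beginning.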
3) Use full Cloud Technology Stack
Example: AWS EMR with MapR
● Speed
● Compression
– reduces disk and network I/O and increases performance
● Snapshots
– data protection
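The compression point can be illustrated with a small experiment. The sketch below uses Python's standard `zlib` module as a stand-in; it is not MapR's compression, and the sample log line is invented. It only shows why fewer bytes need to be written to disk or sent over the network when data is repetitive.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = zlib.compress(payload, level)
    # Round-trip check: decompression must restore the original bytes.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round trip failed")
    return len(compressed) / len(payload)


# Invented, highly repetitive log data, typical of what compresses well.
sample = b"2013-12-03 INFO request served in 12 ms\n" * 1000
ratio = compression_ratio(sample)
print(f"compressed to {ratio:.1%} of the original size")
```

Data that is already compressed or random will show little or no gain, so the benefit depends on the workload.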
4) Data Protection
● talk to the experts (e.g. Bitkom)
● use available mechanisms & services
– EMR in VPC
– Mongosoup.de
● be aware of the topic
More Big Data Events
● “Map-Reducing Everywhere”
– https://hadoopsummit.uservoice.com
● Forum “Big Data und Verantwortung” (Big Data and Responsibility), with Frank Schirrmacher among others
– Tue, 3 Dec., 19:00; Große Aula, LMU
„Big Data Science in the Cloud“
- Yes We Can -
http://comsysto.com/events