NoSQL and Big Data for DevOps

Gustavo Fernandes

Sunday, 3 February 13

Agenda

• DevOps?

• Toolset

• BigData

• NoSQL

• Demo

• Q&A


DevOps - Motivations

• Silos development/ops

• Slow release cycles

• Lack of awareness from either side


Developers

• Paid to add new features constantly

• “Works on my laptop”

• Usually IGNORE non-functional requirements


Operations

• Keep it stable

• Reliable

• Monitoring

• Distance from code


DevOps - Enablers

• Agile

• Infrastructure as software


What is DevOps?

• Development + Operations

• Discipline/Philosophy/Methodology

• Role - coder of non-functional requirements

• Faster, more reliable, continuous releases to production


DevOps - Principles

• Automate everything: release, deployment, provisioning

• Infrastructure as code - TDD, tags, branches, ...

• Agile to ops


Open Source Toolset

• Configuration management tools: Puppet, Chef

• Application lifecycle management tools

• Build tools: Maven, Gradle, Buildr, SBT, Rake

• Maven repository: Nexus, Artifactory

• Provisioning tools: Vagrant, BoxGrinder

• CI servers: Jenkins


Puppet

• Custom declarative language

• Describes resources and states

• Applies states to servers

• Standalone/client-server/pub-sub

• Testable


Puppet Resources

file { "/my/file":
  source => "/path/to/",
  backup => main,
  mode   => "0644",
}

cron { logrotate:
  command => "/usr/sbin/logrotate",
  user    => root,
  hour    => 2,
  minute  => 0,
}

exec { "tar -xf /Volumes/nfs02/important.tar":
  cwd     => "/var/tmp",
  creates => "/var/tmp/myfile",
  path    => ["/usr/bin", "/usr/sbin"],
}

user { 'opuser':
  ensure   => 'present',
  password => '$1$9VC1vFFa$GHKWgtdODti8eKqkQ7Ruv.',
}


Classes

class mongodb($replicaset = '', $disablenuma = '') {

  $mongo_tgz = "mongodb-${arch}-${version}.tgz"
  $base_dir  = "${base}"

  group { "mongodb":
    ensure => present,
  }

  user { "mongodb":
    ensure => present,
    gid    => "mongodb",
    shell  => "/sbin/nologin",
  }

  file { "$base_dir":
    ensure => "directory",
    owner  => "mongodb",
    group  => "mongodb",
    alias  => "mongo-base",
  }
}


File Server

file { "/etc/sudoers":
  mode   => 440,
  owner  => root,
  group  => root,
  source => "puppet:///modules/name/sudoers",
}

file { "$installdir/conf/mongo.conf":
  mode    => 0744,
  ensure  => present,
  content => template('mongodb/mongodb.conf.erb'),
}


ERB Templates

$servers = ['server1.domain', 'server2.domain', 'server3.domain']

file { '/etc/foo.conf':
  ensure  => file,
  content => template('foo/foo.conf.erb'),
}

# foo.conf.erb
<% @servers.each do |server| -%>
<%= server %>
<% end -%>

# /etc/foo.conf (rendered)
server1.domain
server2.domain
server3.domain
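The expansion of the template above can be checked outside Puppet with Ruby's standard ERB library. This is only a sketch: `render_foo_conf` is a hypothetical helper, and the server list is passed in as a local variable, whereas Puppet itself exposes manifest variables to templates as instance variables such as `@servers`.

```ruby
require 'erb'

# Same body as foo.conf.erb. A trailing "-%>" swallows the newline after
# the tag, so the loop lines leave no blank lines in the output.
TEMPLATE = <<~'ERB_SOURCE'
  <% servers.each do |server| -%>
  <%= server %>
  <% end -%>
ERB_SOURCE

# Render the template for a list of host names (hypothetical helper).
def render_foo_conf(servers)
  # trim_mode '-' enables the "-%>" newline trimming used in the template.
  ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(servers: servers)
end

puts render_foo_conf(%w[server1.domain server2.domain server3.domain])
```

Without the `-` trim mode the loop tags would each leave an empty line behind, which is why Puppet templates conventionally end control tags with `-%>`.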


Modules

Static Files

ERB Templates

Module manifest


Site

site.pp

node "box1.domain" {
  include java
  include hadoop_master
}


Manifest

• Puppet entry point

• Nodes

node "box1.domain" {
  include java
  include hadoop_master
  class { 'myapp':
    version    => "1.5-SNAPSHOT",
    maven_repo => "http://devserver:8081/nexus/content/repositories/snapshots/",
  }
}

node "box2.domain" {
  include java
  include hadoop_slave
}



Facter Inventory

• Companion ruby utility to puppet

• Collect facts about one environment and expose as a map

• Puppet uses it to decide what to deliver to a node


Facter - Example

$ facter
architecture => x86_64
domain => domain
fqdn => devserver.domain
hardwaremodel => x86_64
hostname => devserver
ipaddress_eth1 => 192.168.95.15
ipaddress_lo => 127.0.0.1
is_virtual => true
kernel => Linux
memorytotal => 491.11 MB
network_eth0 => 10.0.2.0
network_eth1 => 192.168.95.0
operatingsystem => OpenSuSE
operatingsystemrelease => 12.2
physicalprocessorcount => 1
processor0 => Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
processorcount => 1
......
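Since Facter exposes the facts as a map, the plain-text listing above can be turned back into one with a few lines of Ruby. The sketch below is illustrative only: `parse_facts` is a hypothetical helper, it assumes the `name => value` line format shown in the listing, and every value comes back as a string.

```ruby
# Parse "name => value" lines, as printed by `facter`, into a Hash.
def parse_facts(text)
  text.each_line.with_object({}) do |line, facts|
    name, value = line.strip.split(' => ', 2)
    # Lines without " => " (blank lines, the "......" elision) are skipped.
    facts[name] = value if name && value
  end
end

# A shortened copy of the listing above.
SAMPLE = <<~FACTS
  architecture => x86_64
  hostname => devserver
  is_virtual => true
  processorcount => 1
  ......
FACTS

facts = parse_facts(SAMPLE)
puts facts['hostname']
puts facts['processorcount'].to_i # convert before doing arithmetic on a fact
```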


Using facter values

mapred-site.xml.erb

...
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value><%= @processorcount %></value>
</property>
...


Puppet Extensions

• The language itself

• New types

• New functions

• New resources


The maven resource

maven { "download-artifact":
  groupid    => "com.gustavonalle",
  artifactid => "$artifact",
  version    => "1.0-SNAPSHOT",
  repos      => "http://devserver:8081/repo/snapshots/",
  directory  => "/opt/myapp",
  classifier => "jar",
  require    => File["basedir"],
  before     => Exec["unzip"],
}


Maven resource

• Support for SNAPSHOTS

• Support for Releases

• No need to install Maven in the server

• Support Authentication

• Support for http and https

• Support for puppet:// protocol


Installing Hadoop


Hadoop - Processes

• Name Node running on at least one node

• Secondary Name Node running elsewhere

• Data Node process running on every node that is part of HDFS

• Job Tracker running in the cluster

• Task Tracker running on each node that can execute map or reduce tasks


Hadoop HDFS

• Name node knows all the slaves from <HADOOP_HOME>/conf/slaves

slave1.domain.com
slave2.domain.com
...

• Slaves need to point at the name node in the file <HADOOP_HOME>/conf/core-site.xml

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:9000</value>
</property>


Hadoop Map Reduce

• Job Tracker knows all the slaves from <HADOOP_HOME>/conf/slaves

slave1.domain.com
slave2.domain.com
...

• Slaves need to point at the job tracker in the file <HADOOP_HOME>/conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>
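The HDFS and MapReduce files above all derive from the same two inputs, the master host and the list of slaves, which is what makes them good candidates for templating. The following Ruby sketch shows the idea with the standard ERB library. The `Cluster` struct and the helper names are hypothetical, and it assumes the name node and the job tracker share one master host; the ports 9000 and 9001 are the ones shown on the slides.

```ruby
require 'erb'

# Hypothetical cluster description: one master host and a list of slaves.
Cluster = Struct.new(:master, :slaves)

PROPERTY_TEMPLATE = <<~'XML'
  <property>
    <name><%= name %></name>
    <value><%= value %></value>
  </property>
XML

# One host name per line, the format of <HADOOP_HOME>/conf/slaves.
def slaves_file(cluster)
  cluster.slaves.join("\n") + "\n"
end

# Render a single Hadoop <property> element.
def property(name, value)
  ERB.new(PROPERTY_TEMPLATE).result_with_hash(name: name, value: value)
end

# core-site.xml entry: where the slaves find the name node.
def core_site_property(cluster)
  property('fs.default.name', "hdfs://#{cluster.master}:9000")
end

# mapred-site.xml entry: where the slaves find the job tracker.
def mapred_site_property(cluster)
  property('mapred.job.tracker', "#{cluster.master}:9001")
end

cluster = Cluster.new('box1.domain', %w[box1.domain box2.domain])
puts slaves_file(cluster)
puts core_site_property(cluster)
puts mapred_site_property(cluster)
```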


Hadoop - SSH

• SSH is required for cluster-wide operations

• 'ssh localhost' must work without prompting for a password

• su hadoop -c 'ssh slave01' must work without a password


How puppet can help

• Facter calculates optimal values for memory, cpu, number of maps

• Generate ssh keys on the fly

• Obtain hostnames automatically

• Hide most of the complexity and expose only the bare minimum

• Install a large number of slaves in parallel


MongoDB - Replicaset

Primary

Secondaries

Arbiter (no data)


MongoDB - Creating cluster

• The replica set must be initiated with all servers running

• Uses command-line tools and a bit of JavaScript


How puppet can help

• Ensure all servers are running and configured

• Generate .js files and configuration files using templates
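As an illustration of the kind of .js file that could be generated, the Ruby sketch below builds the document passed to MongoDB's `rs.initiate()` from a primary, a list of secondaries and an arbiter. The helper names are hypothetical, the host names are taken from the demo, and port 27017 is assumed as the default for every member.

```ruby
require 'json'

# Build the replica set document for rs.initiate(). Port 27017 is an
# assumption; pass port: to override it.
def replica_set_config(name, primary, secondaries, arbiter, port: 27_017)
  data_hosts = [primary, *secondaries]
  members = data_hosts.each_with_index.map do |host, id|
    { '_id' => id, 'host' => "#{host}:#{port}" }
  end
  # The arbiter votes in elections but holds no data.
  members << { '_id' => members.size, 'host' => "#{arbiter}:#{port}", 'arbiterOnly' => true }
  { '_id' => name, 'members' => members }
end

# Wrap the document in a one-line script for the mongo shell.
def initiate_script(config)
  "rs.initiate(#{JSON.generate(config)});\n"
end

config = replica_set_config('fosdem', 'box1.domain', ['box2.domain'], 'devserver.domain')
puts initiate_script(config)
```

The script only succeeds once every listed member is up, which is why the configuration tool has to ensure all servers are running first.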


Demo

Jenkins

Nexus

Puppet Master

github.com/gustavonalle/puppet


Demo

[Diagram: three machines, box1.domain, box2.domain and devserver.domain, hosting the Name Node, Job Tracker, Data Nodes, Task Trackers, the M&R job (JVM), and the Mongo primary, secondary and arbiter. Legend: ** 2 CPUs, * 1 CPU.]


site.pp

class mongo_replicaset {
  class { 'mongodb':
    replicaset  => 'fosdem',
    primary     => 'box1.domain',
    secondaries => ['box2.domain'],
    arbiter     => 'devserver.domain',
  }
}

class customApp {
  class { 'myapp':
    version    => "1.0-SNAPSHOT",
    maven_repo => "http://devserver.domain:8081/repo/snapshots/",
  }
}



site.pp (cont.)

node "box1.domain" {
  include java
  class { 'hadoop':
    master => box1,
    slaves => [box1, box2],
  }
  include mongo_replicaset
  include customApp
}

node "box2.domain" {
  include java
  include mongo_replicaset
  class { 'hadoop':
    master => box1,
  }
}


Wrapping up

• Server-side software is not getting any simpler

• But infrastructure is now “software”

• DevOps is here to stay


Thank you

github.com/gustavonalle/puppet

[email protected]

