Big Data Analysis with Crate and Python

Matthias Wahl - developer @ crate.io
Email: [email protected]

Upload: matthias-wahl

Post on 11-Aug-2014



DESCRIPTION

Analysing huge datasets with the Crate datastore, using either the bare Crate Python client or SQLAlchemy.

TRANSCRIPT

Page 1: Big Data Analysis with Crate and Python

Big Data Analysis with Crate and Python

Matthias Wahl - developer @ crate.io

Email: [email protected]

Page 2: Big Data Analysis with Crate and Python

Crate

shared-nothing, massively scalable datastore

standing on the shoulders of giants

Page 3: Big Data Analysis with Crate and Python

Crate

get it at: https://crate.io/download

# bash -c "$(curl -L try.crate.io)"

Page 4: Big Data Analysis with Crate and Python

Crate

automatic sharding and replication

(semi-) structured models

single table only

SQL query language

Page 5: Big Data Analysis with Crate and Python

Crate

all common SQL types (and more)

powerful aggregations ('GROUP BY')

linear scalability - data and query execution are distributed

basic arithmetic (next release, 0.39)

Page 6: Big Data Analysis with Crate and Python

Crate

Page 7: Big Data Analysis with Crate and Python

Aggregation Execution

SELECT station_name, max(temp), avg(temp), min(temp), count(distinct date)
FROM weather_de
WHERE temp != -999
GROUP BY station_name
ORDER BY station_name ASC;
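To make clear what this query computes, here is a minimal plain-Python sketch of the same aggregation. It does not talk to Crate. The row layout `(station_name, date, temp)`, the function name `aggregate_by_station` and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict

MISSING = -999  # sentinel value the query excludes with `temp != -999`


def aggregate_by_station(rows):
    """Mimic the GROUP BY query on (station_name, date, temp) tuples.

    The tuple layout is a hypothetical simplification of the table.
    """
    groups = defaultdict(lambda: {"temps": [], "dates": set()})
    for station_name, date, temp in rows:
        # As in standard SQL, a NULL temp never satisfies `temp != -999`.
        if temp is None or temp == MISSING:
            continue
        groups[station_name]["temps"].append(temp)
        groups[station_name]["dates"].add(date)

    result = []
    for name in sorted(groups):  # ORDER BY station_name ASC
        temps = groups[name]["temps"]
        result.append((
            name,
            max(temps),
            sum(temps) / len(temps),      # avg(temp)
            min(temps),
            len(groups[name]["dates"]),   # count(distinct date)
        ))
    return result


# Made-up sample data, not taken from the real weather_de table.
sample_rows = [
    ("Konstanz", 1, 10.0),
    ("Konstanz", 2, 20.0),
    ("Konstanz", 3, -999),
    ("UFS Deutsche Bucht", 1, 5.0),
    ("UFS Deutsche Bucht", 1, 7.0),
]
summary = aggregate_by_station(sample_rows)
```

The row with `-999` is dropped before aggregation, so it affects neither the average nor the distinct-date count.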

Page 8: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: one node labelled H, three nodes labelled M, three nodes labelled R. Labels: Request, collect]

Page 9: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: one node labelled H, three nodes labelled M, three nodes labelled R. Labels: collect, hash based distribution]

Page 10: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: one node labelled H, three nodes labelled M, three nodes labelled R. Label: group results]

Page 11: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: one node labelled H, three nodes labelled M, three nodes labelled R. Labels: final reduce, Response]
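The steps named on these diagrams (collect, hash based distribution, group results, final reduce) can be sketched in plain Python. This is a toy single-process model, not Crate's implementation. Reading H as a handler node, M as mapper nodes and R as reducer nodes is an assumption, as are all function names, the use of CRC32 as the hash, and the sample data.

```python
import zlib
from collections import defaultdict


def reducer_for(key, n_reducers):
    # Stable hash, so every row of one station reaches the same reducer.
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def collect(local_rows, n_reducers):
    """'M' step: filter local rows and bucket them per target reducer."""
    buckets = defaultdict(list)
    for station, temp in local_rows:
        if temp is not None and temp != -999:
            buckets[reducer_for(station, n_reducers)].append((station, temp))
    return buckets


def group(rows):
    """'R' step: group the rows routed to one reducer and average them."""
    acc = defaultdict(lambda: [0.0, 0])  # station -> [sum, count]
    for station, temp in rows:
        acc[station][0] += temp
        acc[station][1] += 1
    return {station: total / count for station, (total, count) in acc.items()}


def execute(shards, n_reducers=3):
    """'H' step: fan out, then merge and sort the reducer results."""
    inbox = defaultdict(list)
    for local_rows in shards:
        for reducer_id, rows in collect(local_rows, n_reducers).items():
            inbox[reducer_id].extend(rows)
    merged = {}
    for rows in inbox.values():
        # Safe to merge blindly: each station lives on exactly one reducer.
        merged.update(group(rows))
    return sorted(merged.items())


# Made-up rows spread over three hypothetical mapper nodes.
shards = [
    [("Konstanz", 10.0), ("Berlin", 4.0)],
    [("Konstanz", 20.0), ("Berlin", -999)],
    [("Berlin", 6.0), ("Konstanz", 30.0)],
]
averages = execute(shards)
```

Because the group key is hashed, no reducer needs to see another reducer's data, which is what lets the final step be a simple merge and sort.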

Page 12: Big Data Analysis with Crate and Python

Aggregation Execution

Page 13: Big Data Analysis with Crate and Python

Using the Python client

>>> from crate.client.http import Client
>>> client = Client(["127.0.0.1:4200"])
>>> response = client.sql("select * from weather_de limit 1")
>>> print(response)
{
  u'duration': 659,
  u'rowcount': 1,
  u'rows': [
    [1303365600000, 82.0, None, None, None, 0, u'954', 54.1667, 7.45,
     u'UFS Deutsche Bucht', 60.0, 10.9, 100, 5.2]
  ],
  u'cols': [u'date', ...]
}
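The response is a plain dict with parallel `cols` and `rows` entries, so a small helper can pair them up. The sketch below works on a hand-written dict and needs no running server. The helper names and the two-column sample are assumptions (the slide elides the full column list), and treating `date` as milliseconds since the Unix epoch in UTC is also an assumption.

```python
from datetime import datetime, timezone


def rows_as_dicts(response):
    """Pair each row with the column names from the same response."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


def to_datetime(epoch_ms):
    """Convert epoch milliseconds (assumed UTC) to an aware datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Hypothetical, shortened response: only two columns are shown here.
sample_response = {
    "duration": 659,
    "rowcount": 1,
    "cols": ["date", "station_name"],
    "rows": [[1303365600000, "UFS Deutsche Bucht"]],
}

records = rows_as_dicts(sample_response)
measured_at = to_datetime(records[0]["date"])
```

With named fields, later code can use `record["station_name"]` instead of relying on column positions.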

Page 14: Big Data Analysis with Crate and Python

Using SQLAlchemy

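>>> import sqlalchemy as sa
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy.orm import sessionmaker
>>> engine = sa.create_engine("crate://localhost:4200")
>>> Base = declarative_base()
>>> # session object used by the queries on the following slides
>>> DBSession = sessionmaker(bind=engine)()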

Page 15: Big Data Analysis with Crate and Python

Using SQLAlchemy

>>> from sqlalchemy import Column, String, Float, Integer, DateTime
>>> class Weather(Base):
...
...     __tablename__ = 'weather_de'
...
...     station_id = Column('station_id', String, primary_key=True)
...     station_name = Column('station_name', String)
...     station_lat = Column('station_lat', Float)
...     station_long = Column('station_lon', Float)
...     station_height = Column('station_height', Integer)
...     date = Column('date', DateTime, primary_key=True)
...     temp = Column('temp', Float)
...     humility = Column(Float)
...     sunshine_hours = Column(Float)
...     wind_speed = Column(Float)
...     wind_direction = Column(Integer)
...     rainfall_fallen = Column(Integer)
...     rainfall_height = Column(Float)
...     rainfall_form = Column(Integer)

Page 16: Big Data Analysis with Crate and Python

Using SQLAlchemy

>>> from sqlalchemy import func
>>> res = (
...     DBSession.query(
...         Weather.station_name,
...         func.avg(Weather.temp),
...     )
...     .group_by(Weather.station_name)
...     .order_by(Weather.station_name)
...     .limit(10)
...     .all()
... )

SELECT station_name, avg(temp) FROM weather_de GROUP BY station_name ORDER BY station_name LIMIT 10;
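To see the shape of the result without a Crate cluster, the same kind of statement can be run against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here and its SQL dialect differs from Crate's in general. The two-column table and its rows are made up, and the table name `weather_de` is taken from the model's `__tablename__`.

```python
import sqlite3

# In-memory stand-in database; this is NOT a Crate connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_de (station_name TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather_de VALUES (?, ?)",
    [
        ("Konstanz", 10.0),   # made-up measurements
        ("Konstanz", 20.0),
        ("Berlin", 4.0),
        ("Berlin", 6.0),
    ],
)

query = """
    SELECT station_name, avg(temp)
    FROM weather_de
    GROUP BY station_name
    ORDER BY station_name
    LIMIT 10
"""
# Each result row is a (station_name, average_temp) tuple,
# the same shape the SQLAlchemy query returns from .all().
result = conn.execute(query).fetchall()
conn.close()
```

Each station appears once, in alphabetical order, paired with its average temperature.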

Page 17: Big Data Analysis with Crate and Python

Using SQLAlchemy

# Average sunshine hours
from sqlalchemy.sql import func
DBSession.query(func.avg(Weather.sunshine_hours)).scalar()

# Average sunshine hours in Konstanz
DBSession.query(func.avg(Weather.sunshine_hours)).filter(Weather.station_name == 'Konstanz').scalar()

Page 18: Big Data Analysis with Crate and Python

Feature Requests

I’m no data scientist

Page 19: Big Data Analysis with Crate and Python

Feature Requests

Please tell us what you would like to see in crate.

I’m no data scientist

Page 20: Big Data Analysis with Crate and Python

CRATE

Thank you

web: https://crate.io/

github: https://github.com/crate

twitter: @cratedata

IRC: #crate

stackoverflow tag: cratedata