php continuous data processing

Page 1: PHP Continuous Data Processing

1 PHP & Continuous Data Processing, PHPNW 2011

PHP & CONTINUOUS DATA PROCESSING

Michael Peacock, October 2011

Page 2: PHP Continuous Data Processing

NO. NOT MILK FLOATS (ANYMORE)

ALL ELECTRIC, COMMERCIAL VEHICLES.

Photo courtesy of kenjonbro: http://www.flickr.com/photos/kenjonbro/4037649210/in/set-72157623026469013

Page 3: PHP Continuous Data Processing

ABOUT MICHAEL PEACOCK

• Senior/Lead Web Developer

• Web Systems Developer
  • Telemetry Team – Smith Electric Vehicles US Corp

• Author
  • PHP 5 Social Networking, PHP 5 E-Commerce Development, Drupal Social Networking (6 & 7), Selling online with Drupal e-Commerce, Building Websites with TYPO3

• PHPNE Volunteer

• Occasional technical speaker
  • PHP North-East, PHPNW 2010, SuperMondays, PHPNW 2011 Unconference, ConFoo 2012

Page 4: PHP Continuous Data Processing

SMITH ELECTRIC VEHICLES & TELEMETRY

• World’s largest manufacturer of commercial, all-electric vehicles

• Smith Link – on-board vehicle telematics system, capturing over 2500 data points each second on the vehicle and broadcasting them over the mobile network

• ~400 telemetry enabled vehicles on the road

• World’s largest telemetry project outside of F1

Page 5: PHP Continuous Data Processing

SYSTEM ARCHITECTURE

Page 6: PHP Continuous Data Processing

SYSTEM ARCHITECTURE

Page 7: PHP Continuous Data Processing

PROBLEM #1: WE CAN’T LOSE ANY DATA

Data is required as part of a $32 million grant from the US Department of Energy

• Thousands of pieces of information collected on a per second basis from a range of remote collection devices

• Unpredictable amounts of data at any one time

• More vehicles rolling off the production line with telemetry enabled

• What about system downtime, upgrades, roll-outs and connectivity problems?

Page 8: PHP Continuous Data Processing

MESSAGE QUEUING

Solution: We use a fast, reliable, scalable, secure, hosted message queue

• If our systems are offline, data builds up in the external message queue

• If we are processing at full capacity, the surplus builds up in the message queue

• If the vehicle loses GPRS signal, or the message queue is inaccessible, vehicles have an internal buffer of up to 7 days

Page 9: PHP Continuous Data Processing

SECRET WEAPON #1: STORMMQ

• Based on AMQP, an open standard

• Secure: All data is encrypted and sent over SSL

• Reliable: Huge investment in server infrastructure

• Hosted: Backed up with an SLA

• Scalable: Capable of processing huge numbers of incoming messages, with capacity to store the messages when we perform maintenance on our systems

Page 10: PHP Continuous Data Processing

PROBLEM #2: PROCESSING DATA QUICKLY

We utilise a dedicated server and a number of dedicated applications to pull these messages and process them

• This needs to happen quickly enough for live data to be seen through the web interface

• Data is rapidly converted into batch SQL files, which are imported to MySQL via “LOAD DATA INFILE”

• Results in a high number of inserts per second (20,000 – 80,000)

• LOAD DATA INFILE isn’t enough on its own...
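As a rough illustration of the batch approach, the sketch below writes one batch of data points to a tab-separated file and builds a matching LOAD DATA INFILE statement. The helper names, the table name and the column names (recorded_at, variable_id, value) are assumptions made for this example; the slides do not show the actual file layout or schema.

```php
<?php
// Hypothetical helper: write one batch of data points to a tab-separated file.
// Each row is assumed to carry 'time', 'variable' and 'value' keys.
function writeBatchFile(array $rows, string $path): int
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = 0;
    foreach ($rows as $row) {
        // One line per data point: timestamp, variable id, value
        fwrite($handle, implode("\t", [$row['time'], $row['variable'], $row['value']]) . "\n");
        $written++;
    }
    fclose($handle);
    return $written;
}

// Hypothetical helper: build the import statement for that file.
// Column names are assumed; real code should escape the path via the database driver.
function buildLoadStatement(string $path, string $table): string
{
    return "LOAD DATA INFILE '" . addslashes($path) . "' INTO TABLE `" . $table . "` "
         . "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
         . "(recorded_at, variable_id, value)";
}
```

One statement then loads the whole file, rather than one INSERT per data point.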

Page 11: PHP Continuous Data Processing

SECRET WEAPON #2: DBA

• Constantly tweaking the servers and configuration to get more and more performance

• Pushing the capabilities of our SAN, tweaking configs where no DBA has gone before

• www.samlambert.com

• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html

• http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html

• [email protected]

Sam Lambert – DBA Extraordinaire

Page 12: PHP Continuous Data Processing

SHARDING

• Huge volumes of data being stored

• We shard the data by the truck it came from; each truck has its own database

• Each database is held on one of many database servers in our cluster, each with ~100GB RAM
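A minimal sketch of how code might map a truck to its shard, assuming a simple modulus over the server list and a truck_NNNNN naming scheme. Both are assumptions for illustration; a real deployment would more likely use a lookup table so trucks can be moved between servers.

```php
<?php
// Hypothetical shard map: every truck has its own database, and each database
// lives on one of several servers. Host names and the naming scheme are assumed.
function shardFor(int $truckId, array $servers): array
{
    if ($servers === []) {
        throw new InvalidArgumentException('No database servers configured');
    }
    return [
        'host'     => $servers[$truckId % count($servers)],
        'database' => sprintf('truck_%05d', $truckId),
    ];
}
```

Keeping this mapping in one function means the rest of the code never needs to know which server holds a given truck.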

Page 13: PHP Continuous Data Processing

LIVE, REAL TIME INFORMATION

[live screen photo]

Page 14: PHP Continuous Data Processing

REAL TIME STATUS AND TRACKING

Page 15: PHP Continuous Data Processing

LIVE, REAL TIME INFORMATION: PROBLEM

Original database design dictated:

• All data-points were stored in the same table

• Each type of data point required a separate query, sub-query or join to obtain

Workings of the remote device collecting the data, and the processing server, dictated:

• GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction

Page 16: PHP Continuous Data Processing

REAL TIME INFORMATION: CONCURRENT

Initial Solution from the original developers:

• Pull as many pieces of real time information as possible through asynchronously

• Involved the use of Flash based “widgets” which called separate PHP scripts to query the data

• Pages loaded relatively quickly

• Data points took a little time to load

• Not good enough

Page 17: PHP Continuous Data Processing

REAL TIME INFORMATION: CACHING

• High volumes of data, and varying levels of concurrent processing, mean query times are often inconsistent

• Memcache was used when processing the data from the message queue, keeping a copy of the most recent value of each data point for each truck

• Live, Real-Time information accessed directly from memcache, bypassing the database
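The idea can be sketched as one cache entry per truck per data point, written by the queue processor and read by the web interface. The key scheme and function names below are assumptions, and ArrayStore is a stand-in that mimics only get() and set() so the sketch runs without the memcache extension.

```php
<?php
// Stand-in for a Memcache connection (assumption for this sketch).
// Like Memcache::get(), a miss returns false.
class ArrayStore
{
    private $data = [];

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
}

// Hypothetical key scheme: one entry per truck per data point.
function latestKey(int $truckId, string $variable): string
{
    return sprintf('truck:%d:latest:%s', $truckId, $variable);
}

// Called by the queue processor for every incoming data point.
function rememberLatest($store, int $truckId, string $variable, $value, int $time): void
{
    $store->set(latestKey($truckId, $variable), ['value' => $value, 'time' => $time]);
}

// Called by the web interface: no database query involved.
function readLatest($store, int $truckId, string $variable)
{
    return $store->get(latestKey($truckId, $variable));
}
```

Because each write overwrites the previous entry, the cache stays the same size however much history the database holds.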

Page 18: PHP Continuous Data Processing

CACHING: REGISTRY/DI IS IDEAL

• Sporadic use of memcache within the web application – ideal use case for a lazy loading registry or DI container

• Give the registry or container details of memcache

• The object is instantiated, and the connection made, only when data is requested from memcache

Page 19: PHP Continuous Data Processing

LAZY LOADING

public function getObject( $key ){
    if( array_key_exists( $key, $this->objects ) ){
        // already instantiated: return the stored object
        return $this->objects[ $key ];
    }elseif( array_key_exists( $key, $this->objectSetup ) ){
        // known but not yet built: load its files, instantiate, store and return
        $setup = $this->objectSetup[ $key ];
        $path = FRAMEWORK_PATH . 'registry/aspects/' . $setup['folder'] . '/';
        if( ! is_null( $setup['abstract'] ) ){
            require_once( $path . $setup['abstract'] . '.abstract.php' );
        }
        require_once( $path . $setup['file'] . '.class.php' );
        $class = $setup['class'];
        $o = new $class( $this );
        $this->storeObject( $o, $key );
        return $o;
    }elseif( $key == 'memcache' ){
        // requesting memcache for the first time, instantiate, connect, store and return
        $mc = new Memcache();
        $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
        $this->storeObject( $mc, 'memcache' );
        return $mc;
    }
}

This is about the limit of the registry pattern; a DI container is more suitable
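For comparison, a lazy-loading container can be sketched with closures: each service is registered as a factory and built only on first request. The Container class below is a hypothetical minimal version, not the code used in the project.

```php
<?php
// Minimal lazy-loading container sketch (hypothetical class).
class Container
{
    private $factories = [];
    private $instances = [];

    // Register how to build a service; nothing is created yet.
    public function register(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
    }

    // Build the service on first request, then reuse the same instance.
    public function get(string $name)
    {
        if (!array_key_exists($name, $this->instances)) {
            if (!isset($this->factories[$name])) {
                throw new InvalidArgumentException("Unknown service: $name");
            }
            $this->instances[$name] = ($this->factories[$name])($this);
        }
        return $this->instances[$name];
    }
}

// Usage (requires the memcache extension and the two constants, so left commented):
// $container->register('memcache', function () {
//     $mc = new Memcache();
//     $mc->connect(MEMCACHE_SERVER, MEMCACHE_PORT);
//     return $mc;
// });
```

The special case for 'memcache' disappears: the connection details live in the registered factory, and no connection is made on pages that never ask for it.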

Page 20: PHP Continuous Data Processing

REAL TIME INFORMATION: EXTRAPOLATE AND ASSUME

• Our telemetry unit broadcasts each data point once per second

• Data doesn’t change every second, e.g.
  • Battery state of charge may take several minutes to lose a percentage point
  • Fault flags only change to 1 when there is a fault

• Make an assumption.
  • We compare the data to the last known value… if it’s the same we don’t insert, instead we assume it was the same

• Unfortunately, this requires us to put additional checks and balances in place
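The comparison step might look something like the sketch below, which keeps a data point only when its value differs from the last one stored for that variable. The function name and the array layout ('variable', 'value', 'time') are assumptions for illustration.

```php
<?php
// Hypothetical filter: keep a data point only if its value differs from the
// last stored value for that variable. $lastKnown is updated by reference.
function filterUnchanged(array $points, array &$lastKnown): array
{
    $toInsert = [];
    foreach ($points as $point) {
        $name = $point['variable'];
        if (array_key_exists($name, $lastKnown) && $lastKnown[$name] === $point['value']) {
            continue; // unchanged: skip the insert and assume the old value still holds
        }
        $lastKnown[$name] = $point['value'];
        $toInsert[] = $point;
    }
    return $toInsert;
}
```

The extra checks mentioned above are needed because a gap in the stored data is then ambiguous: it can mean "no change" or "no data received".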

Page 21: PHP Continuous Data Processing

EXTRAPOLATE AND ASSUME: “INTERLATION”

Built a special library which:

• Accepted a number of arrays, each representing a collection of data points for one variable on the truck

• Used key indicators and time differences to work out if/when the truck was off, and extrapolation should stop

• For each time data was recorded, pull down data for other variables for consistency
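The "pull down" step can be illustrated with a simple forward fill: for every timestamp seen in any variable, carry each variable's last known value forward. This is a simplified sketch and not the library's implementation; the function name and input shape are assumptions, and it does not attempt the truck-off detection described above.

```php
<?php
// Hypothetical forward fill: $series maps variable name => [timestamp => value].
// Returns one row per timestamp seen in any series, carrying each variable's
// last known value forward (null until a first value has been seen).
function forwardFill(array $series): array
{
    $timestamps = [];
    foreach ($series as $values) {
        foreach (array_keys($values) as $time) {
            $timestamps[$time] = true;
        }
    }
    ksort($timestamps);

    $current = array_fill_keys(array_keys($series), null);
    $rows = [];
    foreach (array_keys($timestamps) as $time) {
        foreach ($series as $name => $values) {
            if (array_key_exists($time, $values)) {
                $current[$name] = $values[$time];
            }
        }
        $rows[$time] = $current;
    }
    return $rows;
}
```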

Page 22: PHP Continuous Data Processing

INTERLACE

* Add an array to the interlation
public function addArray( $name, $array )

* Get the time that we first received data in one of our arrays
public function getFirst( $field )

* Get the time that we last received data in any of our arrays
public function getLast( $field )

* Generate the interlaced array
public function generate( $keyField, $valueField )

* Break the interlaced array down into separate days
public function dayBreak( $interlationArray )

* Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_
public function generateAndFill( $keyField, $valueField )

* Populate the new combined array with key fields using the common field
public function populateKeysFromField( $field, $valueField=null )

http://www.michaelpeacock.co.uk/interlation-library

Page 23: PHP Continuous Data Processing

REAL TIME INFORMATION: SINGLE REQUEST

• Currently, each piece of “live data” is loaded into a Flash graph or widget, which updates every 30 seconds using an AJAX request

• The move from MySQL to Memcache reduces database load, but the large number of requests still adds strain to the web server

• Moving to image and JavaScript widgets, which are updated from a single AJAX request

Page 24: PHP Continuous Data Processing

LOTS OF DATA: RACE CONDITIONS

Sessions in PHP close at the end of the execution cycle

• Unpredictable query times

• Large number of concurrent requests per screen

Session Locking

Completely locks out a user’s session, as PHP hasn’t closed the session

Page 25: PHP Continuous Data Processing

RACE CONDITIONS: PHP & SESSIONS

session_write_close()

Added after each write to the $_SESSION array. Writes the session data and closes the current session, which releases the session lock.

(requires a call to session_start() immediately before any further reads or writes)
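A minimal sketch of the pattern, assuming the default file-based session handler; writeSessionValue is a hypothetical helper name. The session is opened, written and closed in one step, so the lock is held only for the duration of the write.

```php
<?php
// Sketch: hold the session lock only while writing (hypothetical helper).
function writeSessionValue(string $key, $value): void
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();       // opens the session and acquires its lock
    }
    $_SESSION[$key] = $value;
    session_write_close();     // saves the data and releases the lock
}
```

Other requests from the same user can then proceed while this one carries on with slow queries.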

Page 26: PHP Continuous Data Processing

RACE CONDITIONS: USE A ******* TEMPLATE ENGINE

• V1 of the system mixed PHP and HTML

• You can’t re-initialise your session once output has been sent

• All new code uses a template engine, so session interaction has no bearing on output. When the template is processed and output, all database and session work has been completed long before.

Page 27: PHP Continuous Data Processing

RACE CONDITIONS: USE A SINGLE ENTRY POINT

• Race conditions are further exacerbated by the PHP timeout values

• Certain exports, actions and processes take longer than 30 seconds, so the default execution time limit had been raised

• Initially the project lacked a single entry point, and execution flow was muddled

• A single entry point makes it easier to enforce a lower timeout, which is overridden by intensive controllers or models
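One way this could look in a front controller: a low default limit applied to every request, raised only for controllers listed as intensive. The constant, the function names and the limits are assumptions for illustration.

```php
<?php
// Hypothetical front controller fragment: low default limit, explicit overrides.
const DEFAULT_TIME_LIMIT = 10;

// Look up the limit for a controller, falling back to the low default.
function timeLimitFor(string $controller, array $overrides): int
{
    return $overrides[$controller] ?? DEFAULT_TIME_LIMIT;
}

function dispatch(string $controller, array $overrides): int
{
    $limit = timeLimitFor($controller, $overrides);
    set_time_limit($limit);   // restarts the timeout counter with the new limit
    // ... route to the controller here ...
    return $limit;
}
```

With several entry points, each script would need its own set_time_limit() call; with one, the policy lives in a single place.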

Page 28: PHP Continuous Data Processing

INTENSIVE QUERIES & CALCULATIONS

• How far did this vehicle travel?
  • Motor RPM x Various vehicle specific constants
  • Calculated for every RPM value held during the drive process

• How much energy did the vehicle use?
  • Battery Current x Battery Voltage x Time
  • For every current and voltage value combination held during the driving process

• How well was the vehicle driven?
  • Analysis of idle time
  • Harshness of accelerator and brake pedal usage
  • Inappropriate duration of AC / Heater on time?

• What about for a customer’s fleet, or all of our vehicles sold?
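As a worked example of the energy calculation (Battery Current x Battery Voltage x Time), the sketch below sums current × voltage × elapsed seconds over consecutive samples and converts joules to watt-hours. The function name and sample layout are assumptions, and each reading is assumed to hold until the next one arrives.

```php
<?php
// Each sample: ['time' => seconds, 'current' => amps, 'voltage' => volts].
// Samples must be sorted by time; each reading is assumed to hold until the next.
function energyUsedWh(array $samples): float
{
    $joules = 0.0;
    for ($i = 0; $i < count($samples) - 1; $i++) {
        $seconds = $samples[$i + 1]['time'] - $samples[$i]['time'];
        $joules += $samples[$i]['current'] * $samples[$i]['voltage'] * $seconds;
    }
    return $joules / 3600; // 1 Wh = 3600 J
}
```

For example, 100 A at 300 V for 60 seconds followed by 50 A at 300 V for 60 seconds gives 1,800,000 J + 900,000 J = 2,700,000 J, or 750 Wh.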

Page 29: PHP Continuous Data Processing

INTENSIVE QUERIES & CALCULATIONS

Page 30: PHP Continuous Data Processing

INTENSIVE QUERIES & CALCULATIONS

• Involves a fair number of queries per vehicle

• Calculations involve holding this data in memory

• Processing required for every single record for that piece of data during that day

Takes a while!

Solution:

• Calculate information overnight

• Save it as a compiled report

• Lookups and comparisons only need to look at the compiled / saved reports in the database

Page 31: PHP Continuous Data Processing

REPORTS

In addition to our calculated reports, we also need to export key bits of information to grant authorities

• Initially our PHP based export scripts held one database connection per database (~400 databases)

• Rewrote them to maintain only one connection per server, and switch the database used

• Toggles to instruct the export to only apply to 1 of the servers at a time

• Modulus magic to run multiple export scripts per server
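The "modulus magic" is not spelled out on the slide; one plausible reading is that each export script takes the databases whose position in the list leaves a given remainder. The sketch below shows that partitioning under this assumption; the function name and parameters are hypothetical.

```php
<?php
// Hypothetical partitioning: worker $workerIndex of $workerCount handles every
// database whose position in the list leaves that remainder.
function databasesForWorker(array $databases, int $workerIndex, int $workerCount): array
{
    $mine = [];
    foreach (array_values($databases) as $position => $database) {
        if ($position % $workerCount === $workerIndex) {
            $mine[] = $database;
        }
    }
    return $mine;
}
```

Each worker would then hold a single connection to its server and switch between its databases (for example with mysqli's select_db()) rather than opening one connection per database.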

Page 32: PHP Continuous Data Processing

TRIGGERS AND EVENTS

Currently a work-in-progress R&D project, evaluating two options:

• Golden hammer: Use PHP
  • Run PHP as a daemon
  • http://kevin.vanzonneveld.net/techblog/article/create_daemons_in_php/
  • Continually monitor for specific changes to memcache variables

• Node.js
  • Light weight and fast
  • Give PHP another friend
  • Link into PHP based API to run triggers
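For the PHP daemon option, the core of each pass could be a comparison between the values read on this pass and those read on the previous one. The sketch below shows only that comparison, with plain arrays standing in for values read from memcache; the function name and array shapes are assumptions.

```php
<?php
// Hypothetical trigger check for one pass of a daemon loop: compare current
// cached values with the previous pass and report which watched keys changed.
function detectChanges(array $previous, array $current, array $watchedKeys): array
{
    $changed = [];
    foreach ($watchedKeys as $key) {
        $before = $previous[$key] ?? null;
        $after  = $current[$key] ?? null;
        if ($before !== $after) {
            $changed[$key] = ['from' => $before, 'to' => $after];
        }
    }
    return $changed;
}
```

A daemon would call this in a loop, fire a trigger for each reported change, keep the current values as the next pass's "previous" values, and sleep briefly between passes.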

Page 33: PHP Continuous Data Processing

THE FUTURE

• More sharding
  • Based on time – keep the individual tables smaller

• NoSQL?
  • Currently investigating NoSQL solutions as alternatives

• Rationalisation
  • Do we need as much data as we collect?

• Abstraction
  • We need to continually abstract concepts and ideas to make on-going maintenance and expansion easier; especially in terms of mapping code to database shards

• More hardware
  • Expand our DB cluster, more RAM, R&D

• Design
  • A much needed design refresh

Page 34: PHP Continuous Data Processing

CONCLUSIONS

• Make the solution scalable from the start

• Where data collection is critical, use a message queue, ideally hosted or “cloud based”

• Hire a genius DBA to push your database engine

• Make use of data caching systems to reduce strain on the database

• Calculations and post-processing should be done during dead time and automated

• Add more tools to your toolbox – PHP needs lots of friends in these situations

• Watch out for Session race conditions: where they can’t be avoided, use session_write_close, a template engine and a single entry point

• Reduce the number of continuous AJAX calls

Page 35: PHP Continuous Data Processing

Q & A

Michael Peacock

Web Systems Developer – Telemetry Team – Smith Electric Vehicles US Corp

[email protected]

Senior / Lead Developer, Author & [email protected]

www.michaelpeacock.co.uk

@michaelpeacock

http://joind.in/3808

http://www.slideshare.net/michaelpeacock
