Connecting Stuff to Azure (IoT)
Mark Simms, Principal Group Program Manager, Azure Customer Advisory Team (AzureCAT)
Twitter: @mabsimms
Connecting Stuff to Azure
And all of the crazy stuff that happens along the way..
Building an end-to-end Internet of Things (IoT) experience requires careful design and architecture choices.
In this session we are going to design a 1M device IoT solution, and go through the fire hose of choices.
All of these choices will be about balancing conflicting constraints regarding physics and economics.
Most of these choices are made in devices, and adapted to in the cloud.
The Azure Customer Advisory Team (AzureCAT) works with internal and external customers to build out some of the largest applications on Azure.
This presentation is an early preview of an IoT guidance series to be published by Patterns & Practices early in 2015.
This presentation will be an interactive design session.
We’re going to break down the sequence of choices that go into delivering IoT at scale in Azure.
This is going to be a densely packed sequence of choices..
Note: all code samples will be pure OSS (Java, Kafka, etc)
Setting the stage
What is IoT?
Collecting information from lots of devices is cool – telematics
Merging perspectives between devices, systems and humans to build a better understanding of the world around us..
Then tying together insight with action – there lies the promise of IoT.
No really.. What is IoT?
http://en.wikipedia.org/wiki/Internet_of_Things
End to End IoT Architecture
Not going to cover these today
RFC1149?
https://pbs.twimg.com/media/Ak5nLs2CIAA1ypg.jpg
https://twitter.com/hashtag/ipoac
In any IoT application at scale, the needs of the device outweigh the needs of the service.
Devices have to be designed, manufactured, programmed, shipped and provisioned.
Devices consume – battery, heat, network bandwidth, network sockets.
Commercially viable products and services must maintain operational CoGS (cost of goods sold)
Device choices endure – sometimes for decades!
Devices (or why physics always wins)
Cost of the “oops” – cloud services
Oops. Inverted a variable assignment. Cost of fixing – edit, commit, push, build, deploy.
Cost of the “oops” – hardware design
Oops. Put the piezoelectric buzzer on a general purpose I/O pin, not a PWM (pulse width modulation) pin.
Not enough processor cycles to handle software based PWM / would destroy battery life.
Cost to fix: $100k, including 6 week product ship delay.
Choices – What powers the device?

Option: Battery (primary)
Upside: Device can operate in a mobile environment for extended periods of time.
Downside: Device now has a current / wattage budget (CPU cycles are not free). Efficient and safe battery charging requires sophisticated circuitry (you won’t do it in firmware).
Common examples: Mobile phones

Option: Battery (secondary)
Upside: Device can sustain function through transient power interrupts.
Downside: Efficient and safe battery charging requires sophisticated circuitry (you won’t do it in firmware). May have to add additional circuitry to run while charging.
Common examples: Laptops

Option: Main power (primary)
Upside: Device can leverage all available computing power (barring thermal constraints).
Downside: Device functionality susceptible to interruption during power supply events.
Common examples: 3D printer

Option: Main power + backup
Upside: Device can leverage all available computing power (barring thermal constraints), and operate at reduced capacity during power events.
Downside: Additional power management circuitry. Need to reduce current load during loss of main power.
Common examples: NEST thermostat
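To make the "current / wattage budget" concrete, here is a rough back-of-envelope sketch. All of the numbers (80 mA while awake, 0.02 mA asleep, a 2000 mAh cell, a 15 minute publish interval) are hypothetical and only illustrate the arithmetic; real devices also lose capacity to self-discharge, temperature and regulator losses, which this ignores.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Average draw for a device that is awake active_s seconds out of every period_s."""
    duty = active_s / period_s
    return active_ma * duty + sleep_ma * (1.0 - duty)


def battery_life_days(capacity_mah, avg_ma):
    """Idealized battery life: capacity divided by average draw, converted to days."""
    return capacity_mah / avg_ma / 24.0


# Hypothetical device: wakes for 2 s every 15 minutes to sample and publish.
avg = average_current_ma(active_ma=80.0, sleep_ma=0.02, active_s=2.0, period_s=900.0)
days = battery_life_days(capacity_mah=2000.0, avg_ma=avg)

# Same device, but a chatty protocol keeps the radio up for 10 s per publish.
chatty_avg = average_current_ma(active_ma=80.0, sleep_ma=0.02, active_s=10.0, period_s=900.0)
chatty_days = battery_life_days(capacity_mah=2000.0, avg_ma=chatty_avg)

print(f"2 s awake per publish:  {avg:.3f} mA average, about {days:.0f} days")
print(f"10 s awake per publish: {chatty_avg:.3f} mA average, about {chatty_days:.0f} days")
```

The point of the comparison is that seconds of radio time per publish, which the protocol and encoding choices later in the deck largely determine, translate directly into battery life.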
Choices – What connects the device to cloud services?

Option: Ethernet
Upside: Cheap, easy to install. No hard bandwidth or framing limitations.
Downside: Requires hard wired connection provided by end-user. May require additional configuration or security enhancements to route through firewalls, etc.
Common examples: Industrial PLC (programmable logic controllers)

Option: WiFi
Upside: Readily available on more sophisticated microcontrollers and embedded devices.
Downside: Requires ambient WiFi network, and method of managing security keys and access (including rotation). May require additional configuration or security enhancements to route through firewalls (commercial).
Common examples: NEST thermostat

Option: Cellular
Upside: Self-contained; plug and go.
Downside: Communication heavily metered – cost of operations (CoGS) borne by service operator.
Common examples: 3rd party car data logger

Option: Local (Bluetooth, Zigbee, etc)
Upside: Minimal cost and power requirements.
Downside: Short ranged, require field gateway or other “smart” edge device to proxy connections.
Common examples: iBeacon
With the ubiquity of firewalls and NAT (network address translators), cloud services connecting inbound to devices is typically impractical.
If two local devices want to talk to each other, there are two options:
• Device A connects directly to device B, or vice-versa
• The devices communicate through a secured cloud endpoint (service assisted communication)
Who connects to whom?
Messaging and Connectivity
LiFX lightbulbs create a mesh network between each other
One lightbulb elects as master, and proxies to WiFi router
Devices shipped from factory with a single GLOBAL PRE-SHARED KEY.
Break one device – break them all. Remediation options:
• Global firmware update. How do the devices “call home” to get firmware updates? At scale there will always be devices behind the update curve.
• Don’t make any mistakes in the bootloader for in-field firmware updates. A single RMA (return material authorization) can wipe out the profit from dozens of devices.
• Move to provisioned key-per-device. Need to build and manage key infrastructure. Also need to incorporate key rotation (don’t make a mistake here or the device will be “bricked”).
• Is there an out-of-band update mechanism (USB?). Is the end-user community amenable to handling firmware updates (industrial, technical vs. mass consumer)?
Peer to peer sounds cool!
http://contextis.com/resources/blog/hacking-internet-connected-light-bulbs/
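As a rough illustration of the "key-per-device" option, the sketch below derives a distinct key for each device from a secret that never leaves the provisioning / cloud side, so extracting one device's key does not expose the others. This is one possible scheme (HMAC-based key diversification) chosen for illustration, not what any particular vendor shipped; the function names, the `device_id:key_version` label and the sample secret are all assumptions.

```python
import hashlib
import hmac


def derive_device_key(master_secret: bytes, device_id: str, key_version: int = 1) -> bytes:
    """Derive a per-device key; bumping key_version gives a simple rotation hook."""
    label = f"{device_id}:{key_version}".encode("utf-8")
    return hmac.new(master_secret, label, hashlib.sha256).digest()


def sign(device_key: bytes, payload: bytes) -> bytes:
    """Device side: authenticate a payload with the key burned in at provisioning."""
    return hmac.new(device_key, payload, hashlib.sha256).digest()


def verify(master_secret: bytes, device_id: str, payload: bytes, tag: bytes,
           key_version: int = 1) -> bool:
    """Cloud side: re-derive the device's key and check the tag in constant time."""
    expected = sign(derive_device_key(master_secret, device_id, key_version), payload)
    return hmac.compare_digest(expected, tag)


# Hypothetical secret and device ids, for illustration only.
MASTER = b"factory-secret-kept-off-devices"
key_a = derive_device_key(MASTER, "bulb-0001")
key_b = derive_device_key(MASTER, "bulb-0002")
tag = sign(key_a, b"state=on")

print(verify(MASTER, "bulb-0001", b"state=on", tag))  # key matches this device
print(verify(MASTER, "bulb-0002", b"state=on", tag))  # another device's key does not
```

The hard parts the slide calls out (managing the key infrastructure, and rotating keys without bricking devices) are exactly what this sketch leaves out.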
Choices – Let’s connect!

Option: UDP
Upside:
• Simple; datagrams require no framing.
• Efficient on bandwidth metered links.
Downside:
• Impractical to secure channel.
• Need faith or out of band acknowledgement mechanism for reliable transfer.
• Cannot reliably support ordered data streams.
• Challenging to implement return-channel (cloud to device) for commands

Option: TCP/IP
Upside:
• Simple; minimal code footprint for RTOS class devices.
• Can use TLS to secure channel
• Bi-directional channel for notifications and commands
Downside:
• Need to handle framing on both sides of connection (or hard code avoidance of MTU limits from end to end)
• Firewall traversal is challenging

Option: HTTP/S
Upside:
• Straightforward firewall traversal, use of SSL for channel encryption and signing
• Built in framing, can leverage semantic conventions (REST) to publish data
Downside:
• Inefficient for Signal-to-Noise ratio of bytes on wire
• Heavy device stack footprint to implement general purpose HTTP client stack

Option: AMQP, MQTT
Upside:
• Bi-directional channel for notifications and commands
• Efficient use of bandwidth (batching, efficient framing, etc)
Downside:
• Firewall traversal is challenging
• Client stack may not fit on smaller devices
• Evolving standards and implementation levels
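The "need to handle framing on both sides" downside of raw TCP/IP deserves a quick illustration: TCP delivers a byte stream, so a single read may return half a message or several messages at once. One common approach is a length prefix. The sketch below assumes a 2-byte big-endian length header; that header size and the class name are illustrative choices, not something the deck prescribes.

```python
import struct

HEADER = struct.Struct("!H")  # assumed framing: 2-byte big-endian payload length


def frame(payload: bytes) -> bytes:
    """Sender side: prefix the payload with its length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length prefix")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Receiver side: buffers arbitrary chunks and yields only complete messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


# Simulate a socket that hands us the stream three bytes at a time.
stream = frame(b"temp=21") + frame(b"rpm=900")
decoder = FrameDecoder()
received = []
for i in range(0, len(stream), 3):
    received.extend(decoder.feed(stream[i:i + 3]))
print(received)
```

However the stream is chopped up in transit, the decoder hands back the same two messages, which is the property both ends of the connection have to agree on.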
Choices – Let’s encode!

Option: XML
Upside:
• You have more money than you know what to do with. Enjoy another mojito on your yacht.
Downside:
• Extremely inefficient for both serialization/deserialization time and wire encoding.

Option: JSON
Upside:
• Self-describing (“tagged”) format requiring no type identifiers. Readable by convention.
Downside:
• Need to handle framing on both sides of connection (or hard code avoidance of MTU limits from end to end)
• Firewall traversal is challenging

Option: Tagged / Untagged “standard” Binary (Protobuf, Thrift, etc)
Upside:
• Highly efficient wire protocol with broad range of encoder bindings for various languages
• Can use common IDL (definition) to generate device and cloud code
• Built in support for protocol versioning
Downside:
• Implementation may not be compatible with RTOS class device BSP (board support packages)
• Until you’ve lived through the mistake, you probably won’t use the versioning features.

Option: Custom Binary (roll your own)
Upside:
• You can put “wrote yet another custom protocol” on your resume
• High degree of control over bit packing, ordering, etc.
• Can support any device.. Since you wrote it for that device
Downside:
• Very few implementations use code generation from a common definition (result -> divergent implementations with subtle differences)
• Rarely incorporate version management, self-describing type and version fields, rich variable support (arrays, maps, etc)
• Take on a life of their own, generating support burdens with inertia
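To put rough numbers on the encoding trade-off, the sketch below encodes the same telemetry reading as compact JSON and as a small packed binary record. The field layout (version, message type, device id, timestamp, temperature in tenths of a degree, RPM) is a made-up example. Note that even this toy custom format carries version and type bytes up front, which is the piece roll-your-own protocols tend to skip.

```python
import json
import struct

# Assumed layout: version(u8) type(u8) device_id(u32) timestamp(u32) temp_x10(i16) rpm(u16)
READING_V1 = struct.Struct("!BBIIhH")
VERSION = 1
TYPE_READING = 7  # hypothetical message type id


def encode_binary(device_id, timestamp, temp_c, rpm):
    return READING_V1.pack(VERSION, TYPE_READING, device_id, timestamp,
                           round(temp_c * 10), rpm)


def decode_binary(data):
    if data[0] != VERSION:
        raise ValueError(f"unsupported version {data[0]}")
    _, msg_type, device_id, timestamp, temp_x10, rpm = READING_V1.unpack(data)
    return {"type": msg_type, "device_id": device_id, "timestamp": timestamp,
            "temp_c": temp_x10 / 10, "rpm": rpm}


def encode_json(device_id, timestamp, temp_c, rpm):
    doc = {"v": VERSION, "type": TYPE_READING, "device_id": device_id,
           "timestamp": timestamp, "temp_c": temp_c, "rpm": rpm}
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


binary = encode_binary(42, 1418000000, 21.5, 900)
text = encode_json(42, 1418000000, 21.5, 900)
print(len(binary), "bytes as packed binary")
print(len(text), "bytes as compact JSON")
print(decode_binary(binary))
```

On a metered cellular link that per-message difference is multiplied by every device and every publish interval, which is where it shows up in CoGS.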
Enough Choice Lists – Let’s Design!
Segment: Commercial or consumer?
Power: Parasitic or battery-assist?
Transport: Cellular or periodic WiFi?
Connection: UDP, TCP/IP, HTTP, etc?
Encoding: custom, binary, JSON?
Workflows:
• Telematics only (publish data)
• Latent notifications (time for a firmware update) and commands
• Interactive commands (unlock the car)
Design Challenge: Automotive Telematics Interface
Building cloud services for IoT requires an understanding of the target device(s), and adapting to their needs.
Green field (new) devices allow a broader range of choice – we’re at an inflection point:
• Many extant devices and platforms being retrofitted for IoT connectivity
• Stable investments in protocols, encoding approaches
• Highly sophisticated system-on-chip designs emerging (relaxing processing efficiency constraints)
• More ubiquitous / cheaper networking options (cellular data chips and plans more available – efficiency for long-term CoGS still crucial)
Recap: devices drive choices, cloud follows
End to End IoT Architecture
Building a cloud gateway; responsibilities:
• Scalability. Connections are a metered resource in a shared environment. In Azure each SLB (cloud service) will handle 60k-80k concurrent TCP/IP sockets.
• Security (authz/authn). Validate that connecting devices are allowed and trusted to send information.
• Connection affinity and command routing. How to route commands and notifications from other devices and entities down to specific devices.
• Protocol / encoding translation. Sparse/packaged network formats may not be optimal for hot-path (workflow processing, stream analytics) and cold-path (bulk analytics). May need to convert older protocols into canonical formats.
• Routing. Enrich incoming data streams with context (per-device, per-system), and route messages to the appropriate downstream consumers.
• System telemetry. How many devices are connecting, status, resources, errors.
• Load shedding / shock absorption. What happens when everybody decides to reconnect all at once (inrush effect)?
Now – connecting stuff to Azure!
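For the load shedding / shock absorption responsibility, one common building block is a token bucket at the gateway's front door: accept new connections up to a sustained rate plus a bounded burst, and turn the rest away so the devices retry later. The sketch below is a minimal version with time passed in explicitly so the behaviour is easy to follow; the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Admit at most `rate_per_s` connections per second, with bursts up to `burst`."""

    def __init__(self, rate_per_s, burst):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so a cold gateway accepts a burst
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # shed: caller should reject and let the device back off


# Inrush: 100 devices reconnect in the same instant against a 10/s, burst-5 gateway.
bucket = TokenBucket(rate_per_s=10, burst=5)
accepted_at_t0 = sum(bucket.allow(now=0.0) for _ in range(100))
accepted_at_t1 = sum(bucket.allow(now=1.0) for _ in range(100))
print(accepted_at_t0, "accepted in the first instant,", accepted_at_t1, "one second later")
```

Shedding only helps if the rejected devices come back spread out over time rather than all retrying on the same tick, which is a device-side design choice again.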
What destination resource is encoded on the device?
• Hard coded IP address, factory-set?
• Hard coded IP address, set during provisioning (where does the provisioning information come from?)
• Host name, dynamic lookup via DNS (can I fit a DNS stack on the device)?
What are pros and cons of each approach?
Not so obvious learning moments
Imagine an embedded Linux device, periodically publishing telemetry data (files) to a remote service. Using crontab:
0,15,30,45 * * * * /usr/local/bin/publish_data
What’s the problem with this approach?
Everybody.. Go!
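The problem with the crontab entry is that every device fires on the same quarter hour. One way to spread the load without any coordination is for each device to derive a stable offset within the interval from its own identifier, and wait that long before publishing. The sketch below assumes string device ids and a 15 minute interval; the id format and fleet size are made up for the comparison.

```python
import hashlib
from collections import Counter

INTERVAL_S = 15 * 60  # publish every 15 minutes


def publish_offset_s(device_id: str, interval_s: int = INTERVAL_S) -> int:
    """Stable per-device offset in [0, interval_s), derived from a hash of the id."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_s


# Hypothetical fleet of 10,000 devices.
device_ids = [f"device-{n}" for n in range(10000)]

# Crontab as written: every device publishes in the same second.
synchronized_peak = len(device_ids)

# With per-device offsets: count how many devices land in the busiest second.
per_second = Counter(publish_offset_s(d) for d in device_ids)
spread_peak = max(per_second.values())

print("peak publishes per second, synchronized:", synchronized_peak)
print("peak publishes per second, with offsets:", spread_peak)
```

On the device this could be as simple as sleeping for `publish_offset_s(my_id)` seconds at the start of the cron job. Because the offset is deterministic, each device still publishes exactly once per interval.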
Delivering on Collection
Challenges and Physics
Synchronized Interval
• Devices publishing data at fixed interval and offset (e.g. every 15 minutes, on the quarter hour – 12:00:00, 12:15:00, etc)
• No guarantee of precise clock synchronization in a highly distributed system
Unsynchronized Interval
• Devices publishing data at fixed interval
• Start on device or application activation
• No guarantee of precise clock synchronization in a highly distributed system
[Figure: charts for the synchronized and unsynchronized publishing patterns]
This is wasted money (unless you can auto scale in the troughs)
Azure Software Load Balancer kills inactive connections after 4 minutes
To increase timeout value (max value 30 minutes)
Managing Socket Lifetime
Set-AzurePublicIP -PublicIPName webip -VM MyVM -IdleTimeoutInMinutes 15
http://azure.microsoft.com/blog/2014/08/14/new-configurable-idle-timeout-for-azure-load-balancer/
Then you’ll need to “nudge” the socket with some data.
What about TcpKeepAlive?
But I want at least an hour!
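A minimal sketch of the two usual ways to keep a long-lived socket from going idle: turn on TCP keep-alive at the OS level, and/or send a small application-level heartbeat (the "nudge") comfortably inside the idle timeout. The tuning constants below are Linux-style socket options that are not present on every platform, so they are guarded. The timing values and the 0.5 safety factor are assumptions, and whether a given load balancer or NAT counts keep-alive probes as activity is worth verifying rather than assuming.

```python
import socket


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Ask the OS to probe an idle connection; fine-tuning is platform dependent."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options exist on Linux; other platforms may lack some of them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def heartbeat_interval_s(idle_timeout_s, safety_factor=0.5):
    """Pick an application heartbeat period well inside the idle timeout."""
    return max(1, int(idle_timeout_s * safety_factor))


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()

print("keep-alive enabled:", keepalive_on)
print("heartbeat every", heartbeat_interval_s(4 * 60), "s for the 4 minute default")
print("heartbeat every", heartbeat_interval_s(30 * 60), "s for the 30 minute maximum")
```

Either way, a connection that must stay up for "at least an hour" is really a connection that sends something every few minutes, and on battery or metered links each of those nudges has a cost.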
A really simple cloud gateway in Java with Jetty and Kafka
End to End IoT Architecture
• Decouple incoming data streams from consumers. Static topologies are brittle, and challenging to extend.
• Handle inrush shock, transient interrupts
• Enable “time travel” – downstream consumers can go back in time to begin reading data
• Rate conversion; can handle offsets between rate of production and rate of consumption
Why an Event Broker?
Qualities of an event broker for IoT:
• Partitioned, append-only journal with tunable consistency
• Ingest 100k’s -> 1M+ events / second, with sub-second e2e latency
• Standards-based wire protocol
• Client cursors for externalized state management; no resource contention between readers
• Tunable retention policies
Azure options for an event broker:
• Event Hubs (PaaS service in public preview - http://azure.microsoft.com/en-us/services/event-hubs/)
• Kafka on Linux (IaaS)
Event Broker
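To show what "partitioned, append-only journal" and "client cursors" mean in practice, here is a toy in-memory model. It is not the Event Hubs or Kafka API, and every class and method name is invented for illustration. Events for the same key always land in the same partition, the log never mutates what was written, and each consumer owns its own offset, so readers do not contend with each other and can rewind.

```python
import zlib


class PartitionedLog:
    """Toy append-only journal split into partitions by a hash of the key."""

    def __init__(self, partitions=4):
        self._partitions = [[] for _ in range(partitions)]

    def partition_for(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self._partitions)

    def append(self, key, event):
        p = self.partition_for(key)
        self._partitions[p].append(event)
        return p, len(self._partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset, max_events=100):
        return self._partitions[partition][offset:offset + max_events]


class Consumer:
    """The cursor (offset) lives with the client, not with the broker."""

    def __init__(self, log, partition, start_offset=0):
        self.log = log
        self.partition = partition
        self.offset = start_offset

    def poll(self, max_events=100):
        events = self.log.read(self.partition, self.offset, max_events)
        self.offset += len(events)
        return events


log = PartitionedLog(partitions=4)
for n in range(5):
    partition, _ = log.append("device-1", f"reading-{n}")

fast = Consumer(log, partition)        # e.g. a hot-path workflow processor
slow = Consumer(log, partition)        # e.g. a cold-path batch loader
first_batch = fast.poll()              # reads everything available
slow_batch = slow.poll(max_events=2)   # reads at its own pace
fast.offset = 0                        # "time travel": rewind and re-read
replayed = fast.poll()
print(first_batch, slow_batch, replayed, sep="\n")
```

Retention, durability and ordering guarantees across partitions are where the real brokers do the heavy lifting; none of that is modelled here.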
React to the incoming message stream, apply business logic + enduring state:
• Retrieve next message.
• Determine (a) associated device state, and (b) processing logic
• Execute processing logic, update durable state
• Trigger any additional actions as a result of the processing logic (raise an alert, start another workflow, etc)
Workflow processing
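The processing loop above can be sketched as a dispatcher keyed on message type and version. Everything here is illustrative: the message shape, the handler names and the 90 degree threshold are invented, and a plain dict stands in for what would be a durable, partition-friendly state store in a real system.

```python
HANDLERS = {}


def handler(msg_type, version):
    """Register processing logic for one (message type, version) pair."""
    def register(fn):
        HANDLERS[(msg_type, version)] = fn
        return fn
    return register


@handler("temperature", 1)
def handle_temperature_v1(state, body):
    state["last_temp_c"] = body["temp_c"]
    return ["alert:overheat"] if body["temp_c"] > 90 else []


@handler("temperature", 2)
def handle_temperature_v2(state, body):
    # Hypothetical v2 of the message reports tenths of a degree as an integer.
    return handle_temperature_v1(state, {"temp_c": body["temp_x10"] / 10})


def process(message, state_store):
    # (a) look up the device's state, (b) look up the processing logic.
    state = state_store.setdefault(message["device_id"], {})
    fn = HANDLERS.get((message["type"], message["version"]))
    if fn is None:
        return ["dead-letter"]  # unknown type/version: park it, do not crash
    # Execute the logic (which updates state) and return any follow-up actions.
    return fn(state, message["body"])


store = {}  # stand-in for a durable state store
actions_1 = process({"device_id": "d1", "type": "temperature", "version": 1,
                     "body": {"temp_c": 95}}, store)
actions_2 = process({"device_id": "d1", "type": "temperature", "version": 2,
                     "body": {"temp_x10": 215}}, store)
actions_3 = process({"device_id": "d1", "type": "unlock", "version": 1,
                     "body": {}}, store)
print(actions_1, actions_2, actions_3, store)
```

Keeping old versions registered alongside new ones is what lets devices behind the firmware update curve keep working.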
Common aspects:
• Most real-world IoT solutions have hundreds of message types and handlers (including multiple versions)
• Common contention points are the message dispatcher (retrieving logic) and state management.
• Relational databases are generally ill-suited state stores for IoT (everyone sharing the same transaction log)
Workflow processing
Not all workloads are created equal
Pareto curve in message processing – can focus optimization efforts on a small number of message types.
Focus on optimizing these message handlers
These, probably not.
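Finding the handlers worth optimizing can start with nothing fancier than counting. The sketch below takes a stream of observed message types and returns the smallest set of most frequent types that together cover a target share of the traffic. The 80 percent target and the sample mix are hypothetical.

```python
from collections import Counter


def hot_message_types(type_stream, coverage_pct=80):
    """Most frequent message types that together reach coverage_pct of all traffic."""
    counts = Counter(type_stream)
    total = sum(counts.values())
    hot, covered = [], 0
    for msg_type, n in counts.most_common():
        if covered * 100 >= coverage_pct * total:
            break
        hot.append(msg_type)
        covered += n
    return hot


# Hypothetical sample of 100 observed messages.
sample = (["telemetry"] * 80 + ["heartbeat"] * 15 +
          ["firmware_ack"] * 4 + ["unlock"] * 1)

print(hot_message_types(sample))                   # types worth optimizing first
print(hot_message_types(sample, coverage_pct=95))  # a wider net
```

Volume is only one axis. A rare message with an expensive handler or a tight latency target (an interactive unlock, say) can still deserve attention, so processing time per type is worth recording alongside the counts.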
A really simple message processor in C# with Event Hub
• When designing an IoT application, the needs of the device drive engineering choices (physics always wins)
• Need to carefully design cloud services to meet devices, and handle scale / availability / compatibility
• This stuff is hard at scale. We are committed to making it easier.
  • Azure Intelligent Systems Service (http://www.microsoft.com/windowsembedded/en-us/intelligent-systems-service.aspx)
  • Patterns and Practices – IoT Guidance (coming early 2015)
• More details and context on building Azure cloud services at scale
  • Building Big; Lessons Learned from Azure Customers (http://channel9.msdn.com/Events/Build/2014/3-633)
  • Connecting the World: Building Services for Connected Devices (http://channel9.msdn.com/Events/Build/2014/3-634)
Takeaways
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.