Building Smart IoT Devices with AutoML: A Practical Guide to Zero-Coding Algorithm Design Rev. 1.11 – February 7, 2020
SensiML Toolkit © Copyright 2020 by SensiML Corp.
Please visit the SensiML website (https://www.sensiml.com) for more information.
No part of this manual may be photocopied or reproduced in any form without prior written consent from SensiML Corp. SensiML
and the SensiML logo are trademarks of SensiML. Other product or brand names are trademarks or registered trademarks of their
respective holders
Table of Contents

AI Moves to the Extreme IoT Edge
    A Smarter AI Pipeline
    Smart Edge AI Tools
About this Guide
How the Smart Edge AI Approach Works
    Smart Sensors
    AutoML
    Data-Based Training
    Data Science Engine
    Good Data Collection
    Optimized Coding
The Key Stages of Smart Edge AI Process
    Model/Hypothesis Development
    IoT Device Prototype (Physical Design Considerations)
    Sensor Selection
    Sensor Configuration
    Sensor Data Collection
    Data Labeling
    ML Algorithm Development
    Optimized Endpoint Code
    Local IoT Device Insight (Test/Validation of Local IoT Model)
Developing Your Application Model
    Constructing Your Hypothesis
    Defining Your Insights
    Classifier Model/Class Mapping
Prototype IoT Device
Sensor Selection
    Types of Sensors
    Virtual Sensors
    Sensor Interfacing
    Physical Sensor Placement
Sensor Configuration
    Analog Noise Suppression
    Signal Conditioning
    Sampling Rate and Recording Length
Sensor Data Collection
    Understanding Data Inputs
    Sources of Variance
    False Positives
    Population Diversity
    Subject Sample Size and Dataset Sufficiency
    Phasing Data Collection
    Documenting Methodology
Data Labeling
    Enumerating Relevant Metadata Annotation
    Defining Data Labeling Methodology
ML Algorithm Development
    Defining Model Appropriateness
    Accuracy
    Specificity
    Sensitivity
    Precision
    F1 Score
    Performance Measures for Multi-class Datasets
    Confusion Matrices
    Overfitting/Underfitting
    Data Splitting: Train versus Test Data
    Interpreting ML Performance
Converting an Algorithm to Optimized Endpoint Code
Test/Validation of Local IoT Device Insight
    Sample Bias
    Lifelong Learning and Iterative Model Updates
Conclusion
Appendix – Smart Edge AI Test Plan Template (Example)
References
AI Moves to the Extreme IoT Edge
AI holds great promise for IoT device developers seeking to build intelligence into network edge
embedded sensing products. But until just recently it has remained beyond reach for most
development teams. Until now, AI development tools have been offered primarily as cloud-based solutions because of two key challenges: first, a lack of low-cost hardware capable of running complex AI processing on the embedded edge device; and second, the prohibitive complexity and know-how required to implement AI algorithms on IoT devices with existing development tools.
Thus, a centralized cloud-based “big data” approach persists, wherein sensors are largely dumb
devices with little or no local data processing capabilities. Large volumes of raw sensor data are
sent via (hopefully) high-bandwidth networks to cloud-based systems for processing by server
executed algorithms. These cloud-centric systems use traditional AI frameworks such as Google TensorFlow, Caffe, Apache Spark, and others to generate and manage AI insights for applications involving the originating sensor devices. These systems require complex manual interaction and data science expertise to execute. They also result in centralized cloud
applications that face inherent network issues of latency and bandwidth demands that challenge
large deployments of connected endpoint device networks. The net result is many missed
opportunities to realize the true benefits of connected intelligent IoT applications. Beyond a few
high-volume IoT products where the investment in time and effort to hand-code algorithms can
be rationalized, most applications make do with more limited sensor insights and/or the time
and expense of shipping raw data for centralized processing.
Enter the newest generation of AI development tools designed specifically for IoT developers.
Such tools enable the implementation of learning AI and more sophisticated sensor algorithms
running directly on edge embedded sensing devices. Some of these tools also automate the process to a large extent, allowing use by developers without extensive data science or firmware algorithm coding expertise. In other words, the same benefits AI-based algorithms
have brought to cloud-centric big data analytics are now possible to implement directly on the
originating IoT sensing node. For the first time, IoT developers can get real-time responsiveness,
adaptive smart devices, network efficiency and resiliency, and security and data privacy that
comes with localized data processing domains.
Furthering this breakthrough, recent advancements in embedded hardware and AI algorithm
automation tools offer a new frontier for competitive differentiation of IoT devices. The
combination of these new machine learning (ML) advancements with low cost sensors and
microcontrollers empowers IoT developers with more modest budgets and team sizes to create
their own ”smart sensors” quickly and easily. Now IoT domain experts can create complex ML
algorithms by simply training the ML algorithm with datasets to define the insights they want.
A Smarter AI Pipeline
The AI pipeline used to generate insightful algorithms without explicit coding refers to the entire process of teaching a device using real-world data, from data input through to the algorithm execution that outputs the insight itself. Gaining such insight at the IoT device traditionally
required writing algorithms by hand to fit such devices. The notion of running cloud-based deep
learning algorithms on a microcontroller seemed a bridge too far and thus edge IoT was limited
to those things that could be implemented by skilled teams hand-tuning specific application
code for such devices.
Building AI on extreme edge devices (IoT smart home, wearables, industrial IoT sensor nodes,
remote 5G sensors to name a few) using this newest generation of tools allows the entire
algorithm development process to be dramatically streamlined and simplified. The “magic” of
this approach comes ironically from the application of AI into the very process of creating AI
itself by automating data science expertise and its associated manual coding. Such automated
machine learning workflows are known appropriately as AutoML. Combining this with the
unique ability to generate code specifically for the smallest footprint edge devices (the extreme
edge let’s say), yields new development tools for IoT OEMs we refer to as Smart Edge AI Tools.
The hardware enablers for this new Smart Edge AI Tools approach for IoT devices are the myriad
low-cost, high-performance microcontrollers that enable low-cost sensors to be applied to
millions of applications and locations previously impractical to monitor. This revolution is a
classic example of Moore's Law: hardware advancing faster than software. While hardware capable of AI at the edge has been available for several years, only very recently has AI development software caught up to harness the new capabilities of IoT hardware. Now Smart Edge AI
tools are transforming IoT endpoints from merely dumb data collector nodes feeding centrally
processed cloud AI into contributors of distributed network analytics with the collective power
of many truly smart IoT sensor nodes. Local sensors at the endpoints can drive new useful
decisions in real-time from local processing of sensor data instead of cloud-based central
processing.
Smart Edge AI Tools
At the heart of this new data driven AI approach is a reduction of human intervention, skill sets,
and long lead times needed for developing ML algorithms using traditional hand-coding.
AutoML algorithm development can often deliver results equivalent to or better than those of systems involving human intervention by data scientists using conventional coding methods. Such
tools automatically traverse the hundreds of thousands of modeling options to converge on
solutions that meet or exceed defined constraints set by the user.
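The constrained search described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not SensiML's actual implementation; the candidate model names, accuracy figures, and flash sizes are all invented for the example:

```python
# Toy illustration of AutoML-style model search: score many candidate
# model configurations, then pick the best one that fits user constraints.
# All names and numbers below are invented for illustration.

candidates = [
    {"name": "decision_tree_d4", "accuracy": 0.91, "flash_bytes": 6_000},
    {"name": "decision_tree_d8", "accuracy": 0.94, "flash_bytes": 22_000},
    {"name": "small_neural_net", "accuracy": 0.96, "flash_bytes": 80_000},
    {"name": "pattern_matching", "accuracy": 0.89, "flash_bytes": 3_000},
]

def pick_model(candidates, max_flash_bytes):
    """Return the highest-accuracy candidate that fits the flash budget."""
    feasible = [c for c in candidates if c["flash_bytes"] <= max_flash_bytes]
    if not feasible:
        raise ValueError("no candidate meets the constraint")
    return max(feasible, key=lambda c: c["accuracy"])

# With a 32 KB flash budget, the most accurate model that fits wins.
best = pick_model(candidates, max_flash_bytes=32_000)
print(best["name"])  # decision_tree_d8
```

A real AutoML tool explores a vastly larger space (features, segmenters, hyperparameters), but the principle is the same: optimize accuracy subject to user-defined resource constraints.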
Building Smart IoT Devices with AutoML: A Practical Guide to Zero-Coding Algorithm Design 7
Figure 1 – Traditional coding process vs. the new Smart Edge AI approach
The best of the Smart Edge AI tools incorporate automation across multiple aspects of the
endpoint AI algorithm pipeline to reduce the manual stages needed for data collection to model
cost/performance evaluation. The more advanced tools automate data collection, cleansing,
feature selection, labeling, event detection, classifier algorithm selection, hyperparameter
optimization, and tuning. As you evaluate tools for AI model creation, consider how much of the process a given Smart Edge AI tool covers and what remains for the user to perform manually.
SensiML Toolkit includes industry-leading levels of AutoML automation across the
endpoint sensor AI algorithm generation pipeline. Automation includes data collection
and multi-user aggregation of samples, metadata and label annotation, repetitive
segment label prediction, segmenter algorithm creation, feature engineering, classifier
algorithm selection, classifier hyperparameter tuning, and test/validation stages.
A Real-World Example…
To illustrate the power of distributed analytics using Smart Edge AI Tools versus centralized
cloud sensor processing, let’s consider an example application for intelligent agricultural
livestock monitoring. Similar benefits can be applied across virtually all connected IoT
applications, but we’ll use this one to illustrate the typical differences.
In this example application we consider the benefits of deploying smart animal wearable devices
that can be affixed to each cow in a rancher or farmer’s herd to continuously monitor for
notable health indicators important to the farmer. Our example herd of cattle are fortunate as
they are free-range, grass-fed cattle allowed to roam the pasture rather than being confined to
feed pens. While this promotes herd health and happiness, it presents challenges in that the
cattle are remote and distributed across a large area.
Each monitored animal in our example smart farming application is equipped with an array of
sensors integrated into a compact worn device that can measure animal body temperature, motion in six axes (X, Y, Z acceleration and rotation using a MEMS IMU sensor), audio (using a digital microphone), environmental temperature and humidity, and location (using GPS). This
allows the farmhands to potentially monitor the herd constantly and attend to sickness or needs
quickly.
Now let us consider two means of implementing this application: a centralized or ‘edge’
gateway processing approach and a truly distributed approach using Smart Edge AI Tool
generated algorithms.
The centralized sensor analytics approach requires a constant streaming of updated sensor data
to be sent wirelessly to a processing node that might be in the cloud or might reside on a
centralized gateway or server on the farm. Either way, the application developer must choose
update frequency, data fidelity, and sensor breadth based on the realities of network bandwidth
and coverage, and volume of raw data needed for the analysis. For centralized analytics, this
may mean that the only data continuously monitored are temperature, humidity, and GPS fix as
these are each small datastreams that can be carried over limited cellular networks in remote
areas.
With this compromise, the developer may only be able to offer infrequent sampling of the more complex audio and motion data when the cows are brought into the milking parlor, within Wi-Fi coverage. Alternatively, the centralized analytics solution may demand that many wireless access
points be deployed across the farm (at high cost) to provide the level of performance across the
ranch for continuous monitoring of all sensor data.
No matter the network topology, the transmission of continuous complex datastreams means
much higher power consumption, as radio transmit power can reach 100 mW and the total power consumed by the radio can be several times that. Compare this to local processing, where low-power microcontrollers draw on the order of 100 µA/MHz, and you begin to recognize the power
saving benefits of local processing and small datastream transmission versus small processing
and large datastream transmission. If the farmer has to make the rounds to replace smart
sensor batteries every month, much of the labor-saving benefit of remote monitoring from an
IoT device is negated.
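To make the power argument concrete, here is a back-of-envelope comparison in Python. Every figure is an illustrative assumption consistent with the rough numbers above (transmit power, µA/MHz draw), not a measurement of any particular device:

```python
# Back-of-envelope energy comparison: continuously streaming raw sensor
# data over radio vs. local inference that transmits only small results.
# Every figure here is an illustrative assumption, not a measurement.

SECONDS_PER_DAY = 24 * 3600

# Streaming approach: transmit power ~100 mW, total radio power several
# times that (assume 300 mW), radio active an assumed 10% of the time.
radio_total_power_w = 0.300
stream_duty = 0.10
stream_energy_j = radio_total_power_w * stream_duty * SECONDS_PER_DAY

# Local-inference approach: MCU at ~100 uA/MHz, an assumed 48 MHz clock
# and 3 V supply, active an assumed 5% of the time; the radio wakes only
# to send classification results (assumed 0.1% duty cycle).
mcu_power_w = (100e-6 * 48) * 3.0
mcu_duty = 0.05
local_energy_j = (mcu_power_w * mcu_duty * SECONDS_PER_DAY
                  + radio_total_power_w * 0.001 * SECONDS_PER_DAY)

print(f"streaming raw data: {stream_energy_j:.0f} J/day")
print(f"local inference:    {local_energy_j:.0f} J/day")
print(f"savings factor:     {stream_energy_j / local_energy_j:.0f}x")
```

Under these assumptions, local inference consumes well over an order of magnitude less energy per day; the exact ratio depends entirely on the duty cycles and radio technology in a given deployment.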
Now consider the distributed analytics approach. With the ability to practically deploy
algorithms for complex pattern recognition, the smart livestock developer can now offer a
practical solution for gait monitoring, health assessment from motion and animal utterances,
and estrus monitoring (when cows are in heat for mating) based on complex waveforms of
motion and sound impractical to continuously transmit and much less valuable if only
intermittently polled by gateway/server edge processing. Because processing can be done
locally at several orders of magnitude less power consumption than high bandwidth transmitted
raw data, battery life is substantially improved as well. Taken together, the benefits of Smart
Edge AI are substantial and those who implement such distributed IoT analytics will find
themselves possessing much more competitive solutions than centralized alternative offerings.
About this Guide
This guide is for IoT device developers who recognize the benefits of distributed smart IoT
device analytics and are looking to harness this advantage. Doing so quickly and efficiently
means building algorithms capable of running at the endpoint sensor node. For most, this
means looking to the new Smart Edge AI tools and the data-driven process of training algorithms rather than writing top-down code. As both the tools and the process are new, this
guide walks through the various considerations and seeks to arm the reader with the knowledge
to be successful in this new AI based development methodology. It provides IoT developers
with real world, practical advice for implementing this new approach from sensor selection and
data capture to generating local insights.
The objective of this guide is to teach you how to harness the new Smart Edge AI approach for
planning, designing, and executing your own intelligent sensor products. It focuses on helping
you understand the best methodologies to generate optimal results from this new data driven
AI process. The key to success in this new approach is planning out your implementation
process upfront to ensure the desired results and avoid the common pitfall of bad training data that leads to poorly performing algorithms.
This guide takes you through the stages needed for mastering this data-driven process to
capture, label, organize, and analyze data so that it can be used to gain insights at the endpoint
device. We walk you through the key upfront considerations that go into building high quality,
accurate, and efficient ML algorithms, including the general principles of ML model design, data
collection methodology and implementing ML based algorithms for embedded IoT sensing
devices.
This guide is brought to you by SensiML, a leading provider of Smart Edge AI developer tools for rapidly creating embedded endpoint AI algorithms. Throughout the guide we will highlight specific benefits of the SensiML Toolkit in boxed text like this, but this guide covers general principles of design, planning and implementing AI based algorithms at the extreme edge regardless of your choice of software tools.
How the Smart Edge AI Approach Works
The Smart Edge AI process enables an intelligent IoT sensor to take raw physical sensor data and
transform it in real time to provide local insights directly at the endpoint. These insights
generated at the IoT device also enable you to create a foundation for hierarchical data analytics
by providing the front-end processing of physical sensor inputs feeding network
communications and higher-level, cloud-based processing. By unburdening the network and
downstream computing resources from the real-time signals processing effort, you can provide
an optimal distributed AI system with far less network throughput required, much lower latency,
processing autonomy at the various stages of analysis, and the ability to act in real-time to
critical sensor events. With such an automated means of intelligent endpoint coding, distributed analytics becomes much more practical for the “long tail” of specialized sensor
applications across industrial and consumer IoT sectors.
Smart Sensors
An intelligent IoT sensor device (like a fitness wearable, an industrial smart sensor, a smart pet
collar, or elderly fall detection wristband, etc.) includes one or more integrated physical sensors,
a microcontroller, and a means of communicating information to other parts of the system. The
physical sensors combined with intelligent sensor processing algorithms running in some
portion of the microcontroller constitute a smart sensor device. The algorithm code runs in the
microcontroller taking physical sensor data and converting it to specific insights right on the IoT
device.
AutoML
The data driven approach to building ML algorithms uses a wide range of statistical modeling,
machine learning, or deep learning methods that are automatically chosen by the ML generator
to best match your specific sensor data. AutoML does the algorithm building for you based on training data you provide from your IoT device sensors, freeing you from the complexity of manually building ML algorithms. Deploying an endpoint AI solution is a process of iterating among
sensor pre-processing choices, training/testing data collection and labeling processes to
generate accurate ML algorithms based on these inputs. In simpler terms, AutoML converts the
algorithm development process of codifying algorithm expertise for a particular device into one
of “teaching by example,” and then using those example data to train the AutoML tool to generate the right algorithm on your behalf.
Data-Based Training
ML uses statistical techniques to enable programs to learn through training, rather than being
programmed with rules or explicit programming in a language like C or C++. Such
programming rules are not only technically challenging to construct, but also get increasingly
”brittle” as more and more code is added to address the inevitable corner case conditions
typical in real-world sensor algorithms.
Smart Edge AI tools process training data to build models tuned to “ground-truth” data during training, models that then generalize to predictions on new, as-yet-unseen instances. Corner cases for ML are handled simply as additional data points used to cover the expected range of variability. While the model
still grows in complexity as the dataset demands additional dimensions, branches, or neurons to
characterize the variance observed, the underlying model remains the same.
ML enables devices to contextualize their immediate environments far better using data such as
vision, sound, heat, and vibration. ML systems can process training data over time to
progressively improve performance on a task, providing results that improve with experience.
Once an ML system is trained, it can analyze new data and categorize it in the context of the
training data. This is known as inference and in the case of Smart Edge AI, can be performed
locally on the device where in many cases the output decisions matter most.1
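The training/inference split described above can be illustrated with a minimal pure-Python sketch. This is not SensiML's algorithm; it is a toy nearest-centroid classifier, and the feature choices, class names, and sensor values are all invented for the example:

```python
# Minimal train-then-infer sketch: a nearest-centroid classifier over
# simple features extracted from short sensor windows. Illustrative only;
# all data values and class names are invented.
import math

def features(window):
    """Extract two toy features from a window of samples: mean and range."""
    return (sum(window) / len(window), max(window) - min(window))

def train(labeled_windows):
    """Training: average the feature vectors of each class into a centroid."""
    sums, counts = {}, {}
    for window, label in labeled_windows:
        f = features(window)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += f[0]
        s[1] += f[1]
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl])
            for lbl, s in sums.items()}

def infer(model, window):
    """Inference: label a new window by its nearest class centroid."""
    f = features(window)
    return min(model, key=lambda lbl: math.dist(f, model[lbl]))

# Invented training data: 'idle' windows are flat, 'shake' windows swing widely.
training = [
    ([0.0, 0.1, 0.0, 0.1], "idle"),
    ([0.1, 0.0, 0.1, 0.0], "idle"),
    ([2.0, -2.0, 1.5, -1.5], "shake"),
    ([1.8, -1.9, 2.1, -2.0], "shake"),
]
model = train(training)
print(infer(model, [1.9, -2.1, 1.7, -1.6]))  # shake
```

The key point is the asymmetry: training is data-heavy and happens offline, while inference reduces to a handful of arithmetic operations per window, which is exactly what makes it practical on a microcontroller.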
The SensiML Toolkit software, using local inferencing, allows developers to create their
own application specific virtual sensors where most of the raw physical sensor pre-
processing occurs on the virtual sensor device Microcontroller Unit (MCU) and what gets
shipped to the cloud is the classification data they actually care about.
Data Science Engine
Building data science decision making and processing techniques into AI development tools is
the difference between AutoML tools versus machine learning frameworks that provide the
algorithms but still rely upon the expertise of the user to correctly apply them. The data science expertise built into the software removes the need for human data science and coding expertise from the workflow, shifting the IP from logic embodied in hand-written algorithms to application knowledge embodied in the datasets themselves. Instead, the system continues to
develop intelligence as the datasets grow over time and benefit from automated and iterative AI
approaches to embedded code generation.
Good Data Collection
As you might guess, shifting the challenge from understanding ML techniques and algorithms to
presenting AutoML tools with representative train/test datasets makes data collection the critical
effort for these tools. Sensor data collection and labeling is about providing better training sets
to your ML algorithms. Using the data driven approach means you need to place high emphasis
on pre-planning for what matters in your application and how to properly collect the data to
support it. You collect actual real-world data as would normally be seen by the application
sensor(s).
One of the first questions to ask and resolve in data collection planning is the feasibility for
collecting and labeling the desired model data in the first place. In most cases it’s possible to
establish conditions to capture examples of each of the desired states or inputs that the smart
device should be capable of recognizing. Where the application supports this, data collection
becomes a process of capturing a statistically significant number of labeled training data
examples for each state of interest. An example would be capturing multiple instances of
different hand gestures to be recognized from motion data of a wearable device. For this
application, it’s relatively easy to perform each such gesture a multitude of times, with examples
from different users, left-handed people, right-handed people, and construct the desired
training dataset.
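One simple way to keep a collection plan like this honest is to track labeled examples per class as capture sessions accumulate. The sketch below is illustrative only; the field names, class labels, and minimum-count threshold are invented:

```python
# Toy data-collection ledger: count labeled gesture examples per class to
# spot coverage gaps before training. Field names, labels, and the target
# count below are invented for illustration.
from collections import Counter

captures = [
    {"label": "wave",   "subject": "user01", "handedness": "right"},
    {"label": "wave",   "subject": "user02", "handedness": "left"},
    {"label": "circle", "subject": "user01", "handedness": "right"},
    {"label": "wave",   "subject": "user03", "handedness": "right"},
]

def coverage_report(captures, min_per_class=3):
    """Return, per under-represented class, how many examples are still needed."""
    per_class = Counter(c["label"] for c in captures)
    return {lbl: min_per_class - n
            for lbl, n in per_class.items() if n < min_per_class}

print(coverage_report(captures))  # {'circle': 2}
```

Tracking subject and handedness metadata alongside labels, as above, also makes it easy to verify that diversity goals (multiple users, both hand orientations) are being met rather than assumed.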
In other instances, real world data collection for each desired insight state may NOT be practical.
Think about re-creating all of the mechanical failure modes for an expensive piece of factory
machinery. In this example, the purpose of the smart device is often to create predictive
maintenance algorithms that can warn in advance of such failures, given their repair cost and the resulting machine downtime. Thus, it would make little sense to induce such fault states purposefully just for the sake of collecting example training data for a preventive smart device.
In such cases where real-world data collection is not practical, other techniques can be
employed to overcome this obstacle. In some cases, you can create simulation data to
approximate a fault state. In others, it may be preferable to simplify the initial model to
distinguish normal or expected behavior from anomalous behavior. Here the expectation is that
additional data will be collected in the field and used to further train the algorithm as such
faults are observed in normal use. From the outset, the model can provide anomaly detection
that, at minimum, flags when machine behavior is out of expectation.
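Such a normal-versus-anomalous starting point can be remarkably simple. The sketch below flags readings that deviate strongly from a baseline built from normal-operation data only, using a z-score threshold; the vibration values and the threshold of 3 standard deviations are illustrative assumptions, not a prescription:

```python
import math

def fit_baseline(normal_readings):
    """Estimate mean and standard deviation from normal-operation data only."""
    n = len(normal_readings)
    mean = sum(normal_readings) / n
    var = sum((x - mean) ** 2 for x in normal_readings) / n
    return mean, math.sqrt(var)

def is_anomalous(reading, mean, std, z_threshold=3.0):
    """Flag a reading whose z-score exceeds the threshold."""
    return abs(reading - mean) > z_threshold * std

# Illustrative vibration magnitudes captured during normal operation
baseline = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]
mean, std = fit_baseline(baseline)
print(is_anomalous(1.03, mean, std))  # False: within normal variation
print(is_anomalous(2.5, mean, std))   # True: out of expectation, worth logging
```

Readings flagged this way can be logged and later labeled by a domain expert, becoming exactly the additional fault training data the paragraph above describes.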
Regardless of which method is chosen, haphazard data collection will undermine ML algorithm
development time and quality, just as poor programming adds time spent in QA and debug
sorting through logical errors and software bugs. The traditional AI adage that more data is
better in ML should really be stated as: more good data is better than volumes of unfiltered raw
data. Faulty, incomplete, and poorly labeled data injected into training datasets leads to poor
results. Faulty and poorly annotated data can also be hard to rectify after the fact without a
proper means to discriminate it from the quality data that preceded it. The time to think
carefully about this is BEFORE data collection begins, not afterwards. The price otherwise can be
the need to recollect data from scratch if the test approach, metadata collection, and labeling
cannot be salvaged.
Managing data collection and labeling for ML at the endpoint requires a planned, disciplined
methodology as the basis for building optimized algorithms. Without a solid upfront design and
plan, your team can end up spending a large fraction of its time fixing preventable data quality
issues as it surmounts this learning curve. Often such issues are discovered late in the schedule,
after the initial dataset has already led to poor model performance, extended timelines, added
project risk, and disillusionment with AI-based approaches.
Over time, datasets can be extended to cover a broader population, more classification types,
and more negative cases. Defining what patterns of sensor data should not trigger an event is
just as important as defining what patterns should trigger a meaningful one. For example, a
glass breakage sensor for a burglar alarm system should certainly be sensitive enough to detect
a true window breakage, but should also be discriminating enough to disregard car noises, kids
playing, people coughing or sneezing, and the myriad other sounds that might trigger a false
alarm in a poorly constructed model. Developers often overlook the importance of negative
testing in this way, so be sure to give thought to your model’s sensitivity and its ability to reject
false positives by providing sufficient negative test data.
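One concrete way to act on this is to hold out a negative-only test set and measure how often the model raises an alarm on it. A minimal sketch (the label names and counts are illustrative assumptions):

```python
def false_positive_rate(predictions, positive_label="glass_break"):
    """Given model predictions on negative-only test data (recordings that
    contain no true events), every positive prediction is a false alarm."""
    false_alarms = sum(1 for p in predictions if p == positive_label)
    return false_alarms / len(predictions)

# Illustrative predictions on recordings of car noise, coughing, kids playing
preds = ["other"] * 97 + ["glass_break"] * 3
rate = false_positive_rate(preds)
print(f"{rate:.1%} false alarms on negative data")  # 3.0% false alarms on negative data
```

Tracking this number across model revisions makes the sensitivity-versus-rejection trade-off visible rather than discovered in the field.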
Optimized Coding
This step in the ML pipeline involves transforming the algorithm into code that can run
optimally on your target hardware. Because of the inherent storage limitations of low-power
sensor microcontrollers, converting an algorithm into power-efficient code for the smart sensor
requires optimization. A Smart Edge AI tool creates the right embedded code implementation
automatically: the process of converting an ML algorithm into efficient embedded executable
code draws on feature and classifier libraries already hand-tuned for low-power, resource-
constrained microcontrollers and can instantly generate corresponding library or binary code
formats for your chosen ML algorithm and target platform.
Even the best staffed data science development teams can struggle or waste time pursuing
algorithms that may work well but are not appropriate for the level of computing resources
available on endpoint microcontrollers and IoT devices. This is a common pitfall among users
who immediately assume they need to employ deep learning or artificial neural networks
(ANNs) as their algorithm of choice simply because this is considered state of the art rather than
contemplating suitability for their intended usage. ANNs are indeed very powerful tools, and for
many applications, such as processing image data for object recognition, they are the only
viable method. But a great many other applications can be better served by feature pre-
processing and classic machine learning approaches at far less computational cost.
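A typical feature pre-processing stage for classic ML is inexpensive: a handful of statistics per sensor window rather than the millions of multiply-accumulates a deep network demands. A sketch of three common time-series features (the window values are illustrative, and the feature set is a generic example, not SensiML's library):

```python
import math

def extract_features(window):
    """Compute a few low-cost features commonly fed to classic ML
    classifiers in place of raw samples."""
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    # Zero-crossing count: sign changes between consecutive samples
    zero_crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0)
    )
    return {"mean": mean, "rms": rms, "zero_crossings": zero_crossings}

# Illustrative accelerometer window
window = [0.2, -0.1, 0.4, -0.3, 0.1, 0.0, -0.2, 0.5]
print(extract_features(window))
```

Each feature here costs a single pass over the window, which is why a decision tree fed with such features can fit comfortably where a neural network cannot.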
The proper fitting of ML algorithm to application can mean the difference between success and
failure: an IoT device overpriced for its target market because a complex routine required an
application processor rather than a microcontroller, or a mobile IoT device with substandard
battery life because its algorithm demands excessive computational power and/or memory
access compared with a better-suited algorithm. The beauty of AutoML tools with rich libraries
of ML algorithms and hardware-specific profiling capability is the ability to choose the BEST
algorithm for a given task.
The SensiML Toolkit software supports a broad array of ML algorithms, spanning simple
binary classifiers, decision trees, hierarchical and ensemble models, regression models,
and neural networks.
The Key Stages of the Smart Edge AI Process
The building of an endpoint AI solution for your IoT device is a process that includes the stages
shown in Figure 2 below. Understanding this workflow enables you to define and execute your
sensor data collection to produce optimal training data for your IoT application requirements.
This diagram also shows how these stages fit into the broader scope of distributed cloud and
edge AI and analytics for a connected IoT device network application.
The overall process shown in the diagram describes your specific application model as built on a
data-driven ML approach. You define how various sensor data and metadata can be collected,
labeled, and curated to form your custom predictive model.
We will describe below each of these steps in brief and then expand upon them later in the
document along with key insights and pitfalls throughout.
Figure 2 - The stages of the Smart Edge AI pipeline
Model/Hypothesis Development
The overarching requirement for developing a quality application model is a solid
understanding of the cause-and-effect relationship between the conditions you are attempting
to detect and the sensor data that can reliably and practically be used to infer those states. A
domain expert typically has the intuitive understanding of a given application to formulate a
good working hypothesis before collecting and analyzing a great deal of data for correlation,
making them an invaluable asset upfront and throughout the process. Armed with such insight,
the domain expert can put their hypothesis to the test by collecting a small initial dataset,
confirming the hypothesis, and refining or reformulating the collection process as required
based on this initial data. Only then does it usually make sense to expend the effort of collecting
a larger dataset to make the model robust to corner cases before putting the code into
executable form. The dataset is run through data science algorithm optimization and search to
arrive at a working algorithm manifested in code on a device.
Thus, think of your application model as an experiment. For a data-driven ML model, the
application starts as a hypothesis for how various sensor data and metadata can be collected
and labeled to build a predictive model. The hypothesis is a working theory for how available
physical sensors can be used to determine a desired set of classes for a given application, and it
is then confirmed incrementally to reduce project risk. As part of your model
development process, you will need to define roles between domain expert(s), data collection
personnel, and data managers to execute these experiments as efficiently as possible.
IoT Device Prototype (Physical Design Considerations)
Constructing a prototype IoT device means building a physical device that can be used during
development to collect sensor data in the intended application. As such, it should approximate
the intended final product as closely as possible from the sensor standpoint, so that the data
collected during development does not differ appreciably from that of the final product.
Devising the data collection plan for development is a concurrent task that falls within
this prototyping phase of development. Careful thought should be given to maintaining fidelity
of sensor data as it will be in the end product even if form factor, processor, or interconnects in
the prototype may differ. If interconnects will involve analog interfacing of raw sensors to the
prototype, ensure that differences in sampling rate, noise, and bit depth will not corrupt your
training data for the final product. In short, at the same time you’re formulating your design of
the physical product and prototypes you should also be formulating your plan for data
collection and labeling for the intended application and AI algorithm.
Sensor Selection
Sensor selection is about choosing the right type, placement, and mounting of physical sensors
for your application. Most applications involve physical processes that can be measured either
directly or indirectly by one or more means. For instance, an application to detect failing motor
bearings might choose to measure bearing temperature using a thermocouple or IR sensor,
bearing noise from a microphone, bearing vibration from a piezo sensor or MEMS
accelerometer, bearing drag from torque sensors or motor current, or some combination of
these. Each can be correlated with bearing wear but will have different response characteristics,
signal-to-noise ratios, and other limiting constraints that lend one approach to be selected over
others for a given application.
Sensor Configuration
Sensor configuration encompasses a number of factors aimed at maximizing the signal inputs
into the ML algorithm from the selected physical sensor(s) to ensure optimal performance. This
step includes considerations such as sample rate, signal gain or amplification, noise suppression,
filtering or signal conditioning, and analog to digital conversion. Even the best sensor can easily
be undermined by improper sensor configuration and signal conditioning leading to a poor
signal-to-noise ratio feeding the downstream ML processing.
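Two of these configuration checks can be sketched in a few lines: choosing a sample rate with margin above the Nyquist minimum, and a simple moving-average filter for noise suppression. The 2.5x margin and the window size of 4 are illustrative assumptions, not universal rules:

```python
def min_sample_rate(max_signal_hz, margin=2.5):
    """Nyquist requires sampling above twice the highest frequency of
    interest; a practical margin above 2x leaves room for filter rolloff."""
    return max_signal_hz * margin

def moving_average(samples, k=4):
    """Cheap noise suppression: average each run of k consecutive samples."""
    return [sum(samples[i:i + k]) / k for i in range(len(samples) - k + 1)]

print(min_sample_rate(100))                     # 250.0 Hz for a 100 Hz vibration signal
print(moving_average([1, 1, 5, 1, 1, 1], k=4))  # spike smoothed: [2.0, 2.0, 2.0]
```

Note the trade-off the filter illustrates: the spike that might be noise, or might be signal, is averaged away, which is exactly why filtering choices belong in the configuration plan rather than being bolted on afterward.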
Sensor Data Collection
Sensor data collection concerns the process of capturing and logging actual sensor data
samples to be used as the training data for ML algorithm development. Beyond just recording
arbitrary data from sensors, when done well, the data collection phase seeks to minimize the
effects of any undesired or unknown effects that might introduce variability into the model and
subsequently lower the predictive performance of the ML algorithm.
SensiML Data Capture Lab provides a means to capture and label datasets accurately,
whether a single user or a large team of data collection and test technicians is involved.
By streamlining annotation for labels and project metadata based on predefined custom
fields, large-scale collections can be done conveniently without custom scripting, field
notations, separate spreadsheets, or file conversion headaches.
Data Labeling
Data labeling is often, but not always, combined with sensor data collection. It is the supervised
piece of supervised machine learning, in that labels pair examples of input sensor data with the
output classification results used to train the ML model. It becomes obvious, then, that labeling
is a critical step that can lead to either good or poor model performance, in the same way a bad
teacher can misguide a student and undermine their learning with wrong examples. The actual
process of labeling may be split from data collection based on what makes sense for a given
application. Some labels are readily apparent and can be annotated easily by anyone. Others
require expert insight and may not be performed by those doing bulk subject testing and data
collection work.
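Concretely, a label in time-series work is usually a span of samples tied to a class name and to whoever applied it. A minimal representation (the field names are illustrative, not SensiML's actual schema):

```python
from dataclasses import dataclass

@dataclass
class LabeledSegment:
    """A span of a recorded sensor capture annotated with a class label."""
    capture_file: str
    start_sample: int
    end_sample: int
    label: str
    annotator: str  # who applied the label: bulk technician vs. domain expert

segments = [
    LabeledSegment("session_01.csv", 0, 499, "jab", "technician"),
    LabeledSegment("session_01.csv", 500, 980, "uppercut", "coach"),
]
# Labels needing expert insight can be filtered for separate review
expert = [s for s in segments if s.annotator == "coach"]
print(len(expert))  # 1
```

Recording the annotator alongside each segment is what later lets you discriminate suspect labels from quality ones, the rectification problem raised earlier.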
ML Algorithm Development
The Smart Edge AI approach to ML algorithm development uses training and test data to
generate the desired results. This automation of data science and ML processes is performed in
the cloud during the development phase to generate the algorithm that you will embed in your
IoT devices for local inferencing at runtime. The distinction of cloud versus edge processing is
that cloud computing is used only during the model selection and optimization process to
automate the work of the data scientist using AutoML tools that refine the algorithm to fit into
the capabilities of the endpoint device itself.
SensiML Toolkit provides automation for the full spectrum of Smart AI functions to automate the process of ML algorithm generation. With over 80 feature extractors and a dozen different ML classifiers tuned for low-power endpoint IoT microprocessors, SensiML Toolkit can provide you with candidate algorithms when supplied with basic performance and constraint parameters.
Optimized Endpoint Code
Once you have generated the optimal algorithm for your IoT device insights, you need
functional code that you can load onto your low-power embedded IoT endpoint. Here the ML
algorithm is implemented in embedded code and flashed onto the target IoT device to generate
classification results that are tested for accuracy in real-world settings. At this stage, the model
can either be deemed acceptable, or additional test data can be added to the data collection
phase and the process iterated on this new data. The challenge is to maintain model fidelity
such that the hard work of model optimization and selection is not compromised by
simplifications made in the name of power and resource reductions. A Smart Edge AI tool
creates the right implementation of embedded code delivery and assurance automatically.
The process of converting an ML algorithm into efficient embedded executable code can
be non-trivial. SensiML Toolkit utilizes feature and classifier libraries already hand-tuned
for low-power resource constrained microcontrollers and can instantly generate
corresponding library or binary code formats for your chosen ML algorithm and target
platform.
Local IoT Device Insight (Test/Validation of Local IoT Model)
Once the model is created and loaded onto the IoT device itself, the remaining step in the
workflow is to subject the device to validation testing against new data that was not used in the
train/test phase. This important step ensures the generated model is generalized and not
overfit, i.e. not merely producing good results when presented with the same data used to
construct the model in the first place. In some cases, the data collected for model validation
may suggest a need for further data collection; it can be contributed back to the train/test
dataset and the process repeated.
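One practical guard against overfitting in such workflows is splitting by subject (or by machine, session, etc.) rather than by individual sample, so no data from a validation subject ever appears in training. A sketch under that assumption (the record structure and subject IDs are illustrative):

```python
def split_by_subject(records, holdout_subjects):
    """Hold out entire subjects for validation so nothing from a validation
    subject leaks into training; a per-sample random split would let the
    model memorize individual subjects and inflate apparent accuracy."""
    train = [r for r in records if r["subject"] not in holdout_subjects]
    val = [r for r in records if r["subject"] in holdout_subjects]
    return train, val

records = [
    {"subject": "s01", "label": "run"}, {"subject": "s02", "label": "walk"},
    {"subject": "s03", "label": "run"}, {"subject": "s01", "label": "walk"},
]
train, val = split_by_subject(records, holdout_subjects={"s03"})
print(len(train), len(val))  # 3 1
```

The same idea applies at this final validation stage: data gathered from entirely new subjects or machines is the strongest evidence the model has generalized.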
Developing Your Application Model
As previously mentioned, a model is the combination of the initial hypothesis put into practice
through the collection of data that is labeled and then processed through a feasible working ML
classifier. This model-building process done upfront is critical for an effective AI implementation
for sensors at your endpoint IoT device. It’s common for data collection projects to start out with
too little attention spent on defining the insight objectives up front. The rationale for this
shortcut is the idea that capturing a mountain of data will sort things out in the end. This “more
is better” approach to data collection is reinforced by confusion between supervised ML, as
needed for most smart sensor applications, and the branch of AI used for data mining or “big
data” applications. Big data uses unsupervised ML techniques such as cluster analysis to
discover useful, previously unknown associations buried in large datasets.
When problems arise in developing data-driven edge IoT inferencing algorithms, the cause is
often a failure to account for common pitfalls in the upfront design of the application. It can be
expensive and frustrating to discover a fatal flaw in the collection process only after significant
data collection has already been completed. To reduce this risk, develop a well-planned
application model with effective data collection protocols and an iterative process for data
collection. This section walks through key upfront assumptions and considerations to help
ensure you think through the various factors in your application model.
Constructing Your Hypothesis
The hypothesis is put into practice through the collection of data that is labeled and then
processed through a feasible ML classifier. Ultimately, it comes down to the datasets you are
using to train your
algorithm. This is why planning your data collection and labeling is essential to data driven ML
algorithm development. The hypothesis defines the problems or events you want the algorithm
to act on. It also defines the datasets you need to match the answers to the problems identified.
These are ground truth events based on real world empirical evidence from direct observation.
Hypothesis is the formation of a working theory for how available physical sensors can be used
to determine a desired set of classes for a given application (a good hypothesis would be using
accelerometer and gyro sensors on a user’s wrist to detect a given tennis swing, a bad
hypothesis would be using a temperature and pressure sensor to determine which class of
keyword is being spoken to a home smart hub). You’re relying on the domain expert to have a
sufficient understanding of the application to have a good assumption of what sensor data can
be used with an ML classifier and pre-processing to arrive at an accurate model. That’s the
hypothesis part. A proof-of-concept (limited data collection) is used to confirm or refute the
hypothesis, which is then typically made robust with much more data.
Defining Your Insights
Insights are where one or more input sensors and/or contextual inputs are used to predict a
discrete state or class label (e.g., for a predictive maintenance application these might be
‘machine normal’, ‘faulty bearing’, ‘imbalanced load’). As you consider your set of desired insights, it’s
useful to think not only of those items of immediate interest but also about future insights of
interest that might be far easier to include in data collection from the outset than to recollect
from scratch at a later phase. By listing out all potential areas of interest upfront, you improve
your odds of optimizing data collection and development speed over the long run. Beyond the
predicted quantities of immediate interest, it is also worth considering desired future insights
for which you might start capturing valuable data now, even though these model insights might
not be utilized until a subsequent product release.
Insights generated from ML predictive models can either take the form of continuous or discrete
values.
• Continuous events are periodic and require continuous classification, such as a
predictive maintenance function that reports whether a motor is in a normal, warning, or
failure state, or a fitness application that detects user activity such as running, walking,
or resting. Continuous events involve predictive regression models.
• Discrete events are triggered by event actions: classification occurs after a trigger, such
as in a wearable application recognizing different types of gestures. Discrete values are
commonly known as predictive classifier models.
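The two event types imply different execution patterns on the device: one classifies on a fixed cadence, the other only when a trigger fires. A sketch contrasting them (the window size, stride, and amplitude threshold are illustrative assumptions):

```python
def continuous_windows(samples, window=4, stride=2):
    """Continuous events: classify every window on a fixed cadence."""
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, stride)]

def discrete_trigger(samples, threshold=2.0):
    """Discrete events: run the classifier only after a trigger fires,
    here a simple amplitude threshold."""
    return [i for i, x in enumerate(samples) if abs(x) > threshold]

data = [0.1, 0.2, 2.5, 0.3, 0.1, 0.2, 3.1, 0.2]
print(len(continuous_windows(data)))  # 3 windows classified on a cadence
print(discrete_trigger(data))         # [2, 6]: classify only at these points
```

The distinction matters for power budgeting: a continuous model runs the full pipeline every stride, while a trigger-gated model sleeps until the cheap threshold check wakes it.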
Classifier Model/Class Mapping
Your training data must contain the correct labeled answers or classes (ground truth) for the
example input data you provide as the basis of your model. Classification is the process of
predicting the class for the given data points. Classes are sometimes called targets, labels or
categories.
SensiML Toolkit uses the term event to refer to a class label, since classification of time-
series sensor data typically involves a span of recorded sensor readings covering the
period during which an event of interest takes place.
The ML algorithm finds patterns in the training data that map the input data attributes to the
provided training classes, and then delivers a predictive model for classifying newly presented
input data thereafter. Classification predictive modeling is the task of approximating a mapping
function (f) from input variables (X) to discrete output variables (y). Classification belongs to the
category of supervised learning, where the targets or classes are provided along with the input
data. Classification has many applications across many domains, such as credit approval,
medical diagnosis, and target marketing. Defining your classification requirements upfront is an
essential step and should be mapped out and reviewed by everyone involved in the data
collection process. Its importance cannot be overstressed, as the cost and time required to
perform data collection and labeling is the dominant activity in AutoML workflows. Done right,
based on good upfront planning, the algorithm development process can be much faster than
traditional hand-coded algorithm development. Done wrong, additional time is often needed to
augment, or worst case recollect, labeled datasets where classes were not properly labeled
and/or relevant metadata was not collected. Involving the overall team often reveals such
deficiencies in data collection methodology before the expense of data collection itself. Below is
an example test plan template illustrating the class mapping portion of the test plan
preparation (the full test plan template and example inputs can be found in the appendix):
Smart Edge AI Test Plan: Boxing Punch Detection Wearable
Revision: 1.0 Last Revised: 12/15/2019 By: SensiML AE Team
Application Summary: Motion classification for recognition of boxing punches from glove-mounted
3-axis accelerometer and 3-axis gyro sensor device.
Desired Inference Classifications

  Priority          Categorical Variable       Classes (SensiML Events 1..n)
                    (SensiML Event Group)
  Must Include      Boxing Punch               Jab, Hook, Uppercut, Overhand
  Should Include    Boxing Impact              Knockout Punch, Solid Connect, Glancing Blow, Miss
  May Include       Boxing Stance              Upright, Semi-crouch, Full Crouch
  Future Classes    Boxing Defense             Bob, Block, Clinch, Cover-Up
Note above that the classification types are bucketed into ‘must include’, ‘should include’, ‘may
include’, and ‘future classes’. The reasoning behind this is to force upfront thought into the
potential additional data that may or may not be included in the product feature plans for an
initial product but may have value later. Again, the difference with data-driven algorithm design
versus code-based algorithm development is the collection and curation of high-quality
datasets. Given the time and expense of signing up user subjects for the sports wearable data
collection example in the test plan shown above, it can be minimal incremental effort to have a
domain expert (like a coach) label not just the type of punch, but the stance and impact as well.
Going back to capture this later could double the cost, re-recruiting subjects, securing lab
space, technician time, and domain experts, compared with the trivial addition of coach labeling
time if anticipated beforehand.
On the other hand, adding completely new collection protocols (like the boxing defensive
moves), may or may not have merit depending on additional time/cost to collect longer session
data. Only you will know this answer for your intended application. The intent of the test plan
template is to elicit the consideration of such costs and future plans.
A Real-World Example…
Say your product plan is to build a wearable sensor or sensor array for runners. Your first
product iteration might be aimed primarily at novices to provide coaching advice on how to
avoid injury from poor form. Your domain expert, an exercise physiologist, has determined that
heel strike and excess tibial (shinbone) rotation are the key motion parameters they seek to
detect from a device worn on the ankle with accelerometer sensors. Great: you know the
problem, and the details of test planning will be more involved, as discussed later, but you
proceed to enlist 200 test subjects to collect running data with the prototype ankle sensor
product.
Fast forward to the next product release, and the plan is now to expand the product’s insights
to running performance as well. The exercise physiologist tells you that to infer this insight they
need user data on hip rotation and arm swing. If only you had thought ahead in your previous
data collection, you might have outfitted the initial 200 subjects with additional sensors capable
of measuring that data as well. It was out of scope at the time, but the incremental cost of
attaching arm- and waist-worn sensors and capturing data and metadata for later use would
have been far less than recruiting 200 more subjects to start anew. Opportunity missed; time
and costs increase. This example applies not only to labeled sensor data but even more so to
associated context or metadata.
Now imagine your exercise physiologist, working with the product design team, raises the
opportunity to improve the wearable sensor’s feedback based on knowing user attributes. They
suggest that running speed is correlated with body mass index (BMI), a popular and simple-to-
obtain metric. But alas, you did not think to capture height and weight data for your subjects
during the data collection effort, so you’re missing the key context data needed as an input
feature to enhance the ML algorithm. In this case, the missed opportunity was not involving the
domain expert in the upfront test design process to understand how desired insights might
drive inputs beyond the sensors themselves and which subject metadata should have been
included from the outset to inform the modeling work. While the above example is not always
foreseeable or practical, it pays to spend the time upfront to consider longer term product
insights.
Prototype IoT Device
As previously covered, any prototype used during development to collect sensor data in the
intended application should approximate the sensor response and data of the intended final
product as closely as feasible, so that artifacts or erroneous data that lead to poor algorithm
performance are not injected into the training set.
Some changes can readily be made and the data transformed without the need to recollect it.
Examples include changed orientation of axes for inertial sensors, or even changes in sensor
vendors provided the parts have nearly identical responses or correlated calibrations. Positional
changes in motion sensors, where the IC is moved elsewhere on the device printed circuit board
(PCB), can have more complex implications for repurposing existing data. Careful thought and
validation of results before and after changes are needed to retain use of data across such
physical design changes.
Sensor Selection
The choice of sensor type for your IoT device starts with a clear definition of what predicted
outcomes are being sought from an algorithm. Sensor choice also includes not just what sensors
to use but also how many, location, orientation, physical coupling, frequency response, and
range. Selecting the sensor inputs that feed your model is a matter of maximizing the
correlation of measurable signal from the physical sensor to the desired insight while
minimizing noise that can mask that signal.
Factors that influence the choice of sensor are many and highly specific to the application in
question. While it’s beyond the scope of this guide to give guidance on each and every potential
application, this section covers factors that are common across most applications. When it
comes to determining sensor location, the time and effort of collecting data far outweighs the
cost of the sensors themselves in most cases, which favors over-collection at many different
locations simultaneously during initial trial capture sessions.
Types of Sensors
For time-series data, there is always a physical sensor at the front of the chain that measures a
real-world property and converts it to an analog electrical signal. A wide range of physical
sensors is available, as individual units or in combinations. The following lists the most common
types of sensors.
Physical sensor types
• Temperature Sensor
• Proximity Sensor
• Accelerometer
• Gyroscope
• Vibration sensor
• IR Sensor (Infrared Sensor)
• Pressure Sensor
• Light Sensor
• Ultrasonic Sensor
• Acoustic Emission
• Smoke, Gas and Alcohol Sensor
• Touch Sensor
• Color Sensor
• Humidity Sensor
• Tilt Sensor
• Flow and Level Sensor
SensiML Toolkit supports a wide variety of time-series sensors such as accelerometers,
gyroscopes, magnetometers, microphones, load cells, pressure sensors, strain gauges,
acoustic emission sensors, ultrasonic, and piezo vibration sensors.
Virtual Sensors
Virtual sensors are the combination of individual physical sensors that enable you to monitor
more complex composite activities. For example, for a running wearable, there exists no such
physical sensor as a musculature/skeletal injury “riskometer”. Instead, you use readily available
low-cost MEMS motion sensors like multi-axis accelerometers and gyroscopes combined with
the ML algorithm and context data to create a “virtual” sensor, an injury ‘riskometer’. Just like a
physical sensor, this virtual sensor has metrics for sensitivity, noise immunity, and error. You
control those performance parameters partly through the ML methods that maximize the
algorithm portion, but also through the selection and configuration of the physical input
sensors themselves.
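The idea of a virtual sensor can be sketched as a function that fuses physical-sensor readings into one derived output. Here a weighted combination of motion magnitudes stands in for the trained ML model; the weights, the linear form, and the 'risk score' interpretation are all illustrative assumptions:

```python
import math

def virtual_risk_score(accel_xyz, gyro_xyz, w_impact=0.6, w_rotation=0.4):
    """Combine physical accelerometer and gyroscope readings into a single
    'virtual sensor' output. The weights and linear form are stand-ins for
    what would really be a trained ML model."""
    impact = math.sqrt(sum(a * a for a in accel_xyz))    # magnitude of impact
    rotation = math.sqrt(sum(g * g for g in gyro_xyz))   # magnitude of rotation
    return w_impact * impact + w_rotation * rotation

score = virtual_risk_score(accel_xyz=(0.0, 0.0, 3.0), gyro_xyz=(4.0, 0.0, 3.0))
print(round(score, 2))  # 3.8
```

Framed this way, the sensitivity and noise immunity of the virtual sensor clearly depend on both the fusion function (the ML portion) and the quality of each physical input.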
Sensor Interfacing
When talking about the source physical sensors used as input to the virtual sensor device, there
are generally two types of sensor interfaces:
• Analog sensors – These are purely analog devices; to interface with the
microcontroller (MCU), an analog-to-digital converter is needed.
• Digital sensors – While most of these sensors contain an MCU, it is almost never
user-programmable and supports only the fixed-function protocol in the datasheet
for interfacing with a host MCU. Digital sensors tend to have much better noise
immunity and are generally preferred in most cases.
The SensiML Toolkit executes on data received via an analog-to-digital converter (ADC)
that is itself either integrated in the MCU SoC, integrated as part of the sensor IC, or a
discrete ADC IC. The SensiML Toolkit receives data from digital sensors via a peripheral
bus (e.g. SPI, I2C, I2S, or UART).
Physical Sensor Placement
A sensor may or may not be physically mounted on the same board as the MCU doing the
sensor processing; it depends on the application. When monitoring a large machine, there may
be multiple wired analog or digital sensors (limit switches, load cells, temperature sensors,
vibration sensors) located at various points on the machine, all feeding into a virtual sensor
MCU board that classifies whether the machine is operating normally or exhibits one or more
fault states. When a sensor is remotely located, noise and cable length come into play and
dictate which interface makes the most sense.
A Real-World Example…
Consider an application that monitors motion data from sensors placed on an athlete to
classify proper versus suboptimal running form. At first, we may not know where such motion
sensors are best placed on the subject. Should they be located on the shoe? The ankle? The
calf or thigh, or perhaps on the runner's hip to capture pelvic rotation? Certainly the insights of
a domain expert (in this case an exercise physiologist or coach) would help narrow down the
possibilities. But it's also worthwhile to capture data from multiple locations (more than are
envisioned in the final product or system) to understand where the strongest correlated
sensor signal originates for the desired class labels.
When initially formulating a plan for sensor placement, consider overpopulating sensors for the
initial trial if practical. The cost and effort of collecting data often far outweigh the cost of
the physical sensors themselves (exceptions include very expensive piezo, acoustic emission, and
other more exotic sensor types). Having more data initially helps you converge quickly on ideal
placement, orientation, and configuration.
Sensor Configuration
Sensor configuration encompasses a number of factors aimed at maximizing the signal inputs
into the ML algorithm from the selected physical sensor(s) to ensure optimal performance.
Configuration of sensors includes considerations such as sample rate, signal gain or
amplification, noise suppression, filtering or signal conditioning, and analog to digital
conversion. Even the best sensor can easily be undermined by improper sensor configuration
and signal conditioning leading to a poor signal-to-noise ratio feeding the downstream ML
processing.
Analog Noise Suppression
Industrial environments often exhibit considerable electromagnetic interference (EMI) which can
adversely impact sensitive analog sensors. Noise can be injected through conducted noise in
shared power supply leads or induced through radio frequency (RF) into nearby sensors, signal
conditioners, microcontroller boards, and associated interconnect cabling. Sources for this noise
are many in the industrial environment and can include:
• Motors
• Transformers
• Contactors
• AC Power Conduits
• Solenoids
• Fluorescent and Arc Lighting
• Variable Frequency Drives
• Switching Power Supplies
Undesired noise can obscure the true signal and diminish model performance unless properly
mitigated. As an example, the figures below show the effect of a nearby DC motor on a sensor
that used shielded coax signal cabling but lacked ideal shielding at the signal termination.
Figure 3 - Baseline noise in analog sensor input (left) and the same signal with a nearby DC motor running (right)
Clearly the noise seen in the right image is undesirable and likely to degrade model
performance unless filtered out or suppressed. To reduce EMI noise in your signals, the
following are considered best practices. [2]
1) Use high-quality cabling for interconnects. Shielding, wire gauge, outer casing
material, flexure ratings, and terminations are features to review when selecting cabling.
Better cable can go a long way toward minimizing induced noise. At minimum, twisted-pair
wiring can help, but fully braid-shielded coax cabling works best.
2) Suppress motor noise at the motor itself. Common mode chokes and/or filter
capacitors at the motor terminals can do much to attenuate noise at the source. [3] With
DC brushed motors, arcing between the brushes and commutator always occurs to some
degree and can generate RF and conducted electrical noise. The best mitigation is
proper brush alignment to minimize the arc noise, followed by metallic shielding of the
motor case to minimize transmission of the RF noise.
3) Isolate the power source for sensors and data acquisition. Route cabling for sensors and
downstream amplifiers and signal conditioning modules separately from power
conductors for motors, contactors, transformers, and other noise sources. Verify that power
supplies for data acquisition and sensors are not receiving conducted noise through power
supply coupling with other equipment. If cabling must cross power wires, ensure it does
so at 90 degrees rather than in parallel runs.
4) Maintain shielding at cable splits and terminations. At any shielded cable split or
termination use connectors with metal back shells that are properly connected to the
shield wire. No portion of the sensor conductor should be unshielded.
5) Use differential rather than single-ended inputs. By not referencing different sensors to
a common ground, noise susceptibility can be greatly reduced. Connect the output signal
to the plus (+) differential input and the sensor ground to the minus (-) differential input.
6) Avoid and remove ground loops. For shielded signal cables, ensure only one end of the
shield wire is terminated (at the zero-signal reference potential for the signals within the
shield). In special circumstances shields may be terminated at both ends, but care must
be taken that there is no difference in potential between the two ends of the shield; if
there is, a ground loop will be induced.
These are the primary considerations to keep in mind with analog sensor noise suppression. Far
more detail can be readily found in other literature, a few of which are listed in the references
section of this document.
Signal Conditioning
Not every signal pre-processing step can be performed after analog-to-digital conversion, so
attention and care must be given to analog-domain signal conditioning. Typical cases where
analog signal conditioning can be used effectively to cleanse raw sensor data include:
• Amplification of very low voltage or charge-based sensors (examples: piezo sensors,
thermocouples, and biosensor electrodes).
• Passive low-power analog filtering.
• Voltage-current translation for long cable runs.
• Signal Isolation (i.e. opto-isolators to protect compute domain from high
power/voltage).
• Calibration / Linearization (although often possible with post-ADC calibration).
• Sensor Excitation (necessary for some sensors like strain gauges and gas/oxygen
sensors).
The SensiML Toolkit includes a variety of pre-processing algorithms that can be used to
cleanse raw sensor data prior to feature transformation and classification within the
algorithm pipeline.
Sampling Rate and Recording Length
The general rule of thumb during data collection is to capture train/test data at the highest
fidelity practical. The reason is that it is always possible to go back and downsample the
original data to a lower rate, but it is not possible to add fidelity to a signal captured at too
low a sample rate. The sample rate should be at least twice the maximum frequency
component to be measured. This criterion, popularly known as the Nyquist theorem, dictates
faithful reproduction of digitized analog signals. Sampling rates lower than the Nyquist rate
introduce error in the form of aliasing.
Figure 4 shows a graphical example of aliasing: a simple sine wave sampled at too low a rate
for the signal being measured produces a misleading signal profile. Aliasing is an effect that
causes different signals to become indistinguishable once sampled.
Figure 4 - Graphic example of aliasing showing misleading signal profile
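The Nyquist criterion and the aliasing effect described above can be checked numerically. The sketch below is illustrative only (not part of the SensiML Toolkit); it shows that a 9 Hz tone sampled at only 10 Hz produces the same samples as a 1 Hz tone, apart from sign:

```python
import numpy as np

def nyquist_min_rate(f_max_hz: float) -> float:
    """Minimum sampling rate (Hz) that avoids aliasing for a signal whose
    highest frequency component is f_max_hz (Nyquist criterion)."""
    return 2.0 * f_max_hz

def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a pure tone sampled below the Nyquist rate."""
    return abs(f_signal - round(f_signal / f_sample) * f_sample)

# A 9 Hz tone sampled at 10 Hz yields the same sample values as a 1 Hz
# tone (up to sign): the two signals are indistinguishable once sampled.
fs = 10.0
t = np.arange(0, 2, 1 / fs)          # 2 s of samples at 10 Hz
assert np.allclose(np.sin(2 * np.pi * 9 * t),
                   -np.sin(2 * np.pi * 1 * t), atol=1e-9)
```

This is why capturing at the highest practical rate is the safe default: the 9 Hz content can never be recovered from the 10 Hz recording.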
Recording length for collected data depends on the nature of the event itself. For a discrete
event (examples: a motion gesture, a sound, or any time-series event with a clear start and end),
it is no great surprise that the recording should encapsulate the full event. While a distinct
signature may be obtainable from a subset of the event window, it is best to collect as much
data as possible upfront; schemes to identify subsets can be applied as post-processing of the
original dataset. Much like sample rate, it is possible to truncate a sampled event window but
impossible, without recollecting data, to resurrect portions not captured originally.
For continuous events (vibration data, regularly cyclic movements like running/walking), it is
helpful to capture a practically large number of iterations to use later for assessing variation
over time and repetitions. For example, running and walking data would preferably contain
50-100 steps per recorded data sample rather than 2-5 strides. The incremental time and
storage cost are minimal, but the cycle-to-cycle variance in the longer sample window might
reveal important model information.
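Capturing long recordings lets you segment them into many per-cycle windows afterward. A minimal sketch of such segmentation (an illustrative helper, not a SensiML API):

```python
import numpy as np

def segment_windows(samples: np.ndarray, window: int, step: int) -> np.ndarray:
    """Split a long continuous recording into fixed-length windows with a
    hop of `step` samples, so cycle-to-cycle variation can be examined."""
    starts = range(0, len(samples) - window + 1, step)
    return np.stack([samples[s:s + window] for s in starts])

# A 60 s recording at 100 Hz (6000 samples) split into 2 s windows
# with 50% overlap yields 59 windows of 200 samples each.
x = np.arange(6000, dtype=float)     # stand-in for one sensor axis
w = segment_windows(x, window=200, step=100)
```

Because the long recording is kept intact, window length and overlap can be revisited later without recollecting data.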
Sensor Data Collection
Capturing an accurately representative data set is the most important step in creating a smart
sensor algorithm using ML methods. The quality of the dataset is paramount to the ML
methodology. Upfront time spent ensuring data collection and labeling will be done in a high-
quality and cost-effective manner is well worth the invested pre-work effort. Figure 5 shows a
sample dataset collected from accelerometer.
In the data-driven ML development process, your data collection is your custom training
mechanism for algorithm development. With the data science knowledge built into the
algorithm process, data capture and labeling can be performed by semi-skilled data collection
staff under the guidance of the IoT device domain expert. Data collection and labeling for ML
algorithm learning is an iterative process of building a suite of datasets showing the events you
want to define as the basis for insights.
Figure 5 - Sample sensor data collection
This section of the guide covers the key factors to consider carefully at the outset of an ML
data collection project. Working through a data collection plan will vastly improve your odds
of generating a high-quality application model quickly and with minimal iterative rework.
Keep in mind, however, that many applications are started without full knowledge of the
factors that will affect the model. Even the best-planned data collection efforts can involve
iterative refinement of the test plan. For this reason, later in the guide we cover pilot testing
and the importance of validating the test plan itself before embarking on large-scale
collection.
Training data is the fuel that ML uses to build an algorithm. A good rule of thumb is that you
will need 30 to 50 dataset samples to create a good algorithm. This is an iterative, trial-and-
error process that should not be done all at once.
Understanding Data Inputs
Building models is about using predictive algorithms to classify useful insights from varying
input data. Creating a model is about inferring from signal patterns generated by data
inputs. The figure below shows an example of data inputs for motor fault detection and
classification.
There are three sources of variance in any data collection for an ML application.
1. Signal: Useful sensor measured differences correlated to intended insight.
2. Metadata: Knowable contextual differences that can be useful filters.
3. Noise: Unknown differences not correlated to intended insights.
These variances define the foundation for planning your data collection and labeling objectives.
1. Signal: Capture/label examples for each desired insight state across the distribution of
metadata and noise.
2. Metadata: Seek to convert as much noise variance into measurable metadata as
possible.
3. Noise: Seek to suppress; more unexplained variance equals more data required for a
good algorithm.
Figure 6 - Motor fault detection and classification data Inputs example
Sources of Variance
Variance should be thought of in terms of intended and unintended variance.
• Intended variance includes all the sources of variation that are directly associated with
our application model.
• Unintended variance is a form of noise: variation that we have not (or practically
cannot) accommodate through measured inputs, or that we simply did not anticipate or
understand in the upfront planning process.
A Real-World Example…
Take the case of a predictive maintenance sensor for a piece of industrial machinery. You want
to anticipate whether a V-belt between two drive pulleys is about to fail, based on vibration
and microphone sensors attached to the pulley bearing blocks. You have thought about the
sources of variance in your failure model and included the measured sources in the table
below.
Intended / Measured Variance
• Sensor Measured Variation: Audio Signal as Measured at Belt Pulley; Vibration Signal as
Measured at Pulley
• Annotated Metadata Variation: Brand of Belt Installed; Technician Who Installed Belt
• Calculated Metadata Variation: Time Since Belt Placed in Service; Cumulative Machine
Hours Since Belt Last Changed
Table 1 - V-Belt intended/measured variances
At the same time, you should also spend time considering all of the other unplanned measures
and sources of variance that could impact your model. For this sample case, these might
include the variances in the table below.
Unintended / Unmeasured Variance
• Potentially Measurable Variation: Pulley Center Distance; Pulley Angular Misalignment;
Pulley Axial Misalignment; Belt Tension; Belt Wear Thickness; Belt Durometer Hardness;
Belt Temperature; Ambient Ozone Reading at Machine
• Potentially Annotatable Metadata Variation: Belt Plant of Manufacture; Belt Manufacture
Date; Number of Plies (Belt Construction)
Table 2 - V-Belt unintended/unmeasured variances
In this V-Belt failure prediction example, some of the unintended variation sources may seem
absurd at first. But this exercise is useful to go through at the outset and is one of the most
valuable and overlooked tasks in data collection pre-planning. While many of the unintended
variations may not be addressable, this exercise usually reveals opportunities to confirm,
control, or collect variance data that may prove invaluable during the algorithm development
and modeling phase.
In this V-Belt sample case, this exercise could lead you to recognize or change the following
elements of your failure prediction:
1) For train/test, you may choose to standardize on a brand, type, maximum age, and
construction of V-belt to keep unknown manufacturing variance in the belt itself from
corrupting your model. In turn, your fan belt sensor may then be certified as accurate
only when used with recommended belt specifications and brands.
2) You conclude that belt tension is highly correlated with pulley separation and can be
readily measured with a non-contact displacement sensor on the idler pulley, so you
add this as a measured input sensor feature.
3) You recognize that angular and axial misalignment could contribute substantially to your
implementation outcomes, so you ensure operators are trained to configure these
contributing sources of wear variation to within a specified tolerance during train/test.
You further stipulate that the accuracy of the resulting model requires operators to
ensure pulleys are aligned within model specifications.
False Positives
When developing an application, the tendency is to focus on the events of interest (e.g., a
specific gesture or a particular machine fault). But for good algorithm performance you also
need to carefully consider those things that are NOT events of interest but might be confused
by the algorithm as such. False positives are a significant problem in any application that
attempts to provide meaningful insight from sensors. Excessive false positives erode confidence
in the insights and can lead to alarm fatigue (think of the din of false buzzers and alarms in a
typical hospital ER).
A Real-World Example…
Consider a watch wearable that needs to trigger an event when the user brings their wrist up
toward their face, as if intending to read the watch display. This event triggers a screen
wakeup. But you must train the algorithm not just on the proper triggering event, but also on
many examples of when NOT to trigger (scratching your head, wiping your mouth with your
arm, reaching out to turn on a light switch, and so on). You can see that even a simple event
detection algorithm can get complicated: you need to capture not only all the variations that
SHOULD trigger (like left-handed versus right-handed users) but also those that SHOULD
NOT.
Population Diversity
As you seek to capture and properly represent variation in your application model, one of the
most important sources is frequently subject-to-subject variance. The challenge is to select
your training dataset population so that it is most representative of the data diversity expected
for the desired insights. This is where the knowledge of the application domain expert is
invaluable.
The broader the population diversity, the larger the dataset required to properly characterize
outcomes across the population. Don't assume a fixed number of subjects will suffice
irrespective of population variance. Instead, ensure a sufficient number of representative
subjects exists for each unique combination of meaningful population groups. A good rule of
thumb is at least 10-20 examples for each unique combination. In each case, the categories of
subject variation should be captured and annotated for each subject as metadata. In the
runner example, you would record a running session and also collect each subject's experience
level, height, and weight, and the type of running surface used during the data collection.
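Capturing subject attributes as metadata alongside each recording might be sketched as follows; the schema and field names here are hypothetical, chosen only for illustration and not the SensiML data format:

```python
from dataclasses import dataclass, field

@dataclass
class LabeledRecording:
    """One sensor capture with its ground-truth label and subject metadata.
    Hypothetical schema, for illustration only."""
    sensor_data: list               # raw frames, e.g. [ax, ay, az] samples
    label: str                      # ground-truth class, e.g. "good_form"
    metadata: dict = field(default_factory=dict)

# Annotate each capture with the population attributes the text recommends,
# so the dataset can later be filtered or balanced by any of them.
rec = LabeledRecording(
    sensor_data=[[0.01, -0.98, 0.12], [0.03, -0.97, 0.10]],
    label="good_form",
    metadata={"experience": "novice", "height_cm": 172,
              "weight_kg": 65, "surface": "treadmill"},
)
```

Keeping metadata as an open dictionary reflects the point made below: new metadata fields can be added over time without restructuring the earlier recordings.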
Note that individual sources of metadata can be added to the model over time. Thus it is not
necessary to collect running data for all subject build types and experience levels across every
imaginable running surface from the outset. Rather, the developer must realize that the
model's limitations depend on the coverage of the possible sources of variation and
population diversity in the dataset to date. Don't expect the model to apply universally to
as-yet unmeasured subject attributes not included in the training dataset; it is more than
likely the model will perform suboptimally in those instances.
A Real-World Example…
If you are creating a fitness wearable device for assessing and coaching running form, an
expert running coach would understand the differences in running styles. Novices will exhibit
many more examples of poor form, while seasoned runners will more likely demonstrate proper
form. Having training subjects from only one of these populations risks generating a model
that is incomplete in its characterization of the running forms likely to be encountered by
users of the product. In this case, experience is likely just one of several important factors.
Others might include runner height, weight, and even the type of surface (treadmill, track,
road, or unpaved trail).
Subject Sample Size and Dataset Sufficiency
One of the most common questions asked during ML data collection and labeling is “How many
subjects/examples do I need to get a quality model?” Unfortunately, there is no single answer to
this question. For more information, read “How Much Training Data is Required for Machine
Learning.” Figure 7 shows a practical, data-driven way of determining whether you have
enough training data: plotting a learning curve.
Figure 7 - A sample learning curve [4]
The learning curve represents the evolution of the training and test errors as you increase the
size of your training set.
• The training error increases as you increase the size of your dataset, because it becomes
harder to fit a model that accounts for the increasing complexity/variability of your
training set.
• The test error decreases as you increase the size of your dataset, because the model is
able to generalize better from a higher amount of information.
As you can see in the rightmost part of the plot in the previous figure, the two lines tend to
reach an asymptote: you will eventually reach a point at which increasing the size of your
dataset has no impact on your trained model. The distance between the test-error and
training-error asymptotes is a representation of your model's overfitting. (An asymptote is a
line that a curve continually approaches but never meets at any finite distance.) The plot tells
you whether you need more data: if you plot test and training error for increasingly larger
subsets of your training data, and the lines do not appear to be reaching an asymptote, you
should keep collecting more data.
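A learning curve like the one described can be produced with a few lines of code. The sketch below is illustrative, not part of the SensiML Toolkit: it uses a synthetic two-blob dataset and a simple nearest-centroid classifier (both assumptions made for this example) to compute training and test error as the training set grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n_per_class: int):
    """Synthetic 2-class dataset: two Gaussian blobs standing in for
    real sensor-derived features (an assumption for this sketch)."""
    x0 = rng.normal(-1.0, 1.0, size=(n_per_class, 2))
    x1 = rng.normal(+1.0, 1.0, size=(n_per_class, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    order = rng.permutation(len(y))
    return X[order], y[order]

def fit_centroids(X, y):
    """Toy nearest-centroid classifier: one mean vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def learning_curve(train_sizes, X_tr, y_tr, X_te, y_te):
    """Train on growing subsets; return (train_errors, test_errors)."""
    train_err, test_err = [], []
    for n in train_sizes:
        model = fit_centroids(X_tr[:n], y_tr[:n])
        train_err.append(float(np.mean(predict(model, X_tr[:n]) != y_tr[:n])))
        test_err.append(float(np.mean(predict(model, X_te) != y_te)))
    return train_err, test_err

X_tr, y_tr = make_blobs(500)
X_te, y_te = make_blobs(500)
sizes = [10, 50, 100, 500, 1000]
train_err, test_err = learning_curve(sizes, X_tr, y_tr, X_te, y_te)
# If test_err has flattened by the largest size, more data is unlikely to help.
```

Plotting `train_err` and `test_err` against `sizes` gives exactly the curve discussed above; in practice you would substitute your own dataset and candidate model.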
Phasing Data Collection
The process of collecting data should be iterative. It is worth splitting a large data collection
effort into a pilot phase and a volume phase, so that the limited pilot-phase data can be
assessed and the methodology revised if needed. Experience with sensor data ML projects
shows that even the best-planned processes invariably involve factors that were not
considered upfront. Some of these are practical operational factors, like developing an
efficient process for prepping subjects, collecting the data, annotating metadata, and
organizing the database for later use. Other factors arise after initial analysis of a limited
dataset, which may reveal surprises about the level of variance, signal/noise issues, missing
metadata, or inappropriate population selection that require revisions to the test plan.
Documenting Methodology
Ensure that technicians, domain experts, and database curators document the process
methodology as they proceed through the Smart Edge AI process, so there is written
documentation to reference along the way, ensuring a consistent approach is used and
helping debug any issues during the pilot phase of data collection.
Data Labeling
Data labeling often goes hand-in-hand with sensor data collection: attaching ground-truth
labels for predictive insights to the training samples collected, as well as relevant contextual
or metadata that aids the predictive model. We treat this step separately because it may or
may not be done in tandem with data collection. The choice is typically driven by practical
issues in assessing and annotating ground truth while data is being collected. A hand gesture
is obvious to anyone collecting data; measuring the gas turbine operating conditions that
lead to compressor blade failure is a different matter. Since structural health at the
microscopic level cannot be measured in situ during operation, labeling based on separate
disassembly and ultrasonic inspection requires separate but coordinated processes and datasets.
Enumerating Relevant Metadata Annotation
As stated in the section on population diversity, you should enumerate all of the potential
sources of variance and include them as metadata in your data collection process. In addition
to the sensor data itself, it is equally important to collect other contextual inputs, such as
metadata that can influence outcomes if known. Bear in mind that to be useful, the contextual
data must be knowable both during train/test model generation and during runtime use of
the model for prediction.
Careful thought upfront can identify the right set of metadata to capture and record during data
collection to balance the cost of collecting it versus its immediate usefulness for modeling and
long-term potential for value as more data is developed or the application evolves.
A Real-World Example…
Consider a predictive maintenance model for a motor that predicts bearing failure. If the
variance in manufacturing quality across bearing suppliers were found to be substantial, this
might be a useful model input, provided the data is easily obtainable in use. In this case, it may
not be practical or reasonable for the user to have such component information upfront; thus,
while an important input, it is not a practical one for purposes of building the model. Whether
you choose to capture this information comes down to judgment within the application as to
whether the information might be usable in the future. It is possible that such information
would be known upon servicing of the motor and thus could augment the model input at a
later date.
Defining Data Labeling Methodology
The mechanics of labeling data are usually dictated by the complexity of the label
determination. In many cases, the desired outcomes are straightforward and immediately
obvious and thus can be annotated directly at the time of capture. Such is the case with
gesture recognition, where anyone conducting the test is probably capable of recognizing the
performed gesture during data capture. Other applications may not be so readily apparent
and require offline analysis by a domain expert to properly label “ground truth” in the
training dataset.
ML Algorithm Development
Thus far, we have focused nearly all our attention on the front-end stages of ML algorithm
building: data collection, physical sensor choices and factors, pre-processing considerations and
test methodologies for collecting high-quality ML training data. These are vital factors in the
steps leading up to ML algorithm development itself as algorithms for ML are only as good as
the data used to train them.
The ultimate success in ML algorithm development beyond the front-end stages relies equally
on the ability to analyze this training data using ML frameworks to generate accurate predictive
ML code that performs well classifying never-before-seen data. This step is often the biggest
hurdle in the ML pipeline as it typically requires human expertise, data science training, and the
expertise in operating the specific AI tools and methods used to select and tune the ML
algorithms themselves based on this knowledge. Volumes have been written on this stage of the
process and are far beyond the scope of this document to cover.
Generating ML algorithms using the AutoML approach delivers sophisticated ML methodologies
to mainstream developers without data science backgrounds. Key to effective implementation of
AutoML tools is sufficient knowledge to understand how to interpret the results of the AutoML
process and evaluate models generated by the AutoML tool for appropriateness, accuracy,
sensitivity, specificity, overfitting, data splits, and efficiency.
The AutoML approach to ML generation is typically done in the cloud to harness the processing
power needed to create the algorithm. Once the algorithm is completed, you can evaluate it for
model appropriateness at the endpoint.
Defining Model Appropriateness
By model appropriateness, we’re talking about having a well-considered selection and
evaluation of possible algorithms for a given dataset and application. Data scientists will often
cite the “No Free Lunch Theorem” [5], which in simple terms states that, averaged over all
possible problems, no one classifier outperforms any other. Thus, it's common practice in machine
learning to try many models to find the one that works best for a particular application and
dataset. Data splitting of training datasets (which we discuss shortly) is used for validation or
cross-validation to assess the predictive value of many different models with the most suitable
one chosen based on the comparative results.
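The data-splitting step mentioned above can be sketched simply. The generator below is a minimal, hypothetical k-fold splitter written for illustration; real AutoML tools implement this internally, usually with shuffling and stratification:

```python
def kfold_indices(n_samples: int, k: int):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.
    Minimal illustrative splitter: no shuffling or stratification."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    idx = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size

# Each candidate model is trained on k-1 folds and scored on the held-out
# fold; comparing averaged scores across candidates picks the best model.
for train_idx, val_idx in kfold_indices(6, 3):
    assert len(val_idx) == 2 and len(train_idx) == 4
```

Cross-validating every candidate classifier this way is what makes the "try many models" practice described above statistically meaningful.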
It’s beyond the scope of this document to get into the specifics of each and every type of
classification algorithm. The list is extensive and includes, in rough order of complexity, Naïve-
Bayes, k-nearest neighbor (KNN), logistic regression, support vector machine (SVM), decision
trees, ensemble models, and artificial neural networks (ANN) also known as deep learning
models.
What is worth noting is that AutoML tools, by their nature, allow rapid iteration across
multiple algorithms, thus 1) making it practical to evaluate many different models against a
given dataset to determine the best-performing type for a given problem, and 2) allowing
developers with only a modest data science background to test and select among a variety of
algorithms. The effectiveness of this approach obviously depends on the variety of supported
algorithms and the level of automation in a given AutoML tool.
Unlike many tools that focus only on deep learning of ANN model frameworks, SensiML
supports a broad and growing array of classifiers. This diversity of modeling approaches
increases the likelihood of finding an optimal performing algorithm for a given application
and dataset. When factoring in the limited resources of resource-constrained endpoint
microcontrollers and IoT devices, this is particularly important for good performance.
Accuracy
Accuracy represents the ratio of correct predictions made by a given ML algorithm to the
total number of predictions made by that algorithm. Beyond the obvious desired condition of
an algorithm that achieves 100% accuracy, this metric alone is insufficient in scenarios where
performance is less than 100%. To understand why, we need to further explore the cases in
which an algorithm is inaccurate.
A Real-World Example…
Imagine an algorithm that can use vibration and sound measurement to predict in advance
when a machine is about to suffer a catastrophic failure. Now imagine two scenarios where such
a prediction algorithm fails. In the first failed case, the algorithm might indicate imminent machine
failure when in fact the machine is perfectly healthy and nowhere near failure. This could result
from erroneous signals, noise, or outside influences that trigger what is known as a false
positive. Now consider a case where the machine indeed fails, but the algorithm did not
anticipate the imminent failure and indicated all is normal. This opposite algorithm failure is
what is known as a false negative. Neither is desirable, but based on the application one type of
failure may be more costly or detrimental than the other (as with medical screening and lab
tests).
Where this is particularly important is in datasets that are class-imbalanced as is often the case
in real-world applications. By imbalanced, we mean skewed datasets where the incidence of one
class far outweighs others. In our machine example, the state of ‘machine healthy’ is usually
many orders of magnitude more prevalent than ‘machine failure imminent’. Thus, an algorithm
may appear very good by always reporting ‘machine healthy’ no matter what, since that is the
predominant machine state anyway. Obviously, that would not be a very insightful predictive
maintenance algorithm, even if it were 99.98% accurate in actual use.
Specificity
To combat the shortcomings of accuracy on class-imbalanced datasets, we need to look at
additional metrics. One such metric is specificity, which measures the ratio of true negatives
to all actual negative instances (true negatives plus false positives). Stated differently, it
measures how well the algorithm avoids false alarms (false positives). In applications where
false positives are to be avoided to the maximum extent possible, specificity is an important
metric.
Sensitivity
The reverse case is what is known as sensitivity (otherwise known as recall): the ratio of
true positives to all actual positives (true positives plus false negatives). Stated
differently, this is a measure of how good the algorithm is at catching positives. An algorithm
that always predicts a machine about to fail when applied to a machine that most often is
running normally would have high sensitivity (100%) but very low accuracy.
Precision
Yet another metric, known as precision, looks at the ratio of true positives to all predicted
positives. An algorithm that gets half of its positive predictions right (so the other half are
false positives) has a precision of 50%.
F1 Score
From the above metrics, it is probably clear that no one of these measures alone gives a
complete picture of overall model performance. And accuracy alone can be most misleading when
class imbalance is high: for a motor that runs normally 99.9% of the time, the 0.1% incidence
of impending failure looks statistically insignificant but comes at the high cost of a destroyed
machine. The F1 score, the harmonic mean of precision and sensitivity, calculated as
2 x (precision x sensitivity) / (precision + sensitivity), counteracts the biasing effect of a
highly imbalanced class distribution and favors balance between precision and sensitivity. It
is often a better overall metric than simply looking at accuracy.
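To make these definitions concrete, here is a minimal sketch in plain Python. The confusion counts are hypothetical, chosen to mimic an imbalanced predictive-maintenance dataset, and each metric is computed exactly as defined above:

```python
# Hypothetical counts for an imbalanced dataset: 990 "healthy" windows and
# 10 "failure imminent" windows (the positive class).
tp, fn = 2, 8      # failures caught vs. failures missed
tn, fp = 985, 5    # healthy windows correctly passed vs. false alarms

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions correct
specificity = tn / (tn + fp)                    # true-negative rate (few false alarms)
sensitivity = tp / (tp + fn)                    # recall: fraction of failures caught
precision   = tp / (tp + fp)                    # fraction of alarms that were real
f1 = 2 * (precision * sensitivity) / (precision + sensitivity)

print(f"accuracy={accuracy:.3f}  specificity={specificity:.3f}")
print(f"sensitivity={sensitivity:.3f}  precision={precision:.3f}  f1={f1:.3f}")
```

This run reports 98.7% accuracy alongside an F1 score below 0.25, illustrating exactly the trap described above: the model looks excellent by accuracy yet catches only 2 of 10 failures.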
Performance Measures for Multi-class Datasets
You might be wondering: what if the algorithm is predicting from among multiple classes? The
above explanations of accuracy, specificity, sensitivity, precision, and F1 score were
introduced for the simplest case of two-state classification (positive or negative, yes or no).
How are these measures applied when the algorithm must select among many classes (one of a
dozen gestures, say, or various machine fault states)? The answer is that in multi-class
classifiers these metrics are computed for a given class relative to the sum of all other
classes. In this way, performance measures can be broken down for each class in a multi-class
classifier and evaluated according to the importance of each and all of the classes present.
Confusion Matrices
In the prior section we mentioned how to extend two-class algorithm performance measures to
multi-class datasets. More commonly, though, the performance of multi-class classification
algorithms is assessed by use of a confusion matrix. Figure 7 shows an example of what a
confusion matrix looks like.
On the vertical axis is the actual (or ground truth) distribution, with one row per class;
each row sums to the total actual occurrences of that class label. The horizontal axis, with
one column per class, tabulates the distribution of algorithm-predicted class labels; each
column sums to the total predicted occurrences of that class label. The body of the N x N
table contains cells giving the total number of occurrences for each and every combination of
actual and predicted class.
A perfectly accurate prediction model would place every occurrence at the same class label for
actual and predicted value, so only the cells on the diagonal from upper left to lower right
would hold values, with all other cells being zero. Erroneous predictions show up in the
triangular regions on either side. As a tabulated representation of multi-class algorithm
performance, the confusion matrix thus provides a handy, quickly readable format for assessing
model quality.
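As an illustration, a confusion matrix can be tallied directly from actual and predicted label lists, and per-class recall read off its diagonal. This sketch uses plain Python and hypothetical gesture labels:

```python
# Build a confusion matrix from hypothetical actual vs. predicted labels.
# Rows = actual class, columns = predicted class.
classes = ["jab", "hook", "uppercut"]
actual    = ["jab", "jab", "hook", "hook", "uppercut", "uppercut", "jab", "hook"]
predicted = ["jab", "hook", "hook", "hook", "uppercut", "jab", "jab", "hook"]

idx = {c: i for i, c in enumerate(classes)}
cm = [[0] * len(classes) for _ in classes]
for a, p in zip(actual, predicted):
    cm[idx[a]][idx[p]] += 1          # tally each actual/predicted combination

# Per-class recall (one class vs. the rest): diagonal cell over its row total.
for i, c in enumerate(classes):
    recall = cm[i][i] / sum(cm[i])
    print(c, cm[i], f"recall={recall:.2f}")
```

A perfect model would leave everything off the diagonal at zero; here the two "uppercut" samples split between the diagonal and a misprediction, giving that class a recall of 0.50.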
Figure 7 – Confusion matrix example
Overfitting/Underfitting
Having assessed the primary metrics of accuracy, specificity, sensitivity, precision, F1 score,
and confusion matrices, we might think our job is done once we achieve an algorithm with
satisfactory performance on these measures. But there are other considerations that can lull us
into falsely believing we have a well-performing algorithm.
Let’s next look at the concept of model fit: the underfitting, overfitting, or good-fit
characterization of an algorithm relative to its training dataset. To help, a visual
representation of each state is shown in Figure 8.
Figure 8 – Underfitting, appropriate (good) fitting, or overfitting
Imagine a dataset with two features represented by the x and y axes in the graphs shown above.
By features, we mean model inputs derived from the underlying sensors we measure to predict
an outcome as a class. The classes in the graphs above are represented by the different markers
showing the distribution of the different classes relative to the two features in the space
represented by the x and y axes (known as feature space). Again, for simplicity we’ll consider
only two classes for illustration: “X”s and “O”s.
Now, if we’ve selected good features for our dataset and application, we should see separable
regions in feature space that distinguish the Xs from the Os. With real-world data, though,
things are rarely 100% clean: complicated borders and outliers between the regions force our
dividing line between classes (known as the decision boundary) to be more complex. If our
decision boundary is too simple to represent the true complexity of the separable classes in
our dataset, our algorithm is said to be underfitting. Much of the variance in the data is
left unexplained by the simple linear decision boundary in the left-hand picture, so we have
an underperforming model. This will be evident in our performance metrics (accuracy, F1 score,
confusion matrix), so we have clues to this problem from the performance analysis discussed
earlier.
Now let’s look at the right side. We’ve come up with a seemingly wonderful model that puts
every class where it belongs. We did this by creating an overly complex decision boundary that
meanders through feature space as required to classify our training dataset perfectly. Great,
right? Wrong. In short, we’re kidding ourselves with a false sense of fit, or what can be
characterized as overfitting.

As an analogy we can all probably grasp intuitively, consider the false sense of predictive
value we might get from taking a couple of months of stock market price data, applying a
complex polynomial curve fit, extrapolating that curve into the future, and believing it will
continue to hold because it matched the historical data. A similar false sense of accuracy
arises in multi-class classification models with overly complex decision boundaries, which is
what we call overfitting.
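The stock-market analogy can be shown numerically. The sketch below (assuming NumPy is available; the data is a synthetic noisy signal, not real market data) fits a low-degree and a high-degree polynomial to a "historical" window and then scores both on held-out "future" points:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # noisy "historical" signal

train_x, test_x = x[:15], x[15:]        # hold out the last points ("the future")
train_y, test_y = y[:15], y[15:]

mse = {}
for degree in (3, 9):
    coef = np.polyfit(train_x, train_y, degree)  # least-squares fit on the past only
    mse[degree] = (
        float(np.mean((np.polyval(coef, train_x) - train_y) ** 2)),  # fit error
        float(np.mean((np.polyval(coef, test_x) - test_y) ** 2)),    # extrapolation error
    )
    print(f"degree {degree}: train MSE {mse[degree][0]:.4f}, holdout MSE {mse[degree][1]:.4f}")
```

The degree-9 fit always matches the training window at least as well as the degree-3 fit, yet its error on the held-out points is far worse than its training error: the hallmark of overfitting.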
Data Splitting: Train versus Test Data
The key to detecting overfitting is holding back some of our original dataset from the algorithm
training set and using it like new, unseen data to test whether our model works when presented
with examples it hasn’t been trained on. Thus we choose a fraction of the data for training and the
remainder for testing (or validating) our model. If we’ve been successful in creating a model that
isn’t overfitting, the model should still perform well when given this “new” data we’ve set aside
for validation. The properly fit model is said to generalize well.
So where do we split our data? Use too much of it for training, and little remains to test
against overfit. Use too little for training, and we end up with a poorer model because we
lacked a sufficient number of samples to characterize all of the actual variance in the data.
Training dataset size translates directly into money: money for the time and effort to collect
samples, and money for machine time or for test conditions that are difficult to record on
demand. We don’t want to collect more data than we need. Rather, we wish to collect just
enough to capture real-world variance for algorithm training, with enough left over for
validation and overfitting checks.
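A simple train/test split might be sketched as follows, using plain Python and a hypothetical list of labeled sensor windows. Shuffling with a fixed seed keeps the split both stochastic and reproducible:

```python
import random

# Hypothetical labeled windows: (feature_vector, label) pairs.
dataset = [([i, i % 5], "failing" if i % 10 == 0 else "healthy") for i in range(100)]

random.seed(42)            # reproducible split
random.shuffle(dataset)    # stochastic selection guards against ordering bias

split = int(0.8 * len(dataset))               # 80/20 train/validation split
train_set, test_set = dataset[:split], dataset[split:]
print(len(train_set), len(test_set))
```

A model trained only on `train_set` that still scores well on `test_set` is said to generalize; a large gap between the two scores is the overfitting signal we are looking for.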
A very common and clever solution is what is called k-fold cross-validation. This involves
chopping up your overall dataset into k subsets of data called folds. We then iteratively
train our algorithm on all but one of these folds and validate with the held-out fold,
repeating the train/test cycle with a different holdout fold each time. This makes much more
efficient use of a limited dataset for both training and testing, and maximizes our ability
to tune the model while limiting overfit.
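The fold rotation just described can be sketched in a few lines of plain Python (the `train_model`/`validate` steps are left as hypothetical comments, since they depend on the classifier in use):

```python
# Minimal k-fold cross-validation sketch.
def k_fold_splits(data, k):
    """Yield (train, holdout) pairs, using a different fold as holdout each round."""
    folds = [data[i::k] for i in range(k)]   # k roughly equal subsets
    for i in range(k):
        holdout = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, holdout

data = list(range(10))                       # stand-in for labeled samples
for train, holdout in k_fold_splits(data, k=5):
    # model = train_model(train); score = validate(model, holdout)  # hypothetical steps
    print(len(train), len(holdout))
```

Every sample is used for validation exactly once and for training k-1 times, which is what makes the scheme so data-efficient.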
SensiML supports an enhanced version of k-fold cross-validation called stratified k-fold
cross-validation, which seeks to ensure each fold has an equal distribution of classes. Data
is thus redistributed so that each fold is a reasonable representation of the whole dataset.
SensiML also offers other validation schemes such as stratified shuffle split, metadata
k-fold, and stratified metadata k-fold validation.
Interpreting ML Performance
There are many more techniques to model training, combining, and validation that seek to
improve performance and minimize the pitfall of overfitting. These are beyond the scope of this
document and well covered in numerous references on machine learning. Our objective is NOT
to make you into a data scientist, but to teach you enough to understand how good
experimental methods, dataset collection, and labeling can be combined with powerful AutoML
toolkits and a basic ability to interpret ML performance results to achieve stellar algorithm
performance with modest data science skills.
Converting an Algorithm to Optimized Endpoint Code
At this point, we have covered a great deal of ground having addressed sensor inputs, signal
processing, data collection and labeling, AutoML algorithm search, and performance assessment
aspects of endpoint AI algorithm development. But we still do not have functional code that we
can load on our low-power embedded IoT endpoint and test our application. The next step in
the ML pipeline involves transforming our algorithm into code that can be run optimally on our
target hardware. Since our hardware platform of choice for smart IoT endpoints is not a Linux
server with terabytes of storage, GHz of multicore CPU and GPU processing, and Gbps of
network bandwidth, committing an algorithm to practice in the form of power-efficient code is
itself a non-trivial task.
This task can be done either iteratively, starting from an idealized algorithm that is in
turn simplified to fit the hardware, or integrally with algorithm search and selection. The
former is the common development process: a statistical tool (like MATLAB or R) or an ML
framework (like Caffe) is used to arrive at an algorithm, which is then manually coded by
firmware engineers, with data scientists providing guidance on deviations from the algorithm
produced in the compute-rich modeling tools.
For example, quantized integer math may be substituted for floating-point math to minimize
the clock cycles and/or memory required for large arrays. Complex math may be approximated
with lookup tables, compiler optimizations made, DSP extensions and math libraries employed,
and profilers used to streamline loops and branches, among various other means of improving
cost/performance on the microcontroller chosen for the IoT device endpoint.
The challenge in so doing is to maintain model fidelity, so that the hard work of model
optimization and selection is not compromised by simplifications made in the name of power
and resource reductions. It is here that many AutoML tools fall short, as they do not carry
the implementation through to embedded code delivery and assurance. A few such tools,
however, do automate this step in the pipeline as well.
SensiML supports a multi-platform embedded code generation step that integrates code
optimization early in the model selection process. Each feature extractor and classifier
algorithm includes profile data on code size to ensure that candidate models not only fulfill
performance constraints like accuracy and F1 score, but also memory limitations imposed
by the chosen target hardware to be used for the end product. In this way, SensiML Toolkit
extends the AutoML pipeline to ensure bit-exact implementation of efficient code on
device that performs just as expected from the output of the ML algorithm selection and
tuning process.
Test/Validation of Local IoT Device Insight
Once the ML algorithm has been implemented in embedded code and then flashed onto the
target IoT device, you are ready to test it in real-world settings for accuracy. At this stage, the
model can either be deemed acceptable or additional test data added to the data collection
phase and the process iterated based on this new data.
This testing includes empirical tests done on the device in real-world usage, but it also
usually includes reproducible test files run both in emulation and on the target device. Test
files provide a repeatable test bench with which to measure the algorithm's behavior
consistently as we make modifications. As you perform validation testing, it's important to
keep in mind the same error sources that can be introduced in model training, and to test for
them to ensure the model generalizes and performs well.
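A minimal regression bench along these lines might look like the sketch below. The `classify` function and the recorded windows are hypothetical stand-ins for the embedded model and a captured test file:

```python
# Replayable test bench: run the (hypothetical) classifier over a recorded
# test file and compare predictions against golden labels.
def classify(window):                      # stand-in for the embedded model
    return "failing" if sum(window) > 10 else "healthy"

# (window, expected_label) pairs captured once, replayed on every firmware change
test_file = [
    ([1, 2, 3], "healthy"),
    ([9, 8, 7], "failing"),
    ([0, 1, 0], "healthy"),
    ([5, 5, 5], "failing"),
    ([4, 4, 4], "healthy"),                # borderline case the model gets wrong
]

mismatches = [(w, e, classify(w)) for w, e in test_file if classify(w) != e]
accuracy = 1 - len(mismatches) / len(test_file)
print(f"bench accuracy: {accuracy:.2f}, mismatches: {mismatches}")
```

Because the same file is replayed after every change, any regression shows up as a new entry in `mismatches` rather than being lost in the noise of ad-hoc spot checks.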
Sample Bias
Since humans are responsible for collecting both training and test data when building
algorithms, any inherent bias in human decision-making during training often carries over
into the test phase. Sample bias results from omitting or skewing the subjects used for
train/test datasets in a way that is not representative of the subject population as a whole
(i.e. what will be encountered in subsequent real-world product use). Ways to detect sample
bias include assigning training data collection and test data collection to different
individuals, and using methodologies that ensure random (stochastic) selection of the
samples and subjects used for both training and testing.
Bias in ML models can trigger costly errors if not caught in the validation stage. Imagine a
product released only to discover it has omitted a major segment of the subjects found in the
application population. This is where the advice and input of domain experts familiar with
the application space is invaluable. Involve domain experts closely as test data is selected,
and apply appropriate de-biasing techniques to remove bias from your test datasets.
Fortunately, ML techniques provide advantages if and when bias is discovered: because all
collected data can be partitioned into either training or testing sets, any bias discovered
can be corrected by re-pooling the data and re-partitioning it using stochastic methods to
ensure true randomness in the splits.
The other advantage of ML is the speed in which iterative retraining can be performed versus
hand-coded algorithm development methods. With hand-coded algorithms, a given model
implementation is tested against expected usage as hopefully represented by bias free test data
as well as empirical spot checks from new sensor data streams coming from supported usage
conditions. Any errors in model performance that are discovered are isolated and root caused.
Developers seek to understand in this step if the issue is a result of a logical error in the
algorithm or whether it represents conditions and sensor data not anticipated at the outset of
the algorithm creation. Thus hand-coded algorithms involve significant time and manual effort
spent in probing cause/effect scenarios for outlier data and determining how to revise the
algorithm to address these exceptional conditions.
While the same process applies to AI-based testing, the key difference is that retraining is
automated, performed in compute time using standardized workflows and tools. A great number
of passes through the algorithm development step, with subsequent testing, can thus be
performed using tools that support this process. By retraining with the inclusion of the same
test data that uncovered the incorrect prediction(s), that data can be re-run through the
model tuning or retraining step. In this way, the dataset collected at the outset can be
augmented with additional new data for errant cases discovered during empirical testing as
well. Without manual effort spent working out how to revise the underlying algorithm to
accommodate the wrongly predicted example, the model training process can simply be repeated
as though this data had been part of the initial training dataset.
Lifelong Learning and Iterative Model Updates
Taking this closed-loop learning concept further, the true power of AI-based algorithms comes
from adopting what is known as “lifelong learning”: the ability to continually improve
algorithm performance throughout product usage, drawing not only from the data collection and
labeling performed during the IoT sensor device’s product development stage, but also after
commercial launch and use in the field. The ability to generate datasets from an entire
installed base of IoT devices, rather than just the sample set gathered during the limited
product development and testing phases, opens up even more learning capability.
Such lifelong or continual learning can be segmented into two types: collective learning,
benefiting the entire global population of devices, and localized learning, where algorithm
performance is adjusted based on new data specific to a given device. Collective learning
requires connectivity and transmission of newly labeled data back to the dataset repository
used for the AutoML retraining process. Localized learning, or algorithm personalization,
uses newly labeled data to re-learn locally on a specific endpoint node.
Because the computational power required to retrain models is typically considerable, a full
retraining of the model is usually not practical on the endpoint IoT sensor device. Instead,
the feasible approach, which yields good results in the vast majority of cases, is classifier
tuning. In this approach, feature engineering is skipped under the assumption that the
underlying feature vector is still valid and that prediction errors can be eliminated by
changing classifier hyperparameters. Much less computationally intensive than a full retrain
and feature engineering job, this tuning, which involves adding, pruning, or resizing
classifier neurons, or adding or modifying decision tree branches, can in most cases be
accomplished within the constraints of the embedded microcontroller. By allowing the device
to revise its own model, the machine learning aspect is realized over time, and performance
improves in use with additional corrected data examples.
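One way to picture localized classifier tuning is with a tiny nearest-neighbor classifier, sketched below in plain Python. This is an illustrative analogue of neuron-based tuning, not SensiML's actual on-device mechanism; the feature vectors and labels are hypothetical:

```python
# Localized "learning" by classifier tuning: a nearest-neighbor classifier that
# absorbs a label-corrected feature vector without any full retraining pass.
def nearest_label(patterns, vec):
    """Return the label of the stored pattern closest to vec (squared distance)."""
    return min(patterns, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], vec)))[1]

patterns = [([0.0, 0.0], "healthy"), ([1.0, 1.0], "failing")]  # deployed model state

sample = [0.4, 0.1]                            # an early fault signature
print(nearest_label(patterns, sample))         # current prediction: "healthy" (wrong)

# A technician supplies the corrected label; tuning = adding one pattern locally.
patterns.append((sample, "failing"))
print(nearest_label(patterns, sample))         # now classified as "failing"
```

The expensive feature-extraction pipeline is untouched; only the classifier's stored patterns change, which is what makes this kind of tuning feasible within a microcontroller's memory and compute budget.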
Figure 9 – Lifetime learning process
Conclusion
Over the course of this paper we have presented a number of considerations that should be
factored in the design of Smart Edge AI or AutoML based IoT sensor device algorithm
development. While there is much to know and factor into careful upfront planning, the process
building IoT algorithms with automated ML tools can vastly outperform traditional coding
approached in time, cost, and quality. Rather than expending a great deal of resources and time
on data science and embedded code firmware optimization, the AutoML approach exemplifies
in Smart Edge AI tools can free developers to focus on their intended application functionality.
[Figure 9 depicts the lifetime learning loop: sensor data input feeds buffered raw data; feature extraction produces a buffered feature vector for AI algorithm processing; a "prediction correct?" check then either continues normal operation (yes), tunes the edge model with the label-corrected feature vector (no: learn locally), or retrains the model adding the label-corrected raw data (no: learn globally).]
Appendix – Smart Edge AI Test Plan Template (Example)
The following template is provided as an example of advance annotation of data collection,
metadata collection, and sources of variance to control during train/test data collection for
a Smart Edge AI data capture activity. The example is for a consumer sports feedback wearable
device, but a similar approach can be used regardless of intended application.
Smart Edge AI Test Plan: Boxing Punch Detection Wearable
Revision: 1.0 Last Revised: 12/15/2019 By: SensiML AE Team
Application Summary: Motion classification for recognition of boxing punches from glove-mounted
3-axis accelerometer and 3-axis gyro sensor device.
Desired Inference Classifications
(Each categorical variable below maps to a SensiML Event Group; each class maps to a SensiML Event.)

Must Include - Boxing Punch: Jab, Hook, Uppercut, Overhand
Should Include - Boxing Impact: Knockout Punch, Solid Connect, Glancing Blow, Miss
May Include - Boxing Stance: Upright, Semi-crouch, Full Crouch
Future Classes - Boxing Defense: Bob, Block, Clinch, Cover-Up
Intended Variance

Annotated Metadata:
Subject ID: Unique User ID#
Gender: Male, Female
Experience: Expert, Intermediate, Novice
Dominant Hand: Left-Handed, Right-Handed, Ambidextrous

Calculated Metadata:
Subject Height: Height (inches)
Subject Weight: Weight (lbs)
Unintended Variance

Annotated Metadata:
Test Technician: Technician ID#
Collection Date: m/d/y h:m

Calculated Metadata:
Subject Warm-up: Time (minutes)
Ambient Temp: Temp (°F)
Sensor Inputs

Sensor: 6DoF IMU (Accel/Gyro)
Sample Rate: 200 Hz
Full Scale Range: +/- 2G (accel), +/- 2000 dps (gyro)
Type (Digital, ADC): Digital
Notes: QuickLogic Chilkat EVB (on-board sensors)
References
1. “Embedded Machine Learning Design for Dummies”
2. “Top 8 Ways to Deal with Noise in Data Acquisition and Test Systems”
3. “A Guide to Understanding Common Mode Chokes”
4. “No Free Lunch Theorems for Optimization”