Edge AI applications
Edge AI is not a single application. It is an inference capability: the ability to classify, predict, and detect, in real time, on hardware that is already deployed, without a cloud connection. The applications it enables span dozens of industries and hundreds of specific use cases. What they share is a common architecture — on-device inference, at the point of measurement, producing decisions rather than data.
Where edge AI is deployed. And why.
Edge AI applications tend to cluster around three constraints. The first is latency: decisions that must happen within milliseconds of a sensor reading, before the opportunity to act has passed. The second is power: devices that cannot sustain the energy cost of cloud inference, whether because they are battery-operated or because the data transmission itself is too expensive. The third is connectivity: environments where network access is intermittent, contested, or absent by design.
Any application with all three constraints is a natural fit for edge AI. Most embedded industrial applications have at least two. The result is a very large category: rotating machinery monitoring, infrastructure sensing, automotive control systems, battery management, supply chain forecasting, precision agriculture, acoustic monitoring, and smart building systems all qualify.
The common thread is not the industry or the sensor type — it is the requirement that intelligence lives at the hardware layer. When data must be acted on before it can be transmitted, when transmitting it is too expensive in energy or bandwidth, or when connectivity cannot be relied upon, inference at the device is the only architecture that works.
Catch failures before they cost.
In any continuous process (a manufacturing line, a utility network, a mechanical system, a distribution grid), anomalous behaviour is the signal that something is wrong or is about to be. Detecting it early prevents equipment failure, reduces unplanned downtime, and avoids the cost of reactive repair. Detection has to be continuous too: sporadic sampling misses the events that matter most.
Continuous anomaly detection requires models running at high frequency against a steady stream of sensor data: vibration, current, pressure, temperature, sound. Transmitting that raw data to a cloud service for inference is expensive in bandwidth, infrastructure cost, and latency. At scale, with hundreds or thousands of sensors each inferring many times per second, it is impractical. The architecture that works is inference at the node: a model trained on healthy and anomalous signatures runs locally, and only event flags travel upstream.
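The node-side pattern can be sketched in a few lines. This is an illustration only: a simple z-score check stands in for the trained model, and the names `AnomalyNode` and `make_node`, the threshold, and the sample values are all hypothetical. The point is the data flow: every reading is scored locally, and the uplink only sees a flag when the state changes.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class AnomalyNode:
    """Hypothetical sensor node: scores readings locally, emits only flags."""
    baseline_mean: float
    baseline_std: float
    threshold: float = 4.0      # assumed z-score limit for "anomaly"
    state: str = "normal"
    sent: list = field(default_factory=list)  # stands in for the uplink

    def process(self, reading: float) -> None:
        # Score the reading against the healthy baseline (stand-in for a model).
        z = abs(reading - self.baseline_mean) / self.baseline_std
        new_state = "anomaly" if z > self.threshold else "normal"
        if new_state != self.state:   # transmit only on a state change
            self.state = new_state
            self.sent.append(new_state)


def make_node(healthy_samples):
    """Build a node whose baseline comes from known-healthy readings."""
    return AnomalyNode(mean(healthy_samples), pstdev(healthy_samples))


# Made-up readings: a healthy pattern, a short excursion, then healthy again.
healthy = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]
node = make_node(healthy)
stream = healthy * 100 + [3.0, 3.2] + healthy * 100
for x in stream:
    node.process(x)

print(len(stream), "readings processed, flags sent:", node.sent)
```

Sending a flag only on a state change is one possible policy; a periodic heartbeat is a common alternative when the upstream system needs to tell "normal" apart from "node offline".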
Anomaly detection is one of the most demanding edge AI applications from a hardware perspective. The model must be sensitive enough to catch genuine anomalies in a noise-heavy signal, compact enough to fit within the flash and RAM of the monitoring hardware, and energy-efficient enough to run continuously without exhausting a battery-powered node. Neural networks can achieve the first requirement with difficulty and at significant hardware cost. Logic-based models address all three simultaneously because the computational requirements are matched to embedded hardware from the outset.
The MLPerf Tiny anomaly detection benchmark, run on Arm Cortex-M4 hardware against an acoustic anomaly detection dataset, provides a standardised comparison point. The MLPerf Tiny benchmark results document how Logic-Based Network (LBN) inference compares with a fully connected neural network autoencoder on identical hardware.
Maintain before failure occurs.
Equipment failure is expensive — in downtime, repair cost, and sometimes safety implications. Traditional maintenance schedules are either too conservative, replacing parts that still have life remaining, or too reactive, waiting for failure before acting. Condition-based maintenance solves neither problem unless the condition monitoring is accurate enough to be trusted. The aim is to detect degradation when it is developing, not when it has already produced a fault.
Mechanical degradation produces consistent signatures in the data that surrounds the equipment. Bearing wear creates specific frequency components in vibration spectra: peaks at the ball pass frequency of the outer race (BPFO), the ball pass frequency of the inner race (BPFI), and the fundamental train frequency (FTF) emerge and grow in characteristic ways as degradation progresses. Insulation degradation in motor windings elevates current harmonics. Impeller imbalance in a pump produces vibration at multiples of the shaft frequency. These signals are detectable before the damage is visible, often weeks before failure.
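The bearing frequencies named above follow from bearing geometry and shaft speed. The sketch below uses the standard kinematic formulas for a bearing with a stationary outer race and no slip. The function name and the example geometry are illustrative assumptions, not values from any deployment described here.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Return (FTF, BPFO, BPFI) in Hz, assuming a stationary outer race.

    ball_d and pitch_d must share a unit (for example millimetres).
    """
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    ftf = 0.5 * shaft_hz * (1.0 - ratio)             # fundamental train (cage) frequency
    bpfo = 0.5 * n_balls * shaft_hz * (1.0 - ratio)  # ball pass frequency, outer race
    bpfi = 0.5 * n_balls * shaft_hz * (1.0 + ratio)  # ball pass frequency, inner race
    return ftf, bpfo, bpfi


# Assumed example: 9 balls of 7.94 mm on a 39.04 mm pitch circle, shaft at 30 Hz.
ftf, bpfo, bpfi = bearing_defect_frequencies(
    shaft_hz=30.0, n_balls=9, ball_d=7.94, pitch_d=39.04
)
print(f"FTF={ftf:.2f} Hz  BPFO={bpfo:.2f} Hz  BPFI={bpfi:.2f} Hz")
```

A useful sanity check on any implementation is that BPFO plus BPFI equals the number of rolling elements times the shaft frequency. In practice the measured peaks sit slightly off these values because of slip, so a monitoring model usually looks at a narrow band around each frequency.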
A model trained on labelled data from healthy and degrading equipment runs inference continuously on the asset itself — the motor, the pump, the compressor — using the sensor already measuring its condition. Degradation is flagged as it develops. The maintenance team receives a specific alert, not a raw data dump to interpret.
For pumps in water treatment infrastructure, detecting cavitation early prevents impeller damage that would otherwise require pump replacement. In a wastewater monitoring application, an LBN running on the pump controller's existing MCU detected cavitation and bearing degradation patterns at milliwatt power levels, with no cloud infrastructure in the critical path. The same monitoring system using cloud inference would have required reliable connectivity to remote pump stations — a dependency that could not be guaranteed across the network.
The energy efficiency of on-device inference matters acutely for wireless predictive maintenance sensors: battery life is the practical constraint that determines how densely an asset can be monitored and how frequently the monitoring system requires service. See the battery-powered AI section for the field deployment figures.
Listen to your equipment.
Machines communicate their condition acoustically. A bearing beginning to fail sounds different from a healthy one. A leaking valve sounds different from a closed one. A pump running under cavitation has a distinct acoustic signature. Industrial equipment in normal operation has a characteristic sound pattern; deviation from it carries information.
Acoustic monitoring has historically been difficult to deploy at scale because of what it implies for data handling. Recording and transmitting audio raises immediate concerns: bandwidth cost, privacy implications for audio from occupied spaces, and regulatory constraints in some industries. The bandwidth alone is substantial — audio sampled at even modest quality generates far more data per second than a vibration or temperature sensor.
The practical alternative is local acoustic classification. A model running on the MCU processes audio frames in real time, classifying the acoustic signature rather than transmitting the signal. What leaves the device is a classification result: "normal", "impact detected", "cavity resonance", "bearing anomaly". The audio data never leaves. The bandwidth consumed is negligible compared to raw audio streaming.
For this to be practical on embedded hardware, the model must fit within kilobytes of SRAM and execute within a power budget that sustains continuous operation. The preprocessing — computing mel-frequency cepstral coefficients or a frequency spectrum from the raw audio signal — runs in firmware before inference. The inference itself, running on the output of that preprocessing, is where the model's memory and compute requirements become binding constraints.
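As a rough illustration of the preprocessing stage, the sketch below turns one audio frame into a small vector of band energies. It is a simplification: it uses equal-width frequency bands rather than mel-frequency cepstral coefficients, and a naive DFT rather than the FFT that firmware would use. The sample rate, frame length, and band count are assumed values.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Hann-window a frame, take a DFT, and sum power into equal-width bands."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    half = n // 2                       # keep only the non-redundant bins
    power = []
    for k in range(half):               # naive O(n^2) DFT, acceptable for a sketch
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power.append(abs(acc) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


SAMPLE_RATE = 8000   # Hz, assumed
FRAME = 256          # samples per frame, assumed

# Synthetic test signal: a pure 1 kHz tone.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME)]
features = band_energies(tone, n_bands=8)
loudest_band = features.index(max(features))
print("feature vector length:", len(features), "loudest band:", loudest_band)
```

With these settings each band spans 500 Hz, so the 1 kHz tone falls in the band covering 1000 to 1500 Hz. The classifier then sees eight numbers per frame instead of 256 raw samples, which is where the memory and compute savings begin.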
Applications include machinery condition monitoring, acoustic leak detection in pressurised pipework, smart building occupancy sensing (where privacy concerns make audio transmission unacceptable), and wearable medical devices that monitor respiratory or cardiac sounds. Each requires a different trained model but the same architectural approach: preprocess locally, classify locally, transmit the result.
Anticipate demand. At any scale.
Demand forecasting is not traditionally an edge AI problem. The computation is substantial and the data is historical; cloud analytics platforms have been the standard approach. For high-frequency forecasting on large SKU counts, that standard approach has a latency and infrastructure cost that logic-based models challenge directly.
The supply chain forecasting deployment on record is illustrative. A UK food supply company managing 2,400 product lines needed demand forecasts at least eight times per day. Their existing GPU-based neural network could produce one complete forecast per day, with a weighted mean absolute percentage error (WMAPE) of around 25%. The business need (eight forecasts, not one) was unachievable with the existing architecture regardless of how the infrastructure was scaled.
An LBN-based forecasting model deployed on a single CPU produced a forecast per product line in 0.03 seconds. All 2,400 product lines were forecast in 40 seconds. The model runs continuously, 24 hours a day, with an 18-month forecast horizon and a weighted mean absolute percentage error of 6.42%: substantially more accurate than the neural network it replaced, on a fraction of the infrastructure.
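For readers unfamiliar with the accuracy metric, the sketch below shows one common definition of weighted mean absolute percentage error: total absolute error divided by total actual demand. The source does not state which weighting the deployment used, so treat this volume-weighted form as an assumption. The demand figures are made up.

```python
def wmape(actual, forecast):
    """Weighted mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must be the same length")
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when all actuals are zero")
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / total


# Made-up demand for three product lines (illustration only).
actual = [100, 50, 10]
forecast = [90, 55, 15]
print(f"WMAPE = {wmape(actual, forecast):.2f}%")  # prints "WMAPE = 12.50%"
```

Unlike a plain mean of per-line percentage errors, this form lets high-volume lines dominate the score, so a large miss on a slow-moving product does not swamp the result.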
The pattern here differs from IoT sensor applications: the hardware is a server CPU, not a microcontroller, and the data is tabular supply chain data rather than real-time sensor readings. The underlying property is the same: a logic-based model designed for CPU-native execution outperforms a neural network designed for GPU execution, when both are run on CPU hardware. The inference is faster, the accuracy is higher, and the infrastructure cost is a server rather than a GPU cluster.
For any high-frequency forecasting application where the volume of predictions makes cloud GPU inference either too slow or too expensive, the same architecture applies. The CPU versus GPU inference comparison covers this in more detail.
Know battery state in real time.
Battery degradation is gradual, non-linear, and highly variable between individual cells and usage patterns. Accurate state-of-health estimation matters in electric vehicles, industrial battery systems, grid storage, and consumer electronics. The models that produce this estimate must run on the battery management system (BMS) MCU, the same processor already managing charge and discharge, and must complete inference in microseconds while drawing power from the battery they are monitoring.
State-of-health estimation is a regression problem: the model predicts a continuous value (remaining capacity as a percentage of original) rather than a discrete class. It is trained on historical charge-discharge data, voltage and temperature profiles, and cycle count data from batteries of the same chemistry and form factor. The resulting model learns the degradation pattern specific to that battery type.
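To make the regression framing concrete, the sketch below defines state of health as a percentage of original capacity and fits a straight line to it. This is deliberately simplistic: real degradation is non-linear, a production model would use voltage and temperature features as well as cycle count, and the training data here is synthetic. The function names are hypothetical.

```python
def state_of_health(capacity_now_ah, capacity_rated_ah):
    """State of health: remaining capacity as a percentage of original capacity."""
    return 100.0 * capacity_now_ah / capacity_rated_ah


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic data: a 2.0 Ah cell assumed to lose 0.0004 Ah per cycle.
cycles = [0, 100, 200, 300, 400, 500]
capacity = [2.0 - 0.0004 * c for c in cycles]
soh = [state_of_health(c, 2.0) for c in capacity]

slope, intercept = fit_line(cycles, soh)
predicted = slope * 800 + intercept   # extrapolated state of health at 800 cycles
print(f"predicted state of health at 800 cycles: {predicted:.1f}%")
```

The output is a continuous percentage rather than a class label, which is the distinction the paragraph above draws. On a BMS the fitted parameters would be stored as constants and only the final multiply-and-add would run on the device.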
Running this model locally on the BMS MCU provides continuous real-time state-of-health estimates without transmitting cell-level electrical data to a cloud service. For consumer products where that data is sensitive, or industrial systems in environments without reliable connectivity, local inference is the only practical approach.
The memory constraint on BMS microcontrollers is acute: these are typically low-cost MCUs with limited flash, running firmware that already has demanding real-time scheduling requirements. Neural network inference on the same hardware would require either significant memory that is not available or aggressive quantisation that compromises accuracy. An LBN model for state-of-health estimation fits within the available flash budget without this trade-off.
Deterministic safety on the road.
Automotive control systems — ADAS, powertrain management, chassis dynamics, traction control — require inference latencies measured in milliseconds and output that is deterministic. A model that produces different results for the same input cannot be validated against automotive safety standards. Cloud dependency is not viable in a moving vehicle operating across variable connectivity environments.
Edge AI in automotive applications is not a new concept. The challenge has been fitting models within the SoC budget already allocated to automotive embedded control. Every neural network approach that demonstrated accuracy at the task level ran into the same constraint: it required a more capable and more expensive SoC than the one already in the vehicle's electronic architecture. The cost of adding a new SoC — not just the bill of materials cost, but the qualification, certification, and integration cost — often made the project commercially unviable.
An ADAS Tier 1 customer needed to replace a physical sensor with an AI model for vehicle dynamics classification. The target hardware was a PowerPC e200, a processor already present in their electronic architecture. Neural network approaches required a 200% increase in SoC cost — a figure that eliminated the component cost savings the project was designed to achieve. The system integrator's solution failed on the same hardware.
An R-LBN — a Recurrent Logic-Based Network — running on the existing PowerPC e200 completed inference in four microseconds, with a 4 KB memory footprint, 13.4 times faster than the neural network that failed to meet the requirement. The SoC remained unchanged. The qualification programme was unaffected. The automotive business case held.
Beyond performance, automotive safety standards including ISO 26262 have specific implications for AI systems in safety-related functions. Deterministic output — the same input always producing the same result — simplifies the validation process considerably. LBNs are deterministic by architecture.
Control processes at the source.
Industrial processes — chemical manufacturing, food production, semiconductor fabrication, pharmaceutical production — have operating parameters that must be maintained within tight tolerances. Deviations from those tolerances produce waste, defects, or safety events. Real-time monitoring and control requires inference that matches the timescale of process dynamics.
The challenge in process control is not the accuracy of AI classification. Supervised learning models for process state classification achieve high accuracy when trained on representative data. The challenge is getting inference into the process control layer: the PLCs, RTUs, and embedded controllers that actually actuate process equipment.
These systems are frequently not recent hardware. Industrial control infrastructure has long replacement cycles. A PLC installed in 2010 may be running a process that will not be replaced for another decade. Neural network inference frameworks were not designed for this environment and do not run on it. A model delivered as a portable C library with no external dependencies runs on it immediately.
The classification task varies by process: detecting temperature excursion in pharmaceutical manufacturing, classifying mixing quality in food production, identifying etch uniformity drift in semiconductor fabrication. In each case the structure is the same: continuous sensor readings, a model trained to classify operating states, on-device inference that flags deviations before they propagate through the process.
Scale intelligence, not data.
The challenge of large IoT sensor networks is not data collection. Sensors generate data at rates that are technically manageable. The challenge is that raw sensor data at scale is expensive to transmit, expensive to store, and requires a further processing layer before it produces actionable information. Inference at the node inverts this: instead of transmitting data to be understood, the node transmits understanding.
A network of 1,000 vibration sensors, each sampling at 1 kHz, generates 1 million readings per second of raw data. Transmitting and processing this at useful latency requires significant infrastructure. Inference at each node — classifying the vibration signature locally — reduces the upstream data to event flags. Each node transmits "normal", "bearing anomaly detected", or "cavitation" rather than a continuous numerical stream. The infrastructure required to handle event flags from 1,000 sensors is orders of magnitude smaller than the infrastructure required to process 1 million sensor readings per second.
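The arithmetic behind this comparison is easy to reproduce. The sensor count and sample rate below come from the example above; the bytes per sample, the flag size, and the flag rate are assumptions chosen for illustration, and protocol overhead is ignored.

```python
SENSORS = 1_000
SAMPLE_RATE_HZ = 1_000
BYTES_PER_SAMPLE = 2          # assumed: 16-bit readings
FLAG_BYTES = 1                # assumed: one-byte event code
FLAGS_PER_SENSOR_PER_S = 1    # assumed worst case: one status flag per second

readings_per_s = SENSORS * SAMPLE_RATE_HZ
raw_bytes_per_s = readings_per_s * BYTES_PER_SAMPLE
flag_bytes_per_s = SENSORS * FLAGS_PER_SENSOR_PER_S * FLAG_BYTES
reduction = raw_bytes_per_s / flag_bytes_per_s

print(f"raw stream:   {readings_per_s:,} readings/s = {raw_bytes_per_s:,} bytes/s")
print(f"event flags:  {flag_bytes_per_s:,} bytes/s")
print(f"reduction:    {reduction:,.0f}x")
```

Even with every node reporting once per second, the payload shrinks by a factor of 2,000 under these assumptions. A node that reports only on state changes would send far less.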
This pattern applies across factory floor monitoring, smart infrastructure (water networks, bridges, power distribution), precision agriculture, environmental sensing, and smart building systems. The underlying benefit is consistent: the network carries signal, not noise. The cloud receives classification results at low bandwidth. Each node operates autonomously, regardless of network availability, and transmits summaries when connectivity is present.
For the full technical picture on how on-device intelligence changes IoT network architecture, see AI for IoT.
From sensor data to deployed model.
All the applications above share one requirement: a trained AI model that fits within the memory, power, and compute constraints of real deployed hardware, is produced from operational data, and integrates into existing firmware without restructuring it.
ModelMill is the platform that handles this workflow. Upload labelled data from your application domain, define the target hardware and performance constraints, and ModelMill trains, benchmarks, and packages a Logic-Based Network for deployment. The output is a C-code SDK: trained model, inference engine, build configuration, and integration documentation.
The result runs on Arm Cortex-M, RISC-V, ESP32, or x86 hardware. Without a GPU. Without cloud inference costs. Without requiring an ML engineering team embedded in the firmware group.
Train your model