AI for IoT

The ambition of IoT was always intelligence at scale. Sensors distributed across a factory, a water network, a cold chain, each contributing to a picture that makes operations sharper. The sensors are there. The connectivity is there. The data is flowing. What has been missing is intelligence at the point of measurement, rather than somewhere downstream of it.

01 • Intelligence at the IoT edge

From data collector to decision-maker.

Most IoT architectures share a common assumption: the sensor gathers, the cloud decides. Raw data leaves the device over a wireless link, passes through a gateway, crosses a network, and arrives at a server where a model makes sense of it. The decision, if it triggers an action, must travel back.

This works tolerably well when the network is reliable, the latency is acceptable, the data volumes are manageable, and the cloud infrastructure costs make economic sense at the deployment scale. Strip away any one of those assumptions and the architecture begins to fracture. Strip away more than one and you have a system that is expensive, fragile, and unsuitable for the environments where IoT intelligence is most valuable.

The alternative is on-device inference. Not just local processing — local intelligence. The model runs on the same microcontroller already gathering the data. Decisions happen in microseconds, at the point of measurement, regardless of network state.

The implications run deeper than reduced latency. When inference moves to the node, the entire data architecture changes. Instead of streaming raw sensor readings upstream, each node transmits a classified result: anomaly detected, state change, threshold exceeded. The network carries signal, not data. A deployment of ten thousand sensors that previously generated gigabytes of daily upload generates a fraction of that in meaningful events.
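A rough back-of-envelope sketch makes the bandwidth argument concrete. The helper names and every figure used with them (sample rate, sample size, event size, events per day) are illustrative assumptions, not measurements from a deployment.

```c
#include <stdint.h>

/* Bytes uploaded per day by one node that streams every raw sample. */
uint64_t raw_bytes_per_day(uint32_t samples_per_sec, uint32_t bytes_per_sample)
{
    return (uint64_t)samples_per_sec * bytes_per_sample * 86400u;
}

/* Bytes uploaded per day by one node that sends only classified events. */
uint64_t event_bytes_per_day(uint32_t events_per_day, uint32_t bytes_per_event)
{
    return (uint64_t)events_per_day * bytes_per_event;
}
```

Under these assumptions, a node sampling once per second at 4 bytes per sample uploads 345,600 bytes a day, so ten thousand nodes upload roughly 3.5 GB. The same fleet sending twenty 16-byte events per node per day uploads about 3.2 MB.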

This matters for bandwidth economics, for battery life (transmitting data is expensive in energy terms), and for the reliability of the monitoring system itself. A system that depends on constant connectivity is only as reliable as the weakest link in the network chain. A system where inference runs locally keeps making decisions when that chain breaks; only the reporting of results has to wait.

The problems of cloud-based AI that on-device inference removes

No cloud round-trip

Inference runs on the sensor itself. Decisions happen locally, independently of network availability. Latency is microseconds, not the milliseconds of a cloud call.

No gateway bottleneck

Each node classifies its own data and transmits the result. The gateway aggregates events, not raw streams. Network infrastructure scales with the density of meaningful events, not with sensor sampling rates.

No data volume problem

Ten thousand sensors each making decisions locally send orders of magnitude less data than ten thousand sensors streaming raw readings. The compression happens at source, before it touches the network.

02 • What IoT hardware actually looks like

The MCU is the intelligence layer.

The typical IoT sensor node is a microcontroller. Arm Cortex-M series chips are by far the most common. ESP32 is standard in connected consumer products. RISC-V is gaining ground in industrial and cost-sensitive designs. These devices run at frequencies between 48 MHz and a few hundred megahertz, consume between one and a few hundred milliwatts depending on operating mode, and have RAM measured in kilobytes, not megabytes. They are capable, mature, and extraordinarily well-understood by the embedded engineering teams that work with them.

The question has never been whether these devices can run code. They run firmware. They handle sensor interfaces, wireless protocols, power management, and real-time control loops. The question has been whether they can run AI inference. For neural networks, the honest answer has been: not well, and sometimes not at all.

Memory

Industrial IoT nodes typically have between 32 KB and 512 KB of SRAM. A neural network, even after aggressive quantisation, often exceeds this. A model that works on a Raspberry Pi development board may be entirely impractical on the sensor node where the decision actually needs to happen.
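The memory constraint can be checked before any firmware is written. The sketch below estimates the footprint of a fully connected network with 8-bit weights; the function names, the 32-bit bias size, and the example layer widths are assumptions chosen for illustration, and real frameworks add their own overhead on top.

```c
#include <stddef.h>
#include <stdint.h>

/* Weight storage for a fully connected network with 8-bit weights.
 * layer_sizes[0] is the input width; each later entry is a layer's output width. */
size_t mlp_int8_weight_bytes(const uint16_t *layer_sizes, size_t n_layers)
{
    size_t total = 0;
    for (size_t i = 1; i < n_layers; i++) {
        size_t in = layer_sizes[i - 1];
        size_t out = layer_sizes[i];
        total += in * out;  /* one byte per weight */
        total += out * 4;   /* assumed 32-bit bias per output */
    }
    return total;
}

/* Peak activation memory: the largest input-plus-output pair, one byte per value. */
size_t mlp_int8_activation_bytes(const uint16_t *layer_sizes, size_t n_layers)
{
    size_t peak = 0;
    for (size_t i = 1; i < n_layers; i++) {
        size_t need = (size_t)layer_sizes[i - 1] + layer_sizes[i];
        if (need > peak)
            peak = need;
    }
    return peak;
}
```

For an assumed 1024-256-64-4 network, the weights and biases alone come to 280,080 bytes, which is several times the total memory of a 32 KB part even before activations and the rest of the firmware are counted.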

Compute

Neural networks, as conventionally trained and deployed, depend on floating-point matrix multiplication. On MCUs without a hardware FPU, this is emulated in software at significant cost in clock cycles and energy. The mismatch is not a matter of optimisation; it is architectural.

Power

A device running on a coin cell or energy harvester cannot sustain inference that assumes the power budget of a development board. Energy per inference determines whether a wireless sensor with a given battery runs for months or for years. See the battery-powered AI analysis for the full picture.
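The link between energy per inference and battery life is simple arithmetic. The sketch below is a first-order estimate only: the function name and all input figures are assumptions, and it ignores radio transmissions, battery self-discharge, and regulator losses.

```c
/* First-order battery life estimate in days.
 * capacity_mah : battery capacity in milliamp-hours
 * volts        : nominal battery voltage
 * sleep_uw     : average background power in microwatts
 * infer_uj     : energy per inference in microjoules
 * infer_per_s  : inferences per second
 */
double battery_life_days(double capacity_mah, double volts,
                         double sleep_uw, double infer_uj, double infer_per_s)
{
    double energy_j = capacity_mah / 1000.0 * 3600.0 * volts;  /* Ah -> joules */
    double avg_w = (sleep_uw + infer_uj * infer_per_s) * 1e-6; /* uW -> W */
    return energy_j / avg_w / 86400.0;
}
```

With an assumed 225 mAh, 3 V coin cell, 10 µW of background draw, and one inference per second, 5 µJ per inference gives about 1,875 days, while 260 µJ per inference gives about 104 days. The inference cost alone moves the result from years to months.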

Connectivity

Industrial environments — RF-contested factory floors, underground sewer networks, remote agricultural land — have intermittent wireless coverage at best. A monitoring system that requires cloud connectivity to make decisions is unreliable in precisely the environments where reliable monitoring matters most.

03 • The architecture problem

Why the model matters as much as the hardware.

Neural networks were designed for GPU hardware. Their core operation, floating-point matrix multiplication across many layers, is something GPUs handle with massive parallelism. On a microcontroller, the same computation runs on largely sequential hardware with little or no parallel execution capability. The result is slow, energy-intensive, and often impractical within the power and memory constraints of a real IoT node.

The industry's response has been TinyML: quantisation to reduce weight precision, pruning to remove low-magnitude connections, knowledge distillation to produce smaller models. These are genuine engineering achievements. They move the problem in the right direction. They do not solve it. A quantised neural network is still a neural network — it still requires multiply-accumulate operations, still has memory requirements that can exceed MCU limits, and still draws energy that battery-powered sensors cannot spare at continuous inference rates.
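A minimal sketch of symmetric 8-bit quantisation shows why the arithmetic shape does not change. The function names and the rounding and clamping choices are assumptions for illustration; production TinyML frameworks use more elaborate schemes with zero points and per-channel scales.

```c
#include <stddef.h>
#include <stdint.h>

/* Symmetric 8-bit quantisation: express x as a whole number of `scale` steps. */
int8_t quantize_i8(float x, float scale)
{
    float q = x / scale;
    q += (q >= 0.0f) ? 0.5f : -0.5f;  /* round half away from zero */
    if (q > 127.0f)
        q = 127.0f;
    if (q < -127.0f)
        q = -127.0f;
    return (int8_t)q;                 /* truncation completes the rounding */
}

/* One neuron's pre-activation: still one multiply-accumulate per weight. */
int32_t dot_i8(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)w[i] * (int32_t)x[i];
    return acc;
}
```

Quantisation shrinks each weight from four bytes to one and replaces floating-point multiplies with integer ones, but the inner loop still performs one multiply-accumulate per weight, per layer, per inference.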

There is a second problem that quantisation cannot address: explainability. When a sensor on an industrial pump flags an anomaly, the engineers responding need to know why — which signal changed, which condition was detected, what the model observed that triggered the alert. Neural networks do not provide this. Their outputs emerge from weighted sums across hundreds or thousands of parameters. Useful for accuracy; opaque for diagnosis.

For IoT applications in regulated environments — water utilities, pharmaceutical manufacturing, safety-critical infrastructure — the ability to trace an AI decision back to its inputs is not a convenience. It is a compliance requirement. An algorithm that cannot explain its decisions creates operational and regulatory risk regardless of its accuracy metrics.

Too large.
After quantisation and pruning, neural networks often still exceed the SRAM available on production MCU nodes.

Too power-hungry.
Continuous floating-point inference can drain batteries in weeks when the application requires years of unattended operation.

Too opaque.
When a sensor flags a fault, engineers need to understand what triggered it — not simply that the model's output exceeded a threshold.

04 • A different kind of algorithm

Logic instead of arithmetic.

Logic-Based Networks operate on propositional logic rather than floating-point arithmetic. Where a neural network multiplies weight matrices, an LBN evaluates AND, OR, and NOT conditions applied to binary inputs. These are operations that every digital processor executes natively — in a single instruction, without floating-point hardware, without a co-processor, on hardware that costs less than one dollar.
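To illustrate the kind of operation involved, the sketch below evaluates a generic propositional formula (an OR of AND-clauses, with NOT expressed as "bit must be clear") over bit-packed binary inputs. It is an assumption-laden illustration of logic-based inference in general: the type and function names are hypothetical, and it is not the actual LBN model format or inference engine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One AND-clause over up to 32 binary features.
 * must_be_1: features that must be set.
 * must_be_0: features that must be clear (the NOT terms). */
typedef struct {
    uint32_t must_be_1;
    uint32_t must_be_0;
} clause_t;

static bool clause_fires(const clause_t *c, uint32_t features)
{
    return ((features & c->must_be_1) == c->must_be_1) &&
           ((features & c->must_be_0) == 0u);
}

/* The classifier output is the OR of all clauses. */
bool classify(const clause_t *clauses, size_t n, uint32_t features)
{
    for (size_t i = 0; i < n; i++) {
        if (clause_fires(&clauses[i], features))
            return true;
    }
    return false;
}
```

Each clause costs two bitwise ANDs and two comparisons on a 32-bit word, with no multiplication and no floating point anywhere in the path.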

The practical consequence is an AI model that fits within the constraints of real IoT hardware rather than demanding hardware designed around it. The same Arm Cortex-M or ESP32 already gathering sensor data can also run inference, without modification, without additional silicon, and without compromising the power budget the product was designed around.

52×

less energy per inference

Architectural, not approximate

The energy reduction is not a product of compression or approximation. It comes from replacing floating-point matrix multiplication with bit-level logic operations. Those operations are among the cheapest computations in digital electronics. The gap is structural, not marginal. See battery-powered AI for field deployment figures.

54×

faster inference

Microseconds, not milliseconds

At sub-millisecond inference speeds, an IoT sensor can respond to an anomaly before the next sensor sample arrives. The detection loop closes at the device. No cloud call in the critical path, no latency accumulation across a network round-trip.

Kilobytes of RAM

typical MCU footprint

Fits where neural networks cannot

Production LBN models for common IoT inference tasks fit within a few kilobytes of flash and RAM. An Arm Cortex-M0 with 32 KB of flash is a viable deployment target. The same task would be impractical for even an aggressively pruned neural network on the same hardware.

The explainability of LBN outputs is structurally different from what post-hoc explanation tools can achieve for neural networks. Every inference follows a path through logical clauses that engineers can read, verify, and reason about. When the model flags an anomaly, the triggering conditions are visible — not inferred after the fact, but native to how the model was built. For IoT applications where engineers must act on AI alerts, this changes how teams work with the system.
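One way to picture this traceability is a clause table in which each clause carries a readable description, so the alert can report exactly which condition fired. The feature names, clause contents, and function names below are hypothetical examples, not output from a real model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical binary features produced by thresholding sensor readings. */
enum {
    F_VIBRATION_HIGH = 1u << 0,
    F_TEMP_RISING    = 1u << 1,
    F_MOTOR_RUNNING  = 1u << 2
};

typedef struct {
    uint32_t must_be_1;   /* features that must be set */
    uint32_t must_be_0;   /* features that must be clear */
    const char *meaning;  /* readable description of the condition */
} named_clause_t;

/* Returns the index of the first clause that fires, or -1 if none does. */
int first_firing_clause(const named_clause_t *clauses, size_t n, uint32_t features)
{
    for (size_t i = 0; i < n; i++) {
        if ((features & clauses[i].must_be_1) == clauses[i].must_be_1 &&
            (features & clauses[i].must_be_0) == 0u)
            return (int)i;
    }
    return -1;
}

/* Returns the description of the triggering condition for an alert message. */
const char *explain(const named_clause_t *clauses, size_t n, uint32_t features)
{
    int i = first_firing_clause(clauses, n, features);
    return (i >= 0) ? clauses[i].meaning : "no clause fired";
}
```

The explanation is read directly from the structure that produced the decision, rather than reconstructed afterwards by a separate attribution tool.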

Deployment is a self-contained C SDK: model, inference engine, build configuration, and integration documentation. No runtime dependencies. No dynamic memory allocation. No cloud calls. The SDK drops into standard embedded C projects with a function call — typically inside the sensor sampling loop already present in the firmware.
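The integration pattern might look like the sketch below. Every name here is hypothetical: `model_infer` is a stand-in stub (a simple peak check) written only so the example compiles, and the generated SDK's real entry point, signature, and window length will differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_LEN 64u  /* assumed window length */

/* HYPOTHETICAL stand-in for the generated inference function.
 * This stub flags any window containing a sample above 1000. */
static bool model_infer(const int16_t *window, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        if (window[i] > 1000)
            return true;
    }
    return false;
}

/* Statically allocated sample window: no heap, no runtime dependencies. */
static int16_t sample_window[WINDOW_LEN];
static uint32_t fill;

/* Call once per sensor sample from the existing sampling loop.
 * Returns true when a completed window is classified as anomalous. */
bool on_sample(int16_t sample)
{
    sample_window[fill++] = sample;
    if (fill < WINDOW_LEN)
        return false;
    fill = 0;
    return model_infer(sample_window, WINDOW_LEN);
}
```

The point of the pattern is that inference becomes one more call in a loop the firmware already runs, working on a fixed buffer whose size is known at build time.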

Because inference uses bitwise operations rather than floating-point arithmetic, there is no FPU requirement. This matters significantly for the lower tiers of the Cortex-M range — M0 and M0+ parts have no FPU, and software-emulated floating point on these devices is slow enough to make continuous neural network inference impractical at useful frequencies. LBNs run natively on all Cortex-M variants, from M0 through M7, on ESP32, on RISC-V, and on x86. The same firmware SDK works across the full hardware range without modification.

Explore AI on microcontrollers for the detailed hardware compatibility picture.

05 • Applications

Where on-device intelligence changes what is possible.

The applications below share a structural property: each requires inference at the node, not downstream of it. Either the latency of a cloud round-trip is unacceptable, or the connectivity cannot be relied upon, or the power budget rules out the radio energy cost of streaming data, or all three. In each case, moving the intelligence to the device is not a design preference — it is a requirement.

Rotating machinery

Vibration and acoustic sensors on motors, pumps, and compressors detect bearing degradation, misalignment, and cavitation in real time. Inference runs on the sensor itself. The maintenance team receives a classified alert. The raw audio or vibration stream never leaves the device. For a pump controller on a water treatment site with intermittent wireless coverage, the cloud is not a reliable inference path — but the MCU already in the controller is.

Distributed infrastructure

Water networks, gas pipelines, and electricity distribution systems are geographically dispersed and often poorly connected. On-device LBN inference removes the cloud dependency entirely. Each node classifies its own sensor data, logs anomaly events locally, and transmits summaries when connectivity is available. The monitoring system continues to function regardless of network state. See the wastewater field deployment described in the battery-powered AI section.
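The log-locally, transmit-later behaviour can be sketched as a fixed-size ring buffer. The capacity, the event layout, and the two `link_*` stubs standing in for a radio driver are assumptions for illustration; overwriting the oldest entry when full is one policy among several.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 8u  /* assumed; size it to the longest expected outage */

typedef struct {
    uint32_t timestamp;
    uint8_t code;        /* classified result, e.g. an anomaly type */
} event_t;

typedef struct {
    event_t slots[LOG_CAPACITY];
    size_t head;         /* index of the oldest stored event */
    size_t count;        /* number of stored events */
} event_log_t;

/* Stores an event; when the log is full, the oldest event is overwritten. */
void log_push(event_log_t *q, event_t e)
{
    size_t tail = (q->head + q->count) % LOG_CAPACITY;
    q->slots[tail] = e;
    if (q->count < LOG_CAPACITY)
        q->count++;
    else
        q->head = (q->head + 1) % LOG_CAPACITY;
}

/* Sends events oldest-first until the log is empty or a send fails.
 * Returns the number of events sent. */
size_t log_flush(event_log_t *q, bool (*send)(const event_t *))
{
    size_t sent = 0;
    while (q->count > 0 && send(&q->slots[q->head])) {
        q->head = (q->head + 1) % LOG_CAPACITY;
        q->count--;
        sent++;
    }
    return sent;
}

/* HYPOTHETICAL radio stubs: one models an outage, one a working link. */
bool link_down(const event_t *e) { (void)e; return false; }
bool link_up(const event_t *e) { (void)e; return true; }
```

During an outage the node keeps classifying and logging; when the link returns, a single flush drains the backlog in order.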

Environmental monitoring

Air quality, flood-level, and structural health sensors share the connectivity problem of other remote infrastructure, plus a false positive problem. A sensor that transmits raw readings generates noise at the gateway. A sensor that classifies locally and transmits confirmed events generates signal. The distinction matters enormously for monitoring systems that need to be operationally useful rather than theoretically comprehensive.
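One simple way a node might confirm an event before transmitting is to require several consecutive positive classifications. The sketch below assumes that rule; the names and the confirmation length are illustrative, and a real deployment may confirm events differently.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t streak;    /* consecutive positive classifications so far */
    uint8_t required;  /* assumed number of positives needed to confirm */
    bool reported;     /* true once the current episode has been sent */
} confirm_t;

/* Feed one classification result per sample.
 * Returns true exactly once per episode, when `required` positives arrive in a row. */
bool confirm_update(confirm_t *c, bool positive)
{
    if (!positive) {
        c->streak = 0;
        c->reported = false;
        return false;
    }
    if (c->streak < c->required)
        c->streak++;
    if (c->streak >= c->required && !c->reported) {
        c->reported = true;
        return true;
    }
    return false;
}
```

A single spurious reading never reaches the gateway, and a sustained condition produces one event rather than one message per sample.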

Manufacturing quality

Inline quality inspection requires decisions within the production cycle. A cloud round-trip measured in tens to hundreds of milliseconds may be longer than the time between parts on a production line. On-device inference at microsecond speeds closes that gap. The decision happens at the production point rather than some distance downstream of it.
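The timing constraint is easy to check with a small calculation. The function names, line rate, and latency figures below are illustrative assumptions rather than figures from a specific production line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microseconds between parts at a given line rate (parts per minute). */
uint32_t part_interval_us(uint32_t parts_per_minute)
{
    if (parts_per_minute == 0)
        return UINT32_MAX;  /* a stopped line imposes no deadline */
    return 60000000u / parts_per_minute;
}

/* True when the whole decision path finishes before the next part arrives. */
bool decision_fits(uint32_t parts_per_minute, uint32_t decision_latency_us)
{
    return decision_latency_us < part_interval_us(parts_per_minute);
}
```

At an assumed 600 parts per minute there are 100 ms between parts, so a 150 ms cloud round-trip misses the window while a 50 µs on-device decision fits with room to spare.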

Smart agriculture

Soil, moisture, and crop condition sensors in remote fields have no reliable connectivity and no power infrastructure. On-device classification enables autonomous actuation (irrigation control, ventilation, pest alerts) without depending on a network path that is unavailable. Long battery life is the operational baseline: sensors that last only months rather than years require frequent maintenance visits that are expensive and logistically difficult at field scale.

Smart buildings and facilities

HVAC, occupancy, and energy management sensors generate continuous data across large commercial buildings. Local inference reduces the bandwidth that a smart building system consumes, and keeps control systems responsive when gateway or cloud connectivity degrades. For a building management system that must continue operating during a cloud outage, on-device inference is a reliability design decision, not just an optimisation.

Train and deploy on your IoT hardware.

ModelMill trains a Logic-Based Network on your sensor data and generates a self-contained C SDK: model, inference engine, build configuration, and integration documentation. Your firmware team includes the header, calls the inference function, and the model runs on the device already in your product.

No new silicon. No cloud dependency. No ongoing inference cost.

Get started with ModelMill