As the saying goes, time is money. That adage is as true of artificial intelligence as of anything else: the longer training or inference takes, the more an AI model typically costs in energy and water consumption. This leads machine learning engineers to pursue a variety of techniques for making models faster and leaner. Key amongst them is data binarisation: simplifying training and inference input data by representing it with binary digits (0s and 1s).
But such simplification rarely comes without trade-offs.
Data binarisation is a data transformation technique that converts continuous numerical data into a binary format, typically represented as 0 or 1. This simplification allows AI and machine learning models to process and interpret information more efficiently. Because it operates on numbers, data binarisation is most commonly used to represent data that is already numerical, such as the values found in time series analysis and anomaly detection. Literal Labs' AI models extend this, however, allowing non-numerical data, including sounds and images, to be represented in binary before being passed to a Tsetlin machine for further processing.
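In its simplest form, thresholding, binarisation reduces each value to a single bit. The sketch below is purely illustrative (the readings and the 0.5 threshold are invented, and it is not drawn from Literal Labs' pipeline), but it captures the core transformation:

```python
import numpy as np

# Threshold binarisation: continuous readings become 1 where they
# exceed a chosen threshold, else 0. The threshold of 0.5 is an
# illustrative assumption, not a Literal Labs parameter.
readings = np.array([0.12, 0.87, 0.45, 0.93, 0.05, 0.61])
threshold = 0.5

binarised = (readings > threshold).astype(np.uint8)
print(binarised)  # [0 1 0 1 0 1]
```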
In the realm of machine health monitoring, binarisation can be highly effective when applied to audio anomaly detection, a use case reflected in successful benchmarking on the ToyADMOS dataset.
For example, audio sensors continuously monitor the sound emitted by a machine. By applying binarisation, an AI model can set a threshold for normal operational noise levels. Any sound pattern that deviates from this expected range—such as grinding, screeching, or sudden changes in frequency—can be classified as an 'anomaly' (1), while regular sound levels are marked as 'normal' (0). This binary distinction enables AI models to swiftly flag abnormal machine behaviour, allowing operators to intervene before catastrophic failure occurs.
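A minimal sketch of that logic might look like the following. The frame size and noise threshold here are invented for illustration, and a real deployment would tune both and may binarise richer features than loudness alone:

```python
import numpy as np

def binarise_audio_frames(signal: np.ndarray, frame_size: int,
                          noise_threshold: float) -> np.ndarray:
    """Label each audio frame 1 ('anomaly') if its RMS level exceeds
    the normal operating threshold, else 0 ('normal')."""
    n_frames = len(signal) // frame_size
    frames = signal[:n_frames * frame_size].reshape(n_frames, frame_size)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))  # per-frame loudness
    return (rms > noise_threshold).astype(np.uint8)

# Illustrative usage: a quiet machine with a brief loud grinding burst.
rng = np.random.default_rng(0)
quiet = rng.normal(0.0, 0.05, 8000)   # normal operational noise
burst = rng.normal(0.0, 0.40, 2000)   # anomalous loud interval
labels = binarise_audio_frames(np.concatenate([quiet, burst]),
                               frame_size=1000, noise_threshold=0.1)
print(labels)  # [0 0 0 0 0 0 0 0 1 1]
```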
Binarisation can, similarly, be applied to AI for predictive maintenance through vibration monitoring.
Machines with moving parts generate specific vibration patterns during normal operation. However, as components wear out or faults develop, these vibration signatures change. Using historical data, an AI model can learn an appropriate threshold for normal vibration and binarise new readings against it. For example, in the case of a motor, any vibration amplitude above a certain level (indicating excessive wear or imbalance) can be marked as 'abnormal' (1), while amplitudes within the acceptable range are classified as 'normal' (0). This binary labelling allows AI systems to automatically detect potential issues and schedule maintenance before more severe mechanical failure occurs, reducing costly downtime and repairs.
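As a hypothetical illustration, the threshold could be derived from historical healthy-machine data using a mean-plus-three-sigma rule. This is one common heuristic, not necessarily the method described above:

```python
import numpy as np

def fit_vibration_threshold(historical_amplitudes: np.ndarray,
                            n_sigmas: float = 3.0) -> float:
    """Derive a 'normal' vibration ceiling from historical readings of
    a healthy machine: mean amplitude plus n_sigmas standard deviations.
    (An assumed heuristic for illustration only.)"""
    return float(historical_amplitudes.mean()
                 + n_sigmas * historical_amplitudes.std())

def binarise_vibration(amplitudes: np.ndarray, threshold: float) -> np.ndarray:
    """1 = 'abnormal' (above threshold), 0 = 'normal'."""
    return (amplitudes > threshold).astype(np.uint8)

# Illustrative usage with synthetic motor data.
rng = np.random.default_rng(1)
healthy = rng.normal(1.0, 0.1, 5000)           # historical healthy readings
threshold = fit_vibration_threshold(healthy)   # roughly 1.3 for this data
new_readings = np.array([0.95, 1.05, 1.8, 1.1, 2.3])
print(binarise_vibration(new_readings, threshold))  # [0 0 1 0 1]
```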
Of course, data binarisation is not without limitations; if it were, every AI model would use it, reducing its environmental impact while increasing its speed.
In the machine learning and artificial intelligence space, accuracy is mission critical. And as we noted above, binarisation's simplification comes with trade-offs, chief among them the potential for reduced accuracy. So how does Literal Labs use the technique?
Our novel approach to AI models has shown that data binarisation need have no significant impact on a model's accuracy when underpinned by a series of other technologies. In fact, our benchmarking has shown our models to have an average accuracy variation of ±2%. Put another way: while utilising binarisation, our pipeline is often able to build models that aren't just faster and more energy efficient than their neural network counterparts, but more accurate as well.