The introduction of Large Language Models (LLMs) opens up new possibilities, such as interaction with machines via voice assistants.
Using this example, we looked at three typical challenges:
1. Challenge: Resource requirements
A key question when using LLMs on edge devices is how much computing power is actually required. Two factors play a role here: the model size and the so-called context window, i.e. the amount of text that a model can take into account when generating output. These factors are decisive for the choice of suitable edge hardware. We tested this on two systems: an experimental system based on the NVIDIA "Jetson Orin Nano" and an industrial edge computer from WAGO that could plausibly be used in real-world deployments.
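To make the influence of model size and context window concrete, the following rough estimate may help. It is a minimal sketch, not a measurement from our tests: the function names are hypothetical, and the architecture figures (32 layers, hidden size 4096, 4096-token context, standard multi-head attention) are assumptions for a 7-billion-parameter Llama-2-style model.

```python
def weight_memory_bytes(n_params: int, bits_per_weight: int) -> int:
    """Memory needed to hold the model weights alone."""
    return n_params * bits_per_weight // 8


def kv_cache_bytes(n_layers: int, hidden_size: int, context_tokens: int,
                   bytes_per_value: int = 2) -> int:
    """Key/value cache for one sequence: two tensors (K and V) per layer."""
    return 2 * n_layers * hidden_size * context_tokens * bytes_per_value


GIB = 1024 ** 3

# Assumed figures for a 7-billion-parameter Llama-2-style model:
# 32 layers, hidden size 4096, 4096-token context window.
N_PARAMS = 7_000_000_000
weights_fp16 = weight_memory_bytes(N_PARAMS, 16)
weights_int4 = weight_memory_bytes(N_PARAMS, 4)
cache_full = kv_cache_bytes(n_layers=32, hidden_size=4096, context_tokens=4096)

print(f"weights, 16-bit: {weights_fp16 / GIB:.1f} GiB")
print(f"weights, 4-bit:  {weights_int4 / GIB:.1f} GiB")
print(f"KV cache, 4096 tokens, 16-bit: {cache_full / GIB:.1f} GiB")
```

The estimate ignores activations and runtime overhead, so real memory use will be higher. It does show that both the weights and a fully used context window have to fit into the device's memory.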
We measured the performance of the models as the number of tokens processed per unit of time, e.g. tokens per second. A token is the basic unit of text that an LLM processes and represents an individual piece of a text, such as a word or part of a word. When comparing the two devices using Meta's Llama 2 model with 7 billion parameters, it quickly became clear that a GPU can provide a significant performance boost. For industrial automation, this points to a need for devices with a passively cooled, industrial-grade GPU.
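A throughput metric of this kind could be measured roughly as follows. This is only a sketch of the idea, not the benchmark we used: `tokens_per_second` and `fake_generate` are hypothetical names, and the fake generator merely stands in for a real model that streams its tokens.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(generate: Callable[[str], Iterable[str]], prompt: str) -> float:
    """Run one generation and return generated tokens per wall-clock second."""
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate(prompt))
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def fake_generate(prompt: str) -> Iterable[str]:
    """Hypothetical stand-in for a real model: yields one token per word."""
    for word in prompt.split():
        time.sleep(0.001)  # pretend each token takes some time to compute
        yield word


rate = tokens_per_second(fake_generate, "switch on the conveyor belt please")
print(f"{rate:.0f} tokens/s")
```

For a fair comparison between two devices, the same model, prompt and generation settings would have to be used on both, ideally averaged over several runs.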
2. Challenge: Software architecture
Running LLMs on less powerful devices brings software challenges: handling Python code, managing dependencies, resolving version conflicts and coping with unpredictable library lifecycles. Industrial environments require sustainable, long-lived solutions, which conflicts with the rapid pace of development in the AI industry.
A microservice architecture has proven to be an effective solution for us. It enables independent implementation and deployment using container technology. It also simplifies the handling of model selection and dependencies.
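As an illustration, a single model service could look roughly like the sketch below. It uses only the Python standard library; the JSON fields `prompt` and `text`, the port, and the `run_model` placeholder are assumptions made for this example and do not describe our actual implementation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def run_model(prompt: str) -> str:
    """Hypothetical placeholder; a real service would call its model here."""
    return f"echo: {prompt}"


def handle_payload(raw: bytes) -> Tuple[int, bytes]:
    """Turn a JSON request body into an HTTP status code and a JSON response body."""
    try:
        prompt = json.loads(raw)["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with a 'prompt' field"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"text": run_model(str(prompt))}).encode()


class GenerateHandler(BaseHTTPRequestHandler):
    """Answers POST requests by passing the body to handle_payload."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the service; the port is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), GenerateHandler).serve_forever()
```

Calling `serve()` from a container's entry point would start the service. Because the model and its Python dependencies live entirely inside that container, swapping the model means replacing one image while the rest of the system only depends on the HTTP interface.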
3. Challenge: Selecting suitable models
A microservice architecture allows us to quickly integrate and test new models. Quantisation is particularly interesting for devices with limited resources: reducing the data type of the model weights (e.g. from 16-bit float to 4-bit integer) yields significant savings. However, this usually comes at the cost of lower accuracy when the models are executed. Another important factor is the number of parameters. In our tests, models with 3 to 7 billion parameters proved useful.
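The trade-off between savings and accuracy can be illustrated with a toy example. The sketch below applies a simple symmetric 4-bit scheme with a single scale factor to a handful of made-up weights; it is an assumption-laden illustration, and the quantisation methods used in practice are typically more elaborate, for example with separate scales per group of weights.

```python
from typing import List, Tuple


def quantise_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric 4-bit quantisation: map floats to integers in [-7, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantise(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Made-up example weights, not taken from a real model.
weights = [0.70, -0.32, 0.12, 0.03, -0.61]
codes, scale = quantise_int4(weights)
restored = dequantise(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:    ", codes)
print("restored: ", [round(r, 2) for r in restored])
print("max error:", round(max_error, 3))
```

Each weight now needs only 4 bits instead of 16, but the restored values differ from the originals by up to half a quantisation step. That rounding error is the source of the accuracy loss mentioned above.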
Overall, Edge AI and LLMs offer a promising combination for the future by opening up new fields of application. The associated challenges can be overcome by using suitable hardware, a well thought-out software architecture and careful model selection.