Exciting new breakthroughs show off the superior AI capabilities of the Arm CPU.
Generative AI now on Mobile, Powered by Arm.
The cutting edge of generative AI has reached mobile, including today's popular, widely publicized large language models (LLMs).
This means that generative AI inference, which can create images and video and interpret language in context, is beginning to be processed entirely on the mobile device rather than being sent to the Cloud and back.
For generative AI on mobile, the latest AI-enabled flagship smartphones and the direct processing of LLMs on the Arm CPU are just two of the exciting new breakthroughs showing that Arm is the foundational technology that makes AI run everywhere.
AI-enabled flagship smartphones today
High-performance, AI-capable smartphones built on Arm's Armv9 CPU and GPU technology are available for purchase today. These include the Samsung Galaxy S24, the Google Pixel 8, and the new Vivo X100 and X100 Pro smartphones, which are powered by the MediaTek Dimensity 9300.
The performance and efficiency of these flagship mobile devices are creating unprecedented opportunities for AI innovation. In fact, over the past decade, improvements to Arm's own CPUs and GPUs have doubled AI processing capabilities every two years.
With additional AI performance, technologies, and features on our extensive consumer technology roadmap, this trend will only continue.
This will be helped by the rise of AI inference at the edge, the process of using a trained model, such as an LLM, to power AI-based applications. With more specialized instructions and AI support being added, CPUs are well suited to meet this need.
Everything begins with the CPU.
The CPU is typically where artificial intelligence (AI) processing begins on our favorite mobile devices. Notable examples include face, hand, and body tracking, advanced camera effects and filters, and segmentation across various social media apps. Such AI workloads are either handled entirely by the CPU or supported by accelerators such as GPUs or NPUs. Since our CPU designs are widely used in the SoCs of today's smartphones, which are used by billions of people globally, Arm technology is essential to enabling these AI workloads.
As a result, 70 percent of the AI in today's third-party applications, such as the newest social, health, and camera-based apps, among many others, runs on Arm CPUs. Beyond the designs' widespread use, the flexibility and AI capabilities of the Arm CPU make it the best technology for mobile developers to target for their applications' AI workloads.
Arm CPUs are highly flexible and can run many different types of neural networks in a wide range of data formats. Future Arm CPUs will incorporate further AI capabilities into the instruction set architecture, such as the Scalable Matrix Extension (SME) for the Armv9-A architecture, to the further benefit of Arm's industry-leading ecosystem.
These enable developers around the globe to provide their AI-based applications with enhanced performance, cutting-edge features, and scalability.
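To make that concrete, here is a minimal sketch (not from Arm's blog) of what "AI instructions in the ISA" look like in practice: the Armv8.2-A dot-product instruction SDOT, exposed through the vdotq_s32 intrinsic, accelerates the int8 multiply-accumulate at the heart of many neural network layers, and SME extends this idea to full matrix operations. The function name dot_s8 and the sample data are illustrative only.

```c
/* Minimal sketch: int8 dot product using the Armv8.2-A SDOT instruction
 * via the vdotq_s32 intrinsic. Build on an Arm64 toolchain with, e.g.:
 *   gcc -O2 -march=armv8.2-a+dotprod dot.c -o dot
 */
#include <arm_neon.h>
#include <stdio.h>

/* Dot product of two int8 vectors of length n (n a multiple of 16 for brevity).
 * Each SDOT instruction performs 16 int8 multiply-accumulates into 4 int32 lanes. */
static int32_t dot_s8(const int8_t *a, const int8_t *b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb);   /* 16 int8 MACs in one instruction */
    }
    return vaddvq_s32(acc);             /* horizontal sum of the 4 lanes */
}

int main(void) {
    int8_t a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = (int8_t)i; b[i] = 2; }
    printf("dot = %d\n", dot_s8(a, b, 16));  /* expected: 2 * (0+1+...+15) = 240 */
    return 0;
}
```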
With the help of the new Arm Kleidi Libraries, which will be integrated directly into AI frameworks, developers will gain seamless access to the exceptional AI capabilities of the Arm CPU, enabling them to create applications quickly and efficiently.
Supported by an ecosystem of leading hardware and software providers, Arm offers a powerful compute platform that is enabling the emergence of generative AI at the edge. This could include improvements in gaming, image enhancement, language translation, text generation, and virtual assistants.
LLMs on the Arm compute platform for mobile
For Mobile World Congress (MWC) 2024, we created a virtual assistant demo that ran Meta's Llama2-7B LLM on mobile devices through a chat-based application. However, new models are being released all the time, and we are working hard to improve LLM performance on Arm.
When the latest Llama3 model from Meta and Phi-3 3.8B model from Microsoft came out, we worked quickly to run them on Arm CPUs on mobile. These new AI models are far more capable and can respond to a wider range of questions.
Our latest demo utilizes Microsoft’s Phi-3 3.8B model on mobile through ‘Ada’, a chatbot specifically trained to be a virtual teaching assistant for science and coding.
The generative AI workloads take place entirely at the edge on the mobile device on the Arm CPUs, with no involvement from accelerators. The impressive performance is enabled through a combination of existing CPU instructions for AI, alongside dedicated software optimizations for LLMs through the ubiquitous Arm compute platform that includes the Arm AI software libraries.
As the demo video shows, the time-to-first-token response is very impressive, and the text generation rate of just under 15 tokens per second is faster than the average human reading speed.
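For readers curious how those two metrics are defined, the sketch below (purely illustrative, with a hypothetical generate_next_token() stub standing in for a real model's decode step) shows how time-to-first-token and tokens per second are typically measured around a generation loop.

```c
/* Illustrative sketch only: measuring time-to-first-token and tokens/second
 * around a token-generation loop. generate_next_token() is a hypothetical
 * stand-in for a real LLM decode call, stubbed here so the file compiles. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int generate_next_token(void) { usleep(70000); return 42; }  /* ~70 ms/token stub */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    const int n_tokens = 32;
    double start = now_sec(), first = 0.0;

    for (int i = 0; i < n_tokens; i++) {
        generate_next_token();
        if (i == 0) first = now_sec();   /* timestamp of the first token */
    }
    double end = now_sec();

    printf("time to first token: %.2f s\n", first - start);
    printf("generation rate:     %.1f tokens/s\n", (n_tokens - 1) / (end - first));
    return 0;
}
```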
This is made possible by highly optimized CPU routines in the software library developed by the Arm engineering team, which significantly improve time-to-first-token and text generation compared to the native implementations of the models.
The Arm CPU provides the AI developer community with opportunities to experiment with their own techniques to provide further software optimizations that make LLMs smaller, more efficient and faster.
Enabling more efficient, smaller LLMs means more AI processing can take place at the edge. The user benefits from quicker, more responsive AI-based experiences, as well as greater privacy through user data being processed locally on the mobile device. Meanwhile, for the mobile ecosystem, there are lower costs and greater scalability options to enable AI deployment across billions of mobile devices.
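As an illustration of the kind of optimization involved, the sketch below shows plain symmetric int8 weight quantization, one common way to shrink a model's memory footprint. It is not Arm's specific implementation, and production mobile LLMs typically use more aggressive 4-bit, block-wise schemes; it simply demonstrates the principle of trading a little precision for roughly 4x smaller weights.

```c
/* Minimal sketch: symmetric int8 weight quantization. Each float32 weight
 * (4 bytes) is mapped to an int8 (1 byte) plus one shared scale per tensor,
 * roughly a 4x reduction in weight storage. Build with: gcc quant.c -lm */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Quantize n float weights into int8; returns the scale needed to dequantize. */
static float quantize_s8(const float *w, int8_t *q, int n) {
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++) max_abs = fmaxf(max_abs, fabsf(w[i]));
    float scale = max_abs / 127.0f;            /* one shared scale for the tensor */
    for (int i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(w[i] / scale);   /* round to nearest int8 */
    return scale;                              /* dequantize: w ~= q * scale */
}

int main(void) {
    float w[8] = {0.12f, -0.50f, 0.33f, 0.01f, -0.27f, 0.44f, -0.08f, 0.19f};
    int8_t q[8];
    float scale = quantize_s8(w, q, 8);
    for (int i = 0; i < 8; i++)
        printf("w=%+.2f  q=%+4d  back=%+.3f\n", w[i], q[i], q[i] * scale);
    printf("storage: %zu bytes -> %zu bytes (plus one scale)\n", sizeof(w), sizeof(q));
    return 0;
}
```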
We are also excited to see the open-source developer community working with these models on Arm: developers had the new Llama3 and Phi-3 3.8B models up and running on Arm in around 48 hours. We look forward to seeing more open-source engagement with generative AI on Arm.
Find out more about the previous Llama2-7B demo and the current Phi-3 3.8B demo from the Arm engineers who developed them in this technical blog.
Driving generative AI on mobile
As the most ubiquitous mobile compute platform and leader in efficient compute, Arm has a responsibility to enable the most efficient and highest-performing generative AI at the edge. We are already demonstrating the impressive performance of LLMs that are running entirely on our leading CPU technologies. However, this is just the start.
Through a combination of smaller, more efficient LLMs, improved performance on mobile devices built on Arm CPUs and innovative software optimizations from our industry-leading ecosystem, generative AI on mobile will continue to proliferate.
Arm is foundational to AI and we will enable AI everywhere, for every developer, with the Arm CPU at the heart of future generative AI innovation on mobile.