AI chips: artificial intelligence “at the edge”

Dedicated AI chips take machine learning from the cloud directly on to devices. Even newcomers, such as automobile manufacturers and online mail order companies, are developing chips for deep learning.

Whether it's Industry 4.0, the Internet of Things (IoT), or driverless vehicles: artificial intelligence (AI) can be found in almost every future technology. The aim is to imitate the functions of the human brain in the form of artificial neural networks. Standard CPUs are not suitable for this, because neural networks demand both too little and too much of these all-purpose processors at the same time: they use only a few of the many CPU functions, but they invoke them in extremely fast succession and therefore require enormous processing capacity. Graphics processing units (GPUs), which execute certain processing operations very quickly and in parallel, are better suited to this workload. As a result, GPU producer Nvidia has become the leading hardware supplier for AI solutions. But even GPU architectures are not specifically designed for deep learning.
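
To see why this kind of hardware fits neural networks so well, consider that a single fully connected layer boils down to one large matrix multiplication whose multiply-accumulate operations are all independent of one another. The following minimal NumPy sketch, with arbitrary example sizes, is meant only to show the shape of the workload, not any particular chip's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.standard_normal((64, 1024))      # 64 input vectors with 1024 features each
weights = rng.standard_normal((1024, 4096))  # fully connected layer with 4096 neurons
bias = np.zeros(4096)

# One layer = ~64 * 1024 * 4096 independent multiply-accumulate operations --
# uniform, massively parallel work that maps well onto GPUs and AI accelerators.
activations = np.maximum(batch @ weights + bias, 0.0)  # ReLU nonlinearity
print(activations.shape)                               # (64, 4096)
```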

Special chips for deep learning

This is why many producers are developing dedicated chips for artificial intelligence. These specialized chips provide only the functions that are actually needed, but execute them very efficiently. With their help, deep learning will give systems capabilities that humans could not program at all, or not at reasonable expense. To achieve this, the systems are trained with huge volumes of data, such as images, text, and speech. They receive feedback on their recognition performance and, over time, become better at filtering out what is important for solving a task.
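
The following is a minimal sketch of that train-with-feedback loop, using synthetic data and a simple linear model purely for illustration; real deep-learning training follows the same pattern at vastly larger scale.

```python
import numpy as np

# Minimal sketch of training with feedback: the model sees labelled examples,
# its error ("feedback") is measured, and the weights are nudged to reduce it.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(200)   # noisy targets

w = np.zeros(3)                                    # untrained model
for step in range(500):
    pred = X @ w
    error = pred - y                               # feedback signal
    grad = X.T @ error / len(y)
    w -= 0.1 * grad                                # learn from the feedback
print(w)                                           # approaches true_w as the error shrinks
```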

The first generation of AI processors, which includes Google's first Tensor Processing Unit (TPU) and Nvidia's Kepler GPUs, had functional units that worked in parallel, but the theoretical peak performance was rarely reached in practice: the memory connection was the bottleneck. The second chip generation – for example, Google's TPU v2, Nvidia's Volta GPUs, and Microsoft's Brainwave chip – therefore added large, fast memory. The third generation is made up of processors built around long short-term memory (LSTM) networks, including a forgetting component.
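
For reference, the "forgetting component" mentioned above corresponds to the forget gate of a standard LSTM cell. The following sketch shows the textbook equations in plain Python with placeholder weights; it illustrates the mechanism, not any vendor's silicon.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One textbook LSTM step. The forget gate f decides how much of the old
    cell state c_prev is kept; W, U, and b are placeholder weights."""
    z = W @ x + U @ h_prev + b          # pre-activations for all four gates
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)     # "forgetting component": f scales the old memory
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random placeholder weights.
H, D = 8, 4
rng = np.random.default_rng(2)
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)                 # (8,) (8,)
```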

Neural networks can be compressed to a tenth of their original size with no increase in their error rate. This principle of thinning out nodes and connections is modeled on nature: the human brain has only around 50 trillion synapses at birth, a number that grows to 1,000 trillion during the first year of life and later falls back to around 500 trillion with age, without any loss of intelligence. In the coming generation of AI accelerators, AI will create better AI: hardware is being developed specifically for particular neural networks. Laboratory tests are being carried out, for example, with the Squeezelerator for SqueezeNext, DeePhi's DPU v2 for depthwise convolution, and the ShiftNet Accelerator for ShiftNet.
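
The "thinning out" described above corresponds to what is usually called pruning. A simple magnitude-pruning sketch that keeps roughly a tenth of the connections might look like the following; it illustrates the principle, not the method used by the accelerators named above.

```python
import numpy as np

def magnitude_prune(weights, keep_ratio=0.1):
    """Keep only the largest-magnitude weights (here ~10%) and zero the rest --
    a simple stand-in for thinning out connections in a trained network."""
    flat = np.abs(weights).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]       # k-th largest magnitude
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(3)
W = rng.standard_normal((256, 256))
W_pruned, mask = magnitude_prune(W, keep_ratio=0.1)
print(mask.mean())                               # fraction of connections kept, ~0.1
```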

Neuromorphic Computing

Intel has developed the Nervana Neural Network Processor (NNP), which is intended to make AI calculations up to 100 times faster by 2020. With its Loihi chip, Intel is betting on neuromorphic computing: thanks to its network of artificial neurons and synapses, the chip is designed to learn on its own without explicit training, combining training and inference. With TrueNorth, IBM departs completely from conventional computer architecture: the chip imitates a neural network directly in hardware.
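
Neuromorphic chips such as Loihi and TrueNorth are built around spiking neurons rather than conventional arithmetic units. The following sketch of a simple leaky integrate-and-fire neuron, with arbitrary parameter values, illustrates the basic idea; it does not reflect Intel's or IBM's actual implementations.

```python
import numpy as np

def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire: the membrane potential integrates input,
    leaks over time, and emits a spike (1) when it crosses the threshold.
    Parameter values are arbitrary illustrations."""
    v, spikes = 0.0, []
    for i in input_current:
        v = leak * v + i              # integrate the input with leakage
        if v >= threshold:
            spikes.append(1)
            v = 0.0                   # reset the potential after a spike
        else:
            spikes.append(0)
    return spikes

rng = np.random.default_rng(4)
print(lif_neuron(rng.uniform(0.0, 0.5, size=20)))
```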

GPU producer Nvidia is targeting autonomous robots and is developing the AI system-on-a-chip (SoC) Jetson Xavier, with a Volta GPU with Tensor Cores, an 8-core ARM64 CPU, two Nvidia Deep Learning Accelerators (NVDLA), and dedicated image and video processors. Jetson Xavier has nine billion transistors and delivers 30 trillion operations per second – at just 30 watts. Processor IP developer ARM is integrating the NVDLA deep learning architecture into the AI processors of its Trillium platform, which will support machine learning for mobile, embedded, and IoT applications.

MIT has combined computing and data storage on a single chip. Its new AI processor is said to consume up to 95 percent less energy and to compute up to seven times faster than previous AI chips.

Cognitive Edge Computing: AI shift from the cloud to devices

In order to learn, neural networks need enormous volumes of data, and AI-supported applications need high processing power. The necessary computing operations therefore take place mainly in cloud data centers and are available only with an Internet connection. As a consequence, limited network bandwidth, latency, and data protection requirements rule out many AI applications from the outset. But things are about to change.

The next development phase involves shifting artificial intelligence from the cloud to the edge of the network, directly into devices. Machine learning is increasingly used in robots, tablets, smartphones, cameras, IoT devices such as sensors, medical equipment, and other autonomous edge devices. In this way, much of the data can be processed directly on site in real time, which also saves resources. Generally, training (acquiring knowledge) still takes place in the cloud, while inferencing (drawing conclusions from that knowledge, the actually intelligent part of the process) moves to the edge.
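
One simple way to picture this split: training produces floating-point weights in the cloud, and a compact, quantized copy is shipped to the device for inference. The sketch below uses a naive 8-bit quantization scheme purely as an illustration, not any specific vendor toolchain.

```python
import numpy as np

def quantize_int8(w):
    """Naive symmetric 8-bit quantization: store int8 weights plus one scale."""
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8), scale

def edge_inference(x, w_int8, scale):
    # On-device: the small integer weights are rescaled on the fly;
    # no cloud round trip is needed for the prediction itself.
    return x @ (w_int8.astype(np.float32) * scale)

rng = np.random.default_rng(5)
w_cloud = rng.standard_normal((1024, 10)).astype(np.float32)  # "trained" in the cloud
w_int8, scale = quantize_int8(w_cloud)                        # ~4x smaller to ship

x = rng.standard_normal((1, 1024)).astype(np.float32)
print(np.max(np.abs(edge_inference(x, w_int8, scale) - x @ w_cloud)))  # small error
```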

AI processors for smartphones

The Kirin 970 from Huawei is one of the first smartphone processors with a special computing unit for artificial intelligence – the neural processing unit (NPU) – in addition to the usual main and graphics processors. The 8-core SoC, based on the big.LITTLE principle, is used in Huawei's Mate 10. The NPU is said to carry out computing-intensive tasks, such as image and speech recognition, up to 20 times faster than conventional CPUs.

The smartphone camera uses this processing power to recognize people, animals, and flowers and to adjust the picture settings accordingly. Apple has also given its latest iPhones an AI unit. It is used for Face ID and, after a learning phase, reliably recognizes the user's face, even with a beard or glasses. In the Pixel 2 and Pixel 2 XL from Google, the Pixel Visual Core chip speeds up photography in the camera's HDR+ mode by a factor of five.

But machine learning at the edge is also subject to restrictions in terms of performance, memory, and power consumption. Current devices still have to rely on the cloud for many tasks.

Automobile manufacturers: in-house production of AI chips for autonomous driving

For autonomous driving, it is also important that there is enough processing power in the vehicle to recognize objects and interpret the surroundings. The decision whether to brake when a child runs on to the road, for example, cannot be outsourced to the cloud.
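
A rough back-of-the-envelope calculation makes the point. The latency figures below are assumptions chosen for illustration, not measurements.

```python
# Illustrative assumptions: distance a car covers while waiting for a decision.
speed_kmh = 50
speed_ms = speed_kmh / 3.6             # ~13.9 m/s

cloud_round_trip_s = 0.200             # assumed network + data-center latency
onboard_inference_s = 0.020            # assumed latency of an in-vehicle AI chip

print(speed_ms * cloud_round_trip_s)   # ~2.8 m travelled waiting for the cloud
print(speed_ms * onboard_inference_s)  # ~0.3 m with on-board inference
```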

Recently, Tesla boss Elon Musk – a fan of vertical integration – announced that Tesla will in future produce the chips for its cars' self-driving functions itself. Processor development is led by former AMD chip architect Jim Keller. Tesla's in-house AI chips are to form the backbone for fully autonomous Level 5 driving and replace the Nvidia GPUs that the vehicles currently use.

The Tesla chips are planned for installation from 2019 and are to be backward-compatible with existing vehicles. For its self-driving functions, the carmaker relies on cameras rather than LIDAR scanners. The new chips are expected to process 2,000 camera images per second – ten times more than the current Nvidia technology.

The Google subsidiary Waymo, in collaboration with Intel, is taking a similar approach with its self-driving test vehicles. General Motors is also working on AI processors and plans to operate autonomous robot taxis from 2019. Daimler and Bosch are developing a system for fully automated driving in SAE levels 4 and 5. The algorithms for moving the vehicles are developed using machine learning and will run on a system with AI processors and operating software from Nvidia based on the Drive Pegasus platform.

The system will be able to perform hundreds of trillions of operations per second in real time. Daimler plans to test its autonomous shuttle fleet in Silicon Valley in 2019. Series production is scheduled for the start of the coming decade.

China is investing in AI chips

China wants to be at the forefront of AI chip development and has therefore invested enormous sums in recent years to expand its chip industry. Alibaba is a newcomer to AI chips: the Chinese e-commerce conglomerate has announced a quantum computer and plans to bring an in-house developed AI inference chip for machine learning, called Ali-NPU, on to the market in the second half of 2019. It is to be used in autonomous vehicles, smart cities, and logistics.

electronica 2018

Learn more about the latest embedded processors for machine learning and adaptive systems at the Embedded Platforms Conference.

Image: Huawei Kirin 970