Alibaba DAMO Academy has developed the world's first compute-in-memory AI chip

Recently, Alibaba DAMO Academy successfully developed a chip with a new architecture: the world's first DRAM-based, 3D-bonded, stacked compute-in-memory AI chip. It breaks through the performance bottleneck of the von Neumann architecture and meets the demands of artificial intelligence and other workloads for high-bandwidth, high-capacity memory and extreme computing power. In certain AI scenarios, the chip's performance improves by more than 10x, and its energy efficiency by up to 300x.

A so-called "compute-in-memory AI chip" transforms the traditional compute-centric architecture into a data-centric one: data are processed directly in memory, fusing storage and computation on the same chip. This improves computational parallelism and energy efficiency, especially for deep learning neural networks in scenarios such as wearable devices, mobile devices, and smart homes.

The technology can be traced back to the 1960s: Kautz and colleagues at the Stanford Research Institute proposed the concept of a compute-in-memory computer in 1969. Follow-up research has focused mainly on chip circuits, computing architectures, operating systems, and system applications. Patterson and colleagues at the University of California, Berkeley, successfully integrated a processor into a DRAM chip, realizing an architecture that combined intelligent storage and computing. However, because of the complexity of chip design, high manufacturing cost, and the absence of big-data applications to drive it, early compute-in-memory work remained at the research stage and was never put into practical use.

The compute-in-memory chip developed by Alibaba DAMO Academy integrates a number of innovations and is the world's first chip to achieve compute-in-memory using hybrid-bonding 3D stacking. Its memory unit adopts heterogeneously integrated embedded DRAM (SeDRAM), which offers ultra-high bandwidth and ultra-large capacity. For the compute unit, DAMO Academy designed a streaming, custom accelerator architecture that accelerates the recommendation system "end to end," covering tasks such as matching, coarse ranking, neural-network computation, and fine ranking.
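To make the staged pipeline concrete, here is a minimal toy sketch of the four stages named above (matching, coarse ranking, neural-network scoring, fine ranking). All function names, item fields, and weights are illustrative assumptions for this article, not DAMO Academy's actual design.

```python
# Toy sketch of a staged recommendation pipeline:
# match -> coarse rank -> NN scoring -> fine rank.
# Every name and number here is hypothetical.

def match(candidates, query):
    """Stage 1: retrieve candidates that share a tag with the query."""
    return [c for c in candidates if query in c["tags"]]

def coarse_rank(candidates, keep=4):
    """Stage 2: cheap score (here, raw popularity) to prune the pool."""
    return sorted(candidates, key=lambda c: c["popularity"], reverse=True)[:keep]

def nn_score(candidates):
    """Stage 3: stand-in for the neural-network compute done on chip."""
    for c in candidates:
        c["score"] = 0.7 * c["popularity"] + 0.3 * c["quality"]
    return candidates

def fine_rank(candidates, topk=2):
    """Stage 4: final ordering on the model score."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:topk]

items = [
    {"id": 1, "tags": {"video"}, "popularity": 0.9, "quality": 0.2},
    {"id": 2, "tags": {"video"}, "popularity": 0.5, "quality": 0.9},
    {"id": 3, "tags": {"news"},  "popularity": 0.8, "quality": 0.8},
    {"id": 4, "tags": {"video"}, "popularity": 0.4, "quality": 0.7},
]
result = fine_rank(nn_score(coarse_rank(match(items, "video"))))
print([c["id"] for c in result])
```

The point of the streaming design is that each stage narrows the candidate set before the next, more expensive stage runs, which is why keeping the data close to the compute (as the chip does) pays off most in the heavy neural-network stage.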

Thanks to this architectural innovation, DAMO Academy's compute-in-memory chip achieves high performance and low system power consumption at the same time. In a real recommendation-system application, compared with a traditional CPU-based computing system, the chip's performance improves by more than 10x and its energy efficiency by more than 300x. The work has been accepted at ISSCC 2022, the top conference in the chip field, and could in the future be applied to scenarios such as VR/AR, autonomous driving, astronomical data processing, and remote-sensing image analysis.

What is the state of research at home and abroad?

In recent years, with the rise of application fields such as the Internet of Things and artificial intelligence, compute-in-memory technology has been extensively researched and applied by academia and industry at home and abroad. In 2016, Professor Yuan Xie's team at the University of California, Santa Barbara (UCSB) proposed PRIME, a deep-learning neural-network accelerator built on an RRAM-based compute-in-memory architecture, which attracted widespread industry attention. Test results show that, compared with a traditional von Neumann design, PRIME reduces power consumption by about 20x and increases speed by about 50x. The scheme can efficiently implement vector-matrix multiplication and has broad application prospects for deep-learning neural-network accelerators. In addition, Duke University, Purdue University, Stanford University, the University of Massachusetts, Nanyang Technological University, Hewlett-Packard, Intel, Micron, and other internationally renowned universities and companies have carried out related research and released prototype test chips.
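The vector-matrix multiplication mentioned above is what a resistive crossbar computes in place: row voltages encode the input vector, cell conductances encode the weight matrix, and by Ohm's and Kirchhoff's laws each column current is a dot product. The sketch below is an idealized, noise-free software model of that principle, not PRIME's actual circuit.

```python
# Idealized model of an RRAM crossbar vector-matrix multiply:
# column current I_j = sum_i V_i * G[i][j].
# Values are illustrative; real devices have noise and nonlinearity.

def crossbar_mvm(voltages, conductances):
    """Return the column currents of an ideal crossbar array."""
    cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(cols)]

V = [1.0, 0.5, 2.0]          # input vector as row voltages
G = [[0.1, 0.2],             # weight matrix as cell conductances
     [0.4, 0.0],
     [0.3, 0.5]]

print(crossbar_mvm(V, G))    # matches an ordinary matrix-vector product
```

Because every multiply-accumulate happens in the memory cells themselves, no weight data crosses a memory bus, which is the source of the power and speed advantages the PRIME results report.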

China's research in this area has also produced a series of results. Professor Liu Ming's team at the Institute of Microelectronics of the Chinese Academy of Sciences; the teams of Professors Huang Ru and Kang Jinfeng at Peking University; the teams of Professors Yang Huazhong and Wu Huaqiang at Tsinghua University; Professor Song Zhitang's team at the Shanghai Institute of Microsystems, Chinese Academy of Sciences; and Professor Miao Xiangshui's team at Huazhong University of Science and Technology have successively released device and chip prototypes and verified them in applications such as image and speech recognition.

Against the backdrop of the current slowdown of Moore's Law, compute-in-memory has become a key technology for overcoming the performance bottleneck of computing systems.
