Journal "Software Engineering"
A journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 1, 2023

DOI: 10.17587/prin.14.3-11
Specialized Neural Network Hardware Accelerators
V. V. Korneev, Principal Researcher, korv@rdi-kvant.ru, Research and Development Institute Kvant, Moscow, 125438, Russian Federation
Corresponding author: Victor V. Korneev, Principal Researcher, Research and Development Institute 'Kvant', Moscow, 125438, Russian Federation, E-mail: korv@rdi-kvant.ru
Received on November 09, 2022
Accepted on November 17, 2022

By now, enough specialized neural network processor chips and systems built on them have been created to indicate the trends of their development and, most importantly, their place in the overall evolution of supercomputer architectures and technologies. The use of low-bit number representations such as FP8, INT8 and BF16, which are acceptable in neural network computing, makes it possible, on the one hand, to reach per-chip performance of 2015 TFLOPS in FP8 and 1008 TFLOPS in BF16 and, on the other hand, to reduce the energy cost of a multiplication operation. The low bit width has drawn attention to rounding errors: in a number of chips the set of rounding modes has been expanded beyond the generally accepted standard, and the rounding mode can be set programmatically. In addition, the case for creating specialized neuroprocessor chips rests on the use of elements of structural programming, in which a computing structure matching the executed algorithm is configured in software. Therefore, along with reduced bit width and support for sparse neural networks, computing systems built on the SambaNova SN30 RDU, Graphcore Colossus MK2 IPU, Untether AI Boqueria, AWS Trainium1 and Tesla Dojo D1 provide, to some extent, the possibility of structural programming of computations. The sparsity of the processed data has led to abandoning cache memory in favor of large on-chip scratchpad memory with increased bandwidth for delivering data between memory and the arithmetic logic units, as well as between memory and the on-chip and inter-chip communication fabric. One can therefore speak of a memory hierarchy different from the traditional cache-based one. Thus, specialization for neural network algorithms has led to the emergence of massively parallel system architectures for processing low-bit data formats with poor temporal and spatial locality of memory accesses.
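
As an illustration of why rounding behavior matters for low-bit formats, the following minimal sketch (Python with NumPy; the helper to_bf16, the bias trick, and the chosen input value are illustrative assumptions, not taken from any of the chips discussed) emulates BF16 by keeping the upper 16 bits of an FP32 word and compares plain truncation with round-to-nearest-even:

import numpy as np

def to_bf16(x: np.ndarray, mode: str = "nearest") -> np.ndarray:
    """Emulate BF16 by keeping only the upper 16 bits of each FP32 word.

    mode="truncate" simply drops the low 16 mantissa bits;
    mode="nearest" applies round-to-nearest-even before dropping them.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    if mode == "nearest":
        # Bias trick: add 0x7FFF plus the lowest kept bit, so that
        # exact ties round toward the even value of that bit.
        bits = bits + (((bits >> 16) & 1) + np.uint32(0x7FFF))
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# 1 + 2**-8 + 2**-10 is exactly representable in FP32 but not in BF16
# (BF16 keeps only 7 mantissa bits), so the two modes give different results.
x = np.array([1 + 2**-8 + 2**-10], dtype=np.float32)
for mode in ("truncate", "nearest"):
    y = to_bf16(x, mode)
    print(mode, float(y[0]), "abs. error", float(abs(x - y)[0]))

Run as written, the truncating conversion loses the whole low-order tail, while round-to-nearest-even roughly halves the error; this is the kind of difference that software-selectable rounding modes on the chips discussed above are meant to expose to the programmer.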

Keywords: neuroprocessor VLSI, on-chip memory, on-chip communication fabric, low-bit number formats
pp. 3–11
For citation:
Korneev V. V. Specialized Neural Network Hardware Accelerators, Programmnaya Ingeneria, 2023, vol. 14, no. 1, pp. 3–11.