Inspur Information Launches the MetaBrain R1 Inference Server, Unleashing the Full Power of the DeepSeek 671B Model on a Single Machine
On February 11th, Inspur Information officially launched the MetaBrain R1 inference server. Through system-level innovation and software-hardware co-optimization, it can deploy and run the DeepSeek R1 671B model on a single machine, helping customers significantly reduce the difficulty and cost of deploying the full-parameter DeepSeek R1 model, improve inference service performance, and accelerate intelligent exploration across industries.
DeepSeek has open-sourced multiple model versions, helping industries accelerate the adoption of large-model technology to drive business upgrading and transformation. Among them, the DeepSeek R1 671B model, as a full-parameter foundation model, offers stronger generalization, higher accuracy, and better context understanding than the distilled versions. However, it also places higher demands on the system's GPU memory capacity, memory bandwidth, interconnect bandwidth, and latency: at least about 800GB of GPU memory is needed at FP8 precision, and more than 1.4TB at FP16/BF16 precision. In addition, DeepSeek R1 is a typical long chain-of-thought model, characterized by short inputs and long outputs, so the inference decoding stage depends on high memory bandwidth and extremely low communication latency. Based on the compute characteristics and system requirements of the 671B model, the MetaBrain R1 inference server provides leading GPU memory capacity, memory bandwidth, and communication speed, helping enterprises efficiently complete on-premises deployment of the full-parameter DeepSeek model.
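The memory figures above follow from simple arithmetic on the parameter count. A minimal back-of-envelope sketch (illustrative only; the function name and the decimal-GB convention are our own, and the runtime, KV cache, and activations add further overhead on top of the weights):

```python
# Back-of-envelope GPU memory estimate for serving a 671B-parameter model.
# Weights alone: params x bytes-per-param. KV cache and runtime overhead
# (not modeled here) push the totals toward the ~800GB (FP8) and >1.4TB
# (FP16/BF16) figures cited in the article.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

N = 671e9  # DeepSeek R1 parameter count

fp8_gb = weight_memory_gb(N, 1)   # FP8: 1 byte per parameter
fp16_gb = weight_memory_gb(N, 2)  # FP16/BF16: 2 bytes per parameter

print(f"FP8 weights:  {fp8_gb:.0f} GB")   # 671 GB before KV cache/overhead
print(f"FP16 weights: {fp16_gb:.0f} GB")  # 1342 GB before KV cache/overhead
```

This is why FP8 deployment fits comfortably within roughly 1TB of GPU memory while FP16/BF16 requires more than 1.4TB once cache and runtime overhead are included.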
The MetaBrain R1 inference server NF5688G7 is a leading high-performance AI computing platform with a native FP8 compute engine, enabling fast deployment of the DeepSeek R1 671B model with no loss of accuracy. On memory, it provides 1128GB of high-speed HBM3e, exceeding the roughly 800GB of GPU memory the 671B model requires at FP8 precision; even when a single machine hosts full-model inference, ample KV-cache space remains. Memory bandwidth reaches 4.8TB/s, matching the DeepSeek R1 model's "short input, long output, memory-bandwidth-sensitive" profile and delivering maximum acceleration in the inference decoding stage. On communication, GPU P2P bandwidth reaches 900GB/s, ensuring optimal performance for tensor-parallel deployment on a single machine. With the latest inference framework, a single machine can support 20-30 concurrent users. In addition, each NF5688G7 is equipped with a 3200Gbps lossless scale-out network, allowing agile expansion as user workloads grow, backed by a mature turnkey solution for R1 server clusters.
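The "memory-bandwidth-sensitive" claim can be made concrete with a roofline-style bound: during decode, each generated token must stream the active weights (plus KV cache) from HBM, so bandwidth caps throughput. A minimal sketch, with assumed inputs (the bandwidth and bytes-per-token values below are illustrative placeholders, not published NF5688G7 benchmark parameters):

```python
# Roofline-style upper bound on decode throughput for a memory-bandwidth-bound
# model. Illustrative only: the numbers fed in are assumptions, not vendor
# benchmark figures.

def decode_tokens_per_s(agg_bandwidth_gbs: float, bytes_per_token_gb: float) -> float:
    """Each decoded token must stream the active weights (and KV cache) from
    HBM, so throughput is at most bandwidth / bytes moved per token."""
    return agg_bandwidth_gbs / bytes_per_token_gb

# Example: assume ~37 GB of active FP8 weights read per token (DeepSeek R1 is
# a mixture-of-experts model with roughly 37B activated parameters) and an
# aggregate HBM bandwidth supplied by the operator.
bound = decode_tokens_per_s(agg_bandwidth_gbs=4800, bytes_per_token_gb=37)
print(f"decode upper bound: {bound:.0f} tokens/s")
```

The exact bound depends on how bandwidth aggregates across GPUs and on KV-cache traffic, but the shape of the formula explains why a short-input, long-output workload rewards high memory bandwidth above almost everything else.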
The MetaBrain R1 inference server NF5868G8 is a high-throughput inference server designed specifically for large reasoning models. It is the industry's first to support 16 standard dual-width PCIe cards in a single machine, providing up to 1536GB of GPU memory and supporting single-machine deployment of the DeepSeek 671B model at FP16/BF16 precision. Its newly developed 16-card fully interconnected topology, based on PCIe Fabric, delivers up to 128GB/s of P2P bandwidth between any two cards and cuts communication latency by more than 60%. Through software-hardware co-optimization, the NF5868G8 improves DeepSeek 671B inference performance by nearly 40% over a traditional two-node, eight-card PCIe configuration, and it currently supports multiple AI accelerator card options.
Inspur Information is a leading global provider of IT infrastructure products, solutions, and services. By developing a new generation of system-centric computing architecture, Inspur aims to create open, diverse, and green MetaBrain intelligent-computing products and solutions. Inspur Information is committed to research and innovation in AI computing platforms, resource platforms, and algorithm platforms, and collaborates with leading partners through the MetaBrain ecosystem to accelerate the innovation and application of artificial intelligence.
Excerpted from the Yuannao (MetaBrain) WeChat official account.