Deci’s NLP model reaches breakthrough performance levels at MLPerf, up to a 6.46x gain with AMD EPYC CPUs
Developed with Deci’s Automated Neural Architecture Construction (AutoNAC) technology, the NLP model, dubbed DeciBERT-Large, ran on a Dell PowerEdge R7525 server using the AMD EPYC 7773X processor. The resulting model exceeded the throughput of the BERT-Large model by almost six and a half times while also delivering a one percent boost in accuracy.

The new model was submitted under the offline scenario in MLPerf’s open division, in the BERT 99.9 category. The objective was to maximize throughput while keeping accuracy within 0.1% of the baseline, which is 90.874 F1 (SQuAD). The DeciBERT-Large model exceeded these goals, achieving a throughput of 116 queries per second (QPS) and an F1 accuracy score of 91.08. As you can see in the table below, the AMD EPYC 7773X Milan-X chip delivers up to a 6.46x performance gain over the BERT-Large model.

Deci leveraged its proprietary AutoNAC engine to develop a new model architecture tailored for the AMD EPYC processor. AutoNAC, an algorithmic optimization engine that constructs best-in-class deep learning model architectures for any task, dataset, and inference hardware, typically delivers up to a five-fold increase in inference performance with accuracy comparable to or higher than state-of-the-art neural models.

[Table: BERT-Large vs. DeciBERT-Large throughput in FP32 and INT8 precision]

MLPerf brings together expert deep learning leaders to create fair and useful benchmarks for measuring the training and inference performance of ML hardware, software, and services.

Deci’s NLP inference acceleration translates directly into reduced cloud costs, allowing more processes to run on the same machine in less time and enabling teams to use more cost-efficient machines while retaining the same throughput. Higher throughput for NLP applications such as question answering also means a better user experience, since queries are processed quickly and insights can be delivered in real time.

News Source: Deci
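To make the BERT 99.9 accuracy constraint concrete, here is a quick arithmetic check using only the figures quoted in the article; this is a minimal sketch, and the constant names are illustrative, not part of any MLPerf tooling:

```python
# Sanity check of the MLPerf BERT 99.9 accuracy constraint, using the
# figures quoted in the article above. Constant names are illustrative.
BASELINE_F1 = 90.874    # SQuAD F1 baseline for the BERT 99.9 category
SUBMITTED_F1 = 91.08    # DeciBERT-Large's reported F1 score

# "99.9" means the submission must retain at least 99.9% of the
# baseline accuracy, i.e. stay within a 0.1% margin of it.
min_allowed_f1 = BASELINE_F1 * 0.999
print(f"Minimum allowed F1: {min_allowed_f1:.3f}")          # ~90.783
print(f"Within margin: {SUBMITTED_F1 >= min_allowed_f1}")   # True
```

DeciBERT-Large not only clears the minimum threshold but scores above the baseline itself.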
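In MLPerf’s offline scenario, the entire query set is available up front and the system is scored on raw throughput. The sketch below shows how a QPS figure of this kind can be measured in principle; the `run_batch` callable and the batching scheme are assumptions for illustration, not Deci’s submission harness or MLPerf’s actual LoadGen:

```python
import time
from typing import Callable, Sequence

def offline_qps(run_batch: Callable[[Sequence], None],
                samples: Sequence,
                batch_size: int = 32) -> float:
    """Measure throughput (queries per second) offline-scenario style:
    all samples are available up front and processed in large batches."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        run_batch(samples[i:i + batch_size])  # run inference on one batch
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed
```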