The Long and Mostly Short of China’s Newest GPT
Who said large language models (LLMs) necessarily need to be large? In China, LLMs are currently shrinking in both size and parameter count. According to sources, this is because the country is now focused on enabling Chinese startups and smaller entities to build their own generative AI applications. As part of this downscaling trend, in June the Beijing Academy of Artificial Intelligence (BAAI) introduced Wu Dao 3.0, a series of open-source LLMs.
Based on interviews with high-ranking, anonymous sources involved in the project, IEEE Spectrum can report that Wu Dao 3.0 builds on the academy's work on Wu Dao 2.0, a sparse, multimodal generative AI model that, as has been widely reported, has 1.75 trillion parameters. There is no single parameter count for Wu Dao 3.0 (it is a range of models of varying sizes), but all are well below the 1.75 trillion high-water mark that version 2.0 set.
“Wu dao” means “path to enlightenment” in Chinese. Parameters are the weights of the connections between digital “neurons” in the model, representing the relationships between words and phrases. The number of parameters in a model is a measure of its complexity. In a sparse model, only a small subset of the parameters is used for any given input, which makes sparse models more efficient than dense ones of the same size because each query requires less memory and computation.
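To make the dense-versus-sparse distinction concrete, here is a minimal, hypothetical sketch in Python (plain NumPy, not BAAI's code): a dense layer pushes every input through all of its weights, while a mixture-of-experts-style sparse layer, one common way to build sparse models, routes each input to only a couple of "expert" weight blocks, leaving most parameters idle on any single query. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64            # input/output width
N_EXPERTS = 8     # expert weight blocks in the sparse layer
TOP_K = 2         # experts actually consulted per query

# Dense layer: one big weight matrix; every parameter touches every input.
W_dense = rng.standard_normal((D, D))

def dense_forward(x):
    return x @ W_dense                        # all D*D parameters are used

# Sparse, mixture-of-experts-style layer: many expert matrices plus a router.
experts = rng.standard_normal((N_EXPERTS, D, D))
W_router = rng.standard_normal((D, N_EXPERTS))

def sparse_forward(x):
    scores = x @ W_router                     # router scores each expert
    top = np.argsort(scores)[-TOP_K:]         # keep only the TOP_K best experts
    gate = np.exp(scores[top])
    gate /= gate.sum()                        # normalize their contributions
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

x = rng.standard_normal(D)
print(dense_forward(x).shape, sparse_forward(x).shape)
# The sparse layer stores N_EXPERTS * D * D parameters but uses only
# TOP_K * D * D of them (plus the small router) to answer this one query.
```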
“Ultimately, [the government-funded BAAI] can only produce models with smaller parameters, to be used within other Chinese companies.”
—Hejuan Zhao, TMTPost
Rather than another Wu Dao 2.0-size behemoth, however, the Wu Dao 3.0 project is a collection of smaller, nimbler, dense models under the name Wu Dao Aquila, reflecting efforts to enable companies to easily adopt generative AI in their products, Spectrum's sources say. (In Chinese, the smaller models are called “Wu Dao Tianying,” meaning “path to enlightenment eagle.” Aquila is Latin for “eagle,” and so the smaller models are referred to as the Aquila models in English.)
According to these sources, Wu Dao 3.0 models include the following:
• the AquilaChat Dialogue model, a 7-billion-parameter dense model that BAAI claims has outperformed similarly sized mainstream open-source models both domestically and internationally, as well as an AquilaChat-33B model with, as the name implies, 33 billion parameters. AquilaChat-7B supports both English and Chinese and is trained on a bilingual corpus, with Chinese text comprising about 40 percent of the training material.

• the AquilaCode model, designed for text-to-code generation. AquilaCode, still under development, is trained on a massive dataset of code and natural language. It can generate simple programs, such as one that calculates the Fibonacci sequence or prints the prime numbers below 100 (an illustrative sketch of such programs appears after this list), as well as more complex ones, such as a program that implements a sorting algorithm or plays a game.

• the Wudao Vision series, which focuses on issues in the field of computer vision, including task unification, model scaling, and data efficiency. Its offerings include Emu, a multimodal model; EVA, a billion-scale visual representation model; a general-purpose segmentation model; Painter, a universal vision model pioneering in-context visual learning; EVA-CLIP (Contrastive Language-Image Pre-training), which BAAI calls the highest-performing open-source CLIP model; and vid2vid-zero, a zero-shot video editing technique.
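As a rough illustration of the kind of "simple programs" described above (not actual AquilaCode output), here is what the two examples named in the list, the Fibonacci sequence and the primes below 100, might look like in Python:

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

def primes_below(limit):
    """Return all prime numbers smaller than limit."""
    return [p for p in range(2, limit)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

print(fibonacci(10))      # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(primes_below(100))  # 2, 3, 5, ..., 97
```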
Among the Wudao Vision offerings, EVA, a foundation model that explores the limits of visual representation at scale, uses only publicly accessible data, Spectrum's sources say. EVA can be efficiently scaled up to 1 billion parameters and purportedly sets new records on a range of representative vision tasks, such as image recognition, video action recognition, object detection, instance segmentation, and semantic segmentation, without heavy supervised training.
In addition to the new models, BAAI has updated FlagOpen, the open-source large-model technology system it launched earlier this year. FlagOpen includes parallel training techniques, inference acceleration techniques, hardware and model evaluation, and data processing tools. It is designed to be an open-source algorithm system and a one-stop foundational software platform that supports the development of large model technologies. BAAI has said it envisions FlagOpen as a kind of Linux for large-scale models.
Why did BAAI shrink the Wu Dao models?
“Due to high costs, chip sanctions, and regulatory systems, large language models like Wu Dao 2.0 can’t be implemented,” said Hejuan Zhao, founder and CEO of TMTPost, one of China’s largest tech media outlets. “Ultimately, they can only produce models with smaller parameters, to be used within other Chinese companies.”
Open-sourcing relatively small models may also be a strategic choice by BAAI, sources say, as the academy is a nonprofit research organization and the return on investment for training another large model is low. (BAAI officials declined to comment on the record for this story.)
The new Wu Dao 3.0 Aquila models have failed to garner much attention in China, possibly due to the similarity in parameter scale to other available open-source models, like Meta’s LLaMA and its recently announced open-source(ish) language model, Llama 2.
China's LLM landscape is dominated by companies such as Alibaba, Baidu, and Huawei. Baidu's largest model, Ernie 3.5, is arguably the most powerful, though it still lags the performance of OpenAI's GPT-4. And China's more powerful models, including Ernie 3.5, Huawei's Pangu 3.0, and Alibaba's Tongyi suite, remain proprietary.
Smaller, open-source models have lower inference costs—that is, how much it costs to run the model as it provides an output—and can be commercialized more readily. They are particularly suitable for niche applications, such as medical chatbots.
Training smaller models also requires fewer chips, making them less vulnerable to hardware shortages. Access to sufficient hardware, especially graphics processing units for model training, is a critical aspect of China’s burgeoning AI sector.
The U.S. government has imposed export restrictions on Nvidia’s A100 GPUs and forthcoming H100 chips to China, including Hong Kong. In response, Nvidia has released a cut-down, slower version of the A100, known as the A800, specifically for China. But any further tightening of U.S. export controls on cutting-edge chips would severely hamper model training in China. There is already an underground trade in Nvidia GPUs due to the high demand and short supply.
At present, China is focused on the practical application of AI models. By encouraging the open sourcing of not just models but also datasets and computational resources, the Chinese government hopes to boost the nation's overall AI development. By building a foundation for large models and promoting innovation through open-source collaboration, BAAI said it is trying to create an open-source ecosystem akin to Linux.
Update 28 July 2023 3 p.m. EDT: Information was added to provide more details about specific parameters in the Wu Dao 3.0 family of models.