China's Artificial Intelligence Application Scale Is Growing Rapidly: Daily Token Consumption Has Exceeded 30 Trillion
China Net Financial News, August 14. The State Council Information Office held a press conference on August 14 to present the achievements in building a digital China during the "14th Five-Year Plan" period and to answer questions from reporters.
The following is a record of the press conference:
Reporter: High-quality data at massive scale is a core driver of the deep deployment of "Artificial Intelligence +". Can you tell us what has been done at the national level to promote high-quality data for artificial intelligence?
Liu Li-hong, member of the party committee and director of the National Data Bureau:
Some experts say that computing power is the skeleton, algorithms are the nerves, and data is the blood. As one of the three core elements of artificial intelligence development, data plays a crucial role in promoting "Artificial Intelligence +". High-quality datasets matter in particular: in healthcare, for example, annotated medical image datasets can increase a model's disease diagnosis accuracy by over 15%. In the era of artificial intelligence, the token, also known as a word unit, is the smallest unit of data for processing text, and plays a role comparable to traffic in the internet era. In early 2024, China's daily token consumption was 1 trillion; by the end of June this year, it had exceeded 30 trillion. This reflects the rapid growth of China's artificial intelligence application scale.
China's rapid development of artificial intelligence is inseparable from its emphasis on data work. China was the first country to treat data as a factor of production, and it has taken multiple measures to promote the development and utilization of data resources. We stress that the "Artificial Intelligence +" initiative cannot succeed without high-quality datasets. We have vigorously promoted the supply of high-quality datasets, issued policy documents on building such datasets, and advanced the related work jointly with multiple departments. We have guided the National Data Standardization Technical Committee in researching and formulating relevant standards and technical documents, organized pilot projects and exemplary cases in various fields, and developed a number of typical solutions.
We will continue to push forward high-quality dataset construction. By the end of June this year, China had built over 35,000 high-quality datasets, with a total volume exceeding 400 petabytes (1 petabyte can store approximately 500 million 2MB-sized high-definition photos, and 400 petabytes is equivalent to the digital resources of the National Library of China). The training of artificial intelligence models has also driven up demand for data trading. By the end of June this year, the cumulative transaction volume of high-quality datasets had reached nearly 40 billion yuan, with a total scale of 246 petabytes. At the Beijing Data Exchange, for example, high-quality dataset transactions account for over 80% of total volume.
Shanghai, Tianjin, and Anhui are currently piloting new models such as "data resource-based stock allocation" to guide enterprises in converting high-quality datasets into equity investments. The development of high-quality datasets requires support from the data annotation industry. We have established 7 data annotation bases in cities including Chengdu, Shenyang, and Hefei to help build high-quality datasets.
Domestic data plays a crucial role in improving the training performance of large models. Everyone is concerned about the proportion of domestic data in training data. After sustained effort, domestic data now accounts for more than 60% of the training data in most models, and for as much as 80% in some. The development and supply capacity of high-quality domestic datasets continues to improve, driving rapid gains in the performance of China's artificial intelligence models.
As a next step, we will continue to push forward high-quality dataset construction through systematic planning, accelerate the construction of high-quality datasets in key fields such as intelligent agents, the low-carbon economy, and biomanufacturing, promote society-wide recognition of the value of data as a factor of production, and accelerate value creation from data. We will also cultivate a market consensus around "buying high-quality data".