NVIDIA's Party Just Beginning

Nvidia's competitors are not AMD, but Google and Amazon.

Since OpenAI released ChatGPT, generative AI has become a major trend, and demand for Nvidia's GPUs as AI chips has surged. However, GPU production faces two bottlenecks, TSMC's CoWoS packaging and High Bandwidth Memory (HBM), which have led to a global GPU shortage.

Among these GPUs, the demand for H100 is particularly high, with its price soaring to $40,000, triggering the so-called Nvidia "GPU frenzy."

In response, TSMC has doubled its production capacity, and DRAM manufacturers such as SK Hynix have increased HBM production. As a result, the delivery time for the H100 has fallen from 52 weeks to 20 weeks. So, will Nvidia's "GPU frenzy" end?

In this article, we will discuss whether Nvidia's "GPU frenzy" is about to end. First, the conclusion: even in 2024, the high-end AI servers required for the development and operation of ChatGPT-level AI are expected to account for only 3.9% of all server shipments. The demand of cloud service providers (CSPs) such as Google, Amazon, and Microsoft therefore cannot possibly be met. In short, Nvidia's "GPU frenzy" so far is just the beginning, and a comprehensive generative AI boom is about to arrive.

Next, let's briefly review the two bottlenecks of Nvidia GPUs.

Two Nvidia GPU Bottlenecks

In the production of Nvidia GPUs, the foundry TSMC is responsible for all front-end, middle, and back-end processes. Here, the middle process refers to taking separately produced chips such as GPUs, CPUs, and HBM and placing them on a square substrate cut from a 12-inch silicon wafer. This substrate is called the silicon interposer.

The packaging technology that TSMC developed for this purpose is called CoWoS (Chip on Wafer on Substrate), and its two bottlenecks are silicon interposer capacity and HBM.

CoWoS was developed in 2011. Since then, as GPU performance has improved, GPU chips have grown in size and the number of HBMs installed alongside each GPU has increased. As a result, the silicon interposer has been getting larger year by year, while the number of interposers that can be obtained from a single wafer has been decreasing roughly in inverse proportion.
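To make the trade-off between interposer size and yield per wafer concrete, here is a rough sketch using a common gross-die-per-wafer approximation (an area term minus an edge-loss term). The interposer areas are hypothetical values chosen only to illustrate the trend; they are not taken from TSMC or from this article.

```python
import math

WAFER_DIAMETER_MM = 300.0  # a 12-inch wafer


def interposers_per_wafer(area_mm2: float,
                          diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Estimate how many square pieces of a given area fit on a round wafer.

    Uses a widely quoted approximation: usable wafer area divided by piece
    area, minus a correction for partial pieces lost at the wafer edge.
    """
    radius = diameter_mm / 2
    estimate = (math.pi * radius ** 2 / area_mm2
                - math.pi * diameter_mm / math.sqrt(2 * area_mm2))
    return max(0, math.floor(estimate))


# Hypothetical interposer areas in mm^2 (assumptions for illustration only).
for area in (800, 1700, 2700):
    count = interposers_per_wafer(area)
    print(f"{area} mm^2 -> about {count} interposers per wafer")
```

Because of the edge-loss term, the count falls somewhat faster than a strict inverse proportion as the interposer grows.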

In addition, with the increase in the number of HBMs installed in GPUs, the number of DRAM chips stacked inside the HBM has also increased. Moreover, DRAM is miniaturized every two years, and the HBM standard is updated every two years to improve performance. Therefore, the demand for cutting-edge HBM is exceeding the supply.

Under these circumstances, TSMC has doubled its silicon interposer production capacity from about 15,000 pieces per month around the summer of 2023 to more than 30,000 pieces per month around the summer of this year. In addition, Samsung Electronics and Micron Technology have been certified by NVIDIA and have begun to supply cutting-edge HBM, a market previously dominated by SK Hynix.

Affected by the above, the delivery time for NVIDIA H100, which has the highest demand, has been significantly reduced from 52 weeks to 20 weeks. So, how much has the shipment of AI servers increased as a result?

Definition of Two Types of AI Servers

According to the "Global Annual Server Shipment, 2023-2024" (Servers Report Database, 2024) released by DIGITIMES Research, there are two types of AI servers:

A system equipped with two or more AI accelerators but without HBM is called a "General-Purpose AI Server."

A system equipped with at least four AI accelerators with HBM is called a "High-End AI Server."

The AI accelerators referred to here are specialized hardware designed to accelerate AI applications, particularly neural networks and machine learning. A typical example is NVIDIA's GPUs. Moreover, the development and operation of generative AI at the level of ChatGPT require a large number of high-end AI servers, not general-purpose AI servers.

So, how many general-purpose AI servers and high-end AI servers are being shipped?

Shipments of General-Purpose AI Servers and High-End AI Servers

Figure 4 shows the shipments of general-purpose AI servers and high-end AI servers from 2022 to 2024. Shipments of general-purpose AI servers are put at 344,000 units in 2022, 470,000 units in 2023, and a projected 725,000 units in 2024.

Meanwhile, shipments of the high-end AI servers required for the development and operation of ChatGPT-level generative AI are put at 34,000 units in 2022, 200,000 units in 2023, and a projected 564,000 units in 2024.

Can the shipments of high-end AI servers meet the demand of U.S. Cloud Service Providers (CSPs)?

Figure 5 shows the shipment numbers for all servers, general-purpose AI servers, and high-end AI servers. Measured against the overall server market, shipments of both general-purpose AI servers and high-end AI servers are very small.

When I researched how many high-end AI servers are needed for the development and operation of ChatGPT-level generative AI, I became even more disappointed.

High-End AI Servers Required for ChatGPT-Level Generative AI

It is reported that the development and operation of ChatGPT require 30,000 NVIDIA DGX H100 high-end AI servers.

The NVIDIA DGX H100 is equipped with eight H100 chips, each of which has soared in price to $40,000, bringing the cost of a complete system to $460,000. In other words, building generative AI at the level of ChatGPT requires an investment of 30,000 units x $460,000 = $13.8 billion.
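The investment figure can be reproduced with a few lines of arithmetic. This is only a sketch built from the numbers quoted in this article; the constant and function names are made up for illustration.

```python
# Figures quoted in the article (names are illustrative).
SERVERS_PER_SYSTEM = 30_000       # DGX H100 units reportedly needed for ChatGPT
PRICE_PER_SERVER_USD = 460_000    # quoted cost of one DGX H100 system
GPUS_PER_SERVER = 8               # H100 chips in one DGX H100
PRICE_PER_GPU_USD = 40_000        # quoted price of one H100


def system_cost_usd(servers: int = SERVERS_PER_SYSTEM,
                    price_per_server: int = PRICE_PER_SERVER_USD) -> int:
    """Total outlay for one ChatGPT-level build, in US dollars."""
    return servers * price_per_server


total_cost = system_cost_usd()
gpu_cost_per_server = GPUS_PER_SERVER * PRICE_PER_GPU_USD

print(f"Total investment: ${total_cost / 1e9:.1f} billion")
print(f"GPU cost inside one server: ${gpu_cost_per_server:,}")
```

Note that the eight GPUs alone account for $320,000 of each $460,000 server at the quoted chip price.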

It may feel as though the world is awash in generative AI systems, but how many ChatGPT-level generative AIs have actually been built?

Given that 34,000 high-end AI servers were shipped in 2022, only one ChatGPT-level AI system could have been built (namely ChatGPT itself). The following year, in 2023, shipments of high-end AI servers reached 200,000 units, enough to build 6 to 7 ChatGPT-level AI systems. With 564,000 high-end AI servers expected to ship in 2024, it will be possible to build 18 to 19 ChatGPT-level AI systems.
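The same estimate can be written as a short calculation. It is a sketch that simply divides each year's shipments by the 30,000 servers reportedly needed per system, which is the assumption this article relies on.

```python
import math

SERVERS_PER_SYSTEM = 30_000  # DGX H100 units per ChatGPT-level system (as reported)

# High-end AI server shipments by year, as quoted in the article.
shipments = {2022: 34_000, 2023: 200_000, 2024: 564_000}

# Fractional number of systems that each year's shipments could support.
systems_by_year = {year: units / SERVERS_PER_SYSTEM
                   for year, units in shipments.items()}

for year, systems in systems_by_year.items():
    low, high = math.floor(systems), math.ceil(systems)
    print(f"{year}: {systems:.1f} systems (between {low} and {high})")
```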

However, the above estimates assume that a ChatGPT-level AI can be built with 30,000 NVIDIA DGX H100 high-end AI servers. As generative AI becomes more complex, more than 30,000 NVIDIA DGX H100 units may be needed. In summary, U.S. cloud service providers are unlikely to be satisfied with the current shipments of high-end AI servers.

Now, let's take a look at how many high-end AI servers each end user (such as the U.S. CSPs) has.

Number of High-End AI Servers by End User

The figure shows the number of high-end AI servers held by each end user. In 2023, Microsoft, a major investor in OpenAI, has the most high-end AI servers, with 63,000 units, but in 2024 Google will surpass Microsoft to have the most.

The top five in 2024 are Google in first place with 162,000 units (5 systems), Microsoft in second with 90,000 units (3 systems), Supermicro in third with 68,000 units (2 systems), Amazon in fourth with 67,000 units (2 systems), and Meta in fifth with 46,000 units (1 system). The numbers in parentheses are the number of ChatGPT-level generative AI systems that could be built with those servers. The top five companies, all based in the United States, hold about 80% of the total.
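The per-company system counts and the combined share can be checked with a short script. This is a sketch using only the 2024 figures quoted in this article: 30,000 servers per system and 564,000 high-end AI servers shipped in total.

```python
SERVERS_PER_SYSTEM = 30_000    # servers per ChatGPT-level system (as reported)
TOTAL_HIGH_END_2024 = 564_000  # projected 2024 high-end AI server shipments

# 2024 high-end AI server counts for the top five end users.
holdings_2024 = {
    "Google": 162_000,
    "Microsoft": 90_000,
    "Supermicro": 68_000,
    "Amazon": 67_000,
    "Meta": 46_000,
}

# Whole systems each company could build (partial systems are rounded down).
systems = {name: units // SERVERS_PER_SYSTEM
           for name, units in holdings_2024.items()}

top_five_total = sum(holdings_2024.values())
top_five_share = top_five_total / TOTAL_HIGH_END_2024

for name, count in systems.items():
    print(f"{name}: {holdings_2024[name]:,} units -> {count} system(s)")
print(f"Top five combined: {top_five_total:,} units ({top_five_share:.1%})")
```

The combined share works out to a little under 77%, which the article rounds to about 80%.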

Next, let's look at high-end AI server shipments broken down by AI accelerator. As expected, NVIDIA's GPUs are the most widely used AI accelerators, with 336,000 units expected in 2024. However, it is surprising that the second-ranked company is not AMD, but Google.

Google has developed its own Tensor Processing Unit (TPU) as an AI accelerator. In 2024, the number of high-end AI servers equipped with this TPU will reach 138,000 units. From Figure 8, we know that Google will have 162,000 high-end AI servers in 2024. It is therefore expected that 138,000 units will be equipped with Google's own TPU, and the remaining 24,000 units will be equipped with NVIDIA's GPUs. In other words, for NVIDIA, Google is both a customer and a formidable rival.

Additionally, if we look at the shipments in 2024, AMD ranks third with 45,000 units, followed closely by Amazon with 40,000 units ranking fourth. Amazon is also developing AWS Trainium as an AI accelerator. If we wait a bit longer, AMD may be surpassed by Amazon.
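The accelerator breakdown can be tabulated as follows. This sketch assumes that the four vendor figures are all counted against the 564,000 high-end AI servers projected for 2024; the "other" remainder is derived from that assumption and is not stated in the source.

```python
TOTAL_HIGH_END_2024 = 564_000  # projected 2024 high-end AI server shipments
GOOGLE_SERVERS_2024 = 162_000  # Google's high-end AI servers in 2024

# 2024 high-end AI servers by accelerator vendor, as quoted in the article.
by_accelerator = {
    "NVIDIA": 336_000,
    "Google (TPU)": 138_000,
    "AMD": 45_000,
    "Amazon": 40_000,
}

# Google servers not fitted with TPUs, which the article attributes to NVIDIA GPUs.
google_on_nvidia = GOOGLE_SERVERS_2024 - by_accelerator["Google (TPU)"]

# Servers not covered by the four named vendors (derived, an assumption).
other = TOTAL_HIGH_END_2024 - sum(by_accelerator.values())

ranking = sorted(by_accelerator, key=by_accelerator.get, reverse=True)
for vendor in ranking:
    share = by_accelerator[vendor] / TOTAL_HIGH_END_2024
    print(f"{vendor}: {by_accelerator[vendor]:,} units ({share:.1%})")
print(f"Google servers using NVIDIA GPUs: {google_on_nvidia:,}")
print(f"Unattributed remainder: {other:,}")
```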

In summary, currently, NVIDIA has the most AI accelerator shipments, but Google and Amazon are becoming strong competitors. NVIDIA's competitors are not processor manufacturers like AMD, but Google and Amazon in the United States.

A Comprehensive Generative AI Boom Is Coming

Let's summarize everything so far. According to a report by DIGITIMES Research, it is expected that by 2024, the shipment of high-end AI servers capable of developing and running ChatGPT-level generative AI will only account for 3.9% of all servers. It is believed that this shipment cannot meet the needs of CSPs.
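As a rough cross-check, the 3.9% share can be turned around to estimate the size of the overall server market. The total below is implied by the figures in this article rather than quoted from the DIGITIMES Research report, so treat it as an approximation.

```python
HIGH_END_2024 = 564_000         # projected 2024 high-end AI server shipments
GENERAL_PURPOSE_2024 = 725_000  # projected 2024 general-purpose AI server shipments
HIGH_END_SHARE = 0.039          # high-end share of all servers, as reported

# Implied total server shipments (derived, not a figure from the report).
implied_total_servers = HIGH_END_2024 / HIGH_END_SHARE

# Share of general-purpose AI servers under the same implied total.
general_purpose_share = GENERAL_PURPOSE_2024 / implied_total_servers

print(f"Implied total servers in 2024: about {implied_total_servers / 1e6:.1f} million")
print(f"General-purpose AI server share: about {general_purpose_share:.1%}")
```

On these numbers, the two AI server categories together would still make up less than a tenth of all server shipments.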

That is to say, NVIDIA's "GPU frenzy" from 2023 to 2024 is just the beginning. A comprehensive generative AI boom may therefore emerge in the future. The basis for this view is as follows.

The Semiconductor Industry Association (SIA) has released a semiconductor market forecast by application. According to the SIA, the global semiconductor market is expected to exceed $1 trillion by 2030.

By 2030, the largest market will be computing and data storage. This includes PCs and servers (and of course, high-end AI servers), but since PC shipments are unlikely to increase significantly, servers will likely account for the majority.

Wireline communications refer to semiconductors used in data centers. This means that by 2030, computing and data storage ($330 billion) + wireline communications ($60 billion) = a total of $390 billion in semiconductors for data centers (including PCs) will become the largest market globally.
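The data center total is a simple sum, sketched below. The share calculation treats $1 trillion as the market size, which is an assumption: the forecast only says the market will exceed that figure, so the result is an upper bound on the share.

```python
# 2030 forecast figures quoted in the article, in billions of US dollars.
COMPUTING_AND_STORAGE_BN = 330
WIRELINE_COMMUNICATIONS_BN = 60
TOTAL_MARKET_FLOOR_BN = 1_000  # market is expected to exceed this (lower bound)

data_center_bn = COMPUTING_AND_STORAGE_BN + WIRELINE_COMMUNICATIONS_BN

# Upper bound on the share, since the real total market is larger than the floor.
max_share = data_center_bn / TOTAL_MARKET_FLOOR_BN

print(f"Data center semiconductors (including PCs): ${data_center_bn} billion")
print(f"Share of the total market: at most {max_share:.0%}")
```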

Another thing to watch is the data center market and its prospects. Following the release of ChatGPT in 2022, the data center market is expected to grow steadily. Data centers consist of three elements: network infrastructure, servers, and storage, and it is projected that from 2023 to 2029, both servers and storage will roughly double.

In this way, semiconductors for servers (including high-end AI servers) will account for the largest share of the global market, and the data center market will also expand.

To reiterate, Nvidia's "GPU frenzy" has only just begun. A comprehensive generative AI boom is on the horizon.
