AWS and NVIDIA accelerate deployment of R1 models
Cost-effective reasoning models promise on-device AI benefits
DeepSeek pre-training infrastructure cost estimated at up to $2.6 billion
The introduction of DeepSeek R1, a reasoning model created by Chinese AI startup DeepSeek, is causing a stir in the AI industry. It is drawing attention as a flagship example of a paradigm shift in AI development: strong performance at low cost from a comparatively small parameter count.
DeepSeek R1 develops its reasoning ability through reinforcement learning, an approach that sharply reduced the cost and resources required for model development. By delivering high accuracy and strong benchmark scores with relatively few parameters, it is credited with pushing AI commercialization a step forward in terms of efficiency.
■ AWS and NVIDIA accelerate R1 model deployment... innovation expected in on-device inference
On the 20th, DeepSeek released R1 together with DeepSeek R1-Zero and the R1-Distill models. Because R1 is open source, developers around the world quickly verified it and published a range of test results, and the model was soon distributed and offered as a service.
AWS announced that it is offering the DeepSeek R1 model through Amazon Bedrock and Amazon SageMaker AI, and Cerebras announced on the 29th that it is hosting DeepSeek R1-Distill-Llama-70B on servers in the United States.
NVIDIA also announced that DeepSeek R1 is available in preview on NVIDIA’s inference microservice, “NIM,” which it describes as “capable of processing up to 3,872 tokens per second on a single NVIDIA HGX H200 system.” API support is also said to be coming soon.
Based on the technical report released by DeepSeek, Berkeley PhD candidate Jiayi Pan, who implemented the 7B LLM TinyZero, demonstrated on X (Twitter) that an "aha moment" can be reproduced even on a very small model for a cost of about $30. An "aha moment" is the point at which an AI model learns to solve problems on its own, improving through trial and error and progressing from easy problems to complex ones.

▲ Score-by-parameter-count graph for DeepSeek R1-Zero-style training on Qwen 2.5 base models (Source: Jiayi Pan, X)
Jiayi Pan tested Qwen 2.5-based models trained in the style of DeepSeek R1-Zero, ranging from 0.5 billion (0.5B) to 7 billion (7B) parameters, and every model except the 0.5B one showed rising scores. Even the 7B model is small by LLM standards, small enough to be deployed in on-device AI applications.
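To put those parameter counts in on-device terms, here is a back-of-envelope estimate (our own illustration, not from the article) of the memory needed just to hold the model weights at common precisions; the quantization levels are assumptions, and activations and KV cache are ignored.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for model weights in gigabytes (weights only)."""
    return num_params * bits_per_param / 8 / 1e9

# The two ends of the size range Jiayi Pan tested.
for label, params in [("0.5B", 0.5e9), ("7B", 7e9)]:
    for bits in (16, 4):  # fp16 vs. a hypothetical 4-bit quantization
        print(f"{label} @ {bits}-bit: {weight_memory_gb(params, bits):.2f} GB")
```

At 4-bit quantization even the 7B model fits in roughly 3.5 GB, which is within reach of current flagship smartphones, while the 0.5B model needs only about 250 MB.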
Some observers suggest that miniaturized reasoning models trained with reinforcement learning, such as DeepSeek R1-Zero, could be brought to a variety of on-device AI products: improving reasoning performance on smartphones, enhancing noise cancellation in wireless earphones, and raising the voice recognition rate of AI speakers.
Comparing DeepSeek R1 with OpenAI o1-mini, the input cost per million tokens is $0.14 to $0.55 for DeepSeek R1 versus $1.50 to $3.00 for o1-mini, and the output cost shows a similarly large gap at $2.19 versus $12 per million tokens, making DeepSeek far cheaper. In response, OpenAI officially launched o3-mini on the 31st, improving cost efficiency and performance over the existing model: its input cost per million tokens is set at $0.55, on par with DeepSeek R1, while its output cost is set at $4.40, roughly double DeepSeek R1's.
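The pricing gap can be made concrete with a small calculation using the per-million-token figures quoted above; the workload of one million input and one million output tokens is a hypothetical example chosen for illustration.

```python
# USD per million tokens: (input, output), as quoted in the article.
# Upper-bound input prices are used for the ranged figures.
PRICES = {
    "DeepSeek R1":    (0.55, 2.19),
    "OpenAI o1-mini": (3.00, 12.00),
    "OpenAI o3-mini": (0.55, 4.40),
}

def job_cost(model: str, input_millions: float, output_millions: float) -> float:
    """Total cost in USD for a job of the given token volumes."""
    input_price, output_price = PRICES[model]
    return input_millions * input_price + output_millions * output_price

# Hypothetical workload: 1M input tokens and 1M output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 1, 1):.2f}")
```

On this workload the quoted prices put DeepSeek R1 at $2.74, o1-mini at $15.00, and o3-mini at $4.95, consistent with the article's observation that o3-mini narrows but does not close the gap.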
■ DeepSeek pre-training infrastructure cost ↑ $2.6 billion
While domestic and international media debate DeepSeek's previously reported development cost of $5.576 million (about 8 billion KRW), experts emphasize that this figure covers only the official training run for DeepSeek V3 and excludes the cost of prior research and experiments.
According to a report published on the 31st by SemiAnalysis, a semiconductor and AI industry research firm, DeepSeek has access to approximately 10,000 H800s and 10,000 H100s. By its analysis, DeepSeek's total server capital expenditure is estimated at approximately $1.6 billion, with operating costs alone close to $1 billion.
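The headline figure in this section can be reconstructed from the two SemiAnalysis estimates; this is our own arithmetic sketch of how the numbers relate, not a calculation from the report itself.

```python
# SemiAnalysis estimates as quoted in the article.
server_capex_usd = 1.6e9        # total server capital expenditure
operating_cost_usd = 1.0e9      # operating costs alone
official_v3_training_usd = 5.576e6  # officially reported V3 training cost

total_infra_usd = server_capex_usd + operating_cost_usd
print(f"Estimated infrastructure total: ${total_infra_usd / 1e9:.1f}B")
print(f"Multiple of the official training figure: "
      f"{total_infra_usd / official_v3_training_usd:.0f}x")
```

Summing the two estimates yields the $2.6 billion cited in the headline, roughly 466 times the officially reported $5.576 million training cost, which is why experts call the smaller figure an incomplete picture.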
SemiAnalysis emphasized that it is "confident that the hardware spending is well above $500 million," and analyzed that the roughly $6 million figure in the paper represents only the GPU cost of the pre-training run.
Still, experts agree that the trend in AI model improvement is toward lower inference costs and higher performance from smaller models. SemiAnalysis noted that "algorithm improvement and optimization can deliver a tenfold decrease in cost alongside gains in capability," and said it would not be surprising if DeepSeek cut costs another fivefold from current levels by the end of the year.
This suggests that more reasoning models will emerge that are algorithmically optimized to maintain performance and accuracy while cutting costs. A variety of compact models, including Google's recently released Gemini 2.0 Flash Thinking and OpenAI's o3-mini, are competing on cost-effectiveness and driving the AI industry forward.
As model competition intensifies and costs fall, the on-device AI market is expected to flourish and consumer awareness to grow at a rapid pace.