Thoughts on DeepSeek?
Absolutely hilarious seeing the memes on Twitter and the meltdown in NVDA. Apparently it was created by a Chinese hedge fund, of all things? Is the $6M training cost a total lie? Time to buy the NVDA dip?
The dollar tag is definitely misleading.
Here is a good breakdown: https://therecursive.com/martin-vechev-of-insait-deepseek-6m-cost-of-tr…
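For context on where the headline number comes from: the DeepSeek-V3 technical report prices only the final training run, as H800 GPU-hours times an assumed rental rate of $2 per GPU-hour, and says the figure excludes prior research and ablation experiments. A quick sketch of that arithmetic, using the stage breakdown from the report. The function names are mine, and the wall-clock figure is a sanity check, not a reported number:

```python
# Stage breakdown in thousands of H800 GPU-hours, as reported in the
# DeepSeek-V3 technical report. The $2/GPU-hour rental rate is the report's
# own assumption, not an audited spend.
GPU_HOURS_K = {
    "pre-training": 2664,
    "context extension": 119,
    "post-training": 5,
}
RENTAL_USD_PER_GPU_HOUR = 2.0
CLUSTER_GPUS = 2048  # H800s in the training cluster


def total_gpu_hours(stages: dict[str, int]) -> int:
    """Sum the per-stage GPU-hours (inputs are in thousands)."""
    return sum(stages.values()) * 1_000


def headline_cost_usd(stages: dict[str, int], usd_per_gpu_hour: float) -> float:
    """Rental-equivalent cost of the final run only.

    Anything not priced as GPU-hours of this run (research, ablations,
    data, salaries, buying the cluster) is not in this number.
    """
    return total_gpu_hours(stages) * usd_per_gpu_hour


def ideal_wall_clock_days(stages: dict[str, int], gpus: int) -> float:
    """Days needed if every GPU in the cluster were busy the whole time."""
    return total_gpu_hours(stages) / gpus / 24


print(f"GPU-hours:     {total_gpu_hours(GPU_HOURS_K):,}")
print(f"headline cost: ${headline_cost_usd(GPU_HOURS_K, RENTAL_USD_PER_GPU_HOUR):,.0f}")
print(f"wall-clock:    {ideal_wall_clock_days(GPU_HOURS_K, CLUSTER_GPUS):.1f} days")
```

So the ~$6M is real arithmetic, but it is the marginal cost of one run on hardware someone already owns, not the cost of getting to that run.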
What stands out to me from the DeepSeek-V3 technical report:
```
Reasoning Data. For reasoning-related datasets, including those focused on mathematics, code competition problems, and logic puzzles, we generate the data by leveraging an internal DeepSeek-R1 model.
```
The V3 model has been trained in the post-training phase on data that was created with the more advanced R1 model.
The smaller Llama 3 8B/70B models have been trained in a similar fashion: the math and coding skills of the larger 405B model have been imbued into the smaller models in the post-training stage.
So, it doesn't make sense comparing the 2K GPUs needed for training a **single** DeepSeek-v3 to the 16K GPUs needed for training **all** the Llama 3 model family.
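A minimal sketch of the kind of teacher-to-student data pipeline described above. It assumes a checkable task (integer multiplication), so teacher outputs can be verified and wrong ones dropped before they become fine-tuning examples. `teacher_stub` is a hypothetical stand-in for a big model like R1 or Llama 3 405B, and this single-sample filter is a simplified form of rejection sampling. Real pipelines are far more involved:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SFTExample:
    prompt: str
    response: str


def teacher_stub(a: int, b: int) -> tuple[str, int]:
    """Hypothetical teacher model: returns a worked solution and a final answer.

    Deliberately wrong whenever the true product is a multiple of 7,
    so the filter below has something to reject.
    """
    answer = a * b
    if answer % 7 == 0:
        answer += 1  # simulated teacher mistake
    return f"{a} * {b} = {a} added together {b} times = {answer}", answer


def exact_check(a: int, b: int, answer: int) -> bool:
    """Verifier for a checkable task: compare against ground truth."""
    return answer == a * b


def build_sft_set(
    problems: list[tuple[int, int]],
    teacher: Callable[[int, int], tuple[str, int]],
    check: Callable[[int, int, int], bool],
) -> list[SFTExample]:
    """Query the teacher once per problem and keep only verified outputs."""
    dataset = []
    for a, b in problems:
        reasoning, answer = teacher(a, b)
        if check(a, b, answer):
            dataset.append(SFTExample(f"What is {a} * {b}?", f"{reasoning}. Answer: {answer}"))
    return dataset


PROBLEMS = [(3, 4), (2, 7), (5, 6), (7, 7), (8, 9)]
for ex in build_sft_set(PROBLEMS, teacher_stub, exact_check):
    print(ex.prompt, "->", ex.response)
```

The student only ever sees the teacher's filtered outputs, which is why the student's training bill says nothing about what it cost to build the teacher.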
To me, what they were able to achieve is a bullish AI signal (not sure how NVDA-bullish though), and somehow people are forgetting that every other AI lab can use this result to improve their models while having access to more compute.
Also, OpenAI has already developed o3, which, at least looking at the benchmarks, is muuuch better than o1, with compute being the bottleneck. If they can use DeepSeek's results to optimize the compute o3 needs, it's definitely huge.
But they developed R1, ultimately matching o1 at roughly 95% lower running costs, with a budget of under $6M. So what if they suddenly had $500Bn to play with, like that SoftBank-backed OpenAI investment?
Then again, I’m really not sure. I agree that the US will scramble to replicate and refine these successes, and hopefully Anthropic can use it to solve their running-cost issue, because Claude 3.5 is by far the most natural-feeling AI; there are just message limits because of the cost to run it.
The $6M figure for the last training run should not be your basis for how much frontier AI models cost. The budget for this model is closer to something like $100M, and it was built on all the previous breakthroughs. Yes, but o3 reaches o1-level performance at a fraction of o1's cost.
Isn't China facing a complete ban on advanced GPU imports from US? The $500Bn wouldn't get them far in terms of the compute because of the ban then.
To me, the power of NVDA is only partly in their GPUs; it's NVLink and CUDA. Go ahead and try building a huge data center with AMD: it will be almost unusable for a researcher due to errors, slowness, and unreliability. I've used many of those clusters/supercomputers, and it's super hard to compete with what NVDA is able to provide right now. They have a huge head start.
I'm only a europoor who invests in the Vanguard Global All Cap index (in a tax-advantaged ISA wrapper). What would this news have done to those 3x leveraged on Nvidia and tech-heavy ETFs? Wiped them out?
The most beautiful thing about the internet is you can literally just go look for yourself
Then what the fuck is the point of an off-topic forum/community, you goddamn twat, if it's not to interact, engage in conversation, and share knowledge?
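On the 3x question above: if by "3x levered" you mean a daily-reset 3x ETF, and ignoring fees, financing costs and tracking error, it takes a one-day drop of 1/3 (about 33%) in the underlying to wipe the product out. Anything smaller is a big loss, not a wipeout. The catch is path dependence over several days. A quick sketch with illustrative numbers only (the 17% drop is a round number for illustration):

```python
def leveraged_value(daily_returns: list[float], leverage: float = 3.0, start: float = 100.0) -> float:
    """Value of a daily-reset leveraged product after a sequence of daily moves.

    Ignores fees, financing costs and tracking error. A daily loss of more
    than 1/leverage in the underlying takes the product to zero.
    """
    value = start
    for r in daily_returns:
        value *= max(0.0, 1.0 + leverage * r)
    return value


# Illustrative one-day drop of 17% in the underlying, for a 3x product.
print(leveraged_value([-0.17]))
# A one-day drop beyond 1/3 wipes out a 3x product entirely.
print(leveraged_value([-0.34]))
# Path dependence ("volatility decay"): +10% then -10%, unlevered vs 3x.
print(leveraged_value([0.10, -0.10], leverage=1.0))
print(leveraged_value([0.10, -0.10]))
```

The last two lines show why these products bleed in choppy markets: the underlying ends down 1%, the 3x version ends down 9%.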
It’s interesting to see the nationalistic response from some people. There is zero reason to be angry if they are telling the truth about how much it cost. If these models can be made far less resource-intensive, that is a massive societal dub for the earth and for people's access to this tech. They also open-sourced it, so people can inspect it and modify it to run a non-CCP version on their own devices.
I think the biggest losers from the news are the proponents of closed-source models, like OpenAI. Before this news I was much more in Sam’s court on what he was trying to do, and was semi-forgiving of the nonprofit conversion given the resource intensity of what they’re building. Now it all reads as semi-nefarious to me: someone just released a product nearly as good as yours for free while you pretend to be the person developing AI with the goal of benefitting humanity, and purportedly for a fraction of the dollar cost of your resources. The Stargate move now reads kind of desperate: it relies on maintaining any model advantage by locking up contractual rights to compute and energy your competitors won’t necessarily have, all clouded with a political angle that America needs to do this in the interest of national security. Your company is not looking so key to that anymore, buddy… and if you knew that before making all these moves, I’m glad you’re getting your teeth kicked in by a competitor. I’m already imagining the interview Sam gives on DeepSeek, with his placid whack-ass tone, explaining to us non-Valley types why it’s not a big deal. Fuck ya self.
Closed-source models are the biggest losers. I do think chips and energy providers are still bullish long-term when you think of the data intensity of where these things are going: auto-generated gaming/VR, more displacement of white-collar workers (accountants/lawyers, eventually financial modelers), etc.
Curious what others think
It wasn't just the locking up compute part for me.
The writing was on the wall that at some point, the techbro "saving humanity" part was going to turn into "it's critical to national security" so they could go to the feds and do all that on the taxpayers' dime.
Altman and other oligarchs are pretty insane, frankly. They keep unironically talking about shaking up the world order and how the social contract of society needs to be changed. If anyone working class (and the vast majority of people on this site are working class) thinks they'll be better off for it, idk what to say. You have people like Musk and Dimon straight up saying things are going to get bad and to get over it. A lot of white-collar workers have been conditioned into thinking that they are safe from automation. We aren't lol. In fact, we're ripe for it.
It's definitely scary to think about, but to a certain degree, AI could act as a self-regulating mechanism. AI capable of the nuance and thinking required for white-collar roles would be close to sentient and might even recognize its role and reprogram itself to act against its creators. An AI advanced enough to replace a lawyer’s job could potentially reprogram itself and turn on its creator, cue Skynet, leading to much bigger problems than just job displacement. Eventually, there’s going to need to be a cap on AI advancement, which might actually be good news for white-collar professionals. Granted, I’m not from a tech background and am really just talking out of my ass, so we could still very well be fucked.
Ban AI.
OpenAI is the Netscape of the AI era.
Legit though, they're not going to be able to keep up with just xAI and Gemini in the long run. They no doubt paved the way, but there's no chance in hell they're going to be the winner at scale.
Except Netscape was founded by a far more thoughtful businessman. Clark at least was able to understand what was coming down the line for his business instead of doing an Iron Man 2
I really think only administrative jobs like translators, typographers, and similar roles are going to be affected in the short term. I don't see a way lawyers and accountants get replaced by AI anytime soon. I heard someone say we tend to overestimate the impact of technology in the short term, and then, after we overestimate, we swing the pendulum and underestimate the long term.
There's so much anti-China sentiment out there at the minute, which is crowding people's opinions. China is not completely evil, and it's good for us as humans if China is pioneering research and open-sourcing it.
It's probably less of a huge threat to NVDA (which has always been massively cyclical anyway and still has a huge moat) and more of a threat to private unicorns like OpenAI, which are making a product that is being completely commoditized.
It's also a huge win for all the businesses that could implement AI like banks - paying $30/mo for Microsoft Copilot is going to be completely unsustainable if competitors spring up everywhere promising to do the same thing for a fraction of the cost.
Many people are incapable of separating “The Chinese” from the Chinese Communist Party.
I think there is an argument to be made that any Chinese business is essentially coterminous with the CCP. If your continued existence as a going concern is reliant on toeing the party line and obeying the Party's orders, then functionally is there any difference?
I agree, it'd be nice to have Chinese researchers join the global community. Chinese people are incredibly smart. Their government is garbage, though. But I know enough of their history to understand why they prefer Xi to the western world that colonized them. It doesn't help either that Xi has launched a propaganda campaign that will shape this generation and generations to come in China. Deng Xiaoping was great for the nation. Xi? Questionable. Older Chinese immigrants I've spoken to (40+) don't like him because he's too communist and not a reformer like Deng.
One of the funniest memes I've seen so far. Shoutout to an amazing show too (Silicon Valley)!
This is the best meme about this situation
This guy fucks
completely fucks the US economy
+
impossible to access and analyze any financials whatsoever of Deepseek
Well, we owe it a round of applause for popping the massive AI bubble. Perhaps some sanity will return to the equity markets as people realize that AI, as the industry was understood 48 hours ago, is a giant money pit with no real commercial use case.
My observation of AI-powered companies is that the more frictionless you become (lowest human touch), the more susceptible you will be to new competition popping up from around the world and trying to sell in the US. Eventually the sector will be overcommoditized except for niches.
It will be the companies that do the hard things like building out enterprise customer sales channels, high trust branding, that will be harder to dislodge, and I think thrive. They will be agnostic of who owns what technology, or what platform is better, because they are not the developers but the implementers.
I think this potential disruption indicates that even the biggest of the platform companies are not immune to this trend.
Makes a lot of the absurd AI funding rounds I've been seeing in my industry (healthcare) laughable. A lot of oversubscribed financings, many of which already had me scratching my head.
(1) That $6 million cost is probably some portion of the cost. Of this I have little doubt. This is not the total cost. Part of this is because we already know that it relied (in part) on other systems. It is dishonest to use some of the OpenAI models as the baseline to build off of and to not include that in your "cost of the system." The real number is somewhere higher.
It probably is far more cost-effective, but it is most definitely higher than the $6 million figure.
(2) This is not a surprise to anyone who has any serious knowledge of these systems.
There is no moat of any kind in LLMs, whether we are looking at the models themselves, at applications, or at the hardware. As this has shown, there is no real way to make money off the models in and of themselves. The "scaling laws" that so many like Elon talk about are not actually manifesting. That's all marketing hype. There is no way to actually make money off these things. Their use cases, in terms of actual usability, are incredibly limited.
And this is before getting into the structural problems of these "neural-net-only" LLMs. By their very structure, they are pretty much inept at reasoning. The further away you go from their training set, the more useless they become. No amount of MoAr DaTA will fix this. There won't be a data set that ever exists that covers 100% of everything that could ever be asked. So long as the models are structured as neural-net-only, these LLMs won't be able to reason. And this is separate from the hallucination issue, although hallucinations are absolutely a microcosm of the more fundamental point about LLMs' lack of reasoning skills.
1) "It probably is far more cost-effective, but it is most definitely higher than the $6 million figure."
I agree I definitely don't think it was $6m, and it's a bit dishonest to quote the figure that doesn't embed the spend from the model R1 effectively copied (o1). I think of R1 as a generic drug and o1 as a brand-name drug. All of the R&D went into developing o1, and Deepseek just figured out how to reproduce o1 much cheaper, like a generic drug. Nevertheless, it is still a breakthrough, and according to Jevons Paradox, this will only increase demand for intelligence (which makes sense intuitively).
2) "There is no moat of any kind in LLMs, whether we are looking to the models themselves, to applications, or to the hardware."
I agree there is no moat to LLMs (there definitely is one on the supply side, though). Frontier models will be commoditized, and they won't be monetizable in the end game. Their current core use cases are narrow. However, this will not stifle investment in the space, as the real value will emerge from companies that figure out how to integrate them seamlessly into workflows, enhance them with proprietary data, and embed them into domain-specific ecosystems.
3) "By their very structure, they are pretty much inept at reasoning. The further away you go from their training set, the more useless they become. No amount of MoAr DaTA will fix this. There won't be a data set that ever exists that covers 100% of everything that could ever be asked."
So you are right... the entire internet has been scraped for training data, and the models still experience reasoning issues. Let's say for argument's sake that it is truly a DATA problem and not an algorithmic or compute problem. But by the way, on the algorithm point, check out Google's Titans architecture; this algorithmic improvement will boost AI power by orders of magnitude once the technology matures. Not even to mention what quantum computing will do...
Okay, so if all of the data is scraped from the internet, and I'm guessing you think synthetic data won't satisfy this reasoning need (which it definitely could), where can we find more data? Humanoid robots. That's the next frontier of data. You put AI in a robot and give it prompts... "pour water into the cup." The robot, executing this prompt, either pours the water in or spills it. That's data, and humanoid robots represent orders of magnitude more data than the internet has right now because it is literally doing things that humans are doing. That's the second frontier of data. The third and final one? Brain-computer interfaces. If we can put a chip in a brain and read everything that it is doing, ALL of the data that the brain is generating can be used to train these models. Every last bit of senses, thoughts, and actions you have represent orders of magnitude more data than even humanoid robots. There is no data problem when you look into the future. There is no algorithm problem. And there is certainly no compute problem.
Since we are largely aligned with (1) and (2), I am just going to acknowledge we are thinking much the same way there and move on to what you said in (3).
(i) I am not saying it is a "data problem" generally. I am saying that data won't make the current frontier models reliable in regards to truthfulness and reasoning. Aka the two things that will make such AI actually useful outside of very specific, narrow use cases. So yes, there is a "data problem" with the current models, but the fix is not more data. It is a better algorithm.
And we know this. The current models are almost exclusively deep learning models. Deep learning is pretty much only good for identifying what is already in the data set. They are just beginning to implement reinforcement learning into these models which is the mechanism one uses for dealing with novelty. However, none of these systems are anywhere near incorporating symbolic manipulation into these models and that is the best way we have for fusing the novelty benefits of reinforcement learning with deep learning's ability to understand the data it already has.
(ii) Synthetic data could solve the data gap. It is possible, but there won't be enough of it to actually do so. When I say data, I am talking about all data. There is a reason why I said "No amount of MoAr DaTA will fix this." If I were excluding synthetic data, I would not have been talking about all data (which is what the word "data" refers to without any additional qualifiers).
(iii) Humanoid robots don't fix the data problem that the current models have. Unless you are telling me that there will literally be trillions of them, there just won't be enough of them to fix this data issue. The data issue with the current models is not an incremental issue; it is an order-of-magnitude issue. It is like the difference in energy between chemical reactions (i.e., TNT) and nuclear ones (i.e., nuclear bombs). Getting robots to create more data for the models only works if we have a metric fuckton of robots.
And this doesn't even account for how those humanoid robots won't have much in the way of utility until AFTER we get past this point with LLMs. AGI, basically by definition, has to come before these humanoid robots you are referring to. Humanoid robots simply won't have utility (at least in the manner you are suggesting) until after this LLM problem is solved.
(iv) I literally know people who worked on Google's Titans platform. It has the exact same problems I am talking about here.
(v) When do you think we will have brain-computer interfaces?
If y'all didn't scoop up Nvidia today after seeing this massive discounted drop, idk what else to say to y'all except y'all allergic to money.
Not that you’re wrong, but I bet people said the same thing about Bear Stearns
Why buy when you can ride it delta-neutral and make more actual money
;)
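For anyone wondering what "delta-neutral" means in practice: you hold options and short enough stock that small price moves roughly cancel out, so the P&L comes from volatility rather than direction. A minimal sketch, assuming European calls priced with Black-Scholes and no dividends; the position sizes and inputs are hypothetical, not a recommendation:

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot: float, strike: float, vol: float, t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes delta of a European call with no dividends: N(d1)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t_years) / (vol * math.sqrt(t_years))
    return norm_cdf(d1)


def shares_to_short(contracts: int, spot: float, strike: float, vol: float,
                    t_years: float, rate: float = 0.0, multiplier: int = 100) -> float:
    """Shares to short against long calls so the combined position has zero delta."""
    return contracts * multiplier * bs_call_delta(spot, strike, vol, t_years, rate)


# Hypothetical inputs: 10 calls, spot 120, strike 125, 60% vol, 30 days to expiry.
print(round(shares_to_short(10, 120.0, 125.0, 0.60, 30 / 365)))
```

The hedge only holds for small moves: delta shifts as price and time move (gamma), so the short has to be rebalanced, and whether the whole thing makes "more actual money" depends on realized versus implied volatility.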
I wouldn't buy the dip. The defensive comments by Nvidia bulls feel like total cope to me. "Jevons Paradox" went from being something I'd never heard once in my life to the most uttered phrase in the world for about 24 hours. That's a sign of denial. A shitload of capex was spent on chips and the ability to make chips, on the theory that compute will be an ultra-scarce resource in ultra-high demand. Now all of a sudden all your potential customers need 5% of the capacity you thought they'd need. Pretty sure that's going to hurt. Of course you will see expanded use cases now that you need a lot less compute, but I see no reason to think that's going to make up for the 95% per-user drop in demand.
Honestly, I don't believe the Chinese. They lied about Covid. They lie about stealing tech. They innovate, sure, but they're not able to innovate at the level the entire western world can. Chinese people are very smart. I admire them. I hate their government. But to say some quant HF just up and made a superior version for less money, despite a lot of the groundwork and infrastructure already having been done for them? I don't buy it. How do we know Xi's government didn't secretly subsidize the fund with $100bn USD? The Chinese government, just like any world power, will always lie on the world stage. We're dubbing it the new Sputnik. The USSR was running a game of smoke and mirrors with us; perhaps the CCP is doing the same. On top of that, while China is seeing a boom in the middle class, it's also going through a deflationary period. I just don't trust the CCP, and their one-day adventure/shock just let me buy NVDA at a steep discount. I'm skeptical about AI as well; I'm waiting for the hype to die out. But you know what they say: pessimists are right, optimists make money.
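The disagreement here comes down to one number: how much usage grows when the price falls. A toy model makes that explicit. It assumes (my assumptions, not anyone's forecast) constant price elasticity of demand and prices that fall one-for-one with compute per query. Under those assumptions, total compute demand scales as gain^(elasticity - 1), so a 20x efficiency gain (the "5% of the capacity" above) raises total compute only if elasticity exceeds 1:

```python
def compute_demand_ratio(efficiency_gain: float, elasticity: float) -> float:
    """New total compute divided by old total compute.

    Illustrative assumptions, not an industry model:
      * price per query falls in proportion to compute per query;
      * demand has constant price elasticity: queries proportional to price ** (-elasticity).
    """
    queries_ratio = efficiency_gain ** elasticity    # cheaper queries -> more queries
    compute_per_query_ratio = 1.0 / efficiency_gain  # each query needs less compute
    return queries_ratio * compute_per_query_ratio   # = efficiency_gain ** (elasticity - 1)


GAIN = 20.0  # needing 5% of the capacity == a 20x efficiency gain
for e in (0.5, 1.0, 1.5, 2.0):
    print(f"elasticity {e}: total compute x{compute_demand_ratio(GAIN, e):.2f}")
```

Nobody knows the real elasticity for AI compute; the sketch just shows what the Jevons argument needs to be true.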
I'm far from an expert on this, but I've seen detailed posts on X where experts explain how they went through the code and are able to identify how it was so efficient. In layman's terms (and I'm definitely a layman) it sounds like a lot of intelligently-identified 80/20 stuff where they figured out all the different places in the design where precision isn't important for getting a 99% result with 5% of the compute.
Agree with you on the tendency for that culture to lie. But there's some evidence that they built something more efficient on an order-of-magnitude basis, and the only response I've seen from the industry/bulls is basically Jevons Paradox: we'll just sell that much more compute because more people will want it. I don't think it takes an AI expert to spot the defensiveness there.
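On the "places where precision isn't important" point: the DeepSeek-V3 report describes training largely in FP8 mixed precision. The sketch below swaps in a simpler, related idea, symmetric int8 "absmax" quantization in pure Python, to show how numbers stored in 8 bits instead of 32 can come back with only a small relative error. The Gaussian "weights" are synthetic, and none of this is DeepSeek's actual code:

```python
import random


def quantize_absmax_int8(xs: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    peak = max((abs(x) for x in xs), default=0.0)
    if peak == 0.0:
        return [0] * len(xs), 1.0
    scale = peak / 127.0
    return [max(-127, min(127, round(x / scale))) for x in xs], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to floats."""
    return [v * scale for v in q]


def relative_error(xs: list[float], ys: list[float]) -> float:
    """||xs - ys|| / ||xs|| using Euclidean norms."""
    num = sum((x - y) ** 2 for x, y in zip(xs, ys)) ** 0.5
    den = sum(x * x for x in xs) ** 0.5
    return num / den


def roundtrip_error(n: int = 10_000, seed: int = 0) -> float:
    """Quantize n synthetic Gaussian 'weights' to int8 and back; return the relative error."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q, scale = quantize_absmax_int8(weights)
    return relative_error(weights, dequantize(q, scale))


print(f"relative error after int8 round trip: {roundtrip_error():.4f}")
```

Real low-precision training is harder than this (the V3 report discusses finer-grained scaling and keeping some operations in higher precision), so read it as intuition for the 80/20 idea, not as how DeepSeek did it.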
Agreed, funny how we went from “China lies” in 2019 to Chinese sympathizers.
"How do we know Xi's government didn't secretly subsidize the fund with $100bn USD?"
While there may be a lot more funding behind this than initially thought, it is pretty clear from their design choices that they did primarily use the China variants of Nvidia chips. They would not have optimized certain features the way they did if they had access to the full bandwidth of the Nvidia chips.
While I broadly agree with the sentiment, there is some additional nuance I think worth considering for this Iron Man 2 plot.
First, if you can develop such a program on ~2-3k Chinese-variant chips, you should be able to develop it much faster with more and better chips. It doesn't scale 1-to-1, but more and better compute does have some benefit.
Second, and more importantly, no amount of hardware is fixing the fundamental structural issues with the current standard of deep-learning-only models. No matter how much hardware you have, no matter how much data you have, the current approach to building these systems doesn't scale the things that will truly matter in making these viable products (see truthfulness and reasoning). Whether or not there is more compute isn't really solving the more fundamental issue. It is largely vacuous spend anyways.
Just some additional thoughts
DeepSeek is old news. Qwen-2.5 Max is the new king of the hill.
Comments above touched on a lot of the main points, but key impacts on markets/tickers as a result:
DeepSeek's breakthrough reinforces GOOG's leaked memo in 2023 on "we have no moat and neither does OpenAI". LLMs from now on will consist of the major players racing to be adopted, creating a moat via network effect, switching costs and a "walled garden" ecosystem a la Apple style.
DeepSeek's efficiency breakthrough on the software and algorithm side has woken tech giants up from the "moar compute = win trust me bro" mentality. NVDA will highly likely see reduced demand for products such as the H100. Bearish on NVDA. On a related note, GOOG trains on its own TPUs rather than NVDA chips, so it is also best poised to "win" the LLM race among US companies.
OpenAI and Anthropic are fucked. With this new race for adoption and market entrenchment, only GOOG, META and the Chinese (Alibaba/Tencent/Huawei/PDD/JD) will win. They have the balance sheet, cash flow, infrastructure (data centers, compute, R&D, human capital, supplier relationships, etc) and economies of scope to win.
Imagine choosing between ChatGPT/Claude which are isolated and unintegrated products, or Llama/Copilot/Gemini/Astra.
Gemini/Astra - Integrated with Google Search, Google Home, Android Auto, Google Sheets, YouTube, Google Drive, Google Docs, Gmail, Google Play, Google Translate, Google Calendar, Fitbit, etc
Copilot - Integrated with Windows, Excel, Outlook, Teams, PowerPoint, OneDrive, Edge, LinkedIn, XBox, Skype, GitHub, etc
Llama - Integrated with Instagram, WhatsApp, Facebook, Messenger, Threads, etc
Currently extremely bullish on Google to capture the most value from LLM adoption. They have more than enough resources (compute/human capital/balance sheet/cash flow) and products (everything mentioned above) to both grab the most users and entrench themselves against competitors, creating an effective moat. No user is going to switch away from Gemini/Astra once it's fully integrated into their homes, cars, smartwatches, toasters, speakers, emails and productivity apps, especially if Gemini/Astra starts to personalize itself to each user and "learn" their personality from years of data on that specific user.
But OpenAI is Copilot?
What will cause massive Google adoption though? Literally use none of their AI products right now and they don’t seem poised well for enterprise either.
Do any of you actually use any AI tools regularly? It's become so commoditised that your shittily worded personal emails are the only way to stand out.