Musk says all human knowledge for AI training already 'exhausted'

Elon Musk says AI models are running short on real-world data and suggests technology firms should now turn to "synthetic" data.

Musk says AI models' tendency to produce "hallucinations," inaccurate or nonsensical outputs, poses a risk to the synthetic data process. / Photo: AA Archive

Elon Musk has joined other artificial intelligence experts in claiming that little real-world data is left to train AI models and that "peak data" has effectively been reached.

During a recent livestream, he explained that nearly all of humanity's available knowledge had been processed in AI training.

"We've exhausted basically the cumulative sum of human knowledge … in AI training," said Musk during the livestream on X. "That happened basically last year."

Musk, who launched his own AI business, xAI, in 2023, suggested technology companies would have no choice but to turn to "synthetic" data, meaning content generated by AI models themselves, which they then learn from.

"The only way to then supplement that is with synthetic data where … it will sort of write an essay or come up with a thesis and then will grade itself and … go through this process of self-learning," he added.

Musk cautioned, however, that AI models' tendency to produce "hallucinations," inaccurate or nonsensical outputs, poses a risk to the synthetic data process.

He said hallucinations make using artificial material "challenging" because "how do you know if it … hallucinated the answer or it's a real answer."

'Model collapse'

Andrew Duncan, director of foundational AI at the UK's Alan Turing Institute, noted that Musk's statement aligns with a recent academic paper suggesting that publicly available data for AI models could be depleted by 2026, as reported by the Guardian.

He warned that overreliance on synthetic data could lead to "model collapse," where model outputs degrade in quality.

"When you start to feed a model synthetic stuff, you start to get diminishing returns," he said, highlighting the risk of biased and uncreative outputs.

Duncan also pointed out that the rise of AI-generated content online could result in the material being incorporated into AI training data.
