By Tim Leogrande, BSIT, MSCP, Ed.S.
Sun December 7, 2025 at 06:30 AM ET
Since the public release of ChatGPT in 2022, millions of individuals have adopted large language models (LLMs) as tools for work and information seeking. The appeal is straightforward: pose a question, receive a polished summary, and proceed. The process feels remarkably fast and effortless.
A recent paper by Jin Ho Yun of New Mexico State University and Shiri Melumad of the University of Pennsylvania suggests, however, that this apparent efficiency may incur a substantial cost. Their evidence indicates that when learners rely on AI-generated responses, they tend to develop more superficial knowledge than when they learn via traditional web search using tools such as Google.
The authors base their conclusions on seven studies comprising more than 10,000 participants. Across most of these studies, the methodological framework was similar. Participants were instructed to learn about a particular topic and were then randomly assigned to do so either by using an LLM, such as ChatGPT, or by navigating the links returned by a standard Google search.
Participants were not constrained in how they could use these tools. Those in the web search condition could conduct as many searches as they wished, and those using ChatGPT could issue as many prompts and follow-up queries as they deemed necessary. After completing their research, participants were asked to write a piece of advice for a friend about the topic based on what they had learned.
<aside> đź’ˇ
The data showed a consistent pattern. Compared with participants who used web search, those who learned via an LLM believed they had learned less, reported exerting less effort when composing their advice, and ultimately produced advice that was briefer, less accurate, and more generic.
</aside>
When this advice was then presented to an independent group of readers, who were unaware of which tool had been used, they judged the LLM-based advice to be less informative and less helpful and indicated they would be less likely to act on it.
These differences appeared robust across multiple contexts. One plausible explanation for the brevity and generic nature of the LLM-generated advice is that LLM outputs might expose users to less heterogeneous information than the diverse sources surfaced by a Google search. To examine this possibility, the researchers conducted an experiment in which participants were exposed to an identical set of facts, regardless of whether they accessed information via Google or ChatGPT. In another experiment, they held the Google platform itself constant and varied only the learning source: participants learned either from regular Google search results or from Google's AI Overview tool.
The findings indicated that even when the underlying facts and platform were held constant, learning from synthesized LLM responses led to more superficial knowledge than gathering, interpreting, and synthesizing information from standard web links.
Why might LLM use diminish learning depth? A central principle of skill acquisition is that people learn more effectively when they are actively engaged with the material they are attempting to master. Traditional web search typically introduces greater “friction” into the learning process: learners must navigate multiple links, consult primary sources, and independently interpret and integrate information from diverse materials.
<aside> đź’ˇ
Although more demanding, this friction encourages learners to construct richer, more personalized, and more original mental models of the subject matter. By contrast, LLMs perform much of this integrative work on the user’s behalf, shifting learning from an active to a more passive process.
</aside>
Importantly, the researchers do not argue that LLMs should be avoided altogether, particularly given the substantial benefits they provide across a wide range of applications. Rather, they advocate that users become more discerning and strategic in their deployment of these tools. This begins with recognizing the kinds of learning goals and knowledge domains for which LLMs are likely to be beneficial versus those in which they may hinder desired outcomes.
For quick, factual queries, relying on an AI “co-pilot” may be entirely appropriate. However, when the objective is to cultivate deep, flexible, and generalizable understanding, depending solely on LLM-generated summaries appears to be far less effective.
As part of their broader program of research on the psychology of emerging technologies, the authors are investigating whether LLM-based learning can be redesigned to become more active. In one experiment, they examined a specialized GPT model that provided real-time web links alongside its synthesized responses. Even in this condition, once participants had received an LLM-generated synthesis, they showed little inclination to explore the original sources in depth. Consequently, these participants again developed more superficial knowledge than those who relied on a standard Google search.
Looking ahead, the researchers plan to study generative AI tools that introduce “healthy frictions” into learning tasks. Specifically, they aim to identify which guardrails or hurdles most effectively encourage users to go beyond easily generated summaries and engage more actively with underlying materials. Such tools may be especially important in high schools, where teachers face the dual challenge of helping students build basic skills in reading, writing, and mathematics while also preparing them for a world in which LLMs are likely to be ubiquitous in everyday life.
© 2025 Tim Leogrande