CAMBRIDGE, Massachusetts — The hardest part of reading an ultrasound has never been the machine. It has been the brain behind it.
A trained diagnostic sonographer spends years learning to take a flat, gray, two-dimensional image scrolling across a monitor and mentally reconstruct the three-dimensional anatomy beneath the probe. That skill gap — the distance between a student squinting at a screen and a veteran who sees structure where others see noise — is also a healthcare bottleneck, one that slows the training pipeline at precisely the moment an aging population is demanding more imaging procedures than the system can provide.
Researchers at the Massachusetts Institute of Technology think they have found a way around it. The system they describe in a paper published Wednesday in Nature Communications Engineering replaces the mental reconstruction entirely. Instead of teaching trainees to imagine a three-dimensional structure from a two-dimensional image, it shows them the structure directly, rendered in real time through an augmented reality headset and floating over the body being scanned like a translucent map of what lies beneath.
The technology is called AR-VIU, for augmented real-time volumetric imaging in ultrasound. The results of the first controlled study, involving 18 participants split evenly between expert sonographers and people who had never operated an ultrasound machine, were striking enough that the researchers described them as a near-elimination of the novice-expert performance gap. Under the traditional two-dimensional system, experts vastly outperformed beginners. Under AR-VIU, beginners performed nearly as well as the veterans.
“For training, this could make ultrasound more intuitive and more understandable,” said Canan Dagdeviren, an associate professor of media arts and sciences at MIT and the study’s senior author. “On the clinical side, it could be less time-consuming, more accurate, and also give health care providers more peace of mind. They wouldn’t have to wonder if they missed anything.”
The workforce dimension of that claim is hard to overstate. The United States faces a projected shortfall of more than 3.2 million healthcare workers by 2026, and diagnostic staffing pressures have compounded across hospital systems already contending with visa and immigration constraints on recruiting trained specialists from abroad. Ultrasound, in particular, requires a combination of visuospatial reasoning, fine motor control, and anatomical knowledge that typically takes years to develop to clinical competency. Any technology that compresses that learning curve carries implications well beyond a single laboratory at MIT.

The hardware that makes AR-VIU work is, by design, low-complexity. The ultrasound probe itself — slightly smaller than a deck of cards — uses a square-array transducer configuration that acquires three-dimensional volumetric data rather than the single flat slices produced by conventional probes. Because the array uses fewer transducer elements than industrial 3D ultrasound systems, it draws less power and costs less to manufacture. The raw voxel data from the probe is compressed and streamed in real time into Unreal Engine, the same graphics software used to render video game environments, which converts it into a live three-dimensional point cloud. A clinician wearing an AR headset sees that point cloud superimposed over the patient’s actual body — not on a screen somewhere to the side, but directly over the anatomy it represents, at true scale.
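For readers curious what "converting voxel data into a point cloud" involves, the sketch below shows the general idea in Python under stated assumptions. It is not the MIT team's code: the article does not describe their conversion, and the compression, streaming, and Unreal Engine rendering steps are omitted. The function name `voxels_to_point_cloud`, the intensity threshold, and the voxel spacing are hypothetical choices for illustration. The sketch keeps only voxels bright enough to count as echoes and places each one at its physical position in millimeters, which is what allows a headset to draw the result at true scale.

```python
# Hypothetical sketch (not AR-VIU's implementation): turn a 3D voxel grid of
# echo intensities into a point cloud in physical coordinates. The threshold
# and voxel spacing are illustrative assumptions.
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # (x_mm, y_mm, z_mm, intensity)


def voxels_to_point_cloud(
    volume: List[List[List[float]]],
    spacing_mm: Tuple[float, float, float],
    threshold: float,
) -> List[Point]:
    """Keep voxels whose intensity meets the threshold and scale each voxel
    index by the voxel spacing so points land at true physical size."""
    sx, sy, sz = spacing_mm
    points: List[Point] = []
    for i, plane in enumerate(volume):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if value >= threshold:
                    points.append((i * sx, j * sy, k * sz, value))
    return points


# Usage: a tiny 2x2x2 volume with one strong echo in the far corner.
volume = [
    [[0.0, 0.1], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.9]],
]
print(voxels_to_point_cloud(volume, spacing_mm=(0.5, 0.5, 0.5), threshold=0.5))
```

Thresholding discards the faint speckle that fills most of an ultrasound volume, so only the points worth drawing are sent to the renderer; a real system would likely add filtering and registration to the probe's tracked position, which this sketch does not attempt.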
“It’s a difficult skill to master, and there are long learning curves,” said Jason Hou, a graduate student at MIT and one of the study’s lead authors. “The hardest thing is this mental tomography bottleneck where you’re trained to reconstruct the 2D slices in your 3D mental space. That is a cognitive burden that can lead to inaccuracies in scanning.”
In the experiments, participants attempted two tasks. The first was object identification: using the ultrasound system to determine what was embedded in a gelatin-filled opaque container — a spring, a ball, or a screw. The second was localization: finding and marking the precise position of a tissue-mimicking phantom inside the container, a task designed to replicate the challenge of placing a needle accurately for a biopsy. AR-VIU improved performance across both tasks for all 18 participants, and it narrowed the gap between the novices and experts on every metric the researchers tracked.
But the study also contains a result that is less flattering to the technology’s near-term prospects, and the researchers are candid about it. Most of the experts said they preferred the traditional two-dimensional system. Not because AR-VIU performed worse — it performed better for them, too — but because decades of training have built a fluency with flat images that the three-dimensional display cannot replicate. Many acknowledged they could see the benefits of AR-VIU for specific procedures, like needle placement in a biopsy. They did not see it replacing their standard workflow anytime soon.
That tension is not unique to this technology. Many major imaging innovations have encountered the same friction: the people who would benefit most from a new visualization tool are often not the expert practitioners already performing at high levels, but the trainees and generalists who currently cannot perform certain tasks at all. AI-guided surgical imaging has followed a similar arc in neurosurgery: adopted first in training environments, then gradually accepted for specific high-precision procedures, and only later considered for broader workflow integration. Point-of-care ultrasound in emergency medicine took decades to gain acceptance despite strong evidence of its diagnostic value, in part because its advocates were asking specialists to reconsider workflows they had refined over careers.
The MIT team’s immediate focus is on the training application, where resistance is likely to be lower. There, AR-VIU’s value proposition is less about replacing a veteran sonographer’s technique and more about giving a student something that no current training tool provides: a direct, real-time, spatially registered view of what the probe is seeing, without the interpretive leap from two dimensions to three. Shrihari Viswanath, the paper’s other lead author, said the system makes the invisible visible in a way that traditional instruction cannot. “Overlaying images with the anatomy and providing 3D visual context makes ultrasound significantly easier for novices to understand,” he said.
AR-VIU sits within a broader wave of medical imaging and device research that shares a common premise: that the most persistent limitations in clinical practice are not hardware problems but cognitive ones, and that the right interface can redistribute expertise in ways that training alone cannot.
The current version of AR-VIU is a research prototype. The probe requires a wired connection to external computing hardware, and the resolution of the volumetric rendering, while sufficient for the object-identification and localization tasks in the study, falls short of what would be required for fine clinical work like cardiac imaging or the detailed fetal measurements performed in obstetric ultrasound. The researchers acknowledge this openly, noting that resolution improvement and additional clinical accuracy testing are the next priorities before any hospital deployment could be considered.
The study was funded by the MIT Media Lab Consortium, the National Science Foundation, an MIT HEALS graduate fellowship, and an MIT-Tata graduate fellowship. Hou and Viswanath wrote the paper with Dagdeviren, its senior author; Bowen Wu, a 2024 MIT graduate; and two undergraduates participating in MIT’s Summer Research Program: Cinay Dilibal of Dartmouth College and Tanisha Shende of Oberlin College.
What the study does not yet answer is the question that will ultimately determine whether AR-VIU or any technology like it changes how ultrasound is practiced at scale: whether the performance gains measured in a controlled laboratory setting with gelatin phantoms and embedded objects hold up in the full complexity of a clinical encounter, with a patient on a table, time pressure in the room, and a diagnosis that actually matters.

