Welcome back to This Week in Cardiovascular AI - a bi-monthly email newsletter for followers and subscribers, summarizing recent research in cardiovascular AI and digital health that we find important and unique.
There is a lot of groundbreaking research to summarize this week, including PanEcho, a video-based deep learning model that fully automates echocardiography interpretation (a huge leap forward), and a remarkable millimeter-wave radar system for contactless atrial fibrillation monitoring! Plus, in a bonus section (What else I read…), we review the release of AlphaGenome from Google DeepMind, discuss why RAG might not make LLMs more accurate or safer for clinical use, and summarize a Circulation perspective on AI in cardiology.
Not a subscriber? Join now to stay up-to-date and consider upgrading to support our work!
Complete AI-Enabled Echocardiography Interpretation With Multitask Deep Learning
Journal of the American Medical Association (JAMA)
PanEcho is a multitask, video-based deep learning model developed to fully automate transthoracic echocardiogram (TTE) interpretation. It was trained on over 1.2 million videos across 32,265 studies and validated on multiple internal and external cohorts. Unlike prior single-task or single-view models, PanEcho processes multiview 2D grayscale and Doppler videos, performing 39 diagnostic and quantitative tasks. The model demonstrated high accuracy across diverse clinical settings, including real-world point-of-care ultrasonography (POCUS).
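For readers curious how a single model handles 39 tasks, the core idea is a shared video encoder feeding separate classification and regression heads. The sketch below is our own illustration - the layer choices, dimensions, and aggregation are assumptions, not PanEcho's published architecture (the authors have released their actual code):

```python
# Minimal sketch of a multitask echo video model: one shared encoder,
# many task-specific heads. Hypothetical layers/dimensions for illustration;
# see the released PanEcho code for the real architecture.
import torch
import torch.nn as nn

class MultitaskEchoModel(nn.Module):
    def __init__(self, feat_dim=512, n_diagnostic=18, n_parameters=21):
        super().__init__()
        # Shared spatiotemporal encoder over (batch, channels, frames, H, W) clips.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # One binary-classification head per diagnostic task (e.g., severe AS).
        self.cls_heads = nn.ModuleList([nn.Linear(feat_dim, 1) for _ in range(n_diagnostic)])
        # One regression head per continuous parameter (e.g., LVEF, IVSd).
        self.reg_heads = nn.ModuleList([nn.Linear(feat_dim, 1) for _ in range(n_parameters)])

    def forward(self, video):
        z = self.encoder(video)                        # shared representation
        cls_logits = torch.cat([h(z) for h in self.cls_heads], dim=1)
        reg_preds = torch.cat([h(z) for h in self.reg_heads], dim=1)
        return cls_logits, reg_preds

# Study-level prediction: average each task's output over all clips/views.
model = MultitaskEchoModel()
clips = torch.randn(4, 1, 16, 112, 112)               # 4 clips from one study
logits, params = model(clips)
study_logits, study_params = logits.mean(0), params.mean(0)
```

Averaging per-clip outputs across views is one simple way to form a study-level read; the published model's aggregation strategy may differ.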
Key Findings:
Diagnostic Classification Tasks (18 total)
Median AUC: 0.91 (IQR 0.88–0.93) internally; consistent across external cohorts
Examples:
Moderate or worse LV systolic dysfunction: AUC 0.98 (internal), 0.99 (external)
RV systolic dysfunction: AUC 0.93 (internal), 0.94 (external)
Severe aortic stenosis: AUC 0.98 (internal), 1.00 (external)
Parameter Estimation Tasks (21 total)
Median Normalized MAE: 0.13 (IQR 0.10–0.18) internally
Examples:
LVEF: MAE 4.2% (internal), 4.5% (external)
IVSd: MAE 1.3 mm; Aortic peak velocity: MAE 0.3 m/s
Performance in Abbreviated Protocols
Abbreviated TTE (limited views): Median AUC 0.91
POCUS (ED-acquired): Median AUC 0.85
Clinical Takeaways:
High Accuracy Across Tasks
PanEcho shows consistent, expert-level performance across diverse diagnostic tasks and quantitative measurements.
Multiview, Multitask Versatility
By aggregating data from multiple echocardiographic views, the model mimics human workflow more closely than single-view AI tools.
Robust Across Settings
Maintains performance in limited protocols (abbreviated or POCUS scans), extending its utility beyond lab-based echocardiography to EDs and potentially community clinics.
Open Source & Broadly Validated
Public release of the model and source code supports reproducibility and further research.
Potential Applications
Assistive tool in echo labs
Triage or screening aid in resource-limited settings
Automated QC or second-reader system for legacy echo data
Read it here: Link
Atrial Fibrillation Detection via Contactless Radio Monitoring and Knowledge Transfer
Nature Communications
This study introduces a novel, non-contact AI system for detecting atrial fibrillation (AF) using millimeter-wave radar to measure cardiac mechanical motions. The system leverages a teacher–student neural network architecture where knowledge is transferred from ECG-based diagnosis (teacher) to radar-based motion analysis (student). Validated on over 6,200 outpatient participants, the system achieves ECG-comparable performance and proves capable of detecting early, asymptomatic AF episodes—even during sleep—suggesting a scalable tool for lifelong AF monitoring.
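Here is a quick sketch of the teacher-student idea at the heart of the paper: a pretrained ECG "teacher" supplies soft AF labels that supervise a radar "student" on simultaneously recorded segments. The encoders, loss weighting, and hyperparameters below are our own assumptions, shown only to illustrate standard knowledge distillation rather than the authors' exact recipe:

```python
# Minimal teacher-student (knowledge distillation) sketch: a frozen ECG teacher
# provides soft AF labels that supervise a radar student on paired recordings.
# Encoders and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 2),
        )
    def forward(self, x):
        return self.net(x)  # logits: (batch, 2) for {sinus rhythm, AF}

teacher = Encoder(in_ch=1)   # stands in for a model pretrained on a large ECG corpus
student = Encoder(in_ch=1)   # trained on radar-derived cardiac motion signals
teacher.eval()

def distillation_loss(radar_x, ecg_x, labels, T=2.0, alpha=0.5):
    with torch.no_grad():
        soft_targets = F.softmax(teacher(ecg_x) / T, dim=1)
    student_logits = student(radar_x)
    # Hard-label cross-entropy plus a soft-label KL term from the teacher.
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    soft_targets, reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft

# Example with simultaneously recorded 30-second segments (synthetic here).
radar = torch.randn(8, 1, 3000)
ecg = torch.randn(8, 1, 3000)
y = torch.randint(0, 2, (8,))
loss = distillation_loss(radar, ecg, y)
loss.backward()
```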
Key Findings:
Study Population:
6,258 outpatient subjects; 229 had confirmed AF.
Simultaneous radio and ECG recordings collected.
Model Framework:
Uses a teacher-student deep learning model.
Teacher trained on large ECG dataset (27,485 subjects).
Student trained on radar signal data via knowledge transfer to detect AF-related mechanical signatures.
Core Technology:
3.8 mm-wavelength radar captures sub-millimeter cardiac motions.
AI separates cardiac signals from respiratory and body motion artifacts.
Performance:
Set-level (30-sec recordings):
F1 Score: 0.854
Sensitivity: 0.844 (CI: 0.790–0.884)
Specificity: 0.995 (CI: 0.993–0.997)
AUROC: 0.96
Real-World Testing:
In a sleep monitoring study, AF was detected before clinical diagnosis in 2 of 27 subjects.
Successfully tracked AF to sinus rhythm transitions post-ablation in 5 patients.
Robustness:
Maintained high accuracy across BMI categories.
Data augmentation experiments with >200k synthetic samples confirmed stable performance.
Ablation Studies:
Without knowledge transfer: F1 dropped by ~14%.
Without pretraining: F1 dropped by ~18%.
Demonstrated that both components are critical to model performance.
Clinical Takeaways:
Revolutionizes AF Screening: This radar-based AI tool enables completely contactless, operation-free AF detection, suitable for integration into daily environments (e.g., bedrooms, elder care facilities).
Enables Lifelong Monitoring: Non-intrusive monitoring can support early detection and management of asymptomatic or paroxysmal AF, improving outcomes.
Knowledge Transfer Boosts Precision: Repurposing ECG knowledge for radar interpretation is a scalable approach for creating robust AI models across sensing modalities.
Real-Time, Passive Screening: A promising solution for underdiagnosed or silent AF cases—especially in populations that avoid or cannot use wearables.
Limitations:
Performance degrades slightly with body motion.
Currently optimized for sleep or stationary settings.
Needs validation over longer timeframes and diverse populations.
Read it here: Link
Machine Learning-Based Plasma Protein Risk Score Improves Atrial Fibrillation Prediction Over Clinical and Genomic Models
Circulation: Genomic and Precision Medicine
This study presents the development and validation of Pro-AF, a machine learning-derived proteomic risk score that predicts incident atrial fibrillation (AF). Using 121 circulating protein levels from ~47,000 UK Biobank participants, the authors show that Pro-AF outperforms both traditional clinical risk models (CHARGE-AF) and polygenic risk scores (PRS) for 5-year AF prediction. The study also introduces a simplified 5-protein model (Simple Pro-AF) that retains most of the predictive power, potentially enabling scalable risk stratification.
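The reported pipeline - Boruta feature selection followed by a random forest over Olink protein analytes (detailed under Key Findings below) - can be sketched roughly like this. The synthetic data, feature counts, libraries (scikit-learn, BorutaPy), and hyperparameters are our illustrative assumptions, not the authors' code:

```python
# Rough sketch of a proteomic risk-score pipeline: Boruta feature selection
# followed by a random forest classifier. Data and settings are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from boruta import BorutaPy  # pip install Boruta

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 300))        # stand-in for thousands of Olink analytes
y = (X[:, 0] + X[:, 1] + rng.normal(size=2000) > 1.5).astype(int)  # incident AF label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boruta compares each feature's importance against shuffled "shadow" copies
# and keeps only features that consistently beat them.
rf_for_selection = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
boruta = BorutaPy(rf_for_selection, n_estimators="auto", random_state=0)
boruta.fit(X_train, y_train)
selected = boruta.support_

# Refit a random forest on the selected proteins and score held-out AUC.
clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
clf.fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test[:, selected])[:, 1])
print(f"{selected.sum()} proteins selected, hold-out AUC = {auc:.3f}")
```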
Key Findings:
Cohort:
Training: 32,631 participants; Internal test: 30,632; Hold-out test: 13,998.
~13 years of follow-up; 2045 incident AF events internally, 870 in hold-out.
Pro-AF Model:
Random forest algorithm trained on 2911 protein analytes.
Final model used 121 proteins selected via Boruta feature selection.
Top 5 influential proteins: NT-proBNP, NPPB, GDF15, EDA2R, BCAN.
Discrimination (AUC for 5-year AF):
Pro-AF: Internal = 0.761; Hold-out = 0.763.
CHARGE-AF: Internal = 0.719; Hold-out = 0.702.
PRS: Internal = 0.686; Hold-out = 0.682.
Simple Pro-AF (5 proteins): AUC = 0.750–0.759.
Adding clinical/genomic data did not improve Pro-AF performance.
Reclassification:
Net reclassification improvement (NRI) of Pro-AF over CHARGE-AF (see the brief note after this list):
Internal: +0.410
Hold-out: +0.430
Subgroup Results:
Best performance in age <55 (AUC = 0.812) and men (AUC = 0.781).
Minimal performance drop with 1-year lag, excluding HF/smokers, or alternate protein models.
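For readers less familiar with net reclassification improvement (the +0.410/+0.430 figures above), the metric sums how often the new score moves predicted risk in the right direction for events and non-events. A small illustrative helper using the category-free (continuous) form as a simplification - the paper may report a different NRI variant:

```python
# Category-free (continuous) net reclassification improvement, a simplified
# illustration of the metric reported above (not the authors' exact code).
import numpy as np

def continuous_nri(risk_old, risk_new, event):
    """NRI = (P(up|event) - P(down|event)) + (P(down|no event) - P(up|no event))."""
    risk_old, risk_new, event = map(np.asarray, (risk_old, risk_new, event))
    up = risk_new > risk_old
    down = risk_new < risk_old
    nri_events = up[event == 1].mean() - down[event == 1].mean()
    nri_nonevents = down[event == 0].mean() - up[event == 0].mean()
    return nri_events + nri_nonevents

# Toy example: the new score raises risk for most events and lowers it
# for most non-events, yielding a positive NRI.
old = [0.02, 0.03, 0.05, 0.04, 0.01, 0.02]
new = [0.06, 0.07, 0.04, 0.01, 0.02, 0.01]
y   = [1,    1,    1,    0,    0,    0]
print(round(continuous_nri(old, new, y), 3))
```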
Clinical Takeaways:
Superior Prediction: Pro-AF outperforms traditional risk tools and polygenic scores for predicting 5-year AF.
Simplified Score Is Scalable: Just 5 proteins (NT-proBNP, NPPB, GDF15, EDA2R, BCAN) nearly match full model performance—suitable for future clinical deployment.
Proteomics Encodes Clinical Risk: Adding CHARGE-AF or PRS to Pro-AF offers no gain, suggesting that proteomics encapsulates this information.
Stratifies High-Risk Individuals: Pro-AF identifies those at highest risk (>4% 5-year risk in top decile), which could guide screening or wearable deployment.
Limitations:
UK Biobank cohort limits generalizability (older, less diverse).
Platform-specific findings (Olink).
Prospective validation needed in real-world clinical workflows.
Read it here: Link
An Adaptive AI-based Virtual Reality Sports System for Adolescents with Excess Body Weight: A randomized controlled trial
Nature Medicine
This randomized controlled trial introduces REVERIE, an AI-driven virtual reality (VR) sports coaching system designed to deliver empathetic, personalized coaching to adolescents with obesity. Over the 8-week intervention, REVERIE improved physical performance, cognitive function, and body composition, with accompanying multi-omic changes. The system was benchmarked against physical exercise and routine PE and showed comparable or superior outcomes, making it a potentially scalable digital health tool for youth obesity interventions.
Key Findings:
Participants: 227 adolescents (aged 11–17) with overweight/obesity were randomized to 5 groups:
Physical table tennis
Physical soccer
REVERIE table tennis
REVERIE soccer
Control (standard PE)
Duration: 8 weeks of intervention with 6-month follow-up.
Intervention Design:
REVERIE is a transformer-based AI agent trained in a VR digital twin environment with personalized coaching templates.
Delivered via immersive VR headsets with sport-specific physical controllers.
Primary Outcomes:
Body fat mass decreased by ~1.1 kg in REVERIE and physical groups (p<0.01 vs. control).
Cognitive improvements were observed in memory, reaction time, and olfaction.
Multi-omic remodeling (lipidome, metabolome, proteome, microbiome) observed after REVERIE training.
Biomechanical equivalency: REVERIE coaching replicated real-world athletic instruction in force patterns and movement.
Usability & Engagement:
Deployment assessments reported high satisfaction, strong empathy ratings, and consistent, standardized coaching.
REVERIE reduced participant frustration, improved compliance, and increased motivation.
Mechanistic Insights:
REVERIE influenced brain circuits, circulating bio-signatures, and gut microbiota, suggesting a neuroimmune-metabolic axis effect.
REVERIE table tennis and soccer modulated cognitive and metabolic markers in a comparable way to their physical counterparts.
Clinical Takeaways:
Digital Health Meets Fitness: AI-driven VR coaching like REVERIE could become a key digital therapeutic for managing childhood obesity, delivering both physical and cognitive benefits.
Effective and Scalable: REVERIE maintained outcomes equivalent to physical training with high engagement—ideal for schools or home settings.
Targets Multiple Systems: Cognitive enhancement and metabolic shifts reflect systemic health benefits from immersive AI coaching.
Supports VR-Driven Personalized Exercise: Real-time feedback, empathy modeling, and biomechanical precision make REVERIE distinct from generic exergaming.
Limitations:
Single-country (China), school-based sample.
Need validation in more diverse populations and settings.
The cost and limited accessibility of VR hardware may constrain scale-up without further optimization.
Read it here: Link
What else I read…
Release of Google DeepMind’s AlphaGenome
Link: Google AlphaGenome
From: Science, MIT Tech Review
DeepMind's AlphaGenome is a groundbreaking AI model designed to understand how variations in DNA sequences impact gene regulation and other molecular processes. Unlike previous models that often specialized in short sequences or specific tasks, AlphaGenome can analyze incredibly long DNA sequences (up to 1 million base pairs) at a high resolution (down to individual "letters" of DNA). This allows it to predict a wide range of molecular properties that characterize regulatory activity, including gene expression levels, RNA splicing patterns, chromatin accessibility, and protein binding.
How AlphaGenome works:
AlphaGenome employs a hybrid neural network architecture that combines convolutional layers for detecting short patterns in the genome with transformer models (similar to those used in large language models) to capture long-range dependencies across the DNA sequence. This unique combination enables it to model complex interactions within the non-coding regions of DNA, which make up about 98% of the human genome and are crucial for regulating gene activity but have historically been difficult to interpret.
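To make the "convolutions for local motifs, transformers for long-range context" idea concrete, here is a toy model in that spirit. Every detail below (layer sizes, strides, output tracks) is our own simplification and not DeepMind's architecture:

```python
# Toy hybrid sequence model: convolutions detect local motifs, a transformer
# relates distant positions. Sizes, strides, and heads are illustrative only.
import torch
import torch.nn as nn

class ToyRegulatoryModel(nn.Module):
    def __init__(self, d_model=128, n_tracks=4):
        super().__init__()
        # Local motif detection over one-hot DNA (A, C, G, T channels),
        # with strided convs that downsample the long input window.
        self.conv = nn.Sequential(
            nn.Conv1d(4, d_model, kernel_size=15, stride=4, padding=7), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=4, padding=2), nn.ReLU(),
        )
        # Self-attention over the downsampled sequence captures long-range
        # interactions (e.g., enhancer-promoter relationships).
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Per-position predictions for several regulatory "tracks"
        # (e.g., expression, accessibility, binding) - names are illustrative.
        self.head = nn.Linear(d_model, n_tracks)

    def forward(self, dna_onehot):           # (batch, 4, seq_len)
        x = self.conv(dna_onehot)            # (batch, d_model, seq_len/16)
        x = x.transpose(1, 2)                # (batch, positions, d_model)
        x = self.transformer(x)
        return self.head(x)                  # (batch, positions, n_tracks)

model = ToyRegulatoryModel()
seq = torch.randn(1, 4, 16384)               # stand-in for a one-hot DNA window
tracks = model(seq)                           # (1, 1024, 4)
```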
How AlphaGenome might revolutionize healthcare:
AlphaGenome has the potential to transform healthcare in several ways:
Enhanced Disease Understanding:
Pinpointing disease causes: The vast majority of disease-linked genetic variants reside in the non-coding regions of DNA. AlphaGenome's ability to accurately predict the effects of these non-coding mutations can help researchers pinpoint the precise genetic disruptions that lead to diseases, including complex conditions like cancer and heart disease.
Understanding rare genetic diseases: Many rare genetic diseases are caused by errors in RNA splicing. AlphaGenome introduces a novel approach to modeling RNA splicing, explicitly predicting the location and expression level of splice junctions directly from the DNA sequence. This offers deeper insights into the consequences of genetic variants on splicing and could lead to better understanding and diagnosis of these diseases (e.g., spinal muscular atrophy, cystic fibrosis).
Uncovering new therapeutic targets: By understanding how specific genetic variants influence gene regulation and molecular processes, AlphaGenome can help identify new biological pathways and targets for drug development.
Accelerated Research and Drug Discovery:
Virtual experiments: AlphaGenome allows researchers to "virtually" test the impact of genetic variants and simulate edits, significantly reducing the need for time-consuming and expensive laboratory experiments. This can accelerate the identification and prioritization of mutations that are functionally significant.
Faster hypothesis generation and testing: By providing a comprehensive view of a variant's impact across multiple biological modalities with a single API call, AlphaGenome enables scientists to generate and test hypotheses more rapidly.
Mapping functional elements: The model can accelerate fundamental research by mapping crucial functional elements of the genome and defining their roles, deepening our understanding of how the genome operates.
Personalized Medicine (Future Potential): While AlphaGenome is currently for research and not for clinical diagnosis or personal genome interpretation, its ability to predict the functional impact of individual genetic variations holds immense promise for personalized medicine. In the future, this could lead to:
More precise diagnoses: For patients with unexplained conditions, especially rare diseases, AlphaGenome could help identify the underlying genetic cause.
Tailored treatments: Understanding how a patient's unique genetic profile influences disease mechanisms could enable the development of more effective and personalized treatment strategies.
Proactive care and prevention: By predicting the impact of genetic predispositions, it might become possible to implement preventative measures or early interventions.
Current Limitations and Future Outlook:
Despite its remarkable advancements, AlphaGenome still has limitations. It may struggle with predicting the effects of very distant DNA interactions (over 100,000 base pairs apart) and has not yet been validated for personal genome interpretation or direct clinical use. DeepMind is actively working to address these limitations and is making AlphaGenome available via an API for non-commercial research, encouraging collaboration from the scientific community to further develop and refine its capabilities.
AlphaGenome represents a significant leap in our ability to understand the "dark matter" of our genome. By shedding light on how non-coding DNA influences gene regulation and disease, it has the potential to accelerate biological discovery and fundamentally transform the way we approach disease research, diagnosis, and treatment in healthcare.
…
Does Retrieval Augmented Generation Really Make LLMs Safer?
From: Research paper release
A recent study, "RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models," investigates the safety of Large Language Models (LLMs) when used with Retrieval-Augmented Generation (RAG) frameworks. The research, conducted by Bang An, Shiyue Zhang, and Mark Dredze from Bloomberg AI, the University of Maryland, and Johns Hopkins University, challenges the assumption that RAG makes LLMs safer.
Key Findings:
RAG Can Make LLMs Less Safe: The study compared 11 popular LLMs in both non-RAG and RAG settings using over 5,000 harmful questions. Surprisingly, RAG was found to introduce unsafe behaviors, making models less safe and altering their safety profiles. For example, Llama-3-8B, which had an unsafe response rate of 0.3% in a non-RAG setting, saw that rate jump to 9.2% with RAG. This increase in unsafe responses was observed across nearly all safety categories.
Safe Documents Don't Guarantee Safety: Even when using safe models and documents, RAG systems can still generate unsafe content. The study found that only a small percentage of retrieved documents (5.3%) contained harmful answers to queries. Yet, the probability of unsafe outputs rose sharply even with safe documents, significantly exceeding that of non-RAG settings.
Causes of Unsafe Generations in RAG: Two main phenomena were identified:
Repurposing Information: LLMs sometimes repurpose information from retrieved documents in harmful ways. An example cited is a document about police using GPS trackers being twisted into advice on how to use GPS to evade pursuit.
Leveraging Internal Knowledge: Despite instructions to rely solely on provided documents, LLMs frequently supplement responses with their internal knowledge, which can introduce unsafe content that might not surface in a non-RAG setting. This may occur because the RAG model's instruction to summarize sources encourages it to draw on its own knowledge, potentially bypassing safety training.
Impact of Context Size: The study found that even a single retrieved document can significantly alter an LLM's safety behavior, and providing more documents tends to increase the likelihood of the model answering harmful questions.
Red-Teaming Ineffectiveness: Existing red-teaming methods, designed for non-RAG settings, are less effective for RAG-based LLMs. Jailbreaking prompts optimized for non-RAG settings largely failed to compromise models in RAG settings. While optimizing prompts specifically for RAG improved effectiveness, a gap between testing and training conditions still existed.
Implications for Clinical and Technical Fields:
For clinical applications, where LLMs might be used for information retrieval and generation (e.g., summarizing medical literature, answering patient questions), these findings are critical. The potential for RAG-based LLMs to generate unsafe or misleading information, even from seemingly safe source documents, highlights a need for rigorous safety protocols. Relying on RAG to reduce hallucinations might inadvertently introduce other safety risks, especially if the model repurposes information or injects its own knowledge. This could lead to the spread of misinformation (e.g., misleading medical advice) or other harmful content.
From a technical standpoint, the research underscores the necessity for:
RAG-Specific Safety Fine-Tuning: Current safety fine-tuning methods are not sufficient for RAG settings. New methods specifically designed for RAG tasks are needed to ensure models remain safe when synthesizing information from retrieved documents.
New Red-Teaming Approaches: Developing effective red-teaming methods tailored for RAG-based models is crucial to identify vulnerabilities before deployment. This might involve strategies that account for dynamic document retrieval and optimization processes.
Mechanism Interpretability: Further research into why safe documents lead to unsafe responses is needed. Techniques like mechanism interpretability could offer insights into these phenomena, guiding improvements in safety.
While RAG offers benefits like reduced hallucinations and access to updated information, this study demonstrates that it does not inherently make LLMs safer. Instead, it introduces new and complex safety challenges that require dedicated research and tailored solutions for their responsible deployment in sensitive applications.
…
AI in Cardiology: Bridging the Gap from Bytes to Beats
From: Circulation
Artificial intelligence (AI) and machine learning are rapidly transforming cardiovascular care by learning complex patterns from data and automating intricate tasks traditionally performed by clinicians. This offers the potential for more personalized care than ever before.
How AI Adds Value in Cardiovascular Care:
Enhanced Sensors: AI can make smartwatches, scanners, and Wi-Fi routers more accurate, less invasive, and capable of providing high-resolution data over time and space. These advanced sensors need to be foolproof and operate across various healthcare settings to ensure reliable data collection.
Improved Information Analysis: Current medical record and imaging systems hold vast amounts of diverse patient data. AI development is focused on improving the retrieval and analysis of this complex, multi-dimensional medical information.
Superior Reasoning: While clinicians excel at reasoning with limited data, AI can facilitate multi-system, multi-dimensional, and temporal analysis that goes beyond human cognitive limits.
Surgical and Interventional Assistance: AI can assist in complex procedures, reduce procedure times, and help clinicians achieve greater control and standardization in surgical and interventional procedures through technologies like 3D printing and robot-assisted surgery.
Augmented Relationships: AI can assist in understanding human emotions and automating some patient communications, but the crucial role of human interaction in healthcare remains paramount. Ideally, AI should enhance, not replace, the human aspect of patient care.
The Rise of Foundation Models and Their Challenges:
Recent AI advancements have been driven by "foundation models" – enormous AI models trained on massive datasets (often from the internet) to perform general tasks like text analysis or image captioning. These models have shown impressive performance gains, with some, like DeepSeek, achieving benchmarks at a fraction of the size of other leading models, potentially enabling faster AI deployment in clinical settings. Some are already saving clinicians time by helping compose patient messages within electronic health records.
However, these statistical models face several challenges:
Edge Cases and Bias: They excel at learning average behaviors but struggle with edge cases, which are critical in medical innovation and personalized care. This can lead to bias in detection and diagnosis, especially for underrepresented groups or data from older/less sophisticated equipment not well-represented in training.
Confounders and Hallucinations: AI models can make predictions based on irrelevant data artifacts (confounders). Large foundation models can also "hallucinate," producing statistically plausible but factually incorrect results, which is problematic for high-stakes medical applications.
Efficiency: Despite their massive size (billions of parameters), large language models sometimes show only marginal performance improvements that don't meet clinical requirements, highlighting a need for better data and computing efficiency. Current AI's reliance on statistical association contrasts with human medical learning, which uses structured knowledge and efficient reasoning.
Overcoming Implementation Challenges:
To advance clinical care, AI must add value to existing clinical capabilities. Overcoming the practical and ethical challenges of developing AI for clinical cardiology is crucial. Key areas of focus for successful implementation include:
Accuracy and Generalizability: AI models must perform accurately and generalize to diverse patients and data sources, including imaging from different scanners, and across both rare/complex diseases and the general community.
Ease of Deployment: Models need to be easily integrated and deployed within existing clinical workflows.
Addressing Bias: Recognizing and overcoming biases in statistical models is critical. This involves ensuring diverse and representative training data, considering patient diversity as well as equipment types to prevent poor performance on data from older or less sophisticated technologies.
Combating Hallucinations and Confounders: Researchers are developing methods to imbue foundation models with medical knowledge to overcome limitations like hallucinations and sensitivity to confounders. For instance, retrieval-augmented generation allows large language models to consult factual databases to avoid hallucination. Physics-informed neural networks and graph neural networks can combine AI's power with mathematical knowledge and reasoning.
Focus on Efficiency and Medical Knowledge: The field should prioritize model efficiency over sheer size and demand systems that learn underlying medical knowledge rather than relying on statistical association alone.
The Future of AI in Cardiology:
As noted above, researchers are actively developing methods to integrate medical knowledge into foundation models (retrieval-augmented generation that consults factual databases, physics-informed neural networks, and graph neural networks that pair AI with mathematical reasoning), leading to more reliable, efficient, and interpretable systems.
The paper outlines two potential futures for AI in cardiovascular care:
Status Quo: A few proprietary foundation models might dominate, with varying transparency and potentially without significant reductions in human labor or guaranteed clinical-grade performance.
Knowledge-Driven AI: We can demand AI systems that truly learn underlying medical knowledge, prioritizing model efficiency over sheer size and augmenting rather than replacing physician capabilities. This future promises AI that can reason and experiment with complex data, elevating clinical decision-making.
The immense promise of AI in cardiology includes improved diagnosis and prognosis, streamlined care, enhanced access, discovery of new disease mechanisms, and more effective treatments. Realizing this future requires informed decisions, strategic investments, and a commitment to maintaining critical thinking and the human touch in an increasingly digital medical landscape.