Welcome back to This Week in Cardiovascular AI - a bi-monthly email newsletter for followers and subscribers that summarizes recent research in Cardiovascular AI and Digital Health that we find important and unique.
The biggest news in this edition is the introduction of EchoNext, a deep learning model that can accurately detect various forms of structural heart disease (SHD) from electrocardiograms (ECGs). This represents a significant advancement in SHD screening. While other studies have shown the potential of AI in identifying heart disease, they have often been limited by development in narrow populations or by targeting only select heart conditions. The EchoNext platform was trained on over 1 million heart rhythm and imaging records across a large and diverse health system, detects a broad composite of clinically relevant SHDs, and outperformed cardiologists in a head-to-head comparison. The public release of the model's weights and a large, annotated dataset also fosters further development and transparency in the field of AI in healthcare.
Full coverage below, plus a lot more, including our special section, What else I read…
Detecting Structural Heart Disease from Electrocardiograms Using AI
Nature
This landmark study introduces EchoNext, a deep learning model trained on over 1.2 million ECG–echocardiogram pairs from 230,000+ patients across 8 hospitals, to detect a broad composite of structural heart diseases (SHD) from 12-lead ECGs. EchoNext demonstrated robust performance across diverse populations and care settings, surpassed cardiologists in head-to-head comparisons, and successfully identified new cases of SHD in a real-world clinical trial. All code, model weights, and a 100,000-patient dataset were publicly released, establishing a new benchmark for AI-ECG SHD detection.
Key Findings
Model Development & Cohorts:
Training data: 1,245,273 ECG–echo pairs from 230,318 adults (NYP hospitals, 2008–2022).
Train/Validation/Test: ~800k ECG–echo pairs (train) / 36k / 45k patients (validation and test, 1 ECG–echo pair per patient).
Outcome: Composite SHD label derived from echo reports:
LVEF ≤ 45%
LV wall thickness ≥ 1.3 cm
RV dysfunction
Pulmonary hypertension
Moderate/severe valve disease
Moderate/large pericardial effusion
Model Architecture:
EchoNext: Multitask convolutional neural network
Inputs: ECG waveforms + demographic and ECG-derived features
Outputs: Composite SHD prediction + individual disease probabilities
Internal Validation (NYP Test Set, n = 44,719):
Composite SHD prediction:
AUROC = 85.2%
AUPRC = 78.5%
Diagnostic odds ratio = 12.8
Top-performing individual labels:
RV dysfunction: AUROC 91%
LVEF ≤ 45%: AUROC 90%
Tricuspid regurgitation: AUROC 87%
Lowest-performing labels:
Aortic regurgitation, pericardial effusion, pulmonary regurgitation (AUROC ~78–80%)
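For readers less familiar with the diagnostic odds ratio reported above: it is the odds of a positive result among patients with disease divided by the odds of a positive result among patients without it. A minimal sketch follows. The 80% sensitivity and 75% specificity are illustrative assumptions, not operating points taken from the paper.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec)."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    # Odds of testing positive when disease is present.
    odds_pos_given_disease = sensitivity / (1 - sensitivity)
    # Odds of testing positive when disease is absent.
    odds_pos_given_healthy = (1 - specificity) / specificity
    return odds_pos_given_disease / odds_pos_given_healthy


# Hypothetical operating point (assumed for illustration only).
dor = diagnostic_odds_ratio(0.80, 0.75)
print(round(dor, 1))
```

A DOR of 1 means the test does not discriminate at all; larger values mean a positive result is increasingly more likely in diseased than in non-diseased patients.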
External Validation:
Sites: Cedars-Sinai, UCSF, Montreal Heart Institute (n = 27,000+ patients)
AUROC: 78.4–80.0% (roughly 5–7 percentage points below NYP)
AUPRC: Similar or higher due to higher SHD prevalence at external sites
Performance remained robust across age, sex, race, and care settings.
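The point that AUPRC can hold steady or rise even when AUROC falls comes down to prevalence: at a fixed sensitivity and specificity, precision (PPV) climbs as the condition becomes more common. Here is a small sketch using Bayes' rule. The sensitivity, specificity, and prevalence values are made up for illustration and are not from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test applied at two hypothetical prevalences.
ppv_low_prevalence = ppv(0.80, 0.80, prevalence=0.20)
ppv_high_prevalence = ppv(0.80, 0.80, prevalence=0.50)
print(round(ppv_low_prevalence, 2), round(ppv_high_prevalence, 2))
```

Because precision feeds directly into the precision-recall curve, comparing AUPRC across sites with different SHD prevalence needs this caveat in mind.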
Clinical Deployment & Trials
Silent Deployment (NYP, 2023):
84,875 patients with no prior echo
EchoNext ran silently on ECGs
18% (n = 15,094) received an echo during follow-up
38% of those were diagnosed with SHD
Model AUROC = 83%, AUPRC = 81%
Head-to-Head: AI vs. Cardiologists:
Cardiologists’ accuracy = 64% (without AI), 69% (with AI)
EchoNext alone = 77.3% accuracy, outperforming both
DISCOVERY Trial (Prospective):
100 patients without prior echo, stratified by AI score using ValveNet (prior model)
EchoNext post-hoc analysis:
High-risk group: 73% had SHD
Low-risk group: 6% had SHD
Statistically significant stratification (p < 0.001)
Clinical Takeaways
AI Surpasses Human Interpretation for SHD Detection: EchoNext outperformed experienced cardiologists in detecting structural heart disease using ECG alone—especially valuable in preclinical or non-specific presentations.
Broad Disease Coverage and Generalizability: Unlike previous models focused on a single pathology, EchoNext predicts a wide range of SHD phenotypes, generalizing across 11 hospitals and 4 health systems.
Actionable, Scalable, and Publicly Available: The model’s output can trigger echo referrals for patients who may otherwise be missed. Its public release (including 100k ECGs and model weights) enables rapid innovation and validation.
Supports Opportunistic Screening: Real-world deployment shows potential for early, low-cost detection of SHD in patients undergoing ECGs for unrelated reasons—especially impactful in under-tested populations.
Future Directions: Multimodal, Longitudinal AI: The authors advocate expanding AI-ECG to integrate serial ECGs, labs, or imaging data—moving from point prediction to continuous risk estimation.
Read it here: Link
Artificial Intelligence‐Enabled Short‐Term Ambulatory Monitoring ECG during Sinus Rhythm for Prediction of Hidden Atrial Fibrillation
Journal of Cardiovascular Electrophysiology
This retrospective study developed a deep learning model that predicts occult atrial fibrillation (AF) using short-term 3-lead Holter ECG recordings during normal sinus rhythm (SR). The AI model integrates a two-stage architecture that classifies both segment-level and patient-level AF risk from SR ECGs, demonstrating strong performance and outperforming traditional AF risk scoring systems. Night-time recordings yielded superior predictive accuracy, suggesting diurnal variation in electrocardiographic signatures of latent AF.
Key Findings
Model Architecture:
Two-Stage Deep Learning Framework:
Stage 1: ConvNeXt-B CNN classifies 60-second SR ECG segments using spectrograms derived from V1, V5, and lead II via short-time Fourier transform (STFT).
Stage 2: Features from 10-minute ECG recordings are extracted using the ConvNeXt-B encoder and passed to a two-layer LSTM network for patient-level classification.
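To make the spectrogram step in Stage 1 more concrete, here is a rough sketch of a short-time Fourier transform written with the Python standard library only. The window length, hop size, 64 Hz sampling rate, and synthetic sine wave are all assumptions chosen for illustration; the summary does not state the study's actual STFT settings, and a real pipeline would more likely use a signal-processing library.

```python
import cmath
import math


def stft_magnitude(signal, window_size, hop):
    """Magnitude spectrogram: Hann-windowed DFT over overlapping frames."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
            for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window_size)]
        spectrum = []
        # Keep only the non-negative frequency bins (0 .. window_size / 2).
        for k in range(window_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for one ECG lead: an 8 Hz sine sampled at an assumed 64 Hz.
fs = 64
toy_signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(4 * fs)]
spectrogram = stft_magnitude(toy_signal, window_size=64, hop=32)

# With a 64-sample window at 64 Hz, bin k corresponds to k Hz.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spectrogram]
print(len(spectrogram), peak_bins[0])
```

Each row of the result is one time frame and each column one frequency bin, which is the image-like layout that lets a CNN such as ConvNeXt treat an ECG segment as a picture.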
Training Details:
Loss Function: Focal loss
Optimizer: AdamW
Learning Rate: 1e-5 with Reduce-on-Plateau
Epochs: 200
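Focal loss is a variant of cross-entropy that down-weights easy, confidently classified examples so that training concentrates on the hard ones. A minimal single-example sketch is below. The gamma and alpha values are the commonly cited defaults from the original focal loss paper, not settings reported for this study.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one prediction: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class, y is 0 or 1.
    gamma and alpha are assumed defaults, not the study's values.
    """
    p = min(max(p, eps), 1 - eps)             # avoid log(0)
    p_t = p if y == 1 else 1 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class weighting term
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # true positive scored low
print(easy < hard)
```

Setting gamma to 0 removes the focusing term and leaves an alpha-weighted cross-entropy.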
Study Cohort:
Total: 934 patients
AF Group: 640 with paroxysmal AF detected on Holter
Control Group: 294 with no AF history or detection
ECG data derived from Medilog® 3-lead Holter system, mimicking V1, V5, and lead II.
Performance:
Diurnal Analysis:
Night-time recordings: AUC = 0.903, Accuracy = 0.832, Specificity = 0.844
Day-time recordings: AUC = 0.869, Accuracy = 0.800, Specificity = 0.757
Comparative Performance:
Traditional AF risk scores showed inferior AUCs:
Taiwan AF score: 0.793
CHA2DS2-VASc: 0.731
mC2HEST: 0.768
HATCH: 0.717
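As a reminder of what these AUC figures measure: the AUC is the probability that a randomly chosen AF patient receives a higher score than a randomly chosen control. The sketch below computes it by direct pairwise comparison. The scores are invented for illustration and have nothing to do with the study data.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs for four AF patients and four controls.
af_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc(af_scores, control_scores)
print(toy_auc)
```

On this reading, a clinical score with an AUC near 0.72 ranks a case above a control about 72% of the time, against roughly 90% for the night-time AI model.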
Clinical Takeaways
AF Risk Detection from Sinus Rhythm: This study shows that AI can extract subtle electrocardiographic signatures from SR (likely related to P-wave morphology, autonomic tone, or atrial substrate) that identify patients with occult paroxysmal AF.
Night-time ECGs Enhance Detection: Night-time recordings outperform day-time ones, potentially due to reduced noise, lower heart rates, or more consistent atrial conduction patterns.
Real-World Holter Applicability: Unlike most studies using 12-lead ECGs or implantable devices, this model performs well using 3-lead Holter data.
Improves on Existing Risk Scores: Outperforms the four conventional clinical risk scores tested (Taiwan AF score, CHA2DS2-VASc, mC2HEST, and HATCH), including those derived from large, population-based cohorts.
Guides Holistic AF Monitoring Strategy: The algorithm can be used to triage patients for prolonged ECG monitoring, identifying those with latent AF not captured in short Holter recordings.
Read it here: Link
Artificial Intelligence-enabled Electrocardiography and Echocardiography to Track Preclinical Progression of Transthyretin Amyloid Cardiomyopathy
European Heart Journal
This multicenter study demonstrates that artificial intelligence (AI) applied to standard transthoracic echocardiography (TTE) and 12-lead electrocardiograms (ECGs) can detect the preclinical progression of transthyretin amyloid cardiomyopathy (ATTR-CM) years before conventional diagnosis. The AI-Echo and AI-ECG models were trained to identify subtle phenotypic changes predictive of disease and validated across two large health systems. When used together, these tools offer a scalable, non-invasive, and highly sensitive strategy to risk-stratify and monitor individuals long before they undergo nuclear imaging.
Key Findings
Model Architecture:
AI-Echo: 3D ResNet-18 architecture trained on standard TTE views (e.g., apical 4-chamber, parasternal long axis) with mean-averaging across views.
AI-ECG: EfficientNet-B3 backbone trained on ECG image renderings (from GE and Philips machines) using contrastive learning initialization.
Both models used balanced binary cross-entropy losses, data augmentation, and early stopping during training.
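One common way to "balance" binary cross-entropy is to weight each class by the inverse of its frequency, so that a small number of ATTR-CM cases is not drowned out by the controls. The sketch below shows that idea. The inverse-frequency weighting scheme is an assumption on our part, since the summary does not spell out how the authors balanced their loss.

```python
import math


def balanced_bce(probs, labels, eps=1e-7):
    """Class-weighted binary cross-entropy with inverse-frequency weights.

    Weights are n / (2 * n_class), so each class contributes half of the
    total weight regardless of how imbalanced the labels are (assumed scheme).
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one example of each class")
    w_pos = n / (2 * n_pos)
    w_neg = n / (2 * n_neg)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1 - p)
    return total / n


# One case and three controls, all scored 0.5 (toy values).
loss = balanced_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
print(round(loss, 4))
```

In this toy batch the single case carries as much total weight as the three controls combined.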
Training & Validation:
Training set: De novo training on 308 TTE studies and a matched set of ECGs from patients with or without confirmed ATTR-CM.
Testing cohorts:
Internal: 984 patients from Yale-New Haven Health System (YNHHS).
External: 806 patients from Houston Methodist Hospitals (HMH).
Combined data: 7,352 TTEs and 32,205 ECGs.
Performance:
Cross-sectional discrimination:
AI-Echo: AUROC = 0.93 (95% CI: 0.90–0.96)
AI-ECG: AUROC = 0.91 (95% CI: 0.88–0.93)
Longitudinal trends:
AI probabilities began to diverge 3+ years before diagnosis in those who eventually tested positive.
Annualized probability progression:
AI-Echo: 2.6%/yr (cases) vs 0.7%/yr (controls) at YNHHS.
AI-ECG: 1.4%/yr (cases) vs 0.7%/yr (controls) at YNHHS.
Similar patterns replicated in HMH cohort.
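An annualized progression rate of this kind can be thought of as the slope of AI probability against time. The sketch below fits an ordinary least-squares line to one hypothetical patient's serial probabilities. Both the data and the use of simple least squares are assumptions for illustration; the summary does not describe the statistical model the authors actually used.

```python
def annualized_slope(years, probs):
    """Least-squares slope of probability versus time (probability units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(probs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, probs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Hypothetical serial AI probabilities, indexed by years relative to diagnosis.
years_to_dx = [-4, -3, -2, -1, 0]
ai_probs = [0.02, 0.04, 0.06, 0.08, 0.10]
slope = annualized_slope(years_to_dx, ai_probs)
print(f"{slope * 100:.1f} percentage points per year")
```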
Diagnostic Utility:
1–3 years pre-diagnosis, threshold ≥0.05:
AI-Echo: Sensitivity 78% (YNHHS), 49% (HMH); Specificity 69% (YNHHS), 79% (HMH)
AI-ECG: Sensitivity 75% (YNHHS), 65% (HMH); Specificity 56% (YNHHS), 47% (HMH)
Combined AI-ECG + AI-Echo:
Double-negative rule-out (positive if either model is at or above threshold): Sensitivity ~91% (YNHHS), ~86% (HMH)
Double-positive: Specificity ~86% (YNHHS), ~89% (HMH)
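The trade-off behind combining the two models can be sketched with simple probability. If the tests were conditionally independent given disease status (an assumption made here for illustration, which will not hold exactly for two tests of the same heart), an "either positive" rule raises sensitivity and a "both positive" rule raises specificity, as follows. The inputs are the YNHHS single-model figures quoted above.

```python
def either_positive(sens_a, spec_a, sens_b, spec_b):
    """OR rule: flag if either test is positive (a double-negative rules out)."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)
    spec = spec_a * spec_b
    return sens, spec


def both_positive(sens_a, spec_a, sens_b, spec_b):
    """AND rule: flag only if both tests are positive (a double-positive rules in)."""
    sens = sens_a * sens_b
    spec = 1 - (1 - spec_a) * (1 - spec_b)
    return sens, spec


# YNHHS single-model figures from this summary: AI-Echo 78%/69%, AI-ECG 75%/56%.
or_sens, or_spec = either_positive(0.78, 0.69, 0.75, 0.56)
and_sens, and_spec = both_positive(0.78, 0.69, 0.75, 0.56)
print(round(or_sens, 3), round(and_spec, 3))
```

Under independence this gives a sensitivity of about 0.945 for the OR rule and a specificity of about 0.864 for the AND rule, against the reported ~91% and ~86% at YNHHS. A gap on the sensitivity side is what one would expect when the two models tend to miss some of the same patients.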
Clinical Takeaways
Scalable Early Detection: This study provides strong evidence that AI models can detect subclinical changes in routine TTE and ECGs that precede clinical diagnosis of ATTR-CM by years.
Multimodal Synergy Enhances Accuracy: Using both AI-Echo and AI-ECG in tandem improves screening performance. A double-negative result offers high sensitivity to safely rule out disease, while a double-positive boosts specificity.
Applicable to Real-World Populations: The models generalized well across two distinct health systems with differing demographics and referral patterns.
Supports Risk-Guided Therapeutic Deployment: AI-driven phenotyping could enable timely identification and monitoring of at-risk patients, potentially informing when to initiate disease-modifying therapies or refer for confirmatory imaging.
Future Integration with Longitudinal Monitoring: The models support a paradigm shift from episodic testing to continuous digital biomarker tracking, enabling precision surveillance in high-risk groups.
Read it here: Link
Feasibility of Machine Learned Intracardiac Electrograms to Predict Postinfarction Ventricular Scar Topography
Circulation: Arrhythmia and Electrophysiology
This preclinical study developed a machine learning pipeline to classify postinfarction ventricular scar topography—including scar depth—using intracardiac electrograms (EGMs) recorded from multielectrode catheters in a sheep infarct model. By co-registering 3D electroanatomic maps with whole-heart histology, the study demonstrates that convolutional neural networks (CNNs), especially when trained on full-length unipolar EGM time series, significantly outperform voltage-based and signal-processed features in predicting scar distribution across endocardial, intramural, and epicardial layers.
Key Findings
Model Architecture:
Signal Processing + Gradient Boosting:
794 features extracted from windowed bipolar EGMs using the tsfresh Python package.
Gradient boosting classifiers trained on the top 20 and top 100 features.
Deep Learning CNN (InceptionTime):
Trained separately on:
Windowed bipolar EGMs
Full 2.5-second bipolar EGMs
Full 2.5-second unipolar EGMs
Ensemble of 5 CNNs using varying kernel sizes for multiscale temporal feature capture.
Training & Validation:
Subjects: 5 infarcted sheep + 1 control; 18 electroanatomic maps (3 wavefronts per animal).
Data:
20,091 EGMs collected; 11,551 matched to 421 histologically segmented biopsies.
Scar patterns labeled as:
No scar
At least endocardial scar
At least intramural scar
Epicardial-only scar
Performance:
Voltage-based classification:
Bipolar + unipolar voltages alone: AUCs 0.586–0.706; poor sensitivity for deep scars.
Signal-processed features (Top 20):
AUC: 0.815 (no scar), 0.810 (endocardial), 0.704 (intramural), 0.681 (epicardial-only)
Many top features involved frequency-domain analysis (e.g., Fourier transforms).
CNN (Best = Full Unipolar Time Series):
No scar: AUC = 0.977, Accuracy = 0.929
Endocardial: AUC = 0.970, Accuracy = 0.919
Intramural: AUC = 0.909, Accuracy = 0.959
Epicardial-only: AUC = 0.926, Accuracy = 0.958
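For contrast with the CNN results, the voltage-based baseline is essentially a one-line rule: measure the peak-to-peak amplitude of an electrogram and call it low voltage if it falls under a cutoff. The sketch below uses the 1.5 mV bipolar cutoff mentioned in this summary. The sample electrograms are invented, and the function names are hypothetical.

```python
def peak_to_peak_mv(egm_mv):
    """Peak-to-peak amplitude of one electrogram, in millivolts."""
    return max(egm_mv) - min(egm_mv)


def is_low_voltage(egm_mv, cutoff_mv=1.5):
    """Flag an electrogram as low voltage (possible scar) below the cutoff."""
    return peak_to_peak_mv(egm_mv) < cutoff_mv


# Invented bipolar EGM samples in mV, not real recordings.
healthy_egm = [0.0, 1.2, -1.1, 0.3, 0.0]   # peak-to-peak 2.3 mV
scarred_egm = [0.0, 0.3, -0.4, 0.1, 0.0]   # peak-to-peak 0.7 mV
print(is_low_voltage(healthy_egm), is_low_voltage(scarred_egm))
```

A rule like this reduces the whole waveform to a single number, which helps explain why it struggles to separate intramural and epicardial scar, while the CNN sees the full time series.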
Mapping Output:
Generated 3D scar prediction maps importable into electroanatomic mapping systems, enabling direct integration into clinical workflows.
Clinical Takeaways
Limits of Voltage Thresholds: Traditional voltage cutoffs (e.g., bipolar <1.5 mV) underperform, especially for non-endocardial scars. Deep learning reveals richer electrogram features predictive of scar depth.
3D Scar Mapping from EGMs Alone: The CNN approach accurately classifies scar type and location from single-beat EGMs without requiring imaging, suggesting a pathway for real-time, image-free substrate mapping.
Histologically Grounded Training: The use of whole-heart histology as a gold standard strengthens the biological credibility of model predictions—critical for eventual translation to human VT ablation.
Tool for Human Validation: Custom code and CNN model are publicly available for application to human EGM datasets, offering a testable route for external validation.
Toward Smarter Substrate Ablation: By identifying arrhythmogenic substrates invisible to voltage mapping, this approach could enhance VT ablation success, particularly in structurally complex or emergent settings.
Read it here: Link
Enhanced Detection of Atrial Fibrillation in Single-Lead Electrocardiograms Using a Cloud-Based Artificial Intelligence Platform
Heart Rhythm
This study externally validated the Willem AI platform—a deep-learning system for arrhythmia detection—using 8,528 single-lead ECGs from the PhysioNet/Computing in Cardiology Challenge dataset. Without retraining, the cloud-based algorithm achieved cardiologist-level performance in identifying atrial fibrillation (AF) from 30-second KardiaMobile ECG tracings, vastly outperforming traditional rules-based algorithms and demonstrating promise for scalable, remote AF screening.
Key Findings
Data & Ground Truth:
Dataset: 8,528 30-second single-lead ECGs recorded via KardiaMobile (PhysioNet 2017 Challenge).
Labeling:
Ground truth based on consensus between benchmark algorithms and expert review.
External cardiologist adjudication performed for cases with discordance.
AI Platform:
Willem AI (Idoven):
Trained on >520,000 ECGs from diverse public and proprietary datasets.
CE-marked SaMD (Software as a Medical Device) capable of detecting 23 cardiac patterns.
Not trained on this dataset—true external validation.
Performance – Atrial Fibrillation Detection:
AI performance exceeded that of a rules-based algorithm on all metrics (e.g., Sensitivity: 84.2% vs. 48.7%).
Interpretability:
Expert review revealed that AI misclassifications were mostly due to:
Artifact-laden signals
Confusion between AF and PACs or sinus tachycardia with irregularity
Other Rhythm Findings:
AI detected other cardiac patterns in >1% of tracings, including:
PVCs (11.0%)
PACs (7.1%)
First-degree AV block (4.4%)
PAC couplets, bigeminy, and trigeminy
Clinical Takeaways
Cardiologist-Level AI from 1-Lead ECGs: Willem AI achieves expert-level AF detection from brief, 1-lead ECGs without retraining—supporting clinical deployment for opportunistic or remote screening.
Superior to Traditional Algorithms: The platform outperforms rules-based AF detectors, especially in sensitivity and PPV, which are crucial for screening accuracy.
Supports Remote, Scalable AF Screening: This study validates the promise of integrating smartphone-compatible 1-lead ECGs with cloud-based AI for large-scale AF detection in primary care, pharmacy, or at-home settings.
Toward Broader Rhythm Profiling: Although not adjudicated here, the AI's detection of other arrhythmias (e.g., PACs, PVCs, AV block) points toward comprehensive ECG interpretation beyond AF.
Robustness Across Devices: Although trained on a different population and different devices, the AI generalized well to this dataset, supporting its clinical readiness and flexibility.
Read it here: Link
Clinical Phenotypes in Relation to Outcomes in Heart Failure Patients With Cardiac Resynchronization Therapy and Defibrillators (CRT‐D)
Journal of Cardiovascular Electrophysiology
This large registry study used unsupervised machine learning to define clinical phenotypes among 23,029 heart failure patients receiving CRT-D (cardiac resynchronization therapy with defibrillator). Hierarchical cluster analysis identified four reproducible patient subgroups with distinct comorbidity patterns and differing risks of all-cause mortality. These phenotypic clusters—ranging from low-risk to highly comorbid—were independently predictive of death at 1 year and over long-term follow-up, providing a more nuanced, data-driven framework for CRT-D risk stratification.
Key Findings
Methods:
Population: 23,029 CRT-D recipients in French hospitals (2010–2019), with no history of sustained VT/VF or cardiac arrest.
Analysis:
Three overlapping analysis groups were defined:
Group 1: 50% random sample (n=11,514)
Group 2: Died within 1 year (n=1,604)
Group 3: Alive at 3 years (n=14,228)
Hierarchical clustering (Ward’s linkage) performed separately in each group using 35 clinical variables.
Missing data handled via multivariate imputation.
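Ward's linkage builds clusters bottom-up, at each step merging the two clusters whose union adds the least to the total within-cluster variance. A deliberately naive sketch is below. The two-feature "patients" are invented, the function names are hypothetical, and this is an illustration of the technique rather than the authors' code.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopping at k clusters."""
    clusters = [[p] for p in points]  # start with every point on its own
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the cheapest pair
        del clusters[j]
    return clusters


# Invented patients: (scaled age, scaled comorbidity burden).
patients = [(0.10, 0.00), (0.20, 0.10), (0.15, 0.05),
            (0.90, 1.00), (0.80, 0.90), (0.85, 0.95)]
phenotypes = ward_clusters(patients, k=2)
print([len(c) for c in phenotypes])
```

In practice a registry of this size would call for an optimized library routine, for example scipy.cluster.hierarchy.linkage with method="ward", rather than this cubic-time loop.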
Identified Clusters (Phenotypes):
Cluster 1 – Low-Risk Phenotype
Younger, more female, low CAD, low CV risk factors, low comorbidities
Best survival outcomes
Cluster 2 – CAD-Dominant, Few Risk Factors
Predominantly male, high CAD burden (up to 84.4%), frequent revascularization
Lower burden of hypertension, diabetes, CKD
Intermediate mortality
Cluster 3 – Risk Factors/Comorbidities, Low CAD
High hypertension, diabetes, dyslipidemia
Lower CAD, but elevated non-CAD comorbidity burden
Intermediate mortality
Cluster 4 – Clinically Complex
High burden of CAD, AF, hypertension, diabetes, CKD, lung/liver disease
Overlap between ischemic and nonischemic cardiomyopathy
Worst prognosis
Mortality Outcomes:
All HRs statistically significant (p < 0.01).
Clusters were independently associated with mortality in multivariable Cox models.
Clinical Takeaways
Phenotyping Enhances Risk Stratification: Unsupervised clustering reveals subgroups with distinct comorbidity patterns and survival outcomes not captured by traditional CRT criteria.
Cluster 4 Requires Special Attention: Patients with combined CAD and systemic comorbidities had the highest mortality, highlighting the importance of managing modifiable risks post-CRT-D.
CAD Alone Isn’t the Whole Story: Cluster 3 (low CAD, high comorbidities) and Cluster 2 (high CAD, few comorbidities) had comparable mortality, suggesting that non-CAD risk factors carry prognostic weight.
Implications for Personalization: These data support a multiparametric, phenotype-guided approach to CRT-D candidacy, beyond LVEF and QRS alone—possibly informing future futility scores or shared decision-making.
Machine Learning in Real-World Registries: This study exemplifies how unsupervised ML can stratify large HF populations using routinely available clinical data.
Read it here: Link
What else I read…
Overcoming regulatory barriers to the implementation of AI agents in healthcare
From Nature Medicine
The integration of Artificial Intelligence (AI) agents into healthcare faces significant regulatory challenges, primarily because current frameworks are designed for static medical devices rather than the adaptive and autonomous nature of AI. Since the introduction of ChatGPT in 2022, there has been a broad surge in research and adoption of large language models (LLMs) and other forms of generative AI (GenAI), with this trend extending to healthcare. This has led to the development of numerous applications and professional products being implemented into clinical workflows. Many of these products, especially those with a clear medical purpose, may qualify as medical devices and are thus subject to high regulatory standards. However, a new challenge has emerged as GenAI applications are increasingly becoming autonomous agents, capable of independently executing and controlling goal-directed workflows on behalf of users with a high degree of autonomy.
The fundamental functioning of these AI agents relies on autonomous processes and decisions that are difficult for humans to monitor and model, making their regulation and approval pathway uncertain. Current regulatory frameworks are designed for products that have a narrow scope, are not adaptive, maintain human oversight, and do not evolve after market placement. This limits their applicability to meaningful GenAI applications and AI agents, which tend to have a broader scope and a high degree of autonomy.
Key points regarding the regulatory landscape and proposed solutions include:
Approvability Under Current Frameworks: AI-enabled health applications with a narrow scope and low autonomy are generally approvable, albeit with some difficulties. This includes some GenAI applications, as demonstrated by recent approvals in the EU and UK. However, systems that are both autonomous and broad in scope, such as meaningful AI agents or AI-driven surgical robots with autonomous decision-making, are currently considered non-approvable and would require major changes to regulatory frameworks.
Proposed Regulatory Adaptations: To enable the implementation of AI agents, policymakers and regulatory authorities will need to modify existing frameworks or develop new ones, aiming to minimize patient harm and provide clear requirements for safe innovation.
Minor to Medium Changes: Approaches could include the rational extension of "enforcement discretion" (where authorities choose not to enforce certain regulatory requirements for medical devices) or the risk-based qualification of certain devices as "non-medical devices". However, a blanket approach for sophisticated AI agents is unlikely given their potential impact.
Alternative Regulatory Pathways: The development of frameworks explicitly designed for AI-enabled medical devices has been proposed. These "voluntary alternative pathways" could be tailored to specific product types like GenAI or AI agents, enabling the use of novel evaluation approaches. "Adaptive pathways" represent a shift from static pre-market approval to dynamic oversight, involving real-world performance data and iterative updates, which is particularly relevant for agentic AI systems whose risk profiles may only be fully understood post-deployment.
More Progressive Ideas: One radical idea is to regulate GenAI applications analogously to human clinicians, involving structured "training" processes, tailored assessments, supervised real-life practice, and periodic re-evaluation. This approach, while compelling, raises critical questions about accountability, requiring mechanisms such as obligatory public registries, certification schemes, or financial surety bonds.
Ultimately, while GenAI models have initiated substantial developments in healthcare, the full potential of AI agents with their autonomy, adaptability, and broader scope is expected to drive the next transformation. However, achieving this will necessitate bold and forward-thinking reforms to current regulatory frameworks. Regulators must proactively prepare to balance safeguarding patient safety with promoting responsible innovation.