MAMA-SYNTH

Synthesizing Virtual Contrast-Enhancement in Breast MRI

Participate on Grand Challenge

Context

Dynamic contrast-enhanced MRI (DCE-MRI) plays a central role in breast cancer management, but its reliance on gadolinium-based contrast agents raises concerns about side effects, environmental contamination, and cost. MAMA-SYNTH introduces a standardized, clinically informed benchmark for evaluating generative models, with the goal of advancing the development of contrast-reduced and contrast-free breast MRI protocols.

Why?
Gadolinium (Gd, atomic number 64)

Side Effects

Gadolinium, even from chelated agents, can accumulate in the body over the long term, is potentially neurotoxic, and can trigger nephrogenic systemic fibrosis.

💧

Contamination

Gadolinium has been detected in drinking water supplies and beverages worldwide, raising concerns about long-term exposure and environmental health impacts.

💰

Accessibility

Gadolinium contrast agents significantly increase the cost of MRI examinations, limiting accessibility in resource-constrained settings.

Task

The task of the challenge is to synthesize single-timepoint, 2-dimensional post-contrast breast DCE-MRI slices from the corresponding pre-contrast T1-weighted MRI inputs using paired clinical data. Participating algorithms take pre-contrast images as input and generate a synthetic post-contrast image at peak enhancement.

Pre-contrast MRI (pre-contrast) → 🤖 → Post-contrast MRI (peak enhancement)

Data

The challenge uses diverse datasets to assess how well algorithms generalize across different scanners and populations:

Training Cohort: MAMA-MIA Dataset

This dataset contains pre-treatment DCE-MRI scans from 1,506 patients acquired at 25+ centers across the United States. MAMA-MIA served as the training set in the first edition of the MAMA Challenges, which covered Primary Tumor Segmentation and Pathologic Complete Response Prediction. Learn more about the benchmark at the MAMA-MIA Challenge 2025, and access the complete dataset at MAMA-MIA Dataset. Key properties are shown below:


Acquisition Plane

Axial 84.4%
Sagittal 15.6%

Magnetic Field Strength

1.5T 72.1%
3T 27.9%

Scanner Manufacturers

GE 64.1%
Siemens 27.3%
Philips 8.6%


Participants may train their models on any additional dataset, provided it is publicly available. To ensure fair evaluation across participating teams, the use of private data is not allowed in this challenge. By participating in this challenge, participants also agree to comply with EO 14117, 28 CFR Part 202, and Guide Notice NOT-OD-25-083, and acknowledge that the use of NIH Controlled-access Data Repositories (CADRs) is prohibited in this challenge.

Testing Cohorts

The test data were acquired from two external centers located in the Netherlands and Argentina. Each test case refers to a 2D slice extracted from a patient's DCE scan.

For each patient, the slice containing the largest malignant tumor area is selected from the peak enhancement phase. The peak-enhancement phase is defined as the time point with the highest signal intensity within the tumor region.
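The selection rule above can be sketched as follows. This is a minimal illustration rather than the organizers' pipeline: the function name `select_test_slice`, the array layout (time, slice, height, width), and the reading of "highest signal intensity" as the mean intensity inside the tumor mask are all assumptions.

```python
import numpy as np

def select_test_slice(dce, tumor_mask):
    """Pick the peak-enhancement phase and the slice with the largest tumor area.

    dce:        array of shape (T, Z, H, W), one 3D volume per time point (assumed layout)
    tumor_mask: boolean array of shape (Z, H, W), True inside the malignant tumor
    """
    dce = np.asarray(dce, dtype=np.float64)
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    if not tumor_mask.any():
        raise ValueError("tumor mask is empty")

    # Mean intensity inside the tumor for every time point (assumed reading of
    # "highest signal intensity within the tumor region").
    tumor_signal = dce[:, tumor_mask].mean(axis=1)
    peak_phase = int(np.argmax(tumor_signal))

    # Tumor area per slice = number of mask pixels in that slice.
    area_per_slice = tumor_mask.sum(axis=(1, 2))
    best_slice = int(np.argmax(area_per_slice))

    return peak_phase, best_slice, dce[peak_phase, best_slice]
```

The function returns the indices as well as the 2D image so that the matching pre-contrast slice can be extracted with the same slice index.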

All images are fat-suppressed and acquired in the axial plane. The main statistics are summarized below:

Field                    Radboud UMC, The Netherlands      Instituto Alexander Fleming, Argentina
Number of Cases          200                               100
Image Dimension          416 × 416 px                      512 × 512 px
Contrast Agent           DOTAREM (99%), GADOVIST (0.5%)    DOTAREM, GADOVIST
Manufacturer             Siemens                           GE
Magnetic Field Strength  3T                                1.5T
Molecular Subtype
  ↳ Luminal              165 (85.7%)                       37 (37%)
  ↳ Triple Negative      23 (9.4%)                         30 (30%)
  ↳ Other                12 (4.9%)                         20 (20%)

Evaluation Framework

Submissions are evaluated across four metric groups spanning pixel-level fidelity, perceptual realism, diagnostic classification performance, and segmentation accuracy.

Metric Group 1 Image-to-Image Comparison
Pixel Similarity

MSE Mean Squared Error

MSE measures the average squared difference between each pixel in the synthesized output and its corresponding pixel in the ground-truth post-contrast image. Lower values indicate greater pixel-level fidelity.
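As a minimal sketch of this definition, the function below computes MSE with NumPy. The function name `mse` is illustrative, and any intensity normalization applied before the comparison is not specified here, so none is assumed.

```python
import numpy as np

def mse(synth, target):
    """Mean squared error between a synthesized image and its ground truth."""
    synth = np.asarray(synth, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if synth.shape != target.shape:
        raise ValueError("images must have the same shape")
    # Average of the squared per-pixel differences.
    return float(np.mean((synth - target) ** 2))
```

For example, comparing an all-zero 2×2 image against `[[1, 1], [1, 3]]` gives (1 + 1 + 1 + 9) / 4 = 3.0.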

Perceptual Similarity

LPIPS Learned Perceptual Image Patch Similarity

LPIPS computes perceptual distance between images using deep network feature activations, capturing texture and structural similarity closer to human perception than pixel-wise metrics. Lower values indicate more realistic synthesis.

Metric Group 2 ROI-to-ROI Comparison
Tumor Texture Similarity

SSIM Structural Similarity Index

SSIM evaluates image quality by jointly measuring luminance, contrast, and structural similarity within local patches. Applied to the tumor ROI, it captures how well the synthesized enhancement texture matches the reference. Values range from 0–1; higher is better.

Distribution Realism

FRD Fréchet Radiomics Distance

FRD adapts the FID framework to radiomic feature space, measuring the Fréchet distance between the feature distributions of the synthesized ROI patches and real post-contrast patches. Lower values indicate that the synthesized tumors are statistically closer to real enhancement patterns.
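The Fréchet distance used in the FID framework fits a Gaussian to each feature set and computes ||mu_r - mu_s||^2 + Tr(C_r + C_s - 2 (C_r C_s)^(1/2)). The sketch below implements only that distance on two feature matrices. Radiomic feature extraction and any feature normalization used by the actual FRD implementation are not covered, and the function name and input layout are assumptions.

```python
import numpy as np

def frechet_distance(feats_real, feats_synth):
    """Fréchet distance (squared, as in FID) between Gaussians fitted to two feature sets.

    feats_*: arrays of shape (n_samples, n_features) with n_samples >= 2.
    """
    x = np.asarray(feats_real, dtype=np.float64)
    y = np.asarray(feats_synth, dtype=np.float64)
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))

    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative for
    # positive semi-definite covariances (clipped to absorb round-off).
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)
```

Two identical feature sets give a distance of 0, and shifting one set by a constant vector changes only the mean term.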

Metric Group 3 Downstream Task: Classification
Binary Classification

AUROC Luminal

Measures the classifier's ability to distinguish Luminal A/B tumors from all other subtypes on synthesized images. A score of 1.0 indicates perfect separation; 0.5 is chance.

Binary Classification

AUROC TNBC

Measures the classifier's ability to distinguish Triple-Negative tumors from all other subtypes on synthesized images. A score of 1.0 indicates perfect separation; 0.5 is chance.
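Both AUROC metrics follow the same one-vs-rest pattern, sketched below in plain Python. AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The classifier itself is not specified here, so the scores, the subtype label strings, and the function names are hypothetical.

```python
def one_vs_rest(subtypes, target):
    """Binary labels: 1 for the target subtype, 0 for every other subtype."""
    return [1 if s == target else 0 for s in subtypes]

def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few hundred slices; a rank-based formulation gives the same value faster.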

Metric Group 4 Downstream Task: Segmentation
Overlap Accuracy

DICE Sørensen–Dice Coefficient

The Dice coefficient measures voxel-level overlap between the segmentation mask predicted on the synthesized post-contrast image and the ground-truth mask. It is the harmonic mean of precision and recall over the segmented region. Values range from 0–1; higher values indicate better spatial overlap.
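A minimal sketch of the Dice computation on binary masks follows. The function name is illustrative, and returning 1.0 when both masks are empty is an assumed convention, not something stated by the challenge.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)
```

With a prediction of two pixels of which one is correct, and a one-pixel ground truth, precision is 0.5 and recall is 1.0, so the harmonic mean and the Dice coefficient are both 2/3.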

Boundary Accuracy

Hausdorff Distance 95th Percentile

The 95th-percentile Hausdorff Distance (HD95) measures the near-worst-case boundary deviation between the segmentation mask predicted on the synthesized post-contrast image and the reference segmentation contour. It excludes the largest 5% of boundary distances for robustness to outliers. Lower values indicate more precise boundary delineation.
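Several HD95 variants are in use, so the sketch below shows one common formulation under stated assumptions: boundaries are mask pixels with at least one 4-connected neighbour outside the mask, the 95th percentile is taken separately in each direction, and the larger of the two values is reported. The function names and the `spacing` parameter are illustrative, and the challenge's evaluation code may differ.

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of mask pixels that touch the background (4-connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if all four direct neighbours are also in the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hd95(pred, truth, spacing=(1.0, 1.0)):
    """Symmetric 95th-percentile Hausdorff distance between two 2D binary masks."""
    a = boundary_points(pred) * np.asarray(spacing, dtype=np.float64)
    b = boundary_points(truth) * np.asarray(spacing, dtype=np.float64)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks must be non-empty")
    # Pairwise Euclidean distances between boundary points, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest reference point for each predicted point
    b_to_a = d.min(axis=0)  # nearest predicted point for each reference point
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))
```

The pairwise distance matrix is simple but memory-hungry for large contours; a distance transform is the usual alternative when masks are large.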

Ranking Scheme
Ranking logic:
Metric Group 1 (Image): MSE (1.1), LPIPS (1.2) → Avg Rank 1
Metric Group 2 (ROI): SSIM (2.1), FRD (2.2) → Avg Rank 2
Metric Group 3 (Classification): AUROC Luminal, AUROC TNBC → Avg Rank 3
Metric Group 4 (Segmentation): DICE (4.1), Hausdorff (4.2) → Avg Rank 4
Final Ranking: average of task ranks
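The ranking logic above can be sketched in plain Python: rank every team per metric, average the two ranks within each metric group, then average the four group ranks. The metric keys, the `results` layout, and the tie handling (tied values share their mean rank) are assumptions made for illustration, not the official implementation.

```python
# (metric key, higher_is_better) for each of the four metric groups.
GROUPS = {
    "image": [("mse", False), ("lpips", False)],
    "roi": [("ssim", True), ("frd", False)],
    "classification": [("auroc_luminal", True), ("auroc_tnbc", True)],
    "segmentation": [("dice", True), ("hd95", False)],
}

def rank_values(values, higher_is_better):
    """Return 1-based ranks (1 = best); tied values share their mean rank."""
    order = sorted(values, reverse=higher_is_better)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # best position held by this value
        count = order.count(v)       # number of entries tied at this value
        ranks.append(first + (count - 1) / 2)
    return ranks

def final_scores(results):
    """results: {team: {metric: value}} -> {team: mean of the four group ranks}."""
    teams = list(results)
    group_avgs = {t: [] for t in teams}
    for metrics in GROUPS.values():
        per_metric = [rank_values([results[t][m] for t in teams], hib)
                      for m, hib in metrics]
        for i, t in enumerate(teams):
            group_avgs[t].append(sum(r[i] for r in per_metric) / len(per_metric))
    return {t: sum(g) / len(g) for t, g in group_avgs.items()}
```

Sorting teams by the returned score in ascending order gives the final ranking; because each group contributes one averaged rank, no single metric group can dominate the outcome.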

Timeline

May 1 Validation Phase Opens
June 15 Test Phase Opens
June 30 Last Submission Deadline
August 1 Official Results Release
October 8 Winners Announcement at Deep-Breath Workshop (MICCAI 2026)

Organization Committee

Richard Osuala
Universitat de Barcelona, Spain
Challenge Co-Lead

Smriti Joshi
Universitat de Barcelona, Spain
Challenge Co-Lead

Jarek van Dijk
Radboud University Medical Centre, Netherlands
Challenge Co-Lead

Luyi Han
Radboud University Medical Centre, Netherlands

Lidia Garrucho
Universitat de Barcelona, Spain

Oliver Diaz
Universitat de Barcelona & CVC, Spain

Maria Laura Cosaka
Instituto Alexander Fleming, Argentina

Antonio Portaluri
The Netherlands Cancer Institute (NKI), Netherlands

Advisory Committee

Simone Balocco
Universitat de Barcelona & CVC, Spain

Karim Lekadir
Universitat de Barcelona & ICREA, Spain

Yaofei Duan
Radboud University Medical Centre, Netherlands

Tianyu Zhang
Radboud University Medical Centre, Netherlands

Ritse Mann
Radboud University Medical Centre, Netherlands

Daniel Mysler
Instituto Alexander Fleming, Argentina

Contact

For inquiries about the MAMA-SYNTH challenge, feel free to reach out to Richard Osuala (richard.osuala[at]ub.edu) and Smriti Joshi (smriti.joshi[at]ub.edu).

Partners & Institutions