CoralBay: The 3D Foundation Model for Radiology CT Scans by kaiko

Introducing CoralBay: A fully open-source, native 3D foundation model for radiology that delivers state-of-the-art CT scan analysis with unmatched data efficiency.

Figures 1-4: Ground-truth annotations vs CoralBay predictions on various datasets.

Each year, over 300 million CT scans are captured as rich, high-fidelity 3D volumes that give doctors a detailed view of the human body for diagnosis and surgical planning. However, most current AI models either process these volumes as stacks of 2D slices, which flattens the data and discards critical spatial relationships, or rely on complicated, highly specialised methods that are difficult to scale and build upon.

Enter CoralBay: our native 3D foundation model for CT, pre-trained on only 11K unlabeled volumes. With a simple architecture and training pipeline, it produces meaningful representations from raw data that generalize across the full clinical stack, from scan-level classification down to fine-grained lesion segmentation.

Quick Highlights

  • Native 3D Intelligence: Understands anatomy as continuous volumetric structures.
  • Simple, Strong, Generalizable: Strong multi-task performance from a simple training pipeline and just 11K unlabeled volumes, ready to adapt to various downstream tasks.
  • Clinical Depth: A single backbone with built-in invariance to HU values, excelling across the full task spectrum from broad organ identification to fine-grained tumour segmentation.
  • Open Science: Fully open-source weights and benchmarks to accelerate global medical AI research.

The Challenge: 3D is Different

Standard vision models assume 2D RGB images: three colour channels, consistent resolution, and millions of available labeled examples. CT scans violate every one of these assumptions.

Figure 5: Challenges in 3D data representation. Left: Narrow windows clarify soft tissue; wide windows preserve high-density data. Center: Thin slices increase resolution; thick slices improve SNR but cause blurring. Right: 3D consistency is required across all orthogonal planes.

  • Hounsfield Units, not pixels: Intensities reflect tissue density, not color. Different window settings highlight different anatomy (e.g., lungs vs. soft tissue), so models must be robust to multiple visualizations of the same scan.
  • Anisotropic resolution: Slice thickness varies (e.g., 1–5 mm), causing partial volume effects. Thicker slices improve signal-to-noise but blur fine structures, a degradation that differs from typical image noise.
  • Volumetric spatial complexity: CT is inherently 3D. Small findings, including early-stage tumours that may occupy only a handful of voxels, require integrating information across the full 3D space to be reliably detected. Treating slices independently breaks the spatial continuity that makes these critical, hard-to-spot findings visible in the first place.
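
To make the first point concrete, here is a minimal NumPy sketch of how a window setting maps Hounsfield Units to display intensities. The `apply_window` helper and the two (center, width) presets are illustrative assumptions based on commonly quoted display values; they are not taken from CoralBay.

```python
import numpy as np

# Illustrative window presets as (center, width) in Hounsfield Units.
# Commonly quoted display values, NOT values taken from CoralBay.
WINDOWS = {
    "lung": (-600.0, 1500.0),
    "soft_tissue": (40.0, 400.0),
}

def apply_window(volume_hu, center, width):
    """Clip a HU array to [center - width/2, center + width/2], rescale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = np.clip(volume_hu.astype(np.float32), low, high)
    return (clipped - low) / (high - low)

# Toy 1D "scan": roughly air, lung tissue, water, soft tissue, dense bone.
sample = np.array([-1000.0, -700.0, 0.0, 60.0, 1000.0])
lung_view = apply_window(sample, *WINDOWS["lung"])
soft_view = apply_window(sample, *WINDOWS["soft_tissue"])
```

In the soft-tissue view, air and lung tissue both collapse to 0, while the lung view still separates them. The same voxels therefore look very different depending on the window, which is the variability a model has to tolerate.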

The CoralBay Approach

Hierarchical 3D Feature Learning

CoralBay extends DINO self-supervised learning to native 3D CT volumes. Using a hierarchical 3D Swin-Transformer, it captures long-range spatial relationships efficiently and learns multi-scale features ranging from organ-level anatomy to fine structures such as vessels.

Figure 6: Technical overview of CoralBay’s training pipeline.
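
For readers unfamiliar with DINO, the sketch below shows the core self-distillation objective as described in the original DINO paper: a student is trained to match centered, sharpened teacher outputs, and the teacher weights follow the student as an exponential moving average. The temperatures and momentum are the defaults from that paper; whether CoralBay uses the same values is not stated here, so treat every number and function name as an assumption.

```python
import numpy as np

def softmax(logits, temperature):
    """Numerically stable softmax over the last axis with a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy from centered, sharpened teacher targets to student predictions."""
    # In real training, no gradient flows through the teacher branch.
    targets = softmax(teacher_logits - center, t_teacher)
    log_probs = np.log(softmax(student_logits, t_student) + 1e-12)
    return float(-(targets * log_probs).sum(axis=-1).mean())

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights track the student as an exponential moving average."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

# Toy check: the loss is small when the student agrees with the teacher.
teacher = np.array([[2.0, 0.0, 0.0]])
center = np.zeros(3)
loss_match = dino_loss(np.array([[2.0, 0.0, 0.0]]), teacher, center)
loss_mismatch = dino_loss(np.array([[0.0, 2.0, 0.0]]), teacher, center)
```

In a 3D setting, the logits would come from projection heads applied to features of different 3D crops of the same scan rather than 2D image crops.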

Radiology-Specific Augmentations

Figure 7: HU ranges for pre-training data augmentation.

CoralBay replaces generic image augmentations with CT-specific transformations that reflect real-world imaging variability.

  • Random HU Windowing: Samples from multiple clinically relevant HU ranges (e.g., lung, liver, brain, abdomen, and full CT) to learn features that are robust across viewing settings.
  • Scanner robustness: Gaussian smoothing and histogram shifts simulate differences in scanners and reconstruction protocols, improving generalization across sites.
  • Local-to-global 3D context: Combines large and small 3D crops during training, helping the model link fine pathological details with their surrounding anatomy.
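
As a rough illustration of two of these augmentations, the sketch below combines random HU windowing with local and global 3D crops. The HU ranges, the local crop size of 48, and the number of views are hypothetical placeholders (the actual pre-training ranges are those in Figure 7); only the 96-voxel crop size appears elsewhere in this article. Scanner-robustness transforms such as Gaussian smoothing are omitted for brevity.

```python
import numpy as np

# Hypothetical HU ranges as (low, high); placeholders, not CoralBay's values.
HU_RANGES = {
    "lung": (-1350.0, 150.0),
    "soft_tissue": (-160.0, 240.0),
    "full": (-1000.0, 1000.0),
}

def random_window(volume_hu, rng):
    """Pick one HU range at random, clip to it, and rescale to [0, 1]."""
    names = list(HU_RANGES)
    low, high = HU_RANGES[names[int(rng.integers(len(names)))]]
    clipped = np.clip(volume_hu.astype(np.float32), low, high)
    return (clipped - low) / (high - low)

def random_crop(volume, size, rng):
    """Cut a random cubic crop of edge length `size` from a 3D volume."""
    d, h, w = (int(rng.integers(0, dim - size + 1)) for dim in volume.shape)
    return volume[d:d + size, h:h + size, w:w + size]

def multi_crop_views(volume_hu, rng, global_size=96, local_size=48, n_local=4):
    """Return two large and several small crops, each with its own random window."""
    global_views = [random_window(random_crop(volume_hu, global_size, rng), rng)
                    for _ in range(2)]
    local_views = [random_window(random_crop(volume_hu, local_size, rng), rng)
                   for _ in range(n_local)]
    return global_views, local_views

rng = np.random.default_rng(0)
scan = rng.uniform(-1000, 1000, size=(112, 128, 128)).astype(np.float32)
global_views, local_views = multi_crop_views(scan, rng)
```

Because each view draws its own window, the same anatomy reaches the model under different intensity mappings, which is one plausible way to encourage the HU invariance described above.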

Scan-Level Inference via Sliding Window

While the backbone is trained on 96×96×96 crops, inference uses a sliding-window approach: the full scan is divided into overlapping 3D patches, each encoded independently, then stitched together. For classification, pooling merges the patch-level features into a single scan-level vector. For segmentation, features are passed to a Swin-UNETR decoder with skip connections, which preserves fine boundary details for voxel-wise segmentation.
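
The classification path can be sketched as follows, under stated assumptions: `encode_patch` is a stand-in for the frozen backbone (it only returns the patch mean and standard deviation), mean pooling is one possible pooling choice, and the 50% overlap is a hypothetical default. The sketch also assumes every axis of the scan is at least one patch long; real pipelines would pad smaller scans.

```python
import numpy as np
from itertools import product

def window_starts(dim, patch, stride):
    """Start indices along one axis so that patches cover [0, dim) completely."""
    if dim <= patch:
        return [0]
    starts = list(range(0, dim - patch + 1, stride))
    if starts[-1] != dim - patch:
        starts.append(dim - patch)  # shift the last patch back to stay in bounds
    return starts

def encode_patch(patch):
    """Placeholder encoder: a real model would return a learned feature vector."""
    return np.array([patch.mean(), patch.std()], dtype=np.float32)

def scan_embedding(volume, patch=96, overlap=0.5):
    """Encode overlapping 3D patches independently, then mean-pool to one vector."""
    stride = max(1, int(patch * (1.0 - overlap)))
    axes = [window_starts(dim, patch, stride) for dim in volume.shape]
    features = [
        encode_patch(volume[d:d + patch, h:h + patch, w:w + patch])
        for d, h, w in product(*axes)
    ]
    return np.mean(features, axis=0)

volume = np.random.default_rng(0).normal(size=(120, 160, 160)).astype(np.float32)
embedding = scan_embedding(volume)  # one vector summarising the whole scan
```

For segmentation, the per-patch outputs would instead be written back into a full-size volume, typically averaging predictions where patches overlap.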

Figure 8a-b: Sliding-window inference for 3D scan-level classification (left) and segmentation (right).

How CoralBay Compares

As the CT foundation model space matures, CoralBay can be understood along three axes: how well it transfers to downstream tasks with both frozen and fine-tuned encoders, how little pre-training data it requires, and how broadly it generalises across classification and segmentation tasks.

To test this, we evaluated CoralBay on 11 datasets spanning classification and segmentation across diverse anatomical targets, with two model variants: CoralBayU96B (53.2M parameters) and CoralBayU96H (847M parameters).

Quantitative Results

Table A: Quantitative performance across classification (Multi-class Accuracy/Binary AUROC) and segmentation (Dice score) tasks, as evaluated via the eva framework.

  • Classification: CoralBayU96H achieves the best or tied-best frozen-encoder results across all four classification benchmarks: organ identification (OrganMNIST3D), lung nodule malignancy (NoduleMNIST3D, LUNA25), and COVID-19 classification (CC-CCII).
  • Segmentation: With a frozen encoder and a lightweight 22.8M-parameter decoder, CoralBay performs comparably to VoCo across seven segmentation benchmarks despite its much smaller pre-training dataset.
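
For reference, the Dice score used for the segmentation results measures the overlap between a predicted and a ground-truth mask as 2|A∩B| / (|A| + |B|). The sketch below is a minimal binary version written for illustration; it is not the eva implementation, and the handling of two empty masks is a convention chosen here.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of identical shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * intersection / denom)

# Ground truth: a 4x4x4 cube. Prediction: the same cube shifted by one voxel.
target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True
pred = np.zeros((8, 8, 8), dtype=bool)
pred[3:7, 2:6, 2:6] = True

score = dice_score(pred, target)  # 2 * 48 / (64 + 64) = 0.75
```

A one-voxel shift of a small structure already costs a quarter of the score, which is why Dice is a demanding metric for small lesions.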

Qualitative Analysis

Visualising the model's performance (Figures 1-4) reveals its precision across diverse anatomical challenges:

  • Multi-Organ Segmentation: On benchmarks like BTCV and FLARE22, the model accurately delineates complex abdominal structures including the liver, spleen, and kidneys with high spatial consistency.
  • Lesion and Tumour Boundaries: Evaluation on LiTS17 and MSD Task 7 demonstrates its ability to resolve fine-grained tumour boundaries within the liver and pancreas, critical for clinical diagnostics.
  • Reliability: The difference maps show only minimal discrepancies between CoralBay segmentations and ground-truth annotations, indicating robust feature capture in low-contrast regions.

What the Ablations Reveal

Ablation studies in the paper confirm that CoralBay’s performance arises from a combination of structural and data-driven innovations:

  • Native 3D Dominance: Switching from a 2D DINO baseline to CoralBay's native 3D spatial modeling improves the Dice score on BTCV by a substantial 20%, indicating that consistent 3D inductive biases are essential for medical volumes.
  • Effective Scaling: Segmentation accuracy improves consistently as both the model size and the pre-training dataset grow, resulting in an overall 15% improvement (LiTS17). This demonstrates the framework’s ability to benefit from larger data pools and suggests that training with even more data could further strengthen the model.
  • Superior Label Efficiency: On challenging tumor tasks, self-supervised pre-training acts as a powerful prior, outperforming heavily tuned models in low-data settings.

Why this matters for kaiko

CoralBay establishes a powerful 3D foundation for radiological data. Today, it masters native spatial reasoning and physics-informed intensity invariance, delivering highly data-efficient, clinically robust models for classification and segmentation across organs, pathologies, and scanners. This raises the ceiling for high‑resolution pathology detection while also providing a standard, open benchmark through the 3D radiology leaderboard.

Looking ahead, CoralBay serves as a visual anchor for Multimodal Medical Intelligence, where AI systems jointly interpret imaging, text, lab values, and other clinical signals in a unified clinical context. Built on a strong 3D visual backbone, it moves toward an agentic vision paradigm, where models go beyond interpretation to actively reason over scans, plan multi-step analyses, and coordinate tool use across modalities and time.

Open Science

CoralBay promotes open, reproducible evaluation in medical AI through core contributions:

  • Comprehensive Whitepaper: We provide a detailed whitepaper describing the model, training process, datasets, evaluation setup, and results.
  • Standardized Benchmarking: We’re scaling eva, our open-source evaluation framework, to include comprehensive support for 3D radiology. This includes standardized data loaders, model backbones, and a public leaderboard designed to make 3D medical AI research transparent, reproducible, and easily comparable.
  • Open Access: Model weights are available on GitHub and Hugging Face.

Explore the paper, leaderboard, and model weights to learn more.

CoralBay is released under the MIT License and is intended solely for research purposes. This model has not been validated or certified for clinical use and must not be used to inform, guide, or replace clinical decision-making, diagnosis, treatment, or patient management.


CoralBay's capabilities — including tumor segmentation, volume estimation, lesion tracking, and scan-level diagnosis — have been evaluated in research settings only. Performance may not generalise across patient populations, scanner hardware, imaging protocols, or clinical environments. The model has not been reviewed or approved by any regulatory authority (including but not limited to the FDA or under the EU MDR).


Users are solely responsible for assessing suitability for their intended use. Kaiko strongly recommends independent validation, ethical review, and compliance with applicable laws before any downstream application — particularly in healthcare AI contexts. CoralBay is provided "as is", without warranty of any kind. Kaiko accepts no liability for outcomes arising from its use.