Pipeline Documentation
The reconstruction pipeline transforms a collection of unordered images into a sparse 3D point cloud and a set of camera poses. This page explains each stage of both the offline DVC training pipeline and the online inference pipeline.
Pipeline Overview
┌──────────────┐   ┌───────────────┐   ┌──────────────┐   ┌────────────────┐
│   validate   │ → │ eda_baselines │ → │  preprocess  │ → │    prepare     │
│  (data QC)   │   │ (EDA + stats) │   │   (images)   │   │  (input CSV)   │
└──────────────┘   └───────────────┘   └──────────────┘   └────────────────┘
                                                                  │
                                                                  ▼
                                                          ┌────────────────┐
                                                          │  run_pipeline  │
                                                          │   (MASt3R +    │
                                                          │  COLMAP SfM)   │
                                                          └────────────────┘
                                                                  │
                                                                  ▼
                                                          ┌────────────────┐
                                                          │    evaluate    │
                                                          │ (mAA + MLflow) │
                                                          └────────────────┘
Each stage is defined in dvc.yaml and tracked by DVC. Metrics and artifacts
from each run are logged to MLflow.
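Each stage follows the standard dvc.yaml schema, tying a command to its dependencies, outputs, and metric files. A hedged sketch of how the validate stage might be declared (paths are taken from this page; the exact cmd and dependency list are assumptions):

```yaml
# Hypothetical dvc.yaml excerpt — only the script and output paths are
# documented on this page; the remaining fields are illustrative.
stages:
  validate:
    cmd: python scripts/validate_data.py
    deps:
      - scripts/validate_data.py
      - data/train_labels.csv
    outs:
      - data/validation/validation_report.json
    metrics:
      - data/validation/validation_metrics.json:
          cache: false
```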
Stage 1 — Data Validation
DVC stage: validate
Script: scripts/validate_data.py
What it does
Reads data/train_labels.csv and verifies that every image listed in the CSV
exists on disk under data/train/. It reports:
Total rows in the labels file
Number of distinct images and scenes
Missing files
Duplicate image entries
Malformed rotation matrices or translation vectors
Outputs
data/validation/validation_report.json — full issue report
data/validation/validation_metrics.json — DVC metric file with issue_count and status_code
Acceptance threshold
A status_code of 0 means all files are present and valid. A value of 1
means warnings exist (e.g., missing files) but the pipeline can continue.
A value of 2 indicates a critical error that halts downstream stages.
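The 0/1/2 convention above can be sketched as a small helper; the function name and issue categories are illustrative, not the actual scripts/validate_data.py API:

```python
def status_code(missing_files: int, malformed_rows: int) -> int:
    """Map validation issues to the pipeline's status convention.

    0 — all files present and valid
    1 — warnings only (e.g. missing files); downstream stages may continue
    2 — critical errors (e.g. malformed poses); halt downstream stages
    """
    if malformed_rows > 0:
        return 2
    if missing_files > 0:
        return 1
    return 0

print(status_code(0, 0))  # → 0
print(status_code(3, 0))  # → 1
print(status_code(0, 1))  # → 2
```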
Stage 2 — Exploratory Data Analysis and Baselines
DVC stage: eda_baselines
Script: scripts/eda_baselines.py
What it does
Computes image statistics across the training dataset to establish the drift baseline. These baselines are later used by the drift monitor to detect when production images differ from training data. Statistics computed include:
Image resolution distribution (width, height histograms)
Pairwise image similarity matrix (using global descriptors)
Sharpness distribution (Laplacian variance)
Brightness and contrast statistics
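The Laplacian-variance sharpness statistic can be sketched as follows. This is equivalent in spirit to cv2.Laplacian(img, cv2.CV_64F).var(), written with plain NumPy so the sketch is dependency-light; it is not the actual scripts/eda_baselines.py code:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness proxy: variance of the image's Laplacian response."""
    g = gray.astype(np.float64)
    # 4-neighbour discrete Laplacian on the interior of the image
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

flat = np.full((32, 32), 128, dtype=np.uint8)    # uniform image → no detail
noisy = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
print(laplacian_variance(flat) < laplacian_variance(noisy))  # → True
```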
Outputs
data/baselines/resolution_hist.png
data/baselines/similarity_matrix.png
data/baselines/sharpness_hist.png
data/baselines/eda_baselines.json — raw baseline statistics
data/baselines/eda_metrics.json — DVC metric summary
Stage 3 — Image Preprocessing
DVC stage: image_preprocess
Script: scripts/image_processing.py
Config: conf/preprocess.yaml
What it does
Applies a configurable preprocessing pipeline to each training image:
Deblurring — images with Laplacian variance below blurry_threshold are sharpened or excluded, depending on configuration.
Orientation normalisation — corrects image rotation based on EXIF metadata or a learned orientation estimator, so all images are upright before matching.
The preprocessing module is designed to be pluggable. Only stages listed in
conf/preprocess.yaml are applied.
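Because only listed stages are applied, conf/preprocess.yaml might look like the following. This is a hypothetical sketch: only blurry_threshold is named in this documentation; all other keys are assumptions.

```yaml
# Hypothetical conf/preprocess.yaml — stage and key names are illustrative.
stages:
  - deblur
  - orientation
deblur:
  blurry_threshold: 100.0   # Laplacian variance below this → sharpen/exclude
  action: sharpen           # or: exclude
orientation:
  use_exif: true
```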
Outputs
data/processed/images/ — preprocessed image tree mirroring data/train/
data/processed/preprocess_report.json
data/processed/preprocess_metrics.json
Stage 4 — Data Preparation
DVC stage: prepare
Script: scripts/prepare_submission.py
What it does
Reads the preprocessed image paths and data/train_labels.csv to build
data/prepared/prepared_input.csv. This CSV is in the IMC2025 submission format
with nan placeholder values for rotation and translation — these are populated
by the reconstruction stage.
Columns: image_id, dataset, scene, image, rotation_matrix,
translation_vector.
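Building the prepared CSV can be sketched with the standard library. The column order matches the list above; the semicolon-joined nan layout for the pose fields is an assumption, and this helper is not the scripts/prepare_submission.py API:

```python
import csv, io

FIELDS = ["image_id", "dataset", "scene", "image",
          "rotation_matrix", "translation_vector"]

def prepared_rows(images):
    """Yield IMC2025-format rows with nan pose placeholders.

    `images` is an iterable of (image_id, dataset, scene, image_path).
    """
    nan9 = ";".join(["nan"] * 9)   # 3x3 rotation, flattened (assumed layout)
    nan3 = ";".join(["nan"] * 3)   # translation vector (assumed layout)
    for image_id, dataset, scene, image in images:
        yield {"image_id": image_id, "dataset": dataset, "scene": scene,
               "image": image, "rotation_matrix": nan9,
               "translation_vector": nan3}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(prepared_rows([("img0", "ETs", "scene0", "img0.png")]))
print(buf.getvalue().splitlines()[0])
```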
Stage 5 — Scene Reconstruction (Core Pipeline)
DVC stage: run_pipeline
Script: scripts/reconstruct_scenes.py
Config: conf/mast3r.yaml (or conf/best_config.yaml in production)
This is the main computational stage. It implements the full
IMC2025Pipeline.run() loop.
Shortlist Generation
Before matching, the pipeline generates a shortlist of candidate image pairs to match. Matching all N×N pairs is computationally infeasible for large datasets, so the shortlist generator selects the most promising pairs using an ensemble of global descriptor retrievers:
MASt3R-ASMK — vocabulary-tree-based retrieval using MASt3R’s dense descriptors and the ASMK aggregation method. This is the primary retriever.
MASt3R-SPoC — an alternative global descriptor from the MASt3R retrieval head.
DINOv2 — a general-purpose vision transformer used as a secondary global descriptor for cross-domain robustness.
ISC — a descriptor trained specifically for image copy detection, effective for repeated structures.
Each retriever proposes its top-K most similar images per query. The union of all proposals forms the final shortlist.
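The union step can be sketched as follows, with retriever outputs stubbed as precomputed rankings (the real retrievers are the four models listed above; this data structure is illustrative):

```python
def build_shortlist(rankings, k):
    """Union of each retriever's top-k proposals as unordered image pairs.

    `rankings[retriever][query]` is a list of database images sorted by
    similarity — a stub for the real retriever outputs.
    """
    shortlist = set()
    for per_query in rankings.values():
        for query, neighbours in per_query.items():
            for neighbour in neighbours[:k]:
                if neighbour != query:
                    # canonical ordering so (a, b) and (b, a) deduplicate
                    shortlist.add(tuple(sorted((query, neighbour))))
    return shortlist

rankings = {
    "mast3r_asmk": {"a": ["b", "c", "d"]},
    "dinov2":      {"a": ["c", "e", "b"]},
}
print(sorted(build_shortlist(rankings, k=2)))
# → [('a', 'b'), ('a', 'c'), ('a', 'e')]
```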
Feature Extraction and Matching
For each pair in the shortlist, the pipeline runs matching via the MASt3R Hybrid
Matcher (type: mast3r_hybrid), which combines:
Dense matching — MASt3R’s end-to-end dense correspondence network operates at 512 px resolution and produces dense pixel-level matches.
Sparse matching — two local feature detectors provide complementary keypoints:
ALIKED (with LightGlue) — a learned keypoint detector with up to 4096 keypoints per image at 1280 px resolution.
MagicLeap SuperPoint — a classical-style detector with up to 4096 keypoints at 1600 px resolution.
Dense and sparse matches are fused late in the pipeline to maximise coverage.
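Late fusion amounts to concatenating the two correspondence sets and deduplicating near-identical matches. A minimal sketch using rounded pixel coordinates as the dedup key — the pipeline's actual fusion logic is not specified here, so the grid size and tie-breaking rule are assumptions:

```python
def fuse_matches(dense, sparse, grid=2.0):
    """Merge two lists of (x0, y0, x1, y1) correspondences.

    Matches whose endpoints fall in the same `grid`-pixel cell count as
    duplicates; sparse matches win ties here (an assumed policy).
    """
    fused, seen = [], set()
    for match in list(sparse) + list(dense):   # sparse first → kept on ties
        key = tuple(round(coordinate / grid) for coordinate in match)
        if key not in seen:
            seen.add(key)
            fused.append(match)
    return fused

dense = [(10.1, 20.0, 30.2, 40.1), (50.0, 60.0, 70.0, 80.0)]
sparse = [(10.4, 19.8, 30.0, 40.3)]   # near-duplicate of the first dense match
print(len(fuse_matches(dense, sparse)))  # → 2
```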
COLMAP Incremental SfM
Fused matches are imported into a COLMAP database. COLMAP’s incremental Structure-from-Motion mapper then:
Selects an initial image pair with many matches and sufficient parallax (pairs well explained by a single homography are avoided, since they triangulate poorly).
Triangulates an initial 3D point set.
Registers remaining images one by one via PnP.
Runs bundle adjustment after each batch of registrations.
Filters outlier points by reprojection error.
Key COLMAP parameters (from config):
mapper_min_model_size: 3 — minimum images to form a valid reconstruction.
mapper_max_num_models: 25 — maximum number of disconnected sub-models.
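In the pipeline configuration these might appear as follows. The nesting under a colmap key is an assumption; only the two mapper_* values are documented on this page.

```yaml
# Hypothetical excerpt of conf/mast3r.yaml
colmap:
  mapper_min_model_size: 3    # minimum images for a valid reconstruction
  mapper_max_num_models: 25   # maximum number of disconnected sub-models
```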
Outputs
data/reconstruction/eval_prediction.csv — IMC2025 format poses
data/reconstruction/sparse_reconstruction.ply — point cloud
data/reconstruction/reconstruction_metrics.json
Stage 6 — Evaluation
DVC stage: evaluate
Script: scripts/evaluate.py
What it does
Computes the mAA (mean Average Accuracy) metric, which is the primary quality measure for the IMC2025 competition. mAA measures the fraction of camera poses registered within a set of angular and translation error thresholds.
It also computes:
Per-dataset scores and mAA values
Clusterness score (how well images cluster geometrically)
Registration rate
All metrics are logged as a child MLflow run under the parent DVC run.
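mAA as described above can be sketched as pose-error accuracy averaged over threshold pairs. The threshold values below are placeholders, not the official IMC2025 grid:

```python
def mean_average_accuracy(rot_err_deg, trans_err, thresholds):
    """Average, over (rotation, translation) threshold pairs, of the
    fraction of cameras whose pose error is within both thresholds."""
    accuracies = []
    for rot_thresh, trans_thresh in thresholds:
        hits = sum(1 for r, t in zip(rot_err_deg, trans_err)
                   if r <= rot_thresh and t <= trans_thresh)
        accuracies.append(hits / len(rot_err_deg))
    return sum(accuracies) / len(accuracies)

rot = [0.5, 2.0, 10.0]                        # per-camera rotation error, degrees
trans = [0.01, 0.05, 1.00]                    # per-camera translation error
thresholds = [(1, 0.02), (3, 0.10), (10, 1.00)]  # illustrative grid
print(round(mean_average_accuracy(rot, trans, thresholds), 3))  # → 0.667
```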
Outputs
data/evaluation/metrics.json
data/evaluation/git_status.txt
Online Inference Pipeline
The online pipeline (triggered via POST /upload) mirrors the DVC pipeline but
runs directly without DVC:
ZIP extraction → temporary workspace
MASt3R hybrid matching on GPU worker (GPUModelWorker.reconstruct())
COLMAP SfM in the same temporary workspace
PLY export via pycolmap.Reconstruction.export_PLY()
Voxel downsampling (utils/decimate.py) to ≤500,000 points
Results persisted to /app/results/
The pipeline configuration is loaded from conf/best_config.yaml (if present)
with fallback to conf/mast3r.yaml.
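The fallback is a simple presence check; a minimal sketch (the helper name is illustrative, not the service's actual API):

```python
import tempfile
from pathlib import Path

def active_config(conf_dir="conf"):
    """Prefer conf/best_config.yaml (promoted by select_best_run.py),
    falling back to conf/mast3r.yaml."""
    best = Path(conf_dir) / "best_config.yaml"
    return best if best.exists() else Path(conf_dir) / "mast3r.yaml"

# Demonstrate the fallback in a throwaway directory.
with tempfile.TemporaryDirectory() as conf_dir:
    (Path(conf_dir) / "mast3r.yaml").touch()
    print(active_config(conf_dir).name)        # → mast3r.yaml
    (Path(conf_dir) / "best_config.yaml").touch()
    print(active_config(conf_dir).name)        # → best_config.yaml
```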
Model Selection and Promotion
After each DVC experiment run, scripts/select_best_run.py queries MLflow for
the run with the highest mAA_overall metric in the
scene_reconstruction_dvc experiment. It copies that run’s configuration to
conf/best_config.yaml, which becomes the active production config on the
next ray-serve restart.
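Stripped of the MLflow client calls, selection reduces to an argmax over runs' mAA_overall. A sketch over stubbed run records — the real script queries the scene_reconstruction_dvc experiment in MLflow, and this record shape is an assumption:

```python
def best_run(runs):
    """Pick the run with the highest mAA_overall metric.

    `runs` stands in for records returned by an MLflow query; each is a
    dict with "run_id" and "metrics". Illustrative only.
    """
    return max(runs, key=lambda run: run["metrics"].get("mAA_overall", float("-inf")))

runs = [
    {"run_id": "a1", "metrics": {"mAA_overall": 0.61}},
    {"run_id": "b2", "metrics": {"mAA_overall": 0.74}},
    {"run_id": "c3", "metrics": {}},   # run that never logged the metric
]
print(best_run(runs)["run_id"])  # → b2
```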
Drift Monitoring and Retraining
The DriftMonitor class (scripts/drift_monitor.py) compares production image
statistics to the baselines in data/baselines/eda_baselines.json. It checks:
Mean brightness
Mean contrast
Mean sharpness
Aspect ratio
If any metric drifts beyond the configured threshold, an alert is raised. The
Airflow drift_detection_dag polls Prometheus every 30 minutes for the
feature_drift_status metric. If drift is detected, it sends an email alert
to the configured SMTP_USER. High-severity drift additionally triggers
experiment_pipeline_dag automatically via the Alertmanager webhook.
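The per-metric comparison can be sketched as relative deviation against the baseline. The 25% threshold and field names below are assumptions; the real DriftMonitor may use a different statistic:

```python
def detect_drift(baseline, current, rel_threshold=0.25):
    """Return metrics whose relative deviation from baseline exceeds the
    threshold. Field names and the 25% default are illustrative."""
    drifted = {}
    for name, base_value in baseline.items():
        deviation = abs(current[name] - base_value) / max(abs(base_value), 1e-9)
        if deviation > rel_threshold:
            drifted[name] = deviation
    return drifted

baseline = {"brightness": 120.0, "contrast": 50.0, "sharpness": 300.0, "aspect_ratio": 1.5}
current  = {"brightness": 118.0, "contrast": 52.0, "sharpness": 150.0, "aspect_ratio": 1.5}
print(sorted(detect_drift(baseline, current)))  # → ['sharpness']
```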