Pipeline Documentation
The reconstruction pipeline transforms a collection of unordered images into a sparse 3D point cloud and a set of camera poses. This page explains each stage of both the offline DVC training pipeline and the online inference pipeline.
Pipeline Overview
┌──────────────┐   ┌───────────────┐   ┌──────────────┐   ┌────────────────┐
│   validate   │ → │ eda_baselines │ → │  preprocess  │ → │    prepare     │
│  (data QC)   │   │ (EDA + stats) │   │   (images)   │   │  (input CSV)   │
└──────────────┘   └───────────────┘   └──────────────┘   └────────────────┘
                                                                  │
                                                                  ▼
                                                          ┌────────────────┐
                                                          │  run_pipeline  │
                                                          │   (MASt3R +    │
                                                          │  COLMAP SfM)   │
                                                          └────────────────┘
                                                                  │
                                                                  ▼
                                                          ┌────────────────┐
                                                          │    evaluate    │
                                                          │ (mAA + MLflow) │
                                                          └────────────────┘
Each stage is defined in dvc.yaml and tracked by DVC. Metrics and artifacts
from each run are logged to MLflow.
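Each stage follows the standard dvc.yaml schema, tying a command to its dependencies, outputs, and metric files. A hedged sketch of how the validate stage might be declared (paths are taken from this page; the exact cmd and dependency list are assumptions):

```yaml
# Hypothetical dvc.yaml excerpt — only the script and output paths are
# documented on this page; the remaining fields are illustrative.
stages:
  validate:
    cmd: python scripts/validate_data.py
    deps:
      - scripts/validate_data.py
      - data/train_labels.csv
    outs:
      - data/validation/validation_report.json
    metrics:
      - data/validation/validation_metrics.json:
          cache: false
```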
Stage 1 — Data Validation
DVC stage: validate
Script: scripts/validate_data.py
What it does
Reads data/train_labels.csv and verifies that every image listed in the CSV
exists on disk under data/train/. It reports:
Total rows in the labels file
Number of distinct images and scenes
Missing files
Duplicate image entries
Malformed rotation matrices or translation vectors
Outputs
data/validation/validation_report.json — full issue report
data/validation/validation_metrics.json — DVC metric file with issue_count and status_code
Acceptance threshold
A status_code of 0 means all files are present and valid. A value of 1
means warnings exist (e.g., missing files) but the pipeline can continue.
A value of 2 indicates a critical error that halts downstream stages.
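The 0/1/2 convention above can be sketched as a small helper; the function name and issue categories are illustrative, not the actual scripts/validate_data.py API:

```python
def status_code(missing_files: int, malformed_rows: int) -> int:
    """Map validation issues to the pipeline's status convention.

    0 — all files present and valid
    1 — warnings only (e.g. missing files); downstream stages may continue
    2 — critical errors (e.g. malformed poses); halt downstream stages
    """
    if malformed_rows > 0:
        return 2
    if missing_files > 0:
        return 1
    return 0

print(status_code(0, 0))  # → 0
print(status_code(3, 0))  # → 1
print(status_code(0, 1))  # → 2
```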
Stage 2 — Exploratory Data Analysis and Baselines
DVC stage: eda_baselines
Script: scripts/eda_baselines.py
What it does
Computes image statistics across the training dataset to establish the drift baseline. These baselines are later used by the drift monitor to detect when production images differ from training data. Statistics computed include:
Image resolution distribution (width, height histograms)
Pairwise image similarity matrix (using global descriptors)
Sharpness distribution (Laplacian variance)
Brightness and contrast statistics
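The Laplacian-variance sharpness statistic can be sketched as follows. This is equivalent in spirit to cv2.Laplacian(img, cv2.CV_64F).var(), written with plain NumPy so the sketch is dependency-light; it is not the actual scripts/eda_baselines.py code:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness proxy: variance of the image's Laplacian response."""
    g = gray.astype(np.float64)
    # 4-neighbour discrete Laplacian on the interior of the image
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

flat = np.full((32, 32), 128, dtype=np.uint8)    # uniform image → no detail
noisy = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
print(laplacian_variance(flat) < laplacian_variance(noisy))  # → True
```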
Outputs
data/baselines/resolution_hist.png
data/baselines/similarity_matrix.png
data/baselines/sharpness_hist.png
data/baselines/eda_baselines.json — raw baseline statistics
data/baselines/eda_metrics.json — DVC metric summary
Stage 3 — Image Preprocessing
DVC stage: image_preprocess
Script: scripts/image_processing.py
Config: conf/preprocess.yaml
What it does
Applies a configurable preprocessing pipeline to each training image:
Deblurring — images with Laplacian variance below blurry_threshold are sharpened or excluded, depending on configuration.
Orientation normalisation — corrects image rotation based on EXIF metadata or a learned orientation estimator, so all images are upright before matching.
The preprocessing module is designed to be pluggable. Only stages listed in
conf/preprocess.yaml are applied.
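Because only listed stages are applied, conf/preprocess.yaml might look like the following. This is a hypothetical sketch: only blurry_threshold is named in this documentation; all other keys are assumptions.

```yaml
# Hypothetical conf/preprocess.yaml — stage and key names are illustrative.
stages:
  - deblur
  - orientation
deblur:
  blurry_threshold: 100.0   # Laplacian variance below this → sharpen/exclude
  action: sharpen           # or: exclude
orientation:
  use_exif: true
```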
Outputs
data/processed/images/ — preprocessed image tree mirroring data/train/
data/processed/preprocess_report.json
data/processed/preprocess_metrics.json
Stage 4 — Data Preparation
DVC stage: prepare
Script: scripts/prepare_submission.py
What it does
Reads the preprocessed image paths and data/train_labels.csv to build
data/prepared/prepared_input.csv. This CSV is in the IMC2025 submission format
with nan placeholder values for rotation and translation — these are populated
by the reconstruction stage.
Columns: image_id, dataset, scene, image, rotation_matrix,
translation_vector.
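Building the prepared CSV can be sketched with the standard library. The column order matches the list above; the semicolon-joined nan layout for the pose fields is an assumption, and this helper is not the scripts/prepare_submission.py API:

```python
import csv, io

FIELDS = ["image_id", "dataset", "scene", "image",
          "rotation_matrix", "translation_vector"]

def prepared_rows(images):
    """Yield IMC2025-format rows with nan pose placeholders.

    `images` is an iterable of (image_id, dataset, scene, image_path).
    """
    nan9 = ";".join(["nan"] * 9)   # 3x3 rotation, flattened (assumed layout)
    nan3 = ";".join(["nan"] * 3)   # translation vector (assumed layout)
    for image_id, dataset, scene, image in images:
        yield {"image_id": image_id, "dataset": dataset, "scene": scene,
               "image": image, "rotation_matrix": nan9,
               "translation_vector": nan3}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(prepared_rows([("img0", "ETs", "scene0", "img0.png")]))
print(buf.getvalue().splitlines()[0])
```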
Stage 5 — Scene Reconstruction (Core Pipeline)
DVC stage: run_pipeline
Script: scripts/reconstruct_scenes.py
Config: conf/mast3r.yaml (or conf/best_config.yaml in production)
This is the main computational stage. It implements the full
IMC2025Pipeline.run() loop.
Shortlist Generation
Before matching, the pipeline generates a shortlist of candidate image pairs to match. Matching all N×N pairs is computationally infeasible for large datasets, so the shortlist generator selects the most promising pairs using an ensemble of global descriptor retrievers:
MASt3R-ASMK — vocabulary-tree-based retrieval using MASt3R’s dense descriptors and the ASMK aggregation method. This is the primary retriever.
MASt3R-SPoC — an alternative global descriptor from the MASt3R retrieval head.
DINOv2 — a general-purpose vision transformer used as a secondary global descriptor for cross-domain robustness.
ISC — a descriptor trained specifically for image copy detection, effective for repeated structures.
Each retriever proposes its top-K most similar images per query. The union of all proposals forms the final shortlist.
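The union step can be sketched as follows, with retriever outputs stubbed as precomputed rankings (the real retrievers are the four models listed above; this data structure is illustrative):

```python
def build_shortlist(rankings, k):
    """Union of each retriever's top-k proposals as unordered image pairs.

    `rankings[retriever][query]` is a list of database images sorted by
    similarity — a stub for the real retriever outputs.
    """
    shortlist = set()
    for per_query in rankings.values():
        for query, neighbours in per_query.items():
            for neighbour in neighbours[:k]:
                if neighbour != query:
                    # canonical ordering so (a, b) and (b, a) deduplicate
                    shortlist.add(tuple(sorted((query, neighbour))))
    return shortlist

rankings = {
    "mast3r_asmk": {"a": ["b", "c", "d"]},
    "dinov2":      {"a": ["c", "e", "b"]},
}
print(sorted(build_shortlist(rankings, k=2)))
# → [('a', 'b'), ('a', 'c'), ('a', 'e')]
```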
Feature Extraction and Matching
For each pair in the shortlist, the pipeline runs matching via the MASt3R Hybrid
Matcher (type: mast3r_hybrid), which combines:
Dense matching — MASt3R’s end-to-end dense correspondence network operates at 512 px resolution and produces dense pixel-level matches.
Sparse matching — two local feature detectors provide complementary keypoints:
ALIKED (with LightGlue) — a learned keypoint detector with up to 4096 keypoints per image at 1280 px resolution.
MagicLeap SuperPoint — a classical-style detector with up to 4096 keypoints at 1600 px resolution.
Dense and sparse matches are fused late in the pipeline to maximise coverage.
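Late fusion amounts to concatenating the two correspondence sets and deduplicating near-identical matches. A minimal sketch using rounded pixel coordinates as the dedup key — the pipeline's actual fusion logic is not specified here, so the grid size and tie-breaking rule are assumptions:

```python
def fuse_matches(dense, sparse, grid=2.0):
    """Merge two lists of (x0, y0, x1, y1) correspondences.

    Matches whose endpoints fall in the same `grid`-pixel cell count as
    duplicates; sparse matches win ties here (an assumed policy).
    """
    fused, seen = [], set()
    for match in list(sparse) + list(dense):   # sparse first → kept on ties
        key = tuple(round(coordinate / grid) for coordinate in match)
        if key not in seen:
            seen.add(key)
            fused.append(match)
    return fused

dense = [(10.1, 20.0, 30.2, 40.1), (50.0, 60.0, 70.0, 80.0)]
sparse = [(10.4, 19.8, 30.0, 40.3)]   # near-duplicate of the first dense match
print(len(fuse_matches(dense, sparse)))  # → 2
```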
COLMAP Incremental SfM
Fused matches are imported into a COLMAP database. COLMAP’s incremental Structure-from-Motion mapper then:
Selects an initial image pair with many matches and sufficient parallax (pairs well explained by a single homography are avoided, since they triangulate poorly).
Triangulates an initial 3D point set.
Registers remaining images one by one via PnP.
Runs bundle adjustment after each batch of registrations.
Filters outlier points by reprojection error.
Key COLMAP parameters (from config):
mapper_min_model_size: 3 — minimum images to form a valid reconstruction.
mapper_max_num_models: 25 — maximum number of disconnected sub-models.
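In the pipeline configuration these might appear as follows. The nesting under a colmap key is an assumption; only the two mapper_* values are documented on this page.

```yaml
# Hypothetical excerpt of conf/mast3r.yaml
colmap:
  mapper_min_model_size: 3    # minimum images for a valid reconstruction
  mapper_max_num_models: 25   # maximum number of disconnected sub-models
```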
Outputs
data/reconstruction/eval_prediction.csv — IMC2025 format poses
data/reconstruction/sparse_reconstruction.ply — point cloud
data/reconstruction/reconstruction_metrics.json
Stage 6 — Evaluation
DVC stage: evaluate
Script: scripts/evaluate.py
What it does
Computes the mAA (mean Average Accuracy) metric, which is the primary quality measure for the IMC2025 competition. mAA measures the fraction of camera poses registered within a set of angular and translation error thresholds.
It also computes:
Per-dataset scores and mAA values
Clusterness score (how well images cluster geometrically)
Registration rate
All metrics are logged as a child MLflow run under the parent DVC run.
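mAA as described above can be sketched as pose-error accuracy averaged over threshold pairs. The threshold values below are placeholders, not the official IMC2025 grid:

```python
def mean_average_accuracy(rot_err_deg, trans_err, thresholds):
    """Average, over (rotation, translation) threshold pairs, of the
    fraction of cameras whose pose error is within both thresholds."""
    accuracies = []
    for rot_thresh, trans_thresh in thresholds:
        hits = sum(1 for r, t in zip(rot_err_deg, trans_err)
                   if r <= rot_thresh and t <= trans_thresh)
        accuracies.append(hits / len(rot_err_deg))
    return sum(accuracies) / len(accuracies)

rot = [0.5, 2.0, 10.0]                        # per-camera rotation error, degrees
trans = [0.01, 0.05, 1.00]                    # per-camera translation error
thresholds = [(1, 0.02), (3, 0.10), (10, 1.00)]  # illustrative grid
print(round(mean_average_accuracy(rot, trans, thresholds), 3))  # → 0.667
```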
Outputs
data/evaluation/metrics.json
data/evaluation/git_status.txt
Online Inference Pipeline
The online pipeline (triggered via POST /upload) mirrors the DVC pipeline but
runs directly without DVC:
ZIP extraction → temporary workspace
MASt3R hybrid matching on GPU worker (GPUModelWorker.reconstruct())
COLMAP SfM in the same temporary workspace
PLY export via pycolmap.Reconstruction.export_PLY()
Voxel downsampling (utils/decimate.py) to ≤500,000 points
Results persisted to /app/results/
The pipeline configuration is loaded from conf/best_config.yaml (if present)
with fallback to conf/mast3r.yaml.
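The fallback is a simple presence check; a minimal sketch (the helper name is illustrative, not the service's actual API):

```python
import tempfile
from pathlib import Path

def active_config(conf_dir="conf"):
    """Prefer conf/best_config.yaml (promoted by select_best_run.py),
    falling back to conf/mast3r.yaml."""
    best = Path(conf_dir) / "best_config.yaml"
    return best if best.exists() else Path(conf_dir) / "mast3r.yaml"

# Demonstrate the fallback in a throwaway directory.
with tempfile.TemporaryDirectory() as conf_dir:
    (Path(conf_dir) / "mast3r.yaml").touch()
    print(active_config(conf_dir).name)        # → mast3r.yaml
    (Path(conf_dir) / "best_config.yaml").touch()
    print(active_config(conf_dir).name)        # → best_config.yaml
```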
Model Selection and Promotion
After each DVC experiment run, scripts/select_best_run.py queries MLflow for
the run with the highest mAA_overall metric in the
scene_reconstruction_dvc experiment. It copies that run’s configuration to
conf/best_config.yaml, which becomes the active production config on the
next ray-serve restart.
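Stripped of the MLflow client calls, selection reduces to an argmax over runs' mAA_overall. A sketch over stubbed run records — the real script queries the scene_reconstruction_dvc experiment in MLflow, and this record shape is an assumption:

```python
def best_run(runs):
    """Pick the run with the highest mAA_overall metric.

    `runs` stands in for records returned by an MLflow query; each is a
    dict with "run_id" and "metrics". Illustrative only.
    """
    return max(runs, key=lambda run: run["metrics"].get("mAA_overall", float("-inf")))

runs = [
    {"run_id": "a1", "metrics": {"mAA_overall": 0.61}},
    {"run_id": "b2", "metrics": {"mAA_overall": 0.74}},
    {"run_id": "c3", "metrics": {}},   # run that never logged the metric
]
print(best_run(runs)["run_id"])  # → b2
```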
Drift Monitoring and Retraining
The DriftMonitor class (scripts/drift_monitor.py) compares production image
statistics to the baselines in data/baselines/eda_baselines.json. It checks:
Mean brightness
Mean contrast
Mean sharpness
Aspect ratio
If any metric drifts beyond the configured threshold, an alert is raised. The
Airflow drift_detection_dag polls Prometheus every 30 minutes for the
feature_drift_status metric. If drift is detected, it sends an email alert
to the configured SMTP_USER. High-severity drift additionally triggers
experiment_pipeline_dag automatically via the Alertmanager webhook.
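The per-metric comparison can be sketched as relative deviation against the baseline. The 25% threshold and field names below are assumptions; the real DriftMonitor may use a different statistic:

```python
def detect_drift(baseline, current, rel_threshold=0.25):
    """Return metrics whose relative deviation from baseline exceeds the
    threshold. Field names and the 25% default are illustrative."""
    drifted = {}
    for name, base_value in baseline.items():
        deviation = abs(current[name] - base_value) / max(abs(base_value), 1e-9)
        if deviation > rel_threshold:
            drifted[name] = deviation
    return drifted

baseline = {"brightness": 120.0, "contrast": 50.0, "sharpness": 300.0, "aspect_ratio": 1.5}
current  = {"brightness": 118.0, "contrast": 52.0, "sharpness": 150.0, "aspect_ratio": 1.5}
print(sorted(detect_drift(baseline, current)))  # → ['sharpness']
```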