Data Sources
============

This page describes the datasets used to train, evaluate, and test the reconstruction pipeline.

----

Dataset Summary
---------------

.. list-table::
   :widths: 20 20 15 45
   :header-rows: 1

   * - Dataset
     - Source
     - License
     - Known Bias / Notes
   * - **IMC25 train**
     - Kaggle / CVG Group
     - CC BY
     - Outdoor scenes, heritage sites; well-lit, high-resolution imagery
   * - **IMC25 test**
     - Kaggle / CVG Group
     - CC BY
     - Includes staircase scenes and ET-type scenes; more challenging geometry
   * - **custom_warehouse**
     - Mobile camera (internal)
     - Internal use only
     - Single lighting condition, 30 fps video frames; indoor, repetitive textures

----

IMC 2025 Training Dataset
-------------------------

The primary dataset is from the **Image Matching Challenge 2025** hosted on Kaggle (provided by the Computer Vision Group).

**Contents**

The training split contains multi-view image collections for a variety of outdoor and heritage scenes, including:

- Ancient monuments and archaeological sites (e.g., ``dioscuri``, ``cyprus``, ``baalshamin``)
- Iconic urban landmarks (e.g., ``taj_mahal``, ``sacre_coeur``, ``trevi_fountain``, ``piazza_san_marco``, ``grand_place_brussels``)
- Indoor and mixed scenes (e.g., ``stairs``, ``haiper`` series with bikes, chairs, fountains)
- Vineyard and outdoor scenes (``fbk_vineyard``)
- Scenes explicitly containing outlier images (``outliers`` sub-scenes)

A full list of 34 dataset/scene pairs is defined in ``data/scenes.yaml``.

**Labels file**

``data/train_labels.csv`` contains ground-truth rotation matrices and translation vectors for each image, formatted as semicolon-separated values:

- ``rotation_matrix`` — 9 floats (row-major 3×3 rotation matrix)
- ``translation_vector`` — 3 floats (camera centre in world coordinates)
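As a concrete illustration, the sketch below loads one pose from this file. It is a minimal example rather than pipeline code: it assumes the file is an ordinary comma-separated CSV whose ``rotation_matrix`` and ``translation_vector`` columns (names taken from the field list above) hold the semicolon-separated floats described, and that ``pandas`` and ``numpy`` are available.

.. code-block:: python

   import numpy as np
   import pandas as pd

   labels = pd.read_csv("data/train_labels.csv")
   row = labels.iloc[0]

   # 9 semicolon-separated floats, row-major, reshaped into a 3x3 matrix.
   R = np.array(row["rotation_matrix"].split(";"), dtype=float).reshape(3, 3)

   # 3 semicolon-separated floats: the camera centre in world coordinates.
   t = np.array(row["translation_vector"].split(";"), dtype=float)

   # Sanity check: a valid rotation matrix is orthonormal.
   assert np.allclose(R @ R.T, np.eye(3), atol=1e-6)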
**Thresholds file**

``data/train_thresholds.csv`` defines per-scene angular and translation error thresholds used to compute the mAA metric. Different scenes have different tolerance levels reflecting their physical scale.

**Downloading the dataset**

.. code-block:: bash

   kaggle competitions download -c image-matching-challenge-2025
   unzip image-matching-challenge-2025.zip -d data/
   mv data/image-matching-challenge-2025/* data/
   rm -r data/image-matching-challenge-2025

----

IMC 2025 Test Dataset
---------------------

The test split is provided separately (``data/test/``, 75 files, ~83 MB). It includes scenes emphasising challenging conditions:

- **Stairs** — repetitive geometry with few distinctive features; tests the robustness of feature matching under ambiguous structure.
- **ET-type scenes** — scenes from the ``ETs`` dataset with unusual viewpoints.

These scene types were chosen specifically because they expose weaknesses of standard feature matchers and call for semi-dense matching approaches such as MASt3R.

----

Data Versioning
---------------

All datasets are tracked with **DVC**:

- ``data/train/`` is tracked by ``data/train.dvc``
- ``data/test/`` is tracked by ``data/test.dvc``
- ``data/train_labels.csv`` is tracked by ``data/train_labels.csv.dvc``
- ``data/train_thresholds.csv`` is tracked by ``data/train_thresholds.csv.dvc``

To download the raw data, use the Kaggle commands shown above.

----

Preprocessing Assumptions
-------------------------

The preprocessing stage (``scripts/image_processing.py``) makes the following assumptions:

- Images may contain EXIF orientation metadata; orientations are normalised before matching.
- Blurry images (Laplacian variance below ``blurry_threshold`` in ``conf/preprocess.yaml``) are either sharpened or excluded; a sketch of this check appears at the end of this page.
- All images for a given scene are expected to have reasonable overlap (>20% shared field of view with at least one other image in the scene).

----

Known Biases and Limitations
----------------------------

- The training data is heavily weighted towards **outdoor heritage and landmark scenes**. The model may be less accurate on indoor, industrial, or highly reflective surfaces.
- All training images are high quality (DSLR or recent smartphone). Performance may degrade on low-resolution or heavily compressed imagery.
- Scenes with **repetitive structures** (stairs, shelving, tiled floors) are systematically harder — the shortlist generator may propose incorrect pairs, and COLMAP may produce disconnected sub-models.
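----

For reference, below is a minimal sketch of the blur check mentioned under *Preprocessing Assumptions*. It is illustrative only: it assumes OpenCV (``cv2``) is available, and the hard-coded default threshold is a placeholder; the pipeline reads the real value from ``blurry_threshold`` in ``conf/preprocess.yaml``, and the actual implementation lives in ``scripts/image_processing.py``.

.. code-block:: python

   import cv2

   def is_blurry(image_path: str, blurry_threshold: float = 100.0) -> bool:
       """Flag an image as blurry when its Laplacian variance is low."""
       image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
       if image is None:
           raise FileNotFoundError(image_path)
       # Variance of the Laplacian is a standard focus measure: sharp
       # images have strong second derivatives (edges); blurry ones do not.
       return cv2.Laplacian(image, cv2.CV_64F).var() < blurry_threshold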