Business Understanding

This page defines the problem the system solves, the metrics used to measure success, and the operational targets that govern production deployment.

Problem Statement

Given a set of unordered, unstructured multi-view images captured by handheld phones, warehouse drones, or vehicle-mounted cameras, reconstruct the 3D environment by estimating the camera pose (rotation matrix R and translation vector t) for each image.
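The Python below is a minimal sketch of what the pose parameters mean. It assumes the world-to-camera convention X_cam = R·X_world + t, which is the one COLMAP uses, and a simple pinhole camera with focal length f and principal point (cx, cy). The helper names are hypothetical and are not part of this system's codebase.

```python
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def mat_vec(m: Matrix3, v: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m: Matrix3) -> Matrix3:
    """Transpose a 3x3 matrix (for a rotation, this is its inverse)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def world_to_camera(R: Matrix3, t: Vec3, X: Vec3) -> Vec3:
    """Assumed convention: X_cam = R @ X_world + t."""
    rx = mat_vec(R, X)
    return (rx[0] + t[0], rx[1] + t[1], rx[2] + t[2])


def project(f: float, cx: float, cy: float, X_cam: Vec3) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point with positive depth."""
    x, y, z = X_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / z + cx, f * y / z + cy)


def camera_center(R: Matrix3, t: Vec3) -> Vec3:
    """Camera centre in world coordinates: C = -R^T t."""
    rt = mat_vec(transpose(R), t)
    return (-rt[0], -rt[1], -rt[2])
```

With R as the identity and t = (0, 0, 5), the world origin lies 5 units in front of the camera, and camera_center returns (0, 0, -5).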

Jointly recovering these poses and the scene geometry is the core computer vision task of Structure-from-Motion (SfM). Traditional approaches rely on hand-crafted features (SIFT, ORB) and geometric verification (RANSAC). This system replaces the hand-crafted feature extraction and matching steps with learned neural networks (MASt3R, ALIKED, DINOv2), with the aim of higher accuracy on challenging real-world scenes.
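For contrast with the learned matchers, the classical geometric-verification step can be sketched as a RANSAC loop. This is a deliberately simplified illustration. Real SfM pipelines fit a fundamental or essential matrix (or a homography) from several correspondences, whereas this sketch fits only a 2D translation, which a single match determines. The function ransac_translation and its parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (keypoint in image A, keypoint in image B)


def ransac_translation(
    matches: Sequence[Match],
    threshold: float = 2.0,
    iterations: int = 100,
    seed: int = 0,
) -> Tuple[Optional[Point], List[int]]:
    """Return the best (dx, dy) hypothesis and the indices of its inlier matches."""
    rng = random.Random(seed)
    best_model: Optional[Point] = None
    best_inliers: List[int] = []
    if not matches:
        return best_model, best_inliers
    for _ in range(iterations):
        # Minimal sample: one match fully determines a 2D translation.
        (x1, y1), (x2, y2) = matches[rng.randrange(len(matches))]
        dx, dy = x2 - x1, y2 - y1
        # Score the hypothesis by how many matches it explains within `threshold` pixels.
        inliers = [
            i
            for i, ((a, b), (c, d)) in enumerate(matches)
            if math.hypot(a + dx - c, b + dy - d) <= threshold
        ]
        # Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

Because each hypothesis is scored only by its inlier count, a few grossly wrong matches cannot pull the estimate away, which is what makes RANSAC useful for rejecting bad correspondences.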

Use Cases

Domain	Application
AR / VR	Reconstruct real environments for immersive mixed-reality experiences
Robotics	Generate scene maps for autonomous navigation and pick-and-place tasks
Autonomous Driving	3D mapping and localisation from dashcam imagery
Cultural Heritage	Digital preservation of historical monuments and artefacts
Surveying & Topography	Generate georeferenced point clouds from drone surveys
Inspection	Structural inspection of infrastructure from photo collections

ML Metric

The primary quality metric is mAA (mean Average Accuracy):

For each level in a set of rotation and translation error thresholds defined per scene in data/train_thresholds.csv, the accuracy is the fraction of camera poses that fall within that level; mAA is the mean of these accuracies over all levels.

A pose is counted as “accurate” at a given level if both its rotation error (in degrees) and its translation error (normalised by scene scale) fall below that level's thresholds.
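The following is a minimal sketch of the per-pose rotation error and the threshold sweep. It is not the official IMC 2025 scorer used by scripts/evaluate.py. It assumes that the rotation error is the geodesic angle between the ground-truth and estimated rotations, that the caller has already computed the scale-normalised translation error, that thresholds come as (degrees, translation) pairs, and that unregistered images count as failures. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def rotation_error_deg(R_gt: Matrix3, R_est: Matrix3) -> float:
    """Geodesic angle of R_gt^T R_est, in degrees."""
    # trace(R_gt^T R_est) equals the element-wise sum of R_gt[i][j] * R_est[i][j].
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp floating-point noise
    return math.degrees(math.acos(cos_angle))


def mean_average_accuracy(
    errors: Sequence[Tuple[float, float]],
    thresholds: Sequence[Tuple[float, float]],
    n_images: int,
) -> float:
    """Mean over threshold levels of the fraction of images posed within that level.

    `errors` holds (rotation_deg, translation_norm) per registered image; images
    that were never registered are counted as failures through `n_images`.
    """
    if n_images <= 0 or not thresholds:
        return 0.0
    accuracies = []
    for rot_thr, trans_thr in thresholds:
        # A pose is accurate only if both errors are strictly below this level.
        hits = sum(1 for rot, trans in errors if rot < rot_thr and trans < trans_thr)
        accuracies.append(hits / n_images)
    return sum(accuracies) / len(accuracies)
```

For example, with errors [(1.0, 0.01), (4.0, 0.04), (20.0, 0.5)] for a 4-image scene and thresholds [(2, 0.02), (5, 0.05), (10, 0.1)], the per-level accuracies are 0.25, 0.5 and 0.5, giving an mAA of about 0.417.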

Target: mAA ≥ 50%

The mAA metric is computed by scripts/evaluate.py using the official IMC 2025 scoring function and logged to MLflow after every experiment run.

Business and Operational Metrics

In addition to model accuracy, the system must meet the following operational targets:

Metric	Target	Measured By
End-to-end latency per scene (dual GPU)	≤ 5 minutes	`inference_latency_seconds` (Prometheus)
API `/health` response time	≤ 200 ms	Prometheus scrape
Registration rate (images placed in model)	≥ 90%	`registered_images_ratio` (Prometheus)

These metrics are monitored continuously on the Grafana dashboard. Prometheus evaluates alerting rules against the targets above, and Alertmanager routes any firing alerts, triggering automated retraining or human review.
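The pending/firing behaviour of a Prometheus alerting rule with a `for` duration can be approximated as follows. This is an illustrative sketch. The ThresholdAlert class and the evaluation counts are hypothetical, and this page does not show the project's actual alert rules. A breach first puts the alert into a pending state. The alert fires only after the breach has persisted for a number of consecutive evaluations, and a single healthy reading resets it.

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlert:
    """Hypothetical single-rule alert with a consecutive-evaluation `for` window."""

    threshold: float
    for_evaluations: int
    higher_is_worse: bool = True  # False for metrics such as registration rate
    breached_for: int = 0

    def evaluate(self, value: float) -> str:
        breached = value > self.threshold if self.higher_is_worse else value < self.threshold
        if not breached:
            # Any healthy reading returns the alert to the inactive state.
            self.breached_for = 0
            return "inactive"
        self.breached_for += 1
        return "firing" if self.breached_for >= self.for_evaluations else "pending"
```

For the latency target, ThresholdAlert(threshold=300.0, for_evaluations=3) would return pending, pending, then firing for three consecutive readings above 300 s. The registration-rate target would use higher_is_worse=False with a threshold of 0.90.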

Data Source

The system is trained and evaluated on the IMC 2025 Kaggle dataset, provided by the Computer Vision Group (CVG). It is supplemented by an internal custom_warehouse dataset for industrial use-case validation.

See Data Sources for full details on dataset composition, biases, and licensing.

Stakeholders

Role	Interest
ML Engineers	Experiment tracking, model improvement, DVC pipeline management
Platform Engineers	Uptime, latency, GPU utilisation, Docker deployments
End Users	Easy upload workflow, fast results, accurate 3D models
Data Scientists	mAA scores, per-dataset breakdowns, drift monitoring

Definition of Done

A model version is considered production-ready when:

mAA_overall ≥ 0.50 on the training evaluation split.
registration_rate ≥ 0.90 on the standard test scenes.
End-to-end inference latency ≤ 5 minutes for a typical 50-image scene.
All Trivy and pip-audit CI checks pass with no CRITICAL/HIGH findings.
The configuration is committed to conf/best_config.yaml and tagged in Git.
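The checklist above could be automated as a release gate. The sketch below is hypothetical. The ReleaseCandidate fields and the unmet_criteria function are illustrative names. In practice, the values would be gathered from the evaluation run, the latency benchmark, and the CI security reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseCandidate:
    """Hypothetical summary of a model version's release evidence."""

    maa_overall: float
    registration_rate: float
    latency_seconds_50_images: float
    critical_or_high_findings: int
    config_committed_and_tagged: bool


def unmet_criteria(c: ReleaseCandidate) -> List[str]:
    """Return the Definition of Done criteria that the candidate fails (empty = ready)."""
    failures = []
    if c.maa_overall < 0.50:
        failures.append("mAA_overall < 0.50")
    if c.registration_rate < 0.90:
        failures.append("registration_rate < 0.90")
    if c.latency_seconds_50_images > 5 * 60:
        failures.append("latency > 5 minutes")
    if c.critical_or_high_findings > 0:
        failures.append("CRITICAL/HIGH security findings")
    if not c.config_committed_and_tagged:
        failures.append("conf/best_config.yaml not committed and tagged")
    return failures
```

Returning the list of unmet criteria, rather than a single boolean, makes it easy to report why a candidate was rejected.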