# Installation Guide

This guide covers two installation paths: the recommended Docker-based setup and the native Python setup for development.
## Prerequisites

- Docker >= 24.0 and Docker Compose >= 2.20
- NVIDIA GPU with CUDA 12.6 support (required for inference)
- NVIDIA Container Toolkit installed on the host
- Git and Git LFS
- At least 16 GB GPU VRAM and 32 GB system RAM recommended
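A quick way to check the host against this list is to probe for each required tool. This is only a sketch: it checks that the binaries exist on `PATH`, not that they meet the version requirements above.

```bash
# Report any prerequisite tools missing from PATH (presence only, not versions).
for tool in docker git git-lfs nvidia-smi; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

Version checks (`docker --version`, `docker compose version`, `git lfs version`) are still worth running by hand.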
## Step 1 — Clone the Repository

```bash
git clone https://github.com/your-org/MLOps-Project-ME22B214.git
cd MLOps-Project-ME22B214
git lfs pull  # Downloads pre-trained model weights
```
## Step 2 — Download the Dataset

```bash
kaggle competitions download -c image-matching-challenge-2025
unzip image-matching-challenge-2025.zip -d data/
mv data/image-matching-challenge-2025/* data/
rm -r data/image-matching-challenge-2025
```

The `data/` directory should now contain `train/`, `test/`, `train_labels.csv`, and `train_thresholds.csv`.
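A quick sanity check of the resulting layout, using the paths listed above:

```bash
# Report any expected dataset entries that are absent under data/.
for p in data/train data/test data/train_labels.csv data/train_thresholds.csv; do
  [ -e "$p" ] || echo "missing: $p"
done
```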
## Docker Setup (Recommended)

This is the standard production-ready setup. All services run as Docker containers on a shared network (`mlops_net`).
### Step 3a — Configure the Environment File

Run the interactive setup script to generate your `.env` file:

```bash
./setup_env.sh
```

The script will prompt you for the following values:

- your host user ID (run `id -u`)
- the Docker group ID on the host (run `getent group docker | cut -d: -f3`)
- an Airflow encryption key (a Fernet key; generate one with `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`)
- a Gmail address for Airflow email alerts
- a Gmail app password (not your regular password)
- the absolute path to this repository on the host machine
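For reference, the generated `.env` is a plain key=value file along these lines. The variable names below are hypothetical placeholders; the authoritative names are whatever `setup_env.sh` actually writes.

```bash
# Hypothetical sketch of a generated .env - variable names are illustrative only.
HOST_UID=1000                     # from `id -u`
DOCKER_GID=999                    # Docker group ID on the host
AIRFLOW_FERNET_KEY=changeme       # Airflow encryption key
ALERT_EMAIL=you@example.com       # Gmail address for Airflow alerts
ALERT_EMAIL_PASSWORD=changeme     # Gmail app password (not the account password)
PROJECT_ROOT=/home/you/MLOps-Project-ME22B214
```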
### Step 3b — Generate Docker Secrets

```bash
./generate_secrets.sh
chmod 644 ./secrets/*
```

This creates two files under `secrets/`:

- `secrets/jwt_secret` — used to sign API JWT tokens
- `secrets/grafana_admin_password` — used for the Grafana admin account

> **Note**
> The `secrets/` directory is listed in `.gitignore` and will never be committed.
### Step 3c — Generate TLS Certificates

```bash
./generate-certs.sh
```

This creates self-signed certificates under `certs/` for nginx TLS termination. For production, replace these with certificates from a trusted CA.
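The same result can be produced by hand with `openssl`. This is a sketch: the exact file names and certificate subject used by `generate-certs.sh` may differ.

```bash
# Create a self-signed cert/key pair for local TLS termination.
mkdir -p certs
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -subj "/CN=localhost" \
  -keyout certs/server.key -out certs/server.crt
```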
### Step 3d — Clone External Dependencies

```bash
cd extra/
git clone https://github.com/jenicek/asmk
git clone https://github.com/naver/croco
cd ..
```
### Step 3e — Launch the Full Stack

```bash
docker compose --profile inference up --build -d
```
This starts the following services:

| Service | Port | Description |
|---|---|---|
| `frontend` | 5173, 443 | React + Three.js frontend |
| `ray-serve` | 8000, 8265 | FastAPI gateway + GPU inference worker |
| `mlflow` | 5000 | Experiment tracking server |
| `airflow` | 8080 | Airflow web UI and REST API |
| `prometheus` | 9090 | Metrics scraping |
| `grafana` | 3001 | Monitoring dashboards |
| `postgres` | (internal) | Airflow metadata database |
```bash
# Verify all containers are healthy
docker compose ps
```

Wait for the `ray-serve` healthcheck to pass — this can take up to 5 minutes as the MASt3R model weights are loaded into GPU memory.
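Rather than re-running `docker compose ps` by hand, a small helper can poll the gateway's readiness endpoint (the `/ready` route documented under Verifying the Installation). This is a sketch, not part of the stack itself.

```bash
# Poll a URL until it answers 2xx, or give up after `attempts` tries.
wait_for_ready() {
  url=$1; attempts=${2:-60}; interval=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $url"
  return 1
}

# Example: wait up to 5 minutes (60 tries x 5 s) for the GPU worker.
# wait_for_ready http://localhost:8000/ready 60 5
```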
## Native Python Setup (Developer Mode)

Use this path if you need to develop or debug outside Docker.
### Step 3a — Build ASMK

```bash
cd extra/
git clone https://github.com/jenicek/asmk
cd asmk/cython/
cythonize *.pyx
cd ..
python -m build --no-isolation
pip install dist/*.whl
cd ../../
```
### Step 3b — Build CroCo / DUSt3R Kernels

DUSt3R relies on RoPE positional embeddings, which require compiled CUDA kernels:

```bash
cd extra/
git clone https://github.com/naver/croco.git
cd croco/models/curope/
python -m build --no-isolation
pip install dist/*.whl
cd ../../../../
```
### Step 3c — Build Remaining Packages

Build any additional packages in `bundle/oss/` as `.whl` files using `python -m build --no-isolation` in their respective directories, then move the compiled `.whl` files to `bundle/oss/`.
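Under the assumption that each package directory in `bundle/oss/` ships a `pyproject.toml`, this step can be scripted as:

```bash
# Build every package directory under bundle/oss/ and collect the wheels.
for d in bundle/oss/*/; do
  [ -f "${d}pyproject.toml" ] || continue   # skip non-package directories
  (cd "$d" && python -m build --no-isolation)
  mv "${d}dist/"*.whl bundle/oss/
done
```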
### Step 3d — Create the Python Virtual Environment

```bash
pip install uv
uv venv
source .venv/bin/activate
uv pip install -e .
export LD_LIBRARY_PATH=.venv/lib/python3.11/site-packages/torch/lib:$LD_LIBRARY_PATH
```

The project requires Python 3.11 exactly (`requires-python = "==3.11.*"`).
## Pre-trained Model Weights

Model weights are stored under `extra/pretrained_models/` via Git LFS. If you need to download them manually:

| Model | Download URL |
|---|---|
| ALIKED | https://github.com/Shiaoming/ALIKED/raw/main/models/aliked-n16.pth |
| ISC | https://github.com/lyakaap/ISC21-Descriptor-Track-1st/releases/download/v1.0.1/isc_ft_v107.pth.tar |
| MASt3R main weights | |
| MASt3R retrieval weights | |
| MASt3R codebook | |
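Because Git LFS stores small pointer files until the real objects are fetched, a quick check for unfetched weights is file size: an LFS pointer file is on the order of 130 bytes, while the real checkpoints are megabytes.

```bash
# List checkpoint files that look like unfetched LFS pointers (< 1000 bytes).
if [ -d extra/pretrained_models ]; then
  find extra/pretrained_models -type f -size -1000c \
    -exec echo "possible unfetched LFS pointer: {}" \;
else
  echo "extra/pretrained_models not found - clone with Git LFS first"
fi
```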
## Verifying the Installation

Once all services are running, verify the stack is healthy:

```bash
# API health check
curl http://localhost:8000/health

# GPU worker readiness
curl http://localhost:8000/ready

# Obtain a JWT token
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'
```
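To reuse the token in later requests, the auth response can be parsed into a shell variable. The `access_token` field name is an assumption about the response payload; adjust it to whatever the endpoint actually returns.

```bash
# Pull the token field out of a JSON auth response on stdin.
parse_token() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["access_token"])'
}

# Example usage against a running stack:
#   TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
#     -H "Content-Type: application/json" \
#     -d '{"username": "admin", "password": "admin"}' | parse_token)
#   curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/health
```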
A successful `/health` response looks like:

```json
{
  "status": "ok",
  "version": "2.0.0",
  "timestamp": 1714300000.0
}
```
## Troubleshooting Installation

- **`ray-serve` container exits immediately**: check that the NVIDIA Container Toolkit is installed and that `docker run --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi` succeeds.
- **Port conflicts**: if ports 8000, 5000, or 8080 are already in use on your host, edit the `ports:` mappings in `docker-compose.yaml` before launching.
- **Airflow DB migration fails**: ensure `postgres` is healthy before running `airflow-init` (`docker compose logs postgres`).
- **Git LFS quota exceeded**: download the model weights manually using the URLs above and place them under `extra/pretrained_models/`.