Frequently Asked Questions
==========================

----

General
--------

**What does this system actually do?**
It takes a collection of photos of a scene (e.g., a building, a room, an
archaeological site) and reconstructs a 3D point cloud from them. For each
image it also estimates where the camera was positioned and which direction
it was pointing.

**What kind of images can I use?**
The system accepts ``.jpg``, ``.jpeg``, ``.png``, ``.tif``, ``.tiff``,
``.bmp``, and ``.webp`` files, packaged into a single ZIP archive. Images
should be taken with a real camera or phone; synthetic renders or heavily
edited images may produce poor results.

**How many images do I need?**
A minimum of 3 images is required to form a reconstruction. In practice, at
least 10–20 overlapping images of a scene will give usable results. More
images (50–200) generally improve accuracy and coverage.

**Do the images need to be ordered?**
No. The system handles unordered collections. However, every image must share
some visual overlap with at least one other image in the set.

----

Installation & Setup
---------------------

**I get "CUDA error" when starting ray-serve. What should I do?**
Ensure the NVIDIA Container Toolkit is installed on your host:

.. code-block:: bash

   sudo apt-get install -y nvidia-container-toolkit
   sudo systemctl restart docker

Then verify GPU access:

.. code-block:: bash

   docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi

**The ``generate_secrets.sh`` script fails.**
Ensure you have ``openssl`` installed (``sudo apt install openssl``). The
script generates random secret strings using ``openssl rand``.

**I see "AIRFLOW_UID not set" warnings.**
Add your user ID to ``.env``:

.. code-block:: bash

   echo "AIRFLOW_UID=$(id -u)" >> .env

**Git LFS download fails.**
You can download model weights manually using the URLs listed in
:doc:`installation`. Place the files in ``extra/pretrained_models/``.

----

Using the API
--------------

**My JWT token keeps expiring mid-workflow.**
Tokens are valid for 15 minutes. For long-running automation scripts, refresh
the token proactively by calling ``POST /auth/token`` before each request, or
increase ``JWT_EXPIRY_SECONDS`` in the environment configuration.

**I get HTTP 503 on ``/ready``.**
The GPU worker is still loading. MASt3R model weights take 1–3 minutes to
load into VRAM at container startup. Wait for the ``ray-serve`` healthcheck
to pass before sending inference requests.

**``POST /upload`` returns HTTP 413.**
Your ZIP file exceeds the 500 MB default limit. Either reduce the dataset
size or increase ``SCENE3D_MAX_UPLOAD_MB`` in ``docker-compose.yaml``.

**How do I run multiple jobs in parallel?**
The system is currently configured for one concurrent job
(``MAX_CONCURRENT_JOBS=1``) to prevent GPU memory exhaustion. Additional
uploads will be queued and processed in order.

----

Reconstruction Quality
-----------------------

**My registration rate is below 50%. What went wrong?**
Low registration rates are usually caused by one or more of:

- Images without sufficient overlap (each image should share at least 20–30%
  of its view with neighbouring images).
- Images that are too blurry (Laplacian variance below threshold).
- Scenes with repetitive textures where feature matching produces false
  positives.
- Very few images (fewer than 10 in a connected scene).

**Some images appear in the point cloud viewer but others don't.**
Images that appear are those successfully registered by COLMAP. Excluded
images did not have enough verified matches to determine their pose. Check
the ``registration_rate`` value in the Stats Table for the proportion
registered.

**The point cloud looks very sparse.**
The displayed point cloud is voxel-downsampled to at most 500,000 points for
browser performance. The original full-density PLY files are available for
download and will be much denser.
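
To illustrate why the viewer's cloud is thinner than the downloadable PLY
files, here is a minimal sketch of voxel downsampling to a point budget. It is
not the system's actual implementation: the function names, the use of NumPy,
the starting voxel size, and the growth factor are all assumptions made for
illustration. Only the 500,000-point cap comes from the answer above.

.. code-block:: python

   import numpy as np


   def voxel_downsample(points, voxel_size):
       """Replace all points that fall in the same voxel with their centroid."""
       points = np.asarray(points, dtype=np.float64)
       # Shift so the grid starts at the cloud's minimum corner, then compute
       # an integer voxel index for every point.
       origin = points.min(axis=0)
       idx = np.floor((points - origin) / voxel_size).astype(np.int64)
       # Group points by voxel index.
       _, inverse, counts = np.unique(
           idx, axis=0, return_inverse=True, return_counts=True
       )
       inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
       # Sum the coordinates per voxel, then divide by the voxel population.
       sums = np.zeros((counts.size, points.shape[1]))
       np.add.at(sums, inverse, points)
       return sums / counts[:, None]


   def downsample_to_budget(points, max_points=500_000, voxel_size=0.01,
                            growth=1.5):
       """Grow the voxel size until the cloud fits within max_points.

       voxel_size and growth are illustrative defaults (hypothetical).
       """
       points = np.asarray(points, dtype=np.float64)
       out = points
       while len(out) > max_points:
           out = voxel_downsample(points, voxel_size)
           voxel_size *= growth
       return out

A cloud that is already under the budget is returned unchanged; a larger one
is coarsened step by step, which is why fine detail visible in the full PLY
file can disappear in the browser view.
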

**I uploaded 200 images but got only 1 cluster with 30 images.**
COLMAP may have produced multiple disconnected sub-models and selected only
the largest. This can happen when images fall into groups with little overlap
between them. Try ensuring all images share some visual context, or increase
the number of images from each viewpoint.

----

MLOps / DVC / MLflow
---------------------

**How do I compare two experiment runs?**
Open MLflow at http://localhost:5000, navigate to the
``scene_reconstruction_dvc`` experiment, and select multiple runs to compare.
You can plot ``mAA_overall``, ``registration_rate``, and per-dataset metrics
side by side.

**How is the best config selected?**
``scripts/select_best_run.py`` queries the MLflow tracking server for the run
with the highest ``mAA_overall`` metric in the experiment. It copies that
run's config YAML to ``conf/best_config.yaml``. The ``ray-serve`` container
reads this file at startup.

**DVC repro says "nothing changed". How do I force a re-run?**

.. code-block:: bash

   dvc repro --force

Or invalidate a specific stage:

.. code-block:: bash

   dvc repro --force run_pipeline

**Where are MLflow artifacts stored?**
Artifacts are stored in the ``mlflow-artifacts`` Docker volume, mounted at
``/opt/mlflow/artifacts`` inside the ``mlflow`` container.

----

Monitoring & Alerts
--------------------

**I'm not receiving drift alert emails.**
Check that your SMTP credentials are correctly set in ``.env``
(``SMTP_USER``, ``SMTP_PASSWORD``, ``SMTP_MAIL_FROM``). Verify the Airflow
connection is active at http://localhost:8080/connection/list.

**Grafana shows "No data" for most panels.**
The ``ray-serve`` service must be running (``inference`` Docker Compose
profile) for Prometheus to scrape metrics. Start it with:

.. code-block:: bash

   docker compose --profile inference up -d ray-serve

**Alertmanager is firing but Airflow retraining DAG is not triggered.**
Check Alertmanager logs: ``docker compose logs alertmanager``.
Verify the Airflow API is reachable from within the ``mlops_net`` network:

.. code-block:: bash

   docker compose exec alertmanager wget -qO- http://airflow-apiserver:8080/api/v2/monitor/health

----

Data & Drift
-------------

**What does "drift detected" mean for my results?**
It means the statistical properties of your uploaded images differ from the
training dataset. The reconstruction will still run, but accuracy may be
lower. High-severity drift triggers an automatic Airflow retraining job.

**How do I update the drift baselines?**
Re-run the ``eda_baselines`` DVC stage with your new dataset:

.. code-block:: bash

   dvc repro eda_baselines --force

This regenerates ``data/baselines/eda_baselines.json``, which the drift
monitor uses as the new reference.
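
To make "statistical properties differ" more concrete, the following is a
rough sketch of one way a drift check could compare a per-image statistic
from an upload against a stored baseline. Everything in it is assumed for
illustration: the ``drift_severity`` function, the idea of a single scalar
feature with a baseline mean and standard deviation, and the z-score
thresholds are hypothetical. They do not describe the real drift monitor or
the layout of ``eda_baselines.json``.

.. code-block:: python

   import statistics


   def drift_severity(values, baseline_mean, baseline_std,
                      warn_z=2.0, high_z=4.0):
       """Classify drift by how far the batch mean sits from the baseline.

       Returns a (severity, z) tuple where severity is "none", "low" or
       "high". The thresholds are illustrative, not the system's values.
       """
       if not values:
           raise ValueError("need at least one value")
       if baseline_std <= 0:
           raise ValueError("baseline_std must be positive")
       batch_mean = statistics.fmean(values)
       # Standard error of a batch mean drawn from the baseline distribution.
       std_err = baseline_std / len(values) ** 0.5
       z = abs(batch_mean - baseline_mean) / std_err
       if z >= high_z:
           return "high", z
       if z >= warn_z:
           return "low", z
       return "none", z


   # Hypothetical feature: mean brightness per image, baseline 100 +/- 10.
   severity, z = drift_severity([150, 150, 150, 150], 100.0, 10.0)
   print(severity, round(z, 2))

In this toy setup a batch whose brightness is far from the baseline is
labelled ``high``, which corresponds to the case the FAQ describes as
triggering retraining.
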