Frequently Asked Questions
General
- What does this system actually do?
It takes a collection of photos of a scene (e.g., a building, a room, an archaeological site) and reconstructs a 3D point cloud from them. For each image it also estimates where the camera was positioned and which direction it was pointing.
- What kind of images can I use?
The system accepts `.jpg`, `.jpeg`, `.png`, `.tif`, `.tiff`, `.bmp`, and `.webp` files, packaged into a single ZIP archive. Images should be taken with a real camera or phone; synthetic renders or heavily edited images may produce poor results.
- How many images do I need?
A minimum of 3 images is required to form a reconstruction. In practice, at least 10–20 overlapping images of a scene will give usable results. More images (50–200) generally improve accuracy and coverage.
- Do the images need to be ordered?
No. The system handles unordered collections. However, every image must share some visual overlap with at least one other image in the set.
Installation & Setup
- I get “CUDA error” when starting ray-serve. What should I do?
Ensure the NVIDIA Container Toolkit is installed on your host:
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Then verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi
- The ``generate_secrets.sh`` script fails.
Ensure you have `openssl` installed (`sudo apt install openssl`). The script generates random secret strings using `openssl rand`.
- I see “AIRFLOW_UID not set” warnings.
Add your user ID to `.env`:
echo "AIRFLOW_UID=$(id -u)" >> .env
- Git LFS download fails.
You can download model weights manually using the URLs listed in the Installation Guide. Place the files in `extra/pretrained_models/`.
Using the API
- My JWT token keeps expiring mid-workflow.
Tokens are valid for 15 minutes. For long-running automation scripts, refresh the token proactively by calling `POST /auth/token` before each request, or increase `JWT_EXPIRY_SECONDS` in the environment configuration.
- I get HTTP 503 on ``/ready``.
The GPU worker is still loading. MASt3R model weights take 1–3 minutes to load into VRAM at container startup. Wait for the `ray-serve` healthcheck to pass before sending inference requests.
- ``POST /upload`` returns HTTP 413.
Your ZIP file exceeds the 500 MB default limit. Either reduce the dataset size or increase `SCENE3D_MAX_UPLOAD_MB` in `docker-compose.yaml`.
- How do I run multiple jobs in parallel?
The system is currently configured for one concurrent job (`MAX_CONCURRENT_JOBS=1`) to prevent GPU memory exhaustion. Additional uploads will be queued and processed in order.
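A long-running client can combine the answers above: refresh the 15-minute JWT before it expires and wait for `/ready` before submitting work. This is a minimal sketch; the base URL, the `TokenCache` helper, and the assumption that `POST /auth/token` returns the token in the response body are all illustrative, not part of the documented API.

```python
import time
import urllib.request

API = "http://localhost:8000"   # assumed base URL; adjust to your deployment
EXPIRY_SECONDS = 15 * 60        # tokens are valid for 15 minutes


class TokenCache:
    """Caches a JWT and reports when it should be refreshed.

    Refreshes a little early (`margin` seconds) so a request never
    starts with a token that expires mid-flight.
    """

    def __init__(self, expiry_seconds=EXPIRY_SECONDS, margin=60):
        self.expiry_seconds = expiry_seconds
        self.margin = margin
        self.token = None
        self.issued_at = 0.0

    def needs_refresh(self, now=None):
        now = time.time() if now is None else now
        return (self.token is None
                or now - self.issued_at > self.expiry_seconds - self.margin)

    def store(self, token, now=None):
        self.token = token
        self.issued_at = time.time() if now is None else now


def fetch_token():
    """POST /auth/token and return the JWT (response shape is an assumption)."""
    req = urllib.request.Request(f"{API}/auth/token", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Before each `POST /upload`, check `cache.needs_refresh()` and call `fetch_token()` only when needed, rather than on every request.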
Reconstruction Quality
- My registration rate is below 50%. What went wrong?
Low registration rates are usually caused by one or more of:
- Images without sufficient overlap (each image should share at least 20–30% of its view with neighbouring images).
- Images that are too blurry (Laplacian variance below threshold).
- Scenes with repetitive textures where feature matching produces false positives.
- Very few images (fewer than 10 in a connected scene).
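You can screen for the blur cause yourself before uploading. A minimal sketch of the Laplacian-variance check, written with plain NumPy (the pipeline's own implementation and threshold may differ; the default of 100 here is a hypothetical starting point):

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian of a grayscale image (2-D array).

    Low values mean little high-frequency detail, i.e. a blurry image.
    Applies the standard 4-neighbour Laplacian kernel via slicing.
    """
    g = gray.astype(np.float64)
    # up + down + left + right - 4 * center, on the interior pixels
    lap = (g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:]
           - 4 * g[1:-1, 1:-1])
    return float(lap.var())


def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # threshold is a hypothetical default; tune it on your own images
    return laplacian_variance(gray) < threshold
```

Images flagged by such a check are good candidates to retake or drop before building the ZIP archive.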
- Some images appear in the point cloud viewer but others don’t.
Images that appear are those successfully registered by COLMAP. Excluded images did not have enough verified matches to determine their pose. Check the `registration_rate` value in the Stats Table for the proportion registered.
- The point cloud looks very sparse.
The displayed point cloud is voxel-downsampled to at most 500,000 points for browser performance. The original full-density PLY files are available for download and will be much denser.
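Voxel downsampling works by snapping points to a regular 3-D grid and keeping one representative per occupied cell. A minimal NumPy sketch of the idea (the viewer may use a different library and keep the first point rather than the centroid):

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one representative point (the centroid) per occupied voxel.

    points: (N, 3) array of XYZ coordinates.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel key and average each group.
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

Choosing a larger `voxel_size` reduces the point count further at the cost of detail, which is the trade-off the 500,000-point browser cap makes.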
- I uploaded 200 images but got only 1 cluster with 30 images.
COLMAP may have produced multiple disconnected sub-models and selected only the largest. This can happen when images fall into groups with little overlap between them. Try ensuring all images share some visual context, or increase the number of images from each viewpoint.
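Disconnected sub-models correspond to connected components of the image match graph: two images end up in the same cluster only if a chain of verified matches links them. A small pure-Python sketch of that grouping (the match list here is hypothetical; COLMAP's internal representation differs):

```python
def connected_components(num_images: int,
                         matches: list[tuple[int, int]]) -> list[set[int]]:
    """Group image indices into clusters linked by verified matches.

    Each cluster corresponds to one potential sub-model; the pipeline
    described above keeps only the largest.
    """
    parent = list(range(num_images))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in matches:
        parent[find(a)] = find(b)

    groups: dict[int, set[int]] = {}
    for i in range(num_images):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=len, reverse=True)
```

If such a grouping over your own match data shows several large components, capture a few bridging shots that overlap both groups and re-upload.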
MLOps / DVC / MLflow
- How do I compare two experiment runs?
Open MLflow at http://localhost:5000, navigate to the `scene_reconstruction_dvc` experiment, and select multiple runs to compare. You can plot `mAA_overall`, `registration_rate`, and per-dataset metrics side by side.
- How is the best config selected?
`scripts/select_best_run.py` queries the MLflow tracking server for the run with the highest `mAA_overall` metric in the experiment. It copies that run’s config YAML to `conf/best_config.yaml`. The `ray-serve` container reads this file at startup.
- DVC repro says “nothing changed”. How do I force a re-run?
dvc repro --force
Or invalidate a specific stage:
dvc repro --force run_pipeline
- Where are MLflow artifacts stored?
Artifacts are stored in the `mlflow-artifacts` Docker volume, mounted at `/opt/mlflow/artifacts` inside the `mlflow` container.
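The best-config selection described above reduces to an arg-max over runs. A minimal sketch of that core logic, with the MLflow client query omitted and the record shape (`run_id`, `metrics`) assumed for illustration:

```python
def select_best_run(runs: list[dict]) -> dict:
    """Pick the run with the highest mAA_overall metric.

    `runs` stands in for records returned by an MLflow tracking-server
    query; runs that never logged the metric are skipped.
    """
    scored = [r for r in runs if "mAA_overall" in r.get("metrics", {})]
    if not scored:
        raise ValueError("no run logged mAA_overall")
    return max(scored, key=lambda r: r["metrics"]["mAA_overall"])
```

The real script additionally copies the winning run's config YAML to `conf/best_config.yaml`, which is what `ray-serve` reads at startup.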
Monitoring & Alerts
- I’m not receiving drift alert emails.
Check that your SMTP credentials are correctly set in `.env` (`SMTP_USER`, `SMTP_PASSWORD`, `SMTP_MAIL_FROM`). Verify the Airflow connection is active at http://localhost:8080/connection/list.
- Grafana shows “No data” for most panels.
The `ray-serve` service must be running (`inference` Docker Compose profile) for Prometheus to scrape metrics. Start it with:
docker compose --profile inference up -d ray-serve
- Alertmanager is firing but Airflow retraining DAG is not triggered.
Check Alertmanager logs: `docker compose logs alertmanager`. Verify the Airflow API is reachable from within the `mlops_net` network:
docker compose exec alertmanager wget -qO- http://airflow-apiserver:8080/api/v2/monitor/health
Data & Drift
- What does “drift detected” mean for my results?
It means the statistical properties of your uploaded images differ from the training dataset. The reconstruction will still run, but accuracy may be lower. High-severity drift triggers an automatic Airflow retraining job.
- How do I update the drift baselines?
Re-run the `eda_baselines` DVC stage with your new dataset:
dvc repro eda_baselines --force
This regenerates `data/baselines/eda_baselines.json`, which the drift monitor uses as the new reference.
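Conceptually, the drift check compares summary statistics of incoming images against the baseline JSON. A simplified single-statistic sketch (the real monitor tracks more statistics, and the JSON field names `brightness_mean`/`brightness_std` are assumptions, not the actual schema):

```python
import json

import numpy as np


def check_drift(images_gray, baseline_path, z_threshold=3.0):
    """Flag drift when the batch's mean brightness deviates from the baseline.

    A z-score comparison against a baseline JSON; images_gray is a list
    of 2-D grayscale arrays.
    """
    with open(baseline_path) as f:
        base = json.load(f)
    batch_mean = float(np.mean([img.mean() for img in images_gray]))
    z = abs(batch_mean - base["brightness_mean"]) / max(base["brightness_std"], 1e-9)
    return {"batch_mean": batch_mean, "z_score": z, "drift": z > z_threshold}
```

After regenerating the baselines, a batch drawn from the new dataset should score a low z-value against the new reference, which is exactly what clears the alert.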