Batch Processing Photogrammetry Datasets
Scaling photogrammetric reconstruction across multiple excavation units, structural ruins, or landscape transects requires deterministic, audit-ready automation. This workflow addresses the batch processing stage within the broader Photogrammetry & 3D Site Mapping Pipelines framework, providing field-ready validation, coordinate system integrity checks, and reproducible Python automation for archaeologists, heritage managers, and academic research teams.
Pipeline Architecture & Routing
The automation routine follows a strictly sequential, idempotent DAG (Directed Acyclic Graph) to ensure reproducibility across multi-season campaigns. Each processing node is isolated with explicit rollback capabilities and structured logging to prevent cascading failures when orchestrating hundreds of datasets. Field teams deploying Automated Drone Image Processing Workflows must standardize directory layouts, camera calibration profiles, and GCP/RTK metadata prior to execution.
The pipeline routes datasets through five deterministic stages:
- Ingestion & EXIF Parsing: Validates image headers, extracts focal lengths, and maps GPS/IMU metadata to project coordinates.
- Sparse Alignment: Detects tie points, estimates camera poses, and performs bundle adjustment with fixed interior orientation parameters.
- Dense Point Cloud Generation: Executes multi-view stereo matching with GPU acceleration, applying depth filtering thresholds appropriate for stratigraphic surfaces.
- Mesh Reconstruction: Builds watertight geometry from dense points, applying decimation and topology cleanup for archival storage.
- Georeferenced Export: Projects assets to the target CRS, generates orthomosaics, and writes spatial database-ready metadata (GeoTIFF, OBJ/FBX, LAS).
Each stage writes a checkpoint manifest. If a node fails, the pipeline halts, quarantines the dataset, and logs the exact failure signature without corrupting downstream outputs.
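The checkpoint-and-quarantine behaviour described above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual implementation: the manifest file name `checkpoint.json`, the `run_pipeline` function, and the stage keys are all assumptions introduced here, and the stage callables stand in for the real processing steps.

```python
import json
import logging
import shutil
from pathlib import Path
from typing import Callable

# Stage keys mirror the five stages listed above (names are illustrative).
STAGES = ["ingestion", "sparse_alignment", "dense_cloud", "mesh", "export"]


def run_pipeline(dataset: Path, quarantine_root: Path,
                 stages: dict[str, Callable[[Path], None]]) -> bool:
    """Run stages in order, skipping any already recorded in the manifest."""
    manifest_path = dataset / "checkpoint.json"  # hypothetical manifest name
    done = json.loads(manifest_path.read_text()) if manifest_path.exists() else []

    for name in STAGES:
        if name in done:
            continue  # idempotent re-run: completed stages are not repeated
        try:
            stages[name](dataset)
        except Exception as exc:
            # Halt, log the failure signature, and move the dataset aside.
            logging.error("Stage %s failed for %s: %r", name, dataset.name, exc)
            quarantine_root.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dataset), str(quarantine_root / dataset.name))
            return False
        done.append(name)
        manifest_path.write_text(json.dumps(done))  # checkpoint after each stage
    return True
```

Because the manifest is rewritten only after a stage returns, a re-run resumes at the first stage that has not completed, and a quarantined dataset keeps the record of how far it got.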
Coordinate Reference System (CRS) Validation & Transformation
Spatial integrity is non-negotiable in heritage recording. Before batch alignment, the automation routine must validate that all input imagery shares a consistent coordinate reference system or contains valid RTK/GPS metadata for automatic projection assignment. A pre-processing validation routine should:
- Parse EXIF GPSLatitude, GPSLongitude, GPSAltitude, and camera orientation tags.
- Cross-reference against project EPSG definitions (e.g., EPSG:27700 for UK Ordnance Survey National Grid, EPSG:32633 for WGS 84 / UTM zone 33N, or EPSG:4326 for raw WGS 84 lat/lon).
- Flag datasets with mixed datums, missing altitude values, or inconsistent focal length metadata.
- Apply a deterministic transformation matrix via pyproj if local archaeological grid coordinates are required for stratigraphic mapping.
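The parsing and flagging steps in this list can be illustrated with a small standard-library sketch. It assumes the EXIF tags have already been read into plain Python values by whatever reader the project uses; the record keys (`name`, `datum`, `altitude`, `focal_mm`) and both function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) plus an N/S/E/W ref to signed decimal degrees."""
    # EXIF stores each component as a rational; Fraction accepts ints, floats or "num/den" strings.
    d, m, s = (Fraction(v) for v in dms)
    value = float(d + m / 60 + s / 3600)
    return -value if ref in ("S", "W") else value


def flag_dataset(records):
    """Return a list of problems found across one dataset's image metadata.

    `records` is a list of dicts with the (hypothetical) keys
    name, datum, altitude, focal_mm.
    """
    problems = []
    datums = {r.get("datum") for r in records}
    if len(datums) > 1:
        problems.append(f"mixed datums: {sorted(map(str, datums))}")
    for r in records:
        if r.get("altitude") is None:
            problems.append(f"missing altitude: {r['name']}")
    focals = {r.get("focal_mm") for r in records}
    if len(focals) > 1:
        problems.append(f"inconsistent focal length: {sorted(map(str, focals))}")
    return problems
```

An empty list means the dataset passed these three checks; any entry is a reason to hold the dataset back before alignment.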
Validation rules must enforce a maximum horizontal/vertical residual threshold (typically ≤0.05 m for RTK, ≤0.15 m for PPK) before proceeding to dense reconstruction. Any dataset exceeding these tolerances triggers a quarantine state for manual review. Coordinate transformations should strictly adhere to the PROJ library standards to avoid datum shift artifacts during export.
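As a rough sketch of the tolerance gate, the check below compares measured checkpoint coordinates against their surveyed reference values, both already expressed in the same projected CRS in metres. The thresholds are the ones quoted above; the function names and the `(measured, reference)` pair layout are assumptions made for illustration.

```python
import math

# Thresholds in metres, taken from the tolerances quoted above.
TOLERANCE_M = {"RTK": 0.05, "PPK": 0.15}


def residuals(measured, reference):
    """Return (horizontal, vertical) residuals in metres for two (x, y, z) points."""
    dx, dy, dz = (m - r for m, r in zip(measured, reference))
    return math.hypot(dx, dy), abs(dz)


def passes_tolerance(pairs, mode):
    """True only if every (measured, reference) pair is within the limit for `mode`."""
    limit = TOLERANCE_M[mode]
    return all(max(residuals(m, r)) <= limit for m, r in pairs)
```

A dataset for which `passes_tolerance` returns False would be routed to quarantine rather than on to dense reconstruction.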
Reproducible Python Implementation
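The following implementation uses the Agisoft Metashape Python API to orchestrate batch processing. It includes explicit try/except blocks, structured logging, and CRS validation hooks. The calls target the Metashape 1.8 API pinned in the dependencies; Metashape 2.x renames buildDenseCloud to buildPointCloud, so the script needs adjusting before it runs on newer releases. This approach aligns with established Python scripts for batch processing Agisoft Metashape projects and relies on Python's standard logging module for audit trails.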
Version-Pinned Dependencies (requirements.txt)
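# Metashape's Python module is distributed by Agisoft as a wheel rather than via PyPI;
# install the 1.8.5 wheel manually and confirm it supports your interpreter version.
# metashape==1.8.5
pyproj==3.6.1
exifread==3.0.0
numpy==1.26.4
# Interpreter constraint (enforced by the environment, not by pip): Python >=3.10,<3.12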
Core Batch Orchestrator
import os
import logging

import pyproj
import Metashape

# Configure structured logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
    handlers=[logging.FileHandler("batch_processing.log"), logging.StreamHandler()],
)

def validate_crs_and_geotags(image_path: str, target_epsg: int) -> bool:
    """Check that an image carries GPS metadata that projects into the target CRS."""
    try:
        # A throwaway document is enough: addPhotos() loads the EXIF reference
        # coordinates, so no matching or alignment is needed (neither can
        # succeed on a single image anyway).
        doc = Metashape.Document()
        chunk = doc.addChunk()
        chunk.addPhotos([image_path])

        # Extract camera coordinates
        cam = chunk.cameras[0]
        if cam.reference.location is None:
            logging.warning(f"Missing GPS metadata: {image_path}")
            return False

        # Metashape stores geographic references as (longitude, latitude, altitude)
        lon, lat, alt = cam.reference.location

        # Transform to target CRS for residual check
        transformer = pyproj.Transformer.from_crs(
            "EPSG:4326", f"EPSG:{target_epsg}", always_xy=True
        )
        x, y = transformer.transform(lon, lat)
        # pyproj returns inf for points it cannot project into the target CRS
        if abs(x) == float("inf") or abs(y) == float("inf"):
            logging.warning(f"Coordinates fall outside EPSG:{target_epsg}: {image_path}")
            return False

        # Placeholder for actual residual calculation against known GCPs
        logging.info(f"CRS validation passed for {image_path} -> EPSG:{target_epsg}")
        return True
    except Exception as e:
        logging.error(f"CRS/EXIF validation failed: {e}")
        return False

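def process_dataset_batch(image_dir: str, output_dir: str, target_epsg: int):
    """Align, densify, mesh, and export one directory of images as an OBJ model."""
    os.makedirs(output_dir, exist_ok=True)
    quarantine_dir = os.path.join(output_dir, "quarantine")
    doc = Metashape.Document()
    chunk = doc.addChunk()

    for img in sorted(os.listdir(image_dir)):
        if not img.lower().endswith((".tif", ".tiff", ".jpg", ".jpeg", ".png")):
            continue
        img_path = os.path.join(image_dir, img)
        if not validate_crs_and_geotags(img_path, target_epsg):
            logging.warning(f"Quarantining image due to CRS/GPS failure: {img}")
            os.makedirs(quarantine_dir, exist_ok=True)
            os.rename(img_path, os.path.join(quarantine_dir, img))
            continue
        chunk.addPhotos([img_path])

    if len(chunk.cameras) == 0:
        logging.error(f"No valid images left in {image_dir}; nothing to process.")
        return

    # EXIF positions are loaded as WGS 84, so convert them before switching the
    # chunk CRS; otherwise lon/lat values would be read as projected coordinates.
    wgs84 = Metashape.CoordinateSystem("EPSG::4326")
    target_crs = Metashape.CoordinateSystem(f"EPSG::{target_epsg}")
    for cam in chunk.cameras:
        if cam.reference.location is not None:
            cam.reference.location = Metashape.CoordinateSystem.transform(cam.reference.location, wgs84, target_crs)
    chunk.crs = target_crs

    try:
        chunk.matchPhotos()
        chunk.alignCameras()
        chunk.buildDepthMaps()  # Metashape 1.8 needs depth maps before the dense cloud
        chunk.buildDenseCloud()
        chunk.buildModel()
        chunk.exportModel(os.path.join(output_dir, "reconstructed_mesh.obj"), format=Metashape.ModelFormatOBJ)
        logging.info("Batch processing completed successfully.")
    except Exception as e:
        logging.critical(f"Pipeline failure during reconstruction: {e}")
        raise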
Execution, Monitoring & Spatial Database Ingestion
Headless execution should be deployed on a Linux CI runner or high-performance workstation with dedicated CUDA-capable GPUs. GPU selection and thread concurrency must be pinned explicitly to prevent resource starvation during parallel chunk processing. Metashape exposes device selection through Metashape.app.gpu_mask; a memory cap such as METASHAPE_GPU_MEMORY_LIMIT is not a built-in Metashape setting, so it has to be read and enforced by the team's own wrapper script or job scheduler. Upon successful completion, the pipeline generates SHA-256 checksums for all exported assets, ensuring chain-of-custody integrity for heritage archives.
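The checksum step can be sketched with the standard library alone. The listing file name `SHA256SUMS` and both function names are assumptions made for this example; the output lines follow the `digest  filename` layout that the `sha256sum -c` tool reads.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large meshes never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_checksums(export_dir: Path, out_name: str = "SHA256SUMS") -> dict[str, str]:
    """Hash every file under export_dir and write a sha256sum-style listing."""
    sums = {
        p.relative_to(export_dir).as_posix(): sha256_of(p)
        for p in sorted(export_dir.rglob("*"))
        if p.is_file() and p.name != out_name  # do not hash the listing itself
    }
    lines = [f"{digest}  {name}" for name, digest in sums.items()]
    (export_dir / out_name).write_text("\n".join(lines) + "\n")
    return sums
```

Sorting the paths keeps the listing stable between runs, so two exports of the same assets produce byte-identical checksum files.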
Processed meshes and point clouds should undergo topology validation and LOD generation before ingestion into spatial databases or web viewers. For complex ruin geometries requiring manual retopology or artifact removal, teams should route outputs to Mesh Generation & Optimization for Ruins for downstream refinement. Automated reporting should aggregate alignment residuals, dense cloud point counts, and CRS transformation matrices into a machine-readable JSON manifest for academic publication and long-term repository compliance.
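One possible shape for the machine-readable manifest is sketched below. The field names, the `schema_version` key, and the `DatasetReport` class are illustrative assumptions, not a repository standard; real reports would be filled from the values logged during processing.

```python
import json
from dataclasses import asdict, dataclass, field


def _identity_4x4():
    """Row-major 4x4 identity, used as a neutral default transform."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


@dataclass
class DatasetReport:
    dataset: str
    target_epsg: int
    alignment_rmse_m: float      # alignment residual, metres
    dense_point_count: int
    transform: list = field(default_factory=_identity_4x4)


def build_manifest(reports):
    """Serialise per-dataset reports into one deterministic JSON document."""
    payload = {
        "schema_version": 1,  # hypothetical versioning field
        "datasets": [asdict(r) for r in sorted(reports, key=lambda r: r.dataset)],
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Sorting both the datasets and the keys makes the output diff-friendly, which helps when manifests from successive seasons are compared or deposited alongside the data.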