Setting Up QGIS for Archaeological Surveys

Establishing a reproducible QGIS environment for archaeological fieldwork requires strict spatial data governance, automated validation, and compliance-ready metadata. This guide puts into practice the principles outlined in Heritage GIS Architecture & Fundamentals as a field-tested configuration pipeline. The workflow targets archaeologists, heritage managers, Python GIS developers, and academic research teams who need deterministic data capture, automated QA/QC, and clean integration with archival standards.

1. Environment Initialization & Dependency Pinning

Archaeological survey pipelines fail when environment drift introduces silent coordinate shifts or plugin incompatibilities. A production-ready setup isolates dependencies, pins versions to Long Term Release (LTR) baselines, and pre-configures the QGIS Python interpreter for headless batch processing.

Pinned Stack:

  • QGIS 3.34.x LTR (Prizren)
  • Python 3.10+ (bundled or isolated venv)
  • GDAL/OGR 3.6.4+
  • PyQGIS API v3.34
  • QFieldSync 4.x / Mergin Maps 2024.x
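
One way to catch environment drift early is to compare the running versions against the pinned baselines before any processing starts. The sketch below is illustrative only: the PINNED table and the check_pins helper are hypothetical names, and the version strings are passed in as plain text (inside QGIS they could be read from Qgis.QGIS_VERSION and osgeo.gdal.__version__).

```python
import re

# Hypothetical pin table mirroring the stack above:
# name -> (inclusive minimum, exclusive ceiling or None for "no upper bound")
PINNED = {
    "qgis": ((3, 34, 0), (3, 35, 0)),  # 3.34.x LTR only
    "python": ((3, 10, 0), None),      # 3.10+
    "gdal": ((3, 6, 4), None),         # 3.6.4+
}


def parse_version(text: str) -> tuple:
    """Extract the leading numeric major.minor[.patch] from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    # A missing patch component is treated as 0
    return tuple(int(part or 0) for part in match.groups())


def check_pins(found: dict) -> list:
    """Return drift messages; an empty list means the environment matches the pins."""
    problems = []
    for name, (minimum, ceiling) in PINNED.items():
        if name not in found:
            problems.append(f"{name}: version not reported")
            continue
        version = parse_version(found[name])
        if version < minimum or (ceiling is not None and version >= ceiling):
            problems.append(f"{name}: {found[name]} is outside the pinned range")
    return problems
```

Calling check_pins with the versions reported by the running environment and aborting on a non-empty result keeps a drifted installation from touching survey data.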

Initialize the survey workspace with a deterministic directory tree that enforces strict pipeline routing:

/archaeo_survey/
├── 01_raw_data/          # Unaltered field captures, drone exports, legacy CAD
├── 02_processed/         # Topology-cleaned, CRS-normalized, attribute-validated
├── 03_validation_logs/   # Automated QA/QC reports, transformation audit trails
├── 04_metadata/          # ISO 19115 / Dublin Core XML, provenance manifests
└── 05_qfield_sync/       # Offline packages, delta logs, conflict resolution
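
The tree can be created by hand, but scripting it keeps every workstation identical. The following is a minimal sketch using only the standard library; create_workspace and STAGE_DIRS are hypothetical names, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Stage folders copied from the tree above, in pipeline order
STAGE_DIRS = (
    "01_raw_data",
    "02_processed",
    "03_validation_logs",
    "04_metadata",
    "05_qfield_sync",
)


def create_workspace(root: str) -> list:
    """Create any missing stage folders under root and return them in stage order."""
    base = Path(root)
    stages = []
    for name in STAGE_DIRS:
        stage = base / name
        stage.mkdir(parents=True, exist_ok=True)  # idempotent: safe to re-run
        stages.append(stage)
    return stages
```

Because mkdir is called with exist_ok=True, re-running the function on an existing workspace changes nothing, which makes it safe to call at the start of every session.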

The following PyQGIS initialization script establishes baseline project settings, disables on-the-fly CRS guessing, registers a custom message log for audit trails, and pins the project to a deterministic coordinate reference system:

from qgis.core import (
    Qgis, QgsProject, QgsCoordinateReferenceSystem, QgsMessageLog, QgsSettings
)
import os

def initialize_survey_project(project_dir: str, target_epsg: int = 27700) -> bool:
    """Initialize QGIS project with strict survey parameters and audit logging."""
    try:
        project = QgsProject.instance()
        project.setFileName(os.path.join(project_dir, "archaeo_survey.qgz"))

        # Disable automatic CRS guessing to prevent field data corruption.
        # NOTE: the settings key and its accepted values differ between QGIS
        # releases; confirm them against your build before relying on this line.
        QgsSettings().setValue("/Projections/defaultBehaviour", "useProject")

        # Set project CRS explicitly
        crs = QgsCoordinateReferenceSystem(f"EPSG:{target_epsg}")
        if not crs.isValid():
            raise ValueError(f"Invalid target CRS: EPSG:{target_epsg}")
        project.setCrs(crs)

        # Write to a dedicated log tag so QA tooling can filter on it.
        # Message levels live on the Qgis class in QGIS 3 (Qgis.Info, Qgis.Critical).
        QgsMessageLog.logMessage(
            f"Project initialized. Target CRS: EPSG:{target_epsg}",
            "ArchaeoSurveySetup",
            level=Qgis.Info
        )
        return True
    except Exception as e:
        QgsMessageLog.logMessage(f"Initialization failed: {e}", "ArchaeoSurveySetup", level=Qgis.Critical)
        return False

2. Pipeline Routing & Data Flow Gates

Archaeological data moves through a linear, gate-controlled pipeline. Each stage must complete before advancing to prevent compounding errors.

  1. Ingestion (01_raw_data/): Raw GNSS logs, RTK corrections, photogrammetry point clouds, and legacy shapefiles are ingested without modification.
  2. Normalization (02_processed/): Data is reprojected to the project CRS. Attribute schemas are enforced through field constraints on each QgsVectorLayer (QgsFieldConstraints). Topology errors (slivers, self-intersections) are repaired with QgsGeometry methods such as makeValid().
  3. Validation (03_validation_logs/): Automated scripts run against processed layers. Failures generate JSON/CSV logs. Only layers passing all gates are promoted.
  4. Metadata Generation (04_metadata/): Provenance, coordinate accuracy, and collection methodology are serialized to XML.
  5. Sync (05_qfield_sync/): Validated layers are packaged for offline field deployment. Delta changes are version-controlled and merged post-survey.

Routing is enforced via a deterministic Makefile or Python orchestrator that checks 03_validation_logs/ for status: PASS before triggering downstream tasks.
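
A Python orchestrator can implement this check in a few lines. The sketch below assumes a log layout that the source does not specify: one JSON file per layer in 03_validation_logs/, each holding an object with a "status" key. The gate_is_open name is hypothetical.

```python
import json
from pathlib import Path


def gate_is_open(log_dir: str) -> bool:
    """Return True only if at least one log exists and every log reports status PASS."""
    logs = sorted(Path(log_dir).glob("*.json"))
    if not logs:
        return False  # no evidence of validation: keep the gate closed
    for log_path in logs:
        try:
            record = json.loads(log_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            return False  # an unreadable log counts as a failure
        if not isinstance(record, dict) or record.get("status") != "PASS":
            return False
    return True
```

The gate fails closed: a missing, empty, or corrupt log directory blocks downstream tasks in the same way as an explicit failure, so an interrupted validation run cannot promote data by accident.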

3. CRS Configuration & Transformation Validation

Archaeological features frequently span multiple coordinate systems due to legacy Ordnance Survey maps, drone photogrammetry outputs, and international collaboration requirements. Hardcoding transformation parameters without validation introduces positional drift that compromises stratigraphic integrity and spatial relationships. Detailed guidance on selecting appropriate datums and grid shifts is available in CRS Selection for Heritage Sites.

Before ingesting external datasets, look up the candidate transformation in the official EPSG Geodetic Parameter Registry and confirm that any grid file it depends on is installed locally (e.g., OSTN15 for Great Britain, NADCON for North America). The following validation routine checks that a usable transformation path exists and flags operations whose grids are missing:

from qgis.core import (
    Qgis, QgsCoordinateReferenceSystem, QgsCoordinateTransform,
    QgsDatumTransform, QgsMessageLog, QgsProject
)

def validate_transformation(source_epsg: int, target_epsg: int) -> bool:
    """Verify that a valid transformation path exists between source and target CRS."""
    src = QgsCoordinateReferenceSystem(f"EPSG:{source_epsg}")
    tgt = QgsCoordinateReferenceSystem(f"EPSG:{target_epsg}")

    if not src.isValid() or not tgt.isValid():
        return False

    transform = QgsCoordinateTransform(src, tgt, QgsProject.instance())
    if not transform.isValid():
        QgsMessageLog.logMessage(
            f"No valid transformation path: EPSG:{source_epsg} -> EPSG:{target_epsg}",
            "CRSValidator",
            level=Qgis.Warning
        )
        return False

    # The legacy sourceDatumTransformId()/destinationDatumTransformId() calls are
    # unused on PROJ 6+ builds, so list the candidate operations instead and
    # flag any whose grid files (NTv2, NADCON, OSTN15, ...) are not installed.
    unavailable = [op.name for op in QgsDatumTransform.operations(src, tgt) if not op.isAvailable]
    if unavailable:
        QgsMessageLog.logMessage(
            f"Operations requiring missing grids: {'; '.join(unavailable)}",
            "CRSValidator",
            level=Qgis.Warning
        )
    return True

Run spatial analysis in a projected CRS with metre units (e.g., EPSG:27700, OSGB36 / British National Grid, or EPSG:32633, WGS 84 / UTM zone 33N) and retain EPSG:4326 strictly for web publishing and GPS exchange.
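
A cheap guard against mixing the two roles is to check that coordinates arriving in a layer declared as projected do not look like degrees. The sketch below assumes EPSG:27700 as the project CRS and uses the nominal British National Grid extent (0 to 700 km east, 0 to 1,300 km north). The function and constant names are hypothetical, the sample coordinates are illustrative, and the check is a heuristic rather than a replacement for a transformation audit.

```python
# Nominal extent of the British National Grid (EPSG:27700), in metres
BNG_EASTING = (0.0, 700_000.0)
BNG_NORTHING = (0.0, 1_300_000.0)


def looks_like_degrees(x: float, y: float) -> bool:
    """True when the pair fits inside the longitude/latitude value range."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def classify_point(x: float, y: float) -> str:
    """Label a coordinate pair as 'degrees', 'bng', or 'out_of_range'."""
    # Degrees are tested first: a tiny corner of the grid near its origin
    # overlaps the degree range, and flagging it is the safer error.
    if looks_like_degrees(x, y):
        return "degrees"
    in_east = BNG_EASTING[0] <= x <= BNG_EASTING[1]
    in_north = BNG_NORTHING[0] <= y <= BNG_NORTHING[1]
    return "bng" if in_east and in_north else "out_of_range"
```

For example, classify_point(-1.83, 51.18) yields "degrees" while classify_point(412245.0, 142195.0) yields "bng", so a layer whose points classify as degrees was probably never reprojected.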

4. Automated QA/QC & Metadata Compliance

Field-collected spatial data must survive archival review. Automated QA/QC pipelines enforce attribute completeness, topology rules, and coordinate precision thresholds. Metadata generation follows established frameworks documented in Metadata Standards for Archaeological Data.

The validation gate below checks for null geometries, empty mandatory fields, and CRS mismatches before promoting data to the archive:

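from qgis.core import NULL, QgsProject, QgsVectorLayer

def run_qa_gate(layer: QgsVectorLayer, required_fields: list[str]) -> dict:
    """Execute automated QA checks and return validation report."""
    report = {"layer": layer.name(), "passed": True, "errors": []}

    if not layer.isValid():
        report["passed"] = False
        report["errors"].append("Layer invalid or inaccessible")
        return report

    if layer.crs().authid() != QgsProject.instance().crs().authid():
        report["passed"] = False
        report["errors"].append(f"CRS mismatch: {layer.crs().authid()} != Project CRS")

    # A required field absent from the schema is reported once, not per feature
    field_names = layer.fields().names()
    for field in required_fields:
        if field not in field_names:
            report["passed"] = False
            report["errors"].append(f"Required field '{field}' not in layer schema")
    present_fields = [f for f in required_fields if f in field_names]

    # Scan every feature so the report lists all failures, not just the first
    for feat in layer.getFeatures():
        if not feat.hasGeometry():
            report["passed"] = False
            report["errors"].append(f"Null geometry at feature ID {feat.id()}")
        for field in present_fields:
            value = feat[field]
            # PyQGIS can return the NULL QVariant rather than None for empty attributes
            if value is None or value == NULL or str(value).strip() == "":
                report["passed"] = False
                report["errors"].append(f"Missing required field '{field}' at ID {feat.id()}")

    return report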
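
To feed the routing check on 03_validation_logs/ described earlier, the report dictionary has to be written to disk. The sketch below is one possible serialization using only the standard library; the one-file-per-layer layout, the "status" and "checked_at" keys, and the write_validation_log name are assumptions rather than a format defined by the source.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_validation_log(report: dict, log_dir: str) -> Path:
    """Serialize a QA report to <log_dir>/<layer>.json with a PASS/FAIL status."""
    record = {
        "layer": report["layer"],
        # PASS requires both the flag and an empty error list
        "status": "PASS" if report["passed"] and not report["errors"] else "FAIL",
        "errors": list(report["errors"]),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Layer names may contain spaces or slashes; keep file names filesystem-safe
    safe_name = "".join(c if c.isalnum() or c in "-_" else "_" for c in report["layer"])
    out_path = out_dir / f"{safe_name}.json"
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return out_path
```

Sorting the keys keeps successive logs diff-friendly under version control; the timestamp is the only field that changes when an identical layer is re-validated.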

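Metadata should be exported as ISO 19115-compliant XML or Dublin Core. Layer-level records can be scripted through QgsLayerMetadata (read with layer.metadata() and written back with layer.setMetadata()), and project-level records through QgsProjectMetadata, to auto-populate lineage, accuracy, and temporal coverage fields.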
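
For the Dublin Core route, a record can be assembled with the standard library alone. The sketch below is a minimal illustration: the "metadata" wrapper element and the build_dublin_core name are assumptions, and a receiving archive may require a different envelope or additional mandatory elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields: dict) -> str:
    """Return an XML string with one dc:<name> element per non-empty field."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        if value is None or str(value).strip() == "":
            continue  # skip empty values instead of emitting blank elements
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Keys such as "title", "description", "coverage", and "date" map directly onto Dublin Core element names; the function does not validate the keys, so misspelled names pass through unchecked.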

5. Field-to-Archive Synchronization & Version Control

Once validation passes, data is packaged for offline deployment or archival submission. Use QFieldSync or Mergin Maps to generate delta-sync packages. Store spatial data in GeoPackage (.gpkg) format to maintain ACID compliance and embedded metadata. For version control, pair Git with Git LFS for binary spatial assets, and maintain a CHANGELOG.md tracking CRS adjustments, attribute schema migrations, and validation gate updates.

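Deterministic exports should use QgsVectorFileWriter with explicit encoding (UTF-8), an explicit layer name, and a defined overwrite action; geometry type enforcement and layer creation options can be pinned through the same SaveVectorOptions object:

import os
from qgis.core import QgsProject, QgsVectorFileWriter
out_path = "02_processed/validated_features.gpkg"
options = QgsVectorFileWriter.SaveVectorOptions()
options.driverName = "GPKG"
options.fileEncoding = "UTF-8"
options.layerName = layer.name()
# CreateOrOverwriteLayer needs an existing file, so create the GeoPackage on first export
options.actionOnExistingFile = QgsVectorFileWriter.CreateOrOverwriteLayer if os.path.exists(out_path) else QgsVectorFileWriter.CreateOrOverwriteFile
QgsVectorFileWriter.writeAsVectorFormatV3(layer, out_path, QgsProject.instance().transformContext(), options)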
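
Once exports are written, a checksum manifest gives archives and collaborators a way to confirm that files were not altered in transit. The sketch below is an assumption about how such a provenance manifest might be built; build_manifest and sha256_of are hypothetical names, and the result is a plain dictionary that could be saved alongside the files in 04_metadata/.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large GeoPackages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: str, pattern: str = "*.gpkg") -> dict:
    """Map each matching file name to its SHA-256 digest, ordered by file name."""
    files = sorted(Path(folder).glob(pattern))
    return {f.name: sha256_of(f) for f in files if f.is_file()}
```

Note that a GeoPackage is an SQLite database, so merely opening it in an editing session can change its bytes; the manifest is best generated immediately after export and before any further access.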

Applied together, these steps keep archaeological survey data spatially rigorous, audit-ready, and reproducible across research teams and institutional archives.