How to Model Stratigraphic Relationships in PostGIS: Debugging Topological Cycles and Automating Harris Matrix Validation
Stratigraphic modeling in heritage databases requires translating Harris Matrix logic into a spatially aware, relational schema. When excavation units, interfaces, and deposits are digitized as polygons and linework, the primary failure mode is topological inconsistency: overlapping contexts, orphaned stratigraphic edges, or circular dependencies that break matrix generation and violate archaeological recording standards. This guide provides exact diagnostic steps, fallback routing, and reproducible test cases for PostGIS implementations targeting archaeological and heritage GIS workflows.
1. Normalized Schema Architecture & Constraint Enforcement
A robust stratigraphic model separates physical geometry from relational topology. Begin with a normalized contexts table containing context_id (UUID), geom (POLYGON or MULTIPOLYGON), context_type (ENUM('deposit','cut','fill','interface','structure')), and recording_date. Stratigraphic precedence is stored in a directed adjacency table stratigraphic_edges with superior_id, inferior_id, relation_type, and validation_flag.
To prevent circular dependencies at the database level, implement application-side DAG validation alongside database constraints. For foundational Artifact & Feature Spatial Database Design, enforce referential integrity with foreign keys and add a CHECK (superior_id <> inferior_id) constraint. Use EXCLUDE USING gist (geom WITH &&) to block overlapping polygon ingestion at the transaction level, ensuring spatial exclusivity matches stratigraphic exclusivity. All geometry must be stored in a projected metric CRS (e.g., EPSG:32633 or EPSG:27700) to guarantee deterministic tolerance calculations.
CREATE TABLE contexts (
context_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
geom GEOMETRY(POLYGON, 32633) NOT NULL,
context_type VARCHAR(20) CHECK (context_type IN ('deposit','cut','fill','interface','structure')),
recording_date DATE NOT NULL
);
CREATE TABLE stratigraphic_edges (
edge_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
superior_id UUID REFERENCES contexts(context_id) ON DELETE CASCADE,
inferior_id UUID REFERENCES contexts(context_id) ON DELETE CASCADE,
relation_type VARCHAR(20) CHECK (relation_type IN ('over','under','contemporary','correlated')),
validation_flag BOOLEAN DEFAULT FALSE,
CHECK (superior_id <> inferior_id)
);
CREATE INDEX idx_contexts_geom ON contexts USING GIST (geom);
CREATE INDEX idx_edges_superior ON stratigraphic_edges (superior_id);
CREATE INDEX idx_edges_inferior ON stratigraphic_edges (inferior_id);
2. Three-Tier Spatial & Topological Diagnostic Routine
When matrix generation fails or field supervisors report inverted sequences, execute a three-tier diagnostic routine. All spatial tolerances below assume a metric CRS and are calibrated for high-precision RTK/Total Station survey data (±0.005 m).
2.1 Topological Overlap Audit
Overlapping deposits violate the Law of Superposition. Query for intersections exceeding a minimum threshold to ignore sliver polygons caused by digitization drift.
SELECT a.context_id AS id_a, b.context_id AS id_b,
ST_Area(ST_Intersection(a.geom, b.geom)) AS overlap_sqm
FROM contexts a
JOIN contexts b ON a.context_id < b.context_id
WHERE ST_Overlaps(a.geom, b.geom)
AND ST_Area(ST_Intersection(a.geom, b.geom)) > 0.001;
Any result with overlap_sqm > 0.001 indicates a digitization error or misaligned interface. Flag these records immediately and route to /srv/heritage_gis/logs/overlap_audit.csv for manual correction.
2.2 Directed Acyclic Graph (DAG) Cycle Detection
Circular references invalidate Harris Matrix generation. Use a recursive CTE to trace paths and isolate cycles.
WITH RECURSIVE cycle_check AS (
SELECT
superior_id AS start_node,
inferior_id AS current_node,
ARRAY[superior_id, inferior_id] AS path,
FALSE AS is_cycle
FROM stratigraphic_edges
UNION ALL
SELECT
cc.start_node,
e.inferior_id,
cc.path || e.inferior_id,
e.inferior_id = ANY(cc.path)
FROM stratigraphic_edges e
JOIN cycle_check cc ON e.superior_id = cc.current_node
WHERE NOT cc.is_cycle
AND array_length(cc.path, 1) < 50 -- Prevent infinite recursion
)
SELECT path, start_node, current_node
FROM cycle_check
WHERE is_cycle = TRUE;
If cycles exist, the stratigraphic sequence is logically invalid. Quarantine the affected edge_id records by setting validation_flag = FALSE before any matrix export.
2.3 Spatial Adjacency vs. Recorded Relation Mismatch
Stratigraphic edges must reflect physical adjacency. A mismatch occurs when a recorded over/under relationship lacks spatial contact within tolerance.
SELECT e.edge_id, e.superior_id, e.inferior_id,
ST_Distance(a.geom, b.geom) AS separation_m
FROM stratigraphic_edges e
JOIN contexts a ON e.superior_id = a.context_id
JOIN contexts b ON e.inferior_id = b.context_id
WHERE ST_Distance(a.geom, b.geom) > 0.015 -- Tolerance threshold
AND e.relation_type IN ('over', 'under');
Records exceeding 0.015 m separation require field verification or spatial snapping via ST_Snap(geom, target_geom, 0.01).
3. Automating Harris Matrix Validation via Python DAG Pipelines
Field-to-database synchronization requires automated validation before matrix rendering. The following Python pipeline uses psycopg2 for database connectivity and networkx for topological verification. Save to /srv/heritage_gis/scripts/validate_harris.py.
import psycopg2
import networkx as nx
from pathlib import Path
DB_CONFIG = {
"host": "localhost",
"dbname": "heritage_excavation",
"user": "gis_admin",
"password": "secure_passphrase"
}
def build_and_validate_dag():
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
cur.execute("SELECT superior_id, inferior_id FROM stratigraphic_edges WHERE validation_flag = TRUE")
edges = cur.fetchall()
G = nx.DiGraph()
G.add_edges_from(edges)
# Detect cycles
cycles = list(nx.simple_cycles(G))
if cycles:
print(f"CRITICAL: {len(cycles)} stratigraphic cycles detected. Halting matrix export.")
for c in cycles:
print(f" -> Cycle path: {' -> '.join(c)}")
return False
# Verify topological sort exists (valid Harris sequence)
try:
sequence = list(nx.topological_sort(G))
print(f"VALID DAG: {len(sequence)} contexts sequenced successfully.")
Path("/srv/heritage_gis/exports/harris_sequence.txt").write_text("\n".join(sequence))
return True
except nx.NetworkXUnfeasible:
print("ERROR: Graph contains undetected cycles or disconnected components.")
return False
finally:
cur.close()
conn.close()
if __name__ == "__main__":
build_and_validate_dag()
This pipeline integrates seamlessly with workflows for Automating Artifact Attribute Synchronization, ensuring context-level metadata propagates correctly before matrix serialization.
4. Query Optimization, Tolerance Calibration & Data Integrity Protocols
Large excavation datasets degrade query performance without explicit indexing and tolerance management. Implement BRIN indexes for temporal filtering and maintain GiST for spatial joins. Review execution plans using EXPLAIN (ANALYZE, BUFFERS) and enforce work_mem = '256MB' for recursive CTE operations. For comprehensive Query Optimization for Large Excavation Datasets, partition the contexts table by recording_date or trench_id when exceeding 500,000 records.
Spatial tolerance must be explicitly defined per project phase. Early trenching may require 0.05 m snapping, while micro-stratigraphic sampling demands 0.002 m. Store tolerance parameters in a project configuration table to prevent hard-coded drift across scripts.
Implement Emergency Data Freeze & Recovery Protocols before bulk matrix regeneration. Set transaction isolation to REPEATABLE READ during validation runs to prevent phantom reads. Maintain WAL archiving at /var/lib/postgresql/data/pg_wal/ and execute pg_dump --format=custom --compress=9 -d heritage_excavation -f /srv/backups/stratigraphy_$(date +%F).dump prior to schema migrations.
By enforcing strict spatial tolerances, isolating topological cycles at ingestion, and automating DAG validation, heritage teams can maintain mathematically sound Harris Matrices that withstand peer review and long-term archival standards.