Eliminating SPI from RI Triggers: A Fast Path for Foreign Key Checks
Introduction
Referential Integrity (RI) triggers in PostgreSQL traditionally execute SQL queries via SPI (Server Programming Interface) to verify that inserted or updated rows in a referencing table have matching rows in the referenced (primary key) table. For bulk operations—large INSERT or UPDATE statements—this means starting and tearing down a full executor plan for each row, with significant overhead from ExecutorStart() and ExecutorEnd().
Amit Langote has been working on eliminating this overhead by performing RI checks as direct index probes instead of SQL plans. The latest iteration of this work, "Eliminating SPI / SQL from some RI triggers - take 3," achieves up to 57% speedup for bulk foreign key checks by bypassing the SPI executor and calling the index access method directly when the constraint semantics allow it.
The patch set has evolved through several versions, with Junwang Zhao joining the effort in late 2025. The current direction is a hybrid fast-path + fallback design: use a direct index probe for straightforward cases, and fall back to the existing SPI path when correctness requires executor behavior that would be difficult or risky to replicate.
Why This Matters
Foreign key constraints are ubiquitous. Every INSERT or UPDATE into a referencing table triggers RI checks that must verify each new or modified row against the referenced table's primary key. With the traditional approach:
CREATE TABLE pk (a int PRIMARY KEY);
CREATE TABLE fk (a int REFERENCES pk);
INSERT INTO pk SELECT generate_series(1, 1000000);
INSERT INTO fk SELECT generate_series(1, 1000000); -- 1M RI checks
Each of the 1 million inserts triggers an RI check that:
- Builds a query plan to scan the PK index.
- Runs
ExecutorStart()andExecutorEnd(). - Executes the plan to find (or not find) the matching row.
This per-row plan setup/teardown dominates the cost. With Amit's v3 patches, the same bulk insert drops from ~1000 ms to ~432 ms (57% faster) on his benchmark machine—by probing the PK index directly without going through the executor.
Technical Background
The Traditional RI Path
RI trigger functions in ri_triggers.c (e.g. RI_FKey_check) call ri_PerformCheck(), which:
- Builds an SQL string for a query like
SELECT 1 FROM pk WHERE pk.a = $1. - Uses
SPI_prepareandSPI_execute_planto run it. - The executor performs an index scan on the PK, returning a row if the referenced value exists.
This works correctly for all cases—partitioned tables, temporal foreign keys, concurrent updates—but pays the full plan-execution cost per row.
The Fast-Path Idea
For simple foreign keys (non-partitioned referenced table, non-temporal semantics), the check is conceptually: "probe the PK index for this value; if found and lockable, the check passes." That can be done by:
- Opening the PK relation and its unique index.
- Building a scan key from the FK column values.
- Calling
index_getnext()(or equivalent) to find the tuple. - Locking it with
LockTupleKeyShareunder the current snapshot.
No SQL, no plan, no executor. Just a direct index probe and tuple lock.
Patch Evolution
v1: The Original Approach (December 2024)
The first patch set (3 patches) introduced:
- 0001: Refactoring of the
PartitionDescinterface to explicitly pass the snapshot needed foromit_detachedvisibility (detach-pending partitions). This addressed a bug where PK lookups could return incorrect results underREPEATABLE READbecausefind_inheritance_children()'s visibility of detach-pending partitions depended onActiveSnapshot, which RI lookups were manipulating. - 0002: Avoid using SPI in RI trigger functions by introducing a direct index probe path.
- 0003: Avoid using an SQL query for some RI checks—the main performance optimization.
Amit noted that temporal foreign key queries would remain on the SPI path, as their plans involve range overlap and aggregation and are not amenable to a simple index probe. He also added an equivalent of EvalPlanQual() for the new path to handle concurrent updates correctly under READ COMMITTED.
v2: Junwang's Hybrid Fast Path (December 2025)
Junwang Zhao took the work forward with a hybrid design:
- 0001: Add fast path for foreign key constraint checks. Applies when the referenced table is not partitioned and the constraint does not involve temporal semantics.
- 0002: Cache fast-path metadata (operator hash entries, operator OIDs, strategy numbers, subtypes). At that stage, the metadata cache did not yet improve performance.
Benchmarks (1M rows, numeric PK / bigint FK):
- Head: INSERT 13.5s, UPDATE 15s
- Patched: INSERT 8.2s, UPDATE 10.1s
v3: Amit's Rework with Per-Statement Caching (February 2026)
Amit reworked Junwang's patches into two patches:
- 0001: Functionally complete fast path. Includes concurrency handling,
REPEATABLE READcrosscheck, cross-type operators, security context (RLS/ACL), and metadata caching. Most logic lives inri_FastPathCheck();RI_FKey_checkjust gates the call and falls back to SPI when needed. - 0002: Per-statement resource caching. Instead of sharing
EStatebetweentrigger.candri_triggers.c, a new AfterTriggerBatchCallback mechanism fires at the end of each trigger-firing cycle. It allows caching the PK relation, index, scan descriptor, and snapshot across all FK trigger invocations within a single cycle, rather than opening and closing them per row.
Benchmarks on Amit's machine:
| Scenario | Master | 0001 | 0001+0002 |
|---|---|---|---|
| 1M rows, numeric/bigint | 2444 ms | 1382 ms (43% faster) | 1202 ms (51% faster) |
| 1M rows, int/int | 1000 ms | 520 ms (48% faster) | 432 ms (57% faster) |
The incremental gain from 0002 (~13–17%) comes from eliminating per-row relation open/close, scan begin/end, slot allocation/free, and replacing per-row GetSnapshotData() with a snapshot copy in the cache.
Design: When to Use Fast Path vs. SPI
The fast path applies when:
- The referenced table is not partitioned.
- The constraint does not involve temporal semantics (range overlap,
range_agg(), etc.). - Multi-column keys, cross-type equality (via index opfamily), collation matching, and RLS/ACL are all handled directly in the fast path.
The code falls back to SPI when:
- Concurrent updates or deletes: If
table_tuple_lock()reports that the target tuple was updated or deleted, the code delegates to SPI so thatEvalPlanQualand visibility rules apply as today. - Partitioned referenced tables: Require routing the probe through the correct partition via
PartitionDirectory. Can be added later as a separate patch. - Temporal foreign keys: Use range overlap and containment semantics that inherently involve aggregation; they stay on the SPI path.
Security behavior mirrors the existing SPI path: the fast path temporarily switches to the parent table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS around the probe, matching ri_PerformCheck().
Future Directions
David Rowley suggested off-list that batching multiple FK values into a single index probe could further improve performance, leveraging the ScalarArrayOp btree improvements from PostgreSQL 17. The idea: buffer FK values across trigger invocations in the per-constraint cache, build a SK_SEARCHARRAY scan key, and let the btree AM traverse matching leaf pages in one sorted pass instead of one tree descent per row. Locking and recheck would remain per-tuple. This could be explored as a separate patch on top of the current series.
Current Status
- The series is in PG19-Drafts. Amit moved it there in October 2025; Junwang Zhao is continuing the work.
- Amit's v3 patches (February 2026) are in reasonable shape and ready for review. He welcomes feedback, especially on concurrency handling in
ri_LockPKTuple()and the snapshot lifecycle in 0002. - Pavel Stehule has offered to help with testing and review.
Conclusion
Eliminating SPI from RI triggers for simple foreign key checks yields substantial performance gains for bulk operations. The hybrid fast-path + fallback design addresses reviewer concerns about correctness by deferring to SPI whenever executor behavior is non-trivial to replicate. The per-statement resource caching in v3 adds a second layer of optimization by amortizing relation/index setup across many rows within a single trigger-firing cycle.
For workloads with large bulk inserts or updates on tables with foreign keys—common in ETL, staging loads, and data migrations—this work could significantly reduce runtimes. The current limitations (partitioned PKs, temporal FKs) leave those cases on the existing path, preserving correctness while optimizing the majority of FK workloads.
References
- Thread: Eliminating SPI / SQL from some RI triggers - take 3
- [1] Simplifying foreign key/RI checks (earlier thread)
- [2] Eliminating SPI from RI triggers - take 2 (earlier thread)