2026 Archive
PostgreSQL Weekly posts for 2026.
Week 03 (2026)
PostgreSQL mailing list discussions for Week 03, 2026.
🇨🇳 中文版本
Extended Statistics Import/Export Functions for PostgreSQL 19
Introduction
PostgreSQL's extended statistics feature, introduced in version 10, allows the optimizer to understand correlations between columns that simple per-column statistics cannot capture. This includes n-distinct coefficients (how many distinct combinations exist), functional dependencies (when one column determines another), and Most Common Values (MCV) lists for multi-column combinations.
While PostgreSQL 17 introduced functions for importing and exporting relation and attribute statistics (pg_restore_relation_stats, pg_restore_attribute_stats), extended statistics were left out of this initial implementation. A recent thread on pgsql-hackers, initiated by Corey Huinker, addresses this gap with a comprehensive patch series that adds pg_restore_extended_stats(), pg_clear_extended_stats(), and related infrastructure.
This work is significant for several reasons:
- Enables complete statistics preservation across pg_dump/pg_restore and pg_upgrade
- Allows query planner experimentation with hypothetical statistics
- Supports schema-only dumps with statistics for testing query plans without actual data
Technical Analysis
The Problem with the Original Format
The original output format for pg_ndistinct and pg_dependencies types used a JSON structure where the keys themselves contained structured data:
{"1, 2": 2323, "1, 3": 3232, "2, 3": 1500}
While technically valid JSON, this format posed several problems:
- Keys containing comma-separated attribute numbers require additional parsing
- Difficult to manipulate programmatically
- No working input function existed—these types were effectively output-only
The New JSON Format
The patch series introduces a cleaner, more structured JSON format. For pg_ndistinct:
[
{"attributes": [2, 3], "ndistinct": 4},
{"attributes": [2, -1], "ndistinct": 4},
{"attributes": [2, 3, -1], "ndistinct": 4}
]
For pg_dependencies:
[
{"attributes": [2], "dependency": 3, "degree": 1.000000},
{"attributes": [2, 3], "dependency": -1, "degree": 0.850000}
]
Key improvements:
- Proper JSON arrays with named keys for each element
- Clear separation of attributes, values, and metadata
- Machine-readable without custom parsing logic
- Negative attribute numbers represent expressions in the statistics object (e.g.,
-1is the first expression)
Input Function Implementation
The new input functions use PostgreSQL's JSON parser infrastructure with a custom semantic action handler. Here's a simplified view of the parsing state machine for pg_ndistinct:
typedef enum
{
NDIST_EXPECT_START = 0,
NDIST_EXPECT_ITEM,
NDIST_EXPECT_KEY,
NDIST_EXPECT_ATTNUM_LIST,
NDIST_EXPECT_ATTNUM,
NDIST_EXPECT_NDISTINCT,
NDIST_EXPECT_COMPLETE
} ndistinctSemanticState;
The parser validates:
- Proper JSON structure (array of objects)
- Required keys (
attributesandndistinctfor ndistinct statistics) - Attribute numbers within valid ranges (positive for columns, negative for expressions, but not beyond
STATS_MAX_DIMENSIONS) - No duplicate attributes within a single item
Extended Statistics Functions
The patch introduces three main SQL functions:
pg_restore_extended_stats() — Imports extended statistics from a previously exported value:
SELECT pg_restore_extended_stats(
'public', -- relation schema
'my_table', -- relation name
'public', -- statistics schema
'my_stats', -- statistics name
false, -- inherited
'{"version": ..., "ndistinct": [...], "dependencies": [...], "mcv": [...], "exprs": [...]}'::text
);
pg_clear_extended_stats() — Removes extended statistics data from pg_statistic_ext_data:
SELECT pg_clear_extended_stats(
'public', -- statistics schema
'my_stats', -- statistics name
false -- inherited
);
The functions follow the same patterns established for relation/attribute statistics:
- Return boolean indicating success
- Issue
WARNING(notERROR) on problems to avoid breaking pg_restore scripts - Require
MAINTAINprivilege on the target relation
Validation and Safety
The implementation includes careful validation:
- Attribute bounds checking: Positive attnums must exist in
stxkeys, negative attnums must not exceed the number of expressions - Combination completeness: For
pg_ndistinct, all N-choose-K combinations must be present based on the longest attribute list - Soft error handling: Uses PostgreSQL's
ErrorSaveContextfor safe error reporting without crashing
Example validation for attribute numbers:
if (attnum == 0 || attnum < (0 - STATS_MAX_DIMENSIONS))
{
errsave(parse->escontext,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
errdetail("Invalid \"%s\" element: %d.",
PG_NDISTINCT_KEY_ATTRIBUTES, attnum));
return JSON_SEM_ACTION_FAILED;
}
Community Insights
Key Discussion Points
Format Change Timing: Tomas Vondra initially suggested a more structured JSON format. The community recognized this was the last opportunity to change the format before a working input function locked in backward compatibility requirements.
Validation Scope: There was significant discussion about how much validation to perform:
- Early patches had extensive checks for statistical consistency (e.g., MCV frequencies summing to 1.0)
- Reviewers pushed back, preferring minimal validation to avoid breaking legitimate but unusual imports
- Final consensus: validate structure and attribute references, but not statistical values
pg_dependencies Special Case: Unlike pg_ndistinct which stores all combinations, pg_dependencies may omit statistically insignificant combinations. This means the input function cannot enforce complete combination coverage for dependencies.
Reviewer Feedback Integration
Michael Paquier provided extensive review and contributed significant improvements:
- Restructured the patch series for cleaner commits
- Split format changes from input function additions
- Added comprehensive regression tests achieving >90% code coverage
- Fixed compiler warnings on older GCC versions
Tom Lane caught style issues:
- Error detail messages converted to complete sentences
- Replaced
SOFT_ERROR_OCCURRED()macro with direct state checks to avoid warnings
Current Status
As of January 2026, the patch series has progressed significantly:
Committed:
- Output format changes for
pg_ndistinct(new JSON array format) - Output format changes for
pg_dependencies(new JSON array format) - Input functions for both types with comprehensive validation
pg_clear_extended_stats()function
In Review (v27):
pg_restore_extended_stats()function- pg_dump integration for extended statistics export/import
The pg_dump integration supports backward compatibility to PostgreSQL 10, with version-specific SQL generation to handle format differences.
Technical Details
Internal Storage Unchanged
Importantly, the internal binary storage format remains unchanged. The new input/output functions only affect the text representation. This means:
- No catalog changes required
- Existing data remains valid
- Binary COPY operations unaffected
Expression Statistics Support
Extended statistics can include expressions (e.g., CREATE STATISTICS s ON (a + b), c FROM t). The implementation handles these via negative attribute numbers:
-1= first expression-2= second expression- etc.
The exprs element in the restore format contains per-expression statistics similar to pg_statistic entries, enabling complete round-trip preservation.
MCV List Handling
MCV (Most Common Values) lists for extended statistics are particularly complex, containing:
- Value combinations across multiple columns
- Frequency and base frequency arrays
- Per-value null bitmaps
The implementation reuses infrastructure from attribute statistics import, with extensions for multi-column value arrays.
Conclusion
This patch series represents a significant enhancement to PostgreSQL's statistics infrastructure. By enabling import/export of extended statistics, it:
- Completes the statistics story started in PostgreSQL 17 for relation and attribute statistics
- Enables realistic testing with production-like statistics on sanitized schemas
- Improves upgrade reliability by preserving optimizer information across pg_upgrade
For DBAs and developers:
- Extended statistics created with
CREATE STATISTICSwill now survive pg_dump/pg_restore - Query plan testing becomes more practical with
--no-datadumps that include full statistics - The new JSON format is human-readable for debugging and hypothetical scenario testing
The target release is PostgreSQL 19, with the remaining restore function and pg_dump integration expected to land soon.
References
pg_plan_advice: A New Approach to PostgreSQL Query Plan Control
Introduction
PostgreSQL's query planner is sophisticated and generally produces excellent execution plans. However, experienced DBAs and developers occasionally encounter situations where they wish they could influence or stabilize the planner's decisions. Robert Haas from EnterpriseDB has been working on a significant new contrib module called pg_plan_advice that aims to address this long-standing need.
This article examines the pg_plan_advice thread on the pgsql-hackers mailing list, which has been actively discussed since October 2025.
What is pg_plan_advice?
pg_plan_advice is a proposed contrib module that introduces a special-purpose "advice mini-language" for controlling key planning decisions. The module can:
- Generate advice strings from existing query plans using
EXPLAIN (PLAN_ADVICE) - Apply advice strings via the
pg_plan_advice.adviceGUC parameter to reproduce or constrain future planning decisions
The advice language allows control over:
- Join order: Which tables are joined in what sequence
- Join methods: Nested loop, merge join, hash join
- Scan types: Sequential scan, index scan (with specific index selection)
- Parallelism: Where and how parallel execution is used
- Partitionwise joins: How partitioned table joins are handled
Key Design Philosophy
Robert Haas emphasizes in the README that the principal use case is not about users "out-planning the planner" but rather about reproducing plans that worked well in the past:
"We don't need to accept the proposition that users can out-plan the planner. We only need to accept that they can tell good plans from bad plans better than the planner. That is a low bar to clear. The planner never finds out what happens when the plans that it generates are actually executed, but users do."
This positions pg_plan_advice as a plan stability tool rather than a hint system for micromanaging the optimizer.
Technical Architecture
The Relation Identifier System
One of the most innovative aspects of pg_plan_advice is its relation identifier system. This system provides unambiguous references to parts of a query, handling complex scenarios like:
- Multiple references to the same table with different aliases
- Subqueries and CTEs
- Partitioned tables and their partitions
The identifier syntax uses special notation like t#2 to distinguish between the first and second occurrence of table t in a query.
Example Usage
Here's an example from Jakub Wartak's testing showing the power of the system:
-- Generate advice for a query with aliasing
EXPLAIN (PLAN_ADVICE, COSTS OFF)
SELECT * FROM (SELECT * FROM t1 a JOIN t2 b USING (id)) a, t2 b, t3 c
WHERE a.id = b.id AND b.id = c.id;
-- Output includes:
-- Generated Plan Advice:
-- JOIN_ORDER(a#2 b#2 c)
-- MERGE_JOIN_PLAIN(b#2 c)
-- SEQ_SCAN(c)
-- INDEX_SCAN(a#2 public.t1_pkey)
-- NO_GATHER(c a#2 b#2)
You can then selectively apply constraints:
-- Force a specific scan type
SET pg_plan_advice.advice = 'SEQ_SCAN(b#2)';
-- Re-explain to see the new plan
EXPLAIN (PLAN_ADVICE, COSTS OFF)
SELECT * FROM (SELECT * FROM t1 a JOIN t2 b USING (id)) a, t2 b, t3 c
WHERE a.id = b.id AND b.id = c.id;
-- The output shows:
-- Supplied Plan Advice:
-- SEQ_SCAN(b#2) /* matched */
Patch Structure (v10)
The implementation is split into five patches:
| Patch | Description | Size |
|---|---|---|
| 0001 | Store information about range table flattening | 7.8 KB |
| 0002 | Store information about elided nodes in the final plan | 9.8 KB |
| 0003 | Store information about Append node consolidation | 40.4 KB |
| 0004 | Allow for plugin control over path generation strategies | 56.1 KB |
| 0005 | WIP: Add pg_plan_advice contrib module | 399.1 KB |
The first four patches add necessary infrastructure to the planner, while the fifth contains the actual module. This separation allows the infrastructure to potentially benefit other extensions in the future.
Community Review and Testing
The thread has seen active participation from several community members:
Jakub Wartak (EDB)
Conducted extensive TPC-H benchmark testing and found several bugs:
- Crashes in debug/ASAN builds with NULL pointer dereferences
- Issues with semijoin uniqueness detection without statistics
- Join order advice conflicts in complex queries
Jacob Champion (EDB)
Applied fuzzing techniques to discover edge cases:
- Parser crashes with malformed advice strings
- Issues with partition-related advice on non-partitioned tables
- AST utility bugs revealed through corpus-based fuzzing
Other Contributors
- Alastair Turner: Appreciated the ability to test alternative plans
- Hannu Krosing (Google): Referenced VLDB research showing 20% of real-world queries have 10+ joins
- Lukas Fittl: Interested in pg_stat_statements integration possibilities
Issues Discovered and Fixed
The collaborative review process has uncovered and fixed several issues across versions:
- Compiler warnings (gcc-13, clang-20) - Fixed in early versions
- NULL pointer crashes in
pgpa_join_path_setup()when extension state wasn't allocated - Join order conflict detection incorrectly treating join method advice as positive constraints
- Semijoin uniqueness tracking not working correctly without PLAN_ADVICE in EXPLAIN
- Partial match detection in nested join order specifications
Current Status
As of v10 (posted January 15, 2026):
- The patch is registered in Commitfest
- Still marked as WIP (Work In Progress)
- Active testing continues, particularly with TPC-H queries
- Robert Haas is seeking substantive code review, especially for patch 0001
Implications for PostgreSQL Users
If committed, pg_plan_advice would provide:
- Plan Stability: Capture and reproduce known-good query plans
- Debugging Aid: Understand why the planner makes specific choices
- Testing Tool: Experiment with alternative plan shapes without modifying queries
- Production Safety Net: Guard against unexpected plan regressions after statistics changes
Comparison with pg_hint_plan
Unlike the popular pg_hint_plan extension, pg_plan_advice focuses on round-trip safety:
- Plans can be captured and reapplied reliably
- The relation identifier system handles complex aliasing automatically
- Designed to work with any query structure without manual identifier management
Conclusion
pg_plan_advice represents a significant step forward in PostgreSQL's planner extensibility story. Rather than replacing the optimizer's judgment, it provides a safety mechanism for preserving proven execution strategies. The active community review process has already improved the code substantially, and continued testing is helping ensure robustness.
For DBAs managing complex workloads, particularly those with queries that occasionally suffer from plan regressions, this module offers a promising solution that works with the planner rather than against it.
Thread Link: pg_plan_advice - pgsql-hackers
Commitfest Entry: CF 6184
Author: Robert Haas (EnterpriseDB)
Reviewers: Jakub Wartak, Jacob Champion, Alastair Turner, Hannu Krosing, John Naylor, and others
2026 年归档
2026 年 PostgreSQL 周报文章。
第 03 周(2026)
2026 年第 03 周的 PostgreSQL 邮件列表讨论。
PostgreSQL 19 扩展统计信息导入/导出功能
引言
PostgreSQL 的扩展统计信息功能自版本 10 引入以来,使优化器能够理解简单单列统计信息无法捕获的列间相关性。这包括 n-distinct 系数(存在多少个不同的组合)、函数依赖(当一个列决定另一个列时)以及多列组合的最常见值(MCV)列表。
虽然 PostgreSQL 17 引入了用于导入和导出关系和属性统计信息的函数(pg_restore_relation_stats、pg_restore_attribute_stats),但扩展统计信息被排除在这一初始实现之外。最近在 pgsql-hackers 邮件列表上,由 Corey Huinker 发起的一个讨论线程解决了这一空白,提供了一个全面的补丁系列,添加了 pg_restore_extended_stats()、pg_clear_extended_stats() 及相关基础设施。
这项工作意义重大,原因如下:
- 实现跨 pg_dump/pg_restore 和 pg_upgrade 的完整统计信息保留
- 允许使用假设统计信息进行查询计划器实验
- 支持仅包含 schema 和统计信息的转储,用于在没有实际数据的情况下测试查询计划
技术分析
原始格式的问题
pg_ndistinct 和 pg_dependencies 类型的原始输出格式使用了一种 JSON 结构,其中键本身包含结构化数据:
{"1, 2": 2323, "1, 3": 3232, "2, 3": 1500}
虽然这在技术上是有效的 JSON,但这种格式存在几个问题:
- 包含逗号分隔属性号的键需要额外解析
- 难以以编程方式操作
- 不存在可用的输入函数——这些类型实际上只能输出
新的 JSON 格式
补丁系列引入了一种更清晰、更结构化的 JSON 格式。对于 pg_ndistinct:
[
{"attributes": [2, 3], "ndistinct": 4},
{"attributes": [2, -1], "ndistinct": 4},
{"attributes": [2, 3, -1], "ndistinct": 4}
]
对于 pg_dependencies:
[
{"attributes": [2], "dependency": 3, "degree": 1.000000},
{"attributes": [2, 3], "dependency": -1, "degree": 0.850000}
]
主要改进:
- 规范的 JSON 数组,每个元素都有命名的键
- 清晰分离属性、值和元数据
- 机器可读,无需自定义解析逻辑
- 负数属性号表示统计对象中的表达式(例如,
-1是第一个表达式)
输入函数实现
新的输入函数使用 PostgreSQL 的 JSON 解析器基础设施,配合自定义语义动作处理器。以下是 pg_ndistinct 解析状态机的简化视图:
typedef enum
{
NDIST_EXPECT_START = 0,
NDIST_EXPECT_ITEM,
NDIST_EXPECT_KEY,
NDIST_EXPECT_ATTNUM_LIST,
NDIST_EXPECT_ATTNUM,
NDIST_EXPECT_NDISTINCT,
NDIST_EXPECT_COMPLETE
} ndistinctSemanticState;
解析器验证:
- 正确的 JSON 结构(对象数组)
- 必需的键(ndistinct 统计信息需要
attributes和ndistinct) - 属性号在有效范围内(正数表示列,负数表示表达式,但不超过
STATS_MAX_DIMENSIONS) - 单个项目内没有重复属性
扩展统计信息函数
补丁引入了三个主要 SQL 函数:
pg_restore_extended_stats() — 从先前导出的值导入扩展统计信息:
SELECT pg_restore_extended_stats(
'public', -- 关系 schema
'my_table', -- 关系名称
'public', -- 统计信息 schema
'my_stats', -- 统计信息名称
false, -- inherited(是否继承)
'{"version": ..., "ndistinct": [...], "dependencies": [...], "mcv": [...], "exprs": [...]}'::text
);
pg_clear_extended_stats() — 从 pg_statistic_ext_data 中删除扩展统计信息数据:
SELECT pg_clear_extended_stats(
'public', -- 统计信息 schema
'my_stats', -- 统计信息名称
false -- inherited(是否继承)
);
这些函数遵循为关系/属性统计信息建立的相同模式:
- 返回布尔值表示成功与否
- 遇到问题时发出
WARNING(而非ERROR),以避免破坏 pg_restore 脚本 - 需要目标关系的
MAINTAIN权限
验证与安全
实现包括仔细的验证:
- 属性边界检查:正数 attnum 必须存在于
stxkeys中,负数 attnum 不得超过表达式数量 - 组合完整性:对于
pg_ndistinct,基于最长属性列表,必须存在所有 N 选 K 组合 - 软错误处理:使用 PostgreSQL 的
ErrorSaveContext进行安全的错误报告而不会崩溃
属性号验证示例:
if (attnum == 0 || attnum < (0 - STATS_MAX_DIMENSIONS))
{
errsave(parse->escontext,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
errdetail("Invalid \"%s\" element: %d.",
PG_NDISTINCT_KEY_ATTRIBUTES, attnum));
return JSON_SEM_ACTION_FAILED;
}
社区洞察
关键讨论点
格式更改时机:Tomas Vondra 最初建议采用更结构化的 JSON 格式。社区认识到这是在可用输入函数锁定向后兼容性要求之前更改格式的最后机会。
验证范围:关于应执行多少验证存在重大讨论:
- 早期补丁对统计一致性进行了广泛检查(例如,MCV 频率总和为 1.0)
- 审查者提出反对意见,倾向于最小化验证以避免破坏合法但不寻常的导入
- 最终共识:验证结构和属性引用,但不验证统计值
pg_dependencies 特殊情况:与存储所有组合的 pg_ndistinct 不同,pg_dependencies 可能会省略统计上不显著的组合。这意味着输入函数无法对依赖项强制执行完整的组合覆盖。
审查反馈整合
Michael Paquier 提供了广泛的审查并贡献了重大改进:
- 重构补丁系列以获得更清晰的提交
- 将格式更改与输入函数添加分开
- 添加全面的回归测试,实现超过 90% 的代码覆盖率
- 修复旧版 GCC 上的编译器警告
Tom Lane 发现了风格问题:
- 错误详细消息转换为完整句子
- 用直接状态检查替换
SOFT_ERROR_OCCURRED()宏以避免警告
当前状态
截至 2026 年 1 月,补丁系列已取得重大进展:
已提交:
pg_ndistinct的输出格式更改(新的 JSON 数组格式)pg_dependencies的输出格式更改(新的 JSON 数组格式)- 两种类型的输入函数及全面验证
pg_clear_extended_stats()函数
审查中 (v27):
pg_restore_extended_stats()函数- pg_dump 集成用于扩展统计信息导出/导入
pg_dump 集成支持向后兼容到 PostgreSQL 10,通过特定版本的 SQL 生成来处理格式差异。
技术细节
内部存储未更改
重要的是,内部二进制存储格式保持不变。新的输入/输出函数只影响文本表示。这意味着:
- 不需要目录更改
- 现有数据保持有效
- 二进制 COPY 操作不受影响
表达式统计支持
扩展统计信息可以包含表达式(例如,CREATE STATISTICS s ON (a + b), c FROM t)。实现通过负数属性号处理这些:
-1= 第一个表达式-2= 第二个表达式- 以此类推
恢复格式中的 exprs 元素包含类似于 pg_statistic 条目的每个表达式的统计信息,实现完整的往返保留。
MCV 列表处理
扩展统计信息的 MCV(最常见值)列表特别复杂,包含:
- 跨多列的值组合
- 频率和基础频率数组
- 每个值的空值位图
实现重用了属性统计信息导入的基础设施,并针对多列值数组进行了扩展。
结论
这个补丁系列代表了 PostgreSQL 统计信息基础设施的重大增强。通过启用扩展统计信息的导入/导出,它:
- 完善了统计信息功能,延续了 PostgreSQL 17 中为关系和属性统计信息开始的工作
- 实现了真实的测试,在清理后的 schema 上使用类似生产环境的统计信息
- 提高了升级可靠性,通过 pg_upgrade 保留优化器信息
对于 DBA 和开发人员:
- 使用
CREATE STATISTICS创建的扩展统计信息现在可以在 pg_dump/pg_restore 后保留 - 使用包含完整统计信息的
--no-data转储,查询计划测试变得更加实用 - 新的 JSON 格式便于人类阅读,用于调试和假设场景测试
目标发布版本是 PostgreSQL 19,剩余的恢复函数和 pg_dump 集成预计很快会合并。
参考资料
pg_plan_advice:PostgreSQL 查询计划控制的新方案
背景介绍
PostgreSQL 的查询优化器功能强大,通常能生成优秀的执行计划。然而,经验丰富的 DBA 和开发者偶尔会遇到需要影响或稳定优化器决策的场景。来自 EnterpriseDB 的 Robert Haas 正在开发一个重要的 contrib 模块 pg_plan_advice,旨在解决这一长期存在的需求。
本文分析了 pgsql-hackers 邮件列表上自 2025 年 10 月以来持续讨论的 pg_plan_advice 线程。
pg_plan_advice 是什么?
pg_plan_advice 是一个提议中的 contrib 模块,引入了一种专门用于控制关键规划决策的"建议迷你语言"(advice mini-language)。该模块支持:
- 使用
EXPLAIN (PLAN_ADVICE)从现有查询计划生成建议字符串 - 通过
pg_plan_advice.adviceGUC 参数应用建议字符串,以复现或约束后续的规划决策
建议语言允许控制:
- 连接顺序:表的连接顺序
- 连接方法:Nested Loop、Merge Join、Hash Join
- 扫描类型:顺序扫描、索引扫描(可指定具体索引)
- 并行执行:并行执行的位置和方式
- 分区连接:分区表连接的处理方式
核心设计理念
Robert Haas 在 README 中强调,主要使用场景并非让用户"战胜优化器",而是复现过去表现良好的执行计划:
"我们不需要接受用户能比优化器做出更好规划的观点。我们只需要接受用户比优化器更能分辨好计划和坏计划。这是一个很低的门槛。优化器永远不知道它生成的计划实际执行时会发生什么,但用户知道。"
这将 pg_plan_advice 定位为计划稳定性工具,而非微观管理优化器的提示系统。
技术架构
关系标识符系统
pg_plan_advice 最创新的方面之一是其关系标识符系统(Relation Identifier System)。该系统提供对查询各部分的无歧义引用,能处理复杂场景:
- 同一表使用不同别名的多次引用
- 子查询和 CTE
- 分区表及其分区
标识符语法使用特殊表示法如 t#2 来区分查询中表 t 的第一次和第二次出现。
使用示例
以下是 Jakub Wartak 测试中展示的系统能力:
-- 为带别名的查询生成建议
EXPLAIN (PLAN_ADVICE, COSTS OFF)
SELECT * FROM (SELECT * FROM t1 a JOIN t2 b USING (id)) a, t2 b, t3 c
WHERE a.id = b.id AND b.id = c.id;
-- 输出包含:
-- Generated Plan Advice:
-- JOIN_ORDER(a#2 b#2 c)
-- MERGE_JOIN_PLAIN(b#2 c)
-- SEQ_SCAN(c)
-- INDEX_SCAN(a#2 public.t1_pkey)
-- NO_GATHER(c a#2 b#2)
然后可以选择性地应用约束:
-- 强制使用特定扫描类型
SET pg_plan_advice.advice = 'SEQ_SCAN(b#2)';
-- 重新执行 EXPLAIN 查看新计划
EXPLAIN (PLAN_ADVICE, COSTS OFF)
SELECT * FROM (SELECT * FROM t1 a JOIN t2 b USING (id)) a, t2 b, t3 c
WHERE a.id = b.id AND b.id = c.id;
-- 输出显示:
-- Supplied Plan Advice:
-- SEQ_SCAN(b#2) /* matched */
补丁结构(v10)
实现分为五个补丁:
| 补丁 | 描述 | 大小 |
|---|---|---|
| 0001 | 存储范围表扁平化信息 | 7.8 KB |
| 0002 | 在最终计划中存储省略节点的信息 | 9.8 KB |
| 0003 | 存储 Append 节点合并信息 | 40.4 KB |
| 0004 | 允许插件控制路径生成策略 | 56.1 KB |
| 0005 | WIP:添加 pg_plan_advice contrib 模块 | 399.1 KB |
前四个补丁为优化器添加必要的基础设施,第五个包含实际模块。这种分离设计使基础设施在未来可能惠及其他扩展。
社区审查与测试
该线程得到了多位社区成员的积极参与:
Jakub Wartak(EDB)
进行了大量 TPC-H 基准测试,发现了若干 bug:
- debug/ASAN 构建中的空指针崩溃
- 无统计信息时的半连接唯一性检测问题
- 复杂查询中的连接顺序建议冲突
Jacob Champion(EDB)
应用模糊测试技术发现边缘情况:
- 畸形建议字符串导致的解析器崩溃
- 对非分区表使用分区相关建议的问题
- 通过语料库模糊测试发现的 AST 工具 bug
其他贡献者
- Alastair Turner:赞赏测试替代计划的能力
- Hannu Krosing(Google):引用 VLDB 研究,显示 20% 的实际查询有 10+ 个连接
- Lukas Fittl:对与 pg_stat_statements 集成的可能性感兴趣
发现并修复的问题
协作审查过程发现并修复了多个版本中的若干问题:
- 编译器警告(gcc-13、clang-20)- 在早期版本中修复
- 扩展状态未分配时
pgpa_join_path_setup()中的空指针崩溃 - 连接顺序冲突检测错误地将连接方法建议视为正向约束
- 在 EXPLAIN 中未使用 PLAN_ADVICE 时半连接唯一性追踪工作不正确
- 嵌套连接顺序规范中的部分匹配检测问题
当前状态
截至 v10(2026 年 1 月 15 日发布):
- 补丁已注册在 Commitfest
- 仍标记为 WIP(进行中)
- 测试仍在进行,特别是 TPC-H 查询测试
- Robert Haas 正在寻求实质性的代码审查,特别是针对补丁 0001
对 PostgreSQL 用户的意义
如果被合并,pg_plan_advice 将提供:
- 计划稳定性:捕获并复现已知良好的查询计划
- 调试辅助:理解优化器为何做出特定选择
- 测试工具:在不修改查询的情况下实验替代计划形状
- 生产安全网:防止统计信息变化后的意外计划退化
与 pg_hint_plan 的比较
与流行的 pg_hint_plan 扩展不同,pg_plan_advice 专注于往返安全性(round-trip safety):
- 计划可以被可靠地捕获和重新应用
- 关系标识符系统自动处理复杂的别名
- 设计为可与任何查询结构配合使用,无需手动管理标识符
总结
pg_plan_advice 代表了 PostgreSQL 优化器可扩展性方面的重要进步。它不是要取代优化器的判断,而是提供一种安全机制来保留经过验证的执行策略。活跃的社区审查过程已经大幅改进了代码,持续的测试正在帮助确保其健壮性。
对于管理复杂工作负载的 DBA,特别是那些查询偶尔遭受计划退化的场景,该模块提供了一个有前途的解决方案,它与优化器协同工作而非对抗。
邮件列表链接: pg_plan_advice - pgsql-hackers
Commitfest 条目: CF 6184
作者: Robert Haas(EnterpriseDB)
审查者: Jakub Wartak、Jacob Champion、Alastair Turner、Hannu Krosing、John Naylor 等