In-Place Online Major PostgreSQL Upgrades with Rollback Capability
This article summarizes a deep dive into how Yugabyte has implemented in-place, online major PostgreSQL upgrades with the ability to roll back. The presentation covers the challenges with native PostgreSQL upgrades, Yugabyte's solution, and a demonstration of the upgrade process.
Challenges with Native PostgreSQL Upgrades
PostgreSQL, while a popular and powerful database, **does not natively offer in-place, online major upgrades**. Every year brings a new major release, and upgrading from one major release to another typically involves significant downtime or complex workarounds.
Traditional Upgrade Approaches and Their Drawbacks
Several alternative approaches exist, each with its own set of drawbacks:
- Backup and Restore: Slow, requires significant storage, and involves a large outage window.
- Blue/Green Deployment or Logical Replication: Complex to set up and maintain, requires double the hardware, and has limitations on what can be replicated.
- In-Place Offline Upgrade (pg_upgrade): The most common approach, but still requires an outage window that can range from tens of minutes to several hours.
A critical limitation across all these approaches is **the lack of a straightforward rollback mechanism**. Once upgraded, reverting to a previous version is exceptionally difficult.
The Root Cause: The System Catalog
The primary reason for these challenges lies in the PostgreSQL system catalog. The system catalog stores metadata about the database itself (tables, columns, data types, etc.). **This catalog is not backward compatible between major releases**. Changes to the catalog's structure and content necessitate a complete upgrade of the database's internal structures.
The system catalog consists of internal tables used by the database system itself. The schema of these tables is hardcoded into the PostgreSQL C code, creating a tight coupling that makes online, in-place upgrades difficult.
Every database component (backup, select queries, vacuum, analyze, etc.) relies on the system catalog. This widespread dependency further complicates any attempt to modify or upgrade the catalog without significant disruption.
Yugabyte's Solution: In-Place Online Upgrades with Rollback
Yugabyte addresses these challenges with a novel approach that enables in-place, online major PostgreSQL upgrades with the ability to roll back.
Key Principles of the Yugabyte Upgrade Framework
The Yugabyte upgrade framework is designed to be easy to use and requires minimal intervention from the user.
The core principles include:
- Blocking DDLs: During the upgrade process, DDL (Data Definition Language) operations (e.g., CREATE TABLE, ALTER TABLE) are blocked to ensure consistency. Exceptions are made for temporary tables and materialized view refreshes related to DML operations.
- Dual Catalogs: Yugabyte creates and maintains two system catalogs simultaneously: the original PostgreSQL version catalog (e.g., PG11) and a semantically equivalent catalog for the target version (e.g., PG15). Both catalogs operate in read-only mode while the upgrade is in progress.
- Master and Tablet Server Architecture: Yugabyte separates metadata (masters) from user data (tablet servers). This separation simplifies the upgrade process.
- Rolling Upgrade: Tablet servers are upgraded one at a time, minimizing downtime and allowing for monitoring during the upgrade.
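The first two principles can be illustrated with a minimal Python sketch. It is not Yugabyte's implementation: the function names, the catalog labels, and the prefix-based statement classification are all assumptions made for illustration. It shows a gate that rejects DDL while an upgrade is in progress (with the exceptions named above) and a lookup that gives each server the catalog matching its PostgreSQL version.

```python
# Hypothetical sketch: names and the prefix-based classification are
# illustrative assumptions, not Yugabyte's actual code.

# Statement prefixes treated as DDL in this sketch.
DDL_PREFIXES = ("CREATE", "ALTER", "DROP", "REFRESH")

# Exceptions named in the article: temporary tables and
# materialized view refreshes.
EXEMPT_PREFIXES = ("CREATE TEMP", "REFRESH MATERIALIZED VIEW")

# One catalog per PostgreSQL major version (labels are made up).
CATALOGS = {11: "catalog_pg11", 15: "catalog_pg15"}


class DdlBlockedError(Exception):
    """Raised when a DDL statement arrives during an upgrade."""


def check_statement(statement: str, upgrade_in_progress: bool) -> None:
    """Reject non-exempt DDL while an upgrade is in progress."""
    # Collapse whitespace and upper-case so prefixes compare reliably.
    text = " ".join(statement.upper().split())
    is_ddl = text.startswith(DDL_PREFIXES)
    is_exempt = text.startswith(EXEMPT_PREFIXES)
    if upgrade_in_progress and is_ddl and not is_exempt:
        raise DdlBlockedError(f"DDL blocked during upgrade: {statement!r}")


def catalog_for(server_pg_version: int) -> str:
    """Return the catalog that matches the server's PostgreSQL version."""
    return CATALOGS[server_pg_version]
```

In this sketch, `check_statement("CREATE TABLE t (id int)", True)` raises `DdlBlockedError`, while the same call with a `CREATE TEMP TABLE` statement or a plain `SELECT` returns normally. A real system would classify statements from the parsed query rather than from string prefixes.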
The Upgrade Process
- Master Upgrade: The Yugabyte masters are upgraded to the new version while still maintaining the old version of the PostgreSQL catalog.
- Catalog Creation: A new system catalog for the target PostgreSQL version is created, using the pg_upgrade tool to convert the old catalog. DDL operations are blocked at this stage.
- Tablet Server Rolling Upgrade: The tablet servers are upgraded one by one to the new PostgreSQL version. During this phase, the system operates in a mixed mode, with some servers running the old version and some the new. Servers use the correct catalog version for their running PostgreSQL version.
- Monitoring Phase: Once all tablet servers are upgraded, the system enters a monitoring phase where the application can be thoroughly tested. DDL operations remain blocked.
- Finalization: If the monitoring phase is successful, the old catalog is deleted, and writes are enabled for the new catalog. The upgrade is complete.
- Rollback: If issues arise during the monitoring phase, the system can be rolled back to the previous version by reverting the tablet servers to the old version. The new catalog is deleted, and the masters are reverted as well.
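The phases above can be read as a small state machine. The following Python sketch models them under stated assumptions: the class, phase names, and attributes are hypothetical, versions 11 and 15 are taken from the examples in this article, and rollback is only allowed from the monitoring phase because that is the case the article describes. It tracks state transitions only and does not upgrade anything.

```python
from enum import Enum, auto


class Phase(Enum):
    NOT_STARTED = auto()
    MASTERS_UPGRADED = auto()
    CATALOG_CREATED = auto()
    TSERVERS_UPGRADING = auto()
    MONITORING = auto()
    FINALIZED = auto()
    ROLLED_BACK = auto()


class UpgradeSketch:
    """Hypothetical model of the upgrade phases; not Yugabyte's code."""

    def __init__(self, tserver_count: int, old: int = 11, new: int = 15):
        if tserver_count < 1:
            raise ValueError("need at least one tablet server")
        self.old, self.new = old, new
        self.phase = Phase.NOT_STARTED
        self.master_version = old
        self.tserver_versions = [old] * tserver_count
        self.catalogs = {old}       # catalog versions that currently exist
        self.ddl_blocked = False

    def _require(self, *allowed: Phase) -> None:
        if self.phase not in allowed:
            raise RuntimeError(f"step not valid in phase {self.phase.name}")

    def upgrade_masters(self) -> None:
        self._require(Phase.NOT_STARTED)
        self.master_version = self.new  # old catalog is still the only one
        self.phase = Phase.MASTERS_UPGRADED

    def create_new_catalog(self) -> None:
        self._require(Phase.MASTERS_UPGRADED)
        self.ddl_blocked = True
        self.catalogs.add(self.new)
        self.phase = Phase.CATALOG_CREATED

    def upgrade_next_tserver(self) -> None:
        """Upgrade one tablet server; the cluster is in mixed mode meanwhile."""
        self._require(Phase.CATALOG_CREATED, Phase.TSERVERS_UPGRADING)
        index = self.tserver_versions.index(self.old)
        self.tserver_versions[index] = self.new
        done = all(v == self.new for v in self.tserver_versions)
        self.phase = Phase.MONITORING if done else Phase.TSERVERS_UPGRADING

    def finalize(self) -> None:
        self._require(Phase.MONITORING)
        self.catalogs = {self.new}  # old catalog deleted
        self.ddl_blocked = False    # writes to the new catalog enabled
        self.phase = Phase.FINALIZED

    def rollback(self) -> None:
        self._require(Phase.MONITORING)
        self.tserver_versions = [self.old] * len(self.tserver_versions)
        self.catalogs = {self.old}  # new catalog deleted
        self.master_version = self.old
        self.ddl_blocked = False
        self.phase = Phase.ROLLED_BACK
```

For a three-node cluster, calling `upgrade_masters()`, `create_new_catalog()`, and `upgrade_next_tserver()` three times leaves the model in `Phase.MONITORING` with both catalogs present and DDL blocked. From there, either `finalize()` or `rollback()` ends the upgrade, and each leaves exactly one catalog behind.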
Demo Highlights
The demo showcased a live upgrade of a three-node Yugabyte cluster from PostgreSQL 11 to PostgreSQL 15. The demo clearly showed:
- No downtime during the upgrade process.
- The ability to monitor performance during the mixed mode.
- The smooth finalization of the upgrade.
Conclusion
Yugabyte's in-place, online major PostgreSQL upgrade solution offers a significant improvement over traditional approaches by **eliminating downtime and providing a reliable rollback mechanism**. This solution simplifies the upgrade process and reduces the risk associated with migrating to newer PostgreSQL versions.