
Automatically do post upgrade steps (vacuum analyze, database reindexing) #50

Merged
justinclift merged 4 commits into main from automatic_reindex_v1 on Oct 3, 2024

Conversation

@justinclift (Member) commented Sep 30, 2024

This is initial code for automatically running post-upgrade tasks (vacuum analyze to refresh the query planner stats, reindexing the databases) after a major version upgrade.

It's passing the initial testing I've been running locally, though I haven't tried anything too funky yet. The code itself is pretty straightforward, so it shouldn't be all that fragile. Hopefully. 😄

@justinclift (Member, Author) commented Sep 30, 2024

@andyundso It seems like we could potentially reindex the databases in a fairly simple way after upgrades using an approach like this.

What do you reckon? The code has been significantly reworked since I originally wrote this comment. 😉

@justinclift justinclift changed the title Initial code (untested) for automatically reindexing the databases Initial code for automatically reindexing the databases Sep 30, 2024
@justinclift (Member, Author)

Note that I'm unsure (so far) how to properly test this. Might need to ask on an appropriate PostgreSQL developers mailing list for ideas. 😄

@andyundso (Member)

I have a couple of thoughts; apologies in advance for the wall of text 😅


How do we know what we need to do after an upgrade? I recently manually upgraded Postgres from v13 to v16 (the application is not containerized, so we also did not bother to containerize Postgres), and pg_upgrade printed something like this:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /home/greg/pg/16/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

I also read through https://www.crunchydata.com/blog/examining-postgres-upgrades-with-pg_upgrade and there they suggested that extensions sometimes also require their own upgrade.

Reindexing seems to be necessary only in certain cases, like you mentioned here.


I'm kind of wondering if we might want to take a different approach, and have the reindex operation run concurrently (ie REINDEX DATABASE CONCURRENTLY) after actually starting the database properly so applications can use it meanwhile.

I would suggest we run CONCURRENTLY just to speed things up and prevent unnecessary blocking on the tables. I would implement it as follows:

  1. Run pg_upgrade if necessary, as currently implemented.
  2. Launch the PostgreSQL server.
  3. In a separate process, launch the reindex operation.

I know it goes against the Docker convention of having only one "main" process in a container, but I think it would be good to have the server up as soon as possible while still running the reindex operation, hence the suggestion of a separate process.
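
Just as a sketch (assuming the standard PostgreSQL client tools are on the PATH), the concurrent reindex itself could be as simple as:

# Rebuild the indexes in every database without taking exclusive locks
# (slower and needs extra disk space, but applications stay usable meanwhile)
reindexdb --all --concurrently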


I am wondering if we should drop PGAUTO_REINDEX and just always execute these "post-upgrade actions", or have a variable to opt out of this behaviour. I personally would like these post-upgrade actions to be executed automatically, as I do not want to bother with that, but other people might have different preferences.


Note that I'm unsure (so far) how to properly test this.

As an idea: maybe we can query PG from our test.sh about tasks that have run recently. We could define a timeout of 1 minute within which we would expect the REINDEX operation to have run.

@justinclift (Member, Author) commented Sep 30, 2024

Ugh, I totally forgot about the whole "vacuum analyse" thing afterwards. 🤦

Yeah, we'd better do that too.


... extensions sometimes also require their own upgrade.

😬

Well, I'm calling "edge case" on that and reckon we can look at PG extension upgrades at a later point. Let's get the fundamentals working OK first. 😄


Reindexing seems to be necessary only in certain cases, like you mentioned #10 (comment).

Yeah. While we could create a big multi-dimensional matrix to track the version upgrades where it is and isn't needed... I reckon that's more an optimisation for a (maybe) later date. 😁


I would suggest we run CONCURRENTLY

Heh Heh Heh, I was afraid of that. 😁

I guess we'd better do it properly rather than cheap+easy. 😉

We'll probably need to figure out a reliable way of forking a background process to do the reindexing when the main database process is started.

We can probably just launch some kind of bash script in the background that waits for the database to report it's available, then runs the post-upgrade tasks and logs the progress/errors somewhere.
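
Something roughly like this, perhaps (only a sketch; the log file name and exact commands are assumptions, not the final implementation):

#!/usr/bin/env bash
# Hypothetical background post-upgrade script, launched from the entrypoint.
# Waits for the server to accept connections, then refreshes the planner stats,
# reindexes everything, and logs the output to a file in $PGDATA.

LOGFILE="${PGDATA}/$(date +%Y.%m.%d-%H.%M)-pgautoupgrade.log"

# Wait until PostgreSQL reports it's ready to accept connections
until pg_isready --quiet; do
    sleep 1
done

{
    echo "Updating query planner stats"
    vacuumdb --all --analyze-in-stages

    echo "Reindexing the databases"
    reindexdb --all --concurrently
} >> "${LOGFILE}" 2>&1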


I am wondering if we should drop PGAUTO_REINDEX and just always execute these "post upgrade actions".

After going through the implementation process a bit, I reckon you're right. That variable has been removed. 😄


maybe we can query PG from our test.sh about tasks that have run recently.

Oh, that's a good idea. PostgreSQL also apparently has a progress counter thing for reindexes available via some system table. At least, that's what the docs mention though I've not looked at the details:

Each backend running REINDEX will report its progress in the pg_stat_progress_create_index
view. See Section 27.4.4.

This is the Section 27.4.4 it refers to.
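
For example (just a sketch; the column selection and database name are only illustrative), something like this could be polled from test.sh to check whether a reindex is still in flight:

# Show any in-progress index builds/rebuilds, including which phase they're in
psql --dbname postgres -c "SELECT pid, datname, command, phase FROM pg_stat_progress_create_index;"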

we could define a timeout of 1 minute where we would expect that the REINDEX operation has run.

Nah, probably not the right approach. Large databases can take ages to reindex (even hours), especially if it's done concurrently.

@andyundso (Member)

We can probably just launch some kind of bash script in the background that waits for the database to report it's available, then we run the reindexing operations and log the progress/errors somewhere from that.

Had the same thought. We can probably just do something like that in the entrypoint script:

# Kick off the post-upgrade work in the background, then hand control to the normal entrypoint
bash /opt/reindex.sh &
exec "$@"

we could define a timeout of 1 minute where we would expect that the REINDEX operation has run.

Nah, probably not the right approach. Large databases can take ages to reindex (even hours), especially if it's done concurrently.

I meant a timeout of 1 minute when we test the container with test.sh. I agree that for real-world use cases we should not define a timeout.

@justinclift (Member, Author)

Cool, yeah good thoughts. I'll probably have a go at adding some of those pieces over the next few days if no-one else gets to it first.

That being said, if anyone else wants to try implementing bits first then please do. 😁

@justinclift justinclift force-pushed the automatic_reindex_v1 branch from e9f1be5 to 51fe3ce on October 1, 2024 08:10
@justinclift justinclift changed the title Initial code for automatically reindexing the databases WIP: Initial code for automatically reindexing the databases Oct 1, 2024
@justinclift justinclift changed the title WIP: Initial code for automatically reindexing the databases WIP: Initial code for automatically doing post upgrade steps (vacuum analyze, database reindexing) Oct 2, 2024
@justinclift (Member, Author)

OK, I think the post-upgrade scripting is mostly done. Or at least to the point where it's suitable for looking over for any obvious problems. 😄

One thing it doesn't yet handle is upgrading any extensions. We'll probably want to have a look at automating that in this PR too if it seems fairly straightforward (I haven't checked).

@justinclift justinclift force-pushed the automatic_reindex_v1 branch from b138ebc to 7ea0a4b on October 2, 2024 10:29
@justinclift (Member, Author) commented Oct 2, 2024

As a data point, I'm testing this manually on my local development computer by first creating a PG 16 data directory:

$ mkdir postgres-data
$ docker run --rm --name pg -it \
  --mount type=bind,source=$(pwd)/postgres-data,target=/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=abc123 \
  postgres:16-alpine

Then running the locally built pgautoupgrade/pgautoupgrade:dev container on that same PG data directory:

$ docker run --rm --name pgauto -it \
  --mount type=bind,source=$(pwd)/postgres-data,target=/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=abc123 \
  pgautoupgrade/pgautoupgrade:dev

When the post-upgrade script runs, its stdout and stderr output is captured into a file in the $PGDATA directory. For example, 2024.10.02-03.28-pgautoupgrade.log in the directory listing below:

$ sudo ls -la postgres-data
total 46
drwx------ 19 70 jc    27 Oct  2 13:30 .
drwxrwxr-x  7 jc jc    19 Oct  2 13:27 ..
-rw-r--r--  1 70 70 94211 Oct  2 13:28 2024.10.02-03.28-pgautoupgrade.log
drwx------  5 70 70     5 Oct  2 13:28 base
-rwx------  1 70 70    49 Oct  2 13:28 delete_old_cluster.sh
drwx------  2 70 70    65 Oct  2 13:30 global
drwx------  2 70 70     2 Oct  2 13:28 pg_commit_ts
drwx------  2 70 70     2 Oct  2 13:28 pg_dynshmem
-rw-------  1 70 70  5743 Oct  2 13:28 pg_hba.conf
[etc]

and:

$ sudo head postgres-data/2024.10.02-03.28-pgautoupgrade.log
----------------------------
Updating query planner stats
----------------------------
INFO:  vacuuming "postgres.pg_catalog.pg_statistic"
INFO:  finished vacuuming "postgres.pg_catalog.pg_statistic": index scans: 0
pages: 0 removed, 19 remain, 19 scanned (100.00% of total)
tuples: 0 removed, 410 remain, 0 are dead but not yet removable
removable cutoff: 753, which was 0 XIDs old when operation ended
new relfrozenxid: 753, which is 22 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
[etc]

@justinclift (Member, Author) commented Oct 2, 2024

If anyone's interested in testing this PR on their database, I've just pushed an image (x86_64 only) built from this PR to Docker Hub with the dev tag, i.e.:

image: pgautoupgrade/pgautoupgrade:dev

I'd expect it to work without issue. There shouldn't be any errors or anything along those lines. In theory. 😄

@andyundso (Member) left a comment

very cool stuff 😎

Resolved review threads: docker-entrypoint.sh (outdated), pgautoupgrade-postupgrade.sh, docker-entrypoint.sh (outdated)
@justinclift justinclift changed the title WIP: Initial code for automatically doing post upgrade steps (vacuum analyze, database reindexing) Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift changed the title Automatically do post upgrade steps (vacuum analyze, database reindexing) WIP. Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift force-pushed the automatic_reindex_v1 branch from c4aa2f2 to f3864d4 on October 3, 2024 07:48
@justinclift justinclift force-pushed the automatic_reindex_v1 branch from a569006 to dcbed7e on October 3, 2024 12:49
@justinclift (Member, Author)

The "one shot" testing took a bit more effort than I was expecting, but it should be workable for now.

Still need to properly look into the upgrading of extensions. Probably a tomorrow thing, as it's not something for tonight.

@andyundso (Member) left a comment

Looks good to me. Thanks for adding the feature!

I suppose we can close #10 after merging this?

Comment on lines +609 to +611
echo "'One shot' automatic upgrade was requested, so exiting as there is no upgrade to do"
echo "If you're seeing this message and expecting an upgrade to be happening, it probably"
echo "means the container is being started in a loop and a previous run already did it :)"
Member

Good messages, they will make it very clear to users what is happening.

Member Author

Thanks. 😄

@justinclift (Member, Author)

I suppose we can close #10 after merging this?

Almost. Let's merge this PR for now, but investigate (and maybe implement?) the extension upgrade piece before closing #10.

I mean, it might turn out to be mostly straightforward. Hopefully. 🙏

@justinclift justinclift changed the title WIP. Automatically do post upgrade steps (vacuum analyze, database reindexing) Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift merged commit 93c5fd8 into main Oct 3, 2024
26 checks passed
@justinclift justinclift deleted the automatic_reindex_v1 branch October 3, 2024 14:05
@justinclift (Member, Author)

Uh oh. Right after merging this I've realised that hard coding the postgres database name in the post-upgrade script could break things for people who (only) use a different database name:

DB_LIST=$(echo 'SELECT datname FROM pg_catalog.pg_database WHERE datistemplate IS FALSE' | psql -1t --csv postgres)

I'll test locally whether using one of the template databases will work instead (pretty sure it will), then I'll create a PR to fix that.

@justinclift (Member, Author)

Instead of using a template database, I've just passed through the name of the database being used by the docker entrypoint script and re-used that. Theoretically (!) it should be fine. 😁
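
A minimal sketch of that approach (the variable names here are assumptions following the official image's conventions, not the exact code in the follow-up change):

# The entrypoint already knows which database it created, so pass that through
# to the post-upgrade script instead of hard coding 'postgres'
POSTGRES_DB="${POSTGRES_DB:-postgres}"
DB_LIST=$(echo 'SELECT datname FROM pg_catalog.pg_database WHERE datistemplate IS FALSE' | psql -1t --csv "${POSTGRES_DB}")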
