
Automatically do post upgrade steps (vacuum analyze, database reindexing) #50

Merged
justinclift merged 4 commits into main from automatic_reindex_v1 on Oct 3, 2024

Conversation

@justinclift (Member) commented Sep 30, 2024

This is initial code for automatically running post-upgrade tasks (vacuum analyze to refresh the query planner stats, reindexing the databases) after a major version upgrade.

It's passing the initial testing I've been running locally, though I haven't tried anything too funky yet. The code itself is pretty straightforward, so it shouldn't be all that fragile. Hopefully. 😄

@justinclift (Member, Author) commented Sep 30, 2024

@andyundso It seems like we could potentially reindex the databases in a fairly simple way after upgrades using an approach like this.

What do you reckon? The code has been significantly reworked since I originally wrote this comment. 😉

@justinclift justinclift changed the title Initial code (untested) for automatically reindexing the databases Initial code for automatically reindexing the databases Sep 30, 2024
@justinclift (Member, Author)

Note that I'm unsure (so far) how to properly test this. Might need to ask on an appropriate PostgreSQL developers mailing list for ideas. 😄

@andyundso (Member)

I have a couple of thoughts; apologies in advance for the wall of text 😅


How do we know what we need to do after an upgrade? I recently manually upgraded Postgres from v13 to v16 (the application is not containerized, so we also did not bother to containerize Postgres), and pg_upgrade printed something like this:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /home/greg/pg/16/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

I also read through https://www.crunchydata.com/blog/examining-postgres-upgrades-with-pg_upgrade and there they suggested that extensions sometimes also require their own upgrade.

Reindexing seems to be necessary only in certain cases, like you mentioned here.


I'm kind of wondering if we might want to take a different approach, and have the reindex operation run concurrently (ie REINDEX DATABASE CONCURRENTLY) after actually starting the database properly so applications can use it meanwhile.

I would suggest we run CONCURRENTLY just to speed things up and prevent unnecessary blocking on the tables. I would implement it as follows:

  1. Run pg_upgrade if necessary, as currently implemented.
  2. Launch the PostgreSQL server.
  3. In a separate process, launch the reindex operation.

I know it goes against the Docker convention of having only one "main" process in a container, but I think it would be good to have the server up as soon as possible while still running the reindex operation, hence the suggestion of a separate process.
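
Just as a sketch (assuming the standard PostgreSQL client tools are on the PATH), the concurrent reindex itself could be as simple as:

# Rebuild the indexes in every database without taking exclusive locks
# (slower and needs extra disk space, but applications stay usable meanwhile)
reindexdb --all --concurrently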


I am wondering if we should drop PGAUTO_REINDEX and just always execute these "post-upgrade actions", or have a variable to opt out of this behaviour. I personally would like these post-upgrade actions to be executed automatically, as I do not want to bother with that, but other people might have different preferences.


Note that I'm unsure (so far) how to properly test this.

As an idea: maybe we can query PG from our test.sh about tasks that have run recently. We could define a timeout of 1 minute within which we would expect the REINDEX operation to have run.

@justinclift (Member, Author) commented Sep 30, 2024

Ugh, I totally forgot about the whole "vacuum analyse" thing afterwards. 🤦

Yeah, we'd better do that too.


... extensions sometimes also require their own upgrade.

😬

Well, I'm calling "edge case" on that and reckon we can look at PG extension upgrades at a later point. Let's get the fundamentals working OK first. 😄


Reindexing seems to be necessary only in certain cases, like you mentioned #10 (comment).

Yeah. While we could create a big multi-dimensional matrix to track the version upgrades where it is and isn't needed... I reckon that's more an optimisation for a (maybe) later date. 😁


I would suggest we run CONCURRENTLY

Heh Heh Heh, I was afraid of that. 😁

I guess we'd better do it properly rather than cheap+easy. 😉

We'll probably need to figure out a reliable way of forking a background process to do the reindexing when the main database process is started.

We can probably just launch some kind of bash script in the background that waits for the database to report it's available, then runs the post-upgrade tasks and logs the progress/errors somewhere.
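
Something roughly like this, perhaps (only a sketch; the log file name and exact commands are assumptions, not the final implementation):

#!/usr/bin/env bash
# Hypothetical background post-upgrade script, launched from the entrypoint.
# Waits for the server to accept connections, then refreshes the planner stats,
# reindexes everything, and logs the output to a file in $PGDATA.

LOGFILE="${PGDATA}/$(date +%Y.%m.%d-%H.%M)-pgautoupgrade.log"

# Wait until PostgreSQL reports it's ready to accept connections
until pg_isready --quiet; do
    sleep 1
done

{
    echo "Updating query planner stats"
    vacuumdb --all --analyze-in-stages

    echo "Reindexing the databases"
    reindexdb --all --concurrently
} >> "${LOGFILE}" 2>&1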


I am wondering if we should drop PGAUTO_REINDEX and just always execute these "post upgrade actions".

After going through the implementation process a bit, I reckon you're right. That variable has been removed. 😄


maybe we can query PG from our test.sh about tasks that have run recently.

Oh, that's a good idea. PostgreSQL also apparently has a progress counter thing for reindexes available via some system table. At least, that's what the docs mention though I've not looked at the details:

Each backend running REINDEX will report its progress in the pg_stat_progress_create_index
view. See Section 27.4.4.

This is the Section 27.4.4 it refers to.
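
For example (just a sketch; the column selection and database name are only illustrative), something like this could be polled from test.sh to check whether a reindex is still in flight:

# Show any in-progress index builds/rebuilds, including which phase they're in
psql --dbname postgres -c "SELECT pid, datname, command, phase FROM pg_stat_progress_create_index;"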

we could define a timeout of 1 minute where we would expect that the REINDEX operation has run.

Nah, probably not the right approach. Large databases can take ages to reindex (even hours), especially if it's done concurrently.

@andyundso (Member)

We can probably just launch some kind of bash script in the background that waits for the database to report it's available, then we run the reindexing operations and log the progress/errors somewhere from that.

Had the same thought. We can probably just do something like that in the entrypoint script:

# Kick off the post-upgrade work in the background, then hand control to the normal entrypoint
bash /opt/reindex.sh &
exec "$@"

we could define a timeout of 1 minute where we would expect that the REINDEX operation has run.

Nah, probably not the right approach. Large databases can take ages to reindex (even hours), especially if it's done concurrently.

I meant a timeout of 1 minute when we test the container with test.sh. I agree that for real-world use cases we should not define a timeout.

@justinclift (Member, Author)

Cool, yeah good thoughts. I'll probably have a go at adding some of those pieces over the next few days if no-one else gets to it first.

That being said, if anyone else wants to try implementing bits first then please do. 😁

@justinclift justinclift force-pushed the automatic_reindex_v1 branch from e9f1be5 to 51fe3ce on October 1, 2024 08:10
@justinclift justinclift changed the title Initial code for automatically reindexing the databases WIP: Initial code for automatically reindexing the databases Oct 1, 2024
@justinclift justinclift changed the title WIP: Initial code for automatically reindexing the databases WIP: Initial code for automatically doing post upgrade steps (vacuum analyze, database reindexing) Oct 2, 2024
@justinclift (Member, Author)

OK, I think the post-upgrade scripting is mostly done. Or at least to the point where it's suitable for looking over for any obvious problems. 😄

One thing it doesn't yet handle is upgrading any extensions. We'll probably want to have a look at automating that in this PR too if it seems fairly straightforward (I haven't checked).

@justinclift justinclift force-pushed the automatic_reindex_v1 branch from b138ebc to 7ea0a4b on October 2, 2024 10:29
@justinclift (Member, Author) commented Oct 2, 2024

As a data point, I'm testing this manually on my local development computer by first creating a PG 16 data directory:

$ mkdir postgres-data
$ docker run --rm --name pg -it \
  --mount type=bind,source=$(pwd)/postgres-data,target=/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=abc123 \
  postgres:16-alpine

Then running the locally built pgautoupgrade/pgautoupgrade:dev container on that same PG data directory:

$ docker run --rm --name pgauto -it \
  --mount type=bind,source=$(pwd)/postgres-data,target=/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=abc123 \
  pgautoupgrade/pgautoupgrade:dev

When the post-upgrade script runs, its stdout and stderr output is captured into a file in the $PGDATA directory. For example, 2024.10.02-03.28-pgautoupgrade.log in the directory listing below:

$ sudo ls -la postgres-data
total 46
drwx------ 19 70 jc    27 Oct  2 13:30 .
drwxrwxr-x  7 jc jc    19 Oct  2 13:27 ..
-rw-r--r--  1 70 70 94211 Oct  2 13:28 2024.10.02-03.28-pgautoupgrade.log
drwx------  5 70 70     5 Oct  2 13:28 base
-rwx------  1 70 70    49 Oct  2 13:28 delete_old_cluster.sh
drwx------  2 70 70    65 Oct  2 13:30 global
drwx------  2 70 70     2 Oct  2 13:28 pg_commit_ts
drwx------  2 70 70     2 Oct  2 13:28 pg_dynshmem
-rw-------  1 70 70  5743 Oct  2 13:28 pg_hba.conf
[etc]

and:

$ sudo head postgres-data/2024.10.02-03.28-pgautoupgrade.log
----------------------------
Updating query planner stats
----------------------------
INFO:  vacuuming "postgres.pg_catalog.pg_statistic"
INFO:  finished vacuuming "postgres.pg_catalog.pg_statistic": index scans: 0
pages: 0 removed, 19 remain, 19 scanned (100.00% of total)
tuples: 0 removed, 410 remain, 0 are dead but not yet removable
removable cutoff: 753, which was 0 XIDs old when operation ended
new relfrozenxid: 753, which is 22 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
[etc]

@justinclift (Member, Author) commented Oct 2, 2024

If anyone's interested in testing this PR on their database, I've just pushed an image (x86_64 only) built from this PR to Docker Hub with the dev tag, i.e.:

image: pgautoupgrade/pgautoupgrade:dev

I'd expect it to work without issue. There shouldn't be any errors or anything along those lines. In theory. 😄

@andyundso (Member) left a comment

very cool stuff 😎

Resolved review threads: docker-entrypoint.sh (outdated), pgautoupgrade-postupgrade.sh, docker-entrypoint.sh (outdated)
@justinclift justinclift changed the title WIP: Initial code for automatically doing post upgrade steps (vacuum analyze, database reindexing) Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift changed the title Automatically do post upgrade steps (vacuum analyze, database reindexing) WIP. Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift force-pushed the automatic_reindex_v1 branch from c4aa2f2 to f3864d4 on October 3, 2024 07:48
@justinclift justinclift force-pushed the automatic_reindex_v1 branch from a569006 to dcbed7e on October 3, 2024 12:49
@justinclift (Member, Author)

The "one shot" testing took a bit more effort than I was expecting, but it should be workable for now.

Still need to properly look into the upgrading of extensions. Probably a tomorrow thing, as it's not something for tonight.

@andyundso (Member) left a comment

Looks good to me. Thanks for adding the feature!

I suppose we can close #10 after merging this?

Comment on lines +609 to +611
echo "'One shot' automatic upgrade was requested, so exiting as there is no upgrade to do"
echo "If you're seeing this message and expecting an upgrade to be happening, it probably"
echo "means the container is being started in a loop and a previous run already did it :)"
Member

Good messages, they will make it very clear to users what is happening.

Member Author

Thanks. 😄

@justinclift (Member, Author)

I suppose we can close #10 after merging this?

Almost. Let's merge this PR for now, but investigate (and maybe implement?) the extension upgrade piece before closing #10.

I mean, it might turn out to be mostly straightforward. Hopefully. 🙏

@justinclift justinclift changed the title WIP. Automatically do post upgrade steps (vacuum analyze, database reindexing) Automatically do post upgrade steps (vacuum analyze, database reindexing) Oct 3, 2024
@justinclift justinclift merged commit 93c5fd8 into main Oct 3, 2024
26 checks passed
@justinclift justinclift deleted the automatic_reindex_v1 branch October 3, 2024 14:05
@justinclift (Member, Author)

Uh oh. Right after merging this I've realised that hard coding the postgres database name in the post-upgrade script could break things for people who (only) use a different database name:

DB_LIST=$(echo 'SELECT datname FROM pg_catalog.pg_database WHERE datistemplate IS FALSE' | psql -1t --csv postgres)

I'll test locally whether using one of the template databases will work instead (pretty sure it will), then I'll create a PR to fix that.

@justinclift (Member, Author)

Instead of using a template database, I've just passed through the name of the database being used by the docker entrypoint script and re-used that. Theoretically (!) it should be fine. 😁
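
A minimal sketch of that approach (the variable names here are assumptions following the official image's conventions, not the exact code in the follow-up change):

# The entrypoint already knows which database it created, so pass that through
# to the post-upgrade script instead of hard coding 'postgres'
POSTGRES_DB="${POSTGRES_DB:-postgres}"
DB_LIST=$(echo 'SELECT datname FROM pg_catalog.pg_database WHERE datistemplate IS FALSE' | psql -1t --csv "${POSTGRES_DB}")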
