docs(dev): database-related information
Signed-off-by: Mike Fiedler <[email protected]>
miketheman committed Jan 17, 2025
1 parent 5c8c415 commit 96a5a45
Showing 2 changed files with 87 additions and 4 deletions.
75 changes: 71 additions & 4 deletions docs/dev/development/database-migrations.rst
@@ -1,5 +1,6 @@
###################
Database migrations
===================
###################

Modifying database schemata requires database migrations (even for adding and
removing tables). To autogenerate migrations::
@@ -9,8 +10,9 @@ removing tables). To autogenerate migrations::
Verify your migration was generated by looking at the output from the command
above:

``Generating /opt/warehouse/src/warehouse/migrations/versions/390811c1c\
dbe_.py ... done``
.. code-block:: console

    Generating /opt/warehouse/src/warehouse/migrations/versions/390811c1cdbe_.py ... done

Then migrate and test your migration::

@@ -24,6 +26,11 @@ This makes it more difficult to make breaking changes, since you must phase
them in over time. See :ref:`destructive-migrations` for tips on doing
migrations that involve column deletions or renames.

.. _migration-timeouts:

Migration Timeouts
==================

To help prevent an accidentally long-running migration from taking down
PyPI, by default a migration will time out if it waits more than 4s to
acquire a lock, or if any individual statement takes more than 5s.
@@ -56,7 +63,7 @@ environment like PyPI, there is related reading available at:
.. _destructive-migrations:

Destructive migrations
----------------------
======================

.. warning::

@@ -104,3 +111,63 @@ a data migration. To rename a column:

In total, this requires three separate migrations: one to add the new column,
one to backfill it, and one to remove the old column.
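
For concreteness, here is a condensed sketch of those three steps
(the table and column names are hypothetical, and each function below
stands in for the ``upgrade()`` of its own migration file):

.. code-block:: python

    import sqlalchemy as sa
    from alembic import op


    def upgrade_add_column():
        # Migration 1: add the new column as nullable so existing rows stay valid.
        op.add_column("tbl1", sa.Column("new_name", sa.Text(), nullable=True))


    def upgrade_backfill():
        # Migration 2: copy data from the old column into the new one.
        op.execute("UPDATE tbl1 SET new_name = old_name WHERE new_name IS NULL")


    def upgrade_drop_old_column():
        # Migration 3: drop the old column once nothing reads or writes it.
        op.drop_column("tbl1", "old_name")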

Creating Indexes
================

Indexes should be declared as part of SQLAlchemy table definitions,
usually via the ``__table_args__`` attribute.
Once an index is declared there,
auto-generating a migration will produce the Alembic operation that creates it.

See the `SQLAlchemy documentation <https://docs.sqlalchemy.org/en/20/core/constraints.html#schema-indexes>`_
for more on index definitions.
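
For illustration, a minimal sketch of declaring an index this way
(the model, table, and column names are hypothetical, and the base class
stands in for the project's own declarative base; the index matches the
``Index("tbl1_column1_idx", "column1")`` definition used in the example below):

.. code-block:: python

    import sqlalchemy as sa
    from sqlalchemy.orm import DeclarativeBase


    class Base(DeclarativeBase):
        pass


    class Thing(Base):
        __tablename__ = "tbl1"
        __table_args__ = (
            # Declared here, the index is picked up by Alembic's autogenerate.
            sa.Index("tbl1_column1_idx", "column1"),
        )

        id = sa.Column(sa.Integer, primary_key=True)
        column1 = sa.Column(sa.Text, nullable=True)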

Since index creation will often take longer than a few seconds
for tables that are large or active,
it is recommended to create indexes concurrently.

On its own, adding ``postgresql_concurrently=True`` to the index definition
does not work with our deployment migration process:
migrations run inside a transaction,
and concurrent index creation requires a separate transaction.

Instead, after auto-generating the new migration,
manually update it so the index is created concurrently in a separate transaction.
Here's an example of a migration for a definition of ``Index("tbl1_column1_idx", "column1")``
that was auto-generated and then manually updated to create the index concurrently:

.. code-block:: diff

     def upgrade():
    -    op.create_index(
    -        "tbl1_column1_idx",
    -        "tbl1",
    -        ["column1"],
    -        unique=False,
    -    )
    +    # CREATE INDEX CONCURRENTLY cannot happen inside a transaction. We'll close
    +    # our transaction here and issue the statement.
    +    op.get_bind().commit()
    +    with op.get_context().autocommit_block():
    +        op.create_index(
    +            "tbl1_column1_idx",
    +            "tbl1",
    +            ["column1"],
    +            unique=False,
    +            if_not_exists=True,
    +            postgresql_concurrently=True,
    +        )

The original ``op.create_index()`` call is moved under an autocommit context manager,
and the keyword arguments ``if_not_exists=True`` and ``postgresql_concurrently=True``
are added to the call.

Leave the generated ``downgrade()`` function unchanged.

If the index creation is still likely to take longer than a few seconds,
as is the case for most indexes added to existing tables that are in active use,
it is recommended to modify the migration to increase the statement timeout,
as described in :ref:`migration-timeouts`.
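
For illustration, a hedged sketch of raising the limits at the top of a migration,
assuming the timeouts map onto PostgreSQL's standard ``statement_timeout`` and
``lock_timeout`` session settings (the values are arbitrary, and the exact
mechanism described in :ref:`migration-timeouts` takes precedence):

.. code-block:: python

    from alembic import op


    def upgrade():
        # Illustrative values only, in milliseconds: allow up to 60s per
        # statement and 10s to acquire a lock for the statements that follow.
        op.execute("SET statement_timeout = 60000")
        op.execute("SET lock_timeout = 10000")
        ...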

Another option is to share the SQL statement that creates the index concurrently
in the Pull Request, and have a maintainer run the statement manually.
16 changes: 16 additions & 0 deletions docs/dev/development/development-database.rst
@@ -30,3 +30,19 @@ Now commit the result::
git checkout -b update_dev_db_dump && \
git add dev/example.sql.xz && \
git commit -m "Update development database dump"


Connecting to the Development Database
======================================

To connect to the development database, use the following command::

make dbshell

This will spawn a one-time container that connects to the database
and opens a shell for you to interact with it.

If you want to use another tool from your host machine,
you can connect to the database using a connection string, like this::

pgcli postgresql://postgres@localhost:5433/warehouse
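
If you prefer to connect programmatically instead, here is a minimal sketch
using SQLAlchemy from the host (assuming SQLAlchemy and the ``psycopg2``
driver are installed there):

.. code-block:: python

    import sqlalchemy as sa

    # Same connection string as above; port 5433 is the development database.
    engine = sa.create_engine("postgresql+psycopg2://postgres@localhost:5433/warehouse")

    with engine.connect() as conn:
        # A trivial query to confirm the connection works.
        print(conn.execute(sa.text("SELECT version()")).scalar_one())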
