docs(dev): database-related information #17448

75 changes: 71 additions & 4 deletions docs/dev/development/database-migrations.rst
@@ -1,5 +1,6 @@
###################
Database migrations
===================
###################

Modifying database schemata will need database migrations (even for adding and
removing tables). To autogenerate migrations::
@@ -9,8 +10,9 @@ removing tables). To autogenerate migrations::
Verify your migration was generated by looking at the output from the command
above:

``Generating /opt/warehouse/src/warehouse/migrations/versions/390811c1c\
dbe_.py ... done``
.. code-block:: console

    Generating /opt/warehouse/src/warehouse/migrations/versions/390811c1cdbe_.py ... done

Then migrate and test your migration::

@@ -24,6 +26,11 @@ This makes it more difficult to make breaking changes, since you must phase
them in over time. See :ref:`destructive-migrations` for tips on doing
migrations that involve column deletions or renames.

.. _migration-timeouts:

Migration Timeouts
==================

To help prevent an accidentally long-running migration from taking down
PyPI, by default a migration will time out if it waits more than 4s to
acquire a lock, or if any individual statement takes more than 5s.
@@ -56,7 +63,7 @@ environment like PyPI, there is related reading available at:
.. _destructive-migrations:

Destructive migrations
----------------------
======================

.. warning::

@@ -104,3 +111,63 @@ a data migration. To rename a column:

In total, this requires three separate migrations: one to add the new column,
one to backfill to it, and a third to remove the old column.

Creating Indexes
================

Indexes should be declared as part of SQLAlchemy Table definitions,
often under the ``__table_args__`` attribute.
Once an index is declared, auto-generating a migration will emit
the Alembic operation that creates it.

See more on index definitions in the
`SQLAlchemy documentation <https://docs.sqlalchemy.org/en/20/core/constraints.html#schema-indexes>`_.

Since index creation will often take longer than a few seconds
for tables that are large or active,
it is recommended to create indexes concurrently.

Adding ``postgresql_concurrently=True`` to the index definition is not
enough on its own, because it is incompatible with our deployment migration process:
migrations run inside a transaction, and PostgreSQL does not allow
``CREATE INDEX CONCURRENTLY`` inside a transaction block.

Instead, manually update the migration so that the index is created outside of that transaction.

After auto-generating the new migration, update it to create the index concurrently.
Here's an example of a migration for a definition of ``Index("tbl1_column1_idx", "column1")``
that was auto-generated and then manually updated to create the index concurrently:

.. code-block:: diff

     def upgrade():
    -    op.create_index(
    -        "tbl1_column1_idx",
    -        "tbl1",
    -        ["column1"],
    -        unique=False,
    -    )
    +    # CREATE INDEX CONCURRENTLY cannot happen inside a transaction. We'll close
    +    # our transaction here and issue the statement.
    +    op.get_bind().commit()
    +    with op.get_context().autocommit_block():
    +        op.create_index(
    +            "tbl1_column1_idx",
    +            "tbl1",
    +            ["column1"],
    +            unique=False,
    +            if_not_exists=True,
    +            postgresql_concurrently=True,
    +        )

The original ``op.create_index()`` call is indented under the
``autocommit_block()`` context manager,
and the keyword args ``if_not_exists=True`` and ``postgresql_concurrently=True``
are added to the call.

Leave the generated ``downgrade()`` function unchanged.

If the index creation is still likely to take longer than a few seconds,
as is the case for most indexes on existing tables that are in active use,
it is recommended to modify the migration to increase the statement timeout
as described in :ref:`migration-timeouts`.

Another option is to share the SQL statement to create the index concurrently
on the Pull Request, and have a maintainer run the statement manually.
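
One way to produce that statement, sketched here under the assumption
that the table and index match the ``tbl1`` example above, is to let
SQLAlchemy render the DDL for the PostgreSQL dialect without connecting
to a database. The table definition below is a hypothetical stand-in
for the real model:

```python
from sqlalchemy import Column, Index, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

# Hypothetical table mirroring the example index definition.
metadata = MetaData()
tbl1 = Table("tbl1", metadata, Column("column1", String))
index = Index("tbl1_column1_idx", tbl1.c.column1, postgresql_concurrently=True)

# Compile the CREATE INDEX statement as text; nothing is executed.
statement = str(
    CreateIndex(index, if_not_exists=True).compile(dialect=postgresql.dialect())
)
print(statement)
```

The printed statement can then be pasted into the Pull Request for a
maintainer to review and run.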
16 changes: 16 additions & 0 deletions docs/dev/development/development-database.rst
@@ -30,3 +30,19 @@ Now commit the result::
    git checkout -b update_dev_db_dump && \
    git add dev/example.sql.xz && \
    git commit -m "Update development database dump"


Connecting to the Development Database
======================================

To connect to the development database, use the following command::

    make dbshell

This will spawn a one-time container that connects to the database,
and opens a shell for you to interact with it.

If you want to use another tool from your host machine,
you can connect to the database using a connection string, like this::

    pgcli postgresql://postgres@localhost:5433/warehouse