Resolve pygrep-hooks issues in documentation (#311)
peytondmurray authored Feb 13, 2024
1 parent 469a2b6 commit 40a7052
Showing 4 changed files with 22 additions and 22 deletions.
16 changes: 8 additions & 8 deletions .pre-commit-config.yaml
@@ -50,11 +50,11 @@ repos:
      # - id: ruff
      - id: ruff-format

-  # - repo: https://github.com/pre-commit/pygrep-hooks
-  #   rev: v1.10.0
-  #   hooks:
-  #     - id: rst-backticks
-  #     - id: rst-directive-colons
-  #     - id: rst-inline-touching-normal
-  #     - id: python-no-log-warn
-  #     - id: python-check-mock-methods
+  - repo: https://github.com/pre-commit/pygrep-hooks
+    rev: v1.10.0
+    hooks:
+      - id: rst-backticks
+      - id: rst-directive-colons
+      - id: rst-inline-touching-normal
+      - id: python-no-log-warn
+      - id: python-check-mock-methods
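
With these hooks now active, the pygrep checks run automatically on every
``git commit``. Assuming ``pre-commit`` itself is installed, the newly enabled
hooks can also be run against the whole repository directly::

    pre-commit run --all-files

or one at a time, e.g. ``pre-commit run rst-backticks --all-files``.
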
10 changes: 5 additions & 5 deletions docs/performance.rst
@@ -24,16 +24,16 @@ which biases the modifications towards the end of the array, simulating a possib

The tests are as follows:

-1. A large fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, and 1000 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as `test_large_fraction_changes_sparse`.
+1. A large fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, and 1000 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as ``test_large_fraction_changes_sparse``.

-2. A small fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, but only 10 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as `test_small_fraction_changes_sparse`.
+2. A small fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, but only 10 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as ``test_small_fraction_changes_sparse``.

-3. A large fraction of changes is made to the dataset with each version, with the same three arrays of 5000 rows defined initially, 1000 positions are chosen at random and changed, but the size of the final array remains constant (no new rows are added and no rows are deleted). We will refer to this test as `test_large_fraction_constant_sparse`.
+3. A large fraction of changes is made to the dataset with each version, with the same three arrays of 5000 rows defined initially, 1000 positions are chosen at random and changed, but the size of the final array remains constant (no new rows are added and no rows are deleted). We will refer to this test as ``test_large_fraction_constant_sparse``.

4. The number of modifications is dominated by the number of appended rows. This is divided into two tests:

-- In the first case, the dataset contains three one-dimensional arrays with 1000 rows initially, and 1000 rows are added with each new version. A small number (at most 10) values are chosen at random, following the power law described above, and changed or deleted. We call this test `test_mostly_appends_sparse`.
-- In the second case, the dataset contains one two-dimensional array with shape `(30, 30)` and two one-dimensional arrays acting as indices to the 2d array. In this case, rows are only appended in the first axis of the two-dimensional array, and a small number of positions (at most 10) is chosen at random and changed. We call this test `test_mostly_appends_dense`.
+- In the first case, the dataset contains three one-dimensional arrays with 1000 rows initially, and 1000 rows are added with each new version. A small number (at most 10) values are chosen at random, following the power law described above, and changed or deleted. We call this test ``test_mostly_appends_sparse``.
+- In the second case, the dataset contains one two-dimensional array with shape ``(30, 30)`` and two one-dimensional arrays acting as indices to the 2d array. In this case, rows are only appended in the first axis of the two-dimensional array, and a small number of positions (at most 10) is chosen at random and changed. We call this test ``test_mostly_appends_dense``.

To test the performance of VersionedHDF5 files, we have chosen to compare a few different chunk sizes and compression algorithms. These values have been chosen heuristically, and optimal values depend on different use cases and nature of the datasets stored in the file.

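The update pattern these sparse tests exercise can be sketched with
versioned_hdf5's staging API (shown in the quickstart). This is a minimal
sketch, not the benchmark code itself: the file name, seed, array names,
power-law exponent, and the assumption that staged datasets accept
NumPy-style fancy indexing are all illustrative::

    import h5py
    import numpy as np
    from versioned_hdf5 import VersionedHDF5File

    rng = np.random.default_rng(0)

    with h5py.File("bench.h5", "w") as f:
        vf = VersionedHDF5File(f)

        # Initial version: three 5000-row arrays, as in the sparse tests.
        with vf.stage_version("r0") as sv:
            for name in ("val1", "val2", "val3"):
                sv.create_dataset(name, data=rng.random(5000), chunks=(4096,))

        # Each later version changes 1000 randomly chosen positions; the
        # power-law draw biases modifications towards the end of the array.
        for i in range(1, 4):  # the real tests commit many more versions
            with vf.stage_version(f"r{i}") as sv:
                for name in ("val1", "val2", "val3"):
                    n = sv[name].shape[0]
                    idx = (n * rng.power(5.0, size=1000)).astype(int)
                    sv[name][np.clip(idx, 0, n - 1)] = rng.random(1000)
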
2 changes: 1 addition & 1 deletion docs/performance_filesizes.rst
@@ -55,7 +55,7 @@ For the number of transactions, chunk sizes and compression algorithms, we tests
(note that chunk sizes are taken as powers of 2, so an exponent of :math:`12` means that the chunk size is :math:`2^{12}`, or 4096.)

If you want to generate your own tests, you can modify the appropriate constants
-for the desired tests, and run them on the notebook included in the `analysis` directory of the VersionedHDF5 sources. **Please keep in mind that file sizes can become very large for large numbers of transactions (above 5000
+for the desired tests, and run them on the notebook included in the ``analysis`` directory of the VersionedHDF5 sources. **Please keep in mind that file sizes can become very large for large numbers of transactions (above 5000
transactions).**

Analysis
16 changes: 8 additions & 8 deletions docs/quickstart.rst
@@ -83,21 +83,21 @@ Other Options

When a version is committed to a VersionedHDF5File, a timestamp is automatically
added to it. The timestamp for each version can be retrieved via the version's
-`attrs`::
+``attrs``::

>>> versioned_file['version1'].attrs['timestamp']

Since the HDF5 specification does not currently support writing
-`datetime.datetime` or `numpy.datetime` objects to HDF5 files, these timestamps
+``datetime.datetime`` or ``numpy.datetime`` objects to HDF5 files, these timestamps
are stored as strings, using the following format::

-`"%Y-%m-%d %H:%M:%S.%f%z"`
+``"%Y-%m-%d %H:%M:%S.%f%z"``

The timestamps are registered in UTC. For more details on the format string
-above, see the `datetime.datetime.strftime` function documentation.
+above, see the ``datetime.datetime.strftime`` function documentation.

The timestamp can also be used as an index to retrieve a chosen version from the
-file. In this case, either a `datetime.datetime` or a `numpy.datetime64` object
+file. In this case, either a ``datetime.datetime`` or a ``numpy.datetime64`` object
must be used as a key. For example, if

.. code::
@@ -110,11 +110,11 @@ then using
>>> versioned_file[t]
-returns the version with timestamp equal to `t` (converted to a string according
+returns the version with timestamp equal to ``t`` (converted to a string according
to the format mentioned above).

It is also possible to assign a timestamp manually to a file. Again, this
-requires using either a `datetime.datetime` or a `numpy.datetime64` object as
+requires using either a ``datetime.datetime`` or a ``numpy.datetime64`` object as
the timestamp::

>>> ts = datetime.datetime(2020, 6, 29, 23, 58, 21, 116470, tzinfo=datetime.timezone.utc)
@@ -125,4 +125,4 @@ Now::

>>> versioned_file[ts]

-returns the same as `versioned_file['version1']`.
+returns the same as ``versioned_file['version1']``.
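
Putting the quickstart pieces together, a minimal end-to-end sketch (the file
name and data are hypothetical, and it assumes ``stage_version`` accepts the
manually assigned timestamp described above)::

    import datetime

    import h5py
    from versioned_hdf5 import VersionedHDF5File

    with h5py.File("data.h5", "w") as f:
        versioned_file = VersionedHDF5File(f)

        ts = datetime.datetime(2020, 6, 29, 23, 58, 21, 116470,
                               tzinfo=datetime.timezone.utc)
        with versioned_file.stage_version("version1", timestamp=ts) as sv:
            sv.create_dataset("values", data=[1, 2, 3])

        # The timestamp is stored as a string in the documented format and
        # can be parsed back with datetime.datetime.strptime.
        stored = versioned_file["version1"].attrs["timestamp"]
        parsed = datetime.datetime.strptime(stored, "%Y-%m-%d %H:%M:%S.%f%z")

        # A datetime (or numpy.datetime64) also works directly as a key,
        # returning the same version as versioned_file["version1"].
        same_version = versioned_file[ts]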
