diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 7cb9c74e..c2ff5061 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -50,11 +50,11 @@ repos:
       # - id: ruff
       - id: ruff-format
 
-  # - repo: https://github.com/pre-commit/pygrep-hooks
-  #   rev: v1.10.0
-  #   hooks:
-  #     - id: rst-backticks
-  #     - id: rst-directive-colons
-  #     - id: rst-inline-touching-normal
-  #     - id: python-no-log-warn
-  #     - id: python-check-mock-methods
+  - repo: https://github.com/pre-commit/pygrep-hooks
+    rev: v1.10.0
+    hooks:
+      - id: rst-backticks
+      - id: rst-directive-colons
+      - id: rst-inline-touching-normal
+      - id: python-no-log-warn
+      - id: python-check-mock-methods
diff --git a/docs/performance.rst b/docs/performance.rst
index cdf01e80..24808fa2 100644
--- a/docs/performance.rst
+++ b/docs/performance.rst
@@ -24,16 +24,16 @@ which biases the modifications towards the end of the array, simulating a possib
 
 The tests are as follows:
 
-1. A large fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, and 1000 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as `test_large_fraction_changes_sparse`.
+1. A large fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows; 1000 positions are chosen at random and changed, and a small number of rows (at most 10) are added or deleted with each new version. We will refer to this test as ``test_large_fraction_changes_sparse``.
 
-2. A small fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000, but only 10 positions are chosen at random and changed, and a small number (at most 10) rows are added or deleted with each new version. We will refer to this test as `test_small_fraction_changes_sparse`.
+2. A small fraction of changes is made to the dataset with each new version: The dataset initially has three arrays with 5000 rows, but only 10 positions are chosen at random and changed, and a small number of rows (at most 10) are added or deleted with each new version. We will refer to this test as ``test_small_fraction_changes_sparse``.
 
-3. A large fraction of changes is made to the dataset with each version, with the same three arrays of 5000 rows defined initially, 1000 positions are chosen at random and changed, but the size of the final array remains constant (no new rows are added and no rows are deleted). We will refer to this test as `test_large_fraction_constant_sparse`.
+3. A large fraction of changes is made to the dataset with each version: with the same three arrays of 5000 rows defined initially, 1000 positions are chosen at random and changed, but the size of the final array remains constant (no new rows are added and no rows are deleted). We will refer to this test as ``test_large_fraction_constant_sparse``.
 
 4. The number of modifications is dominated by the number of appended rows. This is divided into two tests:
 
-   - In the first case, the dataset contains three one-dimensional arrays with 1000 rows initially, and 1000 rows are added with each new version. A small number (at most 10) values are chosen at random, following the power law described above, and changed or deleted. We call this test `test_mostly_appends_sparse`.
-   - In the second case, the dataset contains one two-dimensional array with shape `(30, 30)` and two one-dimensional arrays acting as indices to the 2d array. In this case, rows are only appended in the first axis of the two-dimensional array, and a small number of positions (at most 10) is chosen at random and changed. We call this test `test_mostly_appends_dense`.
+   - In the first case, the dataset contains three one-dimensional arrays with 1000 rows initially, and 1000 rows are added with each new version. A small number of values (at most 10) are chosen at random, following the power law described above, and changed or deleted. We call this test ``test_mostly_appends_sparse``.
+   - In the second case, the dataset contains one two-dimensional array with shape ``(30, 30)`` and two one-dimensional arrays acting as indices to the 2d array. In this case, rows are only appended in the first axis of the two-dimensional array, and a small number of positions (at most 10) are chosen at random and changed. We call this test ``test_mostly_appends_dense``.
 
 To test the performance of VersionedHDF5 files, we have chosen to compare a few different chunk sizes and compression algorithms. These values have been chosen heuristically, and optimal values depend on different use cases and nature of the datasets stored in the file.
 
diff --git a/docs/performance_filesizes.rst b/docs/performance_filesizes.rst
index a9abccb7..d49daec0 100644
--- a/docs/performance_filesizes.rst
+++ b/docs/performance_filesizes.rst
@@ -55,7 +55,7 @@ For the number of transactions, chunk sizes and compression algorithms, we
 tests (note that chunk sizes are taken as power of 2, so an exponent of :math:`12` means that the chunk size is :math:`2^12` or 4096.)
 
 If you want to generate your own tests, you can modify the appropriate constants
-for the desired tests, and run them on the notebook included in the `analysis` directory of the VersionedHDF souces. **Please keep in mind that file sizes can become very large for large numbers of transactions (above 5000
+for the desired tests, and run them on the notebook included in the ``analysis`` directory of the VersionedHDF5 sources. **Please keep in mind that file sizes can become very large for large numbers of transactions (above 5000
 transactions).**
 
 Analysis
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index ab9e84af..22266bbd 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -83,21 +83,21 @@ Other Options
 When a version is committed to a VersionedHDF5File, a timestamp is
 automatically added to it. The timestamp for each version can be retrieved via
 the version's
-`attrs`::
+``attrs``::
 
     >>> versioned_file['version1'].attrs['timestamp']
 
 Since the HDF5 specification does not currently support writing
-`datetime.datetime` or `numpy.datetime` objects to HDF5 files, these timestamps
+``datetime.datetime`` or ``numpy.datetime64`` objects to HDF5 files, these timestamps
 are stored as strings, using the following format::
 
-    `"%Y-%m-%d %H:%M:%S.%f%z"`
+    ``"%Y-%m-%d %H:%M:%S.%f%z"``
 
 The timestamps are registered in UTC. For more details on the format string
-above, see the `datetime.datetime.strftime` function documentation.
+above, see the ``datetime.datetime.strftime`` function documentation.
 
 The timestamp can also be used as an index to retrieve a chosen version from the
-file. In this case, either a `datetime.datetime` or a `numpy.datetime64` object
+file. In this case, either a ``datetime.datetime`` or a ``numpy.datetime64`` object
 must be used as a key. For example, if
 
 .. code::
@@ -110,11 +110,11 @@ then using
 
     >>> versioned_file[t]
 
-returns the version with timestamp equal to `t` (converted to a string according
+returns the version with timestamp equal to ``t`` (converted to a string according
 to the format mentioned above).
 
 It is also possible to assign a timestamp manually to a file. Again, this
-requires using either a `datetime.datetime` or a `numpy.datetime64` object as
+requires using either a ``datetime.datetime`` or a ``numpy.datetime64`` object as
 the timestamp::
 
     >>> ts = datetime.datetime(2020, 6, 29, 23, 58, 21, 116470, tzinfo=datetime.timezone.utc)
@@ -125,4 +125,4 @@ Now::
 
     >>> versioned_file[ts]
 
-returns the same as `versioned_file['version1']`.
+returns the same as ``versioned_file['version1']``.
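For reference, the timestamp behaviour touched by the ``docs/quickstart.rst`` hunks above can be exercised end to end. The sketch below is not part of the patch and makes a few assumptions: ``versioned-hdf5``, ``h5py`` and ``numpy`` are installed; ``stage_version`` is taken from the library's public API (it does not appear in this diff); and the file name ``example.h5`` and dataset name ``values`` are illustrative::

    import datetime

    import h5py
    import numpy as np
    from versioned_hdf5 import VersionedHDF5File

    with h5py.File("example.h5", "w") as f:
        versioned_file = VersionedHDF5File(f)

        # Committing a version automatically records a UTC timestamp.
        with versioned_file.stage_version("version1") as sv:
            sv.create_dataset("values", data=np.arange(10))

        # The timestamp is stored as a string in the version's attrs,
        # formatted as "%Y-%m-%d %H:%M:%S.%f%z".
        ts_string = versioned_file["version1"].attrs["timestamp"]

        # Parsing that string back into an aware datetime.datetime gives
        # a key that retrieves the same version from the file.
        ts = datetime.datetime.strptime(ts_string, "%Y-%m-%d %H:%M:%S.%f%z")
        version = versioned_file[ts]
        print(version["values"][:])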