Releases: man-group/ArcticDB

v4.4.0

05 Apr 13:24

🚀 Features

  • Prevent writing empty types by default (gives compatibility with v1.6.2 readers) in #1440
  • Improved resilience to external (out of order) replication in #1355
  • Support modifying library options, and introduce enterprise library options in #1457

You can now modify library options on an existing Arctic library:

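from arcticdb import Arctic
from arcticdb.options import ModifiableLibraryOption

ac: Arctic = ...
lib = ac.create_library("lib")
# Change the number of rows stored per segment on the existing library.
ac.modify_library_option(lib, ModifiableLibraryOption.ROWS_PER_SEGMENT, 1_000_000)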

See arcticdb/options.py for a description of the modifiable options.

🐛 Fixes

  • Bugfix 902: Cannot filter on nans and nones in string and float columns #1276
  • Bugfix: Empty type in #1227
  • Bugfix 1334: optimise version ref key access #1345
  • Fix version map cache invalidation policies in #1351
  • Fix empty column default type in #1378
  • Bugfix #1388: Correctly check whether we are in ec2 in #1415
  • Bugfix 1423: Raise a meaningful error message when trying to use QueryBuilder with sparse data in #1435
  • Bug Fix Windows: Remove lmdb files when delete_library is called in #1437
  • Bugfix 1256: reject parallel appends to unsorted data in #1442
  • Bugfix 1209: Consistently return metadata from write, append, update, write_metadata, and batch versions thereof in #1444
  • Bugfix/935/match pandas behaviour when aggregating columns with nans in #1450
  • LZ4 decoding empty type: move the segment buffer forward by the compressed data size for empty types in #1463
  • Bugfix 1268: swap out xxhash for grouping in #1416
  • Bugfix 1260: allow broader range of numeric type promotion with dynamic schema #1426

💾 Storage Exception Normalization

Storage-related exceptions are now uniform across the different backend storage platforms, even though the underlying behaviour of each backend varies.

  • #447 LMDB Exceptions Normalization in #1285
  • #447 Memory Storage Exception Normalization in #1297
  • Adds a MockS3Client which can simulate s3 failures in #1281
  • #447 S3 Storage Exceptions Normalization in #1304
  • Add a MockAzureClient which can simulate azure failures in #1331
  • #447 Azure Storage Exceptions Normalization in #1344
  • #447 Exception normalization for RocksDB in #1360
  • Refactor: Move mongo client errors handling into mongo_storage.cpp before normalization in #1383
  • #447 Add a MockMongoClient which can simulate mongo failures in #1395
  • #447 Mongo Exceptions Normalization in #1411
  • LMDB Exception Normalization with mock client in #1414
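
The idea behind the normalization work above can be sketched roughly as follows. This is an illustration of the general technique only, not ArcticDB's implementation: the class names `StorageError`, `KeyNotFound` and `PermissionDenied`, the `normalize` helper and the mapping tables are all hypothetical. The backend error codes used as keys (`NoSuchKey`, `AccessDenied`, `MDB_NOTFOUND`) are ordinary S3 and LMDB codes chosen as examples.

```python
class StorageError(Exception):
    """Hypothetical uniform base class for any storage failure."""


class KeyNotFound(StorageError):
    """Hypothetical: the requested key does not exist in the backend."""


class PermissionDenied(StorageError):
    """Hypothetical: the caller is not allowed to perform the operation."""


# Hypothetical per-backend tables mapping native error codes to uniform classes.
_ERROR_TABLES = {
    "s3": {"NoSuchKey": KeyNotFound, "AccessDenied": PermissionDenied},
    "lmdb": {"MDB_NOTFOUND": KeyNotFound},
}


def normalize(backend, code, message):
    """Translate a backend-specific error code into a uniform exception."""
    table = _ERROR_TABLES.get(backend, {})
    exc_type = table.get(code, StorageError)  # unknown codes fall back to the base class
    return exc_type(f"[{backend}] {code}: {message}")


# Callers can handle "missing key" the same way regardless of backend.
for backend, code in [("s3", "NoSuchKey"), ("lmdb", "MDB_NOTFOUND")]:
    try:
        raise normalize(backend, code, "key 'sym' is missing")
    except KeyNotFound as err:
        print("handled uniformly:", err)
```

With a scheme like this, calling code catches one exception type per failure category instead of branching on the backend in use.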

Uncategorized

  • Add a way to enable/disable silencing of errors when deleting a library in #1271
  • Feature flag to use WinInet client not WinHttp in #1284
  • Abstract an S3ClientWrapper out of details-inl.hpp in #1274
  • GitHub Workflows: Make can_merge run for all files to allow any docs changes to be mergeable. in #1292
  • Fix benchmarks in #1286
  • Reduce the hashes that we use to benchmark against in #1303
  • Refactor 1278: Column data dense forward iterator in #1301
  • Add contains to Arctic class, to support lib in arctic in #1309
  • Update BSL table for v4.3.0 in #1282
  • Utility to analyze the size of various key types in a library in #1291
  • Do not compile wheel build on EC2 in #1318
  • Docs: For top level imports use arcticdb.object instead of using full path to object in #1323
  • conda-build: Adaptations for folly in #1320
  • Align docstring with the behaviour in #1243
  • Added 142 new tests for empty/missing operations in #1319
  • Abstract AzureClientWrapper out of azure_storage.cpp in #1315
  • Stop using ec2 runners in the conda+linux workflow in #1302
  • Adds python tests which simulate s3 storage exceptions in #1330
  • S3 local delete failure raising meaningful error in #1329
  • Fix dynamic strings append to fixed strings issue in #1346
  • Minor improvement on analysis flow in #1290
  • Build time improvements in #1263
  • build: Disable compilers' extensions in #1335
  • maint: Fully specify fmt::format_to in #1333
  • Fixed pd_delete_replace + added single tests in #1342
  • Use type-deduced functor for all column iterating functions in #1347
  • Disable ec2 runners on PR in #1357
  • Use 14.39 toolset in #1359
  • Removed pytest dependency from arcticdb in #1350
  • build: Use C++20 in #1332
  • mark test_symbol_list_parallel_stress_with_delete as flaky in #1368
  • Docs: Increase CSS max-width and build docs from a branch in #1363
  • Print error msg in ExponentialBackoff exception in #1365
  • Disable missing key warnings when expected in #1379
  • Remove pin on civetweb in #1380
  • Multiple segments within the same block: storage and library refactoring in #1307
  • Not allowing snapshotting tombstoned versions in #1280
  • Clarify Intel/AMD build support in #1389
  • Fix debug formatting in #1397
  • maint: Replace robin_hood with unordered_dense in #1390
  • Test benchmarking improvements in #1326
  • maint: Remove dependency on some elements of folly in #1370
  • Allow different testing dependency version in pipeline in #1410
  • Add metadata extraction functions in library_tool in #1375
  • Add a global timeout for pytests in #1381
  • Set upper bound for supported protobuf version in #1421
  • 1 year and 1k stars readme banner in #1425
  • read_batch set include_deleted to false by default when reading a version in #1419
  • build: Update to fmt 10 in #1427
  • Make changes for prometheus metrics in #1418
  • maint: Replace use of folly getCurrentThreadId with STL in #1417
  • maint: Ignore the diff of #1263 in #1340
  • Roll back vcpkg version to fix failing abseil build in #1436
  • maint: Remove use of folly/portability/PThread.h in #1447
  • maint: Remove use of folly/system/ThreadName.h in #1446
  • Better error messaging around pickling in #1451
  • Update analysis_workflow.yml in #1455
  • Adding update, append and delete asv benchmarks in #1434
  • Support generators for metadata vectors again in #1456
  • Remove the brotli dep in #1458

The wheels are on PyPI. Below are for debugging:

v4.0.4

14 Mar 10:44

🚀 Features

  • Better backend storage retryable error handling and error message printout #1365

🐛 Fixes

  • Stop allowing snapshotting tombstoned versions #1280
  • Add a way to enable/disable silencing of errors when deleting a library #1271

The wheels are on PyPI. Below are for debugging:

v4.3.1

09 Feb 09:29

🐛 Fixes

  • Fix regression in round-tripping empty type for dynamic schema (#1313)

The wheels are on PyPI. Below are for debugging:

v4.3.0

07 Feb 13:47

Version 4.3.0 was pulled from PyPI and Conda Forge due to a regression. We no longer provide builds for 4.3.0.
The regression is fixed in the 4.3.1 release. Please use 4.3.1 instead.

🚀 Features

  • Exposes existing regex filter in lib.list_symbols (#1123)
>>> from arcticdb import Arctic
>>> import pandas as pd
>>> ac = Arctic("lmdb://test")
>>> ac.create_library("test")
>>> lib = ac["test"]
>>> lib.write("sym0", pd.DataFrame())
>>> lib.write("sym1", pd.DataFrame())
>>> lib.list_symbols()
['sym0', 'sym1']
>>> lib.list_symbols(regex="1$")
['sym1']
  • Introduce jitter in symbol list compaction threshold (#1174)
  • Sorting speed improvements in SegmentInMemory (#1181)
  • Reduce log level from warn to debug for "Failed to find segment for key" message where appropriate (#1130)
  • Speed up writes by parallelising aggregator_set_data over data segments (#1065)
  • Support sortedness checks and maintenance with parallel writes and appends (#1251)
  • #1014 Introduce storage fixtures to easily test ArcticDB against various storage backends. See arcticdb.storage_fixtures package. (#1054)
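
As a rough mental model of the `regex` argument to `lib.list_symbols` shown in the first bullet above, the filter behaves like a search (not a full match) over each symbol name, which is why `"1$"` returns `'sym1'`. The sketch below reproduces that behaviour with Python's standard `re` module. The helper `filter_symbols` is hypothetical, and ArcticDB applies the pattern in its native layer, so its regex dialect may differ from Python's in places.

```python
import re


def filter_symbols(symbols, regex=None):
    """Return the symbols that contain a match for `regex` (all symbols if None)."""
    if regex is None:
        return list(symbols)
    pattern = re.compile(regex)
    # search() matches anywhere in the name; anchors such as ^ and $ narrow it down.
    return [name for name in symbols if pattern.search(name)]


symbols = ["sym0", "sym1"]
print(filter_symbols(symbols))        # ['sym0', 'sym1']
print(filter_symbols(symbols, "1$"))  # ['sym1']
```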

🐛 Fixes

  • Release the symbol list's storage lock if it has existed for longer than its TTL (#1134)
  • Ensure that the version chain is always updated atomically (#1104)
  • Return empty pd.DataFrame with MultiIndex if originally provided (#1126)
  • conda-build: Explicitly depend on openssl and libcurl (#1244)
  • Reintroduce attrs as a runtime dependency (#1272)
  • Speedup reading wide dataframes that have no empty columns (#1225)
  • Bugfix 1046: Prevent appending/updating numeric columns with non-identical types with static schema (#1205)
  • Bugfix 1173: Correctly apply sortedness checks when calling update with date_range argument (#1238)
  • Fix non-deterministic hashing in Linux conda builds (#1261)
  • Improve date range returned by get info for unordered and range indexed dataframes (#1241)
  • Bugfix 1248 and 1249: compact_incomplete reject incomplete segments that overlap each other, or existing segments in the case of append (#1255)
  • Detailed error in case of S3's libcurl network failure (#1265)

Uncategorized

  • [Aggregation tests] Replace non_zero_numeric_type_strategies with numeric_type_strategies (#968)
  • Fixes reuse_name for azure storage #1061 (#1115)
  • small getting-started-docs tweaks (#1103)
  • Improve fixture reliability (#1116)
  • maint: Define arcticdb::proto::logger in log.hpp (#1117)
  • maint: Remove unneeded includes (#1113)
  • [Column] Move some definitions to cpp file (#1100)
  • maint: Move implementations in memory_segment_impl.hpp to memory_segment_impl.cpp (#1092)
  • Update git blame file (#1118)
  • Flaky test hypothesis mean agg (#496) (#1125)
  • Use same region for S3 and EC2 to avoid data transfer costs (#1128)
  • build: Remove attrs from the dependencies (#1135)
  • Only build on pull request events (#1127)
  • More fixture robustness improvements (#1132)
  • Remove releasing docs as they are now in GitHub wiki (#1136)
  • Update README.md (#1141)
  • Remove test parallelism, and speed up test bottleneck (#1143)
  • Fix support for shared/unique S3 prefixes (#1140)
  • maint: Remove headers in types.hpp (#1121)
  • Skip flaky pytests which check log messages (#1161)
  • Update README.md (#1156)
  • Refactor: Move DataError method implementations into cpp (#1155)
  • Update .git-blame-ignore-revs for DataError implementation move (#1165)
  • Add MSVC 2022 preset. Tweak MSVC CMake settings. (#1133)
  • Build-time improvements: allocator.hpp, log.hpp, buffer.hpp (#1152)
  • Fix publish.yml workflow (#1167)
  • README - put third party tools in alphabetical order (#1172)
  • Fix persistent tests (#1147)
  • Introduce sorting and merging google benchmarks (#1138)
  • Skip array type tests due to occasional segfaults (#1187)
  • build: Remove some adherence to folly (#1144)
  • Add equity options notebook + data (#1178)
  • maint: Ignore some references (#1190)
  • Added equity options notebook to index (#1193)
  • Use vcpkg for gbench (#1189)
  • Forward port internal PR #1082 (#1180)
  • Bugfix 1191: Propagate storage failures in version map batch methods to calling code (#1194)
  • Link against python explicitly in order to make MSVS builds work (#1192)
  • Final version of equity opts notebook (#1196)
  • Bugfix 1182: Unskip test that is no longer flaky (#1197)
  • Docs that StorageFailureSimulator is not used in all stores (#1203)
  • Clean and reorganize OffsetString and StringPool (#1137)
  • build: Do not depend on protobuf-lite (#1212)
  • docs: Fix documentation links (#1038)
  • Fix recurse_segment forward declaration to match the signature of its implementation (#1217)
  • Update git blame file for OffsetString and StringPool implementation move (#1211)
  • Add frequently used items at the top level of arcticdb (#1219)
  • Switch from arcticdb to adb in the demo Notebooks (#1228)
  • Add a way to handle non-string values for index names (#1170)
  • Pass unmodified argument by const& to FieldCollection::add_field (#1234)
  • Switch from arcticdb to adb in python docstrings (#1236)
  • Remove obsolete test log level environment variable (#1231)
  • Update incorrect docs for validate_index (#1233)
  • Bugfix 1207: Use pandas.Timestamp.max - 1 day in test_read_ts. Remove pointless snapshot. Improve error message when index key reading fails. (#1235)
  • Bugfix invalid library name (#1206)
  • Enhancement/1253/skip temporary allocation when decoding dynamic schema columns (#1259)
  • Expose headers for consumers via arcticdb_core_static (#1257)
  • Update WarnVersionTypeNotHandled::warn() warning message (#1273)
  • Update README correcting spelling (#1275)
  • build: Adapt protobuf compilation (#1199)
  • Enable skipped test_partial_write_hashed (#1215)

The wheels are on PyPI. Below are for debugging:

v4.2.1

15 Dec 14:23

This is a patch release to version 4.2.0 which fixes Issue #1157 regarding the defragment_symbol_data method.

🐛 Fixes

  • Defragmenting a symbol no longer invalidates previous versions (#1163)

The wheels are on PyPI. Below are for debugging:

v4.2.0

12 Dec 14:08

🚀 Features

  • Remove python deps that are no longer needed (#1005)
  • New row_range argument on read and ReadRequest (#864)
>>> from arcticdb import Arctic
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": np.arange(10), "col2": np.arange(100, 110)}, index=np.arange(10))
>>> ac = Arctic("lmdb://test")
>>> lib = ac.get_library("test_lib", create_if_missing=True)
>>> lib.write("test_symbol", df)
>>> lib.read("test_symbol", row_range=(3,7)).data
   col1  col2
3     3   103
4     4   104
5     5   105
6     6   106
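
The output above shows that `row_range=(3,7)` returns rows 3 to 6, so the range appears to be half-open: the start is included and the end is excluded, like a Python slice. A minimal sketch of that selection on plain Python data follows. The helper `take_row_range` is hypothetical and only mirrors the behaviour visible in the example, not how ArcticDB reads segments from storage.

```python
def take_row_range(rows, row_range):
    """Select rows by position, start inclusive and end exclusive."""
    start, end = row_range
    return rows[start:end]


# Same data as the example: col1 = 0..9, col2 = 100..109.
rows = list(zip(range(10), range(100, 110)))
print(take_row_range(rows, (3, 7)))  # [(3, 103), (4, 104), (5, 105), (6, 106)]
```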

🐛 Fixes

  • Symbol list refactor (#796)
    Depending on timestamps being accurate in the symbol list has proved to be troublesome. Instead, we should use the most recent version id known to a client as an indication of the client's view of the world at the time a symbol list entry is written. That way, we can identify and correct symbol list entries that refer to conflicting writes.
  • Fixed aggregation on sparse grouping columns (#1068)

Notebooks

  • Added AWS blockchain notebook (#1040)
  • Added AWS blockchain to docs index (#1043)
  • Add Snapshot + Equity Notebooks (#1071)

Uncategorized

  • 744 extend real storage tests to run with large lifelike data and all api methods (#989)
  • Update BSL table for 4.1 (#1023)
  • Centralise the pytest marks (#1024)
  • Document the S3 backends that we have tested against and "un-beta" LMDB on Windows (#1016)
  • Sparse aggregation (#1007)
  • Docs versioning (#1008)
  • set-default after deploy so that 'latest' alias can be created first (#1029)
  • build: Remove old Cython configuration and adaptation (#1028)
  • Docs workflow fixes. (#1030)
  • build: Replace emilib with robin_hood (#995)
  • hot-fix: Use previous build of libmongocxx to avoid missing symbols (#1050)
  • Remove unused C++ Wangle dep (#1047)
  • Bugfix 1055: Unflake test_read_batch_time_stamp (#1058)
  • Change tag format in docs build (#1062)
  • Snapshot notebook typos (#1088)
  • Update vcpkg dep (#1091)
  • Enhancement/732/processing unit ecs model (#960)
  • Add xfail to flaky tests (#1087)
  • Add a mechanism to extend storage transaction lifetime to lifetime of… (#975)
  • Tweak release docs (#1019)
  • Change tmpdir to tmp_path (#1093)
  • Remove unnecessary xfails (#1097)
  • Add checks to see whether we should be validating version entries during compaction (#1099)
  • 941 self hosted runners for ci (#997)
  • Make dependency of pymongo optional in running (#1027)
  • Add preliminary change for slowdown error test (#1064)
  • Add mutex to ensure only single thread at pybind->c++ layer (#973)
  • Issue #1017 Only warn if the "base" LMDB env is opened twice (#1022)
  • Fix run-cmake action (#1034)
  • Add a fallback to free GH runners, when there is a problem with the self-hosted ones (#1063)
  • fix: Empty column handling improvements (#1049)
  • Pin all our Github actions deps (#1090)

The wheels are on PyPI. Below are for debugging:

v4.2.0rc0

13 Dec 18:01
Pre-release

🚀 Features

  • Remove useless python deps (#1005)
  • feat: Allow row_range to be treated as a clause (#864)
>>> from arcticdb import Arctic
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": np.arange(10), "col2": np.arange(100, 110)}, index=np.arange(10))
>>> ac = Arctic("lmdb://test")
>>> lib = ac.get_library("test_lib", create_if_missing=True)
>>> lib.write("test_symbol", df)
>>> lib.read("test_symbol", row_range=(3,7)).data
   col1  col2
3     3   103
4     4   104
5     5   105
6     6   106

🐛 Fixes

  • Symbol list refactor (#796)
    Depending on timestamps being accurate in the symbol list has proved to be troublesome. Instead, we should use the most recent version id known to a client as an indication of the client's view of the world at the time a symbol list entry is written. That way, we can identify and correct symbol list entries that refer to conflicting writes.
  • Fixed aggregation on sparse grouping columns (#1068)

Notebooks

  • Added AWS blockchain notebook (#1040)
  • Added AWS blockchain to docs index (#1043)
  • Add Snapshot + Equity Notebooks (#1071)

Uncategorized

  • 744 extend real storage tests to run with large lifelike data and all api methods (#989)
  • Update BSL table for 4.1 (#1023)
  • Centralise the pytest marks (#1024)
  • Document the S3 backends that we have tested against and "un-beta" LMDB on Windows (#1016)
  • Sparse aggregation (#1007)
  • Docs versioning (#1008)
  • set-default after deploy so that 'latest' alias can be created first (#1029)
  • build: Remove old Cython configuration and adaptation (#1028)
  • Docs workflow fixes. (#1030)
  • build: Replace emilib with robin_hood (#995)
  • hot-fix: Use previous build of libmongocxx to avoid missing symbols (#1050)
  • Remove unused C++ Wangle dep (#1047)
  • Bugfix 1055: Unflake test_read_batch_time_stamp (#1058)
  • Change tag format in docs build (#1062)
  • Snapshot notebook typos (#1088)
  • Update vcpkg dep (#1091)
  • Enhancement/732/processing unit ecs model (#960)
  • Add xfail to flaky tests (#1087)
  • Add a mechanism to extend storage transaction lifetime to lifetime of… (#975)
  • Tweak release docs (#1019)
  • Change tmpdir to tmp_path (#1093)
  • Remove unnecessary xfails (#1097)
  • Add checks to see whether we should be validating version entries during compaction (#1099)
  • 941 self hosted runners for ci (#997)
  • Make dependency of pymongo optional in running (#1027)
  • Add preliminary change for slowdown error test (#1064)
  • Add mutex to ensure only single thread at pybind->c++ layer (#973)
  • Issue #1017 Only warn if the "base" LMDB env is opened twice (#1022)
  • Fix run-cmake action (#1034)
  • Add a fallback to free GH runners, when there is a problem with the self-hosted ones (#1063)
  • fix: Empty column handling improvements (#1049)
  • Pin all our Github actions deps (#1090)

The wheels are on PyPI. Below are for debugging:

v4.0.3

23 Nov 16:17

This is a patch release to version 4.0 that backports some changes from master.

🚀 Features

  • Add preliminary change for slowdown error test (#1070)

🐛 Fixes

  • Empty column handling improvements (#1079)
  • Use previous build of libmongocxx to avoid missing symbols (#1083)
  • Remove docs publish step so we don't overwrite the docs (#1025)
  • Remove Black and pre-commit setup (#1085)

The wheels are on PyPI. Below are for debugging:

v4.1.0

01 Nov 17:54

⭐ New APIs

In-memory Backend

You can now open ArcticDB with an in-memory backend:

from arcticdb import Arctic
ac = Arctic("mem://")
ac.create_library("test")
assert ac.list_libraries() == ["test"]
# Create libraries as normal. Each `Arctic` object manages its own in-memory storage, so the lifetime
# of your libraries and data is the same as the lifetime of the `Arctic` instance that owns them.

ac2 = Arctic("mem://")
assert ac2.list_libraries() == []  # ac2 is backed by different memory to ac so the "test" library is not returned

Query Builder

We now support a new "count" aggregator. You can invoke it with:

from arcticdb import QueryBuilder

q = QueryBuilder()
q = q.groupby("grouping_column").agg({"a": "count"})
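
For intuition, a groupby followed by a "count" aggregation amounts to tallying how many entries of the aggregated column fall into each group. The sketch below shows that in plain Python, using the column names from the snippet above. The helper `groupby_count` is hypothetical, and the sketch deliberately uses data with no missing values, because the release note does not say how the aggregator treats them.

```python
from collections import Counter


def groupby_count(rows, group_col, value_col):
    """Count the entries of `value_col` per distinct value of `group_col`."""
    counts = Counter()
    for row in rows:
        if value_col in row:
            counts[row[group_col]] += 1
    return dict(counts)


rows = [
    {"grouping_column": "x", "a": 1},
    {"grouping_column": "x", "a": 2},
    {"grouping_column": "y", "a": 3},
]
print(groupby_count(rows, "grouping_column", "a"))  # {'x': 2, 'y': 1}
```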

⚠️ Breaking and API Changes

LMDB Backend

This release includes a fix for issue #850: ensure that LMDB libraries are readable after being moved to a different location. As a result, LMDB libraries created with arcticdb>=4.1.0 are not readable by older clients, and those clients must update.

This is because the fix stops us from serializing the LMDB library path (instead we always prefer the one in the Arctic URI), but older clients still expect to see the LMDB path serialized. Older clients reading a new LMDB library will in fact ignore the path passed in to the Arctic constructor and instead read the current working directory.

When you exceed the LMDB map size, we now raise a custom exception arcticdb.exceptions.LmdbMapFullError that explains how to re-open LMDB with a larger map size, whereas previously we raised a less helpful arcticdb.exceptions.InternalException.
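
A minimal sketch of re-opening with a larger map follows, assuming the map size is passed as a `map_size` query parameter on the LMDB URI (for example `lmdb:///tmp/a/b?map_size=5GB`). The helper `lmdb_uri` and the path are hypothetical, and the commented-out ArcticDB calls are illustrative rather than tested here.

```python
def lmdb_uri(path, map_size=None):
    """Build an LMDB Arctic URI, optionally with a map size such as "5GB"."""
    uri = f"lmdb://{path}"
    if map_size is not None:
        uri += f"?map_size={map_size}"
    return uri


print(lmdb_uri("/tmp/a/b", "5GB"))  # lmdb:///tmp/a/b?map_size=5GB

# Illustrative use with ArcticDB (not executed in this sketch):
# from arcticdb import Arctic
# from arcticdb.exceptions import LmdbMapFullError
# try:
#     lib.write("sym", df)
# except LmdbMapFullError:
#     ac = Arctic(lmdb_uri("/tmp/a/b", "5GB"))  # re-open with a larger map
```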

🚀 Features

  • Support count aggregator with groupby (#948)
  • Warning for LMDB when two Arctic instances open over the same storage (#1000)
  • Small LMDB Fixes: 2GiB map size for Windows, Validation before delete (#918)
  • Custom exception when LMDB map is full (#1006)
  • Memory backed API (#860)
  • Allow ampersand in symbol names (#900)
  • Add querybuilder notebook demo into the docs (#875)
  • Extended testing against real cloud storages (#789)
  • ASV Benchmarking published here (#962 #970)
  • Preparatory work for RocksDB backend (#945)

🐛 Fixes

  • Fix LZ4 decoding error issues that occurred with a mix of empty and non-empty columns (#964)
  • Performance improvement for read_batch when called with many symbols (4-5x improvement) (#870)
  • Convert semimap to switch which has resolved some segmentation fault issues (#912)
  • Upgrade curl to 8.4.0 (#977)
  • Cache open libraries in the LibraryManager (#990)
  • Enhancement 914: Improve error messaging when string column encoding fails due to the presence of a non-string object (#933)
  • Fix storage lock mutex implementation (#966)
  • Make azure sdk stick to winhttp if possible (#851)
  • Extra update checks (#539)
  • maint: Indicate the non-support of PyArrow (#882)
  • Add pymongo to the list of install dependencies (#891)
  • conda-build: Run python tests for macos-latest (#873)
  • fix: Change comparison in test_hypothesis_{sum,mean}_agg (#931)

Uncategorized

  • Fixes IFNDR issue due to mismatching inline/non-inline functions depending on the translation unit (fixes #943) (#949)
  • docs: Post 4.0.0 release documentation (#967)
  • Remove Black and pre-commit setup (#972)
  • Improve string writing performance (#969)
  • Skip docs in build.yml (#984)
  • Move api docs from sphinx to mkdocs (#897)
  • Document support for Mac on intel (#982)
  • Update PAT for publish to master (#988)
  • Fix ccache's non-existence exiting the workflow (#996)
  • Refactor get_descriptions lib methods to be more consistent (#994)
  • Fail docs build on sphinx failure (#883)
  • docs: Better document publishing release candidates on conda-forge (#901)
  • conda-build: Unpin some dependencies (#888)
  • docs: Reword conda-forge section mentioning libevent-2.1.10 (#905)
  • Update readme to reflect supported/beta status of Windows PyPI/MacOS conda-forge builds (#907)
  • Skip flaky Mac test (#924)
  • docs: Improve section "Building using mamba and conda-forge" (#917)
  • fix interface in ManualClockVersionStore getter (#925)
  • docs: Add high-level documentation of abstractions (#628)
  • Update storage compatibility (#916)
  • Update copyright notice (#939)
  • Use new get_library argument in ArcticDB_demo_lmdb.ipynb (#930)

The wheels are on PyPI. Below are for debugging:

v4.0.2

02 Nov 12:51

This is a patch release to version 4.0 that backports some changes from master.

🐛 Fixes

  • Bugfix for a deadlock issue when using Python multithreading and batch_read (#1021)

The wheels are on PyPI. Below are for debugging: