
Mechanism to limit global memory usage #213

Open · mkitti opened this issue Jan 16, 2025 · 2 comments

mkitti commented Jan 16, 2025

Is there a method to limit global memory usage?

Say I want to do something simple such as copying one array to another. Implemented naively as follows, this can consume a lot of memory if the arrays are large:

import tensorstore as ts

input_arr = ts.open({ ... }).result()
output_arr = ts.open({ ... }).result()
output_arr.write(input_arr).result()

I usually end up manually looping over the arrays chunk by chunk to force the copy to happen in parts. I'm aware of the Context API and concurrency limits, but they do not seem to effectively constrain operations such as the one above.
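For context, the manual loop I fall back to looks roughly like this (a pure-Python sketch with lists standing in for the TensorStore arrays; the chunk size is arbitrary):

```python
def copy_in_chunks(src, dst, chunk=64):
    # Copy in fixed-size slabs along the first axis so that only one
    # slab is in flight at a time. With TensorStore, each slab write
    # would instead be dst[start:stop].write(src[start:stop]).result().
    n = len(src)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        dst[start:stop] = src[start:stop]

src = list(range(1000))
dst = [None] * len(src)
copy_in_chunks(src, dst, chunk=128)
```

This bounds memory use at roughly one chunk per iteration, but it serializes the copy and gives up the pipelining TensorStore would otherwise do.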

Is there some way to limit the total memory used to avoid the Out Of Memory killer?

mkitti commented Jan 17, 2025

Attempting to answer my own question, it seems that applying resource limits to the input array is particularly useful.

input_arr = ts.open({
    "driver": ...,
    "context": {
        "cache_pool": {
            "total_bytes_limit": 100_000_000_000
        },
        "data_copy_concurrency": {
            "limit": 16
        }
    }
}).result()

laramiel (Collaborator) commented

There is no such semaphore built in at the moment, though I'm working towards some internal updates that will allow such a feature in the future. You can do part of this with concurrency limits, but they will not provide any guarantees.

For an example of how you might do this yourself (in C++), look here:

Basically, that benchmark uses a counter to track the expected in-flight bytes, starts as many full tensorstore reads as will fit into the limit, and uses each future's completion to trigger the next read.
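As I understand it, that pattern can be sketched in Python like this (hypothetical names; plain threads and a byte counter, not the actual benchmark code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class InFlightLimiter:
    """Track estimated in-flight bytes; block new work past the limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.in_flight = 0
        self.cond = threading.Condition()

    def acquire(self, nbytes):
        # Block until admitting nbytes more would stay within the limit.
        with self.cond:
            while self.in_flight + nbytes > self.limit:
                self.cond.wait()
            self.in_flight += nbytes

    def release(self, nbytes):
        with self.cond:
            self.in_flight -= nbytes
            self.cond.notify_all()

def read_all(chunk_sizes, read_fn, limit_bytes, workers=4):
    """Run read_fn(i) for each chunk, admitting only as many concurrent
    reads as fit inside limit_bytes of estimated in-flight data."""
    limiter = InFlightLimiter(limit_bytes)
    results = [None] * len(chunk_sizes)

    def task(i, nbytes):
        try:
            results[i] = read_fn(i)
        finally:
            # Completion frees budget, letting the next read start.
            limiter.release(nbytes)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, nbytes in enumerate(chunk_sizes):
            limiter.acquire(nbytes)  # blocks until budget is available
            pool.submit(task, i, nbytes)
    return results
```

With a real store, `read_fn` would issue the TensorStore read and resolve its future; here it is just a placeholder callable.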

Orbax also limits memory use when loading in a very similar way. They use a python context manager with a byte_limiter type to act as a semaphore. See, for example: https://github.com/google/orbax/blob/22695625af8a7c22140c6c5270253e1d0aa7aa64/checkpoint/orbax/checkpoint/_src/serialization/serialization.py#L435
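The context-manager flavor of the same idea might look like this (a sketch only, not Orbax's actual byte_limiter API):

```python
import threading
from contextlib import contextmanager

class ByteLimiter:
    """Semaphore-like guard over a byte budget."""

    def __init__(self, limit_bytes):
        self._limit = limit_bytes
        self._used = 0
        self._cond = threading.Condition()

    @contextmanager
    def reserve(self, nbytes):
        # Wait for budget, hold it for the duration of the with-block,
        # then return it and wake any waiters.
        with self._cond:
            while self._used + nbytes > self._limit:
                self._cond.wait()
            self._used += nbytes
        try:
            yield
        finally:
            with self._cond:
                self._used -= nbytes
                self._cond.notify_all()

limiter = ByteLimiter(100)
with limiter.reserve(60):
    # e.g. deserialize one shard while 60 bytes of budget are held
    pass
```

Each load wraps its work in `limiter.reserve(estimated_bytes)`, so concurrent loads collectively stay under the budget.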
