NPUW: Hotfix - delay the original weight memory deallocation (#27886)
### Details:
- As of #27767, NPUW drops links to the original weights to avoid
memory duplication;
- The drop happens after the LazyTensor evaluation, which is assumed to
create a new tensor;
- That is not always the case: sometimes tensors stay in their original
format/precision, and all that needs to be done is to copy the buffer
into L0 memory as-is;
- Previously, `.detach()` only destroyed the associated `Constant` node
that held the weight buffer; in the case of memory-mapped weights, the
buffer was kept alive until the reference to `ov::Model` was destroyed
at the end of the `ov::npuw::CompiledModel` constructors;
- A case was found where, before reaching the NPUW partitioning &
transformation pipeline, the model weights were first altered from
`BF16` to `FP16` precision. The relevant `Constant` nodes in the IR
then referred to their own weight buffers, not ones shared via mmap, so
detaching a lazy tensor caused these buffers to be destroyed
prematurely, leading to a segfault when the "evaluated" tensor (in this
case, just a reference to the original one) was copied to L0;
- Moving the `.detach()` to the very end of `eval_and_alloc` fixes this
problem.

### Tickets:
 - *ticket-id*
dmatveev authored Dec 3, 2024
1 parent 395340e commit f0925bc
Showing 1 changed file with 6 additions and 3 deletions.

src/plugins/intel_npu/src/plugin/npuw/weights_bank.cpp

@@ -110,9 +110,6 @@ ov::Tensor Bank::eval_and_alloc(const LazyTensor& tensor,
         return transformed_tensor;
     }
 
-    // Non-CPU case: detach the evaluated LazyTensor from its memory
-    const_cast<LazyTensor&>(tensor).detach();
-
     ov::SoPtr<ov::ITensor> remote_tensor;
     ov::Tensor allocated_tensor;
 
@@ -124,6 +121,12 @@ ov::Tensor Bank::eval_and_alloc(const LazyTensor& tensor,
     guard.unlock();  // Unlock the guard, map update is done - copy can continue in parallel
 
     transformed_tensor.copy_to(allocated_tensor);
+
+    // Detach the evaluated LazyTensor from its memory here - when it is 100%
+    // not needed anymore (transformations, if any, and copies are done)
+    // Note: this is the non-CPU path!
+    const_cast<LazyTensor&>(tensor).detach();
+
     return allocated_tensor;
 }
