NPUW: Hotfix - delay the original weight memory deallocation (#27886)
### Details:
- Since recently (after #27767), NPUW drops its links to the original weights to avoid memory duplication;
- The drop happens after the LazyTensor evaluation, which supposedly creates a new tensor;
- That is not always the case: sometimes the tensor stays in its original format/precision, and all that needs to be done is to copy the buffer into L0 memory as is;
- Previously, `.detach()` only destroyed the associated `Constant` node that held the weight buffer; in the case of memory-mapped weights, the buffer was kept alive until the reference to `ov::Model` was destroyed at the end of the `ov::npuw::CompiledModel` constructors;
- A case was found where, before reaching the NPUW partitioning & transformation pipeline, the model weights were first altered from `BF16` to `FP16` precision. Here the relevant `Constant` nodes in the IR referred to their own weight buffers, not shared via mmap, so detaching a lazy tensor destroyed these buffers prematurely, causing a segfault when the "evaluated" tensor (in this case, just a reference to the original one) was copied to L0;
- Moving the `.detach()` to the very end of `eval_and_alloc` fixes this problem.

### Tickets:
- *ticket-id*