
Profiler: Show memory state on deferred allocation OOM #1797

Open
manopapad opened this issue Nov 26, 2024 · 11 comments

@manopapad
Contributor

Separating out a side discussion from #1739.

Honestly it might not even be necessary to visualize the "history" of mapping-stage allocations, for the purposes of OOM debugging. Just a visualization of the deferred memory state at the point of OOM might be enough. That gives enough information to understand what valid deferred allocations are stopping the incoming allocation from succeeding. No need to even visualize the invalid instances.

Right, so perhaps that deserves a different visualization: one with matplotlib or something similar that generates a static picture of all the instances, where they sit in memory, and how much memory they take up, so you can see the holes, the fragmentation, and which instances are currently valid (uncollectable).

In order to get a full picture of memory usage, we would need to visualize a number of different objects that take up space in a Realm memory, some of which are only visible internally to the Runtime:

  • valid PhysicalInstances
  • DeferredBuffers / DeferredValues
  • upper bound eager reservations (in the one-pool world)
  • Future instances
  • other?

We also need a way to let the mapper request this logging (today, e.g., the DefaultMapper simply aborts on deferred allocation failure).

[Attached image: Slide1]

@lightsighter
Contributor

@manopapad I think the real challenge of this is picking a visualization tool. I can dump all the data out of Legion to make that picture, say with graphviz or matplotlib, but there are going to be hundreds, if not thousands, of instances and holes to report, so I think we need a more dynamic visualization tool for rendering it: a static representation is not going to be comprehensible to a human, because they won't be able to see what they need to see in the large and then zoom in on the things they want to look at. Do you have thoughts on how you'd want to do that? Alternatively, we can do a text-based representation for now and just have a tool that reports the largest holes in sorted order and the total size of all holes.

@manopapad
Contributor Author

Yes, we can start with a text dump for now, and iterate on the actual visualization. Maybe @bryevdv has a good idea.

One more thing to note: in Legate we would also like to include additional information in this visualization, e.g. which user-level object corresponds to each field, so we would need to dump extra information on top of this.

@lightsighter
Contributor

So my plan was to add the following method to the mapper runtime:

void MapperRuntime::dump_memory_state(Memory m, const char *filename);

Any mapper could invoke that at any time to dump the memory state of a particular memory. You don't have to wait until you are OOM, but can do it as many times as you want throughout your run. I'm not promising that it will be fast, as it will finish writing to the file and close the file before returning, but there's nothing stopping you from using it periodically.
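
For illustration, here is a rough sketch of how a custom mapper might call this; only dump_memory_state itself is part of the proposal, while the mapper class, the helper name, and the assumption that the mapper's MapperRuntime pointer is available as "runtime" are hypothetical:

#include <cstdio>   // snprintf
#include <cstdlib>  // abort

// Hypothetical helper in a custom mapper: dump the state of the target
// memory before giving up, instead of aborting immediately the way the
// DefaultMapper does today on deferred allocation failure.
void MyMapper::report_failed_allocation(Memory target)
{
  // One file per memory, named after the Realm memory ID.
  char filename[64];
  snprintf(filename, sizeof(filename), "mem_state_%llx.log",
           (unsigned long long)target.id);
  // Proposed API: blocks until the file has been written and closed.
  runtime->dump_memory_state(target, filename);
  // Then fail loudly, as the DefaultMapper does now.
  abort();
}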

What would you add to that function call to record what you want and then how would you write the tool to parse it?

@manopapad
Contributor Author

What would you add to that function call to record what you want and then how would you write the tool to parse it?

I don't think we would add extra information to the call directly, but would possibly include extra information in the output file. In particular, we'd want to record which Legate-level Stores correspond to which Legion fields, and record relevant information on the Stores that would help a user track values back to their code:

  • information on the Store(s) that an Instance is (partially) covering: (global) shape, type, transformation (e.g. "slice [1:,1:]")
  • what partition(s) an Instance corresponds to (e.g. "2x4 Tiling of the root Store in 300x150 tiles")
  • provenance of operation that created the Store (e.g. x = np.empty(...) creates the Store in user-land, even though the actual Instance allocation happens later)
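
For concreteness, a purely hypothetical example of what one such side record could look like, using the kinds of details listed above (all identifiers and values are made up):

Store{field_id=1042, shape=(600, 600), type=float64, transform="slice [1:,1:]", partition="2x4 Tiling of the root Store in 300x150 tiles", provenance="x = np.empty(...) at user_script.py:17"}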

@lightsighter
Contributor

I don't think we would add extra information to the call directly, but would possibly include extra information in the output file.

Well that's what I mean: you'll need to pass some kind of data to the call, because Legion is going to be the thing writing the output file. You can't add anything yourself (at least not directly) to the output file. The file format will be a black box because I want to reserve the right to change the format in the future as the internals of Legion change. There will be an agreed upon format between Legion and the tool that parses the file given a particular Legion commit (similar to how Legion Prof and Legion Spy work today). That means that if you want to pass data yourself in some form it's going to need to be passed through Legion.

@manopapad
Contributor Author

There will be an agreed upon format between Legion and the tool that parses the file given a particular Legion commit

I would much prefer to avoid this pattern, and just document (and version) the output format. This way multiple tools can read it, and it's not just the one tool that you provide. And you don't even have the responsibility to provide and maintain the one tool.

We should honestly do the same for the Legion profiler output (if we're not already), but that's a different discussion.

That means that if you want to pass data yourself in some form it's going to need to be passed through Legion.

We would either append data to the end of the same file, or dump our info into a separate file that we ship together with Legion's output.

@elliottslaughter
Contributor

I would much prefer to avoid this pattern, and just document (and version) the output format. This way multiple tools can read it, and it's not just the one tool that you provide. And you don't even have the responsibility to provide and maintain the one tool.

I'm on the fence about this. We don't have multiple tools that read the output right now, and for Legion Prof at least we don't particularly want it: there is a lot of business logic that goes into the legion_prof binary after we parse the logs and before generating the final output. We don't want to duplicate that, or even particularly to document it (at the level required to make alternative implementations). That's just a non-goal.

I think it's reasonable, to the extent that the user is expected to provide this directly, to document any formats and version them. That's fair. But I think the status quo for Legion Prof is still the best trade-off all around given the constraints and inherent complexity in the problem we are solving. This is not the type of situation where open standards help, because there is (again) so much business logic we need to deal with.

So overall I'm with Mike on this one in terms of how it would likely be implemented. We can always add modes or passes to Legion Prof to do whatever data manipulation you need to do to extract what you want from the logs themselves.

@manopapad
Contributor Author

@elliottslaughter I see your point regarding documenting the Legion Profiler format (I might disagree, but I need to educate myself more before I can express a meaningful opinion). But do you hold the same opinion for the (proposed) standalone information of "memory state dump"?

This is a new set of information, not necessarily expected to integrate with the existing Legion profiler, so I suggest we build this up using a documented format, rather than having a single tool that knows how to parse the information. Then we can use that well-documented format to build a tool that shows information specific to Legate's semantics (see #1797 (comment)), rather than trying to cram everything into the one tool that Legion provides.

@lightsighter
Contributor

I'll separate the discussion on Legion Prof from the discussion on the new tool for out-of-memory conditions. First, on the Legion Prof front, I pretty strongly agree with Elliott that we should have one tool for parsing and organizing the profiling data. I've made the case to Elliott (and I think I've mostly convinced him) that the runtime should just log stuff in the fastest way possible to minimize profiling overheads, and it should be up to the parsing tool to put the pieces back together offline, where overhead doesn't matter. Additionally, there are lots of "connections" to be made between disparate logging statements. All this adds up to some really non-trivial code with semantics that are difficult to get right and change often. I don't think anyone should be replicating our effort to do that; there should be just one Legion Prof tool, for which people can write different backends that extract the specific information they want.

This new tool that we're discussing for OOM conditions is different, but it also shares some similarities with Legion Prof. The one thing this new tool has going for it is that all the information will be logged at a single point in time (when we actually run out of memory). This means that there will be fewer "connections" and semantics that we need to worry about. However, there are still some gotchas that we need to worry about.

For example, how do you name instances? Everything user-visible in the mapping interface is named in terms of Realm instance IDs, but those are recycled when instances are deleted, so they are not unique. A mapper might be holding references to multiple PhysicalInstance handles each with the same Realm instance ID (of course only one of them will be backed by an actual instance at a time) so how do you know which one refers to the real instance (without testing)? Legion has unique identifiers for instances ("unique events" and "distributed IDs") but those are not exposed up through the mapping interface. So you can't just document Legion's dumping format and then do your own thing separately (e.g., dump your own file that you parse in parallel), because there will be no way to build associations between what Legion is describing and the way your own log file describes things.

Another problem is that there are going to be lots of other things that have backing instances, which are going to be hard to associate with application-level things. For example, futures and memory pools (from onepool) will also have backing instances that need to be considered, and again there are very few ways for applications to describe the semantics of where those came from (other than maybe provenance strings that are associated with them). In the future, there will even be instances associated with Legion runtime memory usage (once I get Legion to start doing arena memory allocation) which won't have any relationship to anything the user has done. At a minimum I want to preserve the flexibility for Legion to make and use instances however it wants to satisfy the needs of applications, even if that is difficult to map back to things the application understands.

So where does that leave us? I feel like we still want a Legion-specific component of this tool that allows the runtime to dump whatever format it wants and then for others to be able to query it, similar to Legion Prof. The question then is: do clients pass semantic information into that tool and log through its interface, or do they log their own data independently and then have some way to create associations with the things they know? If someone can describe a way to create such associations in a sane way, then I wouldn't mind supporting the independent pathway where you log separately, and then use the Legion tool to parse and extract what you want and do some kind of a "join" with your own data to make the needed connections and report what you want.

@manopapad
Contributor Author

I don't think anyone should be replicating our effort to do that; there should be just one Legion Prof tool, for which people can write different backends that extract the specific information they want.

Fair enough

A mapper might be holding references to multiple PhysicalInstance handles each with the same Realm instance ID (of course only one of them will be backed by an actual instance at a time) so how do you know which one refers to the real instance (without testing)?

This doesn't matter to the interface we're talking about here. You will be printing out based on the "internal" state of the runtime, so every instance ID you're printing is presumably "live". Making sure that the user isn't printing out garbage-collected instance names in their "side-logging" is not your problem.

I want to preserve the flexibility for Legion to make and use instances however it wants to satisfy the needs of applications, even if that is difficult to map back to things the application understands.

Sure, with you there. And I'm not asking that everything in the log format be mappable back to application-level things. If you just document what each instance is being used for, for example:

Legion has unique identifiers for instances ("unique events" and "distributed IDs")

Instance{type="unique_event", size=32, ptr=0x123}
Instance{type="distributed_id", size=32, ptr=0x456}

futures and memory pools (from onepool) will also have backing instances that need to be considered

Instance{type="future_buffer", size=2048, task_id=42, future_idx=0, ptr=0x678}
Instance{type="task_memory_pool", size=10000, task_id=42, ptr=0x988}

In the future, there will even be instances associated with Legion runtime memory usage

Instance{type="legion_internal_buffer", size=9999, ptr=0x786}

then we can "pick and choose" what we combine with user-level info, and dump the rest "as-is".

Another way to say this: If you can make the format self-describing (e.g. json with an associated schema), then just do that and avoid introducing another middleman tool. IMHO there's not the same requirement here as with the main profiling to be as lean as possible, so you can get away with this.
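
As a purely illustrative sketch (not a proposed schema), the Instance records above could be rendered as self-describing JSON along these lines; the format_version and memory fields here are hypothetical additions:

{
  "format_version": 1,
  "memory": "0x1e00000000000000",
  "instances": [
    {"type": "future_buffer", "size": 2048, "task_id": 42, "future_idx": 0, "ptr": "0x678"},
    {"type": "task_memory_pool", "size": 10000, "task_id": 42, "ptr": "0x988"},
    {"type": "legion_internal_buffer", "size": 9999, "ptr": "0x786"}
  ]
}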

As I write this, I realize that some explanatory information, like the association from task ID to task name / provenance, will also need to be made available to the processing tool (whether that's the user's tool directly, or an intermediate Legion tool). You probably don't want to be dumping the task name on every entry that references that task ID.

@lightsighter
Contributor

Sure, with you there. And I'm not asking that everything in the log format be mappable back to application-level things. If you just document what each instance is being used for, for example:
then we can "pick and choose" what we combine with user-level info, and dump the rest "as-is".

How are you going to "combine" things if all the runtime names are in terms of internal things like distributed IDs/unique events and all your instance names are in terms of Realm instance IDs (which are not unique)? You're not going to have any way to build the associations that you care about because you won't be able to map your instances to the things Legion is telling you about.

As I write this, I realize that some explanatory information, like the association from task ID to task name / provenance, will also need to be made available to the processing tool

I don't think I'm going to be dumping any task IDs in this tool as Legion doesn't track tasks/provenances for instances. If you want you can log those on the side.

You probably don't want to be dumping the task name on every entry that references that task ID.

We'll be smart about deduplicating strings.
