-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect timing information by default? #273
Comments
This looks very useful for expert users. However, I don't think this information is useful for a user that isn't also developing -- unless we at some point get to problems where specific choices of processor count are faster than others because of different splits. I don't think we are there yet because the distributed memory MPI is only for r and z and only one set of magic numbers are permitted.
I would go for printing this optionally.
I would not do this by default, and I would also be concerned that the timer would slow down the code.
The present output looks acceptable. If data is removed, perhaps work noting that anything <1% is hidden. |
I agree with Michael. I do really like the idea of having this as an option (though I suppose the default should be that it is turned off). I sometimes worry about whether I've written an inefficient function or caused a memory leak, and this would be a very quick way of checking that compared to using |
There is the point of view that at the moment all users are developers, so we could print by default for now, and make it optional at some point in the future. If you are a developer and regularly changing code, the timings are a useful sanity check that you didn't do something awful! |
That depends on how difficult it is to make it optional. If you really want this as a default feature, it would be nice to have an option to turn off this output so that any future changes are limited to input option defaults. |
That would be an easy and sensible compromise. I'd probably stick an extra flag in the |
Not sure where best to record this question, so reopening this issue: Why does the timer present strange information regarding total runtime in the two examples below? Specifically, why is the total measured field seemingly filled with numbers modified from 100% by an undetermined prefactor? Everything in the details
|
Don't know, don't think I've ever seen that... |
This example was made with |
I cannot reproduce this problem (on my Linux laptop) using the current master branch
|
Have you tried running the same experiments with multiple cores? Can you explain why we output data
when the actual wallclock time is closer to
? |
I think I might know the cause of this now. |
I have another question about the memory usage or allocations reporting. Can this information be used to estimate the maximum memory required for a job? It looks like this measures something else -- if arrays are allocated and deallocated often, does this measure only allocations, without counting that only some arrays are persistent and remain allocated between timesteps? |
I think it just measures the allocations, and ignores deallocations (deallocations would be difficult to measure because they would only happen during garbage collection...). So nothing to do with maximum memory usage. Even current memory usage is surprisingly hard to measure - I tried at one point and found some options (which I've now forgotten), but as far as I remember none of them seemed very accurate/reliable. I think a very long time ago, for a different project, I looked for command line tools that would measure memory usage and found some reasonable things there - the only notes I can find are for looking at memory usage of Python code, but they use OS utilities, so there are probably command line tools to measure the memory used by some process. |
Proposal: collect and print some high-level timing data by default. TimerOutputs.jl (which we've used a little bit, but not routinely) has some nice features, like being able to annotate function definitions to gather timing data each time that function is called.
Output from a run might look something like this:
To get that output I've filtered out some function calls that took a very short amount of time, to avoid cluttering up the screen.
Questions:
With the current setup, the extra cost of timing is not noticable. I did an
@benchmark
on a run that takes about 13.5s, and the I couldn't see any difference difference with/without timing, on a sample of 18 runs of each.There is an option for 'debug timers' that use a macro that will be removed by the compiler if 'debug timing' is not explicitly activated (so 'debug timers' have zero overhead for production runs). I'll add some of these to (optionally) time some MPI calls and LU-solves.
If we add this as a feature, I'll probably try to write out the timing information for each MPI process into the output file in case we want to look at it later. That will require a bit of work to gather all the timing info and write it in a sensible format to HDF5.
The motivation for this was that in preparation for trying to parallelise the kinetic electron solver a bit better, I wanted to be able to profile the code. Using Julia's
@profile
with MPI was giving me strange results - the run with profiling was a lot slower than a normal run, with an unexpectedly large amount of time spent in MPI calls. I think I've used@profile
before and not had this problem, so not sure what's going on, but in thinking about a workaround it seemed like collecting some timing data might be generically useful.The text was updated successfully, but these errors were encountered: