-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using casacore measures for computing parallactic angles acquires/drops the GIL excessively #265
Comments
This comment has been minimized.
This comment has been minimized.
Also, I still believe from our discussions at the workshop that the use of the garbage collector in Quartical leads to poor performance. Has this changed? |
I think that I had come to an inaccurate conclusion due to the dataset in question containing several fields and correspondingly varied xds sizes. While I will make an effort to confirm that the garbage collector isn't to blame, I am pretty sure something else is afoot. Does the beam automatically compute the parallactic angle or does it rely on the user enabling it? |
The above code does not compute parallactic angles, but it does consume parallactice angles computed by a transformer which computes them per unique time, feed and antenna. |
Both the Beam and Feed Rotation get their parallactic angles from a common source: |
This line: |
I do have a decent test now where I have all the necessary beams etc, so I will make sure that the GC isn't involved. |
Yes, if there's an issue affecting feed rotation, then it will affect the beam because they both depend on transformation of parallactic angles |
I can confirm that the GC is not the culprit. I have written a tiny hack for Edit: Though how robust it is I do not yet know. But very cool that it works from an objmode call inside the fused RIME. :-) |
Could you please push your changes up to a branch, in whatever exploratory state it's in? |
Will do! |
My hack currently lives in https://github.com/ska-sa/codex-africanus/tree/improve_parangles. |
This definitely doesn't seem to work with multiple workers yet. Does work from threads. So there will be some pickling required somewhere. |
No problem, just exists to demonstrate the idea |
Actually, the problem is weirder. So the beams seem to have a problem with serialisation:
That seems to be independent of the parallactic angle problem which, with the beam disabled, is:
Which, incidentally, is the most awesome sounding error I have ever encountered. |
Hmmmm, this means that |
I have found a workaround for this - it requires making dask's worker processes non-daemonic when initialising the Doing so requires using I am a little wary of this as a solution though, as it will make interrupting runs a little more likely to leave zombies (if my intuition is correct). |
Hmmmm, useful to know. Off the top of my head it's probably desirable for Dask Worker's to be daemons. I feel that there may be some way around this. |
Out of interest, does using |
I gave it a try and got the same message - I think this is something we cannot really avoid. If the dask workers are created as daemonic I don't know if there will be a way to create processes from processes. For what it is worth, the child processes should be very short lived. The other idea you suggested may work - having a ProcessPool associated with the RIME factory. I am not exactly sure how to hack that together quickly though and even then you may run into issues if the pool is created by a worker (although I suspect you are intending to create it at graph construction time, much like the thread pool associated with the reads/writes). |
OK. It seems that if nanny processes are turned off then things might work, although that might interfere with other parts of the dask machinery: dask/distributed#2142
This is more a design decision within the RimeFactory itself and I don't think it'll escape the issue of daemon's not allowing child process.
Yep, I think I'll need to devote some thought here, but essentially, RimeFactory's (like TableProxies) are embeddable in the graph and pickleable. Think of the FITSFile in Quartical we discussed. While the internal file object is heavyweight and impossible to pickle, the arguments to create the FITSFile are not class FITSFile:
def __init__(self, filename: str):
# lightweight, easy to pickle
self.filename = filename
# heavyweight, difficult to pickle
self.file = open(file)
def __reduce__(self):
# but we don't need self.file to represent the object for transmission
return (FITSFile, (self.filename,)) What this means is that we can embed very complex objects in the graph, as long as the state required to reconstruct them via the pickle protocol is simple and itself pickleable. This is the case for the It's also the case for I find that it's a very powerful paradigm for use with dask and it would be good to devote more time to explaining this in further detail. In conjunction with |
My thought on this based on what we have seen in DDFacet pools are that you really want to spawn them as early in the process as possible and use them throughout. Child processes will inherit the memory pages which will be copy on write by the OS. This has lead to poor memory performance for us in the past so the best thing to do is to get a pool going outside of dask here perhaps? Another option would perhaps be IPC to support the non-shared memory distributed case (see https://docs.python.org/3/library/multiprocessing.html#multiprocessing-listeners-clients) with dask graph nodes requesting arrays from a server which can have its own pool? |
(I think this is especially true if you are going to configure your system to enforce hugepages which typically substantially improves performance with the kind of large array sizes we work with typically) |
I think the strategy here will be to use the
I think the approach you're describing is appropriate to situations where large visibility buffers need to be shared between processes. The data that must be shared here (unique times, feeds and antennas) are far smaller so the IPC costs should be negligble relative to the compute required to produce visibilities. The Pool will indeed live "outside" dask though. |
Going a bit further, the patterns I'm describing in #265 (comment) are very useful for spawning resource objects that "live outside dask" but are used by it. |
Ah ok yes a spawned would go some ways to prevent the memory issues here. |
Description
When utilising the beam inside the fused RIME (in QuartiCal), performance takes a massive hit. @ulricharmel is seeing 60+ hour runtimes with the beam compared to around 8 without. Is this the same issue as the parallactic angle computation i.e. is the GIL to blame?
The text was updated successfully, but these errors were encountered: