Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended source response #223

Merged
merged 17 commits into from
Oct 25, 2024
Merged

Conversation

hiyoneda
Copy link
Contributor

@hiyoneda hiyoneda commented Jul 18, 2024

I created the first version of the extended source response related to #216 .

  • A function (get_integrated_spectral_model) was moved from threeml/custom_functions.py to response/functions.py.
    • Basically, I copied these codes from PointSourceResponse.py. So, this function can also be used in PointSourceResponse.py @israelmcmc
  • A new function, get_integrated_extended_model was added in threeml/custom_functions.py, which calculates integrated flux over the sky from a given astromodel.ExtendedSource
  • Added ExtendedSourceResponse. Currently, the functionalities are limited. I assume that we will add more functions in the future
    • The overflow/underflow bins are not loaded when an hdf5 file has them. It saves the memory usage significantly if the energy axis is one bin.
  • threeml/COSILike was updated to use ExtendedSourceResponse .

Now, the updates are minimal. Any feedback is very welcome.

@saurabhmittal23 Please use this PR and test your extended model fitting. I am happy if you could check the 511 keV notebook first and see if the results are the same as before.

@israelmcmc
Copy link
Collaborator

I'm closing and reopening this PR to kick codecov. There was a recent change in GitHub that broke it, but I think it's fixed now.

@israelmcmc israelmcmc closed this Jul 22, 2024
@israelmcmc israelmcmc reopened this Jul 22, 2024
Copy link

codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 83.14607% with 15 lines in your changes missing coverage. Please review.

Project coverage is 74.02%. Comparing base (d02846a) to head (5bdaf39).
Report is 36 commits behind head on develop.

Files with missing lines Patch % Lines
cosipy/response/functions.py 76.19% 10 Missing ⚠️
cosipy/response/ExtendedSourceResponse.py 90.90% 3 Missing ⚠️
cosipy/threeml/COSILike.py 33.33% 2 Missing ⚠️
Files with missing lines Coverage Δ
cosipy/image_deconvolution/allskyimage.py 100.00% <100.00%> (ø)
cosipy/response/PointSourceResponse.py 100.00% <100.00%> (+40.54%) ⬆️
cosipy/response/__init__.py 100.00% <100.00%> (ø)
cosipy/source_injector/source_injector.py 100.00% <ø> (ø)
cosipy/threeml/custom_functions.py 33.66% <ø> (-4.92%) ⬇️
cosipy/threeml/COSILike.py 20.88% <33.33%> (+0.76%) ⬆️
cosipy/response/ExtendedSourceResponse.py 90.90% <90.90%> (ø)
cosipy/response/functions.py 76.19% <76.19%> (ø)

@israelmcmc
Copy link
Collaborator

Thanks for the changes @hiyoneda. I haven't gone through all of the code, but I noticed that in ExtendedSourceModel you are tracking _contents, _axes, and _units manually. You can instead inherit from Histogram, as PointSourceModel does, and that way you'll get all features, which is also less error-prone.

@hiyoneda
Copy link
Contributor Author

hiyoneda commented Jul 23, 2024

I needed to do it on purpose. If histpy.Histogrom is used, the overflow/underflow bins are automatically created. For the gamma-ray line case (when the number of energy bins is one), the loaded matrix size gets 9x larger than the necessary size. (actually, the extended model fitting notebook could not be performed on Saurabh's environment due to this issue). So the response components are manually loaded, and the overflow/underflow bins are ignored in this update.

@hiyoneda
Copy link
Contributor Author

Another future concern is when the response is very large (~O(100-1000) GB), for example, nside = 64.
In this case, we cannot open the response at once even if the underflow/overflow bins are not loaded, and we would need some functionalities for lazy reading (or the response components can be generated each time they are referred).

If these features can be implemented in Histogram, I agree that we should use it as a base class. But if not, I am worrying that it rather reduces flexibility in the future.

@israelmcmc
Copy link
Collaborator

I see. Yeah, the underflow/overflow bins are an issue in that case, and there's no easy way to change that in histpy. It's certainly possible, but since the assumption of those bins permeates the code, it will take a while to implement.

Partial reading was part of the reason for using HDF5, and FullDetectorResponse only loads one direction at a time. Indeed it might not be enough. On a tangent note, if we move to an unbinned analysis, we can load from the HDF5 only the nearest bins for each event, one event at a time ¯_(ツ)_/¯

@hiyoneda
Copy link
Contributor Author

Personally, I want histpy to keep the underflow/overflow bins, because I like the concept of histpy very much, which is useful for manipulating histograms in data analysis. It is also useful that histograms have the underflow/overflow bins in many cases. I think that the fundamental issue here would be that the response is a matrix or tensor rather than a histogram.

Partial reading was part of the reason for using HDF5, and FullDetectorResponse only loads one direction at a time.

Actually, I was wondering if we could keep the instance of h5py instead of its contents in a future update. FullDetectorResponse will be a good reference for how to update ExtendedSourceResponse in the future.

On a tangent note, if we move to an unbinned analysis, we can load from the HDF5 only the nearest bins for each event, one event at a time

It sounds interesting. Considering that it seems always useful to keep the HDF5 in the response classes, should any response class have three attributes, the HDF5 data, axes, unit?

@israelmcmc
Copy link
Collaborator

@hiyoneda I didn't want to give away all the slicing, projecting, plotting, etc. capabilities that have shown useful for debugging and concise coding, so I spent some time adding the capability to histpy to disable under/overflow tracking (axis by axis). I haven't merged it yet, but you can checked it out here: https://gitlab.com/burstcube/histpy/-/merge_requests/1 . Please let me know what you think!

@hiyoneda
Copy link
Contributor Author

hiyoneda commented Aug 6, 2024

@israelmcmc Thanks for making the change on histpy. I tried the new code from the no_overflow branch, and I like it so much. I made a histogram with similar dimensions of gamma-ray line response and confirmed that the over/underflow bins are not used with the new option. It saves memory and storage significantly, as expected. Once it is merged, I can update the extended source response using this new functionality.

@hiyoneda
Copy link
Contributor Author

hiyoneda commented Oct 3, 2024

I modified the class as a child class of Histogram. Now it can save the memory space.
@israelmcmc I want to ask two questions:

  • Has the no_overflow branch of histpy already been merged to the main?
  • Currently, the image axis in the response is always "NuLambda," but sometimes it is confusing because 'NuLambda' sounds like it in the local coordinates even when it is a galactic response. Can we change the name of the image axis to match the actual coordinates, e.g., lb, radec etc.?

@israelmcmc
Copy link
Collaborator

Thanks, @hiyoneda:

  • I haven't merged it yet, but I think I'll do it soon. I've used it for a while now with no issues. I'll try to make a new release tomorrow
  • I agree it would be better to label the axis based on the coordinates it represents, and leave NuLamda only for SC coordinates. Can you please open an issue about it?

@israelmcmc
Copy link
Collaborator

@hirokiyoneda I made a new histpy release (1.1.4) that includes the optional under/overflow

@hiyoneda
Copy link
Contributor Author

  • I agree it would be better to label the axis based on the coordinates it represents, and leave NuLamda only for SC coordinates. Can you please open an issue about it?

opened #250

@hiyoneda hiyoneda requested a review from Yong2Sheng October 22, 2024 23:47
Copy link
Contributor

@Yong2Sheng Yong2Sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a few general questions. Everything looks good to me, I will run TS map and spectral fitting notebook to see if everything goes through.

unit = unit,
track_overflow = track_overflow)

del hist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After del hist, the data referenced by hist will still remain in the RAM (or it will cleaned when the python does the reference count)
Using gc.collect() will remove the data right away. Since the response is quiet large, so might help to remove from RAM right away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you reconstruct the Histogram instead of using the opened one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the code corresponding to the first comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you reconstruct the Histogram instead of using the opened one?

This is because the opened one is an instance of histpy.Histogram, not ExtendedSourceResponse. So, I needed to instantiate ExtendedSourceResponse at a new line. @israelmcmc Is there any nice way to directly open a hdf5 file as an instance of ExtendedSourceResponse?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hiyoneda I think that you can do cls.open(filename, name) instead of super().open(filename, name). The output would be an ExtendedSourceResponse object.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. In principle this should work, but there's an issue with histpy that prevents it at the moment. I think this can be merged as is, and we can come back to it one I fix the issue in histpy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the code corresponding to the first comment.

Thanks @hirokiyoneda !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. In principle this should work, but there's an issue with histpy that prevents it at the moment. I think this can be merged as is, and we can come back to it one I fix the issue in histpy

Sounds good!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the corresponding issue, for the record https://gitlab.com/burstcube/histpy/-/issues/23

@Yong2Sheng
Copy link
Contributor

I ran through the spectral and TS map fitting notebooks without issues. They return the same fitted values. I will merge.

Copy link
Contributor

@Yong2Sheng Yong2Sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge.

Copy link
Contributor

@Yong2Sheng Yong2Sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge.

@hiyoneda hiyoneda merged commit 9eb70c1 into cositools:develop Oct 25, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants