Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose some functions under bruker.api #121

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

sem-geologist
Copy link
Contributor

@sem-geologist sem-geologist commented May 19, 2023

Description of the change

This PR aims to modernise and extend the bruker._api.py:

  • expose middle layer functions and classes under bruker.api
  • review, update and extend the docstrings, in particular of these functions and class'es which or which child are
    going to be exposed through .api
  • review the code for possible cleanup, scope-separation and streamlining for public exposition in .api

Progress of the PR

  • expose SfsReader
  • expose xml_to_spectrum
  • expose xml_to_image
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
  • add tests,
  • ready for review.

???
are xml_to_spectrum and xml_to_image good naming for these functions. functions require particular etree nodes - maybe et_node_to_spectrum and et_node_to_image would be better naming?

Minimal example of the bug fix or the new feature

from sciio.bruker import api as b_api

b_api.SFSReader('somefile.pan')  # *.pan are particle analysis files using same container as bcf 

@sem-geologist sem-geologist marked this pull request as draft May 19, 2023 14:51
@codecov
Copy link

codecov bot commented May 19, 2023

Codecov Report

Patch coverage: 96.62% and project coverage change: +0.13 🎉

Comparison is base (5f9e746) 85.16% compared to head (a57fd72) 85.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   85.16%   85.29%   +0.13%     
==========================================
  Files          73       74       +1     
  Lines        9030     9042      +12     
  Branches     1932     2045     +113     
==========================================
+ Hits         7690     7712      +22     
+ Misses        873      870       -3     
+ Partials      467      460       -7     
Impacted Files Coverage Δ
rsciio/bruker/_api.py 88.74% <96.59%> (+0.95%) ⬆️
rsciio/bruker/api.py 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

rsciio/bruker/_api.py Outdated Show resolved Hide resolved
rsciio/bruker/_api.py Outdated Show resolved Hide resolved
@sem-geologist
Copy link
Contributor Author

sem-geologist commented May 29, 2023

@ericpre , @jlaehne I have some question. As I am doing pretty extensive review of bruker _api.py, so that it could be more streamlined, I came again over idea how to handle images. Bruker xml image node can have plains (that is called "plains" within the XML), it is basically different video channels, which register signal at same time/simultaniously when beam is rastered over sample ROI (In some sense it is very similar to spectroscopy to be honest). So basically most of metadata is the same (actually in XML it is physically shared) except description string of the every channel. Hitherto in this case all metadata was being duplicated, and every plain was returned as separate independent set with its data and metadata. But... is it the right and most efficient way? would it not be better way to stack those plains, alongside axis called as "channels"? can axes have labeled scale (that is instead of numbers it would be channel description)? Does this makes any sense at all? I mean for me it does a lot, as those plains are different representations of exactly same material interaction or result of beam: i.e. say as example 4 channels: BF, DF, DF4, HAADF. They all represent different interacted electrons with matter.
Even if taking 2 channel (most common setup with SEM having Bruker EDS) of BSE and SE images, those represents different energy electrons, and in some sense combining it forms kind of spectral image.

Such change probably is going break some established workflows. But I think it would be very right thing to do.

@jlaehne
Copy link
Member

jlaehne commented May 29, 2023

We had a similar topic brought up for LumiSpy (CL spectral image + corresponding SE image) by @jordiferrero, which was never implemented so far: LumiSpy/lumispy#73
Indeed, it would be nice to have an upstream solution for that in HyperSpy!

Concerning the 'labeled' axes for stacked data, @CSSFrancis was working on a PR.

@CSSFrancis
Copy link
Member

So adding labeled axes wouldn't be terribly hard in hyperspy. I have done most of the work in hyperspy/hyperspy#3031 Right now there is a little bit of hesitation with this change but I think that there isn't really anything stopping this from happening. This might be something worth voting on or having a larger discussion about how these signals are handled.

Handling multiple signals is a little bit tricky and becomes more tricky if they have different numbers of pixels, but even that can be handled by hyperspy.

There are a couple of things:

  1. If all of the signals have the same dimensionality and same size --> pass as labeled axes and create a stacked array
  2. If all of the signals have the same dimensionality and different --> pass as labeled axes and create a stacked ragged array
  3. If one of the signals has a higher dimensionality (i.e. HAADF and 4-D STEM) pass one as navigator and one as array

For 3 I also think that in the case where you have multiple different signals you should be able to pass all of them. In that case you could have multiple different navigators. For example with a 4-D STEM you could have brightfield and darkfield images.

@sem-geologist
Copy link
Contributor Author

sem-geologist commented May 29, 2023

thanks @CSSFrancis , Your 3 point division makes it pretty clear. So "planes" used in bruker XML image actually not only fulfills point 1. of same dimentiality and resolution/size, but are even more closely related, as it have exactly same column conditions, and is pixel-to-pixel generated with exactly same beam simultaneously - thus even more - it should be stacked. Hopefully Your PR hyperspy/hyperspy#3031 will get accepted. I probably will stack images with unitless axes as for now.

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

@CSSFrancis
Copy link
Member

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

In these types of situations "what is good for the goose is good for the gander" or both software packages will probably benefit from a consistent approach to these low level problems :) It also helps with interoperability which can only be a good thing.

To be honest the case #2 above is quite difficult from the perspective of hyperspy. There is nothing that says hyperspy cannot have a signal of signals similar to how numpy can have a ragged array of arrays. I considered something like that here but I am not sure if there is a consistent way to approach that. For now there hasn't really been a real need for it so I haven't really paid it much attention.

@ericpre
Copy link
Member

ericpre commented May 30, 2023

Just for me to make sure that I don't misunderstand, @sem-geologist what you are asking is to have a signal, which contains BSE and SE images stacked together and another one which contains the EBSD or EDS data?
Could it be easily stack after loading the data by the library itself, at a stage, where it is known how the data needs to be structure/handle. I am not sure that this is the responsibility of RosettaSciIO to define this?

Currently, in hyperspy, these are considered as treated as different dataset and they expected to be handle at the workflow level (script, notebook, library etc.). Maybe what RosettaSciIO could do is to assign some dataset to a specify category: "main", "auxiliary" (data acquired simultaneously but not "main", or already processed data, i.e. map from a spectrum image), "survey", etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants