MontSection and FrameSet files are incompatible with single df model #13

Open
alisterburt opened this issue Apr 12, 2023 · 6 comments

@alisterburt
Collaborator

Support was recently added for montage section and frameset mdoc files in #10 and #11

In an effort to keep the 'single dataframe' model, I added extra fields to the section data, but this means we now have two incompatible data types in the dataframe and a lot of NaNs.

It seems like ZValue is used across all types of mdoc files when there are sets of images.
For FrameSet/MontSection files would it be better to return a dict[str, pd.DataFrame] with one df stored under ZValue and one stored at FrameSet/MontSection?

Given that there only ever seems to be one entry for FrameSet/MontSection maybe that table is better as a dict than a dataframe?
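To make the dict[str, pd.DataFrame] idea concrete, here is a minimal sketch. It assumes the parser has already produced one record per section, tagged with its section type; the `sections` list, the `"section"` key and the `split_by_section` helper are hypothetical names for illustration, not part of the existing API.

```python
import pandas as pd

# Hypothetical parsed sections: one dict per mdoc section, tagged with the
# section type it came from. Values are made up for illustration.
sections = [
    {"section": "FrameSet", "index": 0, "TiltAngle": 32.986, "NumSubFrames": 243},
    {"section": "ZValue", "index": 0, "TiltAngle": 33.0, "ExposureDose": 7.66184},
    {"section": "ZValue", "index": 1, "TiltAngle": 33.0, "ExposureDose": 7.66184},
]


def split_by_section(records: list[dict]) -> dict[str, pd.DataFrame]:
    """Group section records by type and build one DataFrame per type."""
    grouped: dict[str, list[dict]] = {}
    for record in records:
        # Drop the tag itself; it becomes the key of the returned dict.
        fields = {k: v for k, v in record.items() if k != "section"}
        grouped.setdefault(record["section"], []).append(fields)
    return {name: pd.DataFrame(rows) for name, rows in grouped.items()}


tables = split_by_section(sections)
print(tables["ZValue"])    # only ZValue fields, no NaN padding
print(tables["FrameSet"])  # single-row table
```

Each table then only carries the columns that its own section type defines, which is what avoids the NaN padding.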

Thoughts @jojoelfe @m-albert?

@jojoelfe
Contributor

At least for the FrameSet with multiple ZValue, my preference would be to return one DataFrame with rows only for the ZValue sections, but to include the columns from the FrameSet. So this:

ZValue FrameSet TiltAngle StagePosition StageZ Magnification MagIndex Intensity ExposureDose DoseRate SpotSize Defocus TargetDefocus ImageShift RotationAngle ExposureTime Binning UsingCDS CameraIndex DividedBy2 LowDoseConSet PriorRecordDose SubFramePath NumSubFrames DateTime FilterSlitAndLoss Voltage titles
0 nan 0.0 32.986 (-142.831, -457.473) -13.7203 33000.0 26.0 0.11378 291.302 2.15073 9.0 -7.933 -4.5 (0.0, 0.0) 178.414 40.9854 0.5 False 1.0 False 4.0 nan X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif 243.0 08-Oct-21 07:47:29 (20.0, 0.0) 300.0 []
1 0.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 0.0 nan 300.0 []
2 1.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 7.66184 nan 300.0 []
3 2.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 15.3237 nan 300.0 []
4 3.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 22.9855 nan 300.0 []
5 4.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 30.6473 nan 300.0 []
6 5.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 38.3092 nan 300.0 []
7 6.0 nan 33.0 nan nan nan nan 7.55522 nan nan nan nan nan nan nan nan nan 45.971 nan 300.0 []
8 7.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 53.5262 nan 300.0 []
9 8.0 nan 33.0 nan nan nan nan 7.54812 nan nan nan nan nan nan nan nan nan 61.1881 nan 300.0 []
10 9.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 68.7362 nan 300.0 []
11 10.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 76.398 nan 300.0 []
12 11.0 nan 33.0 nan nan nan nan 7.66894 nan nan nan nan nan nan nan nan nan 84.0599 nan 300.0 []
13 12.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 91.7288 nan 300.0 []
14 13.0 nan 33.0 nan nan nan nan 8.10961 nan nan nan nan nan nan nan nan nan 99.3906 nan 300.0 []
15 14.0 nan 33.0 nan nan nan nan 7.66894 nan nan nan nan nan nan nan nan nan 107.5 nan 300.0 []
16 15.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 115.169 nan 300.0 []
17 16.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 122.831 nan 300.0 []
18 17.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 130.493 nan 300.0 []
19 18.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 138.155 nan 300.0 []
20 19.0 nan 33.0 nan nan nan nan 7.66184 nan nan nan nan nan nan nan nan nan 145.817 nan 300.0 []

becomes this:

ZValue FrameSet TiltAngle StagePosition StageZ Magnification MagIndex Intensity ExposureDose DoseRate SpotSize Defocus TargetDefocus ImageShift RotationAngle ExposureTime Binning UsingCDS CameraIndex DividedBy2 LowDoseConSet PriorRecordDose SubFramePath NumSubFrames DateTime FilterSlitAndLoss Voltage titles
1 0.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 0.0 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
2 1.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 7.66184 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
3 2.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 15.3237 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
4 3.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 22.9855 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
5 4.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 30.6473 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
6 5.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 38.3092 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
7 6.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.55522 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 45.971 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
8 7.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 53.5262 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
9 8.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.54812 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 61.1881 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
10 9.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 68.7362 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
11 10.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 76.398 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
12 11.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66894 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 84.0599 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
13 12.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 91.7288 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
14 13.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 8.10961 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 99.3906 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
15 14.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66894 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 107.5 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
16 15.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 115.169 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
17 16.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 122.831 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
18 17.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 130.493 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
19 18.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 138.155 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
20 19.0 [0.0] 33.0 [-142.831, -457.473] [-13.7203] [33000.0] [26.0] [0.11378] 7.66184 [2.15073] [9.0] [-7.933] [-4.5] [0.0, 0.0] [178.414] [40.9854] [0.5] [False] [1.0] [False] [4.0] 145.817 [PosixPath('X:\Johannes_20211007\grid2_lamella3\frames\s_mmm_01013_33.0_Oct08_07.46.46.tif')] [243.0] ['08-Oct-21 07:47:29'] [20.0, 0.0] 300.0 []
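As a rough sketch of that transformation (not the code that produced the tables above), the gaps in the ZValue rows can be filled from the single FrameSet row with pandas. The toy frame below uses a handful of the columns with illustrative values, and it keeps scalars rather than the list-wrapped values shown in the output above.

```python
import pandas as pd

NAN = float("nan")

# Toy stand-in for the mixed table: row 0 is the FrameSet section, the
# remaining rows are ZValue sections. Values are illustrative only.
df = pd.DataFrame(
    {
        "ZValue": [NAN, 0.0, 1.0],
        "FrameSet": [0.0, NAN, NAN],
        "TiltAngle": [32.986, 33.0, 33.0],
        "Magnification": [33000.0, NAN, NAN],
        "ExposureDose": [291.302, 7.66184, 7.66184],
    }
)

# Assumes exactly one FrameSet entry in the file.
frameset_row = df[df["FrameSet"].notna()].iloc[0]
zvalue_rows = df[df["ZValue"].notna()].copy()

# Fill only the gaps: a value already present on a ZValue row is kept,
# so TiltAngle and ExposureDose stay per-ZValue.
fill_values = frameset_row.dropna().to_dict()
flat = zvalue_rows.fillna(fill_values)
print(flat)
```

Note that with this approach a column present in both sections silently keeps the ZValue value and drops the FrameSet one.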

@m-albert

@alisterburt thanks for the ping :)

@jojoelfe I might be misunderstanding the design you mention, but in this case wouldn't sections other than ZValue be unable to have their own values for shared columns?

I think I like the option of returning a dict[str, pd.DataFrame] regardless of whether or not it contains non-ZValue sections, because:

  • it's general yet explicit,
  • it reduces the need to fill in NaNs, and
  • it represents a minimal change with respect to before (i.e. it still provides easy and convenient access to a dataframe containing only ZValues).

@alisterburt
Collaborator Author

Thanks for the feedback both! It's great to see what everyone would want

@jojoelfe replicating the data from the MontSection or FrameSet entry across all ZValues seems like the nicest API to me, but some column names clash and hold different data in the two tables, and I'm not 100% sure of the best way to handle that.

Some options

  1. replicate values from the FrameSet section across new columns with a FrameSet prefix
  2. same as above, but only add the prefix for clashing names

I think I prefer option 1 of these two as it's less magical and more predictable
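Option 1 could look roughly like the sketch below. It assumes a single-row section table; the `merge_with_prefix` helper, the `FrameSet_` prefix spelling and the example values are hypothetical choices for illustration, not a settled naming scheme.

```python
import pandas as pd

# Illustrative tables; in practice these would come from the parsed mdoc file.
zvalue_df = pd.DataFrame(
    {
        "ZValue": [0, 1, 2],
        "TiltAngle": [33.0, 33.0, 33.0],
        "ExposureDose": [7.66, 7.66, 7.66],
    }
)
frameset_df = pd.DataFrame(
    {"FrameSet": [0], "TiltAngle": [32.986], "NumSubFrames": [243]}
)


def merge_with_prefix(
    zvalues: pd.DataFrame, section: pd.DataFrame, prefix: str
) -> pd.DataFrame:
    """Copy every column of a single-row section table onto every ZValue
    row, always under a prefixed name (option 1)."""
    if len(section) != 1:
        raise ValueError("this sketch assumes exactly one section entry")
    prefixed = section.add_prefix(prefix)
    # A cross join pairs the single section row with each ZValue row.
    return zvalues.merge(prefixed, how="cross")


merged = merge_with_prefix(zvalue_df, frameset_df, prefix="FrameSet_")
print(merged.columns.tolist())
```

Because every section column gets the prefix, the clashing TiltAngle values survive side by side as TiltAngle and FrameSet_TiltAngle, and the output columns depend only on which fields the file contains, not on their values.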

@m-albert because the MontSection DataFrame would only ever have one row I think I prefer the merging strategy overall, would solution 1 above be an okay solution for you?

Final question would be on how we handle single FrameSet files (example added by @jojoelfe) - I think saying that ZValue is the special case which gets no prefix and FrameSet/MontSection always get a prefix feels natural/predictable

If we've got consensus here I'll go ahead and implement, unless either of you particularly wants to submit a PR? Happy either way, just let me know

@m-albert

Solution 1 would definitely be okay with me, thanks for asking :)

@jojoelfe
Contributor

Number 1 is definitely ok. I wonder if it would make sense to check if the values are the same for a column in FrameSet/MontSection and ZValues, and then to skip adding a column with prefix. This seems to be the case for a lot of columns in the MontSection files.
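A minimal sketch of that check, assuming a single section entry with scalar field values (tuple-valued fields such as StagePosition would need a different comparison). The `add_section_columns` helper and the `FrameSet_` prefix are hypothetical.

```python
import pandas as pd

zvalue_df = pd.DataFrame(
    {"ZValue": [0, 1], "Voltage": [300.0, 300.0], "TiltAngle": [33.0, 33.0]}
)
# Single-entry section, represented as a plain dict of field -> scalar value.
section = {"Voltage": 300.0, "TiltAngle": 32.986, "NumSubFrames": 243}


def add_section_columns(
    zvalues: pd.DataFrame, section_fields: dict, prefix: str
) -> pd.DataFrame:
    """Add prefixed section columns, skipping any field whose value is
    already identical on every ZValue row."""
    out = zvalues.copy()
    for name, value in section_fields.items():
        if name in out.columns and (out[name] == value).all():
            continue  # same everywhere: a prefixed copy adds nothing
        out[prefix + name] = value
    return out


result = add_section_columns(zvalue_df, section, "FrameSet_")
print(result.columns.tolist())
```

Here Voltage is skipped because it matches on every row, while TiltAngle and NumSubFrames get prefixed columns. One trade-off is that the set of output columns then depends on the data in each file.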

> because the MontSection DataFrame would only ever have one row I think I prefer the merging strategy overall, would solution 1 above be an okay solution for you?

This is not necessarily true. SerialEM allows for multiple montages in one mrc file. I've submitted PR #14 to add a test for such a case. I'm not sure what the best approach would be there. I still think merging and presenting a flat DataFrame would be best, but I would be happy with any option.

@alisterburt
Collaborator Author

brill - I'll have a play around this weekend then! Thanks again both


3 participants