Support for SPC files from Shimadzu instruments #102
I wrote the spc converter @ximeg mentioned above. I've put up the source code here in this repo. The basic idea of the format is that it breaks the file into 512-byte sectors. These are organized into streams with allocation tables. One of these streams is filled with directory entries, which give organization to the information in the file. The directory entries form a tree structure, and some contain pointers out to the data they describe. To extract the spectra, we basically find the directory entry for the data (e.g. `X Data.1`) and follow its pointer to the data. This diagram illustrates the general structure, with the streams, allocation tables, and directories. I still haven't quite remembered how I got to the data itself from the directories, but I'll update when I do. For now, here's some more documentation on the file format in general; I'd recommend the first one in particular.
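The sector layout described above is visible right in the file header. As a rough sketch (Python, since the existing converter is Python; the offsets follow the published OLE CF / MS-CFB spec, nothing Shimadzu-specific), the 512-byte header tells you the sector size and where the directory stream begins:

```python
import struct

OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def parse_ole_header(header: bytes) -> dict:
    """Decode a few fields of the 512-byte OLE CF header."""
    if header[:8] != OLE_MAGIC:
        raise ValueError("not an OLE CF file")
    # sector-size exponents at offset 30, FAT count / directory start at 44
    sector_shift, mini_shift = struct.unpack_from("<HH", header, 30)
    num_fat_sectors, first_dir_sector = struct.unpack_from("<II", header, 44)
    return {
        "sector_size": 1 << sector_shift,      # 512 for these files
        "mini_sector_size": 1 << mini_shift,   # usually 64
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }
```

(The function and key names are made up for illustration; only the offsets and the magic bytes come from the spec.)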
Hi @uri-t, OLE CF seems to be quite a complex file format. Ideally we would like to find a generic open-source OLE reader for R and adapt it to import Shimadzu SPC; this would save us a ton of effort. Here is an example output of running oledir:

```
$ oledir 112.spc
oledir 0.54 - http://decalage.info/python/oletools
OLE directory entries in file 112.spc:
----+------+-------+----------------------+-----+-----+-----+--------+------
id  |Status|Type   |Name                  |Left |Right|Child|1st Sect|Size
----+------+-------+----------------------+-----+-----+-----+--------+------
  0 |<Used>|Root   |Root Entry            |-    |-    |1    |6       |2560
  1 |<Used>|Stream |Contents              |2    |3    |-    |5       |4
  2 |<Used>|Storage|Version               |-    |-    |10   |0       |0
  3 |<Used>|Stream |\x05SummaryInformation|5    |4    |-    |2       |132
  4 |<Used>|Stream |DataStorageHeaderInfo |-    |-    |-    |1       |4
  5 |<Used>|Storage|DataStorage1          |-    |6    |7    |0       |0
  6 |<Used>|Stream |NumberofSaved         |-    |-    |-    |0       |4
  7 |<Used>|Stream |DataStorageName       |8    |-    |-    |6       |15
  8 |<Used>|Storage|DataSetGroup          |-    |-    |12   |0       |0
  9 |<Used>|Stream |CLSID                 |-    |-    |-    |9       |16
 10 |<Used>|Stream |Module Version        |9    |11   |-    |8       |8
 11 |<Used>|Stream |File Format Version   |-    |-    |-    |7       |4
 12 |<Used>|Stream |DataSetGroupHeaderInfo|13   |-    |-    |A       |4
 13 |<Used>|Storage|DataSet1              |-    |-    |14   |0       |0
 14 |<Used>|Stream |DataSetHeaderInfo     |15   |16   |-    |B       |125
 15 |<Used>|Storage|MethodStorage         |-    |-    |32   |0       |0
 16 |<Used>|Storage|DataSpectrumStorage   |19   |18   |28   |0       |0
 17 |<Used>|Storage|DataPeakPickStorage   |-    |-    |27   |0       |0
 18 |<Used>|Storage|DataPointPickStorage  |-    |-    |25   |0       |0
 19 |<Used>|Storage|DataAreaCalcStorage   |20   |17   |24   |0       |0
 20 |<Used>|Storage|DataHistoryStorage    |-    |-    |23   |0       |0
 21 |<Used>|Stream |HistoryVersion        |-    |-    |-    |F       |4
 22 |<Used>|Stream |HistoryHeader         |-    |-    |-    |E       |4
 23 |<Used>|Stream |DataSetHistory        |22   |21   |-    |D       |63
 24 |<Used>|Stream |AreaCalcRegions       |-    |-    |-    |10      |50
 25 |<Used>|Stream |PointPickData         |-    |26   |-    |12      |4
 26 |<Used>|Stream |PointPickColWidths    |-    |-    |-    |11      |16
 27 |<Used>|Stream |PeakPickPAV           |-    |-    |-    |13      |288
 28 |<Used>|Stream |Version               |30   |29   |-    |18      |4
 29 |<Used>|Storage|DataHeader            |-    |-    |40   |0       |0
 30 |<Used>|Storage|Data                  |-    |-    |39   |0       |0
 31 |<Used>|Stream |Contents              |-    |-    |-    |25      |60
 32 |<Used>|Stream |PageTexts0            |31   |34   |-    |22      |171
 33 |<Used>|Stream |PageTexts1            |-    |-    |-    |20      |112
 34 |<Used>|Stream |PageTexts2            |33   |35   |-    |1C      |245
 35 |<Used>|Stream |PageTexts3            |-    |36   |-    |1B      |53
 36 |<Used>|Stream |PageTexts4            |-    |-    |-    |19      |95
 37 |<Used>|Stream |Data Header.1         |-    |-    |-    |26      |8
 38 |<Used>|Stream |X Data.1              |-    |-    |-    |2A      |11208
 39 |<Used>|Stream |Y Data.1              |38   |37   |-    |14      |11208
 40 |<Used>|Stream |Header Info           |-    |-    |-    |27      |61
 41 |unused|Empty  |                      |-    |-    |-    |0       |0
 42 |unused|Empty  |                      |-    |-    |-    |0       |0
 43 |unused|Empty  |                      |-    |-    |-    |0       |0
----+----------------------------+------+--------------------------------------
id  |Name                        |Size  |CLSID
----+----------------------------+------+--------------------------------------
  0 |Root Entry                  |-     |
  3 |  \x05SummaryInformation    |132   |
  1 |  Contents                  |4     |
  5 |  DataStorage1              |-     |
  8 |    DataSetGroup            |-     |
 13 |      DataSet1              |-     |7FAC4E0B-5987-11D0-954C-0800096B7523
 19 |        DataAreaCalcStorage |-     |60F779CB-D341-11CF-91E2-0800096BCA1F
 24 |          AreaCalcRegions   |50    |
 20 |        DataHistoryStorage  |-     |
 23 |          DataSetHistory    |63    |
 22 |          HistoryHeader     |4     |
 21 |          HistoryVersion    |4     |
 17 |        DataPeakPickStorage |-     |D069DE03-FFBB-11CF-A7AD-0800096A3C5E
 27 |          PeakPickPAV       |288   |
 18 |        DataPointPickStorage|-     |2303D603-1C5B-11D0-9649-0800096BAA1D
 26 |          PointPickColWidths|16    |
 25 |          PointPickData     |4     |
 14 |        DataSetHeaderInfo   |125   |
 16 |        DataSpectrumStorage |-     |1851B2E3-83F4-11CF-BD45-0800096B1920
 30 |          Data              |-     |
 37 |            Data Header.1   |8     |
 38 |            X Data.1        |11208 |
 39 |            Y Data.1        |11208 |
 29 |          DataHeader        |-     |
 40 |            Header Info     |61    |
 28 |          Version           |4     |
 15 |        MethodStorage       |-     |
 31 |          Contents          |60    |
 32 |          PageTexts0        |171   |
 33 |          PageTexts1        |112   |
 34 |          PageTexts2        |245   |
 35 |          PageTexts3        |53    |
 36 |          PageTexts4        |95    |
 12 |      DataSetGroupHeaderInfo|4     |
  7 |    DataStorageName         |15    |
  4 |  DataStorageHeaderInfo     |4     |
  6 |  NumberofSaved             |4     |
  2 |  Version                   |-     |
  9 |    CLSID                   |16    |
 11 |    File Format Version     |4     |
 10 |    Module Version          |8     |
```
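Each row of that oledir listing is one 128-byte directory entry from the directory stream. A minimal sketch of decoding a single entry (offsets per the OLE CF / MS-CFB spec; the function and dict keys here are just for illustration):

```python
import struct

OBJECT_TYPES = {0: "Empty", 1: "Storage", 2: "Stream", 5: "Root"}

def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 128-byte OLE directory entry (one row of the oledir output)."""
    name_len = struct.unpack_from("<H", entry, 64)[0]           # bytes, incl. NUL
    name = entry[: max(name_len - 2, 0)].decode("utf-16-le")
    obj_type = entry[66]
    left, right, child = struct.unpack_from("<iii", entry, 68)  # sibling/child links
    start_sector, size = struct.unpack_from("<II", entry, 116)  # where the data lives
    return {
        "name": name,
        "type": OBJECT_TYPES.get(obj_type, "?"),
        "left": left, "right": right, "child": child,
        "first_sector": start_sector,
        "size": size,
    }
```

So the `X Data.1` row above says: a Stream, 11208 bytes, starting at sector 0x2A, which is exactly the pointer one follows to pull out the spectrum.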
I definitely agree that something off-the-shelf would be ideal. I did a bit of looking and couldn't find any R packages for this :( I tried the antiword library from CRAN, but it only accepts OLE files that are actually Word documents, alas. I'm guessing you might have better luck finding something, given that you're not a stranger to R like me. If there ends up not being anything available in R, would it be possible to wrap the Python library you linked to so it can be used in R? This in particular looks pretty promising.
Thanks for the suggestions, we have to think about how this can be solved. For now I just call the Python bulk converter to make CSV files from SPC, and then I read them into R. We can always call Python from R, this is not a problem. We can even pass data back and forth between Python and R without creating any intermediate files. So far I see several options for how we can address this.

**Option 1: add a Python script**

This solution looks kinda dirty, but requires minimal effort.

**Option 2: translate the Python script into R**

Another option would be to re-write @uri-t's Python script in R. I think this would take about a week to do, including creation of unit tests and writing Roxygen docs and vignettes. The downside is that this script looks a bit hacky, but it does work, at least with files from the UV-2600. I am not sure whether it supports all possible combinations of Shimadzu parameters and metadata, but it is in any case a good starting point.

**Option 3: implement a generic OLE CF file reader**

We could implement a generic R reader for OLE CF files as a separate project, and use it to import Shimadzu SPC. This would benefit the whole R community, not only hyperSpec users. However, this is a tremendous amount of work, and I believe we don't have the resources for that. We could go with option 1 and then replace it later with option 2.

@cbeleites, do you have any opinion on this topic?
A small update from my end which might inform this decision: over the past few days I've written part of a more general OLE reader. Right now it's relatively short and can do most of the things we'd need from a generic parser: building the directory tree and retrieving the data corresponding to each entry. Based on this, it seems like building a generic OLE file reader might not be so bad; not too much more effort than translating the existing script, at least. More generally, now that I have a good handle on the format, I'll upload a more complete explanation either later tonight or tomorrow. This might be useful for either option 2 or 3.
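For what it's worth, the "retrieving the data" part is essentially walking a FAT chain: each FAT entry names the next sector of the stream, until an end-of-chain marker. A toy sketch (hypothetical helper, assuming 512-byte sectors with sector 0 starting right after the header, as in these files):

```python
ENDOFCHAIN = 0xFFFFFFFE  # FAT marker for the last sector of a chain

def read_stream(data: bytes, fat: list, first_sector: int,
                size: int, sector_size: int = 512) -> bytes:
    """Concatenate a stream's sectors by following its FAT chain."""
    out = bytearray()
    sect = first_sector
    while sect != ENDOFCHAIN:
        offset = (sect + 1) * sector_size  # +1 skips the 512-byte header
        out += data[offset:offset + sector_size]
        sect = fat[sect]                   # next sector in the chain
    return bytes(out[:size])               # trim padding in the final sector
```

(Real files also use a mini-FAT for streams under 4096 bytes, which this sketch ignores; the two data streams here are 11208 bytes, so they live in the regular FAT.)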
@cbeleites I noticed there is a file
First of all, @uri-t, thanks for helping - your experience with those files is valuable for us! @ximeg, yes, I have opinions on the topic :-)
These files have the same internal organization, only the data differs. Yes, one is enough!
The structure doesn't vary, but when the files get bigger there are a couple of extra cases the code has to handle (since the sector allocation tables get filled up and have to expand). There's a hulking 9 MB file here that should work as a test file for these cases. I've also uploaded a collection of files people have tried to convert on my website here. This should give us better coverage in case other Shimadzu spectrometers have different directory structures for some reason. Of the 903 files, 341 are OLE files, and 46 are the Applesoft Basic files that you've seen. I'm not sure what the rest are, but some are likely Galactic SPC files.
Also on the issue of test coverage, I've extracted the instrument information for all the OLE files in the folder I linked to above. I realized that @ximeg and I have the same UV-2600 model, but other people have tried the UV-1700, UV-1800, and UV-1900 models, so we have examples from those instruments as well. The instrument info is in
I think the discussion should continue here:
`read.spc` does not work with SPC files from Shimadzu spectrometers, because they use a proprietary binary format. The displayed error message is confusing. I suggest to detect this file format (the first four bytes are `D0 CF 11 E0`) and display an error message that says 'Support for the Shimadzu SPC file format (OLE CF) is not yet implemented'.

After that we can try to implement an import filter for these files. There is experimental support for the Shimadzu SPC format in the spc Python module, which we can look at. There is also an online converter for Shimadzu SPC files; I emailed the author regarding the availability of its source code.
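The proposed detection itself is tiny. Sketched in Python here for brevity (the real check would of course go into `read.spc` on the R side; the function names are made up):

```python
OLE_MAGIC = b"\xd0\xcf\x11\xe0"  # first four bytes of every OLE CF container

def is_ole_cf(path: str) -> bool:
    """True if the file starts with the OLE CF signature D0 CF 11 E0."""
    with open(path, "rb") as fh:
        return fh.read(4) == OLE_MAGIC

def read_spc_guard(path: str) -> None:
    """Fail early with a clear message instead of a confusing parse error."""
    if is_ole_cf(path):
        raise NotImplementedError(
            "Support for Shimadzu SPC file format (OLE CF) is not yet implemented")
```

Note that `D0 CF 11 E0` identifies any OLE CF container (Word documents too), so the message should say "OLE CF", as suggested, rather than claim the file is definitely from a Shimadzu instrument.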
Attached there are four SPC files from our Shimadzu UV-2600 spectrometer:
Shimadzu_UV-2600.zip