Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/API Separate reading of file from storing file data #335

Open
drewejohnson opened this issue Sep 6, 2019 · 0 comments
Open

ENH/API Separate reading of file from storing file data #335

drewejohnson opened this issue Sep 6, 2019 · 0 comments
Labels
API - Compatible Changes to our API that require no user actions API - Incompatible Incompatible changes to our API that require user actions discussion pythonicness

Comments

@drewejohnson
Copy link
Collaborator

There is a lot of clutter on some of our reader classes that are not necessary as they strictly pertain to file locations and information related to reading the file. Once the file is read, we don't need to store how many times we counted past a block, like results and branching files. Furthermore, if we want to expand the API to export to alternative storage like HDF5, it would make sense to be able to recreate the readers and objects back from this file.

I propose that we break most if not all of the current readers into two classes: those responsible for reading files and those responsible for storing file data. Then the latter class doesn't care how the data was obtained, just that we have the data necessary for accessing and plotting etc. We can use a class method to create these file-objects using their associated readers. To demonstrate,

class ResultsReader:
    """Class for reading results files"""

    def __init__(self, filepath):
        # do some prep-work on the file, checking for existence?
    def read(self):
        # read the file and create a ResultsFile object
        return resultsFile

class ResultsFile:
    """Class for storing results data"""
    def __init__(self):
        self.metadata = {}
        self.universes = {}
        self.resultsData = {}

    @classmethod
    def fromSerpent(cls, fileP):
        reader = ResultsReader(fileP)
        return reader.read()

Then the main read function serpentTools.read just needs to know the expected File type and we can use fileType.fromSerpent(inputFile) and we're off to the races.
This can allow us to experiment with different file readers without mucking up how file data is stored, and makes it very clear what each object does: Readers read, Files hold file data. Maybe less clear for the file...

Easier support for additional export/input file types

Down the road, if we add exporting to HDF5, we can have companion methods toHdf5 and fromHdf5. This would support the following workflow

  1. Run some potentially large job on a cluster
  2. Compress outputs into one or more alternative files, potentially taking advantage of filtering, e.g. input_res.h5, input_dep.h5 or outputs.h5
  3. Retrieve smaller compressed file[s] from cluster to local work station
  4. Re-open the files using fromHdf5 for further post processing and analysis

Level of incompatibility

Ideally, anyone using serpentTools.read should not experience any difference on a basic level. The object returned from serpentTools.read is expected to behave exactly like the original. The issues will come if people are doing type checking or subclassing our readers. I'm not sure how many people are doing this, but I will reach out to the users list and ask people to comment when we think this through more seriously.

Phasing the changes in

We can rename our current readers to their file-like companions, e.g. ResultsReader ➡️ ResultsFile so that the output of serpentTools.read will not change as we introduce these changes. Then, we can add the original readers back as aliases to their replacements, with deprecation warnings is someone tries to initialize / subclass from them.

class ResultsReader(ResultsFile):
    def __init__(self, *args, **kwargs):
        warn("Future versions will restrict the ResultsReader simply to file reading, not long-term storage", 
          FutureWarning)
        ResultsFile.__init__(self, *args, **kwargs)

We could also add in the fromSerpent class method to the ResultsFile-like objects to at least provide the method, even if it does refer back to itself.

@drewejohnson drewejohnson added pythonicness discussion API - Compatible Changes to our API that require no user actions API - Incompatible Incompatible changes to our API that require user actions proposal labels Sep 6, 2019
@drewejohnson drewejohnson added this to the 0.10.0 milestone Nov 25, 2019
drewejohnson added a commit to drewejohnson/serpent-tools that referenced this issue Dec 29, 2019
Living in serpentTools/base.py, this class provides a very
basic interface that will begin the transition in separating
file objects and file readers [GH CORE-GATECH-GROUP#335]. Concrete classes must
provide a single class method, ``fromSerpent``, that should perform
the following actions:

1. Accept a string or pathlib.Path object representing the file
   name, or a readable stream to be read
2. Create a new instance of the SerpentFile, e.g. DetectorFile,
3. Process the data in the file or stream, and
4. Return the SerpentFile instance with the new data

A helper function is also provided that can handle this flexibility
in input arguments - serpentTools.base.getStream. Used as a context
manager, this function ensures a readable stream is returned.

Future commits will begin implementing concrete classes piecewise.
drewejohnson added a commit to drewejohnson/serpent-tools that referenced this issue Aug 31, 2020
Interface outside of the rc object for expanding variable groups.
Abstracted away because the setting interface will eventually
be removed (GH CORE-GATECH-GROUP#339) as each reader (built from CORE-GATECH-GROUP#335 and CORE-GATECH-GROUP#400)
will control their own settings.
drewejohnson added a commit to drewejohnson/serpent-tools that referenced this issue Aug 31, 2020
Interface outside of the rc object for expanding variable groups.
Abstracted away because the setting interface will eventually
be removed (GH CORE-GATECH-GROUP#339) as each reader (built from CORE-GATECH-GROUP#335 and CORE-GATECH-GROUP#400)
will control their own settings.
drewejohnson added a commit to drewejohnson/serpent-tools that referenced this issue Aug 31, 2020
Interface outside of the rc object for expanding variable groups.
Abstracted away because the setting interface will eventually
be removed (GH CORE-GATECH-GROUP#339) as each reader (built from CORE-GATECH-GROUP#335 and CORE-GATECH-GROUP#400)
will control their own settings.
@drewejohnson drewejohnson removed this from the 0.10.0 milestone Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Compatible Changes to our API that require no user actions API - Incompatible Incompatible changes to our API that require user actions discussion pythonicness
Projects
None yet
Development

No branches or pull requests

1 participant