Feature/lookup #78
Open: staleyLANL wants to merge 55 commits into develop from feature/lookup
Some files aren't finished yet, and will be uploaded in a few days. So, this particular update won't build!!

Added broad infrastructure for HDF5 handling...
...Wrapper class
...Assignment operators
...convert() functions
...Constructors
...read() functions
...write() functions
Etc.

Completed many (not all) HDF5 capabilities. Very basic HDF5 write needs to be smarter (not string-only). HDF5 reading still needs work. Touchups are needed here and there throughout the new HDF5 code. In particular, we need smart, non-string handling.

Small tweaks were made to XML and JSON material, for consistency. Some comment changes and other odds and ends.
To be used for non-GNDS products, e.g. NDI3. When ProjectDir is given, namespaces and various other constructs are done differently. Made miscellaneous small improvements in various other files. For example: direct read() and write() in Component, aligning with Node's. So: autogenerated classes have those read()s and write(); there's no need for a user to explicitly go through Node.
Won't compile right now, but wanted to get this in.
For [optional] vector child-node fields, there's a setter that takes an element to push_back. If the optional has no value, it's given one first. Experience showed that something like this would be helpful. Updated code generator accordingly. Also, combined the default and "from fields" constructors in generated classes. This makes the generated classes shorter, and also, based on our experience with a prototype data format, proves to make object construction simpler. A generated class might have, for instance, a few metadata, followed by one or more vector or optional-vector child nodes. It's convenient for someone to initialize the object with metadata, and then to add vector elements individually later. This change helps make this process simpler and shorter. I also implemented another feature that experience with a prototype autogenerated set of classes suggested would be helpful. Before, getters with index or label looked for *metadata* named "index" or "label". Now, if no metadatum called "index" exists in the type of the vector's element (or any of its alternatives, if it's a variant), then index is interpreted to be a regular vector index, as in the "[]" operator. Note that someone who uses the code generator must be aware that the interpretation of (index) getters therefore depends fundamentally on whether or not the relevant vector element type has, or doesn't have, an "index" metadatum. This change means we essentially get an additional feature for free, as (index) previously would just fail unless an "index" metadatum were found. Removed a "GNDS"-centric comment in Component prettyprinted output. The comment probably wasn't particularly informative anyway. Updated test codes to reflect this change.
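To make the two getter/setter behaviors described above concrete, here is a minimal sketch. It is not generated GNDStk code: `Element`, `Isotope`, and all member names are hypothetical stand-ins, and the real generated classes derive from `Component` and are considerably richer. The sketch only shows (1) a setter that gives an empty optional vector a value before pushing back, and (2) an (index) getter that falls back to plain `[]`-style indexing because the element type has no "index" metadatum.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type. It deliberately has no "index" metadatum.
struct Isotope {
   std::string name;
};

// Hypothetical stand-in for a generated class: one metadatum, followed by
// one optional-vector child node.
class Element {
   std::string symbol_;
   std::optional<std::vector<Isotope>> isotope_;

public:
   // Combined default and "from fields" constructor.
   explicit Element(
      std::string symbol = "",
      std::optional<std::vector<Isotope>> isotope = std::nullopt
   ) :
      symbol_(std::move(symbol)),
      isotope_(std::move(isotope))
   { }

   // Setter that takes one element to push_back. If the optional has no
   // value yet, it is given an empty vector first.
   Element &isotope(const Isotope &elem)
   {
      if (!isotope_.has_value())
         isotope_.emplace();
      isotope_->push_back(elem);
      return *this;
   }

   // Getter for the whole optional vector.
   const std::optional<std::vector<Isotope>> &isotope() const
   {
      return isotope_;
   }

   // (index) getter. Isotope has no "index" metadatum, so the argument is
   // treated as a regular vector position.
   const Isotope &isotope(const std::size_t n) const
   {
      return isotope_.value().at(n);
   }
};
```

With something like this, an object can be initialized with its metadata only, as in `Element e("H");`, and elements can be added individually later with `e.isotope(Isotope{"H1"})`.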
Regenerated the prototype codes. The changes to these also reflect other work I'd done recently on the code generator, as they hadn't been regenerated for a while.
That is to say, classes that have no metadata or child nodes, but still have data, e.g. with <foo>1.2 3.4 5.6</foo>. OCD'd a cmake-related file. Some more work regarding HDF5 format. Still not entirely complete. I think ".hdf"/".HDF" aren't used as HDF5 file extensions. They need a '5'. Made some diagnostic-message terminology more consistent.
Isn't very smart for output; just keeps everything as strings. That will be remedied at a later time.

Was tested for functionality on entire set of GNDS v1.9 files. All converted.
...original total XML size: 2.9G
...new total HDF5 size: 21G
So, a little over a factor of 7 larger. :-/

Miscellaneous other code changes were put in place during this work. Made a few diagnostic messages, and other minor constructs here and there, more consistent with one another. Added a convert() that proved to be needed for a certain disambiguation. This was noticed while working on some HDF5 input capabilities.
The former was taken from GNDS, but the latter is much more descriptive.
This is the first of a few simplifications I've been planning to do with the code base. Having `Child<>` objects contain a flag, indicating "is a node of this name allowable as a top-level GNDS node," originally seemed like a good idea. With our later decision to autogenerate GNDS Standard Interfaces from GNDS specs, having such a capability in the Core Interface really isn't necessary. The Core Interface can be safely generic, uncluttered with the capability of checking such a thing. Users will prefer our GNDS Standard Interfaces, which, having been generated from GNDS specifications, will be designed so that GNDS hierarchies will have the correct structure. Our Code Generator is also being used to design another (non-GNDS) library, and more such uses may follow. Checking if a root node has a valid *GNDS* name clearly wouldn't be wanted for other libraries. For this reason, we'd already switched off the check by default. At this point, however, we don't think that having even the ability to check such a thing, in the Core Interface, is worth the extra clutter that it creates in the code base.
The removed material was for a very old capability that was intended to help users make Meta<> and Child<> objects. I hadn't given it much in the way of capabilities yet, but then it was all superseded by the ability to easily manipulate `Meta<>` and `Child<>` objects, e.g. with constructs such as `int{}/Meta<>(...)` to change the type associated with the `Meta<>` object. The old keyword builder stuff never got much use, didn't have much to offer, and simply isn't needed any longer -- better capabilities exist, and in fact have existed for a long time. I just hadn't gotten around to removing the old code. Removing it means a bit less code, one less executable to build in the test suite, and nothing really lost.
Gets rid of the annoying compiler warning about `tmpnam()`. The new formulation required a number of changes in the logic here and there.
Made certain HDF5-related operations more efficient.
A while back, I renamed class BodyText to class BlockData. It turns out that the "body [text]" terminology still appeared here and there. This PR basically completes the terminology change. I renamed relevant constructs, and removed "body" terminology at least in regards to block data.
A class generated by our code generator currently looks something like this:

```
class Foo : public Component {
   // Some constructs to help with the Component base
   // ...

   struct {
      // Objects for metadata and child nodes
   } content;

   // ...
   // Getters for the objects in struct content
   // ...
   // Setters for the objects in struct content
   // ...
   // Constructors, etc.
};
```

For upcoming work, I'm considering a fundamental change to the above layout with respect to data, getters, and setters. The change would, I believe, make it easier to substantially enhance the capabilities offered by generated classes. The change - if it ultimately happens - involves, among other things, getting rid of `struct content`.

In anticipation of this possible change, then, I went through the code base and tried to reduce the use of expressions like `obj.content.foo`, which assumes the existence of a data member `foo` in something called `content`. Even with the current structure of the generated classes, `obj.content.foo` can be replaced by `obj.foo()`, i.e. the getter that gets, well, `obj.content.foo`. With the possible future work, the getter (or something that replaces it) will do the moral equivalent of what it does now, but `struct content` won't be there. So, better to write `obj.foo()` where we previously wrote `obj.content.foo`.

So, the purpose of the changes in this commit can be summarized as follows: (1) They don't change the meaning of the current code. (2) They help set the stage for the possible future more-substantial changes to the generated code, by reducing the reliance on `content.something` terminology all over the place.

The changes, while mostly simple and easy to understand, triggered some SFINAE issues that were tricky to track down. New comments in BlockData's `detail.hpp` function explain it, if anyone cares.

Also in this PR...

General changes:
- Some minor updates to SFINAE, on top of those discussed above.
- Simplified a few things in the code generator.
- Code generator updates reflect goal of using "content." less often.
- Ditto for a relevant custom.hpp file.

Also, consistent with the goals described above, I replaced the various uses of "content." in several test codes, by writing getters (where they weren't there already) and calling those instead.
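As a small illustration of the `obj.content.foo` versus `obj.foo()` point, consider the sketch below. The class `Foo` and its fields `label` and `index` are hypothetical, and the class is not derived from `Component`; it only mimics the current layout. The function `describe()` is written purely against getters, so it would keep compiling if `struct content` were later removed and the getters reimplemented some other way.

```cpp
#include <string>

// Hypothetical stand-in for a generated class with the current layout.
class Foo {
public:
   struct {
      std::string label;
      int index = 0;
   } content;

   // Getters for the objects in struct content
   const std::string &label() const { return content.label; }
   std::string &label() { return content.label; }
   const int &index() const { return content.index; }
   int &index() { return content.index; }
};

// Caller code that relies only on the getters, never on "content." itself.
inline std::string describe(const Foo &obj)
{
   return obj.label() + "#" + std::to_string(obj.index());
}
```

Only the class itself mentions `content` here, which is the property the commit is trying to move the code base toward.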
This is a change I've wanted to do for a while. I think it's the right thing to do. See the largest change in ``primer.rst`` for a brief description of why I did this. The Core Interface's internal structure has always made use of a handful of special names, to identify certain nodes that aren't in GNDS proper, but which need to be used in order to store certain special content. Naturally, special content needs special handling. Example: what arrives through PugiXML as XML CDATA content *was* stored in GNDStk's internal structure under a child node called ``cdata``. In fact, it was in a ``text`` metadatum inside of a ``cdata`` node. Now, the node is called ``#cdata`` and the metadatum is called ``#text``. That is, each name is now prefixed with ``#``. Adding the ``#`` prefix is just a small change, but it's meant for a couple of important purposes. First, its presence allows someone who looks at GNDStk's code base to easily identify special nodes -- which, generally, need special handling, at least in regards to activities such as I/O. Second, it's always possible that a future GNDS standard (or, importantly, another format that someone might design with our code generator) might want to have a field called ``foo``, where foo is one of our handful of special names that trigger special handling. By renaming ours to ``#foo`` instead, no conflicts will arise. (And someone won't use the ``#`` themselves for the name of a field. It wouldn't work when the associated class or object is created in C++, which of course doesn't allow such a character in variable names.) I'd have liked to use ``$`` in place of ``#``, but realized right away that doing so would be problematic. Functions like ``Node.one()`` use regular-expression searches, and the ``$`` character has a special meaning in C++ regular expressions. Clearly, then, using ``$`` would lead to various problems. So, we chose what we believe is a reasonable alternative, and one that we think is as good as any other. 
Note that the changes in this PR, while relatively simple, were a bit more involved than just changing ``foo`` to ``#foo`` for every ``special`` name ``foo``. Where JSON is involved, for example, you'll see that the use of ``#`` sometimes necessitated changes in node order, given that the JSON library orders nodes lexicographically. Also, some strings for special node names, for example ``xml``, also have meaning for other things: ``xml`` was a special node name, but ``xml`` is also a string someone can use, in certain places, to say that they want XML format. So, some instances of the string ``xml`` are now ``#xml``, but others remain as ``xml``.
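The remark about `$` can be checked with the standard library alone. The helper below is a hypothetical illustration, not GNDStk's `Node.one()`; it simply treats a name as a `std::regex` pattern (default ECMAScript grammar) and asks whether the pattern matches a given node name in full.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: interpret `pattern` as a regular expression and test
// whether it matches the entire string `name`.
inline bool nameMatches(const std::string &pattern, const std::string &name)
{
   return std::regex_match(name, std::regex(pattern));
}
```

Here `nameMatches("#cdata", "#cdata")` is true, because `#` is an ordinary character in this grammar. By contrast, `nameMatches("$cdata", "$cdata")` is false: the leading `$` is an end-of-input assertion, not a literal character, so it would have to be escaped as `\\$` in the C++ string literal to match a name that begins with `$`.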
For quite some time, GNDStk's Core Interface had a large number of `Meta` and `Child` objects in its `basic::` and `misc::` namespaces. This was set up prior to us making our current plan: to have GNDStk Standard Interfaces for GNDS 2.0, etc. Going forward, we anticipate that each Standard Interface will have all necessary `Meta` and `Child` objects built for it automatically, by our code generator, when our code generator generates the Standard Interface's classes. That means we need to keep only the current `Meta` and `Child` objects that happen to be used in existing test codes. Or, in a few instances, elsewhere in the Core Interface. Future work (not included right now) will consist of phasing out any remaining uses of these objects in Core Interface code. (Where that happens, we'll just make `Meta` or `Child` objects directly where they're needed.) Then, finally, we'll move any remaining `Meta` and `Child` objects - ones that are used in the test suite - into non-Core-Interface files that would be included only in test-suite code.
I implemented a useful feature that was requested by a user. When using Component's prettyprinter, previously block data were *all* printed. For example, if a block of data had 1000 elements, then all 1000 elements were printed by the prettyprinter. Now, we provide a user-settable variable, `GNDStk::truncate`. A negative value for this variable means: print all block-data elements, just like we did before. The default is `-1`, so printing all elements is the default. If `GNDStk::truncate` is `0` or positive, it means: print at most that number of elements. Whenever fewer elements are printed than exist, a comment to that effect is emitted. The comment mentions how many values actually exist, as that information may be important to someone to know. I also updated BlockData's `write.test.cpp` so that it tests the new feature, and I made some miscellaneous cosmetic changes to that file as well. Finally (and this is related to Component's prettyprinting, so this PR seemed like a good place to do it), I removed the `GNDStk::across` alias to `GNDStk::columns`. I'd had it, previously, because there was a `columns` in another namespace, and I wanted to provide something that worked even in the face of multiple `using` directives. The other `columns` disappeared, however, in an earlier PR in which we got rid of most `Meta` and `Child` objects.
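The truncation rule can be sketched as follows. This is not the prettyprinter's actual code: `printBlock()` is a hypothetical function, the `truncate` parameter merely stands in for the `GNDStk::truncate` variable, and the wording of the emitted comment is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: print block-data values on one line.
//    truncate <  0: print every element
//    truncate >= 0: print at most that many elements
// If fewer elements are printed than exist, append a comment (the wording
// is invented here) that reports how many values actually exist.
inline std::string printBlock(
   const std::vector<double> &values,
   const long truncate = -1
) {
   const std::size_t total = values.size();
   const std::size_t shown = truncate < 0
      ? total
      : std::min(total, static_cast<std::size_t>(truncate));

   std::ostringstream out;
   for (std::size_t i = 0; i < shown; ++i)
      out << (i ? " " : "") << values[i];
   if (shown < total)
      out << (shown ? " " : "") << "// truncated; actual #elements == " << total;
   return out.str();
}
```

With five values and `truncate` equal to `2`, this prints the first two values followed by the comment; with the default of `-1`, all five values are printed and no comment appears.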
We're generally trying to limit source-code lines to 80 characters or less in length. This isn't an absolute requirement, but, in keeping with the idea, I reformatted longer lines here and there in the hpp and cpp files throughout the code base, where doing so made sense. Sometimes, such codes have string literals (as with `R"..."`) with lines of more than 80 characters. Those I left as-is; the lines in question are supposed to be precisely the way they are. Also, for now, I didn't reformat long lines in other types of files. That includes, for example, various `CMakeLists.txt` files throughout GNDStk. Building very slightly on an earlier PR, I also made a small change to the `GNDStk::columns` variable, so that it would be more consistent, in terms of usage, with the related `GNDStk::truncate` variable. This has nothing to do with 80+ character lines, but was a simple change, so I decided to include it here. As usual, I modified a few comments and made small cosmetic changes in a few places.
Earlier during the development of GNDStk, we examined existing GNDS files and accounted for all of the node and metadata names that were used throughout those files. We then made two sets (one generic, one "type-aware") of `Meta` and `Child` objects for those nodes and metadata. The generic ones were in a namespace called `basic`, the type-aware ones in a namespace called `misc`. Given our later decision to separate GNDStk's Core Interface from individual Standard Interfaces that we'll have for each GNDS version, we no longer see the need to clutter the Core Interface with the `basic` and `misc` namespaces and all of their many `Meta` and `Child` objects. However, our test suite used some of the material in those namespaces, so we couldn't just remove it. In an earlier PR, we removed *unused* `Meta` and `Child` objects. A modest number of them, however, remained. In this PR, we've completely reworked the relevant material so that it still exists in its present form namespace-wise, but is no longer in the Core Interface. Instead, we have a new file, `src/GNDStk/test/keys.hpp`, that's included in test codes that need any of those old `Meta` and `Child` objects. But that file is *not* part of the Core Interface. This substantially declutters the Core Interface. The new `src/GNDStk/test/keys.hpp` file could arguably be split up. Right now, for example, it has several namespaces within it, which isn't consistent with our typical style, elsewhere in GNDStk, regarding source-file content. We may, therefore, split up `keys.hpp` sometime. For now, however, the file is included only in test-suite files, so we're not going to worry about it. The main point right now was to move, to a "test" location, material that we don't see as belonging in the Core Interface any longer. As a consequence of the changes we've made, we no longer have `namespace core`. 
Previously, `namespace core` had simply brought in (via using) the `njoy::GNDStk` namespace itself, and `njoy::GNDStk::basic` as well. The latter (`basic`) no longer exists in the main GNDS namespace, as described earlier. So, there's no point in having an `njoy::GNDStk::core` that just brings in `njoy::GNDStk`. The changes allowed for `GNDStk.hpp` itself to be somewhat decluttered. We also removed the *empty* top-level GNDStk test. If we're going to have something there at all, we can put it in at another time. For now, having an empty test code just slowed the test-suite compilation. By removing it, we'll get a small but nonzero speedup in compilation. Content that was previously in `basic.hpp`, `basic/src/`, `misc.hpp`, `misc/src`, `common.hpp`, and `common/src` was moved to the brand-new file `test/keys.hpp`, the new file mentioned above. Autogenerated codes were updated to reflect all of the above.
Based on a user's experience with GNDStk, I reworked some of the details of how the `BlockData` class handles, in particular, the values of `length`, `start`, and `valueType`. Those values - `length`, `start`, and `valueType` - can appear in GNDS as metadata for elements that contain block data. Example: the `<values>` element. Our `BlockData` class is built so that it's able to handle all of those in a reasonable way. `BlockData` is a base of `Component`, which is always a base of some "high-level" class (as are produced by our code generator). A question arises about the need to "sync" `length`, `start`, and `valueType` in `BlockData` with whichever ones someone might choose to support (remember, the values aren't necessary) in a high-level class. Without going into details, we believe (sort of an editorial remark here) that having `length`, `start`, and `valueType` in GNDS introduces some goofiness, in terms of redundancies and such. If something is goofy, then its handling in any code is likely to be goofy as well. The changes we've made change from one plausible way of handling things, to another plausible way. The new scheme can't be characterized as perfect, and, given the nature of the values in question, no scheme is likely to be perfect. We believe, however, that our new scheme is less imperfect than what we had before, and it may (to be determined) fix the problem the user had. As a result of the changes, we were able to remove some SFINAE, and to clean up some code here and there. Relevant test codes were of course updated to reflect our changes.
Cleaned up some slight inconsistencies here and there. I'll write a detailed description in a pull request.
Also added `Component`-to-`Tree` (not just -to-`Node`) conversion.
GNDStk, as well as its code generator, uses an external library for its basic JSON I/O. A relatively recent improvement to that library was its `ordered_json` class, which preserves a file's existing ordering of key/value pairs, rather than reordering them lexicographically by key. In principle, the ordering of GNDS metadata and nodes isn't supposed to matter. In practice, we've always preferred the idea of maintaining the order in which values are given. If a user reads a file, makes a small modification, and then writes the file back out, then an across-the-board reordering of metadata and child nodes might seem disconcerting. Moreover, our code generator uses input JSON specification files to produce classes - including, importantly, constructors for those classes. Imagine that someone gives specs indicating, say, that a class `Element` contains two metadata: `symbol` ("H", "He", "Li", etc.) and `atomicNumber` (1, 2, 3, etc.), in that order. Prior to this PR, the underlying JSON library's lexicographic reordering of key/value pairs meant that the code generator would give `Element` a constructor that accepted `(atomicNumber,symbol)`, not `(symbol,atomicNumber)` as a person might expect if they gave `symbol` first, and `atomicNumber` second, in the spec passed to the code generator. Now, in both the GNDStk library and the code generator, order is maintained. (Note that for the code generator, generated constructors still take metadata before child nodes. Within metadata and within child nodes, no reordering will take place now. However, there still isn't a way to intermix metadata and child nodes. We may or may not allow that at some point.) Important note: because a later version of the underlying JSON library must be used, a user may need to entirely rebuild GNDStk in order for this update to work. With the typical CMake-style workflow, this probably means removing a `build/` directory entirely, then `cd build`, `cmake ..` etc., from there. 
Test codes were of course updated to reflect the changes.
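The `Element` example can be reproduced without any JSON library. In the sketch below, `std::map` stands in for a container that orders key/value pairs lexicographically by key, and a plain vector of pairs stands in for one that preserves the order in which the pairs were given. The type alias `Spec` and both functions are hypothetical names used only for this illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A spec as written by a person: (name, type) pairs, in the order given.
using Spec = std::vector<std::pair<std::string, std::string>>;

// Parameter order that results if the pairs pass through a container that
// sorts lexicographically by key.
inline std::vector<std::string> sortedOrder(const Spec &spec)
{
   const std::map<std::string, std::string> byKey(spec.begin(), spec.end());
   std::vector<std::string> names;
   for (const auto &kv : byKey)
      names.push_back(kv.first);
   return names;
}

// Parameter order that results if the original ordering is preserved.
inline std::vector<std::string> insertionOrder(const Spec &spec)
{
   std::vector<std::string> names;
   for (const auto &kv : spec)
      names.push_back(kv.first);
   return names;
}
```

For a spec that lists `symbol` and then `atomicNumber`, `sortedOrder()` yields `atomicNumber, symbol`, which corresponds to the old `(atomicNumber,symbol)` constructor, while `insertionOrder()` yields `symbol, atomicNumber`.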
Attempt to determine the underlying type of something that's represented as a `string`; for example, a `metadatum="1.23"` is probably a `double`. Allow for certain patterns that are stored internally for "special" nodes, such as those for plain character data, to be "flattened", that is, stored in a simpler, less hierarchical manner in an HDF5 file. Important: at the moment, HDF5 *writing* is improved, but HDF5 *reading* has not kept pace. We can't necessarily read back an HDF5 file that was written, and recover information properly. I'll hold off on putting in a pull request until that part is done as well. Even the new improvements to HDF5 writing aren't quite to where I want them to be. There's more work to come. This commit also includes a small amount of work that really has nothing to do with our HDF5 capabilities. I just happened to visit the affected code while working on the new material. For example, I put in another `convert()`, and also made sure it was tested.
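One simple way to guess the type behind a string is sketched below. It is an assumption-laden illustration, not the logic used in GNDStk's HDF5 writer: the `Guess` enumeration and `guessType()` are hypothetical, and the rule is just "whatever `std::strtol` or `std::strtod` consumes in full".

```cpp
#include <cstdlib>
#include <string>

// Hypothetical categories for this sketch.
enum class Guess { Integer, Double, String };

// Guess the type of a value stored as text. A string is an Integer if
// std::strtol consumes all of it, a Double if std::strtod consumes all of
// it, and a String otherwise.
inline Guess guessType(const std::string &text)
{
   if (text.empty())
      return Guess::String;

   const char *const begin = text.c_str();
   char *end = nullptr;

   std::strtol(begin, &end, 10);
   if (end != begin && *end == '\0')
      return Guess::Integer;

   std::strtod(begin, &end);
   if (end != begin && *end == '\0')
      return Guess::Double;

   return Guess::String;
}
```

Under this rule `"1.23"` is classified as a `Double` and `"42"` as an `Integer`, while a list such as `"1.2 3.4"` stays a `String` and would need separate, element-wise handling. Note that `std::strtod` also accepts text such as `"nan"` and `"inf"`, so a production version would likely need extra checks.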
Split the (very long) convert detail.hpp function into several parts.
Did some non-const-to-const forwarding differently, such that code bulk was reduced. Split getter() functions in Component's detail.hpp into their own file, as detail.hpp was getting a bit too long and cluttered. Prepared the way for adding some new capabilities that will use the new Lookup class. More work to come....
… respect to references. Function has<Lookup<false>> creates has<Lookup<true>>; will be used for queries. Tweaks to type names, parameter names, and some formatting. Added some traits classes; will use them soon.
Necessary modifications to class Lookup. Updated the code generator. Tedious and sometimes difficult Component::getter() modifications to make it all work smoothly. Return types had to be more general, and some new SFINAE was in order. Ditto, even more so, for the detail::getter() functions. Lots of new SFINAE, function templates, and other constructs. The goal is ultimately to support some nifty new capabilities in Component-derived classes, in particular those that are generated with the code generator. Some separate (from GNDStk proper) work I have seems to indicate that the new constructs work as intended, but lots of tests, *in* GNDStk, should be written. Also, I'll need to run the code generator to regenerate our GNDS Standard Interface prototype. This isn't included quite yet, in this commit, because I'll want more time to review and validate what it generates.
…r function *templates*.
…at it deals with the presence of member function templates (not just member function non-templates) in the generated C++ classes.
Substantial new capability here, one that was wanted by a user.
Consider a class that's generated by our code generator. If the class has a vector of objects of some other class, we used to have getters and setters that took (index) or (label), and performed lookup based on the presence of "index" or "label" metadata (which are common throughout the GNDS format).
Based on some needs of a user, I extended the "find a vector element" capability so that it can be done on the basis of any metadatum, not just index or label.
Omitting the details, for now, this was done essentially by extending the idea of the Core Interface's `Meta<>` class. The new class, `Lookup<>`, derives from `Meta<>`, but includes extra machinery that allows users to do fun and fabulous things with generated classes. (Those that derive from GNDStk's `Component` class.)

Moreover, because `Lookup<>` objects derive from `Meta<>` objects, they double as `Meta<>` objects if someone wants to use the Core Interface in concert with generated code. And, the new autogenerated `Lookup<>` objects replace what were previously `Meta<>` objects in the generated code, so that we have all these new capabilities without cluttering the namespaces of generated code with any additional objects.

On top of that, I was able to consolidate the previously-separate (index) and (label) getters/setters in the generated code. Therefore, generated codes are actually a bit shorter, in spite of having the new capabilities. Note that the behind-the-scenes code to actually do the lookups could, like so much else, be placed down into the Component class. Generated codes thus aren't cluttered with new code in that respect either.
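The general idea of finding a vector element by any metadatum, rather than only by index or label, can be sketched in a few lines. The code below is not the `Lookup<>` design and does not use `Meta<>` or `Component`; `Isotope` and `findBy()` are hypothetical, and a pointer-to-member is used here simply as a stand-in for "which metadatum to compare".

```cpp
#include <string>
#include <vector>

// Hypothetical element type with two metadata.
struct Isotope {
   std::string label;
   int mass;
};

// Return a pointer to the first element whose chosen metadatum equals
// `value`, or nullptr if no element matches. One function template covers
// every metadatum, so separate (index) and (label) versions aren't needed.
template<class T, class M, class V>
const T *findBy(
   const std::vector<T> &elements,
   M T::*member,
   const V &value
) {
   for (const T &elem : elements)
      if (elem.*member == value)
         return &elem;
   return nullptr;
}
```

A call such as `findBy(isotopes, &Isotope::mass, 238)` searches by `mass`, and `findBy(isotopes, &Isotope::label, "U235")` searches by `label`, with the same function handling both.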
In the relevant generated classes, the new work did involve making member function templates out of what were once plain old member functions (not templates). This initially made some of the pybind11 machinery unhappy. I made what appeared to be proper fixes for that, but someone with more pybind11 experience may want to review those changes. We should also have a discussion at some point about whether or not the current state of the autogenerated Python-binding code allows for as much flexibility (in terms of those getters and setters) as the C++ code does. If it doesn't, then we may want to work on ways to extend it.