Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/improve generator #81

Open
wants to merge 78 commits into
base: develop
Choose a base branch
from
Open

Conversation

staleyLANL
Copy link
Contributor

This PR has a variety of new things, many of them, but not all of them, related to the code generator.

I changed the terminology used for a couple of special nodes. Special nodes are the ones whose names begin with #, indicating that they need special treatment. (And, the # also guarantees that they can't conflict with valid C++ names that someone might use for classes generated by our code generator.)

Special nodes include, for example, #cdata and #pcdata, which are used in our primary Node data structure in order to represent what are called, respectively, CDATA and PCDATA nodes in XML.

The following two changes have been made:

  1. #nodeName to #nodename. Other special nodes didn't use camelCase, so I decided this one shouldn't either. The GNDS manual had suggested using nodeName to encode a node's name in JSON -- something that must be done due to certain limitations of JSON relative to XML. We're apparently the first ones to create JSON-format GNDS files, and we already took it upon ourselves to add the # prefix. Therefore, changing another thing doesn't seem too unreasonable. In my opinion, camelCase feels distracting when used in this context. And, as already stated, we're now being consistent: all special node names are now entirely lowercase.

  2. #attributes to #metadata. As with the node-name business described above, our earlier name (without the #) was suggested by the GNDS manual for use in JSON. Throughout GNDStk, we prefer the more-generic, less XML-biased term metadata, and decided to use it here.

For the above changes, tests and resource files, as well as verbiage in the documentation, were, of course, updated accordingly.

Editorial remark: GNDS codes from other institutions may end up doing things differently. For now, with JSON at least (which is where this is all relevant), we may be the first to support it, and have chosen this system. If there's disagreement, at some point, about how to handle special nodes, we can deal with that at that time. Perhaps, even if different GNDS codes choose to write things a bit differently, all could support reading GNDS files from anyone. That would just be a matter of accepting, say, either #metadata or attributes, and either nodeName or #nodename. Supporting a handful of such choices wouldn't create much complexity.

On other matters...

The code generator now allows "times" in place of "occurrence". We still accept the latter term, because GNDS specs have it, but the name "occurrence" was just getting me down. It's too verbose, and is easy to misspell. "times" is simply better, in my opinion. I've updated our prototype GNDS specs to use "times", but left "occurrence" in our copy of the full v1.9 specs.

I also made some changes (now, when it's still possible -- before the whole world is using GNDStk!!) to the functions that must be provided inside classes that derive from GNDStk::Component:

  • namespaceName to NAMESPACE
  • className to CLASS
  • GNDSName to FIELD
  • keys to KEYS

The uppercase names stand out, and indicate that there's something special about these functions (which there is: Component expects them). All of them being all-uppercase makes them more consistent with one another. Moreover, for non-GNDS libraries that somebody might want to build with the code generator, we anticipate that the all-uppercase names are less likely to conflict with names that might be used elsewhere in the class. What if someone wanted a node called keys, for instance (<keys> in XML)? Now, as long as a user of the code generator uses lowercase names, they'll stay lowercase when used as fields in other classes, and in getters and setters. So, no conflict. And, they'll be capitalized (e.g., keys to Keys) for class names. So, still no conflict with ALL UPPERCASE.

Another editorial remark: I don't take changes, like all the above, lightly. After GNDStk gets out there and takes the world by storm, changing such things would have the potential to break what users are doing. For now, before that happens, I'm taking opportunities, when I see them, to make changes that I think are for the better, before doing so would create problems for users.

In the or.hpp file, I fixed up a couple of comments that weren't quite right, and also added back (1) a certain constructor, and (2) a certain free function, that I'd removed in an earlier commit. I'd thought that neither of those were used, but then realized that it only appeared that way because the test suite wasn't completely testing all of or.hpp's functions regarding multi-queries.

And, with the above paragraph in mind: I added hundreds of lines of brand new multi-query test code to Node's call.test.cpp.

I've done some work with the code generator as well. Duh, this branch is called feature/improve-generator, so one would think that I'd done something that improves the code generator.

The changes.json file no longer needs the "string" to "std::string" change, because the code generator does that change automatically. Many entries in the present changes.json are super specific (say, relating to somebody's enums) and couldn't possibly be known, automatically, by the code generator. Others are very GNDS-specific, such as the change from "double" to "Double".

With the above modification, a user of the code generator might be able to simply not have a changes.json file. (The code generator already didn't require users to have it; just remove any "Changes" key/value pair in the main input JSON.) We'll still get string to std::string, and, unless we're working with an actual GNDS spec that might have GNDS-isms like the "double" node, other mappings probably just aren't needed. (We wouldn't even provide string to std::string, given that users could always write "std::string" in the first place, but figured that users would like it.) Note that in the unlikely event that you have your own "string" and don't want it changed to std::string automatically, you'd need to use a namespace prefix, e.g. "mystuff::string". But we don't think most people will write their own string.

The code generator has a few additional changes. The generated key.hpp file, which has Lookup<> and Child<> objects for the generated classes, #includes GNDStk.hpp, and itself is #included in each class file (and no longer needs to be #included in the version file). The class files thus no longer need to #include GNDStk.hpp, and also don't need a particular using directive that key.hpp now already provides for them. (Besides, we believe that someone who brings in a class file should automatically get the Lookup<> and Child<> objects. And, those are simple, so bringing them in automatically won't slow compilation by any significant amount. Note that we have retained the concept that generated class files should be usable independently, bringing in only the dependencies that they need. The modifications we've spoken of don't change that.)

The above changes, while relatively simple, make the file for each generated class just a few lines shorter and simpler. That's something we like, especially considering how many generated files we'll eventually have.

Finally, we dealt with a few fixmes, improved a bit of detail:: terminology, and made some simplifications in several Component-related files.

staleyLANL and others added 30 commits November 1, 2021 22:19
Some files aren't finished yet, and will be uploaded in a few days.
So, this particular update won't build!!

Added broad infrastructure for HDF5 handling...
...Wrapper class
...Assignment operators
...convert() functions
...Constructors
...read() functions
...write() functions
Etc.

Completed many (not all) HDF5 capabilities.
Very basic HDF5 write needs to be smarter (not string-only).
HDF5 reading still needs work.
Touchups are needed here and there throughout the new HDF5 code.
In particular, we need smart, non-string handling.

Small tweaks were made to XML and JSON material, for consistency.
Some comment changes and other odds and ends.
To be used for non-GNDS products, e.g. NDI3.
When ProjectDir is given, namespaces and various other constructs are done differently.
Made miscellaneous small improvements in various other files.
For example: direct read() and write() in Component, aligning with Node's.
So: autogenerated classes have those read()s and write(); there's no need for a user to explicitly go through Node.
Won't compile right now, but wanted to get this in.
For [optional] vector child-node fields, there's a setter that takes an element to push_back. If the optional has no value, it's given one first. Experience showed that something like this would be helpful.

Updated code generator accordingly.

Also, combined the default and "from fields" constructors in generated classes. This makes the generated classes shorter, and also, based on our experience with a prototype data format, proves to make object construction simpler. A generated class might have, for instance, a few metadata, followed by one or more vector or optional-vector child nodes. It's convenient for someone to initialize the object with metadata, and then to add vector elements individually later. This change helps make this process simpler and shorter.

I also implemented another feature that experience with a prototype autogenerated set of classes suggested would be helpful. Before, getters with index or label looked for *metadata* named "index" or "label". Now, if no metadatum called "index" exists in the type of the vector's element (or any of its alternatives, if it's a variant), then index is interpreted to be a regular vector index, as in the "[]" operator. Note that someone who uses the code generator must be aware that the interpretation of (index) getters therefore depends fundamentally on whether or not the relevant vector element type has, or doesn't have, an "index" metadatum. This change means we essentially get an additional feature for free, as (index) previously would just fail unless an "index" metadatum were found.

Removed a "GNDS"-centric comment in Component prettyprinted output. The comment probably wasn't particularly informative anyway. Updated test codes to reflect this change.
Regenerated the prototype codes. The changes to these also reflect other work I'd done recently on the code generator, as they hadn't been regenerated for a while.
That is to say, classes that have no metadata or child nodes, but still have data, e.g. with <foo>1.2 3.4 5.6</foo>.

OCD'd a cmake-related file.

Some more work regarding HDF5 format. Still not entirely complete.

I think ".hdf"/".HDF" aren't used as HDF5 file extensions. They need a '5'.

Made some diagnostic-message terminology more consistent.
Isn't very smart for output; just keeps everything as strings.

That will be remedied at a later time.

Was tested for functionality on entire set of GNDS v1.9 files. All converted.
...original total XML size: 2.9G
...new total HDF5 size: 21G
So, a little over a factor of 7 larger. :-/

Miscellaneous other code changes were put in place during this work.

Made a few diagnostic messages, and other minor constructs here and there, more consistent with one another.

Added a convert() that proved to be needed for a certain disambiguation. This was noticed while working on some HDF5 input capabilities.
The former was taken from GNDS, but the latter is much more descriptive.
This is the first of a few simplifications I've been planning to do with the
code base.

Having `Child<>` objects contain a flag, indicating "is a node of this name
allowable as a top-level GNDS node," originally seemed like a good idea.

With our later decision to autogenerate GNDS Standard Interfaces from GNDS
specs, having such a capability in the Core Interface really isn't necessary.

The Core Interface can be safely generic, uncluttered with the capability of
checking such a thing. Users will prefer our GNDS Standard Interfaces, which,
having been generated from GNDS specifications, will be designed so that GNDS
hierarchies will have the correct structure.

Our Code Generator is also being used to design another (non-GNDS) library,
and more such uses may follow. Checking if a root node has a valid *GNDS* name
clearly wouldn't be wanted for other libraries. For this reason, we'd already
switched off the check by default. At this point, however, we don't think that
having even the ability to check such a thing, in the Core Interface, is worth
the extra clutter that it creates in the code base.
The removed material was for a very old capability that was intended to help users make Meta<> and Child<> objects.

I hadn't given it much in the way of capabilities yet, but then it was all superseded by the ability to easily manipulate `Meta<>` and `Child<>` objects, e.g. with constructs such as `int{}/Meta<>(...)` to change the type associated with the `Meta<>` object.

The old keyword builder stuff never got much use, didn't have much to offer, and simply isn't needed any longer -- better capabilities exist, and in fact have existed for a long time. I just hadn't gotten around to removing the old code.

Removing it means a bit less code, one less executable to build in the test suite, and nothing really lost.
Gets rid of the annoying compiler warning about `tmpnam()`.
The new formulation required a number of changes in the logic here and there.
Made certain HDF5-related operations more efficient.
A while back, I renamed class BodyText to class BlockData. It turns out that the "body [text]" terminology still appeared here and there. This PR basically completes the terminology change. I renamed relevant constructs, and removed "body" terminology at least in regards to block data.
…at it deals with the presence of member function templates (not just member function non-templates) in the generated C++ classes.
Both Node-to-HDF5 and5 HDF-to-Node (write, and read) now have the ability to deal with original string-form content (as from an original XML file, in GNDStk's Node), and "type-ified" versions thereof. For example, `"1.2 3.4"` is a string, but for HDF5 we can write it as two doubles. Also, importantly, content related to "special" nodes, in particular what we'd see in XML as CDATA, comments, and PCDATA, are now handled in a full, and I believe proper, manner. For both input and output of HDF5.

At some point I'll have to document all the HDF5 capabilities. For now, for anyone who wants to play around with HDF5, I recommend trying different combinations of the HDF5::reduced and HDF5::typed boolean flags; and then using perhaps the `h5dump` tool, or any other good HDF5 tool, for seeing what GNDStk's HDF5 capabilities can produce.

Also: I filled out the main HDF5 test code with tons of new tests. This was definitely needed at this point.

Also...

Changed some HDF5-related function names that are in the detail:: namespace. The earlier names had tried to use capitalization (e.g. HDF5 instead of hdf5) in a manner that was consistent with the non-detail:: API names. For detail stuff, however, certain capitalization and long names felt heavy and painful to the eyes.

Added a couple of tests for constructs that, it turned out, had not been interpreted correctly by the "type guesser" algorithm that the generic Node-to-HDF5 conversion uses.

Did a few things that were unrelated to the main work (HDF5) here, but I was
thinking about them...

Tweaked a couple of autogen files, per some remarks about earlier pull requests.

Figured out how to provide certain *constexpr* `has()` functions for classes
derived from `Component`. We anticipate that these may eventually prove to be
quite useful. Importantly, these are fully in `class Component` itself. Nothing
had to be added to the generated classes. Things like this is why `Component`
is awesome.
FileType::null to FileType::guess
FileType::text to FileType::debug
This is a prerequisite for some upcoming work.
Could replace fileName with filePtr->getName().
Basically, the former was redundant.
Possible in part because of some other recent work.
For example, >> and << (relative to streams and to strings) for Component.
Slightly changed some names and parameters here and there, to make things more consistent.
That's really the right thing to do (until/unless they fix it).
XML, JSON, etc. are now more consistent - no spurious newline in XML.
Updated test codes accordingly.

Fixed some fixmes.

General detail work, with the aim of tightening up the code.

Tweaked some comments.

More Node/Tree consolidation.
We use Node in most places, Tree only when necessary.

Removed some "inline"s when the function was a template anyway. Inline doesn't really mean "inline it" these days (compilers are good at making that decision), and it isn't necessary for header-only if the function is a template.
Rearranged some code.
Tweaked some terminology.
In the generated classes, and the code that works with them, struct content is now struct Content. That is, we just capitalized Content.

This simple change is consistent with NJOY classes generally having capitalized names.

More importantly, perhaps, I'm trying to make the code generator allow for as much flexibility as reasonably possible, given that users might be designing their own data format. (Not just using the code generator to build GNDS version-specific data structures.)

If someone actually writes a code-generator input spec that has a node called "content", it'll work now. :-) Not that we really expect that someone would call anything "content", but it *would* be a completely reasonable node name. By using upper-case terms for what we generate automatically - aside from terminology in someone's spec - people can use lower-case words in their specs and be confident that no conflict will arise.

That's the main thinking behind this simple name change.
Made miscellaneous small improvements in comments, names, etc.

BlockData now has implicit conversion to vector, when warranted.
It can be used where .get() was previously required.

Fixed the performance issue when default-constructing certain generated classes under certain circumstances.

Fixed some outdated (so, wrong) comments and strings in a couple of the test codes.
That fixed a problem with certain initializations being very time-consuming.
Properly split out and clarified "write" versus "print".
Removed some old debug messages.
Changed some terminology.
Trimmed some unused stuff from the KeyTuple (formerly KeywordTup) class.
Fix (I hope) error in g++ compilation.
I'll write much more in an upcoming pull request.
@staleyLANL staleyLANL requested a review from whaeck April 1, 2022 18:36
@staleyLANL staleyLANL self-assigned this Apr 1, 2022
@whaeck whaeck changed the base branch from master to develop June 30, 2022 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants