Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Java,C#,C++] Support field access order checking (#948)
This is a squashed merge of many commits. I've kept the original commit messages below for posterity; however, a better summary is available in the description of [PR #948](#948). Squashing makes it easy to revert, but sadly, it loses the fine-grained history available through `git blame`. --- * [Java] Add disabled tests around checking field access order. SBE does not support arbitrary access to fields when encoding and decoding. In particular, accessing groups and variable length fields advances a `limit`, which marks where the start of the next group/variable length field is expected. We aim to introduce some checking (in debug mode - when bounds checks are enabled) that verifies field access order is consistent enough with the order defined in the schema to encode or decode a valid message. * [Java] Spike an approach for checking field encode order. Also, enable some tests around field order checking. Outline of the approach: - Using a FSM, model the states of an encoder during encoding. - Hopefully, this model will be language independent. - Generate code to perform state transitions upon field accesses. - And to throw when transitions are not in the model. Assumptions: - JIT compilation will remove `if (DEBUG_MODE) { ... }` blocks when the static boolean constant, `DEBUG_MODE`, is `false`. Questions: - Is the approach reasonable? - Is the newly disallowed behaviour reasonable? - For example, it prevents encoding block fields after a group or variable length field has been encoded. Further work: - Decode path - More tests (including tests for decoding old messages where some fields may not be present) * [Java] Support printing field order access state machine. Use `-Dsbe.print.state.machine=$NAME_OF_MSG`. * [Java] Add checking around decoding order. This change builds upon the work to check field encoding order. It is a slightly more-complex problem than with encoding, as when decoding it is necessary to deal with multiple versions of encoded data. To handle this, we fuse together the state machines for decoding each possible version and branch on "decoder.wrap(...)". Further work: - More tests - Better error messages * [Java] Test field order checks across versions. "New" decoders (in Java at least) read null-like values for fields that are not present in "old" messages. There are no restrictions on the order in which these "new" fields are accessed when decoding "old" data. In Java, a version check returns a null-like value when a field is not present in the acting version. This behaviour matches the wiki documentation. C#, currently, has different behaviour: the application code needs to interrogate a boolean property, e.g., `IsFooInActingVersion`, before accessing the data property, e.g., `Foo`. The property itself assumes that the field is present. It will attempt to read it from the buffer and possibly advance a `limit` field. In this commit, I have moved the state transition beneath the existing version check in Java decoders. This change allows these fields to be accessed when decoding "old" messages without changing the state of decoding, which seems appropriate as there are no side-effects on internal state, e.g., the `limit` field. Also, in this commit, I have expanded the test coverage to include the getter and setter variants for fixed-size arrays and variable-length data. * [Java] Generate better error messages. Also, use an enum to represent codec states rather than a state number and generate (dot-based) state machine diagram in Javadoc for states enum. * [Java] Test field order checks with fixed-length strings. Also, fix some spots where I had forgotten to move the state transition below the version check in decoders. * [Java] Test field checks do not affect toString. * [Java] Fix field order checks to work with `decodedLength()`. * [Java] Test content of field order error messages. There is still work to do in this area. In particular: 1. States should be more descriptive. 1. Errors should contain full field paths. 2. Group error messages should be more clear, e.g., "cannot write group element before calling next()". * [Java] Fix field access order in tests and re-enable. * [Java] Permit block field access whenever it is safe (and, unfortunately, in some cases where it is not). This commit makes the field order checks more-lenient. The aim is to allow the application code to access fields in the root/top-level block or blocks inside groups whenever it is safe. However, the approach is broken and does not work in general. To (try to) achieve this aim, we rely on the fact that once a block is positioned correctly, it is safe to access during the remainder of encoding/decoding. For example, the following code is safe: ``` encoder.wrap(...); encoder.blockField1(42); encoder.varData("abc"); encoder.blockField2(43); ``` As soon as `wrap(...)` is called, the top-level block will be positioned over a valid region for the remainder of encoding. Groups are slightly trickier. The following code is _safe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.next() .x(1).y(2) .next(); .x(3).y(4); ``` But the following code is _unsafe_: ``` encoder.wrap(...); Foo.BarEncoder bar = encoder.groupBarCount(2); bar.x(1).y(2) .next(); .x(3).y(4); ``` And the following code is _unsafe_ too: ``` encoder.wrap(...) Foo.BarEncoder bar = encoder.groupBarCount(0); encoder.baz("abc"); bar.x(1).y(2); ``` The `offset` for the group encoder is only set when `next()` is called. Therefore, calling `x(1)` and `y(2)` will likely overwrite some data elsewhere in the buffer (probably near offset 0). We can still rely on the fact that once `next()` has been called - even if it is called multiple times, or too many times (causing it to throw) - the group's block fields will be safe to encode/decode. Thinking about this in terms of our state machine that models codec state and field accesses, we can put this another way. There is at least one transition that represents the `next()` method call. The states (transitively) reachable from these transitions can safely access the associated group block. As already mentioned, despite the passing tests, the approach in this commit doesn't always work. I had thought it would be possible to assign a number to each state as we find them using DFS and use this number as a shortcut to determining whether a block access was safe. For example, in each block field we would check: ``` if (currentState.number() >= firstStateAfterNextIsCalled.number()) { // ALLOW } else { // DENY } ``` However, (even ignoring the cycles due to repeating groups and the use of DFS) this solution makes the following assumptions: 1. There is a single state: `firstStateAfterNextIsCalled`, such that all subsequent transitively reachable states are safe. 2. It is possible to number the states in such a way that `numberOfState(b) >= numberOfState(a)` implies `b` is (transitively) reachable from `a`. Let's consider the following psuedo-schema w.r.t. our assumptions above: ``` group a { x: int32 } group b { y: int32 } ``` Can we set `x` if we've called `BEncoder#next`? Sometimes. The following code is invalid. ``` AEncoder a = encoder.aCount(0); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); // Whoops! We overwrote some data somewhere. ``` But this code is valid. ``` AEncoder a = encoder.aCount(1).next(); BEncoder b = encoder.bCount(1).next(); b.y(2); a.x(1); ``` Therefore, multiple transitions must exist for the `BEncoder#next` call with different exit states, which breaks Assumption 1. Assumption 2 is also problematic. As alluded to above, we must have a bifurcation of states depending on whether a group has elements or is empty. With arbitrary bifurcations, it is not possible to number the states in such a way that Assumption 2 holds. Consider the following state diagram: ``` state N / \ empty non-empty / \ state X state Y ``` Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g., `aCount(1)`, we transition to either `state X` or `state Y` from `state N`. For Assumption 2 to hold, we need `numberOf(state X) >= numberOf(state N)` (reachable) and `numberOf(state Y) >= numberOf(state N)` (reachable) and `numberOf(state X) < numberOf(state Y)` (unreachable) and `numberOf(state Y) < numberOf(state X)` (unreachable). Clearly, this is impossible. Prior to this commit, we avoided such problems by enforcing a more-strict ordering of field accesses. With this more-strict ordering, setting fields in a group block in later states was not allowed. Therefore, the state machine didn't have to have to split into mutually exclusive subgraphs of reachable states depending on whether or not the group is empty. For example, let's consider a psuedo-schema with two groups. ``` group a { x: int32 } group b { y: int32 } ``` Once the application code calls `encoder.bCount(2).next()`, our state machine will enter the `V0_B_N_BLOCK` state regardless of whether previously the application code called `encoder.aCount(0)` or `encoder.aCount(1).next()`. In `V_B_N_BLOCK` the application code is not allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no false positives. However, it is also too-strict, i.e., it has false negatives, e.g., the `encoder.aCount(1).next()` case. How likely are these false negatives to come up in the "wild"? If we think false negatives will often come up, is it possible to remove them whilst still using a state machine? We could consider generating more states to represent the reachable states under each fork of group emptiness. The number of states would grow exponentially w.r.t. the number of top-level groups, but let's ignore that for the moment and just consider whether or not it is possible. Continuing the example above with two groups, when the application code calls `encoder.bCount(2).next()`, the state it enters would depend on whether it had previously called `encoder.aCount(0)` or `encoder.aCount(n)...next()` for some value `n > 0`. To safely model this example, we'd need at least the following states: 1. has no `a` elements 2. has no `a` elements, and has no `b` elements 3. has `a` elements, but has not called `AEncoder#next` 4. has `a` elements, and has called `AEncoder#next` 5. has `a` elements, has called `AEncoder#next`, has `b` elements, but has not called `BEncoder#next` 6. has `a` elements, has called `AEncoder#next`, has `b` elements, and has called `BEncoder#next` 7. has no `a` elements, has `b` elements, but has not called `BEncoder#next` 8. has no `a` elements, has `b` elements, and has called `BEncoder#next` A trickier case is with nested groups: ``` group foo { x: int32 group bar { y: int32 } r: string s: string } ``` When is `Bar#y` safe to call? I believe it is safe to call in states where `FooEncoder#next` and `BarEncoder#next` have been called in-order more recently than `BarEncoder#count(0)`. Note, as `bar` is a group within `foo`, the `BarEncoder#next` and `BarEncoder#count` methods may be called repeatedly during the encoding of a single message. Implementing a more-permissive safety check, using this model, would entail checking that the current state of the encoder/decoder is in a state where this condition holds. To safely model this example, we'd need (at least) the following states: 1. has no `foo` elements 2. has `foo` elements, but has not called `FooEncoder#next` 3. has `foo` elements, and has called `FooEncoder#next` 4. has `foo` elements, has called `FooEncoder#next`, and has no `bar` elements 5. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `r` 6. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `s` 7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, but has not called `BarEncoder#next` 8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `r` 9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `s` 10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next` 11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `r` 12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `s` ``` FooEncoder foo = encoder.fooCount(3); // State 2 foo.next(); // State 3 foo.barCount(0); // State 4 foo.r("abc"); // State 5 foo.s("def"); // State 6 foo.next(); // State 4 BarEncoder bar = foo.barCount(1); // State 7 foo.r("abc"); // State 8 bar.next(); // State 11 bar.y(42); // State 11 foo.s("def"); // State 12 foo.next(); // State X? bar.y(43); // Weird but allowed. foo.barCount(0); // State Y? State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count ``` OK. We didn't have enough states. In the case where we "wrap back around", we need to model whether `bar` block access is allowed (State 13 vs State 3): 13. has `foo` elements, has called `FooEncoder#next`, has called `BarEncoder#next` for last iteration, but hasn't set `bar` group size in this iteration. An even trickier case is with doubly-nested groups: ``` group foo { x: int32 group bar { y: int32 group baz { z: int32 } } } ``` When is `Baz#z` safe to call? I believe it is safe to call in states where `FooEncoder#next`, `BarEncoder#next` and `BazEncoder#next` have been called in-order more recently than `BarEncoder#count(0)` or `BazEncoder#count(0). Encoding this in states is likely to result in more cases like State 13 above. Generalising: a group block field may be accessed only when all outer groups' and its own `#next` methods have been called in-order more recently than any outer groups' or its own `#count` method with a `0` argument. Can model "more-recently" exactly, i.e., with no false positives and no false negatives, using a state machine? Another way of looking at the state machines we generate, is as parsers for a language. Where our language, based on an SBE schema, comprises valid "strings" of calls to an encoder/decoder. Is this language a regular language? If not, representing it using a FSM will not work. Cue Pumping Lemma? Next steps: - Add more tests to show where this approach breaks - Revert the changes and see if it is acceptable that some valid uses of codecs will be prevented when enabling field order checking. I.e., it will give false negatives when testing the hypothesis "this encoding/decoding is valid". - Personally, I think false negatives are better than false positives. (False positives are possible as of this commit.) Having said that, it will discourage use of these checks in existing systems where valid code already exists that does not pass the checks. * [Java] Add (disabled) failing tests for false positives. (When testing the hypothesis "this decoding/encoding code is valid".) These tests show that there are problems with the approach in the last commit. * Revert "[Java] Permit block field access whenever it is safe (and, unfortunately, in some cases where it is not)." This reverts commit d9eb33d. * [Java] Treat top-level block fields as a special case. To reduce the number of false negatives the field access order checker generates, in this commit, we treat the top-level block fields differently. We know that once the top-level encoder is wrapped, these fields are always safe to access. As a result, we were able to enable several tests that previously failed. * [Java] Support turning on access checks independently of bounds checks. Introduces the system property: ``` sbe.enable.access.order.checks ``` * [Java] Attempt to optimise code with disabled access order checks. Some JIT compilers are able to eliminate dead code within branches of if-statements where the conditional is a static expression. Some JIT compilers perform inlining to some degree and use heuristics, e.g., method size. It isn't clear whether code size metrics will be calculated after dead code elimination is performed. In this commit, I've moved the code that performs state transitions when access order checks are enabled into separate methods that the encoders/decoders call upon field accesses, e.g., the getter for a field `xyz` might look like this: ``` int xyz() { if (ENABLED_ACCESS_ORDER_CHECKS) { onXyzAccessed(); } // ... } ``` This keeps the initial size of `xyz` smaller than when the code of `onXyzAccessed()` is "inlined" by the code generator. There is a lot of noise on my ThinkPad P14s, but benchmarks do show improvements after this change. Before: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 18051405.792 ± 444296.898 ops/s CarBenchmark.testEncode thrpt 6 11964827.418 ± 221996.747 ops/s ``` After: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 18132922.415 ± 1964596.044 ops/s CarBenchmark.testEncode thrpt 6 20018131.180 ± 671367.070 ops/s ``` Baseline (d2ec8f8): ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 19270366.357 ± 2618336.027 ops/s CarBenchmark.testEncode thrpt 6 22519718.535 ± 794169.180 ops/s ``` * [Java] Add feature toggle for access order check generation. The code generator will omit checks when the `sbe.generate.access.order.checks` system property is set to `false`. Adding the feature toggle allowed me to run the following benchmarks on my ThinkPad P14s, where (unfortunately) there is some noise. Baseline (d2ec8f8): ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# JMH version: 1.36 \# JMH version: 1.36 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 19270366.357 ± 2618336.027 ops/s CarBenchmark.testEncode thrpt 6 22519718.535 ± 794169.180 ops/s ``` This commit with new code generation disabled, i.e., `-Dsbe.generate.access.order.checks=false` at build time: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 19290928.253 ± 2226604.698 ops/s CarBenchmark.testEncode thrpt 6 21957767.395 ± 567000.570 ops/s ``` This commit with new code generation enabled, but disabled checks, i.e., `-Dsbe.enable.access.order.checks=false` at runtime. ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 19238066.328 ± 2557533.640 ops/s CarBenchmark.testEncode thrpt 6 22284274.779 ± 520984.898 ops/s ``` Whilst it is hard to definitively conclude anything using my noisy machine to benchmark, it seems like there is no obvious performance regression when the runtime checks are disabled in comparison to omitting the checks at build time. * [Java] Improve performance of field access order checking. This commit switches the generated code to using an int to represent the state rather than a reference to an enum case. The change improves performance significantly, which I suspect is due to avoiding a write barrier that updates the card table. Relevant benchmark runs are below. Perhaps different JVM versions and GC options will yield different relative improvements. All benchmark runs use: - `-Dsbe.generate.access.order.checks=true` at build time - `-Dsbe.enable.access.order.checks=true` at runtime Using an enum: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 8075519.641 ± 363346.224 ops/s CarBenchmark.testEncode thrpt 6 7211523.145 ± 504174.913 ops/s ``` Using an int: ``` $ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2 \# JMH version: 1.36 \# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08 \# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java \# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ... Benchmark Mode Cnt Score Error Units CarBenchmark.testDecode thrpt 6 13030684.127 ± 155782.171 ops/s CarBenchmark.testEncode thrpt 6 9781612.505 ± 1342559.378 ops/s ``` * [Java] Provide more context in error messages. In this commit, we associate field tokens with the "path" through the message to it. We can then use this to generate code with better error messages that include the path to the field rather than just the field name, which could have been ambiguous when there are groups. * [Java] Provide more context in error messages. In this commit, I've customised the error messages around group access to give a better indication of what was attempted. * [Java] Specify expected transitions in error messages. In this commit, I've improved the error messages around field access order checks so that it reports the legal transitions in the current state. Hopefully, this will make it easier for users to fix problems. * [Java] Use fewer states to model encoding. Encoders can only encode the latest version, unlike decoders. Therefore, we do not need to model field access patterns other than that of the latest version. * [Java] Refactor field access order checking. Overview: - `AccessOrderModel.CodecInteraction` models the possible interactions with an encoder or decoder: - Wrapping it around a buffer - Accessing its fields (block or var data) - Encoding/decoding the count of a repeating group - Moving to the next element in a repeating group - Moving to the last element in a repeating group - `AccessOrderModel` models the encoder/decoder state transitions w.r.t. these interactions * [Java] Address review comment. A disabled test had an incorrect assertion. Also, I've replaced the term "group length" with "group count" in the test names. * [Java] Add tests for variable-length data in nested group. These tests are based on Mike's review comments. * [Java] Generate check that encoding is complete. Previously, we checked the order of encoding was correct but not that the encoding reached a terminal state. In this commit, I've added code to generate a new method `checkEncodingIsComplete` on Java encoders. It verifies that a terminal state has been reached; otherwise, it throws an exception with information about the current state and transitions. * [Java] Initialise buffer bytes before encoding in tests. Check that null checks are actually working, rather than just returning 0s from the buffer. Also, fix usage of bitsets to always set all options. * [C#] Ignore personal Rider file. * [C#] Fix bug in string encoding method. Previously it would generate something like: ``` buffer.PutByte(someOffset, (ushort) someValue); // ^^^^^^^^ ``` Where the source value was too wide for the destination. * [C#] Port field access order tests. In this commit, I've ported most of the tests from the Java suite of tests for field access order checking to CSharp. To get the tests to compile, I had to generate an empty method stub for `CheckEncodingIsComplete()` on the codecs. Some generated APIs are different, e.g., around strings and arrays. Therefore, test cases and their numbers differ slightly in some places. * [Java] Fix indent. * [C#,Java] Port access order checking implementation to C#. Tests remain ignored due to two pre-existing problems in the C# codecs: 1. `ToString()` alters the codec state, e.g., `_limit`. 2. Some accessors do not check the field is present in the version being decoded. * [C#] Recover codec state after `ToString()`. Previously, the `ToString()` call would fail or complete but alter the state of the codec such that it was in a terminal state. Now, we treat `codecState` much like `limit` and recover the original value before returning control from `ToString()`. * [C#] Generate (more) presence checks in accessors. Previously, some accessors did not check that the acting version (being decoded) had the field/group/varData being accessed. Therefore, it might return garbage. But other accessors did check. In this commit, I've attempted to make more of the accessors (hopefully all) check the acting version before attempting to read data from the underlying buffer. This change should make the C# codecs closer to the Java decoders. * [C#] Enable access order checking tests. Having ported the implementation from Java and fixed some issues where the C# codecs were not checking the acting version when decoding data, the tests now pass. * [C++] Port access order checking tests and implementation to C++. In this commit, I also addressed these issues (1-3): --- 1. Tidy up string format usage. --- For convenience and readability, I had inlined some concatenation into `string.format(...)` templates; however, this makes it unclear whether the variable/method-output being concatenated uses template parameters. Therefore, I have adjusted to inject these strings via template parameters. FWIW I'd prefer to use `StringBuilder` everywhere, as I think it is hard to read large string format templates. --- 2. Explcitly delete copy constructors of flyweights. --- I found it easy to accidentally write sub-optimal code where I relied on copy constructors. For example: ``` MyMessage::MyGroup myGroup = myMessage.myGroup(10); ``` rather than: ``` MyMessage::MyGroup &myGroup = myMessage.myGroup(10); ``` Copying semantics are a bit strange when it comes to flyweights and trees of flyweights. Therefore, I think it is best to avoid entirely. This change is behind a feature flag. To re-enable implicit copy constructors specify the system property `-Dsbe.generate.implicit.copy.constructors=true`. --- 3. Adjust state machines to allow repeated decoding of variable-length --- data lengths, as accesses do not advance `limit`. * [C#] Improve indentation generation. Previously, some sections were not correctly aligned. * [Java, C#, C++] Model `resetCountToIndex()` state machine transition. Previously, the access order checks did not work with the `resetCountToIndex()` methods (generated on flyweights for repeating groups). Now, the valid transitions using these methods are modelled in the state machine and this is reflected in the code we generate for Java, C# and C++. Note that the `resetCountToIndex()` method does not change the `limit`/`position` of the message; therefore, it is only valid to use it when the `limit` aligns with a boundary between group elements. * [Java, C++, C#] Disable access order checking feature by default. We can deliver this new feature (and the C++ changes around copy constructors and copy assignment operators) without breaking any existing consumers by disabling the new code generation by default. Once released, and we've started to get feedback, we can decide whether we want to change the default and encourage our users to make use of the new feature. * [C++] Make access listener methods private. Also, fix some indentation issues. * [C++, C#] Tidy up. Removes superfluous space before SBE_NOEXCEPT specifier in C++. Tightens access modifiers in C#. * [Java, C++] Tidy up. Removes extraneous space in Java output when access order checks are disabled. Removes unnecessary access modifiers in C++ when access order checks are disabled. * [Java, C#, C++] Rename flags to enable field access order checking. The aim of this change is to avoid naming collisions with other flags, particularly in financial domains where "order" is used frequently. * [Java,C#,C++] Allow custom flag/properties to enable precedence checks. Some users would like the ability to enable precedence checks on some schemas but not others. Initially, I considered generating flags or system properties with the schema namespace; however, this forces the more-complex scheme on all users. It also might get unwieldly with long namespaces. It also assumes a one-to-one correspondence between namespaces and schemas. Instead, I have introduced two new system properties that can be applied at code generation time: - `sbe.precedence.checks.flagName`, which controls the symbol/macro used to enable precedence checks at build time in the generated C#/C++ code. It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`. - `sbe.precedence.checks.propName`, which controls the property name used to enable precedence checks at runtime in the generated Java code. It defaults to `"sbe.enable.precedence.checks"`. Now, users can run SbeTool with different values for these system properties to generate code with different enablement flags. Note above that the defaults are: - `SBE_ENABLE_PRECEDENCE_CHECKS` rather than `SBE_ENABLE_SEQUENCING_CHECKS`, and - `sbe.enable.precedence.checks` rather than `sbe.enable.sequencing.checks`. These changes and a change to another system property: - `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks` address some feedback from Martin on the associated PR #948. * Address Mike's feedback around depending directly on system properties. Also, addresses Mike's feedback around the names of those system properties. In this commit, I've introduced a factory-like class: `PrecedenceChecks`. It creates models for message field access when the checks are enabled. It also provides a convenient place to group together some properties affecting the generation of field precedence checks. I have also renamed `AccessOrderModel` to `FieldPrecedenceModel` in this commit, as it aligns better with the system property names. * Address Mike's feedback: provide more specific diagram location. Previously, we only gave the (quite ambiguous) name of the inner class in the error message. Now, we give a qualified name that includes the codec/encoder/decoder class name (for C++, C#, and Java). * Validate PrecedenceChecks#Context in conclude. * Address Mike's feedback: move sys prop eval higher. In this commit, I've removed the `PrecedenceChecks#Configuration` class in favour of a static method within `TargetCodeGeneratorLoader`. Unfortunately, C# behaves differently to all the other code generators. To avoid duplicating the default configuration for precedence checks in the `CSharp` class, I've made the method publicly visible, which is a bit ugly. It has also introduced a cyclic dependency between the code generators, e.g., `JavaGenerator`, and `TargetCodeGeneratorLoader`, as in order to preserve the existing public signatures I have kept the old constructors around and supplied a default `PrecedenceChecks`. * Avoid cyclic dependency between classes. Move defaults out of property evaluation and into context.
- Loading branch information