Skip to content

Commit

Permalink
[Java,C#,C++] Support field access order checking (#948)
Browse files Browse the repository at this point in the history
This is a squashed merge of many commits. I've kept the original commit messages below for posterity; however, a better summary is available in the description of [PR #948](#948).

Squashing makes it easy to revert, but sadly, it loses the fine-grained history available through `git blame`.

---

* [Java] Add disabled tests around checking field access order.

SBE does not support arbitrary access to fields when encoding and
decoding. In particular, accessing groups and variable length fields
advances a `limit`, which marks where the start of the next
group/variable length field is expected.

We aim to introduce some checking (in debug mode - when bounds checks
are enabled) that verifies field access order is consistent enough with
the order defined in the schema to encode or decode a valid message.

* [Java] Spike an approach for checking field encode order.

Also, enable some tests around field order checking.

Outline of the approach:

- Using a FSM, model the states of an encoder during encoding.
    - Hopefully, this model will be language independent.
- Generate code to perform state transitions upon field accesses.
- And to throw when transitions are not in the model.

Assumptions:

- JIT compilation will remove `if (DEBUG_MODE) { ... }` blocks when the
static boolean constant, `DEBUG_MODE`, is `false`.

Questions:

- Is the approach reasonable?
- Is the newly disallowed behaviour reasonable?
    - For example, it prevents encoding block fields after a group or
variable length field has been encoded.

Further work:

- Decode path
- More tests (including tests for decoding old messages where some
fields may not be present)

* [Java] Support printing field order access state machine.

Use `-Dsbe.print.state.machine=$NAME_OF_MSG`.

* [Java] Add checking around decoding order.

This change builds upon the work to check field encoding order.
It is a slightly more-complex problem than with encoding, as when
decoding it is necessary to deal with multiple versions of encoded data.
To handle this, we fuse together the state machines for decoding each
possible version and branch on "decoder.wrap(...)".

Further work:

- More tests
- Better error messages

* [Java] Test field order checks across versions.

"New" decoders (in Java at least) read null-like values for fields
that are not present in "old" messages. There are no restrictions on the
order in which these "new" fields are accessed when decoding "old" data.

In Java, a version check returns a null-like value when a field is not
present in the acting version. This behaviour matches the wiki
documentation.

C#, currently, has different behaviour: the application code needs to
interrogate a boolean property, e.g., `IsFooInActingVersion`, before
accessing the data property, e.g., `Foo`. The property itself assumes
that the field is present. It will attempt to read it from the buffer
and possibly advance a `limit` field.

In this commit, I have moved the state transition beneath the existing
version check in Java decoders. This change allows these fields to be
accessed when decoding "old" messages without changing the state of
decoding, which seems appropriate as there are no side-effects on
internal state, e.g., the `limit` field.

Also, in this commit, I have expanded the test coverage to include the
getter and setter variants for fixed-size arrays and variable-length
data.

* [Java] Generate better error messages.

Also, use an enum to represent codec states rather than a state number
and generate (dot-based) state machine diagram in Javadoc for states
enum.

* [Java] Test field order checks with fixed-length strings.

Also, fix some spots where I had forgotten to move the state transition
below the version check in decoders.

* [Java] Test field checks do not affect toString.

* [Java] Fix field order checks to work with `decodedLength()`.

* [Java] Test content of field order error messages.

There is still work to do in this area. In particular:

1. States should be more descriptive.
1. Errors should contain full field paths.
2. Group error messages should be more clear, e.g., "cannot write group
element before calling next()".

* [Java] Fix field access order in tests and re-enable.

* [Java] Permit block field access whenever it is safe (and, unfortunately, in some cases where it is not).

This commit makes the field order checks more-lenient. The aim is to
allow the application code to access fields in the root/top-level block
or blocks inside groups whenever it is safe. However, the approach is
broken and does not work in general.

To (try to) achieve this aim, we rely on the fact that once a block is
positioned correctly, it is safe to access during the remainder of
encoding/decoding. For example, the following code is safe:

```
encoder.wrap(...);
encoder.blockField1(42);
encoder.varData("abc");
encoder.blockField2(43);
```

As soon as `wrap(...)` is called, the top-level block will be positioned
over a valid region for the remainder of encoding.

Groups are slightly trickier. The following code is _safe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.next()
    .x(1).y(2)
    .next();
    .x(3).y(4);
```

But the following code is _unsafe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.x(1).y(2)
    .next();
    .x(3).y(4);
```

And the following code is _unsafe_ too:

```
encoder.wrap(...)
Foo.BarEncoder bar = encoder.groupBarCount(0);
encoder.baz("abc");
bar.x(1).y(2);
```

The `offset` for the group encoder is only set when `next()` is called.
Therefore, calling `x(1)` and `y(2)` will likely overwrite some data
elsewhere in the buffer (probably near offset 0).

We can still rely on the fact that once `next()` has been called - even
if it is called multiple times, or too many times (causing it to throw)
- the group's block fields will be safe to encode/decode.

Thinking about this in terms of our state machine that models codec
state and field accesses, we can put this another way. There is at least
one transition that represents the `next()` method call. The states
(transitively) reachable from these transitions can safely access the
associated group block.

As already mentioned, despite the passing tests, the approach in this
commit doesn't always work.

I had thought it would be possible to assign a number to each state as
we find them using DFS and use this number as a shortcut to determining
whether a block access was safe. For example, in each block field we
would check:

```
if (currentState.number() >= firstStateAfterNextIsCalled.number())
{
    // ALLOW
}
else
{
    // DENY
}
```

However, (even ignoring the cycles due to repeating groups and the use
of DFS) this solution makes the following assumptions:

1. There is a single state: `firstStateAfterNextIsCalled`, such that all
subsequent transitively reachable states are safe.

2. It is possible to number the states in such a way that
`numberOfState(b) >= numberOfState(a)` implies `b` is (transitively)
reachable from `a`.

Let's consider the following psuedo-schema w.r.t. our assumptions above:

```
group a {
  x: int32
}
group b {
  y: int32
}
```

Can we set `x` if we've called `BEncoder#next`? Sometimes.

The following code is invalid.

```
AEncoder a = encoder.aCount(0);
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1); // Whoops! We overwrote some data somewhere.
```

But this code is valid.

```
AEncoder a = encoder.aCount(1).next();
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1);
```

Therefore, multiple transitions must exist for the `BEncoder#next` call
with different exit states, which breaks Assumption 1.

Assumption 2 is also problematic. As alluded to above, we must have a
bifurcation of states depending on whether a group has elements or is
empty. With arbitrary bifurcations, it is not possible to number the
states in such a way that Assumption 2 holds.

Consider the following state diagram:

```
        state N
         /   \
     empty   non-empty
      /        \
state X        state Y
```

Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g.,
`aCount(1)`, we transition to either `state X` or `state Y` from
`state N`.

For Assumption 2 to hold, we need
`numberOf(state X) >= numberOf(state N)` (reachable) and
`numberOf(state Y) >= numberOf(state N)` (reachable) and
`numberOf(state X)  < numberOf(state Y)` (unreachable) and
`numberOf(state Y)  < numberOf(state X)` (unreachable).
Clearly, this is impossible.

Prior to this commit, we avoided such problems by enforcing a
more-strict ordering of field accesses. With this more-strict ordering,
setting fields in a group block in later states was not allowed.
Therefore, the state machine didn't have to have to split into mutually
exclusive subgraphs of reachable states depending on whether or not the
group is empty.

For example, let's consider a psuedo-schema with two groups.

```
group a {
  x: int32
}
group b {
  y: int32
}
```

Once the application code calls `encoder.bCount(2).next()`, our state
machine will enter the `V0_B_N_BLOCK` state regardless of whether
previously the application code called `encoder.aCount(0)` or
`encoder.aCount(1).next()`. In `V_B_N_BLOCK` the application code is not
allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no
false positives. However, it is also too-strict, i.e., it has false
negatives, e.g., the `encoder.aCount(1).next()` case.

How likely are these false negatives to come up in the "wild"?

If we think false negatives will often come up, is it possible to remove
them whilst still using a state machine?

We could consider generating more states to represent the reachable
states under each fork of group emptiness. The number of states would
grow exponentially w.r.t. the number of top-level groups, but let's
ignore that for the moment and just consider whether or not it is
possible.

Continuing the example above with two groups, when the application code
calls `encoder.bCount(2).next()`, the state it enters would depend on
whether it had previously called `encoder.aCount(0)` or
`encoder.aCount(n)...next()` for some value `n > 0`.

To safely model this example, we'd need at least the following states:

1. has no `a` elements
2. has no `a` elements, and has no `b` elements
3. has `a` elements, but has not called `AEncoder#next`
4. has `a` elements, and has called `AEncoder#next`
5. has `a` elements, has called `AEncoder#next`, has `b` elements, but
has not called `BEncoder#next`
6. has `a` elements, has called `AEncoder#next`, has `b` elements, and
has called `BEncoder#next`
7. has no `a` elements, has `b` elements, but has not called
`BEncoder#next`
8. has no `a` elements, has `b` elements, and has called
`BEncoder#next`

A trickier case is with nested groups:

```
group foo {
  x: int32
  group bar {
    y: int32
  }
  r: string
  s: string
}
```

When is `Bar#y` safe to call?

I believe it is safe to call in states where `FooEncoder#next` and
`BarEncoder#next` have been called in-order more recently than
`BarEncoder#count(0)`. Note, as `bar` is a group within `foo`, the
`BarEncoder#next` and `BarEncoder#count` methods may be called
repeatedly during the encoding of a single message.  Implementing a
more-permissive safety check, using this model, would entail checking
that the current state of the encoder/decoder is in a state where this
condition holds.

To safely model this example, we'd need (at least) the following states:

1. has no `foo` elements
2. has `foo` elements, but has not called `FooEncoder#next`
3. has `foo` elements, and has called `FooEncoder#next`
4. has `foo` elements, has called `FooEncoder#next`, and has no `bar`
elements
5. has `foo` elements, has called `FooEncoder#next`, has no `bar`
elements, and has encoded `r`
6. has `foo` elements, has called `FooEncoder#next`, has no `bar`
elements, and has encoded `s`
7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
but has not called `BarEncoder#next`
8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
has not called `BarEncoder#next`, and has encoded `r`
9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
has not called `BarEncoder#next`, and has encoded `s`
10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`
11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`, and has encoded `r`
12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements,
and has called `BarEncoder#next`, and has encoded `s`

```
FooEncoder foo = encoder.fooCount(3); // State 2
foo.next(); // State 3
foo.barCount(0); // State 4
foo.r("abc"); // State 5
foo.s("def"); // State 6
foo.next(); // State 4
BarEncoder bar = foo.barCount(1); // State 7
foo.r("abc"); // State 8
bar.next(); // State 11
bar.y(42); // State 11
foo.s("def"); // State 12
foo.next(); // State X?
bar.y(43); // Weird but allowed.
foo.barCount(0); // State Y? State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count
```

OK. We didn't have enough states. In the case where we "wrap back
around", we need to model whether `bar` block access is allowed (State 13 vs
State 3):

13. has `foo` elements, has called `FooEncoder#next`, has called
`BarEncoder#next` for last iteration, but hasn't set `bar` group size in
this iteration.

An even trickier case is with doubly-nested groups:

```
group foo {
  x: int32
  group bar {
    y: int32
    group baz {
      z: int32
    }
  }
}
```

When is `Baz#z` safe to call?

I believe it is safe to call in states where `FooEncoder#next`,
`BarEncoder#next` and `BazEncoder#next` have been called in-order
more recently than `BarEncoder#count(0)` or `BazEncoder#count(0).
Encoding this in states is likely to result in more cases like State 13
above.

Generalising: a group block field may be accessed only when all outer
groups' and its own `#next` methods have been called in-order more
recently than any outer groups' or its own `#count` method with a `0`
argument.

Can model "more-recently" exactly, i.e., with no false positives and no
false negatives, using a state machine?

Another way of looking at the state machines we generate, is as parsers
for a language. Where our language, based on an SBE schema, comprises
valid "strings" of calls to an encoder/decoder. Is this language
a regular language? If not, representing it using a FSM will not work.
Cue Pumping Lemma?

Next steps:

- Add more tests to show where this approach breaks
- Revert the changes and see if it is acceptable that some valid uses of
codecs will be prevented when enabling field order checking. I.e., it
will give false negatives when testing the hypothesis "this
encoding/decoding is valid".
    - Personally, I think false negatives are better than false
positives. (False positives are possible as of this commit.) Having said
that, it will discourage use of these checks in existing systems where
valid code already exists that does not pass the checks.

* [Java] Add (disabled) failing tests for false positives.

(When testing the hypothesis "this decoding/encoding code is valid".)

These tests show that there are problems with the approach in the last
commit.

* Revert "[Java] Permit block field access whenever it is safe (and, unfortunately, in some cases where it is not)."

This reverts commit d9eb33d.

* [Java] Treat top-level block fields as a special case.

To reduce the number of false negatives the field access order checker
generates, in this commit, we treat the top-level block fields
differently.

We know that once the top-level encoder is wrapped, these fields are
always safe to access.

As a result, we were able to enable several tests that previously
failed.

* [Java] Support turning on access checks independently of bounds checks.

Introduces the system property:

```
sbe.enable.access.order.checks
```

* [Java] Attempt to optimise code with disabled access order checks.

Some JIT compilers are able to eliminate dead code within branches of
if-statements where the conditional is a static expression. Some JIT
compilers perform inlining to some degree and use heuristics, e.g.,
method size. It isn't clear whether code size metrics will be calculated
after dead code elimination is performed.

In this commit, I've moved the code that performs state transitions when
access order checks are enabled into separate methods that the
encoders/decoders call upon field accesses, e.g., the getter for a field
`xyz` might look like this:

```
int xyz()
{
    if (ENABLED_ACCESS_ORDER_CHECKS)
    {
       onXyzAccessed();
    }

    // ...
}
```

This keeps the initial size of `xyz` smaller than when the code of
`onXyzAccessed()` is "inlined" by the code generator.

There is a lot of noise on my ThinkPad P14s, but benchmarks do show
improvements after this change.

Before:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2

\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score        Error  Units
CarBenchmark.testDecode  thrpt    6  18051405.792 ± 444296.898  ops/s
CarBenchmark.testEncode  thrpt    6  11964827.418 ± 221996.747  ops/s
```

After:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  18132922.415 ± 1964596.044  ops/s
CarBenchmark.testEncode  thrpt    6  20018131.180 ±  671367.070  ops/s
```

Baseline (d2ec8f8):

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19270366.357 ± 2618336.027  ops/s
CarBenchmark.testEncode  thrpt    6  22519718.535 ±  794169.180  ops/s
```

* [Java] Add feature toggle for access order check generation.

The code generator will omit checks when the
`sbe.generate.access.order.checks` system property is set to `false`.

Adding the feature toggle allowed me to run the following benchmarks on
my ThinkPad P14s, where (unfortunately) there is some noise.

Baseline (d2ec8f8):

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# JMH version: 1.36
\# JMH version: 1.36
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19270366.357 ± 2618336.027  ops/s
CarBenchmark.testEncode  thrpt    6  22519718.535 ±  794169.180  ops/s
```

This commit with new code generation disabled, i.e.,
`-Dsbe.generate.access.order.checks=false` at build time:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19290928.253 ± 2226604.698  ops/s
CarBenchmark.testEncode  thrpt    6  21957767.395 ±  567000.570  ops/s
```

This commit with new code generation enabled, but disabled checks, i.e.,
`-Dsbe.enable.access.order.checks=false` at runtime.

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19238066.328 ± 2557533.640  ops/s
CarBenchmark.testEncode  thrpt    6  22284274.779 ±  520984.898  ops/s
```

Whilst it is hard to definitively conclude anything using my noisy
machine to benchmark, it seems like there is no obvious performance
regression when the runtime checks are disabled in comparison to
omitting the checks at build time.

* [Java] Improve performance of field access order checking.

This commit switches the generated code to using an int to represent
the state rather than a reference to an enum case. The change improves
performance significantly, which I suspect is due to avoiding a write
barrier that updates the card table. Relevant benchmark runs are below.
Perhaps different JVM versions and GC options will yield different
relative improvements.

All benchmark runs use:
- `-Dsbe.generate.access.order.checks=true` at build time
- `-Dsbe.enable.access.order.checks=true` at runtime

Using an enum:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true

...

Benchmark                 Mode  Cnt        Score        Error  Units
CarBenchmark.testDecode  thrpt    6  8075519.641 ± 363346.224  ops/s
CarBenchmark.testEncode  thrpt    6  7211523.145 ± 504174.913  ops/s
```

Using an int:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
\# JMH version: 1.36
\# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
\# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
\# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=true

...

Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  13030684.127 ±  155782.171  ops/s
CarBenchmark.testEncode  thrpt    6   9781612.505 ± 1342559.378  ops/s
```

* [Java] Provide more context in error messages.

In this commit, we associate field tokens with the "path" through the
message to it. We can then use this to generate code with better error
messages that include the path to the field rather than just the field
name, which could have been ambiguous when there are groups.

* [Java] Provide more context in error messages.

In this commit, I've customised the error messages around group access
to give a better indication of what was attempted.

* [Java] Specify expected transitions in error messages.

In this commit, I've improved the error messages around field access
order checks so that it reports the legal transitions in the current
state. Hopefully, this will make it easier for users to fix problems.

* [Java] Use fewer states to model encoding.

Encoders can only encode the latest version, unlike decoders. Therefore,
we do not need to model field access patterns other than that of the
latest version.

* [Java] Refactor field access order checking.

Overview:
- `AccessOrderModel.CodecInteraction` models the possible interactions
with an encoder or decoder:
    - Wrapping it around a buffer
    - Accessing its fields (block or var data)
    - Encoding/decoding the count of a repeating group
    - Moving to the next element in a repeating group
    - Moving to the last element in a repeating group
- `AccessOrderModel` models the encoder/decoder state transitions w.r.t.
these interactions

* [Java] Address review comment.

A disabled test had an incorrect assertion.

Also, I've replaced the term "group length" with "group count" in the
test names.

* [Java] Add tests for variable-length data in nested group.

These tests are based on Mike's review comments.

* [Java] Generate check that encoding is complete.

Previously, we checked the order of encoding was correct but not that
the encoding reached a terminal state.

In this commit, I've added code to generate a new method
`checkEncodingIsComplete` on Java encoders. It verifies that a terminal
state has been reached; otherwise, it throws an exception with
information about the current state and transitions.

* [Java] Initialise buffer bytes before encoding in tests.

Check that null checks are actually working, rather than just returning
0s from the buffer.

Also, fix usage of bitsets to always set all options.

* [C#] Ignore personal Rider file.

* [C#] Fix bug in string encoding method.

Previously it would generate something like:

```
buffer.PutByte(someOffset, (ushort) someValue);
//                         ^^^^^^^^
```

Where the source value was too wide for the destination.

* [C#] Port field access order tests.

In this commit, I've ported most of the tests from the Java suite of
tests for field access order checking to CSharp.

To get the tests to compile, I had to generate an empty method stub for
`CheckEncodingIsComplete()` on the codecs.

Some generated APIs are different, e.g., around strings and arrays.
Therefore, test cases and their numbers differ slightly in some places.

* [Java] Fix indent.

* [C#,Java] Port access order checking implementation to C#.

Tests remain ignored due to two pre-existing problems in the C# codecs:

1. `ToString()` alters the codec state, e.g., `_limit`.
2. Some accessors do not check the field is present in the version being
decoded.

* [C#] Recover codec state after `ToString()`.

Previously, the `ToString()` call would fail or complete but alter the
state of the codec such that it was in a terminal state. Now, we treat
`codecState` much like `limit` and recover the original value before
returning control from `ToString()`.

* [C#] Generate (more) presence checks in accessors.

Previously, some accessors did not check that the acting version (being
decoded) had the field/group/varData being accessed. Therefore, it might
return garbage. But other accessors did check.

In this commit, I've attempted to make more of the accessors (hopefully
all) check the acting version before attempting to read data from the
underlying buffer.

This change should make the C# codecs closer to the Java decoders.

* [C#] Enable access order checking tests.

Having ported the implementation from Java and fixed some issues where
the C# codecs were not checking the acting version when decoding data,
the tests now pass.

* [C++] Port access order checking tests and implementation to C++.

In this commit, I also addressed these issues (1-3):

--- 1. Tidy up string format usage. ---

For convenience and readability, I had inlined some concatenation into
`string.format(...)` templates; however, this makes it unclear whether
the variable/method-output being concatenated uses template parameters.
Therefore, I have adjusted to inject these strings via template
parameters. FWIW I'd prefer to use `StringBuilder` everywhere, as I
think it is hard to read large string format templates.

--- 2. Explcitly delete copy constructors of flyweights. ---

I found it easy to accidentally write sub-optimal code where I relied on
copy constructors. For example:

```
MyMessage::MyGroup myGroup = myMessage.myGroup(10);
```

rather than:

```
MyMessage::MyGroup &myGroup = myMessage.myGroup(10);
```

Copying semantics are a bit strange when it comes to flyweights and
trees of flyweights. Therefore, I think it is best to avoid entirely.

This change is behind a feature flag. To re-enable implicit copy
constructors specify the system property
`-Dsbe.generate.implicit.copy.constructors=true`.

--- 3. Adjust state machines to allow repeated decoding of variable-length ---
data lengths, as accesses do not advance `limit`.

* [C#] Improve indentation generation.

Previously, some sections were not correctly aligned.

* [Java, C#, C++] Model `resetCountToIndex()` state machine transition.

Previously, the access order checks did not work with the
`resetCountToIndex()` methods (generated on flyweights for repeating
groups). Now, the valid transitions using these methods are modelled in
the state machine and this is reflected in the code we generate for
Java, C# and C++.

Note that the `resetCountToIndex()` method does not change the
`limit`/`position` of the message; therefore, it is only valid to use it
when the `limit` aligns with a boundary between group elements.

* [Java, C++, C#] Disable access order checking feature by default.

We can deliver this new feature (and the C++ changes around copy
constructors and copy assignment operators) without breaking any
existing consumers by disabling the new code generation by default.

Once released, and we've started to get feedback, we can decide whether
we want to change the default and encourage our users to make use of the
new feature.

* [C++] Make access listener methods private.

Also, fix some indentation issues.

* [C++, C#] Tidy up.

Removes superfluous space before SBE_NOEXCEPT specifier in C++.

Tightens access modifiers in C#.

* [Java, C++] Tidy up.

Removes extraneous space in Java output when access order checks are
disabled.

Removes unnecessary access modifiers in C++ when access order checks are
disabled.

* [Java, C#, C++] Rename flags to enable field access order checking.

The aim of this change is to avoid naming collisions with other flags,
particularly in financial domains where "order" is used frequently.

* [Java,C#,C++] Allow custom flag/properties to enable precedence checks.

Some users would like the ability to enable precedence checks on some
schemas but not others.

Initially, I considered generating flags or system properties with the
schema namespace; however, this forces the more-complex scheme on all
users. It also might get unwieldly with long namespaces. It also assumes
a one-to-one correspondence between namespaces and schemas.

Instead, I have introduced two new system properties that can be applied
at code generation time:

- `sbe.precedence.checks.flagName`, which controls the symbol/macro used
to enable precedence checks at build time in the generated C#/C++ code.
It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`.

- `sbe.precedence.checks.propName`, which controls the property name
used to enable precedence checks at runtime in the generated Java code.
It defaults to `"sbe.enable.precedence.checks"`.

Now, users can run SbeTool with different values for these system
properties to generate code with different enablement flags.

Note above that the defaults are:

- `SBE_ENABLE_PRECEDENCE_CHECKS` rather than
`SBE_ENABLE_SEQUENCING_CHECKS`, and
- `sbe.enable.precedence.checks` rather than
`sbe.enable.sequencing.checks`.

These changes and a change to another system property:

- `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks`

address some feedback from Martin on the associated PR #948.

* Address Mike's feedback around depending directly on system properties.

Also, addresses Mike's feedback around the names of those system
properties.

In this commit, I've introduced a factory-like class:
`PrecedenceChecks`. It creates models for message field access when the
checks are enabled. It also provides a convenient place to group
together some properties affecting the generation of field precedence
checks.

I have also renamed `AccessOrderModel` to `FieldPrecedenceModel` in this
commit, as it aligns better with the system property names.

* Address Mike's feedback: provide more specific diagram location.

Previously, we only gave the (quite ambiguous) name of the inner class
in the error message. Now, we give a qualified name that includes the
codec/encoder/decoder class name (for C++, C#, and Java).

* Validate PrecedenceChecks#Context in conclude.

* Address Mike's feedback: move sys prop eval higher.

In this commit, I've removed the `PrecedenceChecks#Configuration` class
in favour of a static method within `TargetCodeGeneratorLoader`.

Unfortunately, C# behaves differently to all the other code generators.
To avoid duplicating the default configuration for precedence checks in
the `CSharp` class, I've made the method publicly visible, which is a
bit ugly.

It has also introduced a cyclic dependency between the code generators,
e.g., `JavaGenerator`, and `TargetCodeGeneratorLoader`, as in order to
preserve the existing public signatures I have kept the old constructors
around and supplied a default `PrecedenceChecks`.

* Avoid cyclic dependency between classes.

Move defaults out of property evaluation and into context.
  • Loading branch information
ZachBray authored Dec 6, 2023
1 parent dd6063f commit a259c7d
Show file tree
Hide file tree
Showing 25 changed files with 18,686 additions and 469 deletions.
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,9 @@ out
# cmake
cmake-build-debug
codecs
.cmake
CMakeFiles
thirdparty

# cpp build linux
cppbuild/CMakeCache.txt
Expand Down Expand Up @@ -101,10 +104,12 @@ csharp/sbe-generated/issue483
csharp/sbe-generated/issue560
csharp/sbe-generated/issue661
csharp/sbe-generated/since-deprecated
csharp/sbe-generated/order_check
csharp/sbe-generated/mktdata/*.cs
csharp/sbe-generated/uk_co_real_logic_sbe_benchmarks_fix
csharp/sbe-tests/*.sbe
csharp/nuget/
csharp/csharp.sln.DotSettings.user

# rust
rust/target
Expand Down
26 changes: 20 additions & 6 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,9 @@ subprojects {
}

javaLauncher.set(toolchainLauncher)

systemProperty 'sbe.enable.ir.precedence.checks', 'true'
systemProperty 'sbe.enable.test.precedence.checks', 'true'
}
}

Expand Down Expand Up @@ -279,9 +282,12 @@ project(':sbe-tool') {
'sbe.output.dir': generatedDir,
'sbe.target.language': 'Java',
'sbe.validation.stop.on.error': 'true',
'sbe.validation.xsd': validationXsdPath)
'sbe.validation.xsd': validationXsdPath,
'sbe.generate.precedence.checks': 'true',
'sbe.java.precedence.checks.property.name': 'sbe.enable.test.precedence.checks')
args = ['src/test/resources/json-printer-test-schema.xml',
'src/test/resources/composite-elements-schema.xml']
'src/test/resources/composite-elements-schema.xml',
'src/test/resources/field-order-check-schema.xml']
}

jar {
Expand Down Expand Up @@ -541,7 +547,8 @@ project(':sbe-benchmarks') {
'sbe.validation.stop.on.error': 'true',
'sbe.validation.xsd': validationXsdPath,
'sbe.java.encoding.buffer.type': 'org.agrona.concurrent.UnsafeBuffer',
'sbe.java.decoding.buffer.type': 'org.agrona.concurrent.UnsafeBuffer')
'sbe.java.decoding.buffer.type': 'org.agrona.concurrent.UnsafeBuffer',
'sbe.generate.precedence.checks': 'false')
args = ['src/main/resources/car.xml', 'src/main/resources/fix-message-samples.xml']
}

Expand Down Expand Up @@ -730,13 +737,15 @@ tasks.register('generateCSharpCodecsTests', JavaExec) {
'sbe.output.dir': 'csharp/sbe-generated',
'sbe.target.language': 'uk.co.real_logic.sbe.generation.csharp.CSharp',
'sbe.xinclude.aware': 'true',
'sbe.validation.xsd': validationXsdPath)
'sbe.validation.xsd': validationXsdPath,
'sbe.generate.precedence.checks': 'true')
args = ['sbe-tool/src/test/resources/FixBinary.xml',
'sbe-tool/src/test/resources/issue435.xml',
'sbe-tool/src/test/resources/issue483.xml',
'sbe-tool/src/test/resources/issue560.xml',
'sbe-tool/src/test/resources/since-deprecated-test-schema.xml',
'sbe-tool/src/test/resources/example-bigendian-test-schema.xml',
'sbe-tool/src/test/resources/field-order-check-schema.xml',
'sbe-benchmarks/src/main/resources/fix-message-samples.xml']
}

Expand All @@ -753,7 +762,10 @@ tasks.register('generateJavaIrCodecs', JavaExec) {
systemProperties(
'sbe.output.dir': 'sbe-tool/src/main/java',
'sbe.target.language': 'Java',
'sbe.validation.xsd': validationXsdPath)
'sbe.validation.xsd': validationXsdPath,
'sbe.generate.precedence.checks': 'true',
'sbe.precedence.checks.flag.name': 'SBE_ENABLE_IR_PRECEDENCE_CHECKS',
'sbe.java.precedence.checks.property.name': 'sbe.enable.ir.precedence.checks')
args = ['sbe-tool/src/main/resources/sbe-ir.xml']
}

Expand All @@ -763,7 +775,9 @@ tasks.register('generateCppIrCodecs', JavaExec) {
systemProperties(
'sbe.output.dir': 'sbe-tool/src/main/cpp',
'sbe.target.language': 'cpp',
'sbe.validation.xsd': validationXsdPath)
'sbe.validation.xsd': validationXsdPath,
'sbe.generate.precedence.checks': 'true',
'sbe.precedence.checks.flag.name': 'SBE_ENABLE_IR_PRECEDENCE_CHECKS')
args = ['sbe-tool/src/main/resources/sbe-ir.xml']
}

Expand Down
8 changes: 8 additions & 0 deletions csharp/sbe-generated/sbe-generated.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,14 @@
<Copyright>Copyright (C) Bill Segall 2018, MarketFactory Inc 2017, Adaptive 2014. All rights reserved.</Copyright>
</PropertyGroup>

<PropertyGroup Condition=" '$(Configuration)' == 'Debug' ">
<DefineConstants>TRACE,SBE_ENABLE_PRECEDENCE_CHECKS</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
<DefineConstants>TRACE,SBE_ENABLE_PRECEDENCE_CHECKS</DefineConstants>
</PropertyGroup>

<ItemGroup>
<ProjectReference Include="..\sbe-dll\sbe-dll.csproj" />
</ItemGroup>
Expand Down
Loading

0 comments on commit a259c7d

Please sign in to comment.