README: Expand on SepReader IEnumerable yet not LINQ compatible (#207)
Rename internal Select method.
nietras authored Dec 2, 2024
1 parent 19b5087 commit 427ebc9
Showing 4 changed files with 82 additions and 24 deletions.
63 changes: 48 additions & 15 deletions README.md
@@ -214,14 +214,19 @@ struct`](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/buil
 (please follow the `ref struct` link and understand how this limits the usage of
 those). This is due to these types being simple *facades* or indirections to the
 underlying reader or writer. That means you cannot use LINQ or create an array
-of all rows like `reader.ToArray()` as the reader is not `IEnumerable<>` either
-since `ref struct`s cannot be used in interfaces, which is in fact the point.
-Hence, you need to parse or copy to different types instead. The same applies to
-`Col`/`Cols` which point to internal state that is also reused. This is to avoid
-repeated allocations for each row and get the best possible performance, while
-still defining a well structured and straightforward API that guides users to
-relevant functionality. See [Why SepReader Is Not IEnumerable and LINQ
-Compatible](#why-sepreader-is-not-ienumerable-and-linq-compatible) for more.
+of all rows like `reader.ToArray()`. While for .NET9+ the reader is now
+`IEnumerable<>` since `ref struct`s can now be used in interfaces that have
+[`where T: allows ref
+struct`](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-13.0/ref-struct-interfaces)
+this still does not mean it is LINQ compatible. Hence, if you need store per row
+state or similar you need to parse or copy to different types instead. The same
+applies to `Col`/`Cols` which point to internal state that is also reused. This
+is to avoid repeated allocations for each row and get the best possible
+performance, while still defining a well structured and straightforward API that
+guides users to relevant functionality. See [Why SepReader Was Not IEnumerable
+Until .NET 9 and Is Not LINQ
+Compatible](#why-sepreader-was-not-ienumerable-until-net-9-and-is-not-linq-compatible)
+for more.
 
 ⚠ For a full overview of public types and methods see [Public API
 Reference](#public-api-reference).
@@ -485,10 +490,11 @@ If you hover over `col` you should see:
 "\"Apple\r\nBanana\r\nOrange\r\nPear\""
 ```
 
-#### Why SepReader Is Not IEnumerable and LINQ Compatible
+#### Why SepReader Was Not IEnumerable Until .NET 9 and Is Not LINQ Compatible
 As mentioned earlier Sep only allows enumeration and access to one row at a time
 and `SepReader.Row` is just a simple *facade* or indirection to the underlying
-reader. This is why it is defined as a `ref struct`. In fact, the following code:
+reader. This is why it is defined as a `ref struct`. In fact, the following
+code:
 ```csharp
 using var reader = Sep.Reader().FromText(text);
 foreach (var row in reader)
@@ -503,9 +509,9 @@ while (reader.MoveNext())
 }
 ```
 where `row` is just a *facade* for exposing row specific functionality. That is,
-`row` is still basically the `reader` underneath. Hence, let's imagine *if*
-`SepReader` did implement `IEnumerable<SepReader.Row>` and the `Row` was *not* a
-`ref struct`. Then, you would be able to write something like below:
+`row` is still basically the `reader` underneath. Hence, let's look at using
+LINQ with `SepReader` implementing `IEnumerable<SepReader.Row>` and the `Row`
+*not* being a `ref struct`. Then, you would be able to write something like below:
 ```csharp
 using var reader = Sep.Reader().FromText(text);
 SepReader.Row[] rows = reader.ToArray();
@@ -529,6 +535,33 @@ cols on that. This API, however, is in this authors opinion not ideal and can be
 a bit confusing, which is why Sep is designed like it is. The downside is the
 above caveat.
 
+The main culprit above is that for example `ToArray()` would store a `ref
+struct` in a heap allocated array, the actual enumeration is not a problem and
+hence implementing `IEnumerable<SepReader.Row>` is not the problem as such. The
+problem was that prior to .NET 9 it was not possible to implement this interface
+with `T` being a `ref struct`, but with C# 13 `allows ref struct` and .NET 9
+having annotated such interfaces it is now possible and you can assign
+`SepReader` to `IEnumerable`, but most if not all of LINQ will still not work as
+shown below.
+```csharp
+var text = """
+Key;Value
+A;1.1
+B;2.2
+""";
+using var reader = Sep.Reader().FromText(text);
+IEnumerable<SepReader.Row> enumerable = reader;
+// Currently, most LINQ methods do not work for ref types. See below.
+//
+// The type 'SepReader.Row' may not be a ref struct or a type parameter
+// allowing ref structs in order to use it as parameter 'TSource' in the
+// generic type or method 'Enumerable.Select<TSource,
+// TResult>(IEnumerable<TSource>, Func<TSource, TResult>)'
+//
+// enumerable.Select(row => row["Key"].ToString()).ToArray();
+```
+Calling `Select` should in principle be possible if this was annotated with `allows ref struct`, but it isn't currently.
+
 If you want to use LINQ or similar you have to first parse or transform the rows
 into some other type and enumerate it. This is easy to do and instead of
 counting lines you should focus on how such enumeration can be easily expressed
@@ -587,8 +620,8 @@ static IEnumerable<T> Enumerate<T>(SepReader reader, SepReader.RowFunc<T> select
 }
 ```
 
-In fact, Sep now provides such a convenience extension method. And, discounting
-the `Enumerate` method, this does have less boilerplate, but not really more
+In fact, Sep provides such a convenience extension method. And, discounting the
+`Enumerate` method, this does have less boilerplate, but not really more
 effective lines of code. The issue here is that this tends to favor factoring
 code in a way that can become very inefficient quickly. Consider if one wanted
 to only enumerate rows matching a predicate on `Key` which meant only 1% of rows
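The README text added above says that `SepReader` can be assigned to `IEnumerable<SepReader.Row>` on .NET 9, while `Enumerable.Select` rejects the `ref struct` row type. The following is a minimal sketch, not part of this commit, of what should still work in that situation: a plain `foreach` through the interface that copies the needed value out of each row into an ordinary `string`. It assumes .NET 9 with C# 13, the Sep package with its `nietras.SeparatedValues` namespace, and that enumerating through the interface behaves like enumerating the reader directly; the `keys` list is a name introduced here for illustration only.

```csharp
using System.Collections.Generic;
using nietras.SeparatedValues;

var text = """
           Key;Value
           A;1.1
           B;2.2
           """;
using var reader = Sep.Reader().FromText(text);

// Allowed on .NET 9+ because IEnumerable<T> is annotated with
// `allows ref struct` (assumption: Sep targets this as the README states).
IEnumerable<SepReader.Row> enumerable = reader;

// A row is only a facade over the reader's current state, so copy what is
// needed while the row is current instead of trying to store the row itself.
var keys = new List<string>();
foreach (var row in enumerable)
{
    keys.Add(row["Key"].ToString());
}

// `keys` now holds ordinary strings, so LINQ can be applied from here on.
```

The design choice mirrors the README's advice: the copy happens per row inside the loop, and only the copied values leave it.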
27 changes: 26 additions & 1 deletion src/Sep.XyzTest/ReadMeTest.cs
@@ -92,6 +92,28 @@ public void ReadMeTest_SepReader_Debuggability()
 }
 }
 
+#if NET9_0_OR_GREATER
+[TestMethod]
+public void ReadMeTest_IEnumerable_But_Not_LINQ_Compatible()
+{
+var text = """
+Key;Value
+A;1.1
+B;2.2
+""";
+using var reader = Sep.Reader().FromText(text);
+IEnumerable<SepReader.Row> enumerable = reader;
+// Currently, most LINQ methods do not work for ref types. See below.
+//
+// The type 'SepReader.Row' may not be a ref struct or a type parameter
+// allowing ref structs in order to use it as parameter 'TSource' in the
+// generic type or method 'Enumerable.Select<TSource,
+// TResult>(IEnumerable<TSource>, Func<TSource, TResult>)'
+//
+// enumerable.Select(row => row["Key"].ToString()).ToArray();
+}
+#endif
+
 [TestMethod]
 public void ReadMeTest_LocalFunction_YieldReturn()
 {
@@ -412,9 +434,12 @@ public void ReadMeTest_UpdateExampleCodeInMarkdown()
 {
 (nameof(ReadMeTest_) + "()", "## Example"),
 (nameof(ReadMeTest_SepReader_Debuggability) + "()", "#### SepReader Debuggability"),
+#if NET9_0_OR_GREATER
+(nameof(ReadMeTest_IEnumerable_But_Not_LINQ_Compatible) + "()", "The main culprit above is that for example"),
+#endif
 (nameof(ReadMeTest_LocalFunction_YieldReturn) + "()", "If you want to use LINQ"),
 (nameof(ReadMeTest_Enumerate) + "()", "Now if instead refactoring this to something LINQ-compatible"),
-(nameof(ReadMeTest_EnumerateWhere) + "()", "In fact, Sep now provides such a convenience "),
+(nameof(ReadMeTest_EnumerateWhere) + "()", "In fact, Sep provides such a convenience "),
 (nameof(ReadMeTest_IteratorWhere) + "()", "Instead, you should focus on how to express the enumeration"),
 (nameof(ReadMeTest_EnumerateTrySelect) + "()", "With this the above custom `Enumerate`"),
 (nameof(ReadMeTest_Example_Copy_Rows) + "()", "### Example - Copy Rows"),
8 changes: 4 additions & 4 deletions src/Sep/SepReader.Cols.cs
@@ -96,12 +96,12 @@ public void TryParse<T>(Span<T?> span) where T : struct, ISpanParsable<T>
 }
 
 public Span<T> Select<T>(ColFunc<T> selector) => IsIndices()
-? _state.Select<T>(_colIndices, selector)
-: _state.Select<T>(_colStartIfRange, _colIndices.Length, selector);
+? _state.ColsSelect<T>(_colIndices, selector)
+: _state.ColsSelect<T>(_colStartIfRange, _colIndices.Length, selector);
 
 public unsafe Span<T> Select<T>(delegate*<Col, T> selector) => IsIndices()
-? _state.Select<T>(_colIndices, selector)
-: _state.Select<T>(_colStartIfRange, _colIndices.Length, selector);
+? _state.ColsSelect<T>(_colIndices, selector)
+: _state.ColsSelect<T>(_colStartIfRange, _colIndices.Length, selector);
 
 bool IsIndices() => _colStartIfRange < 0;
 
8 changes: 4 additions & 4 deletions src/Sep/SepReaderState.cs
@@ -610,7 +610,7 @@ internal void TryParse<T>(ReadOnlySpan<int> colIndices, Span<T?> span) where T :
 }
 }
 
-internal Span<T> Select<T>(ReadOnlySpan<int> colIndices, ColFunc<T> selector)
+internal Span<T> ColsSelect<T>(ReadOnlySpan<int> colIndices, ColFunc<T> selector)
 {
 ArgumentNullException.ThrowIfNull(selector);
 var length = colIndices.Length;
@@ -622,7 +622,7 @@ internal Span<T> Select<T>(ReadOnlySpan<int> colIndices, ColFunc<T> selector)
 return span;
 }
 
-internal unsafe Span<T> Select<T>(ReadOnlySpan<int> colIndices, delegate*<Col, T> selector)
+internal unsafe Span<T> ColsSelect<T>(ReadOnlySpan<int> colIndices, delegate*<Col, T> selector)
 {
 var length = colIndices.Length;
 var span = _arrayPool.RentUniqueArrayAsSpan<T>(length);
@@ -702,7 +702,7 @@ internal void TryParse<T>(int colStart, int colCount, Span<T?> span) where T : s
 }
 }
 
-internal Span<T> Select<T>(int colStart, int colCount, ColFunc<T> selector)
+internal Span<T> ColsSelect<T>(int colStart, int colCount, ColFunc<T> selector)
 {
 ArgumentNullException.ThrowIfNull(selector);
 var span = _arrayPool.RentUniqueArrayAsSpan<T>(colCount);
@@ -713,7 +713,7 @@ internal Span<T> Select<T>(int colStart, int colCount, ColFunc<T> selector)
 return span;
 }
 
-internal unsafe Span<T> Select<T>(int colStart, int colCount, delegate*<Col, T> selector)
+internal unsafe Span<T> ColsSelect<T>(int colStart, int colCount, delegate*<Col, T> selector)
 {
 var span = _arrayPool.RentUniqueArrayAsSpan<T>(colCount);
 for (var i = 0; i < span.Length; i++)