Make write(IO, Char) actually return the amount of printed bytes instead of the attempted written bytes. #56980

Merged
merged 2 commits into from
Jan 10, 2025
5 changes: 2 additions & 3 deletions base/io.jl
@@ -864,11 +864,10 @@ end

 function write(io::IO, c::Char)
     u = bswap(reinterpret(UInt32, c))
-    n = 1
+    n = 0
     while true
-        write(io, u % UInt8)
+        n += write(io, u % UInt8)
         (u >>= 8) == 0 && return n
Contributor

@Seelengrab Seelengrab Jan 7, 2025

This currently unconditionally advances the given character, but what happens in case the first write fails, and the second succeeds? Now there's suddenly a torn write involved here, and even though you can theoretically know that not all of the given Char has been written (e.g. getting a return value of 3 when a 4-byte Char is passed), you still wouldn't know which byte was dropped.

I think it would be good to return after the first failing write, so that it's at least knowable that a valid prefix has been written (if the return value is nonzero).
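A minimal sketch of that suggestion, assuming a hypothetical helper named `write_char_prefix` (it is not the method in Base, and not what this PR merges). It stops at the first byte the stream reports as unwritten, so a return value of `n` means exactly the first `n` bytes of the encoding went out:

```julia
# Hypothetical helper, NOT Base.write: abort at the first byte that the
# underlying stream does not accept.
function write_char_prefix(io::IO, c::Char)
    # After bswap, the first byte of the Char's encoding sits in the low byte.
    u = bswap(reinterpret(UInt32, c))
    n = 0
    while true
        k = write(io, u % UInt8)
        k == 0 && return n            # bytes 1..n form a valid prefix
        n += k
        (u >>= 8) == 0 && return n
    end
end

io = IOBuffer(; maxsize=1)
n = write_char_prefix(io, 'é')        # 'é' encodes as 0xc3 0xa9
# n == 1: only the leading byte 0xc3 fit, and the loop stopped there
```

With this shape, a caller that sees `n` smaller than the encoded length knows that bytes `n+1` onward still need to be sent.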

Contributor

Is there any Julia IO type where writing a byte can fail, return zero, and then succeed, without some error being thrown?

Contributor

For example, writing a byte with TranscodingStreams.jl will either return 1 or throw an error.

Contributor

Sure, a non-blocking buffered IO whose buffer is temporarily full, for example. I don't know whether there currently is such a type in the ecosystem, but the point is that it could exist and would be a valid IO, as far as I can tell.

Contributor

Here's a (slightly contrived) example:

julia> io = IOBuffer(; maxsize=1)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=1, ptr=1, mark=-1)

julia> write(io, 'a')
1

julia> write(io, 'a') # should be 0 with this PR, since the write doesn't succeed
1

julia> seekstart(io); # simulate a read-end on some other process, for example

julia> read(io, Char) # happens on the read-end
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> write(io, 'a') # continue writing
1

It's a bit awkward to do this with an IOBuffer, but the principle is the same for any IO type that has an actual read end distinct from the write end. For arbitrary I/O, it's usually preferable to drop data on the write end and retry later, once the buffer is ready to send again. With the current behavior, the writer wouldn't know what to retransmit, since it's impossible to know which byte(s) of the Char were not transmitted. Effectively, the number returned by write becomes irrelevant and only matters when it matches the encoded length of the Char (ncodeunits(c)), at which point we might as well return true/false. If we instead abort as soon as any internal write fails, we know that at least a correct prefix of the Char (or of any data, in the general case) was written, and we can retry with only the data that we haven't attempted to transmit yet.
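The retry idea can be sketched as follows. `BudgetIO` and `write_prefix` are hypothetical names made up for this illustration; `BudgetIO` simply accepts a fixed number of bytes and then returns 0, standing in for a non-blocking stream whose buffer is temporarily full:

```julia
# Hypothetical IO type for illustration: accepts `budget` bytes, then reports 0.
mutable struct BudgetIO <: IO
    sink::Vector{UInt8}
    budget::Int
end

function Base.write(io::BudgetIO, b::UInt8)
    io.budget > 0 || return 0         # "buffer full": nothing written, no error
    io.budget -= 1
    push!(io.sink, b)
    return 1
end

# Hypothetical helper: write bytes in order and stop at the first rejection,
# so the return value always counts a prefix of `bytes`.
function write_prefix(io::IO, bytes::AbstractVector{UInt8})
    n = 0
    for b in bytes
        write(io, b) == 0 && break
        n += 1
    end
    return n
end

bytes = collect(codeunits("€"))       # three bytes: 0xe2 0x82 0xac
io = BudgetIO(UInt8[], 2)
sent = write_prefix(io, bytes)        # only the first two bytes are accepted

io.budget = 8                         # simulate the reader draining the buffer
sent += write_prefix(io, view(bytes, sent+1:length(bytes)))
```

Because the count always describes a prefix, the second call can resume at `sent + 1` without guessing which bytes were dropped.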

Member Author

I don't think write ever errors for us, given asyncio and other stuff?

Member

@JeffBezanson JeffBezanson Jan 9, 2025

It does, e.g. trying to write to a read-only stream. write itself has a synchronous API, i.e. it is (task-)blocking.
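As a small illustration of the two outcomes under discussion, the sketch below contrasts a read-only IOBuffer (where writing throws) with a size-limited one (where writing a byte returns 0). The exact exception type is deliberately not asserted, since it may differ between Julia versions:

```julia
# Writing to a read-only stream throws.
ro = IOBuffer("abc")                  # wrapping a String gives a read-only buffer
threw = try
    write(ro, 0x61)
    false
catch
    true
end

# Writing a byte to a full, size-limited buffer returns 0 without throwing.
full = IOBuffer(; maxsize=1)
write(full, 0x61)                     # fills the single available byte
n = write(full, 0x62)                 # no room left
```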

Contributor

@Seelengrab Seelengrab Jan 9, 2025

zero is not a "valid" return value in that case.

There is interesting historical data suggesting that some implementations of libc write were indeed able to return 0: https://stackoverflow.com/a/41970485

For quite a few kinds of files, the behavior is unspecified, so more or less anything goes either way 🤷

> I don't think write ever errors for us, given asyncio and other stuff?

Right, and for a non-blocking buffered IO it would be incredibly awkward to throw actual errors just because it's full. That possibility would be incredibly detrimental in the common case of success. I admit that having 0 signal this is quite a bad API, though. I guess this is yet another case where something like a Result{Int, Err} sum type would be nice, to distinguish success from errors 🤔

Maybe let me put it another way - would this be a valid IO subtype (barring some other missing methods)?

struct FlakyIO <: IO
    io::IO
end

Base.write(fio::FlakyIO, b::UInt8) = rand(Bool) ? write(fio.io, b) : 0

You could get very fancy and record which writes succeeded & which ones failed for introspection later on, or do some more complicated scheme for deciding when exactly it "fails" to write anything. This kind of type would be incredibly useful for fuzzing stuff that accidentally depends on writes to IO always succeeding (like the fallback method of write in Base does, for example).
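A deterministic variant of that idea might look like the sketch below. `ScriptedIO` and `write_all_bytes` are hypothetical names introduced here; the type fails according to a supplied script rather than `rand(Bool)` and logs every attempt, which makes a torn write reproducible:

```julia
# Hypothetical fuzzing aid: each write consumes one entry of `script`
# (true = accept the byte, false = report 0) and records the outcome in `log`.
mutable struct ScriptedIO <: IO
    sink::Vector{UInt8}
    script::Vector{Bool}
    log::Vector{Bool}
end
ScriptedIO(script) = ScriptedIO(UInt8[], collect(Bool, script), Bool[])

function Base.write(io::ScriptedIO, b::UInt8)
    ok = isempty(io.script) ? true : popfirst!(io.script)
    push!(io.log, ok)
    ok || return 0
    push!(io.sink, b)
    return 1
end

# A loop that keeps going after a failed byte and only sums the results.
function write_all_bytes(io::IO, bytes)
    n = 0
    for b in bytes
        n += write(io, b)
    end
    return n
end

io = ScriptedIO([true, false, true])
n = write_all_bytes(io, codeunits("€"))   # bytes 0xe2 0x82 0xac
# n == 2 and io.sink == [0xe2, 0xac]: the middle byte is missing, and the
# count alone cannot say which one was dropped. io.log can.
```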

One issue I see with just throwing an error for partial writes/write failures of parts of larger types is that then the return value of write becomes meaningless - either we always get a full write, or we get an error. There would be no more room for partial writes, which can happen in a bunch of cases.

Member

This conversation is worth continuing, but for the purposes of fixing this bug I think it's orthogonal.

Our AbstractArray write method can also suffer "torn" writes in the same way:

function unsafe_write(s::IO, p::Ptr{UInt8}, n::UInt)
    written::Int = 0
    for i = 1:n
        written += write(s, unsafe_load(p, i))
    end
    return written
end

This is probably worth splitting into a separate issue and fixing across-the-board. The only thing I think this needs to merge @gbaraldi is a test.

Member

@topolarity topolarity Jan 10, 2025

Filed #57011 to continue discussion here

-        n += 1
     end
 end
 # write(io, ::AbstractChar) is not defined: implementations
6 changes: 6 additions & 0 deletions test/iobuffer.jl
@@ -399,3 +399,9 @@ end
     io = IOBuffer(data)
     @test read(io) == data
 end
+
+@testset "Writing Char to full buffer" begin
+    io = IOBuffer(;maxsize=1)
+    write(io, 'a')
+    @test write(io, 'a') == 0
+end