Skip to content

Commit

Permalink
internal_gengo: switch back from string literal to hex byte slice
Browse files Browse the repository at this point in the history
It tourns out that Piper (the revision control system Google uses)
has a check that requires marking files containing non-UTF8 bytes
as binary, preventing textual review. We wanted to avoid this issue
in http://go.dev/cl/638135 and added \r to the escape set for Gerrit,
but we did not anticipate how strict other systems are in practice.

(I did not notice this issue earlier because the Piper check
does not trigger when sending a CL for Google-wide testing,
only when mailing a CL for review.)

It stands to reason that if our revision control and review systems
do not like string literals with non-UTF8 content, other systems
probably behave similarly. Hence, let’s revert that part of the change.
The key part of the change was to switch the type of the embedded
descriptor bytes from mutable []byte to immutable string.

I verified that the string literal remains in .rodata:

% (cd internal/reflection_test && \
  go test -c && \
  objdump -s -j .rodata reflection_test.test | grep '0a46696e')
 8e9fd0 0a46696e 7465726e 616c2f74 65737470  .Finternal/testp

The bytes printed above only occur once and match the bytes from
file_internal_testprotos_testeditions_testeditions_hybrid_test_hybrid_proto_rawDesc

Change-Id: I8e1bfe53a5bbf65abe7861a749ace37b215a8e28
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/642857
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Nicolas Hillegeer <[email protected]>
  • Loading branch information
stapelberg committed Jan 15, 2025
1 parent 0c3cc2f commit aee8a9c
Show file tree
Hide file tree
Showing 145 changed files with 26,552 additions and 196 deletions.
69 changes: 17 additions & 52 deletions cmd/protoc-gen-go/internal_gengo/reflect.go
Original file line number Diff line number Diff line change
Expand Up @@ -237,57 +237,6 @@ func stripSourceRetentionFieldsFromMessage(m protoreflect.Message) {
})
}

// escapeForLiteral formats data as characters that are valid to use in a Go
// string literal. This implementation was used (before go:embed existed) for
// many years in https://github.com/dsymonds/goembed
func escapeForLiteral(data []byte) []byte {
escaped := make([]byte, 0, len(data) /* right order of magnitude */)

for len(data) > 0 {
// https://golang.org/ref/spec#String_literals: "Within the quotes, any
// character may appear except newline and unescaped double quote. The
// text between the quotes forms the value of the literal, with backslash
// escapes interpreted as they are in rune literals […]."
switch b := data[0]; b {
case '\\':
escaped = append(escaped, []byte(`\\`)...)
case '"':
escaped = append(escaped, []byte(`\"`)...)
case '\n':
escaped = append(escaped, []byte(`\n`)...)

// While \r does not need to be escaped for Go string literals, some
// tools (like Gerrit) detect files as binary if they contain a \r
// which is not followed by \n.
case '\r':
escaped = append(escaped, []byte(`\r`)...)

case '\x00':
// https://golang.org/ref/spec#Source_code_representation: "Implementation
// restriction: For compatibility with other tools, a compiler may
// disallow the NUL character (U+0000) in the source text."
escaped = append(escaped, []byte(`\x00`)...)

default:
// https://golang.org/ref/spec#Source_code_representation: "Implementation
// restriction: […] A byte order mark may be disallowed anywhere else in
// the source."
const byteOrderMark = '\uFEFF'

if r, size := utf8.DecodeRune(data); r != utf8.RuneError && r != byteOrderMark {
escaped = append(escaped, data[:size]...)
data = data[size:]
continue
}

escaped = append(escaped, []byte(fmt.Sprintf(`\x%02x`, b))...)
}
data = data[1:]
}

return escaped
}

func genFileDescriptor(gen *protogen.Plugin, g *protogen.GeneratedFile, f *fileInfo) {
descProto := proto.Clone(f.Proto).(*descriptorpb.FileDescriptorProto)
descProto.SourceCodeInfo = nil // drop source code information
Expand All @@ -298,7 +247,23 @@ func genFileDescriptor(gen *protogen.Plugin, g *protogen.GeneratedFile, f *fileI
return
}

g.P("const ", rawDescVarName(f), ` = "`, string(escapeForLiteral(b)), `"`)
g.P("var ", rawDescVarName(f), " = string([]byte{")
for len(b) > 0 {
n := 16
if n > len(b) {
n = len(b)
}

s := ""
for _, c := range b[:n] {
s += fmt.Sprintf("0x%02x,", c)
}
g.P(s)

b = b[n:]
}
g.P("})")
g.P()

if f.needRawDesc {
onceVar := rawDescVarName(f) + "Once"
Expand Down
26 changes: 25 additions & 1 deletion cmd/protoc-gen-go/testdata/annotations/annotations.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 30 additions & 1 deletion cmd/protoc-gen-go/testdata/comments/comments.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 19 additions & 1 deletion cmd/protoc-gen-go/testdata/comments/deprecated.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

29 changes: 28 additions & 1 deletion cmd/protoc-gen-go/testdata/enumprefix/enumprefix.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 19 additions & 1 deletion cmd/protoc-gen-go/testdata/extensions/base/base.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit aee8a9c

Please sign in to comment.