-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use full case folding outside of character classes #383
base: main
Are you sure you want to change the base?
Conversation
This should get us part way to proper full case folding support. To handle case folding that results in multiple characters, we'll need to switch to more of a sequence-based comparison than just matching a single character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
guard #available(SwiftStdlib 5.7, *) else { fatalError() } | ||
let foldedSelf = unicodeScalars.map(\.properties._caseFolded).joined() | ||
let foldedOther = c.unicodeScalars.map(\.properties._caseFolded).joined() | ||
return foldedSelf == foldedOther |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a lot of intermediary arrays, which is unfortunate because nearly all Characters fit in the small-string form, and those that don't are usually case invariant. Does .lazy.map(...).elementsEqual(
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what we really want is a stdlib API that can do the case-folded comparison — it's wasteful even to UTF-8 encode these when we should be able to convert, canonicalize, and compare character-by-character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can write the code here or in the stdlib (I agree stdlib SPI makes sense). But in the meantime, can we write the code we want the stdlib to have, or at least approximate it by making the mapping lazy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, it looks like case-insensitive matching could do multiple array allocations for every single character in the input
@swift-ci Please test |
Nate, what's the status of this PR now? |
This should get us part way to proper full case folding support. To handle case folding that results in multiple characters, we'll need to switch to more of a sequence-based comparison than just matching a single character.