Support Gen5/3DS/Switch word filters #4423

Open
wants to merge 6 commits into
base: master


abcboy101
Contributor

Separate 3DS/Switch word filters

Instead of using a superset of both the 3DS/Switch filters, use a separate list of filters for each. This should reduce the number of false positives due to the wrong list being applied for the console.

The filters themselves are taken from the latest 3DS version 11.17.0-50 and the latest Switch version 19.0.1 (rebootless). It doesn't seem that useful to flag something that used to be filtered if it's not flagged anymore, so none of the ones that were only in earlier versions are included. I was also able to trim about 15% of the regexes in each list, because everything they match is already matched by another regex in the list (e.g. ^abc$ is covered by .*abc.*).
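The redundancy trimming can be illustrated with a minimal Python sketch. It is not the script used for this PR: it assumes every pattern is a plain literal wrapped in `^`/`$` or `.*`, and it only detects the case where a `.*y.*` pattern covers another pattern whose literal contains `y`. The helper names `literal_core` and `trim_redundant` are hypothetical.

```python
# Characters that make a pattern core something other than a plain literal.
META = set(r".^$*+?{}[]\|()")


def literal_core(pattern):
    """Return x for patterns shaped like ^x$, ^x.*, .*x$ or .*x.*, else None."""
    for prefix in ("^", ".*"):
        for suffix in ("$", ".*"):
            if pattern.startswith(prefix) and pattern.endswith(suffix):
                core = pattern[len(prefix):len(pattern) - len(suffix)]
                if core and not (set(core) & META):
                    return core
    return None


def trim_redundant(patterns):
    """Drop patterns already covered by a broader .*y.* pattern in the list."""
    # Literals of all "contains" patterns (.*y.*) in the list.
    contains = [
        literal_core(p)
        for p in patterns
        if p.startswith(".*") and p.endswith(".*") and literal_core(p)
    ]
    kept = []
    for p in patterns:
        core = literal_core(p)
        covered = core is not None and any(
            y in core and ".*" + y + ".*" != p for y in contains
        )
        if not covered:
            kept.append(p)
    return kept


print(trim_redundant(["^abc$", ".*abc.*", ".*xabcx.*", "^def$"]))
```

Patterns with real regex syntax in the middle are left alone by this sketch, so it can only under-trim, never remove a pattern that still contributes matches.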

Handle 3DS/Switch word filter edge cases

The name being checked is normalized more thoroughly to match the console's behavior:

  • Case-insensitive (including when the regex has uppercase letters, like some of the 3DS ones do)
  • Remove any spaces
  • Convert hiragana to katakana
  • Convert small kana to regular-sized kana
  • Convert half-width kana to full-width kana

This catches a few more strings that are filtered on the console:

  • "ふぁっく" is matched by .*フアツク.*
  • "オッパイ" is matched by .*オツパイ.*
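The normalization steps listed above can be sketched roughly as follows. This is a Python illustration, not the PKHeX implementation: the function names, the order of the steps, the handling of the ideographic space, and the use of NFKC on half-width kana runs are all assumptions made for the example.

```python
import re
import unicodedata

# Small katakana -> regular-sized katakana.
SMALL_TO_REGULAR = str.maketrans(
    "ァィゥェォッャュョヮヵヶ",
    "アイウエオツヤユヨワカケ",
)

# Runs of half-width katakana (U+FF66..U+FF9F, including voiced marks).
HALFWIDTH_KANA = re.compile("[\uFF66-\uFF9F]+")


def hiragana_to_katakana(text):
    # Hiragana U+3041..U+3096 sit exactly 0x60 below their katakana twins.
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


def normalize_name(name):
    text = name.lower()
    # Assumption: both ASCII and ideographic spaces are removed.
    text = text.replace(" ", "").replace("\u3000", "")
    # Half-width kana -> full-width kana, applied only to half-width runs.
    text = HALFWIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )
    text = hiragana_to_katakana(text)
    return text.translate(SMALL_TO_REGULAR)


def is_filtered(name, patterns):
    text = normalize_name(name)
    # IGNORECASE covers patterns that themselves contain uppercase letters.
    return any(re.fullmatch(p, text, re.IGNORECASE) for p in patterns)


print(normalize_name("ふぁっく"))
print(is_filtered("オッパイ", [".*オツパイ.*"]))
```

Converting half-width kana before the small-kana step means half-width small kana also end up regular-sized; whether the console applies the steps in that order is not stated here.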

Also, if a name matches the name of a Pokémon in any language, it bypasses the filter. In Gen6 this is case-sensitive, but in Gen7+ it is case-insensitive. This means that "cOfAgRiGuS" doesn't pass in Gen6, but does in Gen7+.
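A minimal Python sketch of the species-name bypass, assuming a hypothetical `species_names` set that holds every species name in every language with its in-game casing. The function name and the two-entry set are made up for illustration.

```python
def bypasses_filter(name, species_names, generation):
    """Return True if the name is exempt because it equals a species name."""
    if generation >= 7:
        # Gen7+: comparison ignores case.
        folded = {s.casefold() for s in species_names}
        return name.casefold() in folded
    # Gen6: the name must match a species name exactly, including case.
    return name in species_names


# Hypothetical two-entry stand-in for the full multilingual species list.
species = {"Cofagrigus", "デスカーン"}
print(bypasses_filter("cOfAgRiGuS", species, 6))
print(bypasses_filter("cOfAgRiGuS", species, 7))
```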

Add/implement Gen5 word filter

Gen5 only checks for an exact match after normalization. The normalization is fairly naive, so a bunch of the filters don't actually work; most of the weird behavior should be captured in the test cases. Pokémon from Gen3/4 technically shouldn't have this filter applied, but that's not currently handled.
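The difference from the regex-based lists can be sketched in Python as a plain set lookup. The Gen5 normalization rules are not spelled out here, so the sketch takes the normalizer as a parameter and uses `str.lower` purely as a placeholder; the function name and the sample word are hypothetical.

```python
def is_filtered_gen5(name, blocked_words, normalize=str.lower):
    """Exact-match check: the whole normalized name must be a blocked word."""
    # `normalize` is a stand-in; the real Gen5 routine is not reproduced here.
    return normalize(name) in blocked_words


blocked = {"badword"}  # placeholder entry, not taken from the real list
print(is_filtered_gen5("BadWord", blocked))
print(is_filtered_gen5("xBadWordx", blocked))
```

Because the lookup is exact, a blocked word embedded in a longer name is not flagged, unlike the `.*word.*` patterns in the 3DS and Switch lists.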

Adjust behavior of DisableWordFilterPastGen

Previously, this simply disabled the word filters in Gen1-5. It now prevents future word filters from being applied to the current context. With this option enabled, a PK5 will only be checked against the Gen5 list; likewise, a PK6 or PK7 will only be checked against the 3DS list. With it disabled (still the default), a PK5 will be checked against the Switch, 3DS, and Gen5 lists, and a PK6/7 will be checked against the Switch and 3DS lists, so you'll know if its name will be changed on transfer.
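The selection logic described above can be sketched in Python as follows. The list labels and function names are hypothetical, Gen8+ using only the Switch list is an inference from the Switch list being the newest, and contexts before Gen5 are not covered because their handling is not described here.

```python
FILTER_ORDER = ["Gen5", "3DS", "Switch"]  # oldest to newest


def own_filter(generation):
    """Word filter list native to a given format generation."""
    if generation == 5:
        return "Gen5"
    if generation in (6, 7):
        return "3DS"
    if generation >= 8:
        return "Switch"  # assumption: all Switch-era formats share one list
    raise ValueError("contexts before Gen5 are not covered by this sketch")


def filters_to_check(generation, disable_past_gen):
    own = own_filter(generation)
    if disable_past_gen:
        # Option enabled: only the current context's own list applies.
        return [own]
    # Option disabled (default): also check every later list, newest first.
    start = FILTER_ORDER.index(own)
    return list(reversed(FILTER_ORDER[start:]))


print(filters_to_check(5, disable_past_gen=False))
print(filters_to_check(6, disable_past_gen=True))
```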

- Separate 3DS/Switch word filters
- Add/implement Gen5 word filter
- Adjust behavior of DisableWordFilterPastGen
@kwsch
Owner

kwsch commented Jan 24, 2025

  1. The call to NormalizeString in TryMatch converts the message to a string, which allocates. Unfortunately, there is no overload accepting spans (yet; .NET 10 will have one). However, I'm not sure whether there are any unintended "extra" side effects. Surely the console's normalization routine is more naïve than a globalized one?

What is the need for calling it?
https://github.com/kwsch/PKHeX/pull/4423/files#diff-82051be11d2a94f5dd85ca3e3a70fd3fe179f5f829c2151f2efb184277c11a9dR77

(no worries if it needs to be called; one string allocation per call, up from zero, is not a dealbreaker; I'll just add a note for .NET 10 updates in the future)
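To make the side-effect concern concrete, here is a small Python illustration using the standard `unicodedata` module. It assumes the normalization in question is a compatibility form such as NFKC, which is not stated in this thread, and it shows Unicode's behavior rather than anything specific to .NET.

```python
import unicodedata


def nfkc(text):
    return unicodedata.normalize("NFKC", text)


# Width folding, which is the part a word filter plausibly wants:
print(repr(nfkc("Ａ")))  # full-width Latin letter
print(repr(nfkc("ｱ")))   # half-width katakana

# Other compatibility mappings that come along with it:
print(repr(nfkc("①")))  # circled digit
print(repr(nfkc("ﬁ")))   # Latin ligature
```

A full compatibility normalization folds many characters beyond width variants, so it can make names match patterns that a simpler console routine would never see.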

Only applies to alphanumeric and kana
@abcboy101
Contributor Author

I was able to find a couple of test cases showing that full normalization does too much:

  • "sh!t" doesn't trigger .*sh!t.*
  • "ファッ゙ク" does trigger .*フアツク.*

I've updated the method to only convert fullwidth alphanumeric to halfwidth, and halfwidth katakana to fullwidth.
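A restricted conversion of this kind could look like the Python sketch below. It is an illustration, not the updated PKHeX method: the function name is hypothetical, and leaving the half-width voiced sound marks (U+FF9E, U+FF9F) untouched is a simplification chosen for the example.

```python
import unicodedata


def _build_width_table():
    table = {}
    # Full-width digits, uppercase and lowercase Latin letters -> ASCII.
    for start, end in ((0xFF10, 0xFF19), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A)):
        for cp in range(start, end + 1):
            table[cp] = chr(cp - 0xFEE0)
    # Half-width katakana U+FF66..U+FF9D -> full-width katakana.
    # The voiced marks U+FF9E/U+FF9F are deliberately left out (assumption).
    for cp in range(0xFF66, 0xFF9E):
        table[cp] = unicodedata.normalize("NFKC", chr(cp))
    return table


WIDTH_TABLE = _build_width_table()


def convert_width(text):
    """Fold only alphanumeric and katakana width variants."""
    return text.translate(WIDTH_TABLE)


print(convert_width("ＡＢＣ１２３"))
print(convert_width("ｱｲｳ"))
```

Characters outside those ranges, such as a full-width exclamation mark, pass through unchanged under this sketch.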

kwsch added 4 commits January 25, 2025 00:41
much easier to directly query a specific wordfilter
adds look-back wordfilter checks to iterate backwards to original context

probably need to think a little more about how permissive the bypasses can be
2 participants