-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mysql mb4 #6409
base: release-2.1
Are you sure you want to change the base?
Mysql mb4 #6409
Conversation
In my eyes would be this a huge uplift to redurce the complexity, So I like the pr. |
I see that in your changes are using utf8mb4_general_ci and I would like to propose using utf8mb4_unicode_ci. The added performance of using general over unicode makes little difference with modern CPU & memory power/size. |
Also text fields may be affected in some circumstances. |
Pretty sure we're good here... The settings are now by characters, not bytes. The problems experienced db-structure-wise were the index lengths... |
I did some tests & couldn't find an example where there was a difference. Can you identify a scenario? I understand the benefit in theory, just not sure it's really there... |
& THANK YOU for the feedback. I think this is an important topic. |
It's data sorting for various languages that will present obstacles when using general. |
BTW I agree that this is a very important topic. |
As per my other comments on this. This is a big change in how we handle the database layer. As such its too late in 2.1 to make this. So this should not make 2.1 and be closed out. Future versions of SMF we should make this from the get go with support. Of course this means that future versions of SMF will need to support the conversion to mb4 either manually or during the upgrade process. |
Why can't this repo contin future versions?
…On Sat, Mar 20, 2021 at 5:46 PM Jeremy D ***@***.***> wrote:
As per my other comments on this. This is a big change in how we handle
the database layer. As such its too late in 2.1 to make this. So this
should not make 2.1 and be closed out. Future versions of SMF we should
make this from the get go with support. Of course this means that future
versions of SMF will need to support the conversion to mb4 either manually
or during the upgrade process.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6409 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADJNN4NBBOBOZGHKW5ZPMLTEU6XFANCNFSM4U3DTNPA>
.
|
in my eyes smf 2.1 should run only in mb4 mode and should drop mb3 support, |
I disagree with you on this. SMF2.1 should be mb4, IMO. But we've waited too long & we are appropriately maintaining scope. This is appropriate for 2.1.x. There is nothing different at all in how we handle the db layer, no changes in DB calls at all. This PR mirrors how we dealt with mb3 in 2.0. This PR is about 10% fixing bugs, 70% cleaning up old latin1 logic that should have already been removed from 2.1, and 20% migrating to mb4. But we agreed to wait after 2.1, so we are. |
Well I do disagree with that because of all the alterations to the core functionality of code. But I will stand by my statement made on others. This has to make RC4 or its not making 2.1. We can't introduce this between RC4 and final. |
Fyi, this is also showing as a draft PR. |
It's marked as a draft because the dev team had agreed to wait until after 2.1 goes live for mb4 support. I want to ensure it doesnt get merged prematurely. |
I brought this current, in case folks wanted to test. I incorporated most feedback above. This required a rebase due to many layers of conflicts. I have not yet responded to @live627's feedback on changes interfering with postgresql. In my testing, it hasn't - this works fine in pg as-is. Still trying to understand why, though... |
Exists any issue with pg? |
I haven't seen any issues with pg yet. Still early in testing, though. |
From my pov the whole topic didn't exists in pg. |
Signed by Shawn Bulen, [email protected]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approving this on the grounds the code looks good. But we need to get it tested. I don't have any non english forums to test on (other than the obvious). So we need to work on ensuring stability of this and that it converts just fine.
Smf.org would be a nice place 😀 |
I did a ton of testing, but like, 2.5 years ago... It needs to be repeated with the current codebase. |
@albertlast Of course we will, once we feel its stable enough for a production environment. We eat our own food. |
Signed by Shawn Bulen, [email protected]
Just fixed an issue where log_search_subjects was still getting inappropriately truncated (byte-based substr). |
I am still re-testing this one. Just found an issue with upgrading old forums - mainly due to index renames over the years. |
Would it seem logical to track all columns that would be affected in the index, convert, then drop and recreate that index? Its a bit of a long way around, but we can find what columns are in a index. |
This PR currently drops & recreates the indexes. Both in the maint function & in the upgrader. The upgrader already tries various old names. The maint function needs to do so as well. Also, we have a new index added after this PR (and before 2.1 release) that needs tweaking as well... |
What happens for custom indexes that may not be included in this? Will the conversion fail? If it fails, I was thinking we track all indexes where we can find columns that are converted, then rebuild those indexes with a drop/create. |
Hmmm... Custom indexes might be out there. For that matter, custom columns with custom indexes might be problematic also. To rebuild the indexes, we'd have to save off the index build info (parse it from SHOW CREATE TABLE), tweak it, & recreate the indexes. This is feeling pretty complex, for what I expect is an edge case. My take is that covering the known variants covers us 99.9%+ of the time. The folks who have customized their DB to that degree will need to complete the conversion manually. Kinda the price to pay for making said customizations... |
Signed by Shawn Bulen, [email protected]
@@ -226,6 +226,8 @@ function smf_db_create_table($table_name, $columns, $indexes = array(), $paramet | |||
$table_query .= ') ENGINE=' . $parameters['engine']; | |||
if (!empty($db_character_set) && $db_character_set == 'utf8') | |||
$table_query .= ' DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci'; | |||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an esle if would be better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's either 'utf8' (currently an alias of 'utf8mb3') or 'utf8mb4'. These are the only two options at this point. And as noted in the description to this PR, we are defaulting to 'utf8mb4' going forward.
I don't know about custom columns. That wouldn't affect this because we didn't convert that column to mb4.
We have database tools that can list indexes. |
I'll take a peek at recreating all indexes... It's feeling like it's getting too complex for an edge case, though. Note the CONVERT command converts all columns, and we are stepping thru all tables. |
Signed by Shawn Bulen, [email protected]
Understable it can get complex with edge cases. Which is why I wondered what happens for custom indexes. I need to setup a few to test. I suspect they would be corrupted and not work. But that converting the table should still happen? Or do they need to drop first for the conversion to occur? |
I don't think the tables or indexes would get corrupted at any point. If there is a custom index that meets the criteria (on any column, custom or not), the conversion will error out, & processing will stop. The table will not be converted. The index will need to be removed, the process restarted, and the index re-added after completion. I doubt we will see a lot of custom indexes that meet the exact criteria. Probably none. Again, I consider this an unlikely edge case. |
Note: This PR targets release-2.1, but it has been assigned to the milestones for SMF 3.0. The PR will need to be updated and resubmitted to target release-3.0. I am leaving it open for now so that it doesn't get forgotten. |
I think the new approach should be more along the lines of what was discussed over in #7938 than the approach used here. Specifically, if we can ensure InnoDB and COMPACT rows, then there is no need for a 191 byte limitation on the indexes. The DB structures themselves don't need to change, the DB engine & storage traits do. The real question is the best way to do that. I suggest that as a pre-upgrader step, outside of the normal upgrade process, so we don't end up overloading the upgrader logic as we did in 2.1. Such a pre-upgrader can also check for missing standard columns, missing default values, etc., the types of things that tend to bork the upgrader. Also:
Last, & possibly most important - we should get rid of all of the entity-encoding logic for multi-byte characters altogether. All those $smcFunc calls should go - it's overly complicated & we are always chasing bugs in it. We should be storing MB3/4 chars as chars, not entities. That's the point. |
Excellent points all around. But in this PR you did ping pretty much all of the different bits that need to be addressed. Even if we address them somewhat differently for 3.0, we still need to address all these things. So it's excellent reference material. |
Partial for #7173
Fixes #6405
Fixes #7776
Fixes #5108
Overview...
Mysql charsets supported should be utf8 (for 2.1 users migrated before utf8mb4 support) and utf8mb4. The only difference between the two is entity encoding needed for 4-byte utf8 chars when not yet utf8mb4. pg supports mb4 already.
All conditional utf8 logic (utf8 vs iso-8859) is removed.
Current thinking:
191...
Trying to keep field meanings straight...
Read up on index prefixes here:
https://dev.mysql.com/doc/refman/8.0/en/column-indexes.html
https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html#:~:text=The%20index%20key%20prefix%20length%20limit%20is%20767%20bytes%20for,4%20bytes%20for%20each%20character.
mb4 support history:
https://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/
Mysql usage & support by version:
https://www.eversql.com/mysql-8-adoption-usage-rate/
https://fromdual.com/support-for-mysql-from-oracle