Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(3742): improve user segmentation with BigInt-based random generation #5110

Merged
merged 6 commits into from
Jan 10, 2025

Conversation

DDDDDanica
Copy link
Contributor

@DDDDDanica DDDDDanica commented Jan 7, 2025

Explanation

Replace hash-based random number generation with BigInt-based implementation for better distribution and format support. The new implementation properly handles both UUIDv4 and hex-format metaMetricsIds, providing more consistent
and reliable user segmentation.

  • Add support for UUIDv4 format with proper bit normalization
  • Improve hex format handling using BigInt for precise calculations
  • Remove char-by-char hashing algorithm to prevent potential collisions

References

Addresses: #5051 (comment)

Changelog

@metamask/remote-feature-flag-controller

  • CHANGED: Modify generateDeterministicRandomNumber to handle both uuidv4(mobile new) and hex(mobile old and extension) side

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've highlighted breaking changes using the "BREAKING" category above as appropriate
  • I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

@DDDDDanica DDDDDanica self-assigned this Jan 7, 2025
@DDDDDanica DDDDDanica force-pushed the fix/3742-generateDeterministicRandomNumber branch 3 times, most recently from 9762ab9 to b5337b6 Compare January 8, 2025 13:29
@joaoloureirop joaoloureirop self-requested a review January 8, 2025 13:31
@DDDDDanica DDDDDanica force-pushed the fix/3742-generateDeterministicRandomNumber branch from b5337b6 to c650ada Compare January 8, 2025 13:39
@DDDDDanica DDDDDanica force-pushed the fix/3742-generateDeterministicRandomNumber branch from 9e0b3cc to 87cd8f5 Compare January 8, 2025 18:32
*
* For UUIDv4 format, the following normalizations are applied:
* - Removes all dashes from the UUID
* - Remove version (4) bits and replace with 'f'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, I think we need to remove it rather than replace it with f. If we replace it with f, we're skewing the result that amount away from zero. But if we remove it completely, it won't impact the result.

Suggested change
* - Remove version (4) bits and replace with 'f'
* - Remove version (4) bits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, though I notice that the test does show that the minimum UUID does result in zero. Odd. It looks like the result would be skewed to me though.

Copy link
Member

@Gudahtt Gudahtt Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I tested it and it does skew the results, but by too small an amount. You need to use a precision of 10^15 to see a non-zero result from the minimum input, but we're only using precision of 10^6.

Copy link
Member

@Gudahtt Gudahtt Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using an implicit minimum of zero, and a maximum of 'f' * length, perhaps we can achieve perfect distribution by updating the final calculation to something like this:

function uuidToBigInt(id: string) {
  return BigInt(`0x${uuid.replace(/-/gu, '')}`);
}

const MIN_UUID_V4 = '00000000-0000-4000-8000-000000000000';
const MAX_UUID_V4 = 'ffffffff-ffff-4fff-bfff-ffffffffffff';
const MIN_UUID_V4_BIGINT = uuidToBigInt(MIN_UUI_V4);
const MAX_UUID_V4_BIGINT = uuidToBigInt(MAX_UUI_V4);

...

export function generateDeterministicRandomNumber(
  metaMetricsId: string,
): number {
  let idValue: BigInt;
  let maxValue: BigInt;
  // uuidv4 format
  if (uuidValidate(metaMetricsId) && uuidVersion(metaMetricsId) === 4) {
    // Adjust both values by subtracting the minimum value, so that the result isn't biased away from zero
    idValue = uuidToBigInt(metaMetricsId) - MIN_UUID_V4_BIGINT;
    maxId = MAX_UUID_V4_BIGINT - MIN_UUID_V4_BIGINT;
  } else {
    // hex format with 0x prefix
    idValue = BigInt(`0x${metaMetricsId.slice(2)}`;
    maxValue = BigInt(`0x${'f'.repeat(cleanId.length)}`);
  }
  // Use BigInt division first, then convert to number to maintain precision
  return Number((value * BigInt(1_000_000)) / maxValue) / 1_000_000;
}

By adjusting the minimum and maximum for the UUIDv4 case, it ensures we use the entire range of 0-1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the max, and it's also skewed for the same reason as the minimum, but only if you increase precision to 10^15. Not bad! Close enough that we won't notice, it's well under a percentage.

The suggestion I left here would give us a perfect distribution (JS rounding aside; it could get slightly better with bignumber.js and/or more precision). But what you have here is good enough I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion ! Converting uuid to hex value and then normalizing the UUID range to start from 0 instead of removing the bits within is a much better solution indeed! Adapted in 8724cc3

Gudahtt
Gudahtt previously approved these changes Jan 8, 2025
Copy link
Member

@Gudahtt Gudahtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Two pending non-blocking suggestions, one to improve the distribution (we're biased slightly away from 0 and 1 right now for UUIDv4 IDs), and the second to add a guard against unexpected IDs.

@DDDDDanica DDDDDanica force-pushed the fix/3742-generateDeterministicRandomNumber branch from a9a7672 to a66b49f Compare January 9, 2025 12:50
@DDDDDanica DDDDDanica force-pushed the fix/3742-generateDeterministicRandomNumber branch from a66b49f to 3ae4dd8 Compare January 9, 2025 12:57
Copy link
Member

@Gudahtt Gudahtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@DDDDDanica DDDDDanica enabled auto-merge (squash) January 10, 2025 09:33
@DDDDDanica DDDDDanica disabled auto-merge January 10, 2025 11:34
@DDDDDanica DDDDDanica merged commit 0875256 into main Jan 10, 2025
120 checks passed
@DDDDDanica DDDDDanica deleted the fix/3742-generateDeterministicRandomNumber branch January 10, 2025 11:45
@DDDDDanica DDDDDanica mentioned this pull request Jan 10, 2025
4 tasks
DDDDDanica added a commit that referenced this pull request Jan 10, 2025
## Explanation
- fix(3742): improve user segmentation with BigInt-based random
generation ([#5110](#5110))
- fix(3742): change getMetaMetricsId to only sync func type
([#5108](#5108))

<!--
Thanks for your contribution! Take a moment to answer these questions so
that reviewers have the information they need to properly understand
your changes:

* What is the current state of things and why does it need to change?
* What is the solution your changes offer and how does it work?
* Are there any changes whose purpose might not obvious to those
unfamiliar with the domain?
* If your primary goal was to update one package but you found you had
to update another one along the way, why did you do so?
* If you had to upgrade a dependency, why did you do so?
-->

## References

<!--
Are there any issues that this pull request is tied to?
Are there other links that reviewers should consult to understand these
changes better?
Are there client or consumer pull requests to adopt any breaking
changes?

For example:

* Fixes #12345
* Related to #67890
-->

## Changelog

<!--
If you're making any consumer-facing changes, list those changes here as
if you were updating a changelog, using the template below as a guide.

(CATEGORY is one of BREAKING, ADDED, CHANGED, DEPRECATED, REMOVED, or
FIXED. For security-related issues, follow the Security Advisory
process.)

Please take care to name the exact pieces of the API you've added or
changed (e.g. types, interfaces, functions, or methods).

If there are any breaking changes, make sure to offer a solution for
consumers to follow once they upgrade to the changes.

Finally, if you're only making changes to development scripts or tests,
you may replace the template below with "None".
-->

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [x] I've updated documentation (JSDoc, Markdown, etc.) for new or
updated code as appropriate
- [x] I've highlighted breaking changes using the "BREAKING" category
above as appropriate
- [x] I've prepared draft pull requests for clients and consumer
packages to resolve any breaking changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants