-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broadcasting bug in repick_unused_centers()
for K-means
#283
Fix broadcasting bug in repick_unused_centers()
for K-means
#283
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #283 +/- ##
==========================================
- Coverage 95.40% 95.40% -0.01%
==========================================
Files 20 19 -1
Lines 1503 1500 -3
==========================================
- Hits 1434 1431 -3
Misses 69 69 ☔ View full report in Codecov by Sentry. |
This is a fix to a pretty significant correctness bug. Can we get this merged ASAP? |
@AzamatB Thank you for the fix and sorry for very long delay in reviewing it! Do you have a test case for the fix? |
It is clearly a bug, it affects the weights of points during empty clusters reinitialization. |
It's actually not easy at all to find the case when this bug would result in a different clustering (and that does not mean at all that the total cost would be worse) -- just because the situation where you would have to re-pick an unused cluster is rare, and re-picking 2 unused clusters (that's where the bug can start manifesting) is very rare. FWIW, here's the one that I have found: Z = Float32[
0.7469281 8.85936 26.104477 17.041048 -2.9290304 3.6850576 9.111796 -3.7082646 28.021303 -1.7768137 15.129808 10.813229 -0.37364542 9.3696785 0.058578808 16.494686 26.311668 26.63891 8.56788 24.445705 26.41015 9.561551 16.240145 15.186173 26.556469 26.59538 2.0695605 -0.21763714 -0.76779276 11.00544 -2.1536446 25.703747 24.663855 8.074589 26.758509 25.25317 1.5967612 0.31046262 0.55812395 8.941144 9.466517 25.507006 26.757868 2.425164 25.087585 26.443806 26.326286 -2.9491603 28.033964 9.664289 26.4217 23.865585 13.599952 12.534004 26.082083 9.602425 13.989668 -3.7962832 3.0748796 0.32880965 2.7096572 17.04881 9.982845 1.8529501 27.728786 -1.7613807 -4.2353554 0.8132015 24.837885 26.31062 1.0547543 1.3947017 3.0518942 -2.6900654 24.62906 -1.4229952 25.314922 10.536686 1.6732222 12.771361 12.719584 12.273293 15.357906 25.722084 24.90772 26.05082 1.7522457 2.5748842 -1.2829331 -2.350809 9.122712 15.469745 11.674798 2.4657478 -0.29506856 15.895254 3.0172286 10.194327 -1.0089047 16.014812
10.079329 18.360752 9.28761 -10.738691 -11.377904 14.796508 14.586522 -9.868497 9.238739 -9.8855295 -9.807078 17.22134 9.412983 -1.2908658 6.4643354 -10.106296 10.545128 11.217348 16.879663 -3.241572 9.065372 -3.4642792 -10.768875 -11.025509 9.280065 6.852204 10.405429 10.598914 9.294289 15.889821 9.373185 -3.4529555 -4.348801 15.373303 -4.503598 7.714737 12.660385 10.243679 10.298691 14.378907 -3.7512515 -2.8529565 -2.5031896 13.335922 -2.0082817 8.725614 9.936363 -10.26723 8.548306 14.545052 -2.9661767 -3.8050623 -2.4869242 -1.7874911 8.777304 -2.453476 -1.549685 -10.284147 13.414661 10.497435 14.179338 -10.703788 15.013265 10.063806 -5.3631563 7.3521304 -9.017388 10.804814 -2.910087 -3.1865885 9.722448 14.545546 13.292266 -8.730829 -3.5044644 10.917173 8.672507 15.791334 12.100385 -2.5765386 -4.6828575 -2.410526 -11.11536 -4.988103 -4.856656 -3.0215774 11.570467 13.921742 6.850209 9.268251 -1.5591162 -3.2451906 15.143096 9.793272 11.128278 -10.880159 13.058364 -2.2190018 8.365673 -11.913434
]
rng = StableRNG(232424)
res = kmeans(Z, 10; rng=rng) |
No description provided.