automatic cluster count elbow finding? #216

Open
wherrera10 opened this issue May 23, 2021 · 0 comments
wherrera10 commented May 23, 2021

The function below was written for a package (Simpsons.jl) that needs to find the "best" number of
clusters automatically. Would Clustering.jl benefit from a PR that adds it?

"""
    find_clustering_elbow(dataarray, cmin = 1, cmax = 5)
    
Find the "elbow" of the totalcost versus cluster number curve, where
cmin <= elbow <= cmax. Note that in pathological cases where the actual
minimum of the totalcosts occurs at a cluster count less than that of the
curve "elbow", the function will return either cmin or the actual cluster
count at which the totalcost is at minimum, whichever is larger.

Returns a tuple: the cluster count and the ClusteringResult at the "elbow" optimum.
"""
function find_clustering_elbow(dataarray::AbstractMatrix{<:Real}, cmin = 1, cmax = 5; fclust = kmeans, kwargs...)
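    # Cluster once per candidate count; keyword arguments are forwarded after `;`.
    allkmeans = [fclust(dataarray, i; kwargs...) for i in 1:cmax+1]
    alltotals = map(x -> x.totalcost, allkmeans)
    # Cluster count at which the total cost is lowest.
    _, cidx = findmin(alltotals)
    # Endpoints of the chord joining the first and last points of the cost curve.
    x1, y1 = 1, alltotals[1]
    x2, y2 = cmax + 1, alltotals[cmax + 1]
    # `distance` is a helper not included in this post; it is presumably the
    # distance from the point (i, alltotals[i]) to the chord through the endpoints.
    _, idx = findmax(map(i -> distance(x1, y1, x2, y2, i, alltotals[i]), 2:cmax))
    # idx indexes into 2:cmax, so the elbow's cluster count is idx + 1.
    nclust = cidx < idx + 1 ? max(cmin, cidx) : idx + 1
    return nclust, allkmeans[nclust]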
end
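The function above calls a `distance` helper that the post does not include. The following is a minimal sketch of what it might look like, assuming it returns the perpendicular distance from a point to the line through the two curve endpoints, with arguments in the order used by the call above. The cost values are made up for illustration and are not real clustering output.

```julia
# Hypothetical helper (not from the original post): perpendicular distance
# from the point (x0, y0) to the line through (x1, y1) and (x2, y2).
# The argument order mirrors `distance(x1, y1, x2, y2, i, alltotals[i])`.
function distance(x1, y1, x2, y2, x0, y0)
    num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    return num / hypot(x2 - x1, y2 - y1)
end

# Toy cost curve indexed by cluster count 1:6 (illustrative numbers only).
costs = [100.0, 40.0, 15.0, 12.0, 10.0, 9.0]
cmax = length(costs) - 1

# Chord from the first to the last point of the curve.
x1, y1 = 1, costs[1]
x2, y2 = cmax + 1, costs[cmax + 1]

# Distance of each interior point from the chord; candidates are counts 2:cmax.
dists = [distance(x1, y1, x2, y2, i, costs[i]) for i in 2:cmax]
_, idx = findmax(dists)
elbow = idx + 1   # shift from a position in 2:cmax back to a cluster count
```

With these numbers the point farthest from the chord is at three clusters, so `elbow` is 3. Because the x axis (cluster count) and the y axis (total cost) are on different scales, the chosen elbow can depend on the magnitude of the costs; normalizing both axes first is one possible refinement.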
@alyst added the feature label Jul 31, 2021