Text Clustering for Startup India

Description:

This Python script performs unsupervised text clustering on a dataset of call centre data. The script uses the TF-IDF Vectorizer and KMeans algorithm from the Scikit-learn library to cluster the text data. It determines the optimal number of clusters using the elbow method. The script then generates and prints descriptions for each cluster, the silhouette score, and the Davies-Bouldin index. The descriptions are formed by the top N (default is 15, can be modified) keywords in each cluster. It also plots a bar chart of the frequency of issues in each cluster.

The input data can be any CSV file with text data in its columns. The path to the file is defined by the variable PATH_TO_FILE.

Input:

The input file should be a CSV file with text data. Each column of the CSV file is treated as a separate category and clustering is performed on each of these categories. The path to the input file is defined by the variable PATH_TO_FILE in the script.

Setup:

Clone the repository to your local machine.
Ensure that you have the necessary Python libraries installed. You can install them using pip:
```
pip install pandas scikit-learn matplotlib numpy
```
Modify the PATH_TO_FILE variable in the script to point to your input CSV file.
Modify the NUM_OF_KEYWORDS_PER_CLUSTER variable to control the number of keywords in the cluster descriptions.

Run the script:

python clustering.py

or

python predefined clusters.py

The script will print the cluster descriptions, silhouette score, and Davies-Bouldin index for each column in the CSV file. It will also display a bar chart showing the frequency of issues in each cluster.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
README.md		README.md
clustering.py		clustering.py
predefined clusters.py		predefined clusters.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Clustering for Startup India

Description:

Input:

Setup:

About

Releases

Packages

Languages

luka-medv/startup-india

Folders and files

Latest commit

History

Repository files navigation

Text Clustering for Startup India

Description:

Input:

Setup:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages