text_classification with CHI and TF-IDF
- place the `train`/`val`/`test` datasets in the `data/` directory
- set `PATH` in `word_segmentation.py`
- run `python word_segmentation.py`
- all keywords should be extracted from the training set only
- run `python chi.py`
- keywords are stored in `data/train_chi.py`
- set `DATAPATH` and `MATRAIXFILE` in `tf_idf.py`
- DO NOT modify `FEATUREPATH`; keywords should always be extracted from the training set
- text features are stored in `data/train.txt`, `data/val.txt` and `data/test.txt`
- set the input and output txt files in `shuffle.py`
- run `python shuffle.py`
- run `python xgb.py`
- the model is stored as `xtrain.model`
- test results are stored in `result.txt`
- run `python post_process.py`
- the test dataset is split by its predicted labels into the `output/test_result/` directory
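The CHI step (`python chi.py`) ranks candidate keywords by the chi-square statistic between each term and each class. The exact formula and cut-offs used by `chi.py` are not documented here, so the following is only a minimal sketch under stated assumptions: documents are already word-segmented token lists, document frequency (not raw term counts) is used, and `chi_square_keywords` and `top_k` are hypothetical names. For a term t and class c, let A = training docs of c containing t, B = docs of other classes containing t, C = docs of c without t, D = docs of other classes without t, and N = A+B+C+D; the score is `N * (A*D - B*C)^2 / ((A+C) * (B+D) * (A+B) * (C+D))`.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def chi_square_keywords(
    docs: Sequence[Sequence[str]],
    labels: Sequence[str],
    top_k: int = 100,
) -> Dict[str, List[Tuple[str, float]]]:
    """Rank terms per class by the chi-square statistic (hypothetical helper).

    `docs` are word-segmented token lists from the TRAINING split only.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    # df[term][label]: number of documents of class `label` that contain `term`
    df: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1

    ranked: Dict[str, List[Tuple[str, float]]] = {}
    for label, n_c in class_sizes.items():
        scores = []
        for term, per_class in df.items():
            a = per_class[label]               # in class, contains term
            b = sum(per_class.values()) - a    # other classes, contains term
            c = n_c - a                        # in class, lacks term
            d = n - n_c - b                    # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            score = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            scores.append((term, score))
        # highest score first; ties broken alphabetically for a stable order
        scores.sort(key=lambda item: (-item[1], item[0]))
        ranked[label] = scores[:top_k]
    return ranked


docs = [["ball", "goal"], ["ball", "team"], ["stock", "market"], ["stock", "price"]]
labels = ["sport", "sport", "finance", "finance"]
print(chi_square_keywords(docs, labels, top_k=2)["sport"])
# → [('ball', 4.0), ('stock', 4.0)]
```

Only the training split is passed in, matching the rule that keywords are extracted from the training set. Note that the statistic is symmetric: a term strongly absent from a class (here `stock` for `sport`) scores as high as one strongly present, so some implementations keep only terms with `A*D > B*C`.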
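The `tf_idf.py` step turns each segmented document into a feature vector over the keyword list that `FEATUREPATH` points to. The script's actual weighting is not described here, so this is a minimal sketch under assumptions: TF is the keyword count divided by document length, IDF uses the smoothed form `ln((1 + N) / (1 + df)) + 1` (the same form scikit-learn uses with `smooth_idf=True`), and IDF is fitted on the training documents only and then reused for val/test, in the spirit of the training-set-only rule. `fit_idf` and `tf_idf_vector` are hypothetical names, and the line format of `data/train.txt` is not reproduced.

```python
import math
from collections import Counter
from typing import List, Sequence


def fit_idf(train_docs: Sequence[Sequence[str]], vocab: Sequence[str]) -> List[float]:
    """Smoothed IDF for each keyword, computed from the training split only."""
    n = len(train_docs)
    vocab_set = set(vocab)
    df: Counter = Counter()
    for tokens in train_docs:
        # count each keyword at most once per document
        df.update(set(tokens) & vocab_set)
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1  (one common smoothing choice)
    return [math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab]


def tf_idf_vector(tokens: Sequence[str], vocab: Sequence[str], idf: Sequence[float]) -> List[float]:
    """Term frequency (count / document length) times IDF, one value per keyword."""
    counts = Counter(tokens)
    length = len(tokens) or 1  # avoid dividing by zero for empty documents
    return [counts[t] / length * w for t, w in zip(vocab, idf)]


vocab = ["ball", "stock"]  # e.g. keywords produced by the CHI step
train_docs = [["ball", "goal", "ball"], ["stock", "price"]]
idf = fit_idf(train_docs, vocab)
print(tf_idf_vector(["ball", "goal", "ball"], vocab, idf))
```

Non-keyword tokens such as `goal` still count toward the document length but get no dimension of their own, so every document (train, val or test) maps to a vector of the same size.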
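The shuffle step (`python shuffle.py`) reorders a feature file before training. How `shuffle.py` does this is not shown, so here is a minimal sketch assuming each line of the feature files holds one complete sample (features and label together), so that shuffling whole lines keeps features and labels aligned. `shuffled`, `shuffle_file` and `seed` are hypothetical names; a fixed seed simply makes the order reproducible between runs.

```python
import random
from pathlib import Path
from typing import List


def shuffled(lines: List[str], seed: int = 0) -> List[str]:
    """Return a shuffled copy of `lines`; the input list is left unchanged."""
    out = list(lines)
    random.Random(seed).shuffle(out)
    return out


def shuffle_file(input_path: str, output_path: str, seed: int = 0) -> int:
    """Shuffle a one-sample-per-line text file and return the number of samples."""
    text = Path(input_path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]  # drop blank lines
    Path(output_path).write_text("\n".join(shuffled(lines, seed)) + "\n", encoding="utf-8")
    return len(lines)
```

Using a private `random.Random(seed)` instance rather than the module-level generator keeps the shuffle reproducible without affecting any other code that draws random numbers.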
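The post-processing step (`python post_process.py`) groups the test documents by their predicted labels. Its real input and output layout is not documented, so this is a minimal sketch under assumptions: `result.txt` holds one predicted label per line in the same order as the test documents, each test document has a file name and raw text, and each label gets its own subdirectory under `output/test_result/`. `read_predictions`, `split_by_prediction` and the per-label subdirectory layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def read_predictions(path: str) -> List[str]:
    """One predicted label per non-empty line, in the same order as the test set."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_by_prediction(
    names: Sequence[str],
    texts: Sequence[str],
    predictions: Sequence[str],
    out_dir: str = "output/test_result",
) -> Dict[str, int]:
    """Write each test document to out_dir/<predicted label>/<name>; return per-label counts."""
    if not len(names) == len(texts) == len(predictions):
        raise ValueError("names, texts and predictions must have the same length")
    counts: Dict[str, int] = {}
    for name, text, label in zip(names, texts, predictions):
        label_dir = Path(out_dir) / label
        label_dir.mkdir(parents=True, exist_ok=True)
        (label_dir / name).write_text(text, encoding="utf-8")
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # demo in a throwaway directory instead of the real output/test_result/
    with tempfile.TemporaryDirectory() as tmp:
        print(split_by_prediction(["a.txt", "b.txt"], ["goal!", "stocks up"], ["sport", "finance"], out_dir=tmp))
        # → {'sport': 1, 'finance': 1}
```

The length check matters because the grouping relies purely on line order: if `result.txt` and the test set ever fall out of step, every document after the mismatch would land in the wrong directory.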