
XinyueSheng2019/QuasarClassifier


Quasar Classifier

A classifier that recognizes quasars among variable transients.

Dataset Download

For the SDSS Stripe82 quasar-targeted dataset, the data repository is here: https://www.kaggle.com/sherrysheng97/sdss-stripe82-quasar-targeted-dataset

For the PLAsTiCC dataset, the data repository is here: https://www.kaggle.com/c/PLAsTiCC-2018

Package versions

  • TensorFlow 2.1.0
  • numpy 1.17.2
  • pandas 0.25.1
  • feets 0.4
  • glob 1.2.0
  • scikit-learn (sklearn) 0.23.2

Quick run on the pre-built classifier

Jump to the Train your classifier section, adjust the config.txt file, and run train.py.

Run on Kaggle

Running all code in the Kaggle notebook is strongly recommended: https://www.kaggle.com/sherrysheng97/quasar-classifier-sdss-plasticc

All training and test data are provided. You just need to modify the configuration settings, and then click 'run all'.

Build your own classifier

Train your classifier

In the train/ folder, configs.txt defines the architecture and training settings of the classifier. After setting the configurations, run train.py to train and test the classifier.

python train.py

Configuration Settings Explanation

| Config type | Parameter | Explanation | Example |
|---|---|---|---|
| input config | train_path | the path of the input dataset file; the test set is held out from it according to test_fraction | ../data/processed/unbalanced/final_v1.csv |
| | save_path | the folder where all results are saved | results |
| | seed | the seed for random number generation | 1 |
| | features | the bands/features used in training. All features: g, r, i, z, u, g_error, r_error, i_error, z_error, u_error | g,r,i |
| | format | the input format for training. Three formats are provided: simple, group, season | group |
| | processed | the preprocessing method for the input data. Three methods are provided: s (standardization), n (normalization), d (difference between neighboring data points) | s |
| | set_GPR | whether to apply Gaussian Process Regression (GPR). If enabled, a new regressed light curve is generated for each group of an object's light curve | True |
| | group_size | the number of days in each group | 67 |
| | group_num | the number of groups for each object | 7 |
| | cut_fraction | for prediction, the fraction of data dropped from each group; used to test how accuracy improves with more complete data | 0.1 or empty |
| network config | rnn_type | the type of RNN layer. Three types are provided: LSTM, GRU, Simple | LSTM |
| | hidden_layers | a list of the number of neurons in each hidden layer | [256,256,256,256,256,256] |
| | dropout | the fraction of units randomly dropped during training before their outputs are fed into the next layer, to reduce overfitting | 0.25 |
| | plot_model | whether to plot the model architecture | True |
| train config | batch_size | the number of sequences processed in each training step | 32 |
| | num_epochs | the number of passes over the training data | 10 |
| | test_fraction | the fraction of all data used as the test set | 0.2 |
| | optimizer | the optimization method for the loss function. Two optimizers are recommended: Adam, SGD | Adam |
| | learning_rate | the step size of each optimizer update while moving toward a minimum of the loss function | 0.001 |
| | decay | whether the learning rate decreases as the number of epochs increases. If True, decay_value = learning_rate/num_epochs | True |
| | metrics | the metrics reported during training to evaluate the classifier | accuracy,AUC,f1_score |
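The processed, group_size, group_num and cut_fraction options can be illustrated with a short NumPy sketch. This is not the repository's code: the helper names (preprocess, split_into_groups, cut_group) are hypothetical, groups are assumed to be consecutive windows of group_size days starting at the first observation, and cut_fraction is assumed to drop points at random.

```python
import numpy as np


def preprocess(x, method):
    """Apply one of the three preprocessing options to a 1-D series.

    's': standardization (zero mean, unit variance)
    'n': min-max normalization to [0, 1]
    'd': differences between neighbouring points (one element shorter)
    """
    x = np.asarray(x, dtype=float)
    if method == "s":
        std = x.std()
        return (x - x.mean()) / std if std > 0 else x - x.mean()
    if method == "n":
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    if method == "d":
        return np.diff(x)
    raise ValueError(f"unknown method: {method!r}")


def split_into_groups(t, y, group_size, group_num):
    """Assumption: split a light curve (times t in days, values y) into
    consecutive windows of group_size days, keeping the first group_num."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.floor((t - t.min()) / group_size).astype(int)
    return [y[window == k] for k in range(group_num)]


def cut_group(group, cut_fraction, rng):
    """Assumption: randomly drop cut_fraction of the points in a group,
    keeping the survivors in their original time order."""
    n_keep = len(group) - int(round(cut_fraction * len(group)))
    keep = np.sort(rng.choice(len(group), size=n_keep, replace=False))
    return group[keep]
```

With group_size=67 and group_num=7, each object yields seven windows; windows with no observations come back as empty arrays and would need padding or masking before being fed to an RNN.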
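The set_GPR option regresses each group of a light curve with a Gaussian process. A minimal sketch using scikit-learn's GaussianProcessRegressor is shown below; the kernel (RBF plus WhiteKernel), its starting length scale, and the evenly spaced output grid are illustrative assumptions, not the settings used in the repository, and gpr_resample is a hypothetical helper name.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def gpr_resample(t, y, n_points=50):
    """Fit a GP to one group of a light curve and return the regressed
    curve (mean and standard deviation) on an evenly spaced time grid."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # sklearn expects 2-D X
    y = np.asarray(y, dtype=float)
    # Assumed kernel: smooth variability (RBF) plus white observational noise.
    kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(t, y)  # kernel hyperparameters are optimized during fitting
    grid = np.linspace(t.min(), t.max(), n_points).reshape(-1, 1)
    mean, std = gpr.predict(grid, return_std=True)
    return grid.ravel(), mean, std
```

If per-point errors such as g_error are available, passing alpha=y_err**2 to GaussianProcessRegressor instead of using a WhiteKernel is a common alternative.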
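The dropout parameter removes a fraction of units, not objects. The NumPy sketch below shows inverted dropout, the scheme used by tf.keras.layers.Dropout: during training each unit is zeroed with probability rate and the survivors are scaled by 1 / (1 - rate), so no rescaling is needed at inference. Whether the repository uses separate Dropout layers or the dropout argument of its RNN layers is not stated; the function here is a hypothetical illustration.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout on an array of layer activations.

    Training: zero each unit with probability `rate` and scale the rest
    by 1 / (1 - rate) so the expected activation is unchanged.
    Inference: return the input untouched.
    """
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)
```

With the example value 0.25, roughly a quarter of the 256 units in each hidden layer are silenced on every training step, and the remaining outputs are multiplied by 4/3.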
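The seed, test_fraction and metrics settings can be sketched with scikit-learn. This assumes binary labels (1 = quasar, 0 = non-quasar) and a predicted quasar probability per object; the stratified split and the 0.5 threshold are assumptions, and split_data and evaluate are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def split_data(X, y, test_fraction=0.2, seed=1):
    """Hold out test_fraction of the objects. Stratifying by class (an
    assumption) keeps the quasar fraction of an unbalanced dataset the
    same in the training and test sets."""
    return train_test_split(X, y, test_size=test_fraction,
                            random_state=seed, stratify=y)


def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy, AUC and F1 for predicted quasar probabilities y_score."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),  # uses scores, not labels
        "f1_score": f1_score(y_true, y_pred),
    }
```

AUC is computed from the raw probabilities, while accuracy and F1 depend on the chosen threshold; on an unbalanced set, AUC and F1 are usually more informative than accuracy alone.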
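The decay setting defines decay_value = learning_rate/num_epochs but not the schedule itself. A minimal sketch, assuming the inverse time decay of the legacy Keras optimizer decay argument, lr = learning_rate / (1 + decay_value * step), where step counts optimizer updates (batches) rather than epochs; the function name is hypothetical.

```python
def decayed_learning_rate(learning_rate, num_epochs, step):
    """Assumed inverse time decay with decay_value = learning_rate / num_epochs.

    `step` is the number of optimizer updates performed so far, as in the
    legacy Keras optimizer `decay` argument.
    """
    decay_value = learning_rate / num_epochs
    return learning_rate / (1.0 + decay_value * step)
```

Under this assumption, the example values (learning_rate = 0.001, num_epochs = 10) give decay_value = 0.0001, so the learning rate only halves after 10,000 updates.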
