
Commit

Merge pull request galaxyproject#5225 from anuprulez/update_confusion_matrix

Update confusion matrices
bgruening authored Aug 12, 2024
2 parents 88616d9 + b1f8bea commit 9cc53bc
Showing 4 changed files with 5 additions and 5 deletions.
@@ -184,7 +184,7 @@ We will evaluate the classification by comparing the predicted with the expected
Now we visualize and analyze the classification using the "Plot confusion matrix, precision, recall and ROC and AUC curves" tool.
> <hands-on-title>Check and visualize the classification</hands-on-title>
- > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.3) %}:
+ > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.4) %}:
> - {% icon param-file %} *"Select input data file"*: `test_rows_labels_noheader`
> - {% icon param-file %} *"Select predicted data file"*: `LogisticRegression_result`
> - {% icon param-file %} *"Select trained model"*: `LogisticRegression_model`
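For readers who want to check these numbers outside Galaxy, the short Python sketch below computes the same quantities (confusion matrix, precision, recall, ROC AUC) with scikit-learn. It is only an illustration, not the tool's own code, and it assumes that `test_rows_labels_noheader` and `LogisticRegression_result` are tab-separated files whose last column holds the class label.

```python
# Illustrative sketch (not the Galaxy tool's implementation): compute the
# metrics the plotting tool reports from true and predicted labels.
# Assumption: both files are tab-separated, labels in the last column.
import pandas as pd
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score

y_true = pd.read_csv("test_rows_labels_noheader", sep="\t", header=None).iloc[:, -1]
y_pred = pd.read_csv("LogisticRegression_result", sep="\t", header=None).iloc[:, -1]

print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("recall:   ", recall_score(y_true, y_pred, average="weighted"))
# ROC AUC is normally computed from predicted probabilities; using hard 0/1
# labels here only gives a rough, binary-case approximation.
print("ROC AUC:  ", roc_auc_score(y_true, y_pred))
```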
@@ -298,7 +298,7 @@ Now, we should evaluate the performance on the test dataset to find out whether
Now we visualize and analyze the classification. As you can see, `NearestNeighbors_result` has a header, so use `test_rows_labels` to compare.
> <hands-on-title>Check and visualize the classification</hands-on-title>
- > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.3) %}:
+ > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.4) %}:
> - {% icon param-file %} *"Select input data file"*: `test_rows_labels`
> - {% icon param-file %} *"Select predicted data file"*: `NearestNeighbors_result`
> - {% icon param-file %} *"Select trained model"*: `NearestNeighbors_model`
@@ -369,7 +369,7 @@ Now we will evaluate the performance of the SVM classifier:
Now let's visualize the results:
> <hands-on-title>Check and visualize the classification</hands-on-title>
- > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.3) %}:
+ > {% tool [Plot confusion matrix, precision, recall and ROC and AUC curves](toolshed.g2.bx.psu.edu/repos/bgruening/plotly_ml_performance_plots/plotly_ml_performance_plots/0.4) %}:
> - {% icon param-file %} *"Select input data file"*: `test_rows_labels`
> - {% icon param-file %} *"Select predicted data file"*: `SVM_result`
> - {% icon param-file %} *"Select trained model"*: `SVM_model`
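The tool's identifier points to plotly for the actual drawing. As a rough, hedged sketch of how such a ROC plot could be produced (not the tool's code), the snippet below draws a ROC curve with plotly from binary labels and per-sample scores; the labels and scores here are synthetic placeholders, not tutorial data.

```python
# Hedged sketch: draw a ROC curve with plotly from binary labels and scores.
# Not the Galaxy tool's code; y_true / y_score below are placeholders.
import numpy as np
import plotly.graph_objects as go
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)            # placeholder binary labels
y_score = y_true * 0.6 + rng.random(200) * 0.4   # placeholder scores

fpr, tpr, _ = roc_curve(y_true, y_score)
fig = go.Figure(go.Scatter(x=fpr, y=tpr, mode="lines",
                           name=f"ROC (AUC = {auc(fpr, tpr):.2f})"))
# Diagonal reference line for a random classifier.
fig.add_shape(type="line", x0=0, y0=0, x1=1, y1=1, line=dict(dash="dash"))
fig.update_layout(xaxis_title="False positive rate",
                  yaxis_title="True positive rate")
fig.write_html("roc_curve.html")
```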
@@ -408,8 +408,8 @@ One big advantage of random forest is that it can be used for both classification
> What are the advantages of random forest classifier compared with KNN and SVM?
>
> > <solution-title></solution-title>
- > > 1. The overfitting problem will never arise when we use the random forest algorithm in any classification problem.
- > > 2. The same random forest algorithm can be used for both classification and regression task.
+ > > 1. It is resistant to overfitting as it produces results based on multiple decision trees.
+ > > 2. It scales well with large datasets.
> > 3. The random forest algorithm can be used for feature engineering, which means identifying the most important features out of the available features from the training dataset.
> {: .solution}
>
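To make the revised solution points concrete, the hedged scikit-learn sketch below trains a random forest, which averages many decision trees (this is what tempers overfitting) and exposes per-feature importances (the basis of the feature-engineering point). The bundled breast-cancer dataset is used purely as a stand-in; this is not part of the tutorial's Galaxy workflow.

```python
# Illustrative sketch only (not the tutorial's workflow): a random forest
# averages many decision trees, which damps overfitting, and exposes
# per-feature importances that can guide feature selection.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Rank the five most important features (useful for feature engineering).
for name, importance in sorted(
        zip(load_breast_cancer().feature_names, forest.feature_importances_),
        key=lambda pair: pair[1], reverse=True)[:5]:
    print(f"{name}: {importance:.3f}")
```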
