Objective: Use Python’s scikit-learn library to reduce the dimensionality of molecular dynamics trajectories, ultimately trajectories generated in VR, and plot the results.
- Pre-Processing
Take the "malonaldehyde.xyz" file and extract the coordinate data into an array that the processing team can use.
- Processing
Use the PCA implementation in scikit-learn to process a matrix of input data, using one of the datasets included with scikit-learn. A great place to start learning about PCA and how to implement it in Python is here.
- Visualization/Plotting
Generate 2D and 3D plots of whatever data you like. The matplotlib gallery is a great place to find example scripts for various types of plots, e.g., fancy scatter plots, 2D and 3D plots in the same figure, an animated 3D random walk, etc.
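The pre-processing, processing, and plotting tasks above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not a prescribed implementation: it assumes "malonaldehyde.xyz" is a standard multi-frame XYZ file (an atom-count line, a comment line, then one `symbol x y z` line per atom, repeated per frame) with the same atoms in the same order in every frame. The helper `read_xyz`, the made-up three-atom snippet used in place of the real file, the choice of the Iris dataset, and the output file name are all hypothetical choices for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show figures interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


def read_xyz(handle):
    """Parse a multi-frame XYZ stream into (symbols, coords).

    Hypothetical helper: assumes every frame lists the same atoms in the same
    order. Returns a list of element symbols and an array of shape
    (n_frames, n_atoms, 3).
    """
    symbols, frames = None, []
    lines = iter(handle)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header)
        next(lines)  # skip the per-frame comment line
        frame_symbols, xyz = [], []
        for _ in range(n_atoms):
            parts = next(lines).split()
            frame_symbols.append(parts[0])
            xyz.append([float(v) for v in parts[1:4]])
        if symbols is None:
            symbols = frame_symbols
        elif frame_symbols != symbols:
            raise ValueError("atom order changes between frames")
        frames.append(xyz)
    return symbols, np.array(frames)


# Pre-processing. For the real file: with open("malonaldehyde.xyz") as f: ...
# Here a made-up two-frame, three-atom snippet stands in for it.
sample = """3
frame 0
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
3
frame 1
O 0.000 0.000 0.010
H 0.760 0.590 0.000
H -0.750 0.580 0.000
"""
symbols, coords = read_xyz(io.StringIO(sample))
X_traj = coords.reshape(len(coords), -1)  # one row per frame, 3 * n_atoms columns

# Processing: PCA on a dataset bundled with scikit-learn (Iris, 150 samples x 4 features).
iris = load_iris()
pca = PCA(n_components=3)
scores = pca.fit_transform(iris.data)
print(pca.explained_variance_ratio_)

# Plotting: 2D and 3D views of the same projection in one figure.
fig = plt.figure(figsize=(10, 4))
ax2d = fig.add_subplot(1, 2, 1)
ax2d.scatter(scores[:, 0], scores[:, 1], c=iris.target, cmap="viridis", s=10)
ax2d.set_xlabel("PC1")
ax2d.set_ylabel("PC2")
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
ax3d.scatter(scores[:, 0], scores[:, 1], scores[:, 2], c=iris.target, cmap="viridis", s=10)
ax3d.set_xlabel("PC1")
ax3d.set_ylabel("PC2")
ax3d.set_zlabel("PC3")
fig.savefig("pca_iris.png")
```

The `X_traj` array (one flattened structure per row) is the shape scikit-learn estimators expect, so the same `PCA` call works on trajectory data once a real file is parsed.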
- Pre-Processing
Make sure the pre-processing code can handle data from VR (depending on its size, consider using VMD to select a subset of the coordinates).
- Processing
Do PCA on this new matrix of structures.
- Visualization/Plotting
Plot the PCs of the pathways generated in VR.
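The VR-data steps above can be sketched as follows, as one possible approach rather than the plan's required method. Assumptions: the trajectory is already an array of shape (n_frames, n_atoms, 3); the atom indices in `atom_subset` are hypothetical stand-ins for a VMD selection; and each frame is rigidly aligned to the first frame with the Kabsch algorithm before PCA. That alignment step is a common addition for Cartesian-coordinate PCA (it removes overall translation and rotation) but is not specified in the plan. The trajectory itself is synthetic placeholder data, and `kabsch_align` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show figures interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def kabsch_align(coords, ref):
    """Rigidly superimpose every frame onto ref (Kabsch algorithm).

    coords: (n_frames, n_atoms, 3); ref: (n_atoms, 3).
    Returns centered, rotated frames with the same shape as coords.
    """
    ref_c = ref - ref.mean(axis=0)
    aligned = np.empty_like(coords, dtype=float)
    for i, frame in enumerate(coords):
        p = frame - frame.mean(axis=0)          # remove translation
        u, _, vt = np.linalg.svd(p.T @ ref_c)   # SVD of the 3x3 cross-covariance
        d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # avoid reflections
        rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        aligned[i] = p @ rot.T                  # apply the optimal rotation
    return aligned


# Synthetic placeholder for a VR trajectory: a 9-atom reference structure
# plus a slowly accumulating random displacement.
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 9
ref = rng.normal(size=(n_atoms, 3))
coords = ref + np.cumsum(rng.normal(scale=0.01, size=(n_frames, n_atoms, 3)), axis=0)

# Atom subset, e.g. indices exported from a VMD selection (hypothetical values).
atom_subset = [0, 1, 2, 3, 4]
sub = coords[:, atom_subset, :]

aligned = kabsch_align(sub, sub[0])
X = aligned.reshape(n_frames, -1)  # matrix of structures: one row per frame

pca = PCA(n_components=2)
pcs = pca.fit_transform(X)

# Plot the pathway in PC space: a grey line for connectivity, points colored by frame.
fig, ax = plt.subplots()
ax.plot(pcs[:, 0], pcs[:, 1], color="0.7", lw=0.8)
sc = ax.scatter(pcs[:, 0], pcs[:, 1], c=np.arange(n_frames), cmap="viridis", s=8)
fig.colorbar(sc, ax=ax, label="frame")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
fig.savefig("pathway_pcs.png")
```

Drawing the path as a line under a time-colored scatter keeps the order of frames visible, which a bare scatter plot loses.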
Phase III: Project a new molecular dynamics trajectory of the “training” system into the previously defined reduced-dimensional space
- Pre-Processing
Make sure the pre-processing code can handle this new data.
- Processing
Project the new data into the previously defined space (i.e., transform it with the already-fitted model instead of fitting a new one).
- Visualization/Plotting
Plot both the old and new data in the previously defined space.
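The key point of these projection steps is that the PCA model is fitted once on the training trajectory and then only its `transform` method is applied to the new trajectory, so both sets of points live in the same coordinate system (same mean, same components). The sketch below illustrates that under stated assumptions: both trajectories are synthetic, already-aligned placeholder arrays; the atom list is a placeholder; and `fake_trajectory` and `check_compatible` are hypothetical helpers. In real use, the new trajectory should go through exactly the same pre-processing (atom selection, alignment reference) as the training one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show figures interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_atoms = 9
base = rng.normal(size=(n_atoms, 3))


def fake_trajectory(n_frames, scale):
    """Synthetic stand-in for an already aligned MD trajectory, (n_frames, n_atoms, 3)."""
    return base + rng.normal(scale=scale, size=(n_frames, n_atoms, 3))


def check_compatible(train_symbols, train_coords, new_symbols, new_coords):
    """Hypothetical pre-processing guard: same atoms, same order, same shape per frame."""
    if new_symbols != train_symbols or new_coords.shape[1:] != train_coords.shape[1:]:
        raise ValueError("new trajectory does not match the training system")


symbols = ["C", "C", "C", "O", "O", "H", "H", "H", "H"]  # placeholder atom list
old = fake_trajectory(300, 0.05)  # "training" trajectory that defines the space
new = fake_trajectory(100, 0.08)  # new trajectory to project
check_compatible(symbols, old, list(symbols), new)

X_old = old.reshape(len(old), -1)
X_new = new.reshape(len(new), -1)

pca = PCA(n_components=2).fit(X_old)  # fit once, on the training data only
old_pcs = pca.transform(X_old)
new_pcs = pca.transform(X_new)        # projection only: no refitting

fig, ax = plt.subplots()
ax.scatter(old_pcs[:, 0], old_pcs[:, 1], s=8, alpha=0.5, label="training trajectory")
ax.scatter(new_pcs[:, 0], new_pcs[:, 1], s=12, marker="x", label="new trajectory")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.legend()
fig.savefig("old_vs_new_pcs.png")
```

Calling `fit` or `fit_transform` on the new data instead would define a different space, and the two point clouds would no longer be comparable.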
- Pre-Processing
Investigate other ways of representing the input data (e.g., interatomic distances, intramolecular angles, dihedrals, mass-weighted coordinates, etc.).
- Processing
Investigate other dimensionality reduction techniques and compare them with PCA (e.g., TICA, kernel PCA, etc.).
- Visualization/Plotting
Try animated plots, line vs. scatter plots, and different color maps.
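One way to combine these exploratory directions is sketched below, with every specific choice an assumption rather than part of the plan. It represents each frame by its interatomic distances (computed with SciPy's `pdist`), applies a hand-written minimal TICA (symmetrized instantaneous and time-lagged covariances, whitening, then an eigendecomposition; the lag of 10 frames is arbitrary), compares it with scikit-learn's `KernelPCA` using an RBF kernel at its default `gamma`, and finally writes an animated line plot with matplotlib's `FuncAnimation` and `PillowWriter`. The trajectory is synthetic placeholder data, and `distance_features` and `tica` are hypothetical helper names; a dedicated package would be preferable to this `tica` sketch for production work.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the figures and GIF are written to disk
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter
from scipy.spatial.distance import pdist
from sklearn.decomposition import KernelPCA


def distance_features(coords):
    """Represent each frame by its n_atoms * (n_atoms - 1) / 2 interatomic distances.

    Distances are unchanged by rigid translation or rotation, so this
    representation needs no alignment step.
    """
    return np.array([pdist(frame) for frame in coords])


def tica(X, lag, n_components=2, rtol=1e-6):
    """Minimal TICA sketch: linear combinations with maximal autocorrelation at `lag`."""
    X = X - X.mean(axis=0)
    x0, xt = X[:-lag], X[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)  # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)  # symmetrized time-lagged covariance
    s, v = np.linalg.eigh(c0)
    keep = s > rtol * s.max()               # drop (near-)singular directions
    w = v[:, keep] / np.sqrt(s[keep])       # whitening: w.T @ c0 @ w = I
    evals, evecs = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(evals)[::-1][:n_components]  # slowest components first
    return X @ w @ evecs[:, order], evals[order]


# Synthetic placeholder trajectory (9 atoms, 200 frames).
rng = np.random.default_rng(2)
ref = rng.normal(size=(9, 3))
coords = ref + np.cumsum(rng.normal(scale=0.01, size=(200, 9, 3)), axis=0)

feats = distance_features(coords)      # (200, 36)
tics, tic_evals = tica(feats, lag=10)  # lag of 10 frames is an arbitrary choice
kpcs = KernelPCA(n_components=2, kernel="rbf").fit_transform(feats)

# Static comparison of the two techniques, each with a different color map.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
t = np.arange(len(feats))
ax1.scatter(tics[:, 0], tics[:, 1], c=t, cmap="viridis", s=8)
ax1.set_title("TICA (distances)")
ax2.scatter(kpcs[:, 0], kpcs[:, 1], c=t, cmap="plasma", s=8)
ax2.set_title("kernel PCA (RBF, distances)")
fig.savefig("phase4_static.png")

# Animated line plot that traces the trajectory through TIC space.
fig_anim, ax = plt.subplots()
pad = 0.05 * np.ptp(tics, axis=0)
ax.set_xlim(tics[:, 0].min() - pad[0], tics[:, 0].max() + pad[0])
ax.set_ylim(tics[:, 1].min() - pad[1], tics[:, 1].max() + pad[1])
ax.set_xlabel("TIC1")
ax.set_ylabel("TIC2")
(trail,) = ax.plot([], [], lw=1)
(head,) = ax.plot([], [], "o", color="crimson")


def update(i):
    """Draw the path up to frame i and mark the current frame."""
    trail.set_data(tics[: i + 1, 0], tics[: i + 1, 1])
    head.set_data([tics[i, 0]], [tics[i, 1]])
    return trail, head


anim = FuncAnimation(fig_anim, update, frames=range(0, len(tics), 2), interval=50, blit=True)
anim.save("tica_path.gif", writer=PillowWriter(fps=20))
```

Swapping `distance_features` for angles, dihedrals, or mass-weighted coordinates only changes the feature matrix; the reduction and plotting code stays the same.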