-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMechPropNotes.txt
103 lines (79 loc) · 4.28 KB
/
MechPropNotes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
Mech Prop Models Thoughts, Notes, To-Do
===============================================================================
Questions:
What is the best scoring system? For regressors, r2/rms?
Opinions on how to deal with duplicates
Which datasets include which others?
Only use Suchi's dataset.
Jae had to do a lot of cleaning to be satisfied.
Vagard's law performance to compare to our score
How clustered is data?
How biased?
Try clustering .... based on?
1% threshold for duplicates?
Figure out error'd lines
Feature descriptions: https://github.com/hackingmaterials/matminer/tree/master/matminer/utils/data_files/magpie_elementdata
look for volume?
density as a feature? Add this
Correlations between different properties (hardness vs modulus etc)
Take aways:
Wrap up questions
? Does it predict new systems well?
? Does it outperform vegards's law?
1: look at linear combination of elements
2: compare to feature set without density features, how much does the model improve\
? Does it interpolate between different chemistries
CV by clustering
? Look at statistics, deviation, thresholds between values...
? Where are errors?
94: T0+
273: M0+
326: Glass0+
327 - 364: Single elements
383, 385: Single feature... Zr64.13 Al10 Cu15.75 Ni10.12 S2
598: 4 features errored (2nan, 2inf) Al3 Fe30 B6 P16 I45 N1
572: one inf feature, Fe1 W1
===============================================================================
1/11/2019
-Finished MAPI setup, synced with Github
To-Do
Train model on MAPI data.
Get new computer with python 3.4+ for matminer purposes
Large feature set, take time to compute
Talk to Logan about maybe using their resources
===============================================================================
Melting point prediction formula, vibration vs cohesive energy
- One figure showing gradual improvement in r2 as incorporate vegard's, etc.
- Conclude that amorphous metals are behaving like hard spheres.
- Using vegard's law as a sort of stacked model?
Can force model to use some physics
pugh criteria? -> Looking for new alloys with certain ductility
Lindamen criteria, calculation of melting point
could float as potential future applications, looking at theory and deviations therefrom
Mixing criteria? How do you / can you get this from magpie features?
adding/extending magpie, need mixing enthalpies, bonding stuff etc
separation of close packed structures could be useful/powerful
--> After removing outliers, look at significant deviations, see why they are special?
Looking into #atoms in asymmetric unit cell, and using as a complexity parameter
===============================================================================
https://homes.cs.washington.edu/~marcotcr/blog/lime/
https://github.com/marcotcr/lime/
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV
===============================================================================
2/19/19
https://towardsdatascience.com/decision-tree-ensembles-bagging-and-boosting-266a8ba60fd9
http://blog.datadive.net/prediction-intervals-for-random-forests/
Random forest = Bagging / Bootstrap Aggregation
Vegards + Difference = Boosting. (ensemble technique, sequential fitting and fitting error form prior trees)
Issues with over fitting
Must tune hyper parameters
===============================================================================
Is it ever worth thinking about MG search as an unsupervised learning problem? Probably not, since we need to supply the result of glass/not glass
Density typically scales/trends with what information? We should make sure not to bias by that, but also can we re-gain that from features?
- Young's modulus? https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.1995.0075
===============================================================================
Looking at repeatability of rfecv
-Density (MG) initial RFECV, repeated trials
Optimal features: [136, 48, 16, 16, 30]
-Density (MG) differene model
Optimal features: [4, 24