A Java implementation of the ID3 decision tree construction algorithm, built for the ECS629U Artificial Intelligence module at Queen Mary University of London.
Finding attributes with high entropy:
- start with a compressing transform (e.g. a time/frequency transform for sounds, wavelets or the DCT for images, motion vectors for video, ...) to reduce redundancy
- for numeric attributes, variance can often be used as a good surrogate for entropy, and is easier to compute
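As a rough illustration of the two measures mentioned above, the following sketch computes the Shannon entropy (in bits) of a discrete attribute and the population variance of a numeric one. The class and method names (`EntropySketch`, `entropy`, `variance`) and the sample values are assumptions made for this example and are not taken from the project's code. The variance shortcut is a heuristic: for a Gaussian attribute the differential entropy grows with the log of the variance, but for other distributions the two can rank attributes differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class EntropySketch {

    /** Shannon entropy, in bits, of the empirical distribution of the given values. */
    static double entropy(String[] values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum); // count occurrences of each value
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / values.length;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    /** Population variance (divides by n) of a numeric attribute. */
    static double variance(double[] xs) {
        double mean = 0.0;
        for (double x : xs) {
            mean += x;
        }
        mean /= xs.length;
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - mean) * (x - mean);
        }
        return sumSquares / xs.length;
    }

    public static void main(String[] args) {
        // A balanced two-valued attribute carries more information than a constant one.
        System.out.println(entropy(new String[] {"sunny", "rain", "sunny", "rain"}));
        System.out.println(entropy(new String[] {"sunny", "sunny", "sunny", "sunny"}));
        // A spread-out numeric attribute has higher variance than a constant one.
        System.out.println(variance(new double[] {1.0, 2.0, 3.0, 4.0}));
        System.out.println(variance(new double[] {2.0, 2.0, 2.0, 2.0}));
    }
}
```

An attribute that takes the same value on every example scores zero on both measures, so it can be discarded before any tree is built.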
Finding features with high mutual information with the classes:
- expert knowledge may be available
- common sense works: e.g., don’t classify text documents based on the frequency of “a”, “the”, “and”, ..., which occur in nearly every document regardless of class
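When neither expert knowledge nor common sense settles the question, mutual information can be estimated directly from labelled data. The sketch below uses the identity I(X; Y) = H(Y) - H(Y | X), which is the same quantity ID3 calls information gain. The class name `MutualInformationSketch`, its method names, and the toy spam/ham data are assumptions made for this example, not part of the project's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class MutualInformationSketch {

    /** Shannon entropy, in bits, of a distribution given as counts summing to total. */
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Estimates I(feature; label) = H(label) - H(label | feature) from paired samples. */
    static double mutualInformation(String[] feature, String[] labels) {
        int n = labels.length;
        Map<String, Integer> labelCounts = new HashMap<>();
        Map<String, Map<String, Integer>> labelCountsByValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            labelCounts.merge(labels[i], 1, Integer::sum);
            labelCountsByValue
                    .computeIfAbsent(feature[i], k -> new HashMap<>())
                    .merge(labels[i], 1, Integer::sum);
        }
        // H(label | feature): entropy within each feature value, weighted by its frequency.
        double conditional = 0.0;
        for (Map<String, Integer> subset : labelCountsByValue.values()) {
            int size = 0;
            for (int c : subset.values()) {
                size += c;
            }
            conditional += ((double) size / n) * entropy(subset, size);
        }
        return entropy(labelCounts, n) - conditional;
    }

    public static void main(String[] args) {
        String[] labels = {"spam", "spam", "ham", "ham"};
        // Toy feature "document contains the word 'the'": true for every document.
        String[] containsThe = {"yes", "yes", "yes", "yes"};
        // Toy feature that happens to separate the two classes perfectly.
        String[] containsOffer = {"yes", "yes", "no", "no"};
        System.out.println(mutualInformation(containsThe, labels));
        System.out.println(mutualInformation(containsOffer, labels));
    }
}
```

On this toy data the stop-word feature shares no information with the class, while the perfectly separating feature recovers the full entropy of the labels, which matches the common-sense advice above.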