Why reduce the number of features?
• Having D features, we want to reduce their number to n, where n ≪ D
• Benefits: lower computational complexity,
improved classification performance
• Danger: possible loss of information
Basic approaches to dimensionality reduction (DR)
• Feature extraction
Transform t : R^D → R^n
Creation of a new feature space. The features lose their original meaning.
• Feature selection
Selection of a subset of the original features.
Principal Component Transform (Karhunen–Loève)
• PCT belongs to “feature extraction”; the transform t is a rotation
y = T'x,
where the columns of T are the eigenvectors of the original covariance matrix Cx
• PCT creates D new, uncorrelated features y:
Cy = T' Cx T is diagonal
• The n features with the highest variances are kept (see the sketch below)
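As a concrete illustration, here is a minimal numpy sketch of the PCT as defined above: eigendecomposition of Cx, then projection onto the n highest-variance eigenvectors. Function and variable names are illustrative, not from the slides.

```python
import numpy as np

def pct(X, n):
    """Principal Component Transform: project the D-dimensional samples
    (rows of X) onto the n eigenvectors of Cx with the largest
    eigenvalues, i.e. the n directions of highest variance."""
    Xc = X - X.mean(axis=0)                 # center the data
    Cx = np.cov(Xc, rowvar=False)           # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Cx)   # eigh: Cx is symmetric
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    T = eigvecs[:, order[:n]]               # keep n principal directions
    return Xc @ T, eigvals[order]           # y = T'x for each sample

# Example: 5 correlated features reduced to 2 uncorrelated ones
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Y, variances = pct(X, 2)
print(Y.shape)     # (200, 2)
print(variances)   # all D variances, in decreasing order
```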
Principal Component Transform [figure]
Applications of the PCT
• “Optimal” data representation, energy compaction
• Visualization and compression of multimodal images
PCT of multispectral images
Satellite image: B, G, R, nIR, IR, thermal IR
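A sketch of this use case, assuming a hypothetical 6-band image array (random data stands in for real satellite measurements): PCT compacts the energy of the six bands so the first three components can serve as an RGB composite.

```python
import numpy as np

# Hypothetical 6-band image (B, G, R, nIR, IR, thermal IR)
rng = np.random.default_rng(1)
H, W, D = 128, 128, 6
img = rng.random((H, W, D))

X = img.reshape(-1, D)                  # one 6-dim feature vector per pixel
Xc = X - X.mean(axis=0)
eigvals, T = np.linalg.eigh(np.cov(Xc, rowvar=False))
T = T[:, np.argsort(eigvals)[::-1]]     # columns sorted by decreasing variance

Y = (Xc @ T[:, :3]).reshape(H, W, 3)    # first 3 principal components
# Y can be stretched to [0, 1] and displayed as an RGB composite;
# for real imagery most of the total variance lands in these components.
```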
Why is PCT bad for classification purposes?
PCT evaluates the contribution of individual features solely by their variance, which may differ from their discrimination power.
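A synthetic two-class example of this failure mode (illustrative, not from the slides): the highest-variance direction is identical for both classes, so the first principal component keeps the useless feature and discards the discriminative one.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Feature 1: huge variance, identical for both classes (no discrimination)
# Feature 2: small variance, but cleanly separates the classes
class1 = np.column_stack([rng.normal(0, 10, n), rng.normal(-1, 0.2, n)])
class2 = np.column_stack([rng.normal(0, 10, n), rng.normal(+1, 0.2, n)])

X = np.vstack([class1, class2])
Xc = X - X.mean(axis=0)
eigvals, T = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = T[:, np.argmax(eigvals)]   # highest-variance direction
print(pc1)   # ~ [1, 0]: PCT keeps the non-discriminative feature
```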
Separability problem
• Dimensionality reduction methods for classification purposes (here, the two-class problem) must consider the discrimination power of individual features.
• The goal is to maximize the “distance” between the classes.
An example: 3 classes, 3D feature space, reduction to 2D
[Figure: two 2D projections, one with high discriminability and one with low discriminability]
DR via feature selection
Two things are needed:
• A discriminability measure
(e.g., Mahalanobis distance MD, Bhattacharyya distance BD)
MD12 = (m1 – m2)' (C1 + C2)^(-1) (m1 – m2)
(see the computational sketch below)
• A selection strategy
(feature selection as an optimization problem)
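A minimal sketch of computing MD12 from sample estimates, usable to score individual features or feature subsets; the data and names are made up for illustration.

```python
import numpy as np

def mahalanobis_distance(X1, X2):
    """MD12 = (m1 - m2)' (C1 + C2)^(-1) (m1 - m2), estimated from
    the sample means and covariances of the two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    C1 = np.atleast_2d(np.cov(X1, rowvar=False))  # atleast_2d: handles 1 feature
    C2 = np.atleast_2d(np.cov(X2, rowvar=False))
    d = m1 - m2
    return float(d @ np.linalg.solve(C1 + C2, d))

# Rank the individual features of a hypothetical two-class data set
rng = np.random.default_rng(3)
X1 = rng.normal([0, 0, 0], [1, 1, 1], size=(300, 3))
X2 = rng.normal([0.2, 2.0, 0.5], [1, 1, 1], size=(300, 3))
for j in range(3):
    md = mahalanobis_distance(X1[:, [j]], X2[:, [j]])
    print(f"feature {j}: MD12 = {md:.2f}")   # feature 1 scores highest
```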
Feature selection strategies
• Optimal
- full search, complexity D! / ((D – n)! n!)
- branch & bound
• Sub-optimal
- direct selection (optimal only if the features are uncorrelated)
- sequential selection (SFS, SBS; see the SFS sketch below)
- generalized sequential selection (SFS(k), plus k minus m, floating search)
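A sketch of one sub-optimal strategy, sequential forward selection (SFS), using the Mahalanobis distance from the previous slide as the criterion: greedily add whichever feature most increases the class distance. This is an illustrative implementation under those assumptions, not a reference one.

```python
import numpy as np

def mahalanobis(X1, X2, idx):
    """MD12 restricted to the feature subset idx."""
    A, B = X1[:, idx], X2[:, idx]
    d = A.mean(axis=0) - B.mean(axis=0)
    C = (np.atleast_2d(np.cov(A, rowvar=False))
         + np.atleast_2d(np.cov(B, rowvar=False)))
    return float(d @ np.linalg.solve(C, d))

def sfs(X1, X2, n):
    """Greedily grow a feature subset of size n (sub-optimal strategy)."""
    selected, remaining = [], list(range(X1.shape[1]))
    while len(selected) < n:
        best = max(remaining, key=lambda j: mahalanobis(X1, X2, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical two-class data: 6 features with varying discrimination power
rng = np.random.default_rng(4)
X1 = rng.normal(0, 1, size=(300, 6))
X2 = rng.normal([2, 0, 1, 0, 0, 0.5], 1, size=(300, 6))
print(sfs(X1, X2, 3))   # indices of the 3 greedily selected features
```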
A priori knowledge in feature selection
• The above discriminability measures (MD, BD)
assume normally distributed classes.
They can be misleading or even inapplicable otherwise.
• Crucial questions in practical applications:
Can the class-conditional distributions be assumed to be normal?
What happens if this assumption is wrong? (see the illustration below)
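A small numeric illustration of the danger, on synthetic data: a bimodal (non-Gaussian) class and a class sitting in its gap are almost perfectly separable, yet their means nearly coincide, so MD12 reports essentially no separation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
# Class 1 is bimodal: two tight clumps at -3 and +3 (non-Gaussian).
class1 = np.concatenate([rng.normal(-3, 0.1, n), rng.normal(+3, 0.1, n)])
# Class 2 sits exactly in the gap between the clumps.
class2 = rng.normal(0, 0.1, 2 * n)

m1, m2 = class1.mean(), class2.mean()
v1, v2 = class1.var(ddof=1), class2.var(ddof=1)
md = (m1 - m2) ** 2 / (v1 + v2)     # 1-D version of MD12
print(f"MD12 = {md:.4f}")   # ~0: the measure says "inseparable",
                            # yet the classes barely overlap at all
```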
A two-class example
[Figure: the same two classes (Class 1, Class 2) under different assumed distributions; in one case x2 is selected, in the other x1 is selected]
Conclusion
• PCT is optimal for representing “one-class” data (visualization, compression, etc.).
• PCT should not be used for classification purposes. Use feature selection methods based on a proper discriminability measure.
• If you still use PCT before classification, be aware of possible errors.