class: center, middle, inverse, title-slide

.title[
# Classification of the Residues after High and Low Order Explosions
]
.subtitle[
## using Machine Learning Techniques on Fourier Transform Infrared (FTIR) spectra
]
.author[
### Krzysztof Banas
SSLS
]
.date[
### 2023-03-30
]

---
class: inverse center middle

# Introduction

---
## SPF Project

.pull-left[
* in collaboration with SPF
* high and low order explosions
* debris collected
* analysed with infrared spectroscopy: FTIR spectrometer
]

.pull-right[
]

---
## Experiments

.pull-left[
- KBr pellets
- mid-IR range
- multiple spectra measured
- spectral signatures from the highly energetic material
- ... but also from the surface material (paper, plastic, textile)
]

.pull-right[
]

---
## Motivation for this sub-project

.pull-left[
* check which materials are more suitable as sample collectors
* check the influence of the explosion order on the classification
* investigate dimension reduction in combination with discrimination
* build a classification model: combined PCA-LDA
]

.pull-right[
]

---
class: inverse center middle

# Results

---
## Spectral Data

.pull-left[
* typically more variables than observations
* variables are highly correlated and auto-correlated
* dimension reduction is needed before modeling
* pre-processing is crucial
]

.pull-right[
]

---
## Pre-processing

.pull-left[
]

.pull-right[
]

---
class: inverse center middle

## Dimension Reduction

---
## How PCA works

---
## PCA on decathlon dataset
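A minimal sketch of how this example can be set up in R, assuming the `FactoMineR`/`factoextra` workflow (the `decathlon` data ships with FactoMineR: 41 athletes described by 10 event results plus rank and points):

```r
library(FactoMineR)   # provides PCA() and the decathlon example data
library(factoextra)   # ggplot2 helpers for visualising PCA results

data(decathlon)                      # 41 athletes, 10 events + 3 extra columns
res.pca <- PCA(decathlon[, 1:10],    # keep only the 10 performance variables
               scale.unit = TRUE,    # standardise each variable (unit variance)
               ncp = 5,              # number of principal components to keep
               graph = FALSE)
```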
---
## Correlation between variables

<img src="index_files/figure-html/unnamed-chunk-2-1.png" width="100%" />

---
## Principal Components: screeplot and scoreplot
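A sketch of how these two plots can be produced with the factoextra helpers, assuming the `res.pca` object from the decathlon example:

```r
# Screeplot: percentage of variance explained by each principal component
fviz_eig(res.pca, addlabels = TRUE)

# Scoreplot: the individuals (athletes) projected onto PC1 and PC2
fviz_pca_ind(res.pca, repel = TRUE)
```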
---
<img src="index_files/figure-html/unnamed-chunk-4-1.png" width="100%" />

---
## Correlation between a variable and a principal component
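For a standardized PCA, the correlation between a variable and a component equals the variable's coordinate on that component. A sketch of how to inspect these correlations, again assuming the `res.pca` object from before:

```r
# Correlation of each variable with the first two principal components
round(res.pca$var$cor[, 1:2], 2)

# dimdesc() lists the variables most significantly correlated with each PC
dimdesc(res.pca, axes = 1:2)
```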
---
## Variable Correlation Plot

<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="100%" />

---
## Quality of Representation (cos<sup>2</sup>)
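cos<sup>2</sup> measures how well a variable is represented by a given component (the squared coordinate on that component for a standardized PCA). A sketch using factoextra, assuming `res.pca` as before:

```r
var <- get_pca_var(res.pca)   # coordinates, correlations, cos2, contributions
head(var$cos2)                # quality of representation per variable and PC

# Bar plot of the total cos2 of the variables on PC1 and PC2
fviz_cos2(res.pca, choice = "var", axes = 1:2)
```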
---
## Correlation Plot

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" />
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-9-1.png" width="100%" />
]

---
## Contributions of variables to PC1

<img src="index_files/figure-html/unnamed-chunk-10-1.png" width="100%" />

---
## Contributions of variables to PC2

<img src="index_files/figure-html/unnamed-chunk-11-1.png" width="100%" />

---
## Contributions of variables to PC1 and PC2

<img src="index_files/figure-html/unnamed-chunk-12-1.png" width="100%" />

---
## Principal Components: scoreplot

<img src="index_files/figure-html/unnamed-chunk-13-1.png" width="100%" />

---
## PCA: Practical Applications

- Finance: analyze stock market data to identify patterns
- Image compression: compress digital images
- Genetics: analyze genetic data to identify patterns
- Social media: user data to identify trends in behavior and preferences
- Marketing: consumer data to identify patterns in purchasing behavior and preferences
- Healthcare: patient data to identify risk factors and predict outcomes
- Quality control: manufacturing data to identify patterns in product quality and performance
- Agriculture: crop data to identify patterns in yields and optimize farming practices
- Climate research: climate data to identify patterns in weather and predict future climate trends
- Sports: player data to identify key factors that contribute to winning games and championships

---
## PCA for spectroscopic data

.pull-left[
- dimension reduction
- removing correlation between variables
- finding patterns
- unsupervised method
- *lossy technique*
- *scaling variables can yield different results*
- *problems with interpretation: PCs are linear combinations of the original features*
]

.pull-right[
]

---
## Spectral dataset
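A minimal sketch of how a spectral dataset is typically arranged for PCA in R: one row per measured spectrum, one column per wavenumber, so the number of variables far exceeds the number of observations. The names and dimensions here are hypothetical stand-ins, not the actual SPF dataset:

```r
# Hypothetical spectral matrix: 60 spectra x 1800 wavenumbers (variables >> observations)
wavenumbers <- seq(650, by = 2, length.out = 1800)      # mid-IR range, in cm^-1
spectra <- matrix(rnorm(60 * 1800), nrow = 60,
                  dimnames = list(NULL, paste0("wn_", wavenumbers)))

# PCA on the spectra: centre each wavenumber; scaling is often omitted because
# all variables are already on the same absorbance scale
pca <- prcomp(spectra, center = TRUE, scale. = FALSE)
summary(pca)$importance[, 1:5]   # variance captured by the first few PCs
```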
---
## Loadings and contribution plot

.pull-left[
]

.pull-right[
- first two PCs
- all wavenumbers
- colour-coded cos<sup>2</sup> contribution to PC1 and PC2
]

---
## Scoreplots

.pull-left[
]

.pull-right[
- first two PCs
- all observations (spectra)
- colour-coded class membership (explosive and order)
]

---
class: inverse center middle

## Identification/Classification/Grouping

---
## 2D partitions

<img src="FIGURES/partition_plot.png" width="800px" style="position:absolute; left:10px; top:130px;">

---
## Building Classification Model

- first: use PCA on the spectral data for dimension reduction
- then: use LDA to build the classification model on the first few PCs
- finally: test the classification model with new data

*(a code sketch of this pipeline is given in the appendix slide at the end)*

---
background-image: url("FIGURES/LDA_00.png")
background-position: 50% 50%
background-size: 40% 60%

## How LDA works<sup>1</sup>

.footnote[[1] Source: Gaber, T., Tharwat, A., Ibrahim, A. and Hassanien, A.E., "Linear discriminant analysis: a detailed tutorial", AI Communications (2017)]

---
## Data Preparation for LDA

- **Classification Problems** LDA is used to classify a categorical output variable and is suitable for both binary and multi-class classification problems
- **Gaussian Distribution** the standard LDA model assumes a Gaussian distribution of the input variables; review the univariate distribution of each variable and transform it to look more Gaussian (log and root transforms for exponential distributions, Box-Cox for skewed distributions)
- **Remove Outliers** remove outliers from the data first, because they can skew the basic statistics LDA uses to separate classes, such as the mean and the standard deviation
- **Same Variance** LDA assumes that all input variables have the same variance, so standardize the data before fitting an LDA model, so that each variable has mean 0 and standard deviation 1

---
class: inverse center middle

## LDA on principal components

---
## Discriminant functions plot

.pull-left[
]

.pull-right[
- first two linear discriminant functions
- all observations (originally spectra)
- colour-coded class membership (explosive and order)
]

---
## Comparison

.pull-left[
### PCA
- unsupervised
- maximizes the variance retained from the given dataset
- performs well for comparatively small sample sizes
]

.pull-right[
### LDA
- supervised
- finds the linear discriminants, i.e. the axes that maximize separation between classes
- more suitable for multi-class classification tasks
]

---
class: inverse center middle

# Conclusions

---
## Take home message

- Spectroscopic data are multidimensional and highly correlated

--

- PCA "transfers" most of the system's variability to the first few principal components

--

- PCA removes correlation between variables

--

- For a classification model we need another method: LDA

--

- With PCs as input variables, LDA predicts class membership well

--

- We can validate the model with various statistical methods, for example Leave-One-Out cross-validation

---
## Where to find:

.pull-left[
### Manuscript:
]

.pull-right[
### Slides:
https://krzbanas.github.io/2023-03-30_SSLS_Seminar
]

---
class: center, middle

# Thank You!
Slides created via the R packages [**xaringan**](https://github.com/yihui/xaringan) and [gadenbuie/xaringanthemer](https://github.com/gadenbuie/xaringanthemer).

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).
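---
## Appendix: PCA-LDA pipeline sketch

A minimal, self-contained sketch of the pipeline from the *Building Classification Model* slide: PCA for dimension reduction, LDA on the first few PC scores, prediction for new spectra, and Leave-One-Out validation. The `spectra` matrix and `labels` vector are hypothetical stand-ins for the real FTIR dataset, and keeping 5 PCs is purely illustrative.

```r
library(MASS)   # lda()

## Hypothetical stand-ins for the real FTIR data: one row per spectrum,
## one class label per spectrum (explosive type and explosion order)
set.seed(1)
labels  <- factor(rep(c("expl_A_high", "expl_A_low",
                        "expl_B_high", "expl_B_low"), each = 10))
spectra <- matrix(rnorm(40 * 500), nrow = 40) +
           outer(as.integer(labels), seq_len(500) / 500)   # fake class structure

## Step 1: PCA on the spectra for dimension reduction
pca    <- prcomp(spectra, center = TRUE)
scores <- pca$x[, 1:5]                        # keep the first few PCs

## Step 2: LDA on the PC scores
model <- lda(scores, grouping = labels)

## Step 3: classify new spectra -- project them with the SAME PCA rotation first
new_scores <- predict(pca, spectra[1:3, ])[, 1:5]
predict(model, new_scores)$class

## Leave-One-Out cross-validation (CV = TRUE refits with each spectrum left out)
loo <- lda(scores, grouping = labels, CV = TRUE)
mean(loo$class == labels)                     # LOO classification accuracy
```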