
12.4. Feature Extraction Method 1: Principal Components Analysis

As introduced in the previous section, PCA is one of the most important and most widely used dimensionality reduction methods. Even though knowledge about dimensionality reduction is somewhat fragmented, PCA is generally viewed as a must-know technique. It has wide applications in data compression, redundancy removal, and noise reduction. In the following, the concept and basic idea of PCA are introduced first. Then, the theoretical basis, especially the derivation of the major equations needed to implement PCA, is presented. Next, the procedure and guidelines for implementing PCA are given. Lastly, kernel PCA is discussed as a way of extending PCA from a linear dimensionality reduction method to a nonlinear one.

12.4.1. Concept and Main Idea

From the name of the method, i.e., principal component(s) analysis, we can guess that the major idea of PCA is to find the major (principal) components of the data and represent the data with only these components for the purpose of dimensionality reduction. In a more rigorous mathematical description, we attempt to reduce the dimensionality of data with $J$ dimensions. Because the data is unlabeled, each data point, also called an instance, has $J$ features (or attributes). Please be aware that the data has multiple data points, say $I$, which is the number of instances. This number of data points should not be confused with the dimensionality of the data, though both determine the total size of the data. Our goal in dimensionality reduction is to reduce the dimension $J$ rather than the number of instances $I$. Specifically, we aim to reduce the dimensionality of the data, $J$, to a smaller one, $J^{\prime}$ ($J^{\prime}<J$). This dimensionality reduction is performed in the hope that the data after dimensionality reduction can still represent the original data. That is, we hope to minimize the loss of essential information while reducing the data dimensions.
Next, let us use a simple example to illustrate the above concepts and understand what it means to reduce dimensionality and find the major components. In the example shown in Fig. 12.1, we hope to reduce the dimensionality of the data represented by the points. The data is plotted in a two-dimensional space, i.e., it has two axes, and accordingly, each data point has two elements (coordinates along $\vec{e}_{1}$ and $\vec{e}_{2}$). Obviously, we can only reduce the data dimensionality from 2 to 1. Thus, the purpose of PCA in this case is to find a direction, or an axis, onto which the data can be projected. Therefore, a key to PCA is how to find this direction/axis as the new dimension, which can also be understood as the principal component in this example.
Instead of revealing directly which direction is the best, let us examine two candidate directions, $\vec{w}_{1}$ and $\vec{w}_{2}$, which represent the directions of the two straight lines passing through the center of the data points. A comparison of the two directions can give us some hints for identifying better directions for projection. Many of us, in fact, can point out that $\vec{w}_{1}$ is the better direction: when the data is projected onto this direction or axis, more information from the original data is retained. But why?
There are two explanations. The first is that the data points are closer to this axis; in a stricter description, the total distance between all the points and this axis is smaller. The second is that the projected points on $\vec{w}_{1}$ are more separated from each other. That is, the variance of the projected data is higher, which facilitates differentiation between the projected data points. In fact, these two explanations can be generalized as the criteria for dimensionality reduction with PCA. In the more general case, we attempt to find $J^{\prime}$ ($1 \leqslant J^{\prime}<J$) directions (axes) for projection so that these criteria are met in the best way. In the later derivation, we will also show that the two criteria are equivalent: we can use either of them to find the same best directions for projection (or, equivalently, the best components).

Figure 12.1: Main idea of dimensionality reduction

12.4.2. Theoretical Basis

Derivation Based on Minimum Distance

As mentioned above, there are two criteria for selecting the best projection direction(s), i.e., the principal components. Let us first see how to establish the theoretical basis using the criterion of minimum distance. In order to obtain a general derivation, we will use generalized data and space. The 2D data and space in Fig. 12.1 may still be used to assist in understanding the process. However, please be aware that the distance and the axis/plane onto which the data is projected will be generalized into the Euclidean distance and a hyperplane, which are not as easy to visualize as lines and planes in 1D and 2D spaces.
We will start with data consisting of $I$ data points (instances), each of which has $J$ dimensions (attributes). Accordingly, we can use a matrix $\bar{X}$ as follows to represent the data
$$\bar{X}=\left[\vec{x}_{1}^{T}, \vec{x}_{2}^{T}, \cdots, \vec{x}_{I}^{T}\right]^{T}=\left[\begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1J} \\ \vdots & & \ddots & \vdots \\ x_{I1} & x_{I2} & \cdots & x_{IJ} \end{array}\right]_{I \times J} \tag{12.1}$$
where a data point $\vec{x}_{i}$ has $J$ attributes and is formulated as a column array $\left[x_{i1}, x_{i2}, \cdots, x_{iJ}\right]^{T}$.
It should be noted that PCA takes zero-centered data as the input. In general, data like that in Fig. 12.1 is not zero-centered. If that is the case, we need to zero-center it as follows.
$$\vec{x}_{i} \leftarrow \vec{x}_{i}-\frac{1}{I} \sum_{i=1}^{I} \vec{x}_{i} \tag{12.2}$$
After this preprocessing operation, the origin of the coordinate system is moved to the center of the data. In Fig. 12.1, the origin is moved to the data center marked by the intersection of $\vec{w}_{1}$ and $\vec{w}_{2}$. The following derivation is performed with the zero-centered data, for which the following condition is met:
$$\sum_{i=1}^{I} \vec{x}_{i}=\vec{0} \tag{12.3}$$
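As a minimal sketch of this preprocessing step (the array names and the small dataset below are our own, purely for illustration), zero-centering can be done in NumPy as follows:

```python
import numpy as np

# Illustrative data: I = 4 points, J = 3 attributes (arbitrary values).
X = np.array([[2.0, 1.0, 4.0],
              [1.0, 3.0, 2.0],
              [3.0, 2.0, 5.0],
              [2.0, 2.0, 1.0]])

# Eq. (12.2): subtract the per-attribute mean from every data point.
X_centered = X - X.mean(axis=0)

# Eq. (12.3): each attribute of the centered data now sums to (numerically) zero.
print(X_centered.sum(axis=0))
```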
To generate the low-dimensional space, we first rotate the current coordinate system to obtain a new one, $\bar{W}_{0}=\left[\vec{w}_{1}, \vec{w}_{2}, \cdots, \vec{w}_{J}\right]_{(J \times J)}$, in which $\vec{w}_{j}=\left[w_{1j}, w_{2j}, \cdots, w_{Jj}\right]^{T}$. Each vector $\vec{w}_{j}$ represents an axis, which can also be understood as a direction or a dimension. $\vec{w}_{j}$ has $J$ components, which are the projections of the unit vector along this axis onto the $J$ axes of the original coordinate system. This new coordinate system also has $J$ dimensions, which are represented by an orthonormal basis consisting of $J$ unit vectors that are orthogonal to each other. Therefore, we have the following relationships for any pair of vectors from this basis: $\vec{w}_{j}^{T} \cdot \vec{w}_{j}=\left\|\vec{w}_{j}\right\|^{2}=1$ and $\vec{w}_{j}^{T} \cdot \vec{w}_{j^{\prime}}=0$ when $j \neq j^{\prime}$.
However, our purpose is not to obtain this new coordinate system. Instead, we aim to find the best projection directions in a lower-dimensional space. That is, we attempt to reduce the dimensionality from $J$ to $J^{\prime}$ ($J^{\prime}<J$), so we need to drop several axes in the above new coordinate system, which leaves us with $\bar{W}=\left[\vec{w}_{1}, \vec{w}_{2}, \cdots, \vec{w}_{J^{\prime}}\right]_{J \times J^{\prime}}$. Because the columns of this projection array $\bar{W}$ form an orthonormal set, we have $\bar{W}_{\left(J^{\prime} \times J\right)}^{T} \cdot \bar{W}_{\left(J \times J^{\prime}\right)}=\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)}$, in which $\bar{I}$ is an identity matrix. Note that $\bar{W}_{\left(J \times J^{\prime}\right)} \cdot \bar{W}_{\left(J^{\prime} \times J\right)}^{T}$ is, in general, not an identity matrix when $J^{\prime}<J$; it is the $(J \times J)$ matrix that projects points onto the retained subspace.
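The following small numerical check, built on an arbitrary orthonormal basis obtained from a QR decomposition (our own construction, not part of the text), illustrates the two relationships just stated:

```python
import numpy as np

rng = np.random.default_rng(0)
J, J_prime = 5, 2

# Build a J x J orthonormal basis, then drop columns to keep only J' of them.
W_full, _ = np.linalg.qr(rng.normal(size=(J, J)))
W = W_full[:, :J_prime]

print(np.allclose(W.T @ W, np.eye(J_prime)))  # True: W^T . W = I_(J' x J')
print(np.allclose(W @ W.T, np.eye(J)))        # False: W . W^T only projects onto the kept subspace
```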
This reduced coordinate system with $J^{\prime}$ axes represents a new lower-dimensional space. Alternatively, we can understand it as a subspace formed by the retained axes, onto which the data in the high-dimensional space is projected. A projection can be obtained as the dot product of the tensor (data as an array) to be projected and the tensor representing the direction. Therefore, we can obtain the data in the new low-dimensional space as
$$\bar{Z}_{\left(I \times J^{\prime}\right)}=\left[\vec{z}_{1}^{T}, \vec{z}_{2}^{T}, \cdots, \vec{z}_{I}^{T}\right]^{T}=\bar{X}_{(I \times J)} \cdot \bar{W}_{\left(J \times J^{\prime}\right)} \tag{12.4}$$
where the projected data point $\vec{z}_{i}$ is essentially represented by the coordinates of this data point in the new low-dimensional space. The array for the new system, $\bar{W}$, has the dimensions (or size, in the general context of arrays) of $J \times J^{\prime}$. Thus, for each data point, we have
$$\vec{z}_{i\left(J^{\prime} \times 1\right)}=\bar{W}_{\left(J^{\prime} \times J\right)}^{T} \cdot \vec{x}_{i(J \times 1)} \tag{12.5}$$
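To make Eqs. (12.4) and (12.5) concrete, here is a minimal sketch that projects zero-centered 2D data onto a single direction; the direction used is an arbitrary unit vector chosen by us, not yet the principal component:

```python
import numpy as np

# Zero-centered data (I = 3, J = 2) and an arbitrary unit direction (J' = 1).
X = np.array([[ 1.0,  0.5],
              [-0.5, -1.0],
              [-0.5,  0.5]])
W = np.array([[0.6],
              [0.8]])          # W.T @ W = [[1.0]], i.e., one orthonormal column

Z = X @ W                      # Eq. (12.4): all points at once, shape (I, J')
z_0 = W.T @ X[0]               # Eq. (12.5): a single data point, shape (J',)
print(Z.ravel(), z_0)
```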
If we project the data in the low-dimensional space back to the original high-dimensional space, we can apply the following operation:
$$\hat{\bar{X}}_{(I \times J)}=\bar{Z}_{\left(I \times J^{\prime}\right)} \cdot \bar{W}_{\left(J^{\prime} \times J\right)}^{T} \tag{12.6}$$
Or, we can write this equation in terms of individual data points as follows:
$$\hat{\vec{x}}_{i(J \times 1)}=\bar{W}_{\left(J \times J^{\prime}\right)} \cdot \vec{z}_{i\left(J^{\prime} \times 1\right)} \tag{12.7}$$
where $\hat{\bar{X}}_{(I \times J)}$ contains the coordinates of the projected data in the original high-dimensional space. For example, $\vec{x}_{i}$ holds the coordinates of data point $i$ in the original high-dimensional space, $\vec{z}_{i}$ is the projection of this data point in the low-dimensional space (coordinates of the projected point in the new coordinate system), and $\hat{\vec{x}}_{i}$ holds the coordinates of this projected point expressed in the original high-dimensional space. Then, the distance between the original data point and the 'hyperplane' is the distance between the original point and its projected point in the same (high-dimensional) space: $\left\|\vec{x}_{i}-\hat{\vec{x}}_{i}\right\|_{2}$. According to the minimum distance criterion, we need to find a 'hyperplane' (the coordinate system of the low-dimensional space, $\bar{W}$) for which the total distance between all the points and the 'hyperplane' is the smallest. That is, we need to minimize the following quantity
$$\sum_{i=1}^{I}\left\|\vec{x}_{i}-\hat{\vec{x}}_{i}\right\|_{2}^{2} \tag{12.8}$$

where $\|\cdot\|_{2}$ is the $\ell$-2 norm (or Euclidean norm), defined for a vector $\vec{e}=\left[e_{1}, e_{2}, \cdots, e_{n}\right]^{T}$ as $\|\vec{e}\|_{2}=\sqrt{e_{1}^{2}+e_{2}^{2}+\cdots+e_{n}^{2}}$. We work with its square so that the sum can be expanded conveniently below.
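Continuing the same kind of toy setup (an assumed small dataset and an arbitrary, non-optimal orthonormal direction), the sketch below evaluates the reconstruction of Eqs. (12.6)–(12.7) and the total squared distance of Eq. (12.8):

```python
import numpy as np

# Illustrative zero-centered data and an arbitrary (non-optimal) orthonormal direction.
X = np.array([[ 1.0,  0.5],
              [-0.5, -1.0],
              [-0.5,  0.5]])
W = np.array([[0.6],
              [0.8]])

Z = X @ W                                   # project to the low-dimensional space, Eq. (12.4)
X_hat = Z @ W.T                             # map back to the original space, Eq. (12.6)

# Eq. (12.8): total squared distance between the points and the 'hyperplane'.
total_sq_dist = np.sum(np.linalg.norm(X - X_hat, axis=1) ** 2)
print(X_hat)
print(total_sq_dist)
```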
Next, we can reformulate this expression based on the above relationships and $\bar{W}^{T} \cdot \bar{W}=\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)}$ as follows
$$\begin{aligned} \sum_{i=1}^{I}\left\|\vec{x}_{i}-\hat{\vec{x}}_{i}\right\|_{2}^{2} &=\sum_{i=1}^{I}\left\|\vec{x}_{i}-\bar{W} \cdot \vec{z}_{i}\right\|_{2}^{2} \\ &=\sum_{i=1}^{I} \vec{x}_{i}^{T} \cdot \vec{x}_{i}-2 \sum_{i=1}^{I}\left(\bar{W} \cdot \vec{z}_{i}\right)^{T} \cdot \vec{x}_{i}+\sum_{i=1}^{I}\left(\bar{W} \cdot \vec{z}_{i}\right)^{T} \cdot\left(\bar{W} \cdot \vec{z}_{i}\right) \\ &=\sum_{i=1}^{I} \vec{x}_{i}^{T} \cdot \vec{x}_{i}-2 \sum_{i=1}^{I} \vec{z}_{i}^{T} \cdot\left(\bar{W}^{T} \cdot \vec{x}_{i}\right)+\sum_{i=1}^{I} \vec{z}_{i}^{T} \cdot \vec{z}_{i} \\ &=\sum_{i=1}^{I} \vec{x}_{i}^{T} \cdot \vec{x}_{i}-\sum_{i=1}^{I} \vec{z}_{i}^{T} \cdot \vec{z}_{i} \\ &=\sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij}^{2}-\sum_{i=1}^{I} \sum_{j=1}^{J^{\prime}} z_{ij}^{2}=\bar{X}^{T} \cdot \bar{X}-\bar{Z}^{T} \cdot \bar{Z} \end{aligned} \tag{12.9}$$
where $\bar{X}^{T} \cdot \bar{X}$ in the last line denotes a double contraction operation, which "consolidates" the two axes/orders shared by the two arrays involved in the operation. For example, $\bar{X}^{T}$ and $\bar{X}$ are both second-order tensors, or can be viewed as second-order arrays, and their double contraction is a scalar. To continue the derivation, we recall one property of the trace of the product of two matrices: $\operatorname{tr}(\bar{A} \cdot \bar{B})=\sum_{j=1}^{J}(\bar{A} \cdot \bar{B})_{jj}=\sum_{j=1}^{J}\left(\sum_{i=1}^{I} A_{ji} B_{ij}\right)$, which equals the double contraction of $\bar{A}$ and $\bar{B}$. Hence, the equation becomes
$$\begin{aligned} \sum_{i=1}^{I}\left\|\vec{x}_{i}-\hat{\vec{x}}_{i}\right\|_{2}^{2} &=\sum_{i=1}^{I} \vec{x}_{i}^{T} \cdot \vec{x}_{i}-\sum_{i=1}^{I} \vec{z}_{i}^{T} \cdot \vec{z}_{i} \\ &=\bar{X}^{T} \cdot \bar{X}-\bar{Z}^{T} \cdot \bar{Z} \\ &=\operatorname{tr}\left(\bar{X}^{T} \cdot \bar{X}\right)-\operatorname{tr}\left(\bar{Z}^{T} \cdot \bar{Z}\right) \\ &=\operatorname{tr}\left(\bar{X} \cdot \bar{X}^{T}\right)-\operatorname{tr}\left(\bar{Z} \cdot \bar{Z}^{T}\right) \end{aligned} \tag{12.10}$$
The first term, $\sum_{i=1}^{I} \vec{x}_{i}^{T} \cdot \vec{x}_{i}=\operatorname{tr}\left(\bar{X} \cdot \bar{X}^{T}\right)$, is a constant that depends solely on the original data, so it is not affected by the projection. Only the second term depends on the projection. Therefore, the PCA problem can be mathematically formulated as
$$\underset{\bar{W}}{\arg \min }\;-\operatorname{tr}\left(\bar{Z}^{T} \cdot \bar{Z}\right) \quad \text { s.t. } \quad \bar{W}^{T} \cdot \bar{W}=\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)} \tag{12.11}$$
The loss function $-\operatorname{tr}\left(\bar{Z}^{T} \cdot \bar{Z}\right)$ to be minimized can be further reformulated as
$$-\operatorname{tr}\left(\bar{Z}^{T} \cdot \bar{Z}\right)=-\operatorname{tr}\left((\bar{X} \cdot \bar{W})^{T} \cdot(\bar{X} \cdot \bar{W})\right)=-\operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}\right) \tag{12.12}$$
Using the above equation and the method of Lagrange multipliers, we can convert the above constrained optimization problem into an unconstrained one with the following objective function (or loss function in this case):
$$J(\bar{W})=-\operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}+\lambda\left(\bar{W}^{T} \cdot \bar{W}-\bar{I}\right)\right) \tag{12.13}$$
The minimum value of the loss function is attained when the derivative of the above function with respect to $\bar{W}$ is zero: $-\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}+\lambda \bar{W}=0$. Rearranging this equation, we obtain the following solution.
$$\left(\bar{X}^{T} \cdot \bar{X}\right)_{(J \times J)} \cdot \bar{W}_{\left(J \times J^{\prime}\right)}=\lambda \bar{W}_{\left(J \times J^{\prime}\right)} \tag{12.14}$$
The above equation corresponds to a typical eigenvalue problem. That is, $\bar{W}$ is the matrix consisting of the eigenvectors of $\bar{X}^{T} \cdot \bar{X}$, while $\lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\bar{X}^{T} \cdot \bar{X}$ and whose other elements are 0. This provides a way of finding the best projection array in PCA. Once we obtain the projection array from the above equation, we can use $\bar{Z}=\bar{X} \cdot \bar{W}$ to easily obtain the data in the low-dimensional space.
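In practice, Eq. (12.14) is solved with an eigendecomposition of the symmetric matrix $\bar{X}^{T} \cdot \bar{X}$. A minimal sketch, assuming NumPy and zero-centered input (the function name is ours):

```python
import numpy as np

def pca_projection_matrix(X_centered, J_prime):
    """Return the (J x J') projection array W of Eq. (12.14).

    X_centered: zero-centered data of shape (I, J).
    """
    S = X_centered.T @ X_centered           # the (J x J) matrix X^T . X
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: symmetric matrices, eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-order eigenvalues from largest to smallest
    return eigvecs[:, order[:J_prime]]      # columns = eigenvectors of the J' largest eigenvalues
```

Given the returned $\bar{W}$, the reduced data follows directly from Eq. (12.4) as `Z = X_centered @ W`.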

Derivation Based on Maximum Variance

The theoretical basis of PCA can also be established based on the criterion of the maximum variance, leading to the same equations for PCA implementation.
Let us start from the same zero-centered data $\bar{X}$ and project it onto a new low-dimensional space via a projection array $\bar{W}_{\left(J \times J^{\prime}\right)}$. Thus, for any data point $\vec{x}_{i}$, its projection in the new space (or coordinate system $\bar{W}$) is $\bar{W}^{T} \cdot \vec{x}_{i}$. Then the covariance matrix of the projected data $\bar{X} \cdot \bar{W}$ is $(\bar{X} \cdot \bar{W}-\bar{0})^{T} \cdot(\bar{X} \cdot \bar{W}-\bar{0})=\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}=\bar{Z}^{T} \cdot \bar{Z}$, where $\bar{0}$ is the matrix of mean values used in the variance calculation; its elements are 0 because the data is zero-centered (the constant factor $1/I$ is omitted here because it does not affect the optimization). The sum of the variances is the sum of the diagonal elements of this matrix: $\operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}\right)$. Then the PCA problem becomes the following constrained optimization problem.
$$\underset{\bar{W}}{\arg \max }\; \operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}\right) \quad \text { s.t. } \quad \bar{W}^{T} \cdot \bar{W}=\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)} \tag{12.15}$$
The above equation aiming to maximize the variance can be reformulated as a minimization problem:
$$\underset{\bar{W}}{\arg \min }\; -\operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}\right) \quad \text { s.t. } \quad \bar{W}^{T} \cdot \bar{W}=\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)} \tag{12.16}$$
Similarly, we can convert the above constrained optimization problem into an unconstrained one with the following objective function using the method of Lagrange multipliers:
$$J(\bar{W})=-\operatorname{tr}\left(\bar{W}^{T} \cdot\left(\bar{X}^{T} \cdot \bar{X}\right) \cdot \bar{W}+\lambda\left(\bar{W}^{T} \cdot \bar{W}-\bar{I}_{\left(J^{\prime} \times J^{\prime}\right)}\right)\right) \tag{12.17}$$
Then we can get the following solution to the optimization problem.
$$\left(\bar{X}^{T} \cdot \bar{X}\right)_{(J \times J)} \cdot \bar{W}_{\left(J \times J^{\prime}\right)}=\lambda \bar{W}_{\left(J \times J^{\prime}\right)} \tag{12.18}$$
This result is the same as what was derived based on the minimum distance criterion.
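As a quick numerical illustration of this equivalence (using randomly generated data of our own choosing), the retained variance and the reconstruction loss always add up to the constant $\operatorname{tr}(\bar{X}^{T} \cdot \bar{X})$, so maximizing one is the same as minimizing the other:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                          # zero-center the data

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]                       # keep J' = 2 principal directions

Z = X @ W
retained_variance = np.trace(Z.T @ Z)           # the quantity maximized in Eq. (12.15)
reconstruction_loss = np.sum((X - Z @ W.T)**2)  # the quantity minimized in Eq. (12.8)

print(np.isclose(retained_variance, eigvals[order[:2]].sum()))   # True
print(np.isclose(retained_variance + reconstruction_loss,
                 np.trace(X.T @ X)))                             # True
```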

12.4.3. Implementation

From the above introduction to PCA's theoretical basis, we can see that the major step in implementing PCA is to find the eigenvalues and eigenvectors of the covariance matrix $\bar{X}^{T} \cdot \bar{X}$. To reduce the dimensionality, we keep the largest eigenvalues and their corresponding eigenvectors. Then, we use the projection array formed by the selected eigenvectors to project the data from the high-dimensional space to the low-dimensional one. The detailed implementation procedure is outlined in the following pseudo-code.

PCA:

Input: a set of data with $I$ samples and $J$ attributes: $\bar{X}_{(I \times J)}=\left[\vec{x}_{1}^{T}, \vec{x}_{2}^{T}, \cdots, \vec{x}_{I}^{T}\right]^{T}$.
Output: a set of data with $I$ samples and reduced dimensionality ($J^{\prime}$ attributes): $\bar{Z}_{\left(I \times J^{\prime}\right)}$.
Perform zero-centering for all the data samples: $\vec{x}_{i} \leftarrow \vec{x}_{i}-\frac{1}{I} \sum_{i=1}^{I} \vec{x}_{i}$
Calculate the covariance matrix $\left(\bar{X}^{T} \cdot \bar{X}\right)_{(J \times J)}$
Obtain the eigenvalues and eigenvectors of $\bar{X}^{T} \cdot \bar{X}$
Select the $J^{\prime}$ largest eigenvalues and their corresponding eigenvectors. Form the projection array $\bar{W}$ with the selected eigenvectors, normalizing them to unit length first if needed.
Project the data to the lower-dimensional space to reduce its dimensionality: $\bar{Z}=\bar{X} \cdot \bar{W}$ or $\vec{z}_{i}=\bar{W}^{T} \cdot \vec{x}_{i}$
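A compact NumPy realization of this pseudo-code might look like the sketch below; the function and variable names are ours, and the code follows the listed steps rather than any particular library implementation:

```python
import numpy as np

def pca(X, J_prime):
    """Reduce an (I, J) data matrix X to (I, J_prime) following the pseudo-code above."""
    # Zero-center the data, Eq. (12.2).
    X_centered = X - X.mean(axis=0)

    # Covariance matrix X^T . X of shape (J, J).
    S = X_centered.T @ X_centered

    # Eigenvalues and eigenvectors; eigh returns unit-length eigenvectors, eigenvalues ascending.
    eigvals, eigvecs = np.linalg.eigh(S)

    # Keep the J_prime eigenvectors with the largest eigenvalues to form W.
    order = np.argsort(eigvals)[::-1][:J_prime]
    W = eigvecs[:, order]

    # Project the data: Z = X . W.
    return X_centered @ W, W

# Illustrative usage on random data.
X = np.random.default_rng(1).normal(size=(100, 6))
Z, W = pca(X, J_prime=2)
print(Z.shape)    # (100, 2)
```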
Sometimes, we do not specify a $J^{\prime}$ value directly. Instead, we specify a threshold value $t$, defined as follows (with the eigenvalues $\lambda_{j}$ sorted in descending order), and choose the smallest $J^{\prime}$ that satisfies it. This offers another way to control how aggressively the dimensionality is reduced.
$$\frac{\sum_{j=1}^{J^{\prime}} \lambda_{j}}{\sum_{j=1}^{J} \lambda_{j}} \geqslant t \tag{12.19}$$
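A possible sketch for choosing $J^{\prime}$ from the threshold $t$ (eigenvalues sorted in descending order; the function name is our own):

```python
import numpy as np

def choose_J_prime(eigvals, t=0.95):
    """Smallest J' whose top-J' eigenvalues carry at least a fraction t of the total, Eq. (12.19)."""
    vals = np.sort(np.asarray(eigvals))[::-1]   # eigenvalues in descending order
    ratio = np.cumsum(vals) / np.sum(vals)      # left-hand side of Eq. (12.19) for J' = 1, 2, ...
    return int(np.searchsorted(ratio, t) + 1)   # first J' whose ratio reaches t

# With the two (rounded) eigenvalues from the example below and t = 0.9, J' = 1 is enough.
print(choose_J_prime([11.556, 0.442], t=0.9))   # -> 1
```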
The following is a simple worked example of PCA.
We have a dataset consisting of ten 2D data points: $(2.5,2.4)$, $(0.5,0.7)$, $(2.2,2.9)$, $(1.9,2.2)$, $(3.1,3.0)$, $(2.3,2.7)$, $(2,1.6)$, $(1,1.1)$, $(1.5,1.6)$, $(1.1,0.9)$. The goal is to use PCA to reduce the dimensionality of the data to 1.
First, we carry out zero-centering by subtracting the mean of the two features, i.e., $(1.81,1.91)$, from all the data points. The zero-centered data is $(0.69,0.49)$, $(-1.31,-1.21)$, $(0.39,0.99)$, $(0.09,0.29)$, $(1.29,1.09)$, $(0.49,0.79)$, $(0.19,-0.31)$, $(-0.81,-0.81)$, $(-0.31,-0.31)$, $(-0.71,-1.01)$.
Next, we calculate the covariance matrix, which is as follows:
$$\bar{X}^{T} \cdot \bar{X}=\left[\begin{array}{ll} 5.549 & 5.539 \\ 5.539 & 6.449 \end{array}\right] \tag{12.20}$$
The eigenvalues of this matrix are 11.55624941 and 0.44175059, with corresponding unit eigenvectors $[-0.6778734,-0.73517866]^{T}$ and $[-0.73517866,0.6778734]^{T}$, respectively. Since the first eigenvalue is much larger, we select its eigenvector, $[-0.6778734,-0.73517866]^{T}$, to form $\bar{W}$. Then we apply $\bar{Z}=\bar{X} \cdot \bar{W}$ and obtain the following ten data points in 1D: $[-0.827970186, 1.77758033, -0.992197494, -0.274210416, -1.67580142, -0.912949103, 0.0991094375, 1.14457216, 0.438046137, 1.22382056]$.
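This worked example can be reproduced with a few lines of NumPy; the eigenvector sign returned by the solver may be flipped, which only flips the signs of the projected coordinates:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

X_centered = X - X.mean(axis=0)            # subtract the mean (1.81, 1.91)
S = X_centered.T @ X_centered              # the matrix of Eq. (12.20)
eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order

W = eigvecs[:, [np.argmax(eigvals)]]       # eigenvector of the largest eigenvalue, shape (2, 1)
Z = X_centered @ W                         # the ten projected 1D coordinates

print(eigvals)                             # [ 0.4417...  11.5562... ]
print(Z.ravel())
```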