Understanding the formula $P_A = A(A^TA)^{-1}A^T$ for projection
up vote
2
down vote
favorite
In this section: https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Formulas of projections, it presents some formulas for the projection onto subspaces, I believe. I only know how to do the projection of a vector into a line.
If u is a unit vector on the line, then the projection is given by the
outer product
$$P_u = uu^T$$
The formula I knew was $$Proj_a(v)=frac{acdot v}{||v||^2}v$$
How is this related to $P_u$ and how to arrive at the formulas
$$P_A = A(A^TA)^{-1}A^T$$ and $$P_A = A(A^TDA)^{-1}A^TD$$?
linear-algebra analytic-geometry
add a comment |
up vote
2
down vote
favorite
In this section: https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Formulas of projections, it presents some formulas for the projection onto subspaces, I believe. I only know how to do the projection of a vector into a line.
If u is a unit vector on the line, then the projection is given by the
outer product
$$P_u = uu^T$$
The formula I knew was $$Proj_a(v)=frac{acdot v}{||v||^2}v$$
How is this related to $P_u$ and how to arrive at the formulas
$$P_A = A(A^TA)^{-1}A^T$$ and $$P_A = A(A^TDA)^{-1}A^TD$$?
linear-algebra analytic-geometry
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
This is a good explanation.
– Almacomet
Oct 19 at 3:18
1
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
In this section: https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Formulas of projections, it presents some formulas for the projection onto subspaces, I believe. I only know how to do the projection of a vector into a line.
If u is a unit vector on the line, then the projection is given by the
outer product
$$P_u = uu^T$$
The formula I knew was $$Proj_a(v)=frac{acdot v}{||v||^2}v$$
How is this related to $P_u$ and how to arrive at the formulas
$$P_A = A(A^TA)^{-1}A^T$$ and $$P_A = A(A^TDA)^{-1}A^TD$$?
linear-algebra analytic-geometry
In this section: https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Formulas of projections, it presents some formulas for the projection onto subspaces, I believe. I only know how to do the projection of a vector into a line.
If u is a unit vector on the line, then the projection is given by the
outer product
$$P_u = uu^T$$
The formula I knew was $$Proj_a(v)=frac{acdot v}{||v||^2}v$$
How is this related to $P_u$ and how to arrive at the formulas
$$P_A = A(A^TA)^{-1}A^T$$ and $$P_A = A(A^TDA)^{-1}A^TD$$?
linear-algebra analytic-geometry
linear-algebra analytic-geometry
asked Oct 19 at 2:03
Paprika
56312
56312
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
This is a good explanation.
– Almacomet
Oct 19 at 3:18
1
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38
add a comment |
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
This is a good explanation.
– Almacomet
Oct 19 at 3:18
1
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
This is a good explanation.
– Almacomet
Oct 19 at 3:18
This is a good explanation.
– Almacomet
Oct 19 at 3:18
1
1
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38
add a comment |
2 Answers
2
active
oldest
votes
up vote
3
down vote
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $mathbf v$ onto a unit vector $mathbf u$ is $(mathbf vcdotmathbf u)mathbf u$. By treating a $1times1$ matrix as a scalar, this can be written as the matrix product $(mathbf u^T mathbf v)mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(mathbf u mathbf u^T)mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $mathbf umathbf u^T$ is a multiple of $mathbf u$, so it should be clear that the image of this map is spanned by $mathbf u$.
Now suppose that we have an orthonormal basis ${mathbf u_1,dots,mathbf u_m}$ for some subspace. The orthogonal projection of $mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$sum_{k=1}^m (mathbf u_k^T mathbf v)mathbf u_k = sum_{k=1}^m mathbf u_k (mathbf u_k^T mathbf v) = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T mathbf v \ vdots \ mathbf u_m^T mathbf vend{bmatrix} = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T \ vdots \ mathbf u_m^Tend{bmatrix} mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $mathbf v$ is the dot product $mathbf u_k^Tmathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $mathbf w$, we can apply the first formula by the simple expedient of normalizing $mathbf w$: $$left(mathbf vcdot {mathbf w over |mathbf w|}right) {mathbf w over |mathbf w|} = {mathbf vcdotmathbf w over mathbf wcdotmathbf w}mathbf w = left({mathbf w mathbf w^T over mathbf w^Tmathbf w}right) mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $mathbf w$’s norm. Similarly, if we have an orthogonal basis ${mathbf w_1,dots,mathbf w_m}$ for a subspace, we can add up the individual projections: $$sum_{k=1}^m mathbf w_k {1overmathbf w_k^Tmathbf w_k}(mathbf w_k^Tmathbf v) = begin{bmatrix}mathbf w_1&cdots&mathbf w_mend{bmatrix} begin{bmatrix} frac1{mathbf w_1^Tmathbf w_1} &&0 \ &ddots& \ 0&& frac1{mathbf w_m^Tmathbf w_m} end{bmatrix} begin{bmatrix}mathbf w_1^T\vdots\mathbf w_m^Tend{bmatrix} mathbf v = W(W^TW)^{-1}W^Tmathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^Tmathbf v$ computes the $W$-coordinates of the projection of $mathbf v$ and multiplying by $W$ converts them to the standard basis.
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $langlemathbf v,mathbf wrangle = mathbf w^TAmathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
add a comment |
up vote
1
down vote
If $mathbf W$ is a finite-dimensional subspace of an inner product space $mathbf V$, and if $mathbf b$ is a vector in $mathbf V$, then $Proj_W(mathbf b)$ is the best approximation to $mathbf b$ from $mathbf W$ in the sense that $$leftlVert b-Proj_b(W)rightrVert lt leftlVert b-wrightrVert $$
for every vector $mathbf w$ in $mathbf W$ that is different from $Proj_W(mathbf b)$
Hence, one way to find the least squares solution of inconsistent linear system $Amathbf x =mathbf b$ is to calculate the orthogonal projection $Proj_W(b)$ on the column space W of A and solving the equation $$ Amathbf x = Proj_W(mathbf b)$$
$$mathbf b-Amathbf x=mathbf b-Proj_W(mathbf b)$$ and then multiplying both sides of above equation with $mathbf A^mathbf T$ $$A^T(mathbf b-Amathbf x)=A^T(mathbf b-Proj_W(mathbf b))$$ Since $mathbf b-Proj_W(mathbf b)$ is the complement of b that is orthogonal to the column space of A implies $mathbf b-Proj_W(mathbf b)$ is in null space of $A^T$(because null space of $A^T$ and column space of A are orthogonal complements to each other) hence $$A^T(mathbf b-Proj_W(mathbf b))=mathbf 0$$ Therefore, $$A^T(mathbf b-Amathbf x)= mathbf 0$$ which we can rewrite as $$ A^TAmathbf x=A^Tmathbf b$$ $$mathbf x=(A^TA)^{-1}A^Tmathbf b$$ Therefore, $$Proj_W(mathbf b)=Amathbf x=A(A^TA)^{-1}A^Tmathbf b$$
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $mathbf v$ onto a unit vector $mathbf u$ is $(mathbf vcdotmathbf u)mathbf u$. By treating a $1times1$ matrix as a scalar, this can be written as the matrix product $(mathbf u^T mathbf v)mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(mathbf u mathbf u^T)mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $mathbf umathbf u^T$ is a multiple of $mathbf u$, so it should be clear that the image of this map is spanned by $mathbf u$.
Now suppose that we have an orthonormal basis ${mathbf u_1,dots,mathbf u_m}$ for some subspace. The orthogonal projection of $mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$sum_{k=1}^m (mathbf u_k^T mathbf v)mathbf u_k = sum_{k=1}^m mathbf u_k (mathbf u_k^T mathbf v) = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T mathbf v \ vdots \ mathbf u_m^T mathbf vend{bmatrix} = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T \ vdots \ mathbf u_m^Tend{bmatrix} mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $mathbf v$ is the dot product $mathbf u_k^Tmathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $mathbf w$, we can apply the first formula by the simple expedient of normalizing $mathbf w$: $$left(mathbf vcdot {mathbf w over |mathbf w|}right) {mathbf w over |mathbf w|} = {mathbf vcdotmathbf w over mathbf wcdotmathbf w}mathbf w = left({mathbf w mathbf w^T over mathbf w^Tmathbf w}right) mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $mathbf w$’s norm. Similarly, if we have an orthogonal basis ${mathbf w_1,dots,mathbf w_m}$ for a subspace, we can add up the individual projections: $$sum_{k=1}^m mathbf w_k {1overmathbf w_k^Tmathbf w_k}(mathbf w_k^Tmathbf v) = begin{bmatrix}mathbf w_1&cdots&mathbf w_mend{bmatrix} begin{bmatrix} frac1{mathbf w_1^Tmathbf w_1} &&0 \ &ddots& \ 0&& frac1{mathbf w_m^Tmathbf w_m} end{bmatrix} begin{bmatrix}mathbf w_1^T\vdots\mathbf w_m^Tend{bmatrix} mathbf v = W(W^TW)^{-1}W^Tmathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^Tmathbf v$ computes the $W$-coordinates of the projection of $mathbf v$ and multiplying by $W$ converts them to the standard basis.
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $langlemathbf v,mathbf wrangle = mathbf w^TAmathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
add a comment |
up vote
3
down vote
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $mathbf v$ onto a unit vector $mathbf u$ is $(mathbf vcdotmathbf u)mathbf u$. By treating a $1times1$ matrix as a scalar, this can be written as the matrix product $(mathbf u^T mathbf v)mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(mathbf u mathbf u^T)mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $mathbf umathbf u^T$ is a multiple of $mathbf u$, so it should be clear that the image of this map is spanned by $mathbf u$.
Now suppose that we have an orthonormal basis ${mathbf u_1,dots,mathbf u_m}$ for some subspace. The orthogonal projection of $mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$sum_{k=1}^m (mathbf u_k^T mathbf v)mathbf u_k = sum_{k=1}^m mathbf u_k (mathbf u_k^T mathbf v) = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T mathbf v \ vdots \ mathbf u_m^T mathbf vend{bmatrix} = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T \ vdots \ mathbf u_m^Tend{bmatrix} mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $mathbf v$ is the dot product $mathbf u_k^Tmathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $mathbf w$, we can apply the first formula by the simple expedient of normalizing $mathbf w$: $$left(mathbf vcdot {mathbf w over |mathbf w|}right) {mathbf w over |mathbf w|} = {mathbf vcdotmathbf w over mathbf wcdotmathbf w}mathbf w = left({mathbf w mathbf w^T over mathbf w^Tmathbf w}right) mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $mathbf w$’s norm. Similarly, if we have an orthogonal basis ${mathbf w_1,dots,mathbf w_m}$ for a subspace, we can add up the individual projections: $$sum_{k=1}^m mathbf w_k {1overmathbf w_k^Tmathbf w_k}(mathbf w_k^Tmathbf v) = begin{bmatrix}mathbf w_1&cdots&mathbf w_mend{bmatrix} begin{bmatrix} frac1{mathbf w_1^Tmathbf w_1} &&0 \ &ddots& \ 0&& frac1{mathbf w_m^Tmathbf w_m} end{bmatrix} begin{bmatrix}mathbf w_1^T\vdots\mathbf w_m^Tend{bmatrix} mathbf v = W(W^TW)^{-1}W^Tmathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^Tmathbf v$ computes the $W$-coordinates of the projection of $mathbf v$ and multiplying by $W$ converts them to the standard basis.
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $langlemathbf v,mathbf wrangle = mathbf w^TAmathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
add a comment |
up vote
3
down vote
up vote
3
down vote
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $mathbf v$ onto a unit vector $mathbf u$ is $(mathbf vcdotmathbf u)mathbf u$. By treating a $1times1$ matrix as a scalar, this can be written as the matrix product $(mathbf u^T mathbf v)mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(mathbf u mathbf u^T)mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $mathbf umathbf u^T$ is a multiple of $mathbf u$, so it should be clear that the image of this map is spanned by $mathbf u$.
Now suppose that we have an orthonormal basis ${mathbf u_1,dots,mathbf u_m}$ for some subspace. The orthogonal projection of $mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$sum_{k=1}^m (mathbf u_k^T mathbf v)mathbf u_k = sum_{k=1}^m mathbf u_k (mathbf u_k^T mathbf v) = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T mathbf v \ vdots \ mathbf u_m^T mathbf vend{bmatrix} = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T \ vdots \ mathbf u_m^Tend{bmatrix} mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $mathbf v$ is the dot product $mathbf u_k^Tmathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $mathbf w$, we can apply the first formula by the simple expedient of normalizing $mathbf w$: $$left(mathbf vcdot {mathbf w over |mathbf w|}right) {mathbf w over |mathbf w|} = {mathbf vcdotmathbf w over mathbf wcdotmathbf w}mathbf w = left({mathbf w mathbf w^T over mathbf w^Tmathbf w}right) mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $mathbf w$’s norm. Similarly, if we have an orthogonal basis ${mathbf w_1,dots,mathbf w_m}$ for a subspace, we can add up the individual projections: $$sum_{k=1}^m mathbf w_k {1overmathbf w_k^Tmathbf w_k}(mathbf w_k^Tmathbf v) = begin{bmatrix}mathbf w_1&cdots&mathbf w_mend{bmatrix} begin{bmatrix} frac1{mathbf w_1^Tmathbf w_1} &&0 \ &ddots& \ 0&& frac1{mathbf w_m^Tmathbf w_m} end{bmatrix} begin{bmatrix}mathbf w_1^T\vdots\mathbf w_m^Tend{bmatrix} mathbf v = W(W^TW)^{-1}W^Tmathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^Tmathbf v$ computes the $W$-coordinates of the projection of $mathbf v$ and multiplying by $W$ converts them to the standard basis.
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $langlemathbf v,mathbf wrangle = mathbf w^TAmathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
The formal proof cited by Omnomnomnom in his comment uses the idea that the orthogonal projection of a vector onto a subspace is the element of that subspace that is nearest the vector. However, you can also view these matrix formulas as successive generalizations of the simpler vector formulas that you know.
Let’s back up a bit first. The projection of $mathbf v$ onto a unit vector $mathbf u$ is $(mathbf vcdotmathbf u)mathbf u$. By treating a $1times1$ matrix as a scalar, this can be written as the matrix product $(mathbf u^T mathbf v)mathbf u$. Strictly speaking, usually only left-multiplication of vectors by scalars is defined for a vector space, but we blithely ignore this and move scalars around at will, so we can further rearrange the previous expression as $(mathbf u mathbf u^T)mathbf v$, giving the first of your projection matrix formulas. Every column of the matrix $mathbf umathbf u^T$ is a multiple of $mathbf u$, so it should be clear that the image of this map is spanned by $mathbf u$.
Now suppose that we have an orthonormal basis ${mathbf u_1,dots,mathbf u_m}$ for some subspace. The orthogonal projection of $mathbf v$ onto this space is simply the sum of the individual projections onto the basis vectors: $$sum_{k=1}^m (mathbf u_k^T mathbf v)mathbf u_k = sum_{k=1}^m mathbf u_k (mathbf u_k^T mathbf v) = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T mathbf v \ vdots \ mathbf u_m^T mathbf vend{bmatrix} = begin{bmatrix}mathbf u_1 cdots mathbf u_mend{bmatrix} begin{bmatrix}mathbf u_1^T \ vdots \ mathbf u_m^Tend{bmatrix} mathbf v.$$ Calling the matrix that has these basis vectors for its columns $U$, this says that the orthogonal projection matrix is $UU^T$.
Another way to interpret the first sum above is that, if we extend this basis to a complete basis of the ambient space, the $k$th coordinate of $mathbf v$ is the dot product $mathbf u_k^Tmathbf v$. In this coordinate system, the projection onto the subspace is then obtained by setting all of the coordinates after the $m$th to zero. Multiplying by $U^T$ computes these dot products in bulk, producing the vector of coordinates of $mathbf v$’s projection relative to this basis, and multiplying by $U$ converts these to the standard basis.
Going back to projection onto a vector, if we want to project onto an arbitrary nonzero vector $mathbf w$, we can apply the first formula by the simple expedient of normalizing $mathbf w$: $$left(mathbf vcdot {mathbf w over |mathbf w|}right) {mathbf w over |mathbf w|} = {mathbf vcdotmathbf w over mathbf wcdotmathbf w}mathbf w = left({mathbf w mathbf w^T over mathbf w^Tmathbf w}right) mathbf v.$$ The numerator of the parenthesized expression is a square matrix as before, while the denominator is a scalar—the square of $mathbf w$’s norm. Similarly, if we have an orthogonal basis ${mathbf w_1,dots,mathbf w_m}$ for a subspace, we can add up the individual projections: $$sum_{k=1}^m mathbf w_k {1overmathbf w_k^Tmathbf w_k}(mathbf w_k^Tmathbf v) = begin{bmatrix}mathbf w_1&cdots&mathbf w_mend{bmatrix} begin{bmatrix} frac1{mathbf w_1^Tmathbf w_1} &&0 \ &ddots& \ 0&& frac1{mathbf w_m^Tmathbf w_m} end{bmatrix} begin{bmatrix}mathbf w_1^T\vdots\mathbf w_m^Tend{bmatrix} mathbf v = W(W^TW)^{-1}W^Tmathbf v.$$ The new factor $(W^TW)^{-1}$ is a diagonal matrix that serves to normalize the basis vectors. (Its off-diagonal elements are all zero since the basis is orthogonal.) Observe that the shape of this formula is essentially the same as that for the projection onto a single vector. Also, we can interpret the parts of the formula the same way as we did for the orthonormal basis: $(W^TW)^{-1}W^Tmathbf v$ computes the $W$-coordinates of the projection of $mathbf v$ and multiplying by $W$ converts them to the standard basis.
Continuing on to an arbitrary basis for the subspace, we can no longer simply take dot products, suitably normalized, with the basis vectors to find the $W$-coordinates of the projection of $mathbf v$. The coordinate directions are no longer independent in the sense that moving in the direction of one of the basis vectors can change the components in other basis vector directions, too, so we need a way to undo all of the “cross-talk” among them. The Gram matrix $W^TW$ consists of all of the pairwise dot products of these basis vectors, and it turns out that multiplying by its inverse untangles the interactions among them: the formula for the orthogonal projection matrix in the previous paragraph also works for any basis of the subspace. (Almacomet links to a nice blog post about this in his comment.) In fact, the projection matrix that results from this formula is independent of the choice of basis for the subspace.
Finally, what if we’re using something other than the standard Euclidean inner product, or equivalently, working with a non-orthonormal basis for the ambient space? In that case, there’s some fixed positive-definite matrix $A$ such that $langlemathbf v,mathbf wrangle = mathbf w^TAmathbf v$. If you incorporate this into the above derivations, you’ll find that the orthogonal projection of $mathbf v$ relative to this inner product is $$W(W^TAW)^{-1}W^TA mathbf v:$$ as one might have expected, the matrix $A$ appears in the formula in places that correspond to dot products.
edited Nov 24 at 17:02
answered Nov 24 at 8:34
amd
28.8k21049
28.8k21049
add a comment |
add a comment |
up vote
1
down vote
If $mathbf W$ is a finite-dimensional subspace of an inner product space $mathbf V$, and if $mathbf b$ is a vector in $mathbf V$, then $Proj_W(mathbf b)$ is the best approximation to $mathbf b$ from $mathbf W$ in the sense that $$leftlVert b-Proj_b(W)rightrVert lt leftlVert b-wrightrVert $$
for every vector $mathbf w$ in $mathbf W$ that is different from $Proj_W(mathbf b)$
Hence, one way to find the least squares solution of inconsistent linear system $Amathbf x =mathbf b$ is to calculate the orthogonal projection $Proj_W(b)$ on the column space W of A and solving the equation $$ Amathbf x = Proj_W(mathbf b)$$
$$mathbf b-Amathbf x=mathbf b-Proj_W(mathbf b)$$ and then multiplying both sides of above equation with $mathbf A^mathbf T$ $$A^T(mathbf b-Amathbf x)=A^T(mathbf b-Proj_W(mathbf b))$$ Since $mathbf b-Proj_W(mathbf b)$ is the complement of b that is orthogonal to the column space of A implies $mathbf b-Proj_W(mathbf b)$ is in null space of $A^T$(because null space of $A^T$ and column space of A are orthogonal complements to each other) hence $$A^T(mathbf b-Proj_W(mathbf b))=mathbf 0$$ Therefore, $$A^T(mathbf b-Amathbf x)= mathbf 0$$ which we can rewrite as $$ A^TAmathbf x=A^Tmathbf b$$ $$mathbf x=(A^TA)^{-1}A^Tmathbf b$$ Therefore, $$Proj_W(mathbf b)=Amathbf x=A(A^TA)^{-1}A^Tmathbf b$$
add a comment |
up vote
1
down vote
If $mathbf W$ is a finite-dimensional subspace of an inner product space $mathbf V$, and if $mathbf b$ is a vector in $mathbf V$, then $Proj_W(mathbf b)$ is the best approximation to $mathbf b$ from $mathbf W$ in the sense that $$leftlVert b-Proj_b(W)rightrVert lt leftlVert b-wrightrVert $$
for every vector $mathbf w$ in $mathbf W$ that is different from $Proj_W(mathbf b)$
Hence, one way to find the least squares solution of inconsistent linear system $Amathbf x =mathbf b$ is to calculate the orthogonal projection $Proj_W(b)$ on the column space W of A and solving the equation $$ Amathbf x = Proj_W(mathbf b)$$
$$mathbf b-Amathbf x=mathbf b-Proj_W(mathbf b)$$ and then multiplying both sides of above equation with $mathbf A^mathbf T$ $$A^T(mathbf b-Amathbf x)=A^T(mathbf b-Proj_W(mathbf b))$$ Since $mathbf b-Proj_W(mathbf b)$ is the complement of b that is orthogonal to the column space of A implies $mathbf b-Proj_W(mathbf b)$ is in null space of $A^T$(because null space of $A^T$ and column space of A are orthogonal complements to each other) hence $$A^T(mathbf b-Proj_W(mathbf b))=mathbf 0$$ Therefore, $$A^T(mathbf b-Amathbf x)= mathbf 0$$ which we can rewrite as $$ A^TAmathbf x=A^Tmathbf b$$ $$mathbf x=(A^TA)^{-1}A^Tmathbf b$$ Therefore, $$Proj_W(mathbf b)=Amathbf x=A(A^TA)^{-1}A^Tmathbf b$$
add a comment |
up vote
1
down vote
up vote
1
down vote
If $mathbf W$ is a finite-dimensional subspace of an inner product space $mathbf V$, and if $mathbf b$ is a vector in $mathbf V$, then $Proj_W(mathbf b)$ is the best approximation to $mathbf b$ from $mathbf W$ in the sense that $$leftlVert b-Proj_b(W)rightrVert lt leftlVert b-wrightrVert $$
for every vector $mathbf w$ in $mathbf W$ that is different from $Proj_W(mathbf b)$
Hence, one way to find the least squares solution of inconsistent linear system $Amathbf x =mathbf b$ is to calculate the orthogonal projection $Proj_W(b)$ on the column space W of A and solving the equation $$ Amathbf x = Proj_W(mathbf b)$$
$$mathbf b-Amathbf x=mathbf b-Proj_W(mathbf b)$$ and then multiplying both sides of above equation with $mathbf A^mathbf T$ $$A^T(mathbf b-Amathbf x)=A^T(mathbf b-Proj_W(mathbf b))$$ Since $mathbf b-Proj_W(mathbf b)$ is the complement of b that is orthogonal to the column space of A implies $mathbf b-Proj_W(mathbf b)$ is in null space of $A^T$(because null space of $A^T$ and column space of A are orthogonal complements to each other) hence $$A^T(mathbf b-Proj_W(mathbf b))=mathbf 0$$ Therefore, $$A^T(mathbf b-Amathbf x)= mathbf 0$$ which we can rewrite as $$ A^TAmathbf x=A^Tmathbf b$$ $$mathbf x=(A^TA)^{-1}A^Tmathbf b$$ Therefore, $$Proj_W(mathbf b)=Amathbf x=A(A^TA)^{-1}A^Tmathbf b$$
If $mathbf W$ is a finite-dimensional subspace of an inner product space $mathbf V$, and if $mathbf b$ is a vector in $mathbf V$, then $Proj_W(mathbf b)$ is the best approximation to $mathbf b$ from $mathbf W$ in the sense that $$leftlVert b-Proj_b(W)rightrVert lt leftlVert b-wrightrVert $$
for every vector $mathbf w$ in $mathbf W$ that is different from $Proj_W(mathbf b)$
Hence, one way to find the least squares solution of inconsistent linear system $Amathbf x =mathbf b$ is to calculate the orthogonal projection $Proj_W(b)$ on the column space W of A and solving the equation $$ Amathbf x = Proj_W(mathbf b)$$
$$mathbf b-Amathbf x=mathbf b-Proj_W(mathbf b)$$ and then multiplying both sides of above equation with $mathbf A^mathbf T$ $$A^T(mathbf b-Amathbf x)=A^T(mathbf b-Proj_W(mathbf b))$$ Since $mathbf b-Proj_W(mathbf b)$ is the complement of b that is orthogonal to the column space of A implies $mathbf b-Proj_W(mathbf b)$ is in null space of $A^T$(because null space of $A^T$ and column space of A are orthogonal complements to each other) hence $$A^T(mathbf b-Proj_W(mathbf b))=mathbf 0$$ Therefore, $$A^T(mathbf b-Amathbf x)= mathbf 0$$ which we can rewrite as $$ A^TAmathbf x=A^Tmathbf b$$ $$mathbf x=(A^TA)^{-1}A^Tmathbf b$$ Therefore, $$Proj_W(mathbf b)=Amathbf x=A(A^TA)^{-1}A^Tmathbf b$$
answered Nov 24 at 10:07
Sai Satwik Kuppili
268
268
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2961480%2funderstanding-the-formula-p-a-aata-1at-for-projection%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
One nice proof is given in these notes. Note that your formula for $P_A$ is equivalent to saying that the least-squares solution $hat x$ satisfies $$ A^TA hat x = A^Tb $$ In particular, given the least squares solution $hat x = (A^TA)^{-1}A^Tb$, we can compute the projection onto the column space of $A$ of $b$ as $$ P_Ab = A hat x = A (A^TA)^{-1}A^Tb $$
– Omnomnomnom
Oct 19 at 2:28
This is a good explanation.
– Almacomet
Oct 19 at 3:18
1
Whenever there are transposes in matrix equations, you should try to translate them back to dot products to get the geometric interpretation. For example, when you do this for $P_u$, you find that $P_u(v) = u u^T v = u (u cdot v)$, which is a projection operator. (There is no dividing by the norm, since $u$ is already a unit vector). Can you do something similar for $A A^T$, where $A$ is a matrix made up of orthonormal columns?
– Joppy
Oct 19 at 3:38