Why Does SVD Provide the Least Squares and Least Norm Solution to $ A x = b $?











up vote
15
down vote

favorite
13












I am studying the Singular Value Decomposition and its properties. It is widely used in order to solve equations of the form $Ax=b$. I have seen the following: When we have the equation system $Ax=b$, we calculate the SVD of A as $A=USigma V^T$. Then we calculate $x'= V Sigma^{+}U^Tb$. $Sigma^{+}$ has the reciprocals ($dfrac{1}{sigma_i}$) of the singular values in its diagonal and zeros where $sigma_i=0$. If the $b$ is in the range of $A$ then it is the solution which has the minimum norm (closest to origin). If it is not in the range, then it is the least squares solution.



I fail to see how exactly this procedure always produce a $x'$ which is closest to origin if $b$ is in the range of A. (I can see the least squares solution is an extension of this "closest to origin" property). From a geometric intuitive way if possible, how can we show this property of SVD?










share|cite|improve this question
























  • Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
    – Royi
    Jun 16 at 18:11















up vote
15
down vote

favorite
13












I am studying the Singular Value Decomposition and its properties. It is widely used in order to solve equations of the form $Ax=b$. I have seen the following: When we have the equation system $Ax=b$, we calculate the SVD of A as $A=USigma V^T$. Then we calculate $x'= V Sigma^{+}U^Tb$. $Sigma^{+}$ has the reciprocals ($dfrac{1}{sigma_i}$) of the singular values in its diagonal and zeros where $sigma_i=0$. If the $b$ is in the range of $A$ then it is the solution which has the minimum norm (closest to origin). If it is not in the range, then it is the least squares solution.



I fail to see how exactly this procedure always produce a $x'$ which is closest to origin if $b$ is in the range of A. (I can see the least squares solution is an extension of this "closest to origin" property). From a geometric intuitive way if possible, how can we show this property of SVD?










share|cite|improve this question
























  • Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
    – Royi
    Jun 16 at 18:11













up vote
15
down vote

favorite
13









up vote
15
down vote

favorite
13






13





I am studying the Singular Value Decomposition and its properties. It is widely used in order to solve equations of the form $Ax=b$. I have seen the following: When we have the equation system $Ax=b$, we calculate the SVD of A as $A=USigma V^T$. Then we calculate $x'= V Sigma^{+}U^Tb$. $Sigma^{+}$ has the reciprocals ($dfrac{1}{sigma_i}$) of the singular values in its diagonal and zeros where $sigma_i=0$. If the $b$ is in the range of $A$ then it is the solution which has the minimum norm (closest to origin). If it is not in the range, then it is the least squares solution.



I fail to see how exactly this procedure always produce a $x'$ which is closest to origin if $b$ is in the range of A. (I can see the least squares solution is an extension of this "closest to origin" property). From a geometric intuitive way if possible, how can we show this property of SVD?










share|cite|improve this question















I am studying the Singular Value Decomposition and its properties. It is widely used in order to solve equations of the form $Ax=b$. I have seen the following: When we have the equation system $Ax=b$, we calculate the SVD of A as $A=USigma V^T$. Then we calculate $x'= V Sigma^{+}U^Tb$. $Sigma^{+}$ has the reciprocals ($dfrac{1}{sigma_i}$) of the singular values in its diagonal and zeros where $sigma_i=0$. If the $b$ is in the range of $A$ then it is the solution which has the minimum norm (closest to origin). If it is not in the range, then it is the least squares solution.



I fail to see how exactly this procedure always produce a $x'$ which is closest to origin if $b$ is in the range of A. (I can see the least squares solution is an extension of this "closest to origin" property). From a geometric intuitive way if possible, how can we show this property of SVD?







linear-algebra optimization svd least-squares






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Aug 21 '17 at 21:05









Royi

3,27512251




3,27512251










asked Oct 15 '14 at 0:19









Ufuk Can Bicici

1,16911027




1,16911027












  • Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
    – Royi
    Jun 16 at 18:11


















  • Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
    – Royi
    Jun 16 at 18:11
















Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
– Royi
Jun 16 at 18:11




Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
– Royi
Jun 16 at 18:11










4 Answers
4






active

oldest

votes

















up vote
10
down vote



accepted










First, consider the problem $Sigma x = b$, where
$$
Sigma = pmatrix{sigma_1\& ddots\&&sigma_r\ &&&0\&&&&ddots\&&&&&0}
$$
Note that $b$ is only in the range of $Sigma$ if its entries $b_{r+1},dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
$$
x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T = Sigma^+ b
$$
From there, note that
$$
USigma V^T x = b implies\
Sigma (V^T x ) = U^T b
$$
By the above argument, the least squares solution for $(V^T x)$ is given by
$V^T x = Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V Sigma ^+ U^T)b$ must be the least squares solution (for $x$).



I hope you find this explanation sufficient.






share|cite|improve this answer























  • A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
    – Ufuk Can Bicici
    Oct 15 '14 at 5:27










  • Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
    – Ufuk Can Bicici
    Oct 15 '14 at 5:40












  • For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
    – Ufuk Can Bicici
    Oct 15 '14 at 9:51








  • 1




    For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
    – Omnomnomnom
    Oct 15 '14 at 11:18






  • 1




    What if $x_{r+1},dots,x_{n}$ are non-zero?
    – Omnomnomnom
    Oct 15 '14 at 11:48


















up vote
2
down vote













The pseudoinverse solution from the SVD is derived in proving standard least square problem with SVD.



Given $mathbf{A}x=b$, where the data vector $bnotinmathcal{N}left( mathbf{A}^{*} right)$, the least squares solution exists and is given by
$$
x_{LS} = color{blue}{mathbf{A}^{dagger}b} + color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}, quad yinmathbb{C}^{n}
$$

where blue vectors are in the range space $color{blue}{mathcal{R}left( mathbf{A}^{*} right)}$ and red vectors are in the null space $color{red}{mathcal{N}left( mathbf{A} right)}.$
The least squares solution $r^{2}$ minimizes the sum of the squares of the residual errors and is an affine space. That is
$$
lVert
mathbf{A} x_{LS} (y)
rVert_{2}^{2} = r^{2}_{min}
$$

for all values of $y$.



What is the vector in this affine space with the smallest length? The length of the solution vectors is
$$
lVert
x_{LS}
rVert_{2}^{2}
=
lVert
color{blue}{mathbf{A}^{dagger}b} +
color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
rVert_{2}^{2}
=
lVert
color{blue}{mathbf{A}^{dagger}b}
rVert_{2}^{2}
+
underbrace{lVert
color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
rVert_{2}^{2}}_{y=mathbf{0}}
$$

The solution vector of minimum length is $color{blue}{mathbf{A}^{dagger}b}$, the point in the affine space closest to the origin.






share|cite|improve this answer























  • are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
    – glS
    Nov 16 at 13:20










  • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
    – dantopa
    Nov 19 at 23:40


















up vote
0
down vote













Let's fix the dimensions for the sake of making the example simpler and say that $A:mathbb{R}^8tomathbb{R}^5$ and that $rank(A)=3$.



$A=USigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $mathbb{R}^5$ and $mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.



So $A(mathbb{R}^8)$ is a 3-dimensional subspace of $mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $Sigma$ and $V$?



$U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$ which give us an orthogonal basis in $mathbb{R}^5$ such that the first three column vectors span $Im(A)$. You can see this by noting that



$USigma f_1 = Usigma_1 e_1 ne 0$



$USigma f_2 = Usigma_2 e_2 ne 0$



$USigma f_3 = Usigma_3 e_3 ne 0$



$USigma f_4 = ... = USigma f_8 = 0$



Since $U$ is orthogonal its inverse is $U^T$.



Similarly $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$ which give us an orthogonal basis in $mathbb{R}^8$ such that the last 5 span $ker(A)$:



$AVf_i = USigma V^TVf_i = USigma f_i ne 0$ for $i$ = 1, 2, 3



$AVf_i = USigma V^TVf_i = USigma f_i = 0$ for $i$ = 4 .. 8



And the inverse of $V$ is $V^T$.



$Sigma$ jams $mathbb{R}^8$ into $mathbb{R}^5$by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $f_4..f_8$.



It follows that $A$ jams $mathbb{R}^8$ into $mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $Vf_4..Vf_8$.



The key here is that restricted to the space spanned by $Vf_1, Vf_2, Vf_3$ A is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $VSigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).



If we have $AX = b$ and $b in Im(A)$ it follows that there exists a unique $x' in <Vf_1, Vf_2, Vf_3>$ s.t. $Ax' = b$. Any other solution $x$ in $mathbb{R}^8$ takes the form $x' + delta$ for $delta in Ker(A)$. Now since we can decompose $mathbb{R}^8$ into $<Vf_1, Vf_2, Vf_3> oplus <Vf_4, .., Vf_8>$ we have $lvert xrvert^2 = <x' + delta, x' + delta> = lvert x'rvert^2 + lvert deltarvert^2$ and so $lvert xrvert >= lvert x'rvert$ - that is, $x'$ is the closest solution to the origin.






share|cite|improve this answer




























    up vote
    0
    down vote













    The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an elipse. This can be done in a 3 step process using the SVD:




    1. Rotate the unit circle so it can be stretched along its axis

    2. Stretch each axis to form the ellipse

    3. Rotate again to align the ellipse with the output space of $A$


    To solve for $x$, you reverse this process, starting with $b$.



    Least squares comes in when step 2 creates a ellipse with a width of zero. When you're going through this process in reverse, when you get to step 2, un-stretching throws away that dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.



    In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.



    https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf






    share|cite|improve this answer





















      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "69"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














       

      draft saved


      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f974193%2fwhy-does-svd-provide-the-least-squares-and-least-norm-solution-to-a-x-b%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      10
      down vote



      accepted










      First, consider the problem $Sigma x = b$, where
      $$
      Sigma = pmatrix{sigma_1\& ddots\&&sigma_r\ &&&0\&&&&ddots\&&&&&0}
      $$
      Note that $b$ is only in the range of $Sigma$ if its entries $b_{r+1},dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
      $$
      x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T = Sigma^+ b
      $$
      From there, note that
      $$
      USigma V^T x = b implies\
      Sigma (V^T x ) = U^T b
      $$
      By the above argument, the least squares solution for $(V^T x)$ is given by
      $V^T x = Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V Sigma ^+ U^T)b$ must be the least squares solution (for $x$).



      I hope you find this explanation sufficient.






      share|cite|improve this answer























      • A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:27










      • Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:40












      • For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
        – Ufuk Can Bicici
        Oct 15 '14 at 9:51








      • 1




        For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
        – Omnomnomnom
        Oct 15 '14 at 11:18






      • 1




        What if $x_{r+1},dots,x_{n}$ are non-zero?
        – Omnomnomnom
        Oct 15 '14 at 11:48















      up vote
      10
      down vote



      accepted










      First, consider the problem $Sigma x = b$, where
      $$
      Sigma = pmatrix{sigma_1\& ddots\&&sigma_r\ &&&0\&&&&ddots\&&&&&0}
      $$
      Note that $b$ is only in the range of $Sigma$ if its entries $b_{r+1},dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
      $$
      x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T = Sigma^+ b
      $$
      From there, note that
      $$
      USigma V^T x = b implies\
      Sigma (V^T x ) = U^T b
      $$
      By the above argument, the least squares solution for $(V^T x)$ is given by
      $V^T x = Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V Sigma ^+ U^T)b$ must be the least squares solution (for $x$).



      I hope you find this explanation sufficient.






      share|cite|improve this answer























      • A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:27










      • Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:40












      • For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
        – Ufuk Can Bicici
        Oct 15 '14 at 9:51








      • 1




        For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
        – Omnomnomnom
        Oct 15 '14 at 11:18






      • 1




        What if $x_{r+1},dots,x_{n}$ are non-zero?
        – Omnomnomnom
        Oct 15 '14 at 11:48













      up vote
      10
      down vote



      accepted







      up vote
      10
      down vote



      accepted






      First, consider the problem $Sigma x = b$, where
      $$
      Sigma = pmatrix{sigma_1\& ddots\&&sigma_r\ &&&0\&&&&ddots\&&&&&0}
      $$
      Note that $b$ is only in the range of $Sigma$ if its entries $b_{r+1},dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
      $$
      x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T = Sigma^+ b
      $$
      From there, note that
      $$
      USigma V^T x = b implies\
      Sigma (V^T x ) = U^T b
      $$
      By the above argument, the least squares solution for $(V^T x)$ is given by
      $V^T x = Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V Sigma ^+ U^T)b$ must be the least squares solution (for $x$).



      I hope you find this explanation sufficient.






      share|cite|improve this answer














      First, consider the problem $Sigma x = b$, where
      $$
      Sigma = pmatrix{sigma_1\& ddots\&&sigma_r\ &&&0\&&&&ddots\&&&&&0}
      $$
      Note that $b$ is only in the range of $Sigma$ if its entries $b_{r+1},dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
      $$
      x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T = Sigma^+ b
      $$
      From there, note that
      $$
      USigma V^T x = b implies\
      Sigma (V^T x ) = U^T b
      $$
      By the above argument, the least squares solution for $(V^T x)$ is given by
      $V^T x = Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V Sigma ^+ U^T)b$ must be the least squares solution (for $x$).



      I hope you find this explanation sufficient.







      share|cite|improve this answer














      share|cite|improve this answer



      share|cite|improve this answer








      edited Oct 15 '14 at 11:19

























      answered Oct 15 '14 at 3:13









      Omnomnomnom

      124k788176




      124k788176












      • A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:27










      • Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:40












      • For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
        – Ufuk Can Bicici
        Oct 15 '14 at 9:51








      • 1




        For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
        – Omnomnomnom
        Oct 15 '14 at 11:18






      • 1




        What if $x_{r+1},dots,x_{n}$ are non-zero?
        – Omnomnomnom
        Oct 15 '14 at 11:48


















      • A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:27










      • Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
        – Ufuk Can Bicici
        Oct 15 '14 at 5:40












      • For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
        – Ufuk Can Bicici
        Oct 15 '14 at 9:51








      • 1




        For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
        – Omnomnomnom
        Oct 15 '14 at 11:18






      • 1




        What if $x_{r+1},dots,x_{n}$ are non-zero?
        – Omnomnomnom
        Oct 15 '14 at 11:48
















      A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
      – Ufuk Can Bicici
      Oct 15 '14 at 5:27




      A question: In the beginning did we assumed that $Sigma $ is a nxn square matrix? What is the reason that the entries from $ n+1$ to $ n$ should be zero, such that $ b $ is in the range of $Sigma $?
      – Ufuk Can Bicici
      Oct 15 '14 at 5:27












      Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
      – Ufuk Can Bicici
      Oct 15 '14 at 5:40






      Wait one minute; you mean $ b $ for being its entries from $ r+1$ to $ n $ as zero, right?
      – Ufuk Can Bicici
      Oct 15 '14 at 5:40














      For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
      – Ufuk Can Bicici
      Oct 15 '14 at 9:51






      For the moment, I assume that $Sigma$ is a nxn square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries in it diagonals. (Correct me if I am wrong, I am not in the best terms with linear algebra :) ) Then $Sigma x = b$ has only a singe unique solution, which is, as you have written $x = (b_1/sigma_1,dots,b_r/sigma_r,0,dots,0)^T$. Now since we have only a single solution, how can we talk about a least squarest solution as well? Unfortunately I became heavily confused .
      – Ufuk Can Bicici
      Oct 15 '14 at 9:51






      1




      1




      For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
      – Omnomnomnom
      Oct 15 '14 at 11:18




      For convenience, I was indeed talking about square matrices. If $Sigma$ has rank $r<n$ and $b$ is in the range of $Sigma,$ then of course $Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
      – Omnomnomnom
      Oct 15 '14 at 11:18




      1




      1




      What if $x_{r+1},dots,x_{n}$ are non-zero?
      – Omnomnomnom
      Oct 15 '14 at 11:48




      What if $x_{r+1},dots,x_{n}$ are non-zero?
      – Omnomnomnom
      Oct 15 '14 at 11:48










      up vote
      2
      down vote













      The pseudoinverse solution from the SVD is derived in proving standard least square problem with SVD.



      Given $mathbf{A}x=b$, where the data vector $bnotinmathcal{N}left( mathbf{A}^{*} right)$, the least squares solution exists and is given by
      $$
      x_{LS} = color{blue}{mathbf{A}^{dagger}b} + color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}, quad yinmathbb{C}^{n}
      $$

      where blue vectors are in the range space $color{blue}{mathcal{R}left( mathbf{A}^{*} right)}$ and red vectors are in the null space $color{red}{mathcal{N}left( mathbf{A} right)}.$
      The least squares solution $r^{2}$ minimizes the sum of the squares of the residual errors and is an affine space. That is
      $$
      lVert
      mathbf{A} x_{LS} (y)
      rVert_{2}^{2} = r^{2}_{min}
      $$

      for all values of $y$.



      What is the vector in this affine space with the smallest length? The length of the solution vectors is
      $$
      lVert
      x_{LS}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b} +
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b}
      rVert_{2}^{2}
      +
      underbrace{lVert
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}}_{y=mathbf{0}}
      $$

      The solution vector of minimum length is $color{blue}{mathbf{A}^{dagger}b}$, the point in the affine space closest to the origin.






      share|cite|improve this answer























      • are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
        – glS
        Nov 16 at 13:20










      • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
        – dantopa
        Nov 19 at 23:40















      up vote
      2
      down vote













      The pseudoinverse solution from the SVD is derived in proving standard least square problem with SVD.



      Given $mathbf{A}x=b$, where the data vector $bnotinmathcal{N}left( mathbf{A}^{*} right)$, the least squares solution exists and is given by
      $$
      x_{LS} = color{blue}{mathbf{A}^{dagger}b} + color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}, quad yinmathbb{C}^{n}
      $$

      where blue vectors are in the range space $color{blue}{mathcal{R}left( mathbf{A}^{*} right)}$ and red vectors are in the null space $color{red}{mathcal{N}left( mathbf{A} right)}.$
      The least squares solution $r^{2}$ minimizes the sum of the squares of the residual errors and is an affine space. That is
      $$
      lVert
      mathbf{A} x_{LS} (y)
      rVert_{2}^{2} = r^{2}_{min}
      $$

      for all values of $y$.



      What is the vector in this affine space with the smallest length? The length of the solution vectors is
      $$
      lVert
      x_{LS}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b} +
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b}
      rVert_{2}^{2}
      +
      underbrace{lVert
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}}_{y=mathbf{0}}
      $$

      The solution vector of minimum length is $color{blue}{mathbf{A}^{dagger}b}$, the point in the affine space closest to the origin.






      share|cite|improve this answer























      • are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
        – glS
        Nov 16 at 13:20










      • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
        – dantopa
        Nov 19 at 23:40













      up vote
      2
      down vote










      up vote
      2
      down vote









      The pseudoinverse solution from the SVD is derived in proving standard least square problem with SVD.



      Given $mathbf{A}x=b$, where the data vector $bnotinmathcal{N}left( mathbf{A}^{*} right)$, the least squares solution exists and is given by
      $$
      x_{LS} = color{blue}{mathbf{A}^{dagger}b} + color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}, quad yinmathbb{C}^{n}
      $$

      where blue vectors are in the range space $color{blue}{mathcal{R}left( mathbf{A}^{*} right)}$ and red vectors are in the null space $color{red}{mathcal{N}left( mathbf{A} right)}.$
      The least squares solution $r^{2}$ minimizes the sum of the squares of the residual errors and is an affine space. That is
      $$
      lVert
      mathbf{A} x_{LS} (y)
      rVert_{2}^{2} = r^{2}_{min}
      $$

      for all values of $y$.



      What is the vector in this affine space with the smallest length? The length of the solution vectors is
      $$
      lVert
      x_{LS}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b} +
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b}
      rVert_{2}^{2}
      +
      underbrace{lVert
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}}_{y=mathbf{0}}
      $$

      The solution vector of minimum length is $color{blue}{mathbf{A}^{dagger}b}$, the point in the affine space closest to the origin.






      share|cite|improve this answer














      The pseudoinverse solution from the SVD is derived in proving standard least square problem with SVD.



      Given $mathbf{A}x=b$, where the data vector $bnotinmathcal{N}left( mathbf{A}^{*} right)$, the least squares solution exists and is given by
      $$
      x_{LS} = color{blue}{mathbf{A}^{dagger}b} + color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}, quad yinmathbb{C}^{n}
      $$

      where blue vectors are in the range space $color{blue}{mathcal{R}left( mathbf{A}^{*} right)}$ and red vectors are in the null space $color{red}{mathcal{N}left( mathbf{A} right)}.$
      The least squares solution $r^{2}$ minimizes the sum of the squares of the residual errors and is an affine space. That is
      $$
      lVert
      mathbf{A} x_{LS} (y)
      rVert_{2}^{2} = r^{2}_{min}
      $$

      for all values of $y$.



      What is the vector in this affine space with the smallest length? The length of the solution vectors is
      $$
      lVert
      x_{LS}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b} +
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}
      =
      lVert
      color{blue}{mathbf{A}^{dagger}b}
      rVert_{2}^{2}
      +
      underbrace{lVert
      color{red}{left( mathbf{I}_{n} - mathbf{A}^{dagger}mathbf{A}right) y}
      rVert_{2}^{2}}_{y=mathbf{0}}
      $$

      The solution vector of minimum length is $color{blue}{mathbf{A}^{dagger}b}$, the point in the affine space closest to the origin.







      share|cite|improve this answer














      share|cite|improve this answer



      share|cite|improve this answer








      edited Nov 19 at 23:38

























      answered Mar 8 '17 at 2:01









      dantopa

      6,31131741




      6,31131741












      • are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
        – glS
        Nov 16 at 13:20










      • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
        – dantopa
        Nov 19 at 23:40


















      • are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
        – glS
        Nov 16 at 13:20










      • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
        – dantopa
        Nov 19 at 23:40
















      are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
      – glS
      Nov 16 at 13:20




      are you using $A^*$ or $A^dagger$ to denote the Hermitian conjugate?
      – glS
      Nov 16 at 13:20












      @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
      – dantopa
      Nov 19 at 23:40




      @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question notational errors have been corrected.
      – dantopa
      Nov 19 at 23:40










      up vote
      0
      down vote













      Let's fix the dimensions for the sake of making the example simpler and say that $A:mathbb{R}^8tomathbb{R}^5$ and that $rank(A)=3$.



      $A=USigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $mathbb{R}^5$ and $mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.



      So $A(mathbb{R}^8)$ is a 3-dimensional subspace of $mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $Sigma$ and $V$?



      $U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$ which give us an orthogonal basis in $mathbb{R}^5$ such that the first three column vectors span $Im(A)$. You can see this by noting that



      $USigma f_1 = Usigma_1 e_1 ne 0$



      $USigma f_2 = Usigma_2 e_2 ne 0$



      $USigma f_3 = Usigma_3 e_3 ne 0$



      $USigma f_4 = ... = USigma f_8 = 0$



      Since $U$ is orthogonal its inverse is $U^T$.



      Similarly $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$ which give us an orthogonal basis in $mathbb{R}^8$ such that the last 5 span $ker(A)$:



      $AVf_i = USigma V^TVf_i = USigma f_i ne 0$ for $i$ = 1, 2, 3



      $AVf_i = USigma V^TVf_i = USigma f_i = 0$ for $i$ = 4 .. 8



      And the inverse of $V$ is $V^T$.



      $Sigma$ jams $mathbb{R}^8$ into $mathbb{R}^5$by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $f_4..f_8$.



      It follows that $A$ jams $mathbb{R}^8$ into $mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $Vf_4..Vf_8$.



      The key here is that restricted to the space spanned by $Vf_1, Vf_2, Vf_3$ A is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $VSigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).



      If we have $AX = b$ and $b in Im(A)$ it follows that there exists a unique $x' in <Vf_1, Vf_2, Vf_3>$ s.t. $Ax' = b$. Any other solution $x$ in $mathbb{R}^8$ takes the form $x' + delta$ for $delta in Ker(A)$. Now since we can decompose $mathbb{R}^8$ into $<Vf_1, Vf_2, Vf_3> oplus <Vf_4, .., Vf_8>$ we have $lvert xrvert^2 = <x' + delta, x' + delta> = lvert x'rvert^2 + lvert deltarvert^2$ and so $lvert xrvert >= lvert x'rvert$ - that is, $x'$ is the closest solution to the origin.






      share|cite|improve this answer

























        up vote
        0
        down vote













        Let's fix the dimensions for the sake of making the example simpler and say that $A:mathbb{R}^8tomathbb{R}^5$ and that $rank(A)=3$.



        $A=USigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $mathbb{R}^5$ and $mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.



        So $A(mathbb{R}^8)$ is a 3-dimensional subspace of $mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $Sigma$ and $V$?



        $U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$ which give us an orthogonal basis in $mathbb{R}^5$ such that the first three column vectors span $Im(A)$. You can see this by noting that



        $USigma f_1 = Usigma_1 e_1 ne 0$



        $USigma f_2 = Usigma_2 e_2 ne 0$



        $USigma f_3 = Usigma_3 e_3 ne 0$



        $USigma f_4 = ... = USigma f_8 = 0$



        Since $U$ is orthogonal its inverse is $U^T$.



        Similarly $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$ which give us an orthogonal basis in $mathbb{R}^8$ such that the last 5 span $ker(A)$:



        $AVf_i = USigma V^TVf_i = USigma f_i ne 0$ for $i$ = 1, 2, 3



        $AVf_i = USigma V^TVf_i = USigma f_i = 0$ for $i$ = 4 .. 8



        And the inverse of $V$ is $V^T$.



        $Sigma$ jams $mathbb{R}^8$ into $mathbb{R}^5$by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $f_4..f_8$.



        It follows that $A$ jams $mathbb{R}^8$ into $mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $Vf_4..Vf_8$.



        The key here is that restricted to the space spanned by $Vf_1, Vf_2, Vf_3$ A is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $VSigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).



        If we have $AX = b$ and $b in Im(A)$ it follows that there exists a unique $x' in <Vf_1, Vf_2, Vf_3>$ s.t. $Ax' = b$. Any other solution $x$ in $mathbb{R}^8$ takes the form $x' + delta$ for $delta in Ker(A)$. Now since we can decompose $mathbb{R}^8$ into $<Vf_1, Vf_2, Vf_3> oplus <Vf_4, .., Vf_8>$ we have $lvert xrvert^2 = <x' + delta, x' + delta> = lvert x'rvert^2 + lvert deltarvert^2$ and so $lvert xrvert >= lvert x'rvert$ - that is, $x'$ is the closest solution to the origin.






        share|cite|improve this answer























          up vote
          0
          down vote










          up vote
          0
          down vote









          Let's fix the dimensions for the sake of making the example simpler and say that $A:mathbb{R}^8tomathbb{R}^5$ and that $rank(A)=3$.



          $A=USigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $mathbb{R}^5$ and $mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.



          So $A(mathbb{R}^8)$ is a 3-dimensional subspace of $mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $Sigma$ and $V$?



          $U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$ which give us an orthogonal basis in $mathbb{R}^5$ such that the first three column vectors span $Im(A)$. You can see this by noting that



          $USigma f_1 = Usigma_1 e_1 ne 0$



          $USigma f_2 = Usigma_2 e_2 ne 0$



          $USigma f_3 = Usigma_3 e_3 ne 0$



          $USigma f_4 = ... = USigma f_8 = 0$



          Since $U$ is orthogonal its inverse is $U^T$.



          Similarly $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$ which give us an orthogonal basis in $mathbb{R}^8$ such that the last 5 span $ker(A)$:



          $AVf_i = USigma V^TVf_i = USigma f_i ne 0$ for $i$ = 1, 2, 3



          $AVf_i = USigma V^TVf_i = USigma f_i = 0$ for $i$ = 4 .. 8



          And the inverse of $V$ is $V^T$.



          $Sigma$ jams $mathbb{R}^8$ into $mathbb{R}^5$by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $f_4..f_8$.



          It follows that $A$ jams $mathbb{R}^8$ into $mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $Vf_4..Vf_8$.



          The key here is that restricted to the space spanned by $Vf_1, Vf_2, Vf_3$ A is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $VSigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).



          If we have $AX = b$ and $b in Im(A)$ it follows that there exists a unique $x' in <Vf_1, Vf_2, Vf_3>$ s.t. $Ax' = b$. Any other solution $x$ in $mathbb{R}^8$ takes the form $x' + delta$ for $delta in Ker(A)$. Now since we can decompose $mathbb{R}^8$ into $<Vf_1, Vf_2, Vf_3> oplus <Vf_4, .., Vf_8>$ we have $lvert xrvert^2 = <x' + delta, x' + delta> = lvert x'rvert^2 + lvert deltarvert^2$ and so $lvert xrvert >= lvert x'rvert$ - that is, $x'$ is the closest solution to the origin.






          share|cite|improve this answer












          Let's fix the dimensions for the sake of making the example simpler and say that $A:mathbb{R}^8tomathbb{R}^5$ and that $rank(A)=3$.



          $A=USigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $mathbb{R}^5$ and $mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.



          So $A(mathbb{R}^8)$ is a 3-dimensional subspace of $mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $Sigma$ and $V$?



          $U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$ which give us an orthogonal basis in $mathbb{R}^5$ such that the first three column vectors span $Im(A)$. You can see this by noting that



          $USigma f_1 = Usigma_1 e_1 ne 0$



          $USigma f_2 = Usigma_2 e_2 ne 0$



          $USigma f_3 = Usigma_3 e_3 ne 0$



          $USigma f_4 = ... = USigma f_8 = 0$



          Since $U$ is orthogonal its inverse is $U^T$.



          Similarly $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$ which give us an orthogonal basis in $mathbb{R}^8$ such that the last 5 span $ker(A)$:



          $AVf_i = USigma V^TVf_i = USigma f_i ne 0$ for $i$ = 1, 2, 3



          $AVf_i = USigma V^TVf_i = USigma f_i = 0$ for $i$ = 4 .. 8



          And the inverse of $V$ is $V^T$.



          $Sigma$ jams $mathbb{R}^8$ into $mathbb{R}^5$by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $f_4..f_8$.



          It follows that $A$ jams $mathbb{R}^8$ into $mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $sigma_1, sigma_2, sigma_3$ in the process) while squashing those spanned by $Vf_4..Vf_8$.



          The key here is that restricted to the space spanned by $Vf_1, Vf_2, Vf_3$ A is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $VSigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).



          If we have $AX = b$ and $b in Im(A)$ it follows that there exists a unique $x' in <Vf_1, Vf_2, Vf_3>$ s.t. $Ax' = b$. Any other solution $x$ in $mathbb{R}^8$ takes the form $x' + delta$ for $delta in Ker(A)$. Now since we can decompose $mathbb{R}^8$ into $<Vf_1, Vf_2, Vf_3> oplus <Vf_4, .., Vf_8>$ we have $lvert xrvert^2 = <x' + delta, x' + delta> = lvert x'rvert^2 + lvert deltarvert^2$ and so $lvert xrvert >= lvert x'rvert$ - that is, $x'$ is the closest solution to the origin.







          share|cite|improve this answer












          share|cite|improve this answer



          share|cite|improve this answer










          answered Sep 30 '16 at 13:37









          Amos Joshua

          241110




          241110






















              up vote
              0
              down vote













              The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an elipse. This can be done in a 3 step process using the SVD:




              1. Rotate the unit circle so it can be stretched along its axis

              2. Stretch each axis to form the ellipse

              3. Rotate again to align the ellipse with the output space of $A$


              To solve for $x$, you reverse this process, starting with $b$.



              Least squares comes in when step 2 creates a ellipse with a width of zero. When you're going through this process in reverse, when you get to step 2, un-stretching throws away that dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.



              In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.



              https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf






              share|cite|improve this answer

























                up vote
                0
                down vote













                The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an elipse. This can be done in a 3 step process using the SVD:




                1. Rotate the unit circle so it can be stretched along its axis

                2. Stretch each axis to form the ellipse

                3. Rotate again to align the ellipse with the output space of $A$


                To solve for $x$, you reverse this process, starting with $b$.



                Least squares comes in when step 2 creates a ellipse with a width of zero. When you're going through this process in reverse, when you get to step 2, un-stretching throws away that dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.



                In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.



                https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf






                share|cite|improve this answer























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an elipse. This can be done in a 3 step process using the SVD:




                  1. Rotate the unit circle so it can be stretched along its axis

                  2. Stretch each axis to form the ellipse

                  3. Rotate again to align the ellipse with the output space of $A$


                  To solve for $x$, you reverse this process, starting with $b$.



                  Least squares comes in when step 2 creates a ellipse with a width of zero. When you're going through this process in reverse, when you get to step 2, un-stretching throws away that dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.



                  In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.



                  https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf






                  share|cite|improve this answer












                  The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an elipse. This can be done in a 3 step process using the SVD:




                  1. Rotate the unit circle so it can be stretched along its axis

                  2. Stretch each axis to form the ellipse

                  3. Rotate again to align the ellipse with the output space of $A$


                  To solve for $x$, you reverse this process, starting with $b$.



                  Least squares comes in when step 2 creates a ellipse with a width of zero. When you're going through this process in reverse, when you get to step 2, un-stretching throws away that dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.



                  In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.



                  https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf







                  share|cite|improve this answer












                  share|cite|improve this answer



                  share|cite|improve this answer










                  answered Feb 25 '17 at 16:19









                  Chris Noyes

                  1




                  1






























                       

                      draft saved


                      draft discarded



















































                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f974193%2fwhy-does-svd-provide-the-least-squares-and-least-norm-solution-to-a-x-b%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Tonle Sap (See)

                      I get strange results when I access the Sqlitedatabase with Unity C# via XAMPP

                      Guatemaltekische Davis-Cup-Mannschaft