Derivative of matrix using index notation












In my stats textbook, they define the following function:



$\mathbf{f} = \frac{1}{2}(\mathbf{A}\mathbf{x} - \mathbf{b})^2$,



where $\mathbf{A}$ is a matrix and $\mathbf{x}, \mathbf{b}$ are vectors. They then state that:



$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \mathbf{A}^{T}(\mathbf{A}\mathbf{x} - \mathbf{b})$
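To make the textbook's claim concrete, here is a minimal numerical sketch (not from the textbook). It reads $(\mathbf{A}\mathbf{x}-\mathbf{b})^2$ as the squared Euclidean norm and compares $\mathbf{A}^T(\mathbf{A}\mathbf{x}-\mathbf{b})$ with central finite differences of $f$. The helper names and the example values of `A`, `b`, `x` are hypothetical.

```python
# Numerical sketch (hypothetical example values, not from the textbook):
# compare the claimed gradient A^T (A x - b) of f(x) = 1/2 * ||A x - b||^2
# with central finite differences, one coordinate at a time.

def residual(A, x, b):
    """Return r = A x - b, with A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def f(A, x, b):
    """f(x) = 1/2 * sum_i r_i^2."""
    return 0.5 * sum(ri * ri for ri in residual(A, x, b))

def grad_formula(A, x, b):
    """Textbook formula: the k-th entry of A^T r is sum_i A[i][k] * r[i]."""
    r = residual(A, x, b)
    return [sum(A[i][k] * r[i] for i in range(len(A))) for k in range(len(x))]

def grad_numeric(A, x, b, h=1e-6):
    """Central differences (f(x + h e_k) - f(x - h e_k)) / (2h)."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(A, xp, b) - f(A, xm, b)) / (2 * h))
    return g

A = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]  # 3x2 example matrix
b = [1.0, 0.0, -2.0]
x = [0.3, -0.7]

print(grad_formula(A, x, b))  # analytic A^T (A x - b)
print(grad_numeric(A, x, b))  # should agree up to rounding error
```

Because $f$ is quadratic, central differences have no truncation error here, so any disagreement beyond rounding would point to a wrong formula.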



I tried to compute this derivative using index notation, so I defined $f$ as:



$f = \frac{1}{2} (A_{ij}x^{j} - b_{i})^2$,
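A small sketch of what the summation convention in this definition means, assuming the square is read as $r_i r_i$ with $i$ summed (so that $f = \frac12\lVert \mathbf{A}\mathbf{x}-\mathbf{b}\rVert^2$). Every implied sum is written as an explicit loop; the function name and example values are hypothetical.

```python
# Sketch of the summation convention in f = 1/2 (A_ij x^j - b_i)^2, assuming the
# square means r_i r_i with i summed (so f = 1/2 ||A x - b||^2).
# A, b, x are hypothetical example values.

def f_index(A, x, b):
    """Evaluate f by spelling out every implied sum as an explicit loop."""
    total = 0.0
    for i in range(len(A)):          # index i is summed by the square r_i r_i
        r_i = -b[i]
        for j in range(len(x)):      # dummy index j: r_i = A_ij x^j - b_i
            r_i += A[i][j] * x[j]
        total += r_i * r_i
    return 0.5 * total

A = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]  # 3x2, so i = 0..2 and j = 0..1
b = [1.0, 0.0, -2.0]
x = [0.3, -0.7]
print(f_index(A, x, b))
```

The inner loop is the contraction over $j$; the outer accumulation is the contraction of $r_i$ with itself over $i$.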



Then I took the derivative with respect to $x^k$ (I use commas to denote partial derivatives):



$f_{,k} = \delta^{j}_{k} A_{ij} (A_{ij}x^{j} - b_{i})$



Applying the contraction, I get:



$f_{,k} = A_{i}^{k} (A_{ij}x^{j} - b_{i})$
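In the differentiation step, $j$ appears three times in a single term, which the summation convention does not allow. A sketch of the same two steps with the residual named separately, assuming Euclidean components (so upper and lower index positions carry no metric information) and introducing the shorthand $r_i = A_{ij}x^j - b_i$ for illustration:

```latex
\begin{aligned}
f &= \tfrac12\, r_i r_i, \qquad r_i = A_{ij}x^{j} - b_i,\\
r_{i,k} &= A_{ij}\,\delta^{j}_{k} = A_{ik},\\
f_{,k} &= \tfrac12\,\bigl(r_{i,k}\, r_i + r_i\, r_{i,k}\bigr) = r_{i,k}\, r_i
        = A_{ik}\,(A_{ij}x^{j} - b_i) = (\mathbf{A}^{T})_{ki}\,(\mathbf{A}\mathbf{x}-\mathbf{b})_i .
\end{aligned}
```

Read this way, $A_{ik}$, with the free index $k$ in the second slot, is the $(k,i)$ entry of $\mathbf{A}^T$.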



But I am not sure whether $A_{i}^{k}$ represents $\mathbf{A}^T$.







matrix-calculus tensors index-notation






edited Dec 27 '18 at 19:53







Thomas Moore

















asked Dec 27 '18 at 19:23









Thomas Moore












  • It should be $b_i$ instead of $b^j$
    – user251257
    Dec 27 '18 at 19:44










  • Why use index notation? We have $D_xf(p)=\frac12\cdot2\langle Ap,Ax-b\rangle=\langle p,A^T(Ax-b)\rangle$, hence the gradient of $f$ is $A^T(Ax-b)$.
    – Michael Hoppe
    Dec 27 '18 at 20:04












  • @MichaelHoppe Hi. Yes, I know about this version, as you have done it. But, I wanted to try to do it using index notation! :)
    – Thomas Moore
    Dec 27 '18 at 20:29
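The coordinate-free computation in the second comment can be expanded step by step. This is a sketch in which $p$ is an arbitrary direction, $t$ a real parameter, and $\langle\cdot,\cdot\rangle$ the standard inner product:

```latex
\begin{aligned}
f(x+tp) &= \tfrac12\,\langle A(x+tp)-b,\; A(x+tp)-b\rangle \\
        &= \tfrac12\,\langle Ax-b,\,Ax-b\rangle + t\,\langle Ap,\,Ax-b\rangle + \tfrac{t^2}{2}\,\langle Ap,\,Ap\rangle,\\
D_xf(p) &= \left.\frac{d}{dt}\right|_{t=0} f(x+tp) = \langle Ap,\,Ax-b\rangle = \langle p,\,A^T(Ax-b)\rangle .
\end{aligned}
```

Since $D_xf(p) = \langle p, \nabla f(x)\rangle$ for every direction $p$, this identifies $\nabla f(x) = A^T(Ax-b)$.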


















2 Answers


















Some comments (I am not yet allowed to add them as comments):

a) Your function $f=\frac12(Ax-b)^2$ is not defined as written: if $A$ is a matrix, then $Ax-b$ is a vector, and the square of a vector is not defined. My guess is that it should be $f(x)=\frac12(Ax-b)^T(Ax-b)$.

b) The key to differentiating a real-valued function $f$ with respect to a $K$-vector $x$ is the following pair of conventions:

b1) If $x$ is a column vector, then $\partial f/\partial x$ is a column vector whose $i$-th element is $\partial f/\partial x_i$.

b2) $\partial x/\partial x^T = \partial x^T/\partial x = I_K$.

With these conventions, differentiating $f$ with respect to the vector $x$ gives the same result as element-by-element partial differentiation.
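As an illustration, the gradient of $f(x)=\frac12(Ax-b)^T(Ax-b)$ under conventions b1) and b2) can be sketched as follows. This assumes the function intended in a), keeps the factor $\frac12$ from the question, and uses two standard consequences of b1): $\partial(c^Tx)/\partial x = c$ for a constant vector $c$, and $\partial(x^TMx)/\partial x = (M+M^T)x$, which equals $2Mx$ for the symmetric $M = A^TA$.

```latex
\begin{aligned}
f(x) &= \tfrac12\,\bigl(x^TA^TAx - 2\,b^TAx + b^Tb\bigr),\\
\frac{\partial f}{\partial x} &= \tfrac12\,\bigl(2A^TAx - 2A^Tb\bigr) = A^T(Ax-b).
\end{aligned}
```

Here $b^TAx = (A^Tb)^Tx$ is a scalar, so its derivative is $A^Tb$, and $b^Tb$ does not depend on $x$.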







edited Dec 29 '18 at 18:04

answered Dec 27 '18 at 21:26

Bertrand























Your second equation can be rewritten by taking its $k$th component, viz. $$f_{,k}=(A^T)_{ki}(Ax-b)_i=(A^T)_{ki}(A_{ij}x_j-b_i).$$ Comparing this with your final equation shows that $A_i^k=(A^T)_{ki}=A_{ik}$.
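The identification $A_i^k=(A^T)_{ki}=A_{ik}$ can be checked numerically. The sketch below (helper names and example values are hypothetical) builds $A^T$ explicitly, computes $(A^T)_{ki}r_i$, and compares it with $A_{ik}r_i$, which contracts over the first index of $A$ directly.

```python
# Sketch checking A_i^k = (A^T)_{ki} = A_{ik} numerically:
# build A^T explicitly, then compare (A^T)_{ki} r_i with A_{ik} r_i for each k.
# The matrix and vectors are hypothetical example values.

def transpose(A):
    """Return A^T as a list of rows: (A^T)[k][i] = A[i][k]."""
    return [list(col) for col in zip(*A)]

def residual(A, x, b):
    """r_i = A_ij x_j - b_i (sum over the dummy index j)."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(len(A))]

def grad_via_transpose(A, x, b):
    """f_{,k} = (A^T)_{ki} r_i, using the explicit transpose."""
    At, r = transpose(A), residual(A, x, b)
    return [sum(At[k][i] * r[i] for i in range(len(r))) for k in range(len(At))]

def grad_via_index(A, x, b):
    """f_{,k} = A_{ik} r_i, contracting over the first index of A."""
    r = residual(A, x, b)
    return [sum(A[i][k] * r[i] for i in range(len(r))) for k in range(len(A[0]))]

A = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
b = [1.0, 0.0, -2.0]
x = [0.3, -0.7]

print(grad_via_transpose(A, x, b))
print(grad_via_index(A, x, b))
```

Both functions perform the same multiplications in the same order, so the two lists agree exactly; the only difference is whether the transpose is materialized or absorbed into the index placement.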







answered Dec 27 '18 at 21:00

J.G.





























