Conceptually, why is a positive definite Hessian at a specific point able to tell you if that point is a local minimum?


























This is not about calculating anything; can anyone tell me why the following is the case?



So, from Wikipedia:




If the Hessian is positive definite at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues then x is a saddle point for f. Otherwise the test is inconclusive. This implies that, at a local minimum (resp. a local maximum), the Hessian is positive-semi-definite (resp. negative semi-definite).




Can someone explain, intuitively, why this is the case?












































Tag: optimization
















asked Nov 13 '15 at 16:04 by rjm726






















3 Answers
































It's pretty much the same as the one-dimensional case. The second derivative gives you an idea of the local "curvature" of the function near the point, with a positive second derivative meaning that it curves "up". In multiple dimensions, the Hessian matrix gives you the same information, except now there are infinitely many directions in which to look for curvature. Positive definiteness says that all the eigenvalues are positive, which means that any time you look along an eigenvector, the function is curving up. Since the Hessian is symmetric, its eigenvectors form a basis of the whole space, so looking in any other direction you'll also see "curving up": you can decompose that direction into its eigenvector components.

You can extend this idea to the negative definite and semidefinite cases fairly easily; the idea is the same. Looking along eigenvectors gives you 1-D slices of the function, and then you're back to 1-D calculus.






answered Nov 13 '15 at 16:14 by icurays1 (edited Nov 29 at 3:17)
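
A minimal numerical sketch of this eigenvalue test (assuming NumPy; the example function $f(x,y)=x^2+3y^2$ is an illustrative choice, not from the answer):

```python
import numpy as np

def classify_critical_point(hessian):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eigenvalues = np.linalg.eigh(hessian)[0]  # eigh: eigenvalues of a symmetric matrix
    if np.all(eigenvalues > 0):
        return "local minimum"   # curves up along every eigenvector
    if np.all(eigenvalues < 0):
        return "local maximum"   # curves down along every eigenvector
    if np.any(eigenvalues > 0) and np.any(eigenvalues < 0):
        return "saddle point"    # up in some directions, down in others
    return "inconclusive"        # some zero eigenvalues: the test says nothing

# f(x, y) = x^2 + 3*y^2 has a critical point at the origin;
# its Hessian there is constant:
H = np.array([[2.0, 0.0],
              [0.0, 6.0]])
print(classify_critical_point(H))  # -> local minimum
```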













































            Roughly like this:



A Taylor expansion around $x$ by $h$ is
$$
f(x + h) = f(x) + \text{grad}\, f \cdot h + \frac{1}{2} h^T H h + O(h^3)
$$
At a critical point the gradient vanishes, and this reduces to
$$
f(x + h) = f(x) + \frac{1}{2} h^T H h + O(h^3)
$$
For a minimum, neglecting the $O(h^3)$ term for small $h$, one would need
$$
f(x + h) - f(x) = \frac{1}{2} h^T H h \ge 0
$$
and that is why positive semi-definiteness is needed.

For a maximum,
$$
f(x + h) - f(x) = \frac{1}{2} h^T H h \le 0
$$






answered Nov 13 '15 at 16:11 by mvw
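
A short numerical check of this expansion (assuming NumPy; the cubic example function is my own choice, picked so the $O(h^3)$ remainder is visible):

```python
import numpy as np

# Illustrative function: f(x, y) = x^2 + 3y^2 + x^3. The origin is a critical
# point; the Hessian there is [[2, 0], [0, 6]] (the cubic term only affects
# the O(h^3) remainder).
def f(p):
    return p[0]**2 + 3.0 * p[1]**2 + p[0]**3

H = np.array([[2.0, 0.0],
              [0.0, 6.0]])

rng = np.random.default_rng(0)
for _ in range(3):
    h = 1e-3 * rng.standard_normal(2)       # small random step h
    exact = f(h) - f(np.zeros(2))           # f(x + h) - f(x) at x = 0
    quadratic = 0.5 * h @ H @ h             # (1/2) h^T H h
    print(exact - quadratic)                # ~1e-9: just the O(h^3) remainder
```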











































This is because of Taylor's formula at order $2$:
\begin{align*}
f(x+h,y+k)-f(x,y)&=hf'_x(x,y)+kf'_y(x,y)\\
&\quad+\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr)\\
&=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr)
\end{align*}
where the last equality holds because the first-order terms vanish at a critical point. If the quadratic form $q(h,k)=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)$ is positive definite, the left-hand side is positive for all $\lVert(h,k)\rVert$ small enough, hence $f(x+h,y+k)-f(x,y)>0$, so we have a local minimum. If it is negative definite, for the same reasons, we have a local maximum.






answered Nov 13 '15 at 16:41 by Bernard
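
For the saddle case of the quoted test, a companion sketch (again assuming NumPy; $f(x,y)=x^2-y^2$ is an illustrative choice): the quadratic form $q(h,k)=h^2-k^2$ is indefinite, so the origin is a saddle point.

```python
import numpy as np

# f(x, y) = x^2 - y^2: critical point at the origin, Hessian [[2, 0], [0, -2]].
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(np.linalg.eigh(H)[0])   # [-2.  2.]: mixed signs -> saddle point

# The quadratic form q(h, k) = h^2 - k^2 changes sign with direction:
print(np.array([1.0, 0.0]) @ H @ np.array([1.0, 0.0]) / 2)   #  1.0 (curves up)
print(np.array([0.0, 1.0]) @ H @ np.array([0.0, 1.0]) / 2)   # -1.0 (curves down)
```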




















