Conceptually, why is a positive definite Hessian at a specific point able to tell you whether that point is a local minimum?
This is not about calculating anything, but can anyone tell me why this is the case?
So, from Wikipedia:
If the Hessian is positive definite at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues then x is a saddle point for f. Otherwise the test is inconclusive. This implies that, at a local minimum (resp. a local maximum), the Hessian is positive semi-definite (resp. negative semi-definite).
Can someone explain, intuitively, why this is the case?
optimization
asked Nov 13 '15 at 16:04 · rjm726
3 Answers
It's pretty much the same as the one-dimensional case. The second derivative gives you an idea of the local "curvature" of the function near the point, with a positive second derivative meaning that it's curving "up". In multiple dimensions, the Hessian matrix gives you the same information, except now there are infinitely many directions in which to look for curvature. Positive definiteness says that all the eigenvalues are positive, which means that whenever you look along an eigenvector, the function is curving up. Since the Hessian is symmetric, its eigenvectors form a basis of directions at that point, so looking in any direction you will also see "curving up", because you can decompose that direction into eigenvector directions.
You can extend this idea to the negative definite and semidefinite cases fairly easily; the idea is the same. Looking along eigenvectors gives you one-dimensional slices of the function, and then you're back to one-dimensional calculus.
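To make the eigenvalue picture concrete, here is a minimal numerical sketch (my own illustration, not part of the original answer). It uses numpy and the made-up example $f(x,y) = x^2 + 3y^2$, classifying the critical point at the origin by the signs of the Hessian's eigenvalues:

```python
import numpy as np

# Hessian of the made-up example f(x, y) = x^2 + 3y^2; it happens to be
# constant, so it is the same matrix at the critical point (0, 0).
H = np.array([[2.0, 0.0],
              [0.0, 6.0]])

eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh is for symmetric matrices

if np.all(eigenvalues > 0):
    print("positive definite -> local minimum")   # fires here: [2, 6]
elif np.all(eigenvalues < 0):
    print("negative definite -> local maximum")
elif np.any(eigenvalues > 0) and np.any(eigenvalues < 0):
    print("indefinite -> saddle point")
else:
    print("a zero eigenvalue -> test is inconclusive")
```

Swapping the sign of one diagonal entry (say, $-6$ instead of $6$) makes the form indefinite and the same check reports a saddle point.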
edited Nov 29 at 3:17 · answered Nov 13 '15 at 16:14 · icurays1
Roughly like this:
A Taylor expansion of $f$ around $x$ by a step $h$ is
$$
f(x + h) = f(x) + \text{grad}\, f \cdot h + \frac{1}{2} h^T H h + O(h^3).
$$
At a critical point the gradient vanishes, and this reduces to
$$
f(x + h) = f(x) + \frac{1}{2} h^T H h + O(h^3).
$$
For a minimum, neglecting the $O(h^3)$ term for small $h$, one would need
$$
f(x + h) - f(x) = \frac{1}{2} h^T H h \ge 0,
$$
and that is why positive semi-definiteness is needed. For a maximum, one would need
$$
f(x + h) - f(x) = \frac{1}{2} h^T H h \le 0.
$$
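As a quick numerical check of this expansion (a sketch with an assumed example, $f(x_1, x_2) = x_1^2 + 2x_2^2$, not taken from the answer): at the critical point $x = 0$, the increment $f(x+h) - f(x)$ matches $\frac{1}{2} h^T H h$ and stays nonnegative for small random steps $h$, exactly as the positive definite Hessian predicts.

```python
import numpy as np

# Assumed example: f(x1, x2) = x1^2 + 2*x2^2, with critical point x = 0.
f = lambda v: v[0]**2 + 2.0 * v[1]**2
H = np.array([[2.0, 0.0],
              [0.0, 4.0]])   # Hessian of f (constant for a quadratic)

rng = np.random.default_rng(0)
for _ in range(5):
    h = 1e-3 * rng.standard_normal(2)   # small random step
    actual = f(h) - f(np.zeros(2))      # f(x + h) - f(x) at x = 0
    quad = 0.5 * h @ H @ h              # (1/2) h^T H h
    # Both columns agree and are nonnegative; for a quadratic f the
    # O(h^3) remainder is exactly zero.
    print(f"{actual:.3e} vs {quad:.3e}")
```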
answered Nov 13 '15 at 16:11 · mvw
This is because of Taylor's formula at order $2$. At a critical point, where the first-order partial derivatives vanish, it gives
\begin{align*}
f(x+h,y+k)-f(x,y)&=hf'_x(x,y)+kf'_y(x,y)\\
&\quad+\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr)\\
&=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr).
\end{align*}
If the quadratic form $q(h,k)=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)$ is positive definite, the left-hand side is positive for all $\lVert(h,k)\rVert$ small enough, hence $f(x+h,y+k)-f(x,y)>0$, so we have a local minimum. If it is negative definite, for the same reasons, we have a local maximum.
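In the two-variable case, definiteness of $q$ can be checked from the second partials alone via Sylvester's criterion: the form is positive definite iff $f''_{x^2} > 0$ and $f''_{x^2} f''_{y^2} - (f''_{xy})^2 > 0$. A small sketch (my own hypothetical example, $f(x,y) = xy$, whose Hessian at the origin is indefinite and gives a saddle):

```python
# Second partials of the hypothetical example f(x, y) = x*y at the origin.
fxx, fxy, fyy = 0.0, 1.0, 0.0
det = fxx * fyy - fxy**2     # determinant of the 2x2 Hessian

if fxx > 0 and det > 0:
    print("q positive definite -> local minimum")
elif fxx < 0 and det > 0:
    print("q negative definite -> local maximum")
elif det < 0:
    print("q indefinite -> saddle point")   # fires here: det = -1
else:
    print("det = 0 -> test is inconclusive")
```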
answered Nov 13 '15 at 16:41 · Bernard