Calculate variance from a stream of sample values


























I'd like to calculate a standard deviation for a very large (but known) number of sample values, with the highest accuracy possible. The number of samples is larger than can be efficiently stored in memory.



The basic variance formula is:



$\sigma^2 = \frac{1}{N}\sum (x - \mu)^2$



... but this formulation depends on knowing the value of $\mu$ already.



$\mu$ can be calculated cumulatively -- that is, you can calculate the mean without storing every sample value. You just have to store their sum.
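
For example, a minimal sketch of what I mean by a cumulative calculation (illustrative Python; the class name is my own):

    class RunningMean:
        """Accumulate the mean of a stream without storing the samples."""
        def __init__(self):
            self.n = 0
            self.total = 0.0

        def add(self, x):
            self.n += 1
            self.total += x

        def mean(self):
            return self.total / self.n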



But to calculate the variance, is it necessary to store every sample value? Given a stream of samples, can I accumulate a calculation of the variance without needing to remember each sample? Put another way, is there a formulation of the variance which doesn't depend on foreknowledge of the exact value of $\mu$ before the whole sample set has been seen?










statistics algorithms

asked Feb 5 '11 at 21:18 by user6677






















  • This very same issue was discussed on dsp.SE, and the answers were very similar, with numerically unstable methods proposed in one answer, and more stable methods described by others.
    – Dilip Sarwate
    Mar 4 '12 at 17:04
















2 Answers













You can keep two running counters - one for $\sum_i x_i$ and another for $\sum_i x_i^2$. Since variance can be written as
$$ \sigma^2 = \frac{1}{N} \left[ \sum_i x_i^2 - \frac{(\sum_i x_i)^2}{N} \right] $$



you can compute the variance of the data that you have seen thus far with just these two counters. Note that the $N$ here is not the total length of all your samples but only the number of samples you have observed in the past.
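
A minimal sketch of this two-counter approach in code (illustrative Python, not part of the original answer; note the warnings in the comments below about overflow and numerical instability):

    class NaiveVariance:
        """Streaming variance via a running sum and a running sum of squares."""
        def __init__(self):
            self.n = 0
            self.s = 0.0   # running sum of x_i
            self.ss = 0.0  # running sum of x_i^2

        def add(self, x):
            self.n += 1
            self.s += x
            self.ss += x * x

        def variance(self):
            # population variance of the samples seen so far
            return self.ss / self.n - (self.s / self.n) ** 2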






answered Feb 5 '11 at 21:35 – Dinesh (edited Feb 5 '11 at 22:31)



















  • Wait -- shouldn't the $N$ be inside the expected values, such that the right side is divided by $N^2$?
    – user6677
    Feb 5 '11 at 22:05












  • e.g. $$ \sigma^2 = \frac{(\sum_i x_i^2)}{N} - (\frac{\sum_i x_i}{N})^2 $$
    – user6677
    Feb 5 '11 at 22:27










  • @user6677: You are indeed right. Thanks for the correction.
    – Dinesh
    Feb 5 '11 at 22:30






  • Note the sum of squares especially is at risk of overflow.
    – dfrankow
    Dec 29 '14 at 18:06






  • This is numerically unstable. If your mean is large compared to your stdev, your estimate will be way off. Knuth's algo is better (see other answer).
    – Leopd
    Feb 18 '15 at 1:40



















I'm a little late to the party, but it appears that this method is pretty unstable; fortunately, there is a method that allows for streaming computation of the variance without sacrificing numerical stability.



Cook describes a method from Knuth, the punchline of which is to initialize $m_1 = x_1$, and $v_1 = 0$, where $m_k$ is the mean of the first $k$ values. From there,



$$
\begin{align*}
m_k & = m_{k-1} + \frac{x_k - m_{k-1}}{k} \\
v_k & = v_{k-1} + (x_k - m_{k-1})(x_k - m_k)
\end{align*}
$$



The mean at this point is simply extracted as $m_k$, and the variance is $\sigma^2 = \frac{v_k}{k-1}$. It's easy to verify that it works for the mean, but I'm still working on grokking the variance.
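
A minimal sketch of the recurrence in code (illustrative Python; this is the algorithm commonly known as Welford's method):

    class WelfordVariance:
        """Knuth/Welford streaming mean and variance."""
        def __init__(self):
            self.k = 0
            self.m = 0.0  # running mean m_k
            self.v = 0.0  # running sum of squared deviations v_k

        def add(self, x):
            self.k += 1
            delta = x - self.m              # x_k - m_{k-1}
            self.m += delta / self.k        # m_k = m_{k-1} + (x_k - m_{k-1}) / k
            self.v += delta * (x - self.m)  # v_k = v_{k-1} + (x_k - m_{k-1})(x_k - m_k)

        def variance(self, sample=True):
            # v_k / (k - 1) is the sample variance; v_k / k the population variance
            return self.v / (self.k - 1 if sample else self.k)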






answered Mar 4 '12 at 15:59 – Dan Lecocq (edited Mar 4 '12 at 20:03 by Ilmari Karonen)



















  • +1, excellent! I didn't feel a simple upvote would be sufficient to express my appreciation of this answer, so there's an extra 50 rep bounty coming your way in 24 hours.
    – Ilmari Karonen
    Mar 4 '12 at 20:12






  • Shouldn't the final variance calculation be $v_k / k$ (not $k-1$)?
    – Eyal Roth
    Oct 10 '13 at 15:49










  • @errr: it has to do with using a biased vs. unbiased estimator
    – alexey
    Mar 4 '16 at 2:03






  • $v_k = \sum_i^k (x_i-m_k)^2$, so $v_k = v_{k-1} + x_k^2 - \frac{(k-1)m_k^2 - km_{k-1}^2}{k(k-1)}$
    – GuLearn
    May 5 '17 at 2:02












  • To test whether it's stable or not, refer to the Examples in en.wikipedia.org/wiki/Algorithms_for_calculating_variance. It goes through finding the variance for (4, 7, 13, 16), then for (1e8 + 4, 1e8 + 7, 1e8 + 13, 1e8 + 16), and then for (1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16). In theory they should all return the same variance, but because of the imprecision of double arithmetic the results deviate from the expected variance. (A sketch of this check follows below.)
    – thomas.han
    Jun 30 at 16:58
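
A quick version of the check described in the last comment, using the NaiveVariance and WelfordVariance sketches above (the exact population variance of (4, 7, 13, 16) is 22.5 regardless of the shift):

    for shift in (0.0, 1e8, 1e9):
        data = [shift + x for x in (4.0, 7.0, 13.0, 16.0)]
        naive, welford = NaiveVariance(), WelfordVariance()
        for x in data:
            naive.add(x)
            welford.add(x)
        # Welford stays at 22.5; the two-counter result drifts as the shift grows.
        print(shift, naive.variance(), welford.variance(sample=False))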










