Why is PCA sensitive to outliers?












21












There are many posts on this SE that discuss robust approaches to principal component analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.










machine-learning pca outliers














edited Dec 30 '18 at 19:39 by Peter Mortensen










asked Nov 26 '18 at 1:59 by Psi








  • Because the L2-norm contribution of outliers is very high: when minimizing the L2 norm (which is what PCA tries to do), those points pull harder on the fit than points closer to the middle do. – mathreadler, Nov 26 '18 at 11:48










  • This answer tells you everything you need. Just picture an outlier and read attentively. – Stephan Kolassa, Nov 26 '18 at 20:15
























1 Answer


















30












One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. That is, if $Y$ is your data ($m$ vectors of $n$ dimensions) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition strictly minimizes
$$\lVert Y - XA \rVert^2_F = \sum_{j=1}^{m} \lVert Y_j - X A_{j\cdot} \rVert^2.$$
Here $A$ is the matrix of coefficients of the PCA decomposition, and $\lVert \cdot \rVert_F$ is the Frobenius norm of the matrix.
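
To make this concrete, here is a minimal numpy sketch (the data and variable names are made up for illustration, not taken from the answer): it reconstructs centered data from its top-$k$ principal components and checks that the Frobenius residual is smaller than for another rank-$k$ projection.

    import numpy as np

    rng = np.random.default_rng(0)
    Y = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # 200 points in 5 dimensions
    Yc = Y - Y.mean(axis=0)                                  # PCA operates on centered data

    k = 2
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)        # rows of Vt are the PCs
    pca_recon = Yc @ Vt[:k].T @ Vt[:k]                       # project onto top-k PCs, reconstruct

    Q, _ = np.linalg.qr(rng.normal(size=(5, k)))             # some other rank-k orthonormal basis
    other_recon = Yc @ Q @ Q.T

    print(np.linalg.norm(Yc - pca_recon, 'fro'))             # minimal rank-k residual
    print(np.linalg.norm(Yc - other_recon, 'fro'))           # larger for any other basis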



Because PCA minimizes $L_2$ norms (i.e. quadratic norms), it has the same issue as least squares or fitting a Gaussian: it is sensitive to outliers. Because the deviations of the outliers are squared, they dominate the total norm and therefore drive the PCA components.
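
As a quick illustration of that pull, the following sketch (again with made-up data) computes the first principal component of a tight linear cloud, then adds a single extreme point and shows the component rotating toward it.

    import numpy as np

    def first_pc(data):
        """Direction of the first principal component (up to sign)."""
        centered = data - data.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        return Vt[0]

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    Y = np.column_stack([x, 0.1 * x + 0.05 * rng.normal(size=100)])  # cloud along (1, 0.1)

    print(first_pc(Y))                   # close to (1, 0.1) normalized
    Y_out = np.vstack([Y, [0.0, 20.0]])  # one extreme point, far off the cloud's axis
    print(first_pc(Y_out))               # now close to (0, 1): the outlier dominates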



















edited Nov 26 '18 at 18:51
answered Nov 26 '18 at 4:40 by sega_sai





























