Concepts about data distributions

I'm a little confused here. Is there a connection between the data distribution and novelty detection? I mean, can the data distribution differ between novelty, noise, and outliers, so that it can be used to detect them?

Another point I'd like answered as well:

"Training data and test data are drawn from the same distribution or the same feature space."

So when exactly does the data distribution change? And when it changes, which set am I supposed to focus on? Where and when can this happen?

Tags: machine-learning

asked Nov 20 at 18:31 by Joseph_
edited Nov 20 at 22:55 by K.Dᴀᴠɪs

  • Your question is off topic.
    – M. Doosti Lakhani
    Nov 20 at 21:04

  • Not a programming question, better suited for Cross Validated.
    – desertnaut
    Nov 20 at 23:25

1 Answer

I would suggest reading the scikit-learn documentation on this. I think it is a good overview from which you can understand the difference between outlier and novelty detection. Basically, a novelty is a cluster of "outliers" that lie so close to each other that they probably represent a new kind of data partition rather than simply something strange. For the first such point there is definitely no way to discriminate between the two possibilities, but if you process your new data in batches and detect high-density regions in the outlier space, you might suspect some novelty in the data.
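
To make the batch idea concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: the k-nearest-neighbour distance score, the helper names (`knn_distance`, `flag_outliers`, `dense_groups`) and every threshold (`k`, `quantile`, `radius`, `min_neighbours`) are assumptions picked for illustration, and the data is synthetic. It flags points in a new batch that lie far from the training data, then checks whether the flagged points crowd together, which would hint at a novelty rather than isolated outliers.

```python
import math
import random


def knn_distance(point, reference, k):
    """Distance from `point` to its k-th nearest neighbour in `reference`."""
    return sorted(math.dist(point, r) for r in reference)[k - 1]


def flag_outliers(train, batch, k=5, quantile=0.99):
    """Return the batch points that lie unusually far from the training data."""
    # Score each training point against the training set itself; k + 1 skips
    # the zero distance from a point to itself.
    train_scores = sorted(knn_distance(p, train, k + 1) for p in train)
    # Hypothetical cut-off: the chosen quantile of the training scores.
    threshold = train_scores[int(quantile * (len(train_scores) - 1))]
    return [p for p in batch if knn_distance(p, train, k) > threshold]


def dense_groups(flagged, radius=0.5, min_neighbours=10):
    """Return the flagged points that have many other flagged points nearby."""
    return [
        p for p in flagged
        # Subtract 1 so that a point does not count itself as a neighbour.
        if sum(math.dist(p, q) <= radius for q in flagged) - 1 >= min_neighbours
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic training data: one Gaussian blob around the origin.
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]

# A new batch: ordinary points, three isolated strays, and a tight new group.
ordinary = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
strays = [(-8.0, 7.0), (9.0, -9.0), (-7.0, -8.0)]
new_group = [(rng.gauss(6, 0.05), rng.gauss(6, 0.05)) for _ in range(20)]
batch = ordinary + strays + new_group

flagged = flag_outliers(train, batch)
novel = dense_groups(flagged)

print(f"flagged as far from the training data: {len(flagged)}")
print(f"flagged points sitting in a dense region: {len(novel)}")
```

In practice the flagging step would more likely be handled by a scikit-learn detector such as `LocalOutlierFactor` (with `novelty=True`) or `OneClassSVM` fitted on the training data; the density check on the flagged points is the part that separates a possible novelty from scattered outliers.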



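For your second point, this is basically what concept drift is about: the distribution of the data changes over time, so the data a model sees after training no longer matches what it was trained on.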
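
As a rough illustration of how such a change could be noticed, the sketch below compares one feature of the training data with a window of newer data using a two-sample Kolmogorov-Smirnov statistic. This is only one possible check, chosen here as an assumption; the function names, the significance level `alpha` and the synthetic data are made up for the example. It only looks at the inputs, so it would catch a shift in a feature's distribution but not a change in the relationship between features and labels.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x.
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(reference, window, alpha=0.01):
    """Two-sample KS check using the large-sample critical value."""
    n, m = len(reference), len(window)
    # Asymptotic critical value: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > critical


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-ins for one feature: training data and two later windows.
reference = [rng.gauss(0, 1) for _ in range(500)]
same_source = [rng.gauss(0, 1) for _ in range(200)]
shifted = [rng.gauss(1.5, 1) for _ in range(200)]

print("window from the same source flagged:", drift_detected(reference, same_source))
print("window with a shifted mean flagged:", drift_detected(reference, shifted))
```

Here the training data serves as the fixed reference and the newer data is the set being monitored; when a window is flagged, that is a hint that the model may need to be re-evaluated or retrained on more recent data.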






answered Nov 20 at 23:03 by zsomko