Concepts about data distributions
I'm a little confused here.
Is there a connection between the data distribution and novelty detection? That is, does the data distribution differ between novelty, noise, and outliers in a way that can be used to detect them?
Another point I would like answered as well:
"training data and test data are drawn from the same distribution or the same feature space"
So when exactly does the data distribution change? And when it changes, which set am I supposed to focus on? Where and when can this happen?
machine-learning
Your question is off-topic.
– M. Doosti Lakhani
Nov 20 at 21:04
Not a programming question, better suited for Cross Validated
– desertnaut
Nov 20 at 23:25
edited Nov 20 at 22:55
K.Dᴀᴠɪs
asked Nov 20 at 18:31
Joseph_
1 Answer
I would suggest reading the scikit-learn documentation on novelty and outlier detection. I think it is a good overview from which you can understand the difference between the two. Basically, a novelty is a cluster of "outliers" that lie so close to each other that they probably represent a new kind of data partition rather than simply something strange. For the first such point there is no way to discriminate between the two possibilities, but if you process your new data in batches and detect high-density regions in the outlier space, you might suspect some novelty in the data.
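To make the batch idea concrete, here is a minimal sketch using only the Python standard library. It is not the scikit-learn approach: the z-score rule, the function names (`fit_reference`, `is_outlier`, `split_outliers`) and the values of `z_threshold`, `radius` and `min_neighbors` are all assumptions chosen for illustration. Points in a new batch that fall far from the training data are flagged as outliers, and an outlier with enough other outliers nearby is treated as suspected novelty rather than an isolated oddity.

```python
import math
import statistics


def fit_reference(train):
    """Per-feature mean and population standard deviation of the training data."""
    columns = list(zip(*train))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 if a feature has zero spread, to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def is_outlier(point, means, stds, z_threshold=3.0):
    """Flag a point if any feature is more than z_threshold deviations from the mean."""
    return any(abs(x - m) / s > z_threshold for x, m, s in zip(point, means, stds))


def split_outliers(batch, means, stds, radius=1.0, min_neighbors=3):
    """Split the outliers of a batch into dense groups and isolated points."""
    outliers = [p for p in batch if is_outlier(p, means, stds)]
    suspected_novelty, isolated = [], []
    for i, p in enumerate(outliers):
        # Count the other outliers that lie within `radius` of this one.
        neighbors = sum(
            1
            for j, q in enumerate(outliers)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            suspected_novelty.append(p)
        else:
            isolated.append(p)
    return suspected_novelty, isolated


# Hypothetical training data: a small grid centred on the origin.
train = [(float(x), float(y)) for x in range(-2, 3) for y in range(-2, 3)]
means, stds = fit_reference(train)

# Hypothetical new batch: two ordinary points, a tight group far away,
# and one lone point far away in another direction.
batch = [
    (0.5, -0.5), (1.0, 1.0),
    (8.0, 8.0), (8.2, 8.1), (7.9, 8.3), (8.1, 7.8),
    (-9.0, 0.0),
]
suspected_novelty, isolated = split_outliers(batch, means, stds)
print("suspected novelty:", suspected_novelty)
print("isolated outliers:", isolated)
```

With these made-up numbers the four points near (8, 8) end up in the suspected-novelty group and the point at (-9, 0) is reported as an isolated outlier. In practice the thresholds would need tuning to the data, and a density-based method from a library would likely be a better choice than this hand-rolled neighbor count.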
For your second point, a data distribution that changes over time is basically what concept drift is about.
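One rough way to check whether new data still looks like the training data is to compare the empirical distributions of a single feature. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) with the standard library only. The function name `ks_statistic`, the sample values and the cut-off `DRIFT_THRESHOLD` are assumptions for illustration. Note that this only looks at a change in the input distribution of one feature, which is one symptom of drift; it says nothing about a change in the relationship between inputs and labels.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so checking the sample points suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical values of one feature at training time and in two later batches.
train_feature = [float(i) for i in range(10)]
similar_batch = [i + 0.5 for i in range(10)]   # roughly the same range
shifted_batch = [i + 10.0 for i in range(10)]  # moved well outside the training range

DRIFT_THRESHOLD = 0.5  # arbitrary cut-off chosen for this sketch

for name, new_batch in [("similar", similar_batch), ("shifted", shifted_batch)]:
    gap = ks_statistic(train_feature, new_batch)
    print(name, round(gap, 3), "drift suspected" if gap > DRIFT_THRESHOLD else "no drift flagged")
```

A statistic near 0 means the two samples are distributed similarly, while a value near 1 means they barely overlap. A fixed cut-off like the one above is a simplification; a proper test would convert the statistic into a p-value that accounts for the sample sizes.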
answered Nov 20 at 23:03
zsomko