How to deal with a dataset with “periods of time” and missing data
I'm working on a dataset whose columns are points in time (e.g. August, September, etc.) and whose rows are the different measurements collected at each point.
The data is not clean at all: there are a lot of missing values, and I can't simply drop every row that contains them or fill them all in, so my idea was to divide the dataset into 4 smaller ones.
What kind of analysis can be performed on a dataset of this kind? Should I invert columns and rows?
dataset regression cluster-computing data-analysis missing-data
You could invert the columns/rows and then perform time series imputation with an R package like imputeTS. Whether this actually makes sense depends a lot on your dataset.
– stats0007
Nov 27 '18 at 17:11
I have very few observations, and the dataset consists of consumer satisfaction data. I have some doubts, but do you think it would be a good idea?
– Zhang_anlan
Nov 27 '18 at 17:27
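For readers who want to try the "invert the columns/rows, then impute" idea from the comments without R, here is a minimal sketch in Python with pandas. It is not imputeTS; it swaps in plain linear interpolation as a stand-in, and the table, the row labels (`q1`, `q2`, `q3`) and the values are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: rows are measurements, columns are months.
wide = pd.DataFrame(
    {
        "Aug": [3.0, 4.0, np.nan],
        "Sep": [np.nan, 4.5, 2.0],
        "Oct": [4.0, np.nan, 2.5],
        "Nov": [5.0, 5.5, np.nan],
    },
    index=["q1", "q2", "q3"],
)

# Transpose so each row is a point in time and each column is one series.
by_month = wide.T

# Share of missing values per series, to judge whether imputing is defensible.
missing_share = by_month.isna().mean()

# Linear interpolation down each column. limit_area="inside" fills only gaps
# that lie between two observed values and leaves leading/trailing gaps as NaN.
filled = by_month.interpolate(method="linear", limit_area="inside")

print(missing_share)
print(filled)
```

Gaps at the start or end of a series are deliberately left missing here, since filling them would be extrapolation rather than interpolation. With very few observations per series, checking `missing_share` first seems a sensible way to decide whether any imputation is justified.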
asked Nov 23 '18 at 7:59 by Zhang_anlan (edited Nov 23 '18 at 8:14)
1 Answer
A time series regression with missing data is a special case within statistical analysis. Simply rearranging the dataset is not the solution.
As I understand it, periodicity analysis and spectral analysis are performed to identify the sinusoid of best fit, i.e. a sine wave is fitted across the gaps left by the missing data points, and regression is one way of fitting it to the existing data.
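A rough sketch of that idea, assuming the period is already known (12 time steps here, which is an assumption and not something the question states): fit `a + b*sin(wt) + c*cos(wt)` by ordinary least squares to the observed points only, then read the fitted curve at the missing positions. The function name `fit_sinusoid` and the synthetic series are hypothetical.

```python
import numpy as np


def fit_sinusoid(t, y, period):
    """Least-squares fit of y ~ a + b*sin(wt) + c*cos(wt), ignoring NaNs in y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    omega = 2.0 * np.pi / period
    # Design matrix: intercept, sine and cosine terms at the assumed period.
    design = np.column_stack(
        [np.ones_like(t), np.sin(omega * t), np.cos(omega * t)]
    )
    # Estimate the coefficients from the observed rows only.
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    # Evaluate the fitted curve at every time point, including the gaps.
    return coef, design @ coef


# Synthetic, noise-free series with four values knocked out.
t = np.arange(24)
true = 3.0 + 1.5 * np.sin(2.0 * np.pi * t / 12)
y = true.copy()
y[[4, 5, 13, 20]] = np.nan

coef, fitted = fit_sinusoid(t, y, period=12)

# Keep observed values as they are; use the fitted curve only where y is missing.
filled = np.where(np.isnan(y), fitted, y)
print(coef)
```

Writing the sinusoid as a sine plus a cosine keeps the model linear in its coefficients, so no nonlinear optimiser is needed. If the period is not known in advance it would have to be estimated first, which is where the spectral analysis comes in.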
The same question has previously been raised on the Stats Stack Exchange, with an approach based on ARIMA (autoregressive integrated moving average). Personally, I am not convinced by this approach, because there is likely to be a more specialised solution.
https://stats.stackexchange.com/questions/121414/how-do-i-handle-nonexistent-or-missing-data
answered Nov 23 '18 at 8:28 by Michael G.