undefined columns selected error in R when trying to subset using sapply
I have been tearing my hair out over this for the last hour. The following code was working perfectly a couple of hours ago, and now I have no idea why it no longer does. I have searched for other questions about the undefined columns selected error, and I think I have accounted for everything in those answers. I am sure there is some tiny thing I have overlooked or accidentally left in, but I can't see it!
I have a data frame with both factor and numeric variables. I want to subset it so that I keep all of the factor variables and remove the numeric variables whose column mean is < 0.1.
I found the following code in another question on Stack Overflow. Slightly modified, it worked well on my test data (a smaller sub-dataset I am using for testing before running the code on a big 3 GB object):
meanfunction01 <- function(x){
  if(is.numeric(x)){
    mean(x) > 0.1
  } else {
    TRUE
  }
}
#then apply function to data table
Zdata <- Data1[,sapply(Data1, meanfunction01)]
I swear I was using this a few hours ago, but when I came back to it and tried to use it again it had stopped working, and it now just returns the following error:
Error in `[.data.frame`(Data1, , sapply(Data1, meanfunction01)) :
undefined columns selected
I was trying to modify the function so that it would loop over multiple objects (I have 54 objects I want to apply it to, and didn't want to type them all manually), but I don't think I edited the original function, and now it has stopped working.
A brief str() of my data:
> str(Data1[1:10])
'data.frame': 11 obs. of 10 variables:
$ Name : Factor w/ 11688 levels "GTEX-1117F-0226-SM-5GZZ7",..: 8186 8242 8262 8270 8343 8388 8403 8621 8689 8709 ...
$ SEX : Factor w/ 2 levels "Female","Male": 1 2 2 1 1 2 2 1 2 1 ...
$ AGE : Factor w/ 6 levels "20-29","30-39",..: 4 4 1 3 3 1 3 3 3 2 ...
$ CIRCUMSTANCES: Factor w/ 5 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Tissue.x : Factor w/ 53 levels "Adipose_Subcutaneous",..: 7 7 7 7 7 7 7 7 7 7 ...
$ ENSG00000223972.4 : num 0 0.0701 0.0339 0.1149 0.0549 ...
$ ENSG00000227232.4 : num 12.5 17.2 13.1 16 15.7 ...
$ ENSG00000243485.2 : num 0.0717 0 0.1508 0 0.061 ...
$ ENSG00000237613.2 : num 0 0.0654 0 0.0402 0.0768 ...
$ ENSG00000268020.2 : num 0 0.0421 0.0611 0 0 ...
r
It will be very difficult to guess what the problem might be without a working example that demonstrates the issue. You may find some helpful tips in stackoverflow.com/questions/5963269/…
– Ista
Nov 22 '18 at 16:17
OK, in trying to head() my data to create an example dataset for you, I think I have narrowed down the problem. When I re-loaded the object into my environment, some of my columns seem to have been classed as integer rather than numeric. When I run the code on the head([1:20]) subset it works fine, as the integer columns only start appearing later in the data, around column 10000 or so. Now I am trying to figure out how to recategorise these columns as numeric, which is a different problem entirely. Thanks anyway!
– Phil D
Nov 22 '18 at 16:52
Why just suspect that the data structure changes as you move along your multiple columns? Why not check how the columns are structured by running head() without restricting the range of columns? If it turns out that some columns are indeed integers rather than numeric, then restructure them using something along these lines: Data1[,4:33] <- lapply(Data1[,4:33], as.numeric)
– Chris Ruehlemann
Nov 22 '18 at 18:44
Well, mostly because using head() without restricting columns in a data frame with >50000 columns will be difficult to check manually! I have started restructuring using lapply though
– Phil D
Nov 26 '18 at 11:24
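With more than 50,000 columns, one way to check the column types without reading head() output is to tabulate the class of every column. The sketch below is only an illustration: `toy` is a made-up data frame, not the poster's data. It then converts only the integer columns, selecting them by type rather than by a hard-coded range such as 4:33. Note that is.numeric() returns TRUE for integer vectors as well, so integer columns on their own would not normally make the original filter fail.

```r
# Toy stand-in for Data1 (hypothetical data, for illustration only)
toy <- data.frame(
  SEX   = factor(c("Female", "Male", "Male")),
  geneA = c(0, 0.07, 0.03),   # stored as double
  geneB = c(12L, 17L, 13L)    # stored as integer
)

# Tabulate the first class of every column instead of eyeballing head()
col_classes <- vapply(toy, function(x) class(x)[1], character(1))
print(table(col_classes))

# Integer columns also count as numeric
print(is.numeric(toy$geneB))  # TRUE

# Convert only the integer columns to double, selected by type
int_cols <- vapply(toy, is.integer, logical(1))
toy[int_cols] <- lapply(toy[int_cols], as.numeric)
```

Selecting by type keeps factor columns untouched and avoids having to know where the integer columns start.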
asked Nov 22 '18 at 16:05
Phil D
1 Answer
If your only issue is changing the class of the integer variables in your data.frame, but you have many columns (>10000), you may want to consider converting your data.frame into a data.table. Your code would then look like this:
library(data.table)
Data1 <- as.data.table(Data1)  # or, if your data are in a CSV file, read them with fread() instead of read.csv(); fread() returns a data.table
Then you just need to find the integer columns:
int_cols <- names(which(sapply(Data1, is.integer)))
Putting it all together using the data.table syntax:
Data1[, (int_cols) := lapply(.SD, as.numeric), .SDcols = int_cols]
Note that you don't need to assign the result of the line above to anything, because := updates the data.table by reference rather than making a copy, which is typically faster than the equivalent operation on a data.frame or tibble. Running that line updates your Data1 object in place. The classes of the other, non-integer columns (i.e., the factors) remain unchanged.
Please update if you have further questions, but this should answer your comment. Best of luck!
Thanks for the advice, but I figured out the solution (to this problem at least) myself in the end. It wasn't the integers causing this problem (although they caused a different one); it was simply the presence of NA values, which I didn't realise I could ignore by using na.rm = TRUE inside the mean() function. Sorry for the very basic questions requiring very basic solutions, but I am still very new to coding and teaching myself!
– Phil D
Nov 26 '18 at 11:35
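The NA explanation can be reproduced on a small example. mean() returns NA when a column contains NA, so the logical index built by sapply() contains NA, and `[.data.frame` then reports undefined columns selected. A minimal sketch, assuming a made-up data frame `toy` and a hypothetical helper name `keep_column` (neither is from the original post):

```r
# Toy data with an NA in one numeric column (hypothetical, for illustration)
toy <- data.frame(
  SEX   = factor(c("Female", "Male", "Male")),
  geneA = c(0, 0.07, NA),       # mean is NA unless NAs are dropped
  geneB = c(12.5, 17.2, 13.1)
)

# Logic from the question: an NA mean makes the comparison NA, not TRUE/FALSE
meanfunction01 <- function(x) {
  if (is.numeric(x)) mean(x) > 0.1 else TRUE
}
keep_bad <- sapply(toy, meanfunction01)  # SEX = TRUE, geneA = NA, geneB = TRUE

# An NA in the column index triggers the error; capture its message
err <- tryCatch(toy[, keep_bad], error = function(e) conditionMessage(e))
print(err)

# Fix: drop NAs when computing the mean (threshold is an assumed parameter)
keep_column <- function(x, threshold = 0.1) {
  if (is.numeric(x)) mean(x, na.rm = TRUE) > threshold else TRUE
}
keep_ok  <- vapply(toy, keep_column, logical(1))
filtered <- toy[, keep_ok, drop = FALSE]

# The same filter applied to several data frames held in a list
filtered_list <- lapply(list(first = toy, second = toy), function(d) {
  d[, vapply(d, keep_column, logical(1)), drop = FALSE]
})
```

Holding the data frames in a list and using lapply() is one way to avoid typing out many object names. One caveat: a column that is entirely NA still gives NaN with na.rm = TRUE, so the comparison is NA again; wrapping the comparison in isTRUE() would drop such columns instead of raising the error.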
answered Nov 23 '18 at 0:11
Jason Johnson