Can't give a subset when using randomForest inside a function











I want to write a function that calls randomForest() from the randomForest package. randomForest() takes a "subset" argument, which is a vector of row numbers of the data frame to use for training. However, if I pass this argument when calling randomForest() from inside another function, I get the error:



 Error in eval(substitute(subset), data, env) : 
object 'tr_subset' not found


Here is a reproducible example, in which we try to train a random forest to classify a response "type" as either "A" or "B", based on three numerical predictors:



library(randomForest)

# define a random data frame to train with
train.data = data.frame(
  type = rep(NA, times = 500),
  x = runif(500),
  y = runif(500),
  z = runif(500)
)
# randomly label roughly half of the rows "A" and the rest "B"
train.data$type[runif(500) >= 0.5] = "A"
train.data$type[is.na(train.data$type)] = "B"
train.data$type = as.factor(train.data$type)

# define the training range
training.range = sample(500)[1:300]

# formula to use
tr_form = formula(type ~ x + y + z)

# Function that includes the randomForest function
train_rf = function(form, all_data, tr_subset) {
  p = randomForest(
    formula = form,
    data = all_data,
    subset = tr_subset,
    na.action = na.omit
  )

  return(p)
}

# test the newly defined function
test_tree = train_rf(form = tr_form, all_data = train.data, tr_subset = training.range)


Running this gives the error:



 Error in eval(substitute(subset), data, env) : 
object 'tr_subset' not found


If, however, I remove subset = tr_subset from the randomForest call and tr_subset from the arguments of train_rf, the code runs fine, but the whole data set is used for training!



Note that the subset argument works completely fine when randomForest is called directly rather than from inside another function, and this is the intended use of the argument, as described in the vignette linked above.



I know that in the meantime I could build a separate training set containing only the required rows and train on all of it, but is there a reason why my original code doesn't work?
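For reference, the interim workaround described above could be sketched roughly as follows. The names train_rf_rows, demo.data and demo.rows are made up for illustration, and the data are random stand-ins with the same columns as the example. The fit is only attempted if the randomForest package is installed.

```r
# Hypothetical variant of train_rf: select the rows first, then train on all of them,
# so the subset argument is never passed to randomForest.
train_rf_rows = function(form, all_data, tr_subset) {
  training_rows = all_data[tr_subset, , drop = FALSE]
  randomForest::randomForest(
    formula = form,
    data = training_rows,
    na.action = na.omit
  )
}

# Stand-in data with the same columns as the example (values are made up).
set.seed(42)
demo.data = data.frame(
  type = factor(sample(c("A", "B"), 500, replace = TRUE)),
  x = runif(500),
  y = runif(500),
  z = runif(500)
)
demo.rows = sample(500)[1:300]

# The row selection itself needs nothing beyond base R: 300 rows are kept.
nrow(demo.data[demo.rows, , drop = FALSE])

# Only fit if the package is available on this machine.
demo_fit = NULL
if (requireNamespace("randomForest", quietly = TRUE)) {
  demo_fit = train_rf_rows(type ~ x + y + z, demo.data, demo.rows)
}
```

This avoids the error rather than explaining it, and it copies the selected rows into a new data frame, which may matter for very large data sets.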



Thanks.



EDIT: I conjecture that, because subset() is also a base R function, R is getting confused and thinks I want to call that function rather than supply the subset argument of randomForest. I'm not an expert, though, so I may be wrong.
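One way to probe this conjecture without randomForest is to call base R's model.frame() directly, since the error text eval(substitute(subset), data, env) matches the line in model.frame.default that evaluates subset in the data and then in the formula's environment. The sketch below assumes, without having checked every package version, that randomForest's formula interface builds its model frame this way. The function names frame_in_wrapper and frame_in_wrapper_fixed, and the stand-in data, are hypothetical.

```r
# Stand-in data with the same columns as the example (values are made up).
set.seed(1)
demo.data = data.frame(
  type = factor(sample(c("A", "B"), 500, replace = TRUE)),
  x = runif(500),
  y = runif(500),
  z = runif(500)
)
demo.rows = sample(500)[1:300]

# Created outside any wrapper, so environment(demo.form) is not the wrapper's frame.
demo.form = type ~ x + y + z

# Mirrors the failing wrapper: tr_subset only exists inside this function.
frame_in_wrapper = function(form, all_data, tr_subset) {
  model.frame(formula = form, data = all_data, subset = tr_subset)
}

# Same wrapper, but the formula's environment is pointed at the function's own
# frame, so the lookup for tr_subset can succeed there.
frame_in_wrapper_fixed = function(form, all_data, tr_subset) {
  environment(form) = environment()
  model.frame(formula = form, data = all_data, subset = tr_subset)
}

# Should reproduce "object 'tr_subset' not found".
res = try(frame_in_wrapper(demo.form, demo.data, demo.rows), silent = TRUE)
failed = inherits(res, "try-error")

# Keeps only the 300 selected rows.
fixed = frame_in_wrapper_fixed(demo.form, demo.data, demo.rows)
nrow(fixed)
```

If this behaves as expected, it suggests that the problem is where tr_subset is looked up (the data frame, then the environment in which the formula was created) rather than a clash with the base subset() function. Under that assumption, adding environment(form) = environment() as the first line of train_rf may be enough to make the original call work.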










      r runtime-error random-forest






      edited Nov 21 at 15:17

























      asked Nov 20 at 13:57









      GMSL
