Error in RStudio while running decision tree (mac)
I am running a CART decision tree on a training set that I tokenized with quanteda for a routine text analysis task. I converted the resulting DFM to a data frame and appended the class attribute I am predicting.
Like many DFMs, the table is very wide (33k columns) but contains only about 5,500 rows (documents). Calling rpart on my training set returns a stack overflow error.
In case it matters: to speed up the calculations, I am using the doSNOW library so I can run the model in parallel on 3 of my 4 cores.
I've looked at this answer but can't figure out how to do the equivalent on my Mac workstation to see whether the same solution would work for me. There is also a chance that even if I increase the ppsize of RStudio, I will still run into this error.
So my question is: how do I increase the maxppsize of RStudio on a Mac or, more generally, how can I fix this stack overflow so I can run my model?
Thanks!
r rstudio
edited Nov 20 at 23:35 by Dave
asked Nov 20 at 16:59 by user10463769
1 Answer
In the end, I found that Macs don't have this command-line option, since the Mac version of RStudio uses all available memory by default.
So I fixed the problem by reducing the complexity of the task, that is, by reducing the sparsity of the matrix. I cleaned the document-term matrix by removing all tokens that did not occur in at least 5% of the corpus. This was enough to take the matrix from 33k columns down to a much more manageable 3k columns while still leaving a highly representative DFM.
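For anyone who wants to try the same kind of trimming, here is a minimal base-R sketch of a document-frequency filter. It is not the exact code I ran: the toy matrix, the term names, and the variable names (`counts`, `doc_freq`, `min_docfreq`, `trimmed`) are all made up for illustration, and only the 5% threshold comes from the description above. On a real quanteda DFM, `dfm_trim()` with `min_docfreq = 0.05` and `docfreq_type = "prop"` should apply the same kind of filter without converting to a plain matrix first.

```r
# Toy document-term count matrix: 40 documents, 4 terms (hypothetical data).
counts <- matrix(0L, nrow = 40, ncol = 4,
                 dimnames = list(paste0("doc", 1:40),
                                 c("model", "tree", "split", "zebra")))
counts[, "model"]    <- 1L   # occurs in every document
counts[, "tree"]     <- 2L   # occurs in every document
counts[1:4, "split"] <- 1L   # occurs in 4 of 40 documents (10%)
counts[1, "zebra"]   <- 3L   # occurs in 1 of 40 documents (2.5%)

# Document frequency: share of documents in which each term occurs at least once.
doc_freq <- colSums(counts > 0) / nrow(counts)

# Keep only the terms that occur in at least 5% of the documents.
min_docfreq <- 0.05
trimmed <- counts[, doc_freq >= min_docfreq, drop = FALSE]

print(colnames(trimmed))
print(dim(trimmed))
```

The filter looks at how many documents contain a term, not at how often the term occurs overall, so a term repeated many times in a single document (like `zebra` here) is still dropped. The trimmed table can then be turned into a data frame and passed to rpart as before.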
answered Dec 2 at 7:30 by user10463769