Run Selenium tests in parallel on Azure Batch
I am using the latest version of R on Windows 7.
I would like to run many tests in parallel using RSelenium, so my question is:
- What is the recommended way to run many RSelenium tests?
Let's say I would like to run 1000 tests and each test takes 1 hour. Running the tests one by one takes a lot of time (24 tests per day, so about 42 days in total). I know how to use the doParallel and foreach packages to run tests in parallel on my machine (see "Run RSelenium in parallel"), but sometimes this is not enough. I would like to run around 100 tests in parallel. I tried to use Azure Batch for that, but I get lots of errors on some nodes when starting the Selenium server.
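As a rough check of these numbers, here is a minimal sketch. It assumes the tests are split evenly across workers and ignores cluster start-up and scheduling time; both are simplifications, and `days_needed` is just an illustrative helper name.

```r
n_tests <- 1000        # number of tests, as stated above
hours_per_test <- 1    # duration of one test, as stated above

# Wall-clock days needed when `workers` tests run at the same time.
# Assumes an even split of tests and no scheduling overhead.
days_needed <- function(workers) {
  ceiling(n_tests / workers) * hours_per_test / 24
}

days_needed(1)    # sequential run: roughly 41.7 days
days_needed(100)  # 100 tests at a time: roughly 0.42 days (10 hours)
```

Under these assumptions, 100 parallel workers would bring the total down from about six weeks to about ten hours.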
More concretely, I have written this Dockerfile:
FROM rocker/r-base:latest
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    libxml2-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    gnupg2 \
    libfftw3-dev \
    libtiff-dev \
    libx11-dev \
    libcairo2-dev \
    libxt-dev \
    firefox
#RUN add-apt-repository -y ppa:mozillateam/firefox-next
## Install Java
RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" \
    | tee /etc/apt/sources.list.d/webupd8team-java.list \
  && echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" \
    | tee -a /etc/apt/sources.list.d/webupd8team-java.list \
  && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 \
  && echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" \
    | /usr/bin/debconf-set-selections \
  && apt-get update \
  && apt-get install -y oracle-java8-installer \
  && update-alternatives --display java \
  && rm -rf /var/lib/apt/lists/* \
  && apt-get clean \
  && R CMD javareconf
## make sure Java can be found in rApache and other daemons not looking in R ldpaths
RUN echo "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/" > /etc/ld.so.conf.d/rJava.conf
RUN /sbin/ldconfig
# Install the R Packages from CRAN
RUN Rscript -e 'install.packages(c("Cairo", "Rcpp", "RSelenium", "httr", "rvest", "imager", "RCurl"))'
I used the doAzureParallel package to run many scripts in parallel:
# prepare Azure batch
setwd("E:/data/R/web_scraping/zk_ba/azure")
library(doAzureParallel)
setVerbose(TRUE)
setAutoDeleteJob(FALSE)
generateCredentialsConfig("credentials.json")
setCredentials("credentials.json")
generateClusterConfig("cluster.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)
getDoParWorkers()
opt <- list(wait = FALSE)
jobId <- foreach(
  i = 1:n_cluster,
  # .packages = c("RSelenium", "imager", "httr", "RCurl", "rvest"),
  # .combine = 'rbind',
  .errorhandling = "pass",
  .options.azure = opt,
  .export = c("metadata", "first_step", "parcele_df", "vlasnici_df", "status_teret_df", "n_cluster")
) %dopar% {
  library(RSelenium)
  library(imager)
  library(httr)
  library(RCurl)
  library(rvest)
  #-----------------------------------#
  # START SELENIUM AND PREPARE #
  #-----------------------------------#
  if (first_step == TRUE) {
    tryCatch({
      rD <<- RSelenium::rsDriver(
        browser = "firefox",
        extraCapabilities = list(
          "moz:firefoxOptions" = list(
            args = list('--headless')
          )
        )
      )
    }, error = function(e) NA)
    driver <<- rD$client
    driver$open()
    driver$navigate("http://www.e-grunt.ba/")
    Sys.sleep(3L)
..
}
but this returns an error on many nodes:
<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused>
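One detail of the snippet above: when `rsDriver()` fails, the error handler returns `NA` without assigning `rD`, so the failure only surfaces later at `rD$client`. Below is a minimal sketch of a retry wrapper that makes a start-up failure explicit. The names `start_with_retry`, `max_tries` and `wait_sec` are hypothetical, and the commented usage, which tries a different port on each attempt, is an untested guess rather than a confirmed fix for the error shown.

```r
# Hypothetical helper: call `start_fn(attempt)` until it succeeds or
# `max_tries` attempts have failed, pausing `wait_sec` seconds in between.
start_with_retry <- function(start_fn, max_tries = 3, wait_sec = 1) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Capture the error object instead of silently discarding it
    result <- tryCatch(start_fn(attempt), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (attempt < max_tries) Sys.sleep(wait_sec)
  }
  stop("start-up failed after ", max_tries, " attempts: ",
       conditionMessage(last_error))
}

# Possible use with RSelenium (not run here; the port offset is an assumption):
# rD <- start_with_retry(function(attempt) {
#   RSelenium::rsDriver(browser = "firefox", port = 4567L + attempt,
#                       verbose = FALSE)
# })
# driver <- rD$client
```

With this shape, a node that cannot start the server stops with the underlying error message instead of continuing with an undefined `rD`.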
What would be the general advice for situations where we need to use RSelenium in lots of parallel tests?
r selenium foreach rselenium azure-batch
But I think I have to start the driver on the VM, not on every node, and I am using 4 VMs and 4 nodes. I don't know why the same port would be a problem if the VMs are independent of one another. I have also tried running Selenium sessions in parallel on a local port, calling the rsDriver function only once. All other nodes successfully connected to this driver on that one port.
– Mislav
Nov 29 '18 at 8:53
edited Dec 3 '18 at 0:19 by Tats_innit
asked Nov 26 '18 at 11:04 by Mislav