Run Selenium parallel tests on Azure Batch



























I am using the latest version of R on Windows 7.

I would like to run many tests in parallel using RSelenium, so my question is:

  • What is the recommended way to run many RSelenium tests?

Let's say I would like to run 1000 tests and each test takes 1 hour. Running the tests one by one takes a lot of time (24 tests per day, so about 42 days in total). I know how to use the doParallel and foreach packages to run tests in parallel on my machine (see "Run RSelenium in parallel"), but sometimes this is not enough. I would like to run around 100 tests in parallel. I tried to use Azure Batch for that, but I get lots of errors on some nodes when starting the Selenium server.
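
To put these numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes perfectly even scheduling and no start-up overhead, which a real cluster is unlikely to achieve.

```r
# Back-of-the-envelope run time for the numbers in the question.
n_tests        <- 1000   # total number of tests
hours_per_test <- 1      # duration of one test, in hours
n_workers      <- 100    # desired number of parallel workers

# Sequential execution: one test after another, 24 hours per day.
serial_days <- n_tests * hours_per_test / 24

# Idealised parallel execution: tests run in "waves" of n_workers.
parallel_hours <- ceiling(n_tests / n_workers) * hours_per_test

cat(sprintf("serial: %.1f days, parallel: %.0f hours\n",
            serial_days, parallel_hours))
```

Under these idealised assumptions the job shrinks from roughly 42 days to about 10 hours, which is why running on around 100 workers is attractive.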



More concretely, I have written this Dockerfile:



    FROM rocker/r-base:latest

    RUN apt-get update \
        && apt-get install -y --no-install-recommends \
            libxml2-dev \
            libcurl4-openssl-dev \
            libssl-dev \
            gnupg2 \
            libfftw3-dev \
            libtiff-dev \
            libx11-dev \
            libcairo2-dev \
            libxt-dev \
            firefox

    #RUN add-apt-repository -y ppa:mozillateam/firefox-next

    ## Install Java
    RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" \
            | tee /etc/apt/sources.list.d/webupd8team-java.list \
        && echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" \
            | tee -a /etc/apt/sources.list.d/webupd8team-java.list \
        && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 \
        && echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" \
            | /usr/bin/debconf-set-selections \
        && apt-get update \
        && apt-get install -y oracle-java8-installer \
        && update-alternatives --display java \
        && rm -rf /var/lib/apt/lists/* \
        && apt-get clean \
        && R CMD javareconf

    ## make sure Java can be found in rApache and other daemons not looking in R ldpaths
    RUN echo "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/" > /etc/ld.so.conf.d/rJava.conf
    RUN /sbin/ldconfig

    # Install the R Packages from CRAN
    RUN Rscript -e 'install.packages(c("Cairo", "Rcpp", "RSelenium", "httr", "rvest", "imager", "RCurl"))'


I used the doAzureParallel package to execute many scripts in parallel:



    # prepare Azure batch
    setwd("E:/data/R/web_scraping/zk_ba/azure")
    library(doAzureParallel)
    setVerbose(TRUE)
    setAutoDeleteJob(FALSE)
    generateCredentialsConfig("credentials.json")
    setCredentials("credentials.json")
    generateClusterConfig("cluster.json")
    cluster <- makeCluster("cluster.json")
    registerDoAzureParallel(cluster)
    getDoParWorkers()
    opt <- list(wait = FALSE)

    jobId <- foreach(
      i = 1:n_cluster,
      # .packages = c("RSelenium", "imager", "httr", "RCurl", "rvest"),
      # .combine = 'rbind',
      .errorhandling = "pass",
      .options.azure = opt,
      .export = c("metadata", "first_step", "parcele_df", "vlasnici_df", "status_teret_df", "n_cluster")
    ) %dopar% {

      library(RSelenium)
      library(imager)
      library(httr)
      library(RCurl)
      library(rvest)

      #-----------------------------------#
      # START SELENIUM AND PREPARE        #
      #-----------------------------------#

      if (first_step == TRUE) {
        tryCatch({
          rD <<- RSelenium::rsDriver(
            browser = "firefox",
            extraCapabilities = list(
              "moz:firefoxOptions" = list(
                args = list('--headless')
              )
            )
          )
        }, error = function(e) NA)
        driver <<- rD$client
        driver$open()
        driver$navigate("http://www.e-grunt.ba/")
        Sys.sleep(3L)
        ..
      }


but this returns an error on many nodes:



<simpleError in checkError(res): Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused>


What would be the general advice for situations where we need to run lots of RSelenium tests in parallel?
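
One way to narrow down a "Connection refused" error like the one reported here is to check whether anything is listening on the port at all. The following is a minimal sketch using only base R. The helper names `port_is_open()` and `wait_for_port()` are hypothetical and not part of RSelenium or doAzureParallel; the port 4567 is taken from the error message.

```r
# Hypothetical helper: TRUE if something accepts TCP connections on host:port.
port_is_open <- function(host = "localhost", port = 4567L, timeout = 1) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL   # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Hypothetical helper: poll the port until it opens or max_wait seconds pass.
wait_for_port <- function(host = "localhost", port = 4567L,
                          max_wait = 60, interval = 1) {
  deadline <- Sys.time() + max_wait
  repeat {
    if (port_is_open(host, port)) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}
```

Run on a failing node, a `FALSE` result from `wait_for_port()` would suggest that the Selenium server never came up there (for example because Java or the browser failed to start), rather than that the client merely connected too early.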



































  • But I think I have to start the driver on the VM, not on every node, and I am using 4 VMs and 4 nodes. I don't know why using the same port would be a problem if the VMs are independent of one another. I have also tried running Selenium sessions in parallel on a local port, calling the rsDriver function only once. All other nodes successfully connected to this driver on that one port.

    – Mislav
    Nov 29 '18 at 8:53
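
If several foreach tasks can end up on the same host, one thing to rule out is that they all request the same port (4567 in the error message). A minimal sketch, assuming a hypothetical helper `task_port()` that is not part of RSelenium or doAzureParallel, derives a distinct port from the loop index `i`:

```r
# Hypothetical helper: maps task index i (1, 2, 3, ...) to a unique port,
# counting up from base_port. Not part of RSelenium or doAzureParallel.
task_port <- function(i, base_port = 4567L) {
  i <- as.integer(i)
  stopifnot(length(i) == 1L, !is.na(i), i >= 1L)
  port <- base_port + i - 1L
  stopifnot(port <= 65535L)  # highest valid TCP port
  port
}

task_port(1)  # 4567
task_port(4)  # 4570
```

The result could then be passed to `rsDriver()` through its `port` argument, for example `rsDriver(port = task_port(i), browser = "firefox")`. Whether this helps depends on how tasks are actually placed on VMs, which the post does not show.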
















r selenium foreach rselenium azure-batch














edited Dec 3 '18 at 0:19 by Tats_innit

asked Nov 26 '18 at 11:04 by Mislav












