PBS - Job exceeded reserved number of CPUs

I'm training a deep learning network called darknet/YOLO on a remote server with NVIDIA graphics cards.

I'm submitting a job which trains the neural network on a GPU, and that part works well.

The problem is the huge number of processors the job consumes. I usually submit a job with 8 processors, like this:

qsub -q gpu -l select=1:ncpus=8:ngpus=1:mem=15gb:gpu_cap=cuda61

but it is always killed because it exceeds the reserved number of processors. Even if I request 20 processors, the limit is exceeded.

I don't know why the job consumes so many processors on the server. I can run the same job on my notebook with an Intel i5 processor, but there it is too slow.

What I've tried:

1) Set cgroups=cpuacct, which should force the job NOT to use more processors than assigned, but it didn't work at all. It seems the restriction only takes effect when the server has no spare resources for other jobs. When there are free processors, the restriction is not enforced (https://drill.apache.org/docs/configuring-cgroups-to-control-cpu-usage/#cpu-limits)

2) Set place=exclhost, which keeps the job from being killed when it exceeds its assigned resources. On the other hand, it takes about 7 days for a job with this flag to even start, and I have to train the network every day.

Question:

I don't need these processors and I don't understand why the job uses so many of them. How can I force the job NOT to exceed the given number of processors? Or is there some other way I could solve this kind of problem?
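
One way to narrow this down is to compare how many threads the training process actually runs with the ncpus value in the request. The following is only a minimal sketch, assuming a Linux node where /proc is available. The names count_threads and NCPUS_REQUESTED are hypothetical, and OMP_NUM_THREADS only has an effect if the binary was built with OpenMP (whether that is the case for this darknet build is an assumption, not something stated above).

```shell
#!/bin/sh
# Hypothetical helper: print the number of threads of a process,
# read from the "Threads:" field of /proc/<pid>/status (Linux only).
count_threads() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Assumption: this mirrors the ncpus value from the qsub request.
NCPUS_REQUESTED=8

# Caps OpenMP worker threads; has no effect on threads that the
# program creates by other means (for example plain pthreads).
export OMP_NUM_THREADS="$NCPUS_REQUESTED"

# Demonstration on the current shell; in a job script you would start
# the training in the background and pass its PID ($!) instead.
echo "threads in this shell: $(count_threads $$)"
```

If the reported thread count is far above the requested ncpus, the extra CPU usage probably comes from threads the program spawns itself, and the limit has to be set inside the program or its configuration rather than in the scheduler request.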

edited Nov 20 at 6:33

asked Nov 19 at 15:02

Filip Kočica

linux pbs yolo darknet


1 Answer

It is more likely that there is a mismatch between the restrictions the admin has set for that queue and your request. So ping your admin and get the details of the queues (e.g. the ppn and GPU limits of queue1).
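
As a rough sketch of how those queue details could be looked up without waiting for the admin: qstat -Qf prints the full attribute list of a queue, and the lines starting with resources_ hold its limits and defaults. The helper name show_queue_limits is hypothetical, and the queue name gpu is taken from the question. Which attributes appear depends on how the site configured the queue.

```shell
#!/bin/sh
# Hypothetical helper: print the resource-related attributes of a PBS queue.
show_queue_limits() {
    queue="$1"
    # qstat only exists on hosts with the PBS client tools installed.
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found; run this on a PBS login node" >&2
        return 1
    fi
    # -Q selects queue status, -f asks for the full attribute listing.
    qstat -Qf "$queue" | grep -i 'resources_'
}
```

On a login node this would be called as show_queue_limits gpu, and the output can then be compared with the select statement used in the qsub request.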

answered Nov 19 at 23:46

nav
