MongoDB draining shard but balancer not running? (removeShard taking too much time)

I'm trying to downsize a sharded cluster that currently has 8 shards to one with 4 shards.

I started with the 8th shard and tried to remove it first:

    db.adminCommand( { removeShard : "rs8" } );
    ----
    {
        "msg" : "draining ongoing",
        "state" : "ongoing",
        "remaining" : {
            "chunks" : NumberLong(1575),
            "dbs" : NumberLong(0)
        },
        "note" : "you need to drop or movePrimary these databases",
        "dbsToMove" : [ ],
        "ok" : 1
    }

So there are 1575 chunks to be migrated to the rest of the cluster.
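To see where chunks currently live, and how many are still assigned to `rs8`, you can count the documents in `config.chunks` per shard. This is a minimal sketch for the mongo shell. It assumes a 3.4-era cluster where `config.chunks` documents still carry `ns` and `shard` fields. The helper name `chunksPerShard` and the namespace `mydb.col` are hypothetical.

```javascript
// Count chunks per shard by reading config.chunks.
// configDb is the config database handle, e.g. db.getSiblingDB("config")
// from a mongos shell. Pass ns (e.g. "mydb.col") to restrict the count to one
// collection; omit it to count chunks of all sharded collections.
function chunksPerShard(configDb, ns) {
  var query = ns ? { ns: ns } : {};
  var counts = {};
  configDb.chunks.find(query).forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// Usage from a mongos shell (hypothetical namespace):
// printjson(chunksPerShard(db.getSiblingDB("config")));
// printjson(chunksPerShard(db.getSiblingDB("config"), "mydb.col"));
```

Called without `ns`, the count for `rs8` should roughly track the `remaining.chunks` value that `removeShard` reports.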

But sh.isBalancerRunning() returns false, and the output of sh.status() looks like this:

      ...
      ...
      active mongoses:
            "3.4.10" : 16
      autosplit:
            Currently enabled: yes
      balancer:
            Currently enabled:  yes
            Currently running:  no
    NaN
            Failed balancer rounds in last 5 attempts:  0
            Migration Results for the last 24 hours:
                    59 : Success
                    1 : Failed with error 'aborted', from rs8 to rs1
                    1 : Failed with error 'aborted', from rs2 to rs6
                    1 : Failed with error 'aborted', from rs8 to rs5
                    4929 : Failed with error 'aborted', from rs2 to rs7
                    1 : Failed with error 'aborted', from rs8 to rs2
                    506 : Failed with error 'aborted', from rs8 to rs7
                    1 : Failed with error 'aborted', from rs2 to rs3
    ...

So the balancer is enabled but not running. Since there is a draining shard (rs8) being removed, shouldn't the balancer be running constantly? It isn't, as the output above shows.

The process is also taking incredibly long: over roughly the past day, the number of remaining chunks has dropped by only 10, from 1575 to 1565! At this rate, it's going to take months to shrink a sharded cluster of 8 instances to 4 instances!

It also seems MongoDB itself doesn't stop writes to the draining shard, so maybe the rate at which chunks are being created there is nearly canceling out the rate at which they are moved off?
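Since new chunks may still be split on `rs8` while it drains, a rough estimate is best based on the *net* change between two readings of `remaining.chunks`. With the numbers above (1575, then 1565 about a day later), the net rate is 10 chunks per day, so rs8 alone would need about 1565 / 10 = 156.5 more days. The helper `estimateDrainDays` below is a hypothetical sketch of that arithmetic, not part of any MongoDB API.

```javascript
// Estimate the remaining drain time from two readings of removeShard's
// "remaining.chunks", taken `days` apart. Because the readings are totals,
// the net rate already includes chunks newly split on the draining shard.
function estimateDrainDays(before, after, days) {
  var netPerDay = (before - after) / days;
  if (netPerDay <= 0) return Infinity; // not shrinking (or even growing)
  return after / netPerDay;
}

// estimateDrainDays(1575, 1565, 1) is 156.5 days for this shard alone.
```

If the other three shards to be removed drain at a similar pace, the whole downsize would take well over a year.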

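One thing worth ruling out when the balancer is enabled but idle is a balancing window. In 3.4 the balancer's configuration lives in the `config.settings` document with `_id: "balancer"`, and an `activeWindow` there limits when balancing rounds run. The same document can also hold `_secondaryThrottle` and `_waitForDelete`, which affect how fast each migration completes. The helper `balancerSettings` is a hypothetical sketch that just reports those fields; it does not claim any of them is the cause here.

```javascript
// Report the balancer settings that can keep an enabled balancer idle or slow.
// configDb is the config database handle, e.g. db.getSiblingDB("config").
function balancerSettings(configDb) {
  var doc = configDb.settings.findOne({ _id: "balancer" }) || {};
  return {
    // If set, balancing rounds only run between start and stop ("HH:MM").
    activeWindow: doc.activeWindow || null,
    // undefined means the server default is in effect.
    secondaryThrottle: doc._secondaryThrottle,
    // true makes each migration wait for the donor's range deletion to finish.
    waitForDelete: doc._waitForDelete === true
  };
}

// Usage from a mongos shell:
// printjson(balancerSettings(db.getSiblingDB("config")));
```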
Any help is greatly appreciated!

Thanks

asked Oct 24 '18 at 6:30

SpiXel

mongodb sharding

1 Answer

EDIT

Great, after exactly a month the process is over and I have a 4-shard cluster! The trick I describe below cut down the time it would otherwise have taken, but honestly, this is the slowest thing I've ever done.

OK, so answering my own question here.

I couldn't get the automatic balancing to work as fast as I wanted: each day only about 5 to 7 chunks were migrated (meaning the whole process would have taken years!).

What I did to work around this was to run the moveChunk command manually.

So basically, what I did was:

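    // Run from a mongo shell connected to a mongos (sh.moveChunk needs a mongos).
    // Placeholders: host "rs8-primary:27017", db "mydb", collection "col", field "shardKey".
    // rs8Db connects directly to rs8's primary, so $sample only sees documents still on rs8.
    var rs8Db = new Mongo("rs8-primary:27017").getDB("mydb");
    var ns = "mydb.col";
    var targets = ["rs1", "rs2", "rs3", "rs4"];
    var n = 0;
    while (true) {
        // Sample the 8th shard for 100 documents
        var docs = rs8Db.col.aggregate([{ $sample: { size: 100 } }]).toArray();
        if (docs.length === 0) break; // nothing left to sample on rs8
        docs.forEach(function (doc) {
            // Move the chunk containing this document to the next target shard
            var res = sh.moveChunk(ns, { shardKey: doc.shardKey }, targets[n++ % targets.length]);
            if (!res.ok) print("moveChunk failed: " + tojson(res));
        });
    }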

So I'm manually moving chunks off the 8th shard onto the first 4 shards. One downside: since the balancer has to stay enabled and only one shard can be draining at a time, some of the chunks I migrate get moved again automatically to shards 5 to 7, which I also want to remove later. This makes the process take even longer. Any solutions?
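A variation on the same manual approach, sketched here as an alternative rather than what I actually ran: instead of sampling documents, read the chunks that `config.chunks` still assigns to the draining shard and move each one by its exact bounds using the `bounds` form of the `moveChunk` admin command. This assumes the 3.4-era `config.chunks` schema (`ns`, `min`, `max`, `shard`); the helper `drainByChunks` and its parameters are hypothetical.

```javascript
// Move up to maxMoves chunks that config.chunks still assigns to fromShard,
// cycling through the target shards. Returns how many moves succeeded and
// which ones failed. Run from a mongos shell.
function drainByChunks(configDb, adminDb, ns, fromShard, targets, maxMoves) {
  var chunks = configDb.chunks
    .find({ ns: ns, shard: fromShard })
    .limit(maxMoves)
    .toArray();
  var moved = 0;
  var failed = [];
  chunks.forEach(function (chunk, i) {
    var to = targets[i % targets.length];
    var res = adminDb.runCommand({
      moveChunk: ns,
      bounds: [chunk.min, chunk.max], // identifies this exact chunk, no sampling needed
      to: to
    });
    if (res.ok) {
      moved++;
    } else {
      failed.push({ min: chunk.min, to: to, errmsg: res.errmsg });
    }
  });
  return { moved: moved, failed: failed };
}

// Usage (hypothetical namespace):
// drainByChunks(db.getSiblingDB("config"), db.getSiblingDB("admin"),
//               "mydb.col", "rs8", ["rs1", "rs2", "rs3", "rs4"], 100);
```

Because each move targets a chunk that config still places on rs8, it avoids re-moving chunks whose documents linger on rs8 only until range deletion. It does not stop the balancer from later shifting those chunks to rs5 to rs7.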

Since the 8th shard is draining, the balancer won't fill it up again, and now the whole process is much faster: about 350 to 400 chunks per day. So hopefully each shard will take about 5 days at most, and the whole resize about 20 days!

That's the fastest I could make it. I'd appreciate any other answers or strategies for doing this downsize better.

edited Nov 24 '18 at 9:49

answered Oct 29 '18 at 14:13

SpiXel
