MongoDB draining shard but balancer not running? (removeShard taking too much time)



























I'm trying to downsize a sharded cluster which currently has 8 shards, to a cluster with 4 shards.



I've started with the 8th shard and tried removing it first.



db.adminCommand( { removeShard : "rs8" } );
----
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : NumberLong(1575),
        "dbs" : NumberLong(0)
    },
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ ],
    "ok" : 1
}


So there are 1575 chunks to be migrated to the rest of the cluster.



But sh.isBalancerRunning() returns false, and the output of sh.status() looks like this:



...

active mongoses:
    "3.4.10" : 16
autosplit:
    Currently enabled: yes
balancer:
    Currently enabled: yes
    Currently running: no
NaN
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
        59 : Success
        1 : Failed with error 'aborted', from rs8 to rs1
        1 : Failed with error 'aborted', from rs2 to rs6
        1 : Failed with error 'aborted', from rs8 to rs5
        4929 : Failed with error 'aborted', from rs2 to rs7
        1 : Failed with error 'aborted', from rs8 to rs2
        506 : Failed with error 'aborted', from rs8 to rs7
        1 : Failed with error 'aborted', from rs2 to rs3
...


So the balancer is enabled but not running. Since there is a draining shard (rs8) being removed, I would expect the balancer to be running constantly, right? It isn't, though, as the output above shows.
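To make the failure pattern in the migration results above easier to see, here is a small sketch in plain JavaScript that tallies them by recipient shard. The `migrationResults` array is hand-copied from the status output (which is itself truncated by "..."), and `tallyFailures` is a hypothetical helper name, not part of MongoDB.

```javascript
// Migration results hand-copied from the sh.status() output in the question.
// The output was truncated, so this list may be incomplete.
const migrationResults = [
  { count: 59, outcome: "Success" },
  { count: 1, outcome: "aborted", from: "rs8", to: "rs1" },
  { count: 1, outcome: "aborted", from: "rs2", to: "rs6" },
  { count: 1, outcome: "aborted", from: "rs8", to: "rs5" },
  { count: 4929, outcome: "aborted", from: "rs2", to: "rs7" },
  { count: 1, outcome: "aborted", from: "rs8", to: "rs2" },
  { count: 506, outcome: "aborted", from: "rs8", to: "rs7" },
  { count: 1, outcome: "aborted", from: "rs2", to: "rs3" },
];

// Hypothetical helper: sums successes and failures, and groups failures
// by the shard that was supposed to receive the chunk.
function tallyFailures(results) {
  const byRecipient = {};
  let succeeded = 0;
  let failed = 0;
  for (const r of results) {
    if (r.outcome === "Success") {
      succeeded += r.count;
      continue;
    }
    failed += r.count;
    byRecipient[r.to] = (byRecipient[r.to] || 0) + r.count;
  }
  return { succeeded, failed, byRecipient };
}

const tally = tallyFailures(migrationResults);
// tally.succeeded is 59, tally.failed is 5440, tally.byRecipient.rs7 is 5435
```

In the listed lines, rs7 is the recipient in 5435 of the 5440 aborted migrations, so the logs of that shard might be a reasonable first place to look. That is only a guess from the numbers, not a confirmed cause.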



The process is also incredibly slow: over nearly the past day, the number of remaining chunks has decreased by only 10, from 1575 to 1565! At this rate, it's going to take months to reduce the cluster from 8 shards to 4!
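As a rough check on "months", the two chunk counts above can be turned into a drain-time estimate. This is a minimal sketch in plain JavaScript; `estimateDrainDays` is a hypothetical helper, and it assumes the two readings are exactly one day apart and that the rate stays constant.

```javascript
// Hypothetical helper: estimates how many more days a drain would take,
// given two readings of remaining.chunks and the time between them.
function estimateDrainDays(chunksBefore, chunksAfter, daysElapsed) {
  const chunksPerDay = (chunksBefore - chunksAfter) / daysElapsed;
  if (chunksPerDay <= 0) {
    return Infinity; // no net progress, so the drain would never finish
  }
  return chunksAfter / chunksPerDay;
}

// Figures from the question: 1575 -> 1565 chunks in about one day.
const daysForRs8 = estimateDrainDays(1575, 1565, 1);
// daysForRs8 is 156.5, i.e. roughly five months for this one shard
```

That figure covers rs8 only; the other three shards to be removed would each need their own drain.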



It also seems that MongoDB doesn't stop writes to the draining shard, so maybe new chunks are being created on it almost as fast as existing ones are migrated away?



Any help is greatly appreciated!

Thanks










      mongodb sharding
















      asked Oct 24 '18 at 6:30









SpiXel
























          1 Answer














EDIT

Great, now after exactly a month, the process is over and I have a 4-shard cluster! The trick I describe below helped reduce the time it would otherwise have taken, but honestly, this is the slowest thing I've ever done.





OK, so answering my own question here.

I couldn't get the automatic balancing to work as fast as I wanted. Each day I observed only about 5 to 7 chunks being migrated (meaning the whole process would take years!).

To work around this, I used the moveChunk command manually.

What I basically did was:



while 'can still sample':
    // Sample the 8th shard for 100 documents
    docs = db.col.aggregate([{$sample: {size: 100}}])

    for every doc in docs:
        // value is the doc's shard key value; NUM is the destination shard number (1 to 4)
        sh.moveChunk(namespace, {shardKey: value}, `rs${NUM}`);


So I'm manually moving chunks out of the 8th shard to the first 4 shards. One downside: since the balancer needs to stay enabled and only one shard can be draining at any moment, some of those migrated chunks get migrated again automatically to shards 5-7, which I want to remove later too. This makes the process take longer. Any solutions?
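For anyone wanting something closer to runnable than the loop above, here is a sketch of a variant. Note that it swaps the technique: instead of sampling documents, it builds moveChunk commands from chunk metadata and assigns destinations round-robin. It assumes the chunk documents look like those in config.chunks on MongoDB 3.4 ({ ns, min, max, shard }); `planMoves` is a hypothetical helper name, and the shell usage in the trailing comment is untested.

```javascript
// Hypothetical helper: builds one moveChunk admin command per chunk that
// lives on the draining shard, spreading destinations round-robin.
// Assumes chunk documents shaped like config.chunks in MongoDB 3.4:
// { ns, min, max, shard }.
function planMoves(chunks, drainingShard, targets) {
  return chunks
    .filter((c) => c.shard === drainingShard)
    .map((c, i) => ({
      moveChunk: c.ns,
      bounds: [c.min, c.max],
      to: targets[i % targets.length],
    }));
}

// Untested sketch of how this might be driven from a mongo shell
// connected to a mongos:
//   var chunks = db.getSiblingDB("config").chunks.find({ shard: "rs8" }).toArray();
//   planMoves(chunks, "rs8", ["rs1", "rs2", "rs3", "rs4"]).forEach(function (cmd) {
//     printjson(db.adminCommand(cmd));
//   });
```

Building the commands separately from running them makes it possible to inspect the plan first. It does not address the downside described above: the balancer may still move some of those chunks on to shards 5-7.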



Since the 8th shard is draining, the balancer won't fill it again, and the whole process is now much faster: about 350-400 chunks per day. So hopefully each shard will take about 5 days at most, and the whole resize about 20 days!

That's the fastest I could make it. I'd appreciate any other answers or strategies for performing this downsize better.






                edited Nov 24 '18 at 9:49

























                answered Oct 29 '18 at 14:13









SpiXel































