Mongodb dump (filtering documents and fields)












2















I want to make a partial dump of a Mongodb database (partial as in, I need to filter some documents and some fields). Then, this dump will be imported onto another server.



I cannot use the mongodump utility, as it doesn't allow filtering fields.

I could use the mongoexport utility, as it allows filtering both documents and fields. Though, the documentation states that mongoexport can only output a JSON file and:




does not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON.





  1. I find this statement a bit vague, and I don't entirely understand it. So, what happens exactly if I dump my database in JSON? What risks do I run? Do I risk losing some documents?

  2. If you think I should absolutely avoid using mongoexport in production, can I write my own Nodejs application to do the filtering and to output a dump in BSON? Or would that not be possible?










share|improve this question



























    2















    I want to make a partial dump of a Mongodb database (partial as in, I need to filter some documents and some fields). Then, this dump will be imported onto another server.



    I cannot use the mongodump utility, as it doesn't allow filtering fields.

    I could use the mongoexport utility, as it allows filtering both documents and fields. Though, the documentation states that mongoexport can only output a JSON file and:




    does not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON.





    1. I find this statement a bit vague, and I don't entirely understand it. So, what happens exactly if I dump my database in JSON? What risks do I run? Do I risk losing some documents?

    2. If you think I should absolutely avoid using mongoexport in production, can I write my own Nodejs application to do the filtering and to output a dump in BSON? Or would that not be possible?










    share|improve this question

























      2












      2








      2


      1






      I want to make a partial dump of a Mongodb database (partial as in, I need to filter some documents and some fields). Then, this dump will be imported onto another server.



      I cannot use the mongodump utility, as it doesn't allow filtering fields.

      I could use the mongoexport utility, as it allows filtering both documents and fields. Though, the documentation states that mongoexport can only output a JSON file and:




      does not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON.





      1. I find this statement a bit vague, and I don't entirely understand it. So, what happens exactly if I dump my database in JSON? What risks do I run? Do I risk losing some documents?

      2. If you think I should absolutely avoid using mongoexport in production, can I write my own Nodejs application to do the filtering and to output a dump in BSON? Or would that not be possible?










      share|improve this question














      I want to make a partial dump of a Mongodb database (partial as in, I need to filter some documents and some fields). Then, this dump will be imported onto another server.



      I cannot use the mongodump utility, as it doesn't allow filtering fields.

      I could use the mongoexport utility, as it allows filtering both documents and fields. Though, the documentation states that mongoexport can only output a JSON file and:




      does not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON.





      1. I find this statement a bit vague, and I don't entirely understand it. So, what happens exactly if I dump my database in JSON? What risks do I run? Do I risk losing some documents?

      2. If you think I should absolutely avoid using mongoexport in production, can I write my own Nodejs application to do the filtering and to output a dump in BSON? Or would that not be possible?







      mongodb filter dump bson mongoexport






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 25 '18 at 23:46









      MikiTesiMikiTesi

      264




      264
























          1 Answer
          1






          active

          oldest

          votes


















          3














          It is possible to do using Views without resorting to writing a low level implementation reading and writing the BSON content. There are also options which do in fact preserve type even when using JSON formats, and you don't even need a "View" for that.



          Using Views with mongodump



          The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.



          For example, given a simple document in a collection:



          db.test.insert({ "a": 1, "b": 2, "c": 3 })


          You can create the View on that collection, with just wanted fields:



          db.test.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])


          Then exiting the mongo shell you can access the View from mongodump using the --viewsAsCollections option:



          mongodump --db test --collection testView --viewsAsCollections


          This exports just the named "collection" ( actually a View ) only. The --viewsAsCollections means that instead of mongodump just returning the view definition ( being essentially the aggregation pipeline ) it returns the results instead just like it was a real collection.



          The resulting BSON content can then be loaded via mongorestore:



          mongorestore --db other --collection test


          Then the content from the BSON dump is actually written into the new database target of the host you are connecting to and with the specified collection name



          use other
          db.test.find()

          { "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }


          Noting also that as a View, the aggregation pipeline can really be anything, so $match statements can filter and you can transform or even actually "aggregate" however you want.



          Using Views or --fields with mongoexport



          In much the same way, the mongoexport utility can also access the content from a View.



          Despite this not being "strict BSON", there is actually a standard with MongoDB which does in fact preserve the data types. This is actually covered in the documentation under MongoDB extended JSON.



          So this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there.



          For example:



          db.mixed.insert({
          "a": NumberLong(1),
          "b": NumberDecimal("123.45"),
          "c": new Date(),
          "d": "unwanted"
          })


          Which would appear in the mongo shell as:



          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z"),
          "d" : "unwanted"
          }


          You can still set up a View:



          db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])


          And the export will just pick up the data:



          mongoexport --db test --collection mixedView > out.json

          {
          "_id": {
          "$oid": "5bfb428790b2b4e4241a015c"
          },
          "a": {
          "$numberLong": "1"
          },
          "b": {
          "$numberDecimal": "123.45"
          },
          "c": {
          "$date": "2018-11-26T00:47:03.033Z"
          }
          }


          Or the same thing on the original collection, just using --fields for selection:



          mongoexport --db test --collection mixed --fields a,b,c > out.json


          With exactly the same output. Being that the only restriction is the --query can only support a regular query expression as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.



          The Extended JSON format is recognized by mongoimport and there are also implementations of parsers available for many languages which recognize this as well, and as the content is read it is inserted into the target collection with the "type" information preserved:



          mongoimport --db other --collection mixed out.json


          And then viewing the data:



          use other
          db.mixed.findOne()
          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z")
          }


          So it is possible and the Extended JSON format exists for the purpose of data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is desirable.





          Overall there are many options you can use without needing to revert to reading and writing binary BSON formats, or any other complex binary format to store the data in between transfers.



          As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON Specification to see that despite the "cautious" statement the common types of data you will indeed use are actually all supported there. Whilst some external interpretations of that spec may not adhere to understanding ALL of them, the utilities bundled such as mongoexport and mongoimport are indeed compliant.






          share|improve this answer


























          • Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

            – MikiTesi
            Nov 26 '18 at 12:46











          • As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

            – MikiTesi
            Nov 26 '18 at 12:47











          • For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

            – MikiTesi
            Nov 26 '18 at 12:49











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53473151%2fmongodb-dump-filtering-documents-and-fields%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          3














          It is possible to do using Views without resorting to writing a low level implementation reading and writing the BSON content. There are also options which do in fact preserve type even when using JSON formats, and you don't even need a "View" for that.



          Using Views with mongodump



          The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.



          For example, given a simple document in a collection:



          db.test.insert({ "a": 1, "b": 2, "c": 3 })


          You can create the View on that collection, with just wanted fields:



          db.test.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])


          Then exiting the mongo shell you can access the View from mongodump using the --viewsAsCollections option:



          mongodump --db test --collection testView --viewsAsCollections


          This exports just the named "collection" ( actually a View ) only. The --viewsAsCollections means that instead of mongodump just returning the view definition ( being essentially the aggregation pipeline ) it returns the results instead just like it was a real collection.



          The resulting BSON content can then be loaded via mongorestore:



          mongorestore --db other --collection test


          Then the content from the BSON dump is actually written into the new database target of the host you are connecting to and with the specified collection name



          use other
          db.test.find()

          { "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }


          Noting also that as a View, the aggregation pipeline can really be anything, so $match statements can filter and you can transform or even actually "aggregate" however you want.



          Using Views or --fields with mongoexport



          In much the same way, the mongoexport utility can also access the content from a View.



          Despite this not being "strict BSON", there is actually a standard with MongoDB which does in fact preserve the data types. This is actually covered in the documentation under MongoDB extended JSON.



          So this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there.



          For example:



          db.mixed.insert({
          "a": NumberLong(1),
          "b": NumberDecimal("123.45"),
          "c": new Date(),
          "d": "unwanted"
          })


          Which would appear in the mongo shell as:



          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z"),
          "d" : "unwanted"
          }


          You can still set up a View:



          db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])


          And the export will just pick up the data:



          mongoexport --db test --collection mixedView > out.json

          {
          "_id": {
          "$oid": "5bfb428790b2b4e4241a015c"
          },
          "a": {
          "$numberLong": "1"
          },
          "b": {
          "$numberDecimal": "123.45"
          },
          "c": {
          "$date": "2018-11-26T00:47:03.033Z"
          }
          }


          Or the same thing on the original collection, just using --fields for selection:



          mongoexport --db test --collection mixed --fields a,b,c > out.json


          With exactly the same output. Being that the only restriction is the --query can only support a regular query expression as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.



          The Extended JSON format is recognized by mongoimport and there are also implementations of parsers available for many languages which recognize this as well, and as the content is read it is inserted into the target collection with the "type" information preserved:



          mongoimport --db other --collection mixed out.json


          And then viewing the data:



          use other
          db.mixed.findOne()
          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z")
          }


          So it is possible and the Extended JSON format exists for the purpose of data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is desirable.





          Overall there are many options you can use without needing to revert to reading and writing binary BSON formats, or any other complex binary format to store the data in between transfers.



          As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON Specification to see that despite the "cautious" statement the common types of data you will indeed use are actually all supported there. Whilst some external interpretations of that spec may not adhere to understanding ALL of them, the utilities bundled such as mongoexport and mongoimport are indeed compliant.






          share|improve this answer


























          • Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

            – MikiTesi
            Nov 26 '18 at 12:46











          • As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

            – MikiTesi
            Nov 26 '18 at 12:47











          • For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

            – MikiTesi
            Nov 26 '18 at 12:49
















          3














          It is possible to do using Views without resorting to writing a low level implementation reading and writing the BSON content. There are also options which do in fact preserve type even when using JSON formats, and you don't even need a "View" for that.



          Using Views with mongodump



          The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.



          For example, given a simple document in a collection:



          db.test.insert({ "a": 1, "b": 2, "c": 3 })


          You can create the View on that collection, with just wanted fields:



          db.test.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])


          Then exiting the mongo shell you can access the View from mongodump using the --viewsAsCollections option:



          mongodump --db test --collection testView --viewsAsCollections


          This exports just the named "collection" ( actually a View ) only. The --viewsAsCollections means that instead of mongodump just returning the view definition ( being essentially the aggregation pipeline ) it returns the results instead just like it was a real collection.



          The resulting BSON content can then be loaded via mongorestore:



          mongorestore --db other --collection test


          Then the content from the BSON dump is actually written into the new database target of the host you are connecting to and with the specified collection name



          use other
          db.test.find()

          { "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }


          Noting also that as a View, the aggregation pipeline can really be anything, so $match statements can filter and you can transform or even actually "aggregate" however you want.



          Using Views or --fields with mongoexport



          In much the same way, the mongoexport utility can also access the content from a View.



          Despite this not being "strict BSON", there is actually a standard with MongoDB which does in fact preserve the data types. This is actually covered in the documentation under MongoDB extended JSON.



          So this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there.



          For example:



          db.mixed.insert({
          "a": NumberLong(1),
          "b": NumberDecimal("123.45"),
          "c": new Date(),
          "d": "unwanted"
          })


          Which would appear in the mongo shell as:



          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z"),
          "d" : "unwanted"
          }


          You can still set up a View:



          db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])


          And the export will just pick up the data:



          mongoexport --db test --collection mixedView > out.json

          {
          "_id": {
          "$oid": "5bfb428790b2b4e4241a015c"
          },
          "a": {
          "$numberLong": "1"
          },
          "b": {
          "$numberDecimal": "123.45"
          },
          "c": {
          "$date": "2018-11-26T00:47:03.033Z"
          }
          }


          Or the same thing on the original collection, just using --fields for selection:



          mongoexport --db test --collection mixed --fields a,b,c > out.json


          With exactly the same output. Being that the only restriction is the --query can only support a regular query expression as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.



          The Extended JSON format is recognized by mongoimport and there are also implementations of parsers available for many languages which recognize this as well, and as the content is read it is inserted into the target collection with the "type" information preserved:



          mongoimport --db other --collection mixed out.json


          And then viewing the data:



          use other
          db.mixed.findOne()
          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z")
          }


          So it is possible and the Extended JSON format exists for the purpose of data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is desirable.





          Overall there are many options you can use without needing to revert to reading and writing binary BSON formats, or any other complex binary format to store the data in between transfers.



          As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON Specification to see that despite the "cautious" statement the common types of data you will indeed use are actually all supported there. Whilst some external interpretations of that spec may not adhere to understanding ALL of them, the utilities bundled such as mongoexport and mongoimport are indeed compliant.






          share|improve this answer


























          • Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

            – MikiTesi
            Nov 26 '18 at 12:46











          • As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

            – MikiTesi
            Nov 26 '18 at 12:47











          • For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

            – MikiTesi
            Nov 26 '18 at 12:49














          3












          3








          3







          It is possible to do using Views without resorting to writing a low level implementation reading and writing the BSON content. There are also options which do in fact preserve type even when using JSON formats, and you don't even need a "View" for that.



          Using Views with mongodump



          The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.



          For example, given a simple document in a collection:



          db.test.insert({ "a": 1, "b": 2, "c": 3 })


          You can create the View on that collection, with just wanted fields:



          db.test.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])


          Then exiting the mongo shell you can access the View from mongodump using the --viewsAsCollections option:



          mongodump --db test --collection testView --viewsAsCollections


          This exports just the named "collection" ( actually a View ) only. The --viewsAsCollections means that instead of mongodump just returning the view definition ( being essentially the aggregation pipeline ) it returns the results instead just like it was a real collection.



          The resulting BSON content can then be loaded via mongorestore:



          mongorestore --db other --collection test


          Then the content from the BSON dump is actually written into the new database target of the host you are connecting to and with the specified collection name



          use other
          db.test.find()

          { "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }


          Noting also that as a View, the aggregation pipeline can really be anything, so $match statements can filter and you can transform or even actually "aggregate" however you want.



          Using Views or --fields with mongoexport



          In much the same way, the mongoexport utility can also access the content from a View.



          Despite this not being "strict BSON", there is actually a standard with MongoDB which does in fact preserve the data types. This is actually covered in the documentation under MongoDB extended JSON.



          So this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there.



          For example:



          db.mixed.insert({
          "a": NumberLong(1),
          "b": NumberDecimal("123.45"),
          "c": new Date(),
          "d": "unwanted"
          })


          Which would appear in the mongo shell as:



          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z"),
          "d" : "unwanted"
          }


          You can still set up a View:



          db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])


          And the export will just pick up the data:



          mongoexport --db test --collection mixedView > out.json

          {
          "_id": {
          "$oid": "5bfb428790b2b4e4241a015c"
          },
          "a": {
          "$numberLong": "1"
          },
          "b": {
          "$numberDecimal": "123.45"
          },
          "c": {
          "$date": "2018-11-26T00:47:03.033Z"
          }
          }


          Or the same thing on the original collection, just using --fields for selection:



          mongoexport --db test --collection mixed --fields a,b,c > out.json


          With exactly the same output. Being that the only restriction is the --query can only support a regular query expression as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.



          The Extended JSON format is recognized by mongoimport and there are also implementations of parsers available for many languages which recognize this as well, and as the content is read it is inserted into the target collection with the "type" information preserved:



          mongoimport --db other --collection mixed out.json


          And then viewing the data:



          use other
          db.mixed.findOne()
          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z")
          }


          So it is possible and the Extended JSON format exists for the purpose of data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is desirable.





          Overall there are many options you can use without needing to revert to reading and writing binary BSON formats, or any other complex binary format to store the data in between transfers.



          As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON Specification to see that despite the "cautious" statement the common types of data you will indeed use are actually all supported there. Whilst some external interpretations of that spec may not adhere to understanding ALL of them, the utilities bundled such as mongoexport and mongoimport are indeed compliant.






          share|improve this answer















          It is possible to do using Views without resorting to writing a low level implementation reading and writing the BSON content. There are also options which do in fact preserve type even when using JSON formats, and you don't even need a "View" for that.



          Using Views with mongodump



          The basic premise is to create a View which only returns the content you want. A View can be the result of any aggregation pipeline expression.



          For example, given a simple document in a collection:



          db.test.insert({ "a": 1, "b": 2, "c": 3 })


          You can create the View on that collection, with just wanted fields:



          db.test.createView("testView", "test", [{ "$project": { "a": 1, "b": 2 } }])


          Then exiting the mongo shell you can access the View from mongodump using the --viewsAsCollections option:



          mongodump --db test --collection testView --viewsAsCollections


          This exports just the named "collection" ( actually a View ) only. The --viewsAsCollections means that instead of mongodump just returning the view definition ( being essentially the aggregation pipeline ) it returns the results instead just like it was a real collection.



          The resulting BSON content can then be loaded via mongorestore:



          mongorestore --db other --collection test


          Then the content from the BSON dump is actually written into the new database target of the host you are connecting to and with the specified collection name



          use other
          db.test.find()

          { "_id" : ObjectId("5bfb3e0eadd1d8af906ad140"), "a" : 1, "b" : 2 }


          Noting also that as a View, the aggregation pipeline can really be anything, so $match statements can filter and you can transform or even actually "aggregate" however you want.



          Using Views or --fields with mongoexport



          In much the same way, the mongoexport utility can also access the content from a View.



          Despite this not being "strict BSON", there is actually a standard with MongoDB which does in fact preserve the data types. This is actually covered in the documentation under MongoDB extended JSON.



          So this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there.



          For example:



          db.mixed.insert({
          "a": NumberLong(1),
          "b": NumberDecimal("123.45"),
          "c": new Date(),
          "d": "unwanted"
          })


          Which would appear in the mongo shell as:



          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z"),
          "d" : "unwanted"
          }


          You can still set up a View:



          db.createView("mixedView", "mixed", [{ "$project": { "a": 1, "b": 1, "c": 1 } }])


          And the export will just pick up the data:



          mongoexport --db test --collection mixedView > out.json

          {
          "_id": {
          "$oid": "5bfb428790b2b4e4241a015c"
          },
          "a": {
          "$numberLong": "1"
          },
          "b": {
          "$numberDecimal": "123.45"
          },
          "c": {
          "$date": "2018-11-26T00:47:03.033Z"
          }
          }


          Or the same thing on the original collection, just using --fields for selection:



          mongoexport --db test --collection mixed --fields a,b,c > out.json


          With exactly the same output. Being that the only restriction is the --query can only support a regular query expression as given to find() or similar. This is not as flexible as a View, but can do rudimentary filtering for most needs.



          The Extended JSON format is recognized by mongoimport and there are also implementations of parsers available for many languages which recognize this as well, and as the content is read it is inserted into the target collection with the "type" information preserved:



          mongoimport --db other --collection mixed out.json


          And then viewing the data:



          use other
          db.mixed.findOne()
          {
          "_id" : ObjectId("5bfb428790b2b4e4241a015c"),
          "a" : NumberLong(1),
          "b" : NumberDecimal("123.45"),
          "c" : ISODate("2018-11-26T00:47:03.033Z")
          }


          So it is possible and the Extended JSON format exists for the purpose of data interchange in situations where sending binary content may not be viable or even desirable, but maintaining the "type" information is desirable.





          Overall there are many options you can use without needing to revert to reading and writing binary BSON formats, or any other complex binary format to store the data in between transfers.



          As a note on the "vague" passage, the actual supported BSON types are listed within the Extended JSON page of the documentation. You can even compare this to the BSON Specification to see that despite the "cautious" statement the common types of data you will indeed use are actually all supported there. Whilst some external interpretations of that spec may not adhere to understanding ALL of them, the utilities bundled such as mongoexport and mongoimport are indeed compliant.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 26 '18 at 7:09

























          answered Nov 26 '18 at 1:11









          Neil LunnNeil Lunn

          100k23178187




          100k23178187













          • Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

            – MikiTesi
            Nov 26 '18 at 12:46











          • As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

            – MikiTesi
            Nov 26 '18 at 12:47











          • For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

            – MikiTesi
            Nov 26 '18 at 12:49



















          • Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

            – MikiTesi
            Nov 26 '18 at 12:46











          • As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

            – MikiTesi
            Nov 26 '18 at 12:47











          • For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

            – MikiTesi
            Nov 26 '18 at 12:49

















          Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

          – MikiTesi
          Nov 26 '18 at 12:46





          Thank you for your detailed explanation. So, correct me if I'm wrong, but the answer to my first question is that no, I don't lose any information by using JSON, because as you said "this is NOT a Binary format, and as JSON it does take considerably more storage space but the necessary information is indeed there". This means that a JSON dump file holds exactly the same information but it encodes it by taking up more space. So I can just expect a JSON dump file to be larger than its BSON equivalent dump file, right?

          – MikiTesi
          Nov 26 '18 at 12:46













          As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

          – MikiTesi
          Nov 26 '18 at 12:47





          As for my second question: is it possible to write my own Nodejs application to do the filtering and to output a dump file in BSON?

          – MikiTesi
          Nov 26 '18 at 12:47













          For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

          – MikiTesi
          Nov 26 '18 at 12:49





          For now I'll use your solution (that is, creaing a view and then dumping it with mongodump), but I'm still interested in knowing whether I could use a custom Nodejs application instead. The reason for this is that performance is important in my scenario, and I have a feeling that creating a view and subsequently dumping it with mongodump would take significantly longer than the execution of a Nodejs application that filters data and automatically writes it into a (BSON) dump file.

          – MikiTesi
          Nov 26 '18 at 12:49




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53473151%2fmongodb-dump-filtering-documents-and-fields%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Tonle Sap (See)

          I get strange results when I access the Sqlitedatabase with Unity C# via XAMPP

          Guatemaltekische Davis-Cup-Mannschaft