Cross JOIN collections and GroupBy CosmosDB Javascript API
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am searching for a solution in the Javascript API for CosmosDB, where you can perform an INNER/OUTER JOIN between two document collections.
I have been unsuccessful.
From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.
If the above is true, where does this leave our application's datasource that has been designed in a relational way? If Business requires a immediate query, to collect pthe following data:
All agreements/contracts that has been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query, if there are about 5 collections containing all infromation related to this query?
Any guidance?
UPDATE
Customer
{
"id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
"Name": "Test User One",
"Surname": "Test"
}
Agreement
{
"id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
"CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5"
"RetailProductVersionInstance":
[
{
"id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
"RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"AgreementInstanceUser": {
"FirstName": "Luke",
"LastName": "Pothier",
"AgreementUserTypeId": ""
},
"AgreementInstanceMSISDN": {
"IsoCountryDialingCode": null,
"PhoneNumber": "0839263922",
"NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
},
"RetailProductVersionInstanceState": "IN USE",
"IsPrimaryRetailProduct": true,
"RetailProductVersionInstancePhysicalItems": [
{
"id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
"PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
"RetailProductVersionInstancePhysicalItemNumbers": [
{
"id": "9905058b-8369-4a64-b9a5-e17e28750fba",
"PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
"PhysicalItemNumberValue": "KJDS959405"
},
{
"id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
"PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
"PhysicalItemNumberValue": "DJDJ94943"
}
],
"RetailProductVersionInstancePhysicalItemState": "IN USE",
"DateCreatedUtc": "2018-11-21T13:55:00Z",
"DateUpdatedUtc": "2020-11-21T13:55:00Z"
}
]
}
]
}
RetailProduct
{
"id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"FriendlyName": "Data-Package 100GB",
"WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}
WholeSaleProduct:
{
"id": "d054dae5-173d-478b-bb0e-7516e6a24476",
"ProductName": "Data 100",
"ProviderLiabilities":
}
Above, I have added some sample documentation.
Relationships:
- Agreement.CustomerId links to Customer.id
- Agreement.RetailProductVersionInstance.RetailProductVersionId links
to RetailProduct.id - RetailProduct.WholeSaleProductId links to WholeSaleProduct.id
How, would I write a Javascript Stored Procedure, in CosmosDB, to perform joins between these 4 collections?
join
add a comment |
I am searching for a solution in the Javascript API for CosmosDB, where you can perform an INNER/OUTER JOIN between two document collections.
I have been unsuccessful.
From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.
If the above is true, where does this leave our application's datasource that has been designed in a relational way? If Business requires a immediate query, to collect pthe following data:
All agreements/contracts that has been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query, if there are about 5 collections containing all infromation related to this query?
Any guidance?
UPDATE
Customer
{
"id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
"Name": "Test User One",
"Surname": "Test"
}
Agreement
{
"id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
"CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5"
"RetailProductVersionInstance":
[
{
"id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
"RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"AgreementInstanceUser": {
"FirstName": "Luke",
"LastName": "Pothier",
"AgreementUserTypeId": ""
},
"AgreementInstanceMSISDN": {
"IsoCountryDialingCode": null,
"PhoneNumber": "0839263922",
"NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
},
"RetailProductVersionInstanceState": "IN USE",
"IsPrimaryRetailProduct": true,
"RetailProductVersionInstancePhysicalItems": [
{
"id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
"PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
"RetailProductVersionInstancePhysicalItemNumbers": [
{
"id": "9905058b-8369-4a64-b9a5-e17e28750fba",
"PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
"PhysicalItemNumberValue": "KJDS959405"
},
{
"id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
"PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
"PhysicalItemNumberValue": "DJDJ94943"
}
],
"RetailProductVersionInstancePhysicalItemState": "IN USE",
"DateCreatedUtc": "2018-11-21T13:55:00Z",
"DateUpdatedUtc": "2020-11-21T13:55:00Z"
}
]
}
]
}
RetailProduct
{
"id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"FriendlyName": "Data-Package 100GB",
"WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}
WholeSaleProduct:
{
"id": "d054dae5-173d-478b-bb0e-7516e6a24476",
"ProductName": "Data 100",
"ProviderLiabilities":
}
Above, I have added some sample documentation.
Relationships:
- Agreement.CustomerId links to Customer.id
- Agreement.RetailProductVersionInstance.RetailProductVersionId links
to RetailProduct.id - RetailProduct.WholeSaleProductId links to WholeSaleProduct.id
How, would I write a Javascript Stored Procedure, in CosmosDB, to perform joins between these 4 collections?
join
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
1
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38
add a comment |
I am searching for a solution in the Javascript API for CosmosDB, where you can perform an INNER/OUTER JOIN between two document collections.
I have been unsuccessful.
From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.
If the above is true, where does this leave our application's datasource that has been designed in a relational way? If Business requires a immediate query, to collect pthe following data:
All agreements/contracts that has been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query, if there are about 5 collections containing all infromation related to this query?
Any guidance?
UPDATE
Customer
{
"id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
"Name": "Test User One",
"Surname": "Test"
}
Agreement
{
"id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
"CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5"
"RetailProductVersionInstance":
[
{
"id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
"RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"AgreementInstanceUser": {
"FirstName": "Luke",
"LastName": "Pothier",
"AgreementUserTypeId": ""
},
"AgreementInstanceMSISDN": {
"IsoCountryDialingCode": null,
"PhoneNumber": "0839263922",
"NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
},
"RetailProductVersionInstanceState": "IN USE",
"IsPrimaryRetailProduct": true,
"RetailProductVersionInstancePhysicalItems": [
{
"id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
"PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
"RetailProductVersionInstancePhysicalItemNumbers": [
{
"id": "9905058b-8369-4a64-b9a5-e17e28750fba",
"PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
"PhysicalItemNumberValue": "KJDS959405"
},
{
"id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
"PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
"PhysicalItemNumberValue": "DJDJ94943"
}
],
"RetailProductVersionInstancePhysicalItemState": "IN USE",
"DateCreatedUtc": "2018-11-21T13:55:00Z",
"DateUpdatedUtc": "2020-11-21T13:55:00Z"
}
]
}
]
}
RetailProduct
{
"id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"FriendlyName": "Data-Package 100GB",
"WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}
WholeSaleProduct:
{
"id": "d054dae5-173d-478b-bb0e-7516e6a24476",
"ProductName": "Data 100",
"ProviderLiabilities":
}
Above, I have added some sample documentation.
Relationships:
- Agreement.CustomerId links to Customer.id
- Agreement.RetailProductVersionInstance.RetailProductVersionId links
to RetailProduct.id - RetailProduct.WholeSaleProductId links to WholeSaleProduct.id
How, would I write a Javascript Stored Procedure, in CosmosDB, to perform joins between these 4 collections?
join
I am searching for a solution in the Javascript API for CosmosDB, where you can perform an INNER/OUTER JOIN between two document collections.
I have been unsuccessful.
From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.
If the above is true, where does this leave our application's datasource that has been designed in a relational way? If Business requires a immediate query, to collect pthe following data:
All agreements/contracts that has been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query, if there are about 5 collections containing all infromation related to this query?
Any guidance?
UPDATE
Customer
{
"id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
"Name": "Test User One",
"Surname": "Test"
}
Agreement
{
"id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
"CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5"
"RetailProductVersionInstance":
[
{
"id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
"RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"AgreementInstanceUser": {
"FirstName": "Luke",
"LastName": "Pothier",
"AgreementUserTypeId": ""
},
"AgreementInstanceMSISDN": {
"IsoCountryDialingCode": null,
"PhoneNumber": "0839263922",
"NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
},
"RetailProductVersionInstanceState": "IN USE",
"IsPrimaryRetailProduct": true,
"RetailProductVersionInstancePhysicalItems": [
{
"id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
"PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
"RetailProductVersionInstancePhysicalItemNumbers": [
{
"id": "9905058b-8369-4a64-b9a5-e17e28750fba",
"PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
"PhysicalItemNumberValue": "KJDS959405"
},
{
"id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
"PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
"PhysicalItemNumberValue": "DJDJ94943"
}
],
"RetailProductVersionInstancePhysicalItemState": "IN USE",
"DateCreatedUtc": "2018-11-21T13:55:00Z",
"DateUpdatedUtc": "2020-11-21T13:55:00Z"
}
]
}
]
}
RetailProduct
{
"id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
"FriendlyName": "Data-Package 100GB",
"WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}
WholeSaleProduct:
{
"id": "d054dae5-173d-478b-bb0e-7516e6a24476",
"ProductName": "Data 100",
"ProviderLiabilities":
}
Above, I have added some sample documentation.
Relationships:
- Agreement.CustomerId links to Customer.id
- Agreement.RetailProductVersionInstance.RetailProductVersionId links
to RetailProduct.id - RetailProduct.WholeSaleProductId links to WholeSaleProduct.id
How, would I write a Javascript Stored Procedure, in CosmosDB, to perform joins between these 4 collections?
join
join
edited Nov 27 '18 at 5:01
monstertjie_za
asked Nov 26 '18 at 19:20
monstertjie_zamonstertjie_za
2,16841436
2,16841436
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
1
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38
add a comment |
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
1
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
1
1
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38
add a comment |
1 Answer
1
active
oldest
votes
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487690%2fcross-join-collections-and-groupby-cosmosdb-javascript-api%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
add a comment |
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
add a comment |
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.
Short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
Generally, the solution to this type of question is multiple queries or different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.
If you provide your schemas, it'd be possible to provide a more comprehensive answer.
-- Edit 1 --
Stored Procedures are only good candidates for operations that require multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU intensive things, but rather things that would normally be IO bound by network latency. They aren't possible to use for cross partition or cross collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale record last, so you should have a latency of 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join/issue the request/etc.
It's worth considering whether you can consolidate RetailProduct and WholeSale product into a single record where Wholesale contains all the RetailProducts in an array, or as separate documents, partitioned by the wholesale id, with a well known id that contained the Wholesale product info in a separate document. That would reduce your latency by 1 third. If you go with the partitioning by wholesale id idea, you could write 1 query for any records that shared a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite index of "wholesaleid+productid" in the agreement. One issue to worry about is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about and it provides a good optimization for info lookup.
edited Nov 28 '18 at 1:40
answered Nov 27 '18 at 2:26
Chris Anderson-MSFTChris Anderson-MSFT
5,64311430
5,64311430
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
add a comment |
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
I will modify my answer to include my collections etc...You make reference to SQL API, but I am aware of the limitations of the SQL API in Cosmos. My question was more focues on Javascript Stored Procedures within Cosmos
– monstertjie_za
Nov 27 '18 at 4:17
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Stored procedures are also scoped to the collection, and, additionally, the partition key.
– Chris Anderson-MSFT
Nov 28 '18 at 1:22
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Expanded my original answer
– Chris Anderson-MSFT
Nov 28 '18 at 1:40
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
Good explanation, thanks for the extensive answer
– monstertjie_za
Nov 28 '18 at 6:52
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53487690%2fcross-join-collections-and-groupby-cosmosdb-javascript-api%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Without seeing specific schema details, there's really no way to help with a query. But suffice it to say that, if you've spread your data across 5 collections, then you'll need at least 5 queries. Please edit your question to show samples of data in each collection, along with the queries that you've built so far, issues you're running into, etc. Also, FYI stored procedures are scoped to a single partition within a collection. FYI #2 you can also optimize your searching by 1) storing multiple document types in the same collection and 2) denormalizing data into your documents.
– David Makogon
Nov 26 '18 at 23:41
1
Possible duplicate of CosmosDB Join (SQL API)
– Chris Anderson-MSFT
Nov 27 '18 at 2:22
As already stated: a stored procedure cannot span multiple collections, nor can a SQL query outside of a stored procedure. The way you've shaped your data, you'll need to either write multiple queries or refactor your data model. Ok, you can also force all data into a single partition, but I don't see how that would work with your schema. As far as how to create multiple queries: that's too broad a question. If you need help with a specific query, then edit your question to include it, and show where you're having difficulty.
– David Makogon
Nov 27 '18 at 7:37
Also - good catch by @ChrisAnderson-MSFT regarding the duplicate (as the notion of cross-collection querying has already been addressed elsewhere).
– David Makogon
Nov 27 '18 at 7:38