Does “where” position in LINQ query matter when joining in-memory?
Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where
clause. This where
only filters on properties included in the original set (the from
part of the query).
Question: Does the linq query interpreter optimize this query in that it first executes the where
before it performs the join
, regardless of whether I write the where
before or after the join
? – so it does not have to perform a join on elements that are not included later anyways.
Example: For example, I have a categories
list I want to join with a products
list. However, I am just interested in the category
with ID
1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:
from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };
or
from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };
Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.
Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)
Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.
Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".
c# linq join .net-core where
|
show 2 more comments
Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where
clause. This where
only filters on properties included in the original set (the from
part of the query).
Question: Does the linq query interpreter optimize this query in that it first executes the where
before it performs the join
, regardless of whether I write the where
before or after the join
? – so it does not have to perform a join on elements that are not included later anyways.
Example: For example, I have a categories
list I want to join with a products
list. However, I am just interested in the category
with ID
1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:
from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };
or
from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };
Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.
Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)
Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.
Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".
c# linq join .net-core where
1
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core
– Panagiotis Kanavos
Dec 18 '18 at 11:29
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15
|
show 2 more comments
Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where
clause. This where
only filters on properties included in the original set (the from
part of the query).
Question: Does the linq query interpreter optimize this query in that it first executes the where
before it performs the join
, regardless of whether I write the where
before or after the join
? – so it does not have to perform a join on elements that are not included later anyways.
Example: For example, I have a categories
list I want to join with a products
list. However, I am just interested in the category
with ID
1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:
from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };
or
from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };
Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.
Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)
Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.
Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".
c# linq join .net-core where
Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where
clause. This where
only filters on properties included in the original set (the from
part of the query).
Question: Does the linq query interpreter optimize this query in that it first executes the where
before it performs the join
, regardless of whether I write the where
before or after the join
? – so it does not have to perform a join on elements that are not included later anyways.
Example: For example, I have a categories
list I want to join with a products
list. However, I am just interested in the category
with ID
1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:
from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };
or
from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };
Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.
Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)
Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.
Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".
c# linq join .net-core where
c# linq join .net-core where
edited Dec 18 '18 at 12:08
B12Toaster
asked Dec 18 '18 at 11:21
B12ToasterB12Toaster
2,24121524
2,24121524
1
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core
– Panagiotis Kanavos
Dec 18 '18 at 11:29
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15
|
show 2 more comments
1
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core
– Panagiotis Kanavos
Dec 18 '18 at 11:29
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15
1
1
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.
Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core– Panagiotis Kanavos
Dec 18 '18 at 11:29
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.
Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core– Panagiotis Kanavos
Dec 18 '18 at 11:29
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15
|
show 2 more comments
2 Answers
2
active
oldest
votes
The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.
The first LINQ query will be compiled to the following method chain:
categories
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { category, prod })
.Where(t => t.category.ID == 1)
.Select(t => new { Category = t.category.Name, Product = t.prod.Name });
The second one:
categories
.Where(category => category.ID == 1)
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { Category = category.Name, Product = prod.Name });
As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).
Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.
There will be no additional query optimization in case of LINQ-to-objects queries.
So the second version is preferable.
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
add a comment |
For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.
I also tried result
by first casting it to IQueryable
then apply filtering but apparently casting time is pretty high for this big table.
I made a quick test for this case.
Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result.Dump();
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result2 = list
.Where(t => t.Prop3 == "Prop13")
.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");
Findings:
List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591
List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586
List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277
List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
...and it looks like when resolving/executing theEnumerableQuery
eventually, it will simply execute theWhere
andJoin
in the order it was stated in the chained expression – so usingAsQueryable
seems not to provide an optimization benefit here.
– B12Toaster
Dec 18 '18 at 15:21
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53831925%2fdoes-where-position-in-linq-query-matter-when-joining-in-memory%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.
The first LINQ query will be compiled to the following method chain:
categories
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { category, prod })
.Where(t => t.category.ID == 1)
.Select(t => new { Category = t.category.Name, Product = t.prod.Name });
The second one:
categories
.Where(category => category.ID == 1)
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { Category = category.Name, Product = prod.Name });
As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).
Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.
There will be no additional query optimization in case of LINQ-to-objects queries.
So the second version is preferable.
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
add a comment |
The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.
The first LINQ query will be compiled to the following method chain:
categories
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { category, prod })
.Where(t => t.category.ID == 1)
.Select(t => new { Category = t.category.Name, Product = t.prod.Name });
The second one:
categories
.Where(category => category.ID == 1)
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { Category = category.Name, Product = prod.Name });
As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).
Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.
There will be no additional query optimization in case of LINQ-to-objects queries.
So the second version is preferable.
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
add a comment |
The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.
The first LINQ query will be compiled to the following method chain:
categories
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { category, prod })
.Where(t => t.category.ID == 1)
.Select(t => new { Category = t.category.Name, Product = t.prod.Name });
The second one:
categories
.Where(category => category.ID == 1)
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { Category = category.Name, Product = prod.Name });
As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).
Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.
There will be no additional query optimization in case of LINQ-to-objects queries.
So the second version is preferable.
The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.
The first LINQ query will be compiled to the following method chain:
categories
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { category, prod })
.Where(t => t.category.ID == 1)
.Select(t => new { Category = t.category.Name, Product = t.prod.Name });
The second one:
categories
.Where(category => category.ID == 1)
.Join(
products,
category => category.ID,
prod => prod.CategoryID,
(category, prod) => new { Category = category.Name, Product = prod.Name });
As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).
Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.
There will be no additional query optimization in case of LINQ-to-objects queries.
So the second version is preferable.
edited Dec 18 '18 at 16:17
answered Dec 18 '18 at 11:30
dymanoiddymanoid
9,31222248
9,31222248
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
add a comment |
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
2
2
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
It's not the number of anonymous types that's important here IMO - it's that in the second case we're joining on less data to start with.
– Jon Skeet
Dec 18 '18 at 12:20
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
@JonSkeet, yes, but that's pretty obvious. In the first case we not only join on more data, but also cause a lot more memory pressure.
– dymanoid
Dec 18 '18 at 12:23
2
2
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
You may think it's obvious, but I don't think that's necessarily obvious to the OP. (Or rather, the OP may well be wondering whether LINQ to Objects would perform that optimization automatically.)
– Jon Skeet
Dec 18 '18 at 12:25
1
1
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
@JonSkeet, okay, thanks for pointing that out. I updated my answer.
– dymanoid
Dec 18 '18 at 12:27
add a comment |
For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.
I also tried result
by first casting it to IQueryable
then apply filtering but apparently casting time is pretty high for this big table.
I made a quick test for this case.
Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result.Dump();
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result2 = list
.Where(t => t.Prop3 == "Prop13")
.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");
Findings:
List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591
List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586
List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277
List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
...and it looks like when resolving/executing theEnumerableQuery
eventually, it will simply execute theWhere
andJoin
in the order it was stated in the chained expression – so usingAsQueryable
seems not to provide an optimization benefit here.
– B12Toaster
Dec 18 '18 at 15:21
add a comment |
For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.
I also tried result
by first casting it to IQueryable
then apply filtering but apparently casting time is pretty high for this big table.
I made a quick test for this case.
Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result.Dump();
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result2 = list
.Where(t => t.Prop3 == "Prop13")
.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");
Findings:
List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591
List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586
List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277
List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
...and it looks like when resolving/executing theEnumerableQuery
eventually, it will simply execute theWhere
andJoin
in the order it was stated in the chained expression – so usingAsQueryable
seems not to provide an optimization benefit here.
– B12Toaster
Dec 18 '18 at 15:21
add a comment |
For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.
I also tried result
by first casting it to IQueryable
then apply filtering but apparently casting time is pretty high for this big table.
I made a quick test for this case.
Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result.Dump();
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result2 = list
.Where(t => t.Prop3 == "Prop13")
.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");
Findings:
List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591
List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586
List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277
List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.
I also tried result
by first casting it to IQueryable
then apply filtering but apparently casting time is pretty high for this big table.
I made a quick test for this case.
Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result.Dump();
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result2 = list
.Where(t => t.Prop3 == "Prop13")
.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
.Where(t => t.inner.Prop3 == "Prop13")
.Select(t => new { t.inner.Prop4, t.lst.Prop2});
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");
Findings:
List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591
List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586
List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277
List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
edited Dec 18 '18 at 12:47
answered Dec 18 '18 at 12:21
SimonareSimonare
14.4k11738
14.4k11738
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
...and it looks like when resolving/executing theEnumerableQuery
eventually, it will simply execute theWhere
andJoin
in the order it was stated in the chained expression – so usingAsQueryable
seems not to provide an optimization benefit here.
– B12Toaster
Dec 18 '18 at 15:21
add a comment |
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
...and it looks like when resolving/executing theEnumerableQuery
eventually, it will simply execute theWhere
andJoin
in the order it was stated in the chained expression – so usingAsQueryable
seems not to provide an optimization benefit here.
– B12Toaster
Dec 18 '18 at 15:21
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
Thanks for your tests. I just had closer look at the implementation of AsQueryable and it's related docs entry: "AsQueryable(IEnumerable) returns [...] an IQueryable<T> that executes queries by calling the equivalent query operator methods in Enumerable instead of those in Queryable." ...
– B12Toaster
Dec 18 '18 at 15:21
1
1
...and it looks like when resolving/executing the
EnumerableQuery
eventually, it will simply execute the Where
and Join
in the order it was stated in the chained expression – so using AsQueryable
seems not to provide an optimization benefit here.– B12Toaster
Dec 18 '18 at 15:21
...and it looks like when resolving/executing the
EnumerableQuery
eventually, it will simply execute the Where
and Join
in the order it was stated in the chained expression – so using AsQueryable
seems not to provide an optimization benefit here.– B12Toaster
Dec 18 '18 at 15:21
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53831925%2fdoes-where-position-in-linq-query-matter-when-joining-in-memory%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Yes it does. There's no interpreter. A LINQ to Objects query is executed as is, it's not translated to something else.
Where()
is an iterator that loops over the input and returns any item that matches the predicate. You can check the source code directly to see how it's implemented, for the full framework and .NET Core– Panagiotis Kanavos
Dec 18 '18 at 11:29
If you want tolerable performance you shouldn't join in-memory lists like that. You'll be making M*N comparisons. You should create dictionaries or hashsets to find entries with common keys
– Panagiotis Kanavos
Dec 18 '18 at 11:31
Possible duplicate of Order execution of chain linq query
– SeM
Dec 18 '18 at 11:47
Also Is order of the predicate important when using LINQ?.
– SeM
Dec 18 '18 at 11:49
@SeM please see my points which state why this is not a dublicate. Your second link comes quite close but does not explicitly refer to in-memory joins. However your comment convinced me to sharpen my question in stating that I am interested in linq in the context of dotnet core.
– B12Toaster
Dec 18 '18 at 12:15