MYSQL LEFT JOIN returns unexpected results
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I have two tables talk_comments
and talk_comment_votes
.
I run the following code to select, commentId
, numberOfUpvotes
, whetherUserUpvoted
, numberOfDownvotes
, whetherUserDownvoted
usin LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select values 2,2,0,1,1
respectively. When I break those JOIN statements and do the queries, it returns the expected results. But, with JOINs, it returns something like the follows.
Can I get some help on fixing this?
Thanks.
mysql
|
show 2 more comments
I have two tables talk_comments
and talk_comment_votes
.
I run the following code to select, commentId
, numberOfUpvotes
, whetherUserUpvoted
, numberOfDownvotes
, whetherUserDownvoted
usin LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select values 2,2,0,1,1
respectively. When I break those JOIN statements and do the queries, it returns the expected results. But, with JOINs, it returns something like the follows.
Can I get some help on fixing this?
Thanks.
mysql
1
Not the best for performance but tryCOUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's
– Raymond Nijland
Nov 26 '18 at 20:33
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
Well the JOIN's make multiple records that will be counted. try the same query without theGROUP BY
andCOUNT()
then you will see what is going on there... Or change in the current query theCOUNT
intoGROUP_CONCAT
then you will see "duplicated" records.. WhichCOUNT(DISTINCT ..)
filters out..
– Raymond Nijland
Nov 26 '18 at 20:38
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
By making subqueries more or less like ...SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43
|
show 2 more comments
I have two tables talk_comments
and talk_comment_votes
.
I run the following code to select, commentId
, numberOfUpvotes
, whetherUserUpvoted
, numberOfDownvotes
, whetherUserDownvoted
usin LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select values 2,2,0,1,1
respectively. When I break those JOIN statements and do the queries, it returns the expected results. But, with JOINs, it returns something like the follows.
Can I get some help on fixing this?
Thanks.
mysql
I have two tables talk_comments
and talk_comment_votes
.
I run the following code to select, commentId
, numberOfUpvotes
, whetherUserUpvoted
, numberOfDownvotes
, whetherUserDownvoted
usin LEFT JOINs to the same table.
SELECT c.id, COUNT(v1.id) as upvotes, COUNT(v2.id) as userUpvoted, COUNT(v3.id) as downvotes, COUNT(v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
I have the following data in my talk_comment_votes table
So, according to the query, it should select values 2,2,0,1,1
respectively. When I break those JOIN statements and do the queries, it returns the expected results. But, with JOINs, it returns something like the follows.
Can I get some help on fixing this?
Thanks.
mysql
mysql
edited Nov 26 '18 at 20:34
Supun Kavinda
asked Nov 26 '18 at 20:31
Supun KavindaSupun Kavinda
6217
6217
1
Not the best for performance but tryCOUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's
– Raymond Nijland
Nov 26 '18 at 20:33
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
Well the JOIN's make multiple records that will be counted. try the same query without theGROUP BY
andCOUNT()
then you will see what is going on there... Or change in the current query theCOUNT
intoGROUP_CONCAT
then you will see "duplicated" records.. WhichCOUNT(DISTINCT ..)
filters out..
– Raymond Nijland
Nov 26 '18 at 20:38
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
By making subqueries more or less like ...SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43
|
show 2 more comments
1
Not the best for performance but tryCOUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's
– Raymond Nijland
Nov 26 '18 at 20:33
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
Well the JOIN's make multiple records that will be counted. try the same query without theGROUP BY
andCOUNT()
then you will see what is going on there... Or change in the current query theCOUNT
intoGROUP_CONCAT
then you will see "duplicated" records.. WhichCOUNT(DISTINCT ..)
filters out..
– Raymond Nijland
Nov 26 '18 at 20:38
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
By making subqueries more or less like ...SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43
1
1
Not the best for performance but try
COUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's– Raymond Nijland
Nov 26 '18 at 20:33
Not the best for performance but try
COUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's– Raymond Nijland
Nov 26 '18 at 20:33
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
Well the JOIN's make multiple records that will be counted. try the same query without the
GROUP BY
and COUNT()
then you will see what is going on there... Or change in the current query the COUNT
into GROUP_CONCAT
then you will see "duplicated" records.. Which COUNT(DISTINCT ..)
filters out..– Raymond Nijland
Nov 26 '18 at 20:38
Well the JOIN's make multiple records that will be counted. try the same query without the
GROUP BY
and COUNT()
then you will see what is going on there... Or change in the current query the COUNT
into GROUP_CONCAT
then you will see "duplicated" records.. Which COUNT(DISTINCT ..)
filters out..– Raymond Nijland
Nov 26 '18 at 20:38
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
By making subqueries more or less like ...
SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like ... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43
By making subqueries more or less like ...
SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like ... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43
|
show 2 more comments
2 Answers
2
active
oldest
votes
I ran a benchmark on queries based on @spencer7593 and @RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
add a comment |
I'd use conditional aggregation. A join to a single reference to tall_comment_votes
, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF()
expression could replaced with a more ANSI standards compliant CASE
expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in theON
clauseAND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows withstatus
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.
– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53488595%2fmysql-left-join-returns-unexpected-results%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I ran a benchmark on queries based on @spencer7593 and @RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
add a comment |
I ran a benchmark on queries based on @spencer7593 and @RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
add a comment |
I ran a benchmark on queries based on @spencer7593 and @RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
I ran a benchmark on queries based on @spencer7593 and @RaymondNijland's 2 answers.
LEFT JOINS wins!
1. Using LEFT JOINS
SELECT c.id, COUNT(DISTINCT v1.id) as upvotes, COUNT(DISTINCT v2.id) as userUpvoted, COUNT(DISTINCT v3.id) as downvotes, COUNT(DISTINCT v4.id) as userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.55000805854797s
2. Using Sub Queries
SELECT c.id,c.user_id, c.time,c.body, c.reply_to,
(SELECT COUNT(v1.id) FROM talk_comment_votes v1 WHERE v1.comment_id = c.id AND v1.status = 1 LIMIT 1) as upvotes,
(SELECT COUNT(v2.id) FROM talk_comment_votes v2 WHERE v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 LIMIT 1) as clientUpvoted,
(SELECT COUNT(v3.id) FROM talk_comment_votes v3 WHERE v3.comment_id = c.id AND v3.status = 2 LIMIT 1) as downvotes,
(SELECT COUNT(v4.id) FROM talk_comment_votes v4 WHERE v4.comment_id = c.id AND v4.status = 2 AND v4.user_id = 1 LIMIT 1) as clientDownvoted
FROM talk_comments c
WHERE c.id = 2 GROUP BY c.id
Time for 1000 queries: 0.95499300956726s
3. Using SUM, IF
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
Time for 1000 queries: 1.2266919612885s
Thank you for all the answers.
answered Nov 28 '18 at 2:31
Supun KavindaSupun Kavinda
6217
6217
add a comment |
add a comment |
I'd use conditional aggregation. A join to a single reference to tall_comment_votes
, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF()
expression could replaced with a more ANSI standards compliant CASE
expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in theON
clauseAND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows withstatus
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.
– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
add a comment |
I'd use conditional aggregation. A join to a single reference to tall_comment_votes
, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF()
expression could replaced with a more ANSI standards compliant CASE
expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in theON
clauseAND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows withstatus
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.
– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
add a comment |
I'd use conditional aggregation. A join to a single reference to tall_comment_votes
, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF()
expression could replaced with a more ANSI standards compliant CASE
expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs
I'd use conditional aggregation. A join to a single reference to tall_comment_votes
, and then check conditions in expressions.
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP
BY c.id
This avoids the problem of the partial cross product, when there are multiple rows returned from v1, v2, v3 and v4.
The MySQL IF()
expression could replaced with a more ANSI standards compliant CASE
expression, e.g.
, SUM(CASE WHEN v.status = 1 THEN 1 ELSE 0 END) AS upvotes
FOLLOWUP
setup test case and observe execution plans and performance
populate tables
CREATE TABLE talk_comments (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT);
CREATE TABLE talk_comment_votes (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, comment_id INT UNSIGNED NOT NULL, user_id INT UNSIGNED NOT NULL, is_anonymous TINYINT(1) UNSIGNED NOT NULL, STATUS TINYINT UNSIGNED, time_ INT UNSIGNED);
CREATE INDEX talk_comment_votes_IX1 ON talk_comment_votes (comment_id, STATUS, user_id, is_anonymous) ;
INSERT INTO talk_comments (id) VALUES (1),(2),(3);
INSERT INTO talk_comment_votes (id, comment_id, user_id, is_anonymous, STATUS, time_) VALUES (1,2,2,0,1,0),(2,1,1,0,1,0),(3,2,1,0,2,NULL),(4,7,1,0,2,NULL),(5,1,14,1,1,NULL),(6,2,14,1,1,NULL);
query execution plans
EXPLAIN
SELECT c.id, COUNT(DISTINCT v1.id) AS upvotes, COUNT(DISTINCT v2.id) AS userUpvoted, COUNT(DISTINCT v3.id) AS downvotes, COUNT(DISTINCT v4.id) AS userDownvoted FROM talk_comments c
LEFT JOIN talk_comment_votes v1 ON v1.comment_id = c.id AND v1.status = 1
LEFT JOIN talk_comment_votes v2 ON v2.comment_id = c.id AND v2.status = 1 AND v2.user_id = 1 AND v2.is_anonymous = 0
LEFT JOIN talk_comment_votes v3 ON c.id = v3.comment_id AND v3.status = 2
LEFT JOIN talk_comment_votes v4 ON c.id = v4.comment_id AND v4.status = 2 AND v4.user_id = 1 AND v4.is_anonymous = 0
WHERE c.id = 2 GROUP BY c.id
;
EXPLAIN
SELECT c.id
, SUM(IF(v.status = 1 ,1,0)) AS upvotes
, SUM(IF(v.status = 1 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userUpvoted
, SUM(IF(v.status = 2 ,1,0)) AS downvotes
, SUM(IF(v.status = 2 AND v.user_id = 1 AND v.is_anonymous = 0 ,1,0)) AS userDownvoted
FROM talk_comments c
LEFT
JOIN talk_comment_votes v
ON v.comment_id = c.id
WHERE c.id = 2
GROUP BY c.id
;
output from explain
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ----------------------- ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v1 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 2 Using index
-- 1 SIMPLE v2 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- 1 SIMPLE v3 ref talk_comment_votes_IX1 talk_comment_votes_IX1 6 const,const 1 Using index
-- 1 SIMPLE v4 ref talk_comment_votes_IX1 talk_comment_votes_IX1 11 const,const,const,const 1 Using index
-- id select_type table type possible_keys key key_len ref rows Extra
-- ------ ----------- ------ ------ ---------------------- ---------------------- ------- ------ ------ -------------
-- 1 SIMPLE c const PRIMARY PRIMARY 4 const 1 Using index
-- 1 SIMPLE v ref talk_comment_votes_IX1 talk_comment_votes_IX1 4 const 3 Using index
measured performance:
100 executions round 1 round 2 round 3
------------------------------------ ---------- ---------- ---------
multiple left join, count(distinct 0.123 secs 0.130 secs 0.125 secs
conditional aggregation sum(if 0.113 secs 0.114 secs 0.111 secs
edited Nov 28 '18 at 20:05
answered Nov 26 '18 at 22:00
spencer7593spencer7593
86.6k118197
86.6k118197
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in theON
clauseAND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows withstatus
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.
– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
add a comment |
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in theON
clauseAND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows withstatus
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.
– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
Thank you very much! It works like a charm.
– Supun Kavinda
Nov 28 '18 at 2:19
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
But, according to the performance test, still LEFT JOIN wins
– Supun Kavinda
Nov 28 '18 at 2:22
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in the
ON
clause AND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows with status
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.– spencer7593
Nov 28 '18 at 19:41
I'm a little surprised that the conditional aggregation would be slower; we could include criteria in the
ON
clause AND v.status IN (1,2)
, but I wouldn't expect that to make much of a difference, unless there's a slew of rows with status
values other than 1 or 2. Of course with large sets, performance will horrendous if appropriate indexes are not defined.– spencer7593
Nov 28 '18 at 19:41
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
A test case shows performance of conditional aggregation is 10% faster.
– spencer7593
Nov 28 '18 at 20:10
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53488595%2fmysql-left-join-returns-unexpected-results%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Not the best for performance but try
COUNT(DISTINCT ...)
the LEFT JOIN's seams to multiply the count.. Best performing and safest method would be to do the counting in separted subqueries when working with multiple join's– Raymond Nijland
Nov 26 '18 at 20:33
@RaymondNijland Wow, that returns the correct result. How is it working? Can you explain a little bit? Please add it as an answer so that I can mark it as the answer.
– Supun Kavinda
Nov 26 '18 at 20:37
Well the JOIN's make multiple records that will be counted. try the same query without the
GROUP BY
andCOUNT()
then you will see what is going on there... Or change in the current query theCOUNT
intoGROUP_CONCAT
then you will see "duplicated" records.. WhichCOUNT(DISTINCT ..)
filters out..– Raymond Nijland
Nov 26 '18 at 20:38
I see that. Thank you for the answer. If it is not the best performance, what would be the best one?
– Supun Kavinda
Nov 26 '18 at 20:40
By making subqueries more or less like ...
SELECT * FROM (SELECT COUNT(*) FROM table) AS alias INNER JOIN table ON alias.column = table.column
or a subquery in a join like... table INNER JOIN (SELECT COUNT(*) FROM table2) AS alias ON table.column = alias.column
– Raymond Nijland
Nov 26 '18 at 20:43