Select only the “most complete” record
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4
As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.
What I would like to do is, to extract only 2nd row.
Could you help me please?
Thanks in advance for the answer
EDIT: I need to be more specific.
I could have also the cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3
and again I need to extract the 2nd row.
Same for couples,
a | b | c | d
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3
and so on for the other examples.
(Of course, it's now always 2nd row)
sql postgresql
add a comment |
I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4
As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.
What I would like to do is, to extract only 2nd row.
Could you help me please?
Thanks in advance for the answer
EDIT: I need to be more specific.
I could have also the cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3
and again I need to extract the 2nd row.
Same for couples,
a | b | c | d
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3
and so on for the other examples.
(Of course, it's now always 2nd row)
sql postgresql
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12
add a comment |
I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4
As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.
What I would like to do is, to extract only 2nd row.
Could you help me please?
Thanks in advance for the answer
EDIT: I need to be more specific.
I could have also the cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3
and again I need to extract the 2nd row.
Same for couples,
a | b | c | d
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3
and so on for the other examples.
(Of course, it's now always 2nd row)
sql postgresql
I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4
As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.
What I would like to do is, to extract only 2nd row.
Could you help me please?
Thanks in advance for the answer
EDIT: I need to be more specific.
I could have also the cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3
and again I need to extract the 2nd row.
Same for couples,
a | b | c | d
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3
and so on for the other examples.
(Of course, it's now always 2nd row)
sql postgresql
sql postgresql
edited Nov 26 '18 at 16:25
M.Orlandi
asked Nov 26 '18 at 16:09
M.OrlandiM.Orlandi
498
498
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12
add a comment |
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12
add a comment |
5 Answers
5
active
oldest
votes
Using a NOT EXISTS the records that have a better duplicate can be filtered out.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.
– LukStorms
Nov 27 '18 at 12:01
add a comment |
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
add a comment |
Most complete rows, by your definition, are the ones with the least null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
Or more simply, since you're checking all columns:WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name istablename
, by sayingNOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.
– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
|
show 6 more comments
Hmmm . . . I think you can use not exists
:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that no more specific row exists, where the values match
Here is a db<>fiddle.
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
add a comment |
As mentioned by Gordon Linoff, we do have to use something like not exists too,
Edit Using EXCEPT helps
This might work...
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
)
Here, t1 is every subset row.
Note: We are assuming value -1 as sentinel value and it does not occur in any column.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53485031%2fselect-only-the-most-complete-record%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using a NOT EXISTS the records that have a better duplicate can be filtered out.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.
– LukStorms
Nov 27 '18 at 12:01
add a comment |
Using a NOT EXISTS the records that have a better duplicate can be filtered out.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.
– LukStorms
Nov 27 '18 at 12:01
add a comment |
Using a NOT EXISTS the records that have a better duplicate can be filtered out.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
Using a NOT EXISTS the records that have a better duplicate can be filtered out.
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
answered Nov 26 '18 at 20:20
LukStormsLukStorms
14.1k31734
14.1k31734
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.
– LukStorms
Nov 27 '18 at 12:01
add a comment |
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.
– LukStorms
Nov 27 '18 at 12:01
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you
– M.Orlandi
Nov 27 '18 at 11:08
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples
(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.– LukStorms
Nov 27 '18 at 12:01
@M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples
(4, null, 6, null), (4, 5, null, null), (4, 6, null, null)
. But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.– LukStorms
Nov 27 '18 at 12:01
add a comment |
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
add a comment |
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
add a comment |
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
answered Nov 26 '18 at 16:43
Larry LustigLarry Lustig
40.5k1284130
40.5k1284130
add a comment |
add a comment |
Most complete rows, by your definition, are the ones with the least null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
Or more simply, since you're checking all columns:WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name istablename
, by sayingNOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.
– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
|
show 6 more comments
Most complete rows, by your definition, are the ones with the least null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
Or more simply, since you're checking all columns:WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name istablename
, by sayingNOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.
– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
|
show 6 more comments
Most complete rows, by your definition, are the ones with the least null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
Most complete rows, by your definition, are the ones with the least null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
edited Nov 26 '18 at 16:46
answered Nov 26 '18 at 16:19
forpasforpas
19.8k4830
19.8k4830
Or more simply, since you're checking all columns:WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name istablename
, by sayingNOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.
– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
|
show 6 more comments
Or more simply, since you're checking all columns:WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name istablename
, by sayingNOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.
– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
Or more simply, since you're checking all columns:
WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
Or more simply, since you're checking all columns:
WHERE NOT tablename IS NULL
– 404
Nov 26 '18 at 16:20
1
1
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
@eurotrash What is this? Is this a valid statement?
– forpas
Nov 26 '18 at 16:24
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
i have edited my question to be more specific.
– M.Orlandi
Nov 26 '18 at 16:26
@forpas Yes, assuming the table name is
tablename
, by saying NOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.– 404
Nov 26 '18 at 16:27
@forpas Yes, assuming the table name is
tablename
, by saying NOT tablename IS NULL
it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.– 404
Nov 26 '18 at 16:27
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
@eurotrash have you got any documentation about it?
– forpas
Nov 26 '18 at 16:28
|
show 6 more comments
Hmmm . . . I think you can use not exists
:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that no more specific row exists, where the values match
Here is a db<>fiddle.
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
add a comment |
Hmmm . . . I think you can use not exists
:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that no more specific row exists, where the values match
Here is a db<>fiddle.
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
add a comment |
Hmmm . . . I think you can use not exists
:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that no more specific row exists, where the values match
Here is a db<>fiddle.
Hmmm . . . I think you can use not exists
:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that no more specific row exists, where the values match
Here is a db<>fiddle.
edited Nov 26 '18 at 16:51
answered Nov 26 '18 at 16:21
Gordon LinoffGordon Linoff
794k37318423
794k37318423
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
add a comment |
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen
– M.Orlandi
Nov 26 '18 at 16:27
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
@M.Orlandi . . . The additional clarifications in the question helped me refine the logic.
– Gordon Linoff
Nov 26 '18 at 16:30
add a comment |
As mentioned by Gordon Linoff, we do have to use something like not exists too,
Edit Using EXCEPT helps
This might work...
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
)
Here, t1 is every subset row.
Note: We are assuming value -1 as sentinel value and it does not occur in any column.
add a comment |
As mentioned by Gordon Linoff, we do have to use something like not exists too,
Edit Using EXCEPT helps
This might work...
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
)
Here, t1 is every subset row.
Note: We are assuming value -1 as sentinel value and it does not occur in any column.
add a comment |
As mentioned by Gordon Linoff, we do have to use something like not exists too,
Edit Using EXCEPT helps
This might work...
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
)
Here, t1 is every subset row.
Note: We are assuming value -1 as sentinel value and it does not occur in any column.
As mentioned by Gordon Linoff, we do have to use something like not exists too,
Edit Using EXCEPT helps
This might work...
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
)
Here, t1 is every subset row.
Note: We are assuming value -1 as sentinel value and it does not occur in any column.
edited Nov 27 '18 at 1:14
answered Nov 26 '18 at 16:55
Sahil DhokedSahil Dhoked
373212
373212
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53485031%2fselect-only-the-most-complete-record%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
what if there is more than 1 row with same sub-records?
– Vamsi Prabhala
Nov 26 '18 at 16:11
this case shouldn't be contemplated, however I still need only the most complete record
– M.Orlandi
Nov 26 '18 at 16:12