Select only the “most complete” record





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







2















I need to solve the following problem.



Let's suppose I have a table with 4 fields called a, b, c, d.



I have the following records:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4


As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.



What I would like to do is, to extract only 2nd row.



Could you help me please?



Thanks in advance for the answer



EDIT: I need to be more specific.



I could have also the cases:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3


where I need to extract the 2nd row,



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3


and again I need to extract the 2nd row.



Same for couples,





   a   |    b    |    c    |    d  
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3


and so on for the other examples.



(Of course, it's now always 2nd row)










share|improve this question

























  • what if there is more than 1 row with same sub-records?

    – Vamsi Prabhala
    Nov 26 '18 at 16:11











  • this case shouldn't be contemplated, however I still need only the most complete record

    – M.Orlandi
    Nov 26 '18 at 16:12


















2















I need to solve the following problem.



Let's suppose I have a table with 4 fields called a, b, c, d.



I have the following records:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4


As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.



What I would like to do is, to extract only 2nd row.



Could you help me please?



Thanks in advance for the answer



EDIT: I need to be more specific.



I could have also the cases:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3


where I need to extract the 2nd row,



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3


and again I need to extract the 2nd row.



Same for couples,





   a   |    b    |    c    |    d  
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3


and so on for the other examples.



(Of course, it's now always 2nd row)










share|improve this question

























  • what if there is more than 1 row with same sub-records?

    – Vamsi Prabhala
    Nov 26 '18 at 16:11











  • this case shouldn't be contemplated, however I still need only the most complete record

    – M.Orlandi
    Nov 26 '18 at 16:12














2












2








2








I need to solve the following problem.



Let's suppose I have a table with 4 fields called a, b, c, d.



I have the following records:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4


As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.



What I would like to do is, to extract only 2nd row.



Could you help me please?



Thanks in advance for the answer



EDIT: I need to be more specific.



I could have also the cases:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3


where I need to extract the 2nd row,



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3


and again I need to extract the 2nd row.



Same for couples,





   a   |    b    |    c    |    d  
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3


and so on for the other examples.



(Of course, it's now always 2nd row)










share|improve this question
















I need to solve the following problem.



Let's suppose I have a table with 4 fields called a, b, c, d.



I have the following records:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4


As it's possible to observe, rows 1,3,4 are "sub-records" of row 2.



What I would like to do is, to extract only 2nd row.



Could you help me please?



Thanks in advance for the answer



EDIT: I need to be more specific.



I could have also the cases:



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3


where I need to extract the 2nd row,



-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3


and again I need to extract the 2nd row.



Same for couples,





   a   |    b    |    c    |    d  
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3


and so on for the other examples.



(Of course, it's now always 2nd row)







sql postgresql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 26 '18 at 16:25







M.Orlandi

















asked Nov 26 '18 at 16:09









M.OrlandiM.Orlandi

498




498













  • what if there is more than 1 row with same sub-records?

    – Vamsi Prabhala
    Nov 26 '18 at 16:11











  • this case shouldn't be contemplated, however I still need only the most complete record

    – M.Orlandi
    Nov 26 '18 at 16:12



















  • what if there is more than 1 row with same sub-records?

    – Vamsi Prabhala
    Nov 26 '18 at 16:11











  • this case shouldn't be contemplated, however I still need only the most complete record

    – M.Orlandi
    Nov 26 '18 at 16:12

















what if there is more than 1 row with same sub-records?

– Vamsi Prabhala
Nov 26 '18 at 16:11





what if there is more than 1 row with same sub-records?

– Vamsi Prabhala
Nov 26 '18 at 16:11













this case shouldn't be contemplated, however I still need only the most complete record

– M.Orlandi
Nov 26 '18 at 16:12





this case shouldn't be contemplated, however I still need only the most complete record

– M.Orlandi
Nov 26 '18 at 16:12












5 Answers
5






active

oldest

votes


















1














Using a NOT EXISTS the records that have a better duplicate can be filtered out.






create table abcd (
a int,
b int,
c int,
d int
);






insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)

,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)

,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null)






SELECT *  
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);



a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null



db<>fiddle here






share|improve this answer
























  • that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

    – M.Orlandi
    Nov 27 '18 at 11:08











  • @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

    – LukStorms
    Nov 27 '18 at 12:01





















0














You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:



   (CASE WHEN a IS NULL THEN 0 ELSE 1) +
(CASE WHEN b IS NULL THEN 0 ELSE 1) +
(CASE WHEN c IS NULL THEN 0 ELSE 1) +
(CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex


Then SELECT the top 1 ordered by CompletionIndex in descending order.



This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.






share|improve this answer































    0














    Most complete rows, by your definition, are the ones with the least null columns:



    SELECT * FROM tablename
    WHERE (
    (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN d IS NULL THEN 0 ELSE 1 END)
    ) =
    (SELECT MAX(
    (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
    (CASE WHEN d IS NULL THEN 0 ELSE 1 END))
    FROM tablename)





    share|improve this answer


























    • Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

      – 404
      Nov 26 '18 at 16:20








    • 1





      @eurotrash What is this? Is this a valid statement?

      – forpas
      Nov 26 '18 at 16:24











    • i have edited my question to be more specific.

      – M.Orlandi
      Nov 26 '18 at 16:26













    • @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

      – 404
      Nov 26 '18 at 16:27











    • @eurotrash have you got any documentation about it?

      – forpas
      Nov 26 '18 at 16:28



















    0














    Hmmm . . . I think you can use not exists:



    with t as (
    select t.*, row_number() over (order by a) as id
    from t
    )
    select t.*
    from t
    where not exists (select 1
    from t t2
    where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
    (t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
    (t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
    (t2.d is not distinct from t.d or t2.d is not null and t.d is null)
    ) and
    t2.id <> t.id
    );


    The logic is that no more specific row exists, where the values match



    Here is a db<>fiddle.






    share|improve this answer


























    • Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

      – M.Orlandi
      Nov 26 '18 at 16:27













    • @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

      – Gordon Linoff
      Nov 26 '18 at 16:30



















    0














    As mentioned by Gordon Linoff, we do have to use something like not exists too,



    Edit Using EXCEPT helps



    This might work...



    SELECT * from table1 
    EXCEPT
    (
    SELECT t1.*
    FROM table1 t1
    JOIN table1 t2
    ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
    AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
    AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
    AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
    )


    Here, t1 is every subset row.



    Note: We are assuming value -1 as sentinel value and it does not occur in any column.






    share|improve this answer


























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53485031%2fselect-only-the-most-complete-record%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      Using a NOT EXISTS the records that have a better duplicate can be filtered out.






      create table abcd (
      a int,
      b int,
      c int,
      d int
      );






      insert into abcd (a, b, c, d) values
      (1, 2, null, null)
      ,(1, 2, 3, 4)
      ,(1, 2, null, 4)
      ,(1, 2, 3, null)

      ,(2, 3, null,null)
      ,(2, 3, null, 5)
      ,(2, null, null, 5)

      ,(3, null, null, null)
      ,(3, null, 5, null)
      ,(null, null, 5, null)






      SELECT *  
      FROM abcd AS t
      WHERE NOT EXISTS
      (
      select 1
      from abcd as d
      where (t.a is null or d.a = t.a)
      and (t.b is null or d.b = t.b)
      and (t.c is null or d.c = t.c)
      and (t.d is null or d.d = t.d)
      and (case when t.a is null then 0 else 1 end +
      case when t.b is null then 0 else 1 end +
      case when t.c is null then 0 else 1 end +
      case when t.d is null then 0 else 1 end) <
      (case when d.a is null then 0 else 1 end +
      case when d.b is null then 0 else 1 end +
      case when d.c is null then 0 else 1 end +
      case when d.d is null then 0 else 1 end)
      );



      a | b | c | d
      -: | ---: | ---: | ---:
      1 | 2 | 3 | 4
      2 | 3 | null | 5
      3 | null | 5 | null



      db<>fiddle here






      share|improve this answer
























      • that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

        – M.Orlandi
        Nov 27 '18 at 11:08











      • @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

        – LukStorms
        Nov 27 '18 at 12:01


















      1














      Using a NOT EXISTS the records that have a better duplicate can be filtered out.






      create table abcd (
      a int,
      b int,
      c int,
      d int
      );






      insert into abcd (a, b, c, d) values
      (1, 2, null, null)
      ,(1, 2, 3, 4)
      ,(1, 2, null, 4)
      ,(1, 2, 3, null)

      ,(2, 3, null,null)
      ,(2, 3, null, 5)
      ,(2, null, null, 5)

      ,(3, null, null, null)
      ,(3, null, 5, null)
      ,(null, null, 5, null)






      SELECT *  
      FROM abcd AS t
      WHERE NOT EXISTS
      (
      select 1
      from abcd as d
      where (t.a is null or d.a = t.a)
      and (t.b is null or d.b = t.b)
      and (t.c is null or d.c = t.c)
      and (t.d is null or d.d = t.d)
      and (case when t.a is null then 0 else 1 end +
      case when t.b is null then 0 else 1 end +
      case when t.c is null then 0 else 1 end +
      case when t.d is null then 0 else 1 end) <
      (case when d.a is null then 0 else 1 end +
      case when d.b is null then 0 else 1 end +
      case when d.c is null then 0 else 1 end +
      case when d.d is null then 0 else 1 end)
      );



      a | b | c | d
      -: | ---: | ---: | ---:
      1 | 2 | 3 | 4
      2 | 3 | null | 5
      3 | null | 5 | null



      db<>fiddle here






      share|improve this answer
























      • that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

        – M.Orlandi
        Nov 27 '18 at 11:08











      • @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

        – LukStorms
        Nov 27 '18 at 12:01
















      1












      1








      1







      Using a NOT EXISTS the records that have a better duplicate can be filtered out.






      create table abcd (
      a int,
      b int,
      c int,
      d int
      );






      insert into abcd (a, b, c, d) values
      (1, 2, null, null)
      ,(1, 2, 3, 4)
      ,(1, 2, null, 4)
      ,(1, 2, 3, null)

      ,(2, 3, null,null)
      ,(2, 3, null, 5)
      ,(2, null, null, 5)

      ,(3, null, null, null)
      ,(3, null, 5, null)
      ,(null, null, 5, null)






      SELECT *  
      FROM abcd AS t
      WHERE NOT EXISTS
      (
      select 1
      from abcd as d
      where (t.a is null or d.a = t.a)
      and (t.b is null or d.b = t.b)
      and (t.c is null or d.c = t.c)
      and (t.d is null or d.d = t.d)
      and (case when t.a is null then 0 else 1 end +
      case when t.b is null then 0 else 1 end +
      case when t.c is null then 0 else 1 end +
      case when t.d is null then 0 else 1 end) <
      (case when d.a is null then 0 else 1 end +
      case when d.b is null then 0 else 1 end +
      case when d.c is null then 0 else 1 end +
      case when d.d is null then 0 else 1 end)
      );



      a | b | c | d
      -: | ---: | ---: | ---:
      1 | 2 | 3 | 4
      2 | 3 | null | 5
      3 | null | 5 | null



      db<>fiddle here






      share|improve this answer













      Using a NOT EXISTS the records that have a better duplicate can be filtered out.






      create table abcd (
      a int,
      b int,
      c int,
      d int
      );






      insert into abcd (a, b, c, d) values
      (1, 2, null, null)
      ,(1, 2, 3, 4)
      ,(1, 2, null, 4)
      ,(1, 2, 3, null)

      ,(2, 3, null,null)
      ,(2, 3, null, 5)
      ,(2, null, null, 5)

      ,(3, null, null, null)
      ,(3, null, 5, null)
      ,(null, null, 5, null)






      SELECT *  
      FROM abcd AS t
      WHERE NOT EXISTS
      (
      select 1
      from abcd as d
      where (t.a is null or d.a = t.a)
      and (t.b is null or d.b = t.b)
      and (t.c is null or d.c = t.c)
      and (t.d is null or d.d = t.d)
      and (case when t.a is null then 0 else 1 end +
      case when t.b is null then 0 else 1 end +
      case when t.c is null then 0 else 1 end +
      case when t.d is null then 0 else 1 end) <
      (case when d.a is null then 0 else 1 end +
      case when d.b is null then 0 else 1 end +
      case when d.c is null then 0 else 1 end +
      case when d.d is null then 0 else 1 end)
      );



      a | b | c | d
      -: | ---: | ---: | ---:
      1 | 2 | 3 | 4
      2 | 3 | null | 5
      3 | null | 5 | null



      db<>fiddle here







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Nov 26 '18 at 20:20









      LukStormsLukStorms

      14.1k31734




      14.1k31734













      • that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

        – M.Orlandi
        Nov 27 '18 at 11:08











      • @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

        – LukStorms
        Nov 27 '18 at 12:01





















      • that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

        – M.Orlandi
        Nov 27 '18 at 11:08











      • @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

        – LukStorms
        Nov 27 '18 at 12:01



















      that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

      – M.Orlandi
      Nov 27 '18 at 11:08





      that's exactly what I was looking for! I have only added a select distinct over a,b,c,d instead of select * and it worked perfectly! Thank you

      – M.Orlandi
      Nov 27 '18 at 11:08













      @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

      – LukStorms
      Nov 27 '18 at 12:01







      @M.Orlandi Thx for the rep. Just a warning though. This won't filter out special cases like the tupples (4, null, 6, null), (4, 5, null, null), (4, 6, null, null). But the nice thing about this method is that you can select the "lesses" dups by simply removing the NOT. For example if you want to delete the less complete dups.

      – LukStorms
      Nov 27 '18 at 12:01















      0














      You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:



         (CASE WHEN a IS NULL THEN 0 ELSE 1) +
      (CASE WHEN b IS NULL THEN 0 ELSE 1) +
      (CASE WHEN c IS NULL THEN 0 ELSE 1) +
      (CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex


      Then SELECT the top 1 ordered by CompletionIndex in descending order.



      This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.






      share|improve this answer




























        0














        You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:



           (CASE WHEN a IS NULL THEN 0 ELSE 1) +
        (CASE WHEN b IS NULL THEN 0 ELSE 1) +
        (CASE WHEN c IS NULL THEN 0 ELSE 1) +
        (CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex


        Then SELECT the top 1 ordered by CompletionIndex in descending order.



        This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.






        share|improve this answer


























          0












          0








          0







          You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:



             (CASE WHEN a IS NULL THEN 0 ELSE 1) +
          (CASE WHEN b IS NULL THEN 0 ELSE 1) +
          (CASE WHEN c IS NULL THEN 0 ELSE 1) +
          (CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex


          Then SELECT the top 1 ordered by CompletionIndex in descending order.



          This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.






          share|improve this answer













          You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:



             (CASE WHEN a IS NULL THEN 0 ELSE 1) +
          (CASE WHEN b IS NULL THEN 0 ELSE 1) +
          (CASE WHEN c IS NULL THEN 0 ELSE 1) +
          (CASE WHEN d IS NULL THEN 0 ELSE 1) AS CompletionIndex


          Then SELECT the top 1 ordered by CompletionIndex in descending order.



          This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 26 '18 at 16:43









          Larry LustigLarry Lustig

          40.5k1284130




          40.5k1284130























              0














              Most complete rows, by your definition, are the ones with the least null columns:



              SELECT * FROM tablename
              WHERE (
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END)
              ) =
              (SELECT MAX(
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END))
              FROM tablename)





              share|improve this answer


























              • Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

                – 404
                Nov 26 '18 at 16:20








              • 1





                @eurotrash What is this? Is this a valid statement?

                – forpas
                Nov 26 '18 at 16:24











              • i have edited my question to be more specific.

                – M.Orlandi
                Nov 26 '18 at 16:26













              • @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

                – 404
                Nov 26 '18 at 16:27











              • @eurotrash have you got any documentation about it?

                – forpas
                Nov 26 '18 at 16:28
















              0














              Most complete rows, by your definition, are the ones with the least null columns:



              SELECT * FROM tablename
              WHERE (
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END)
              ) =
              (SELECT MAX(
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END))
              FROM tablename)





              share|improve this answer


























              • Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

                – 404
                Nov 26 '18 at 16:20








              • 1





                @eurotrash What is this? Is this a valid statement?

                – forpas
                Nov 26 '18 at 16:24











              • i have edited my question to be more specific.

                – M.Orlandi
                Nov 26 '18 at 16:26













              • @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

                – 404
                Nov 26 '18 at 16:27











              • @eurotrash have you got any documentation about it?

                – forpas
                Nov 26 '18 at 16:28














              0












              0








              0







              Most complete rows, by your definition, are the ones with the least null columns:



              SELECT * FROM tablename
              WHERE (
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END)
              ) =
              (SELECT MAX(
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END))
              FROM tablename)





              share|improve this answer















              Most complete rows, by your definition, are the ones with the least null columns:



              SELECT * FROM tablename
              WHERE (
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END)
              ) =
              (SELECT MAX(
              (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
              (CASE WHEN d IS NULL THEN 0 ELSE 1 END))
              FROM tablename)






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 26 '18 at 16:46

























              answered Nov 26 '18 at 16:19









              forpasforpas

              19.8k4830




              19.8k4830













              • Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

                – 404
                Nov 26 '18 at 16:20








              • 1





                @eurotrash What is this? Is this a valid statement?

                – forpas
                Nov 26 '18 at 16:24











              • i have edited my question to be more specific.

                – M.Orlandi
                Nov 26 '18 at 16:26













              • @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

                – 404
                Nov 26 '18 at 16:27











              • @eurotrash have you got any documentation about it?

                – forpas
                Nov 26 '18 at 16:28



















              • Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

                – 404
                Nov 26 '18 at 16:20








              • 1





                @eurotrash What is this? Is this a valid statement?

                – forpas
                Nov 26 '18 at 16:24











              • i have edited my question to be more specific.

                – M.Orlandi
                Nov 26 '18 at 16:26













              • @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

                – 404
                Nov 26 '18 at 16:27











              • @eurotrash have you got any documentation about it?

                – forpas
                Nov 26 '18 at 16:28

















              Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

              – 404
              Nov 26 '18 at 16:20







              Or more simply, since you're checking all columns: WHERE NOT tablename IS NULL

              – 404
              Nov 26 '18 at 16:20






              1




              1





              @eurotrash What is this? Is this a valid statement?

              – forpas
              Nov 26 '18 at 16:24





              @eurotrash What is this? Is this a valid statement?

              – forpas
              Nov 26 '18 at 16:24













              i have edited my question to be more specific.

              – M.Orlandi
              Nov 26 '18 at 16:26







              i have edited my question to be more specific.

              – M.Orlandi
              Nov 26 '18 at 16:26















              @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

              – 404
              Nov 26 '18 at 16:27





              @forpas Yes, assuming the table name is tablename, by saying NOT tablename IS NULL it checks the whole row, and the statement will return TRUE only if none of the fields are NULL. So it's a shorter way than specifying every field.

              – 404
              Nov 26 '18 at 16:27













              @eurotrash have you got any documentation about it?

              – forpas
              Nov 26 '18 at 16:28





              @eurotrash have you got any documentation about it?

              – forpas
              Nov 26 '18 at 16:28











              0














              Hmmm . . . I think you can use not exists:



              with t as (
              select t.*, row_number() over (order by a) as id
              from t
              )
              select t.*
              from t
              where not exists (select 1
              from t t2
              where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
              (t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
              (t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
              (t2.d is not distinct from t.d or t2.d is not null and t.d is null)
              ) and
              t2.id <> t.id
              );


              The logic is that no more specific row exists, where the values match



              Here is a db<>fiddle.






              share|improve this answer


























              • Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

                – M.Orlandi
                Nov 26 '18 at 16:27













              • @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

                – Gordon Linoff
                Nov 26 '18 at 16:30
















              0














              Hmmm . . . I think you can use not exists:



              with t as (
              select t.*, row_number() over (order by a) as id
              from t
              )
              select t.*
              from t
              where not exists (select 1
              from t t2
              where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
              (t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
              (t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
              (t2.d is not distinct from t.d or t2.d is not null and t.d is null)
              ) and
              t2.id <> t.id
              );


              The logic is that no more specific row exists, where the values match



              Here is a db<>fiddle.






              share|improve this answer


























              • Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

                – M.Orlandi
                Nov 26 '18 at 16:27













              • @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

                – Gordon Linoff
                Nov 26 '18 at 16:30














              0












              0








              0







              Hmmm . . . I think you can use not exists:



              with t as (
              select t.*, row_number() over (order by a) as id
              from t
              )
              select t.*
              from t
              where not exists (select 1
              from t t2
              where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
              (t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
              (t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
              (t2.d is not distinct from t.d or t2.d is not null and t.d is null)
              ) and
              t2.id <> t.id
              );


              The logic is that no more specific row exists, where the values match



              Here is a db<>fiddle.






              share|improve this answer















              Hmmm . . . I think you can use not exists:



              with t as (
              select t.*, row_number() over (order by a) as id
              from t
              )
              select t.*
              from t
              where not exists (select 1
              from t t2
              where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
              (t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
              (t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
              (t2.d is not distinct from t.d or t2.d is not null and t.d is null)
              ) and
              t2.id <> t.id
              );


              The logic is that no more specific row exists, where the values match



              Here is a db<>fiddle.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 26 '18 at 16:51

























              answered Nov 26 '18 at 16:21









              Gordon LinoffGordon Linoff

              794k37318423




              794k37318423













              • Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

                – M.Orlandi
                Nov 26 '18 at 16:27













              • @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

                – Gordon Linoff
                Nov 26 '18 at 16:30



















              • Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

                – M.Orlandi
                Nov 26 '18 at 16:27













              • @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

                – Gordon Linoff
                Nov 26 '18 at 16:30

















              Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

              – M.Orlandi
              Nov 26 '18 at 16:27







              Thank you for your answer! i have edited my question to be more specific because this is only a pattern that can happen

              – M.Orlandi
              Nov 26 '18 at 16:27















              @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

              – Gordon Linoff
              Nov 26 '18 at 16:30





              @M.Orlandi . . . The additional clarifications in the question helped me refine the logic.

              – Gordon Linoff
              Nov 26 '18 at 16:30











              0














              As mentioned by Gordon Linoff, we do have to use something like not exists too,



              Edit Using EXCEPT helps



              This might work...



              SELECT * from table1 
              EXCEPT
              (
              SELECT t1.*
              FROM table1 t1
              JOIN table1 t2
              ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
              AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
              AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
              AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
              )


              Here, t1 is every subset row.



              Note: We are assuming value -1 as sentinel value and it does not occur in any column.






              share|improve this answer






























                0














                As mentioned by Gordon Linoff, we do have to use something like not exists too,



                Edit Using EXCEPT helps



                This might work...



                SELECT * from table1 
                EXCEPT
                (
                SELECT t1.*
                FROM table1 t1
                JOIN table1 t2
                ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
                AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
                AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
                AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
                )


                Here, t1 is every subset row.



                Note: We are assuming value -1 as sentinel value and it does not occur in any column.






                share|improve this answer




























                  0












                  0








                  0







                  As mentioned by Gordon Linoff, we do have to use something like not exists too,



                  Edit Using EXCEPT helps



                  This might work...



                  SELECT * from table1 
                  EXCEPT
                  (
                  SELECT t1.*
                  FROM table1 t1
                  JOIN table1 t2
                  ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
                  AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
                  AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
                  AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
                  )


                  Here, t1 is every subset row.



                  Note: We are assuming value -1 as sentinel value and it does not occur in any column.






                  share|improve this answer















                  As mentioned by Gordon Linoff, we do have to use something like not exists too,



                  Edit Using EXCEPT helps



                  This might work...



                  SELECT * from table1 
                  EXCEPT
                  (
                  SELECT t1.*
                  FROM table1 t1
                  JOIN table1 t2
                  ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
                  AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
                  AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
                  AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
                  )


                  Here, t1 is every subset row.



                  Note: We are assuming value -1 as sentinel value and it does not occur in any column.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 27 '18 at 1:14

























                  answered Nov 26 '18 at 16:55









                  Sahil DhokedSahil Dhoked

                  373212




                  373212






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53485031%2fselect-only-the-most-complete-record%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Wiesbaden

                      Marschland

                      Dieringhausen