MySQL “near” duplicates only with a pattern


























I want a MySQL query that shows "near" duplicate rows: a reference together with the same reference followed by the pattern "-??" ("-" plus exactly 2 characters, where "?" is any character).



Example table with the columns id, reference:



id reference
1 DGGDL
2 DGGDL
3 HSDKH
4 HSDKH-45
5 2KXQF
6 2KXQF
7 2J6SF
8 2J6SF-442
9 FSM
10 148-54
11 148-54
12 148
13 BWZM-67


I want a query on this table that returns exactly this result:



id reference
3 HSDKH
4 HSDKH-45
10 148-54
12 148


2J6SF-442 is not in the result because the pattern is "-" plus exactly 2 characters (442 has 3 characters, so it doesn't match).
HSDKH and HSDKH-45 are in the result because HSDKH-45 matches "HSDKH-??" and HSDKH exists. BWZM-67 is NOT in the result: it matches "BWZM-??", but there is no reference "BWZM" in the table.
All other "duplicates" that don't match the pattern are excluded from the result (e.g. DGGDL, because there is no reference like DGGDL-?? in the table).
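The rule above can be restated as: keep a row if its reference is X-?? and X exists, or if it is X and some X-?? exists. Below is a minimal Python sketch of that rule, outside the database and only meant to pin down the specification; base_of and near_duplicates are hypothetical helper names. Applied literally to the sample table, the rule also keeps id 11 (the second "148-54" row), which the expected output above lists only once.

```python
# Sample rows from the question: (id, reference)
ROWS = [
    (1, "DGGDL"), (2, "DGGDL"), (3, "HSDKH"), (4, "HSDKH-45"),
    (5, "2KXQF"), (6, "2KXQF"), (7, "2J6SF"), (8, "2J6SF-442"),
    (9, "FSM"), (10, "148-54"), (11, "148-54"), (12, "148"),
    (13, "BWZM-67"),
]


def base_of(ref):
    """Return ref without a trailing '-??' (dash + exactly 2 chars), else None."""
    if len(ref) >= 3 and ref[-3] == "-":
        return ref[:-3]
    return None


def near_duplicates(rows):
    refs = {ref for _, ref in rows}
    # Every X for which some reference of the form X-?? exists.
    bases_with_variant = {base_of(ref) for ref in refs if base_of(ref) is not None}
    result = []
    for row_id, ref in rows:
        base = base_of(ref)
        is_variant_of_existing = base is not None and base in refs  # X-?? whose X exists
        is_base_with_variant = ref in bases_with_variant            # X that has an X-??
        if is_variant_of_existing or is_base_with_variant:
            result.append((row_id, ref))
    return result


print(near_duplicates(ROWS))
# [(3, 'HSDKH'), (4, 'HSDKH-45'), (10, '148-54'), (11, '148-54'), (12, '148')]
```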



My table is named products, and its simplified structure is:



id, reference


I tried many different queries without success, which is why I'm not posting them here.
I don't know if I'm being very clear, but the example shows exactly what I want.
Thank you!










      mysql sql
















          asked Nov 21 '18 at 12:44 by neoteknic
























          3 Answers

          I think you want:



          select t.col
          from t
          where exists (select 1
                        from t t2
                        where t2.col like concat(t.col, '-__') or
                              t.col like concat(t2.col, '-__')
                       );


          If the two characters are specifically numeric:



                        where t2.col regexp concat('^', t.col, '-[0-9]{2}$') or
                              t.col regexp concat('^', t2.col, '-[0-9]{2}$')
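Anchoring matters in the REGEXP variant: REGEXP looks for a match anywhere in the string, so without a leading ^ a row such as XHSDKH-45 would also count as a variant of HSDKH. The sketch below illustrates this with Python's re module; the pattern elements used (^, [0-9], {2}, $) mean the same in MySQL's REGEXP, but treating Python's engine as equivalent is an assumption. It also escapes the base with re.escape, which the SQL concat(...) version does not do, so in SQL a reference containing regex metacharacters such as . or + would be read as pattern syntax. is_numeric_variant is a hypothetical helper name.

```python
import re


def is_numeric_variant(candidate, base, anchored=True):
    """True if candidate is base + '-' + exactly two digits at the end."""
    # re.escape stops characters like '.' or '+' in base from acting as regex syntax
    pattern = re.escape(base) + r"-[0-9]{2}$"
    if anchored:
        pattern = "^" + pattern
    # re.search, like REGEXP, looks for a match anywhere in the string
    return re.search(pattern, candidate) is not None


print(is_numeric_variant("HSDKH-45", "HSDKH"))                   # True
print(is_numeric_variant("XHSDKH-45", "HSDKH"))                  # False
print(is_numeric_variant("XHSDKH-45", "HSDKH", anchored=False))  # True
print(is_numeric_variant("2J6SF-442", "2J6SF"))                  # False
```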


          Or, if you want the results on one row for each group:



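          -- one row per base reference (the text before the first '-')
          select group_concat(t.col)
          from t
          group by substring_index(t.col, '-', 1)
          -- keep groups that contain both a '-??' variant and a reference without it
          having sum(t.col like '%-__') > 0 and
                 sum(t.col not like '%-__') > 0;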
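The GROUP BY version groups on the text before the first '-'. That works for the sample data, but a base reference that itself contains a '-' can be lumped together with an unrelated variant. The Python sketch below mimics the HAVING clause and compares that grouping key with an alternative key that strips only a trailing '-??', i.e. what case when reference like '%-__' then left(reference, char_length(reference) - 3) else reference end would compute in MySQL. The helper names and the extra rows AB-CDE and AB-XYZ-12 are hypothetical, added only to show where the two keys differ.

```python
from collections import defaultdict

# Sample rows from the question: (id, reference)
ROWS = [
    (1, "DGGDL"), (2, "DGGDL"), (3, "HSDKH"), (4, "HSDKH-45"),
    (5, "2KXQF"), (6, "2KXQF"), (7, "2J6SF"), (8, "2J6SF-442"),
    (9, "FSM"), (10, "148-54"), (11, "148-54"), (12, "148"),
    (13, "BWZM-67"),
]


def has_suffix(ref):
    # Python equivalent of: ref LIKE '%-__'
    return len(ref) >= 3 and ref[-3] == "-"


def key_first_hyphen(ref):
    # Equivalent of SUBSTRING_INDEX(ref, '-', 1): text before the first '-'
    return ref.split("-", 1)[0]


def key_strip_suffix(ref):
    # Alternative key: remove a trailing '-??' only
    return ref[:-3] if has_suffix(ref) else ref


def grouped(rows, key):
    groups = defaultdict(list)
    for _, ref in rows:
        groups[key(ref)].append(ref)
    # HAVING SUM(ref LIKE '%-__') > 0 AND SUM(ref NOT LIKE '%-__') > 0
    return {
        k: refs
        for k, refs in groups.items()
        if any(has_suffix(r) for r in refs) and not all(has_suffix(r) for r in refs)
    }


print(grouped(ROWS, key_first_hyphen))
# {'HSDKH': ['HSDKH', 'HSDKH-45'], '148': ['148-54', '148-54', '148']}

extra = [(14, "AB-CDE"), (15, "AB-XYZ-12")]  # hypothetical, unrelated references
print(grouped(extra, key_first_hyphen))  # {'AB': ['AB-CDE', 'AB-XYZ-12']}
print(grouped(extra, key_strip_suffix))  # {}
```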






          answered Nov 21 '18 at 12:49 by Gordon Linoff (edited Nov 21 '18 at 14:10)












          • '-%' means '-' followed by any characters, but I want exactly two characters. Maybe '-__' is better? I'll try this.
            – neoteknic
            Nov 21 '18 at 13:47










          • @neoteknic . . . Yes.
            – Gordon Linoff
            Nov 21 '18 at 14:10










          • OK, it's working, but only on my test table. On the production table (26k rows) it takes way too long: it times out after 2 minutes. reference is indexed, the engine is InnoDB, and the server is MariaDB 10.1. Is there any way to improve performance?
            – neoteknic
            Nov 22 '18 at 16:52








          • @neoteknic . . . Did you try the group by query?
            – Gordon Linoff
            Nov 23 '18 at 3:09










          • I'll try it; I didn't have much time yesterday.
            – neoteknic
            Nov 23 '18 at 13:02






























          You are looking for all references that have a counterpart in the same table, where the two references differ only by the trailing three characters '-??'. In LIKE, the single-character wildcard is _.
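To make the wildcard point concrete: in LIKE, _ matches exactly one character and % matches any run of characters, so a pattern built as reference + '-__' only matches the base plus a dash and exactly two characters. A side effect worth knowing is that any _ or % stored inside reference itself also acts as a wildcard in that pattern. The sketch below checks this with SQLite through Python's sqlite3 module, an assumption standing in for MySQL (LIKE wildcards behave the same way in these cases). like and like_escaped are hypothetical helpers; MySQL uses \ as the default LIKE escape character, while SQLite needs an explicit ESCAPE clause, which is why the helper spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite."""
    return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0] == 1


def like_escaped(value, literal_base, suffix_pattern):
    """Match value against literal_base (wildcards escaped) followed by suffix_pattern."""
    escaped = (
        literal_base.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    sql = "SELECT ? LIKE ? ESCAPE '\\'"
    return conn.execute(sql, (value, escaped + suffix_pattern)).fetchone()[0] == 1


print(like("HSDKH-45", "HSDKH-__"))          # True: exactly two characters after '-'
print(like("2J6SF-442", "2J6SF-__"))         # False: three characters after '-'
print(like("AXB-12", "A_B-__"))              # True: the '_' inside the base is a wildcard
print(like_escaped("AXB-12", "A_B", "-__"))  # False once the base is escaped
print(like_escaped("A_B-12", "A_B", "-__"))  # True
```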



          The query:



          select *
          from mytable t1
          where exists
          (
            select *
            from mytable t2
            where t1.reference like concat(t2.reference, '-__')
               or t2.reference like concat(t1.reference, '-__')
          )
          order by reference;
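The query can be checked against the sample data from the question without a MySQL server. The sketch below runs the same EXISTS logic in SQLite through Python's sqlite3 module; SQLite is an assumption standing in for MySQL, its || operator replaces CONCAT(), and the result is ordered by id instead of reference only to make the comparison stable. run is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (id, reference)
ROWS = [
    (1, "DGGDL"), (2, "DGGDL"), (3, "HSDKH"), (4, "HSDKH-45"),
    (5, "2KXQF"), (6, "2KXQF"), (7, "2J6SF"), (8, "2J6SF-442"),
    (9, "FSM"), (10, "148-54"), (11, "148-54"), (12, "148"),
    (13, "BWZM-67"),
]

# Same shape as the answer's query; '||' replaces MySQL's CONCAT().
QUERY = """
    SELECT t1.id, t1.reference
    FROM mytable t1
    WHERE EXISTS (
        SELECT 1
        FROM mytable t2
        WHERE t1.reference LIKE (t2.reference || '-__')
           OR t2.reference LIKE (t1.reference || '-__')
    )
    ORDER BY t1.id
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, reference TEXT NOT NULL)")
    conn.executemany("INSERT INTO mytable (id, reference) VALUES (?, ?)", rows)
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run(ROWS))
# [(3, 'HSDKH'), (4, 'HSDKH-45'), (10, '148-54'), (11, '148-54'), (12, '148')]
```

Because the LIKE pattern is computed from the other row, the subquery compares each row with every other row, and an index on reference most likely cannot narrow that down; this fits the timeouts on 26k rows reported in the comments.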






          answered Nov 21 '18 at 12:55 by Thorsten Kettner












          • I tried this, but it runs very long and times out in PMA (phpMyAdmin). I will try it on a test table! 26k rows, and reference is indexed. Same result as with @Gordon Linoff's answer.
            – neoteknic
            Nov 22 '18 at 16:39












          • Maybe MAX_EXECUTION_TIME is set to a low value. If so, remove that restriction: SET SESSION MAX_EXECUTION_TIME=0;
            – Thorsten Kettner
            Nov 22 '18 at 19:13










          • Yes, but that's unusable in production; I need a faster query (under 10 s). Thank you.
            – neoteknic
            Nov 23 '18 at 13:11






























          Here is another approach: add a computed (generated) column to the table that holds the reference minus the trailing '-??', then create an index on that column.



          alter table mytable add column refshaved varchar(20) generated always as
            (case when reference like '%-__'
                  then left(reference, char_length(reference) - 3)
                  else reference end) stored;

          create index idx on mytable(refshaved, reference);

          select *
          from mytable t1
          where exists
          (
            select *
            from mytable t2
            where t2.refshaved = t1.refshaved
              and t2.reference <> t1.reference
              -- one side of the pair must be the bare reference (no '-??' suffix)
              and (t1.reference = t1.refshaved or t2.reference = t2.refshaved)
          )
          order by reference;


          Rextester demo: https://rextester.com/OLHJ35843
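Below is a sketch of the same idea that runs without MySQL, using SQLite through Python's sqlite3 module. Assumptions: refshaved is an ordinary column filled once by an UPDATE with the same CASE expression (standing in for the generated column), and SQLite's substr/length replace MySQL's left/char_length. It also compares the query with and without a pairing check, t1.reference = t1.refshaved or t2.reference = t2.refshaved, which requires one row of each pair to be the bare reference; the rows XY-11 and XY-22 are hypothetical and show the case where that check matters (two suffixed variants with no bare XY). build and matching_ids are hypothetical helper names.

```python
import sqlite3

# Sample rows from the question: (id, reference)
ROWS = [
    (1, "DGGDL"), (2, "DGGDL"), (3, "HSDKH"), (4, "HSDKH-45"),
    (5, "2KXQF"), (6, "2KXQF"), (7, "2J6SF"), (8, "2J6SF-442"),
    (9, "FSM"), (10, "148-54"), (11, "148-54"), (12, "148"),
    (13, "BWZM-67"),
]

PAIRING_CHECK = "AND (t1.reference = t1.refshaved OR t2.reference = t2.refshaved)"


def build(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, reference TEXT NOT NULL, refshaved TEXT)"
    )
    conn.executemany("INSERT INTO mytable (id, reference) VALUES (?, ?)", rows)
    # Stand-in for the generated column: same CASE logic, computed once.
    conn.execute(
        """
        UPDATE mytable
        SET refshaved = CASE WHEN reference LIKE '%-__'
                             THEN substr(reference, 1, length(reference) - 3)
                             ELSE reference END
        """
    )
    conn.execute("CREATE INDEX idx ON mytable (refshaved, reference)")
    return conn


def matching_ids(rows, require_bare_side):
    conn = build(rows)
    extra = PAIRING_CHECK if require_bare_side else ""
    sql = f"""
        SELECT t1.id
        FROM mytable t1
        WHERE EXISTS (
            SELECT 1
            FROM mytable t2
            WHERE t2.refshaved = t1.refshaved
              AND t2.reference <> t1.reference
              {extra}
        )
        ORDER BY t1.id
    """
    try:
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


print(matching_ids(ROWS, require_bare_side=False))  # [3, 4, 10, 11, 12]
print(matching_ids(ROWS, require_bare_side=True))   # [3, 4, 10, 11, 12]

variants_only = [(1, "XY-11"), (2, "XY-22")]  # hypothetical: no bare "XY" row
print(matching_ids(variants_only, require_bare_side=False))  # [1, 2]
print(matching_ids(variants_only, require_bare_side=True))   # []
```

Since refshaved is compared by equality, the (refshaved, reference) index can serve the subquery lookups, unlike the LIKE-on-a-computed-pattern queries; that is the likely reason this variant scales better.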







          answered Nov 22 '18 at 19:21 by Thorsten Kettner












          • Seems good, but you have to alter the table and add an index. I prefer the group_concat approach: it's very fast (56 ms) on the big table and doesn't require altering the table. Thank you, good answer too!
            – neoteknic
            Nov 23 '18 at 13:08



















