Very slow MySQL query performance
I have a query that takes about 18 seconds to finish:
THE QUERY:
SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(p.owner_id)
FROM commission c
INNER JOIN partner p ON c.customer_id = p.id
WHERE (c.date BETWEEN '2018-01-01' AND '2018-12-31')
AND (c.company_id = 90)
AND (c.source = 'ACTUAL')
AND (p.id IN (3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071,
3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3081, 3082, 3083, 3084,
3085, 3086, 3087, 3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096,
3097, 3098, 3099, 3448, 3449, 3450, 3451, 3452, 3453, 3454, 3455, 3456,
3457, 3458, 3459, 3460, 3461, 3471, 3490, 3491, 6307, 6368, 6421))
GROUP BY YEAR(c.date), MONTH(c.date), p.district_id
The commission table has around 2.8 million records, of which 860,000+ belong to the current year, 2018. The partner table currently has 8,600+ records.
RESULT
| `YEAR(c.date)` | `MONTH(c.date)` | district_id | `COUNT(p.owner_id)` |
|----------------|-----------------|-------------|---------------|
| 2018 | 1 | 1 | 19154 |
| 2018 | 1 | 5 | 9184 |
| 2018 | 1 | 6 | 2706 |
| 2018 | 1 | 12 | 36296 |
| 2018 | 1 | 15 | 13085 |
| 2018 | 2 | 1 | 21231 |
| 2018 | 2 | 5 | 10242 |
| ... | ... | ... | ... |
55 rows retrieved starting from 1 in 18 s 374 ms
(execution: 18 s 368 ms, fetching: 6 ms)
EXPLAIN:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
|----|-------------|-------|------------|-------|------------------------------------------------------------------------------------------------------|----------------------|---------|-----------------|------|----------|----------------------------------------------|
| 1 | SIMPLE | p | null | range | PRIMARY | PRIMARY | 4 | | 57 | 100 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | null | ref | UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73,IDX_6F7146F09395C3F3,IDX_6F7146F0979B1AD6,IDX_6F7146F0AA9E377A | IDX_6F7146F09395C3F3 | 5 | p.id | 6716 | 8.33 | Using where |
DDL:
create table if not exists commission (
id int auto_increment
primary key,
date date not null,
source enum('ACTUAL', 'EXPECTED') not null,
customer_id int null,
transaction_id varchar(255) not null,
company_id int null,
constraint UNIQ_6F7146F0979B1AD62FC0CB0F5F8A7F73 unique (company_id, transaction_id, source),
constraint FK_6F7146F09395C3F3 foreign key (customer_id) references partner (id),
constraint FK_6F7146F0979B1AD6 foreign key (company_id) references companies (id)
) collate=utf8_unicode_ci;
create index IDX_6F7146F09395C3F3 on commission (customer_id);
create index IDX_6F7146F0979B1AD6 on commission (company_id);
create index IDX_6F7146F0AA9E377A on commission (date);
I noticed that removing the partner IN condition brings the query down to about 3 s. I tried replacing it with something crazy like this:
AND (',3062,3063,3064,3065,3066,3067,3068,3069,3070,3071,3072,3073,3074,3075,3076,3077,3078,3079,3081,3082,3083,3084,3085,3086,3087,3088,3089,3090,3091,3092,3093,3094,3095,3096,3097,3098,3099,3448,3449,3450,3451,3452,3453,3454,3455,3456,3457,3458,3459,3460,3461,3471,3490,3491,6307,6368,6421,'
LIKE CONCAT('%,', p.id, ',%'))
and the result was about 5 s... great, but it's a hack.
Why does this query take such a long time to execute when I use the IN clause? Workarounds, tips, links, etc. are welcome. Thanks!
mysql query-performance mysql-5.7
1
Where does the p.id IN (....) list come from?
– danblack, Nov 20 at 23:28

As far as I understand, MySQL can only use one of those indexes on commission in its WHERE; that part would be better served by a composite index on (date, company_id, source).
– Uueerdo, Nov 20 at 23:30

Note that if your date range is all of 2018, the GROUP BY YEAR(c.date) term is constant. It's not the largest contributor to query time, by far.
– danblack, Nov 20 at 23:57

The LIKE hack won't work well because it cannot use PRIMARY KEY(id) on p. You do have that??
– Rick James, Nov 21 at 19:21

1
@danblack - The time taken for a function call is usually so insignificant (compared to fetching rows) as to be ignorable.
– Rick James, Nov 21 at 23:50
edited Nov 21 at 19:16 by Rick James
asked Nov 20 at 23:11 by yceruto
3 Answers
MySQL can use only one index per table at a time. For this query you need a compound index covering all aspects of the search, with the constant ("=") parts of the WHERE clause placed before the range part:

ALTER TABLE commission
DROP INDEX IDX_6F7146F0979B1AD6,
ADD INDEX IDX_6F7146F0979B1AD6 (company_id, source, date)

answered Nov 20 at 23:30 by danblack
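To verify that the new index is actually used, one can re-run EXPLAIN on the original query (a sketch; the key column for table c should then show the new index rather than IDX_6F7146F09395C3F3):

```sql
EXPLAIN
SELECT YEAR(c.date), MONTH(c.date), p.district_id, COUNT(p.owner_id)
FROM commission c
INNER JOIN partner p ON c.customer_id = p.id
WHERE c.date BETWEEN '2018-01-01' AND '2018-12-31'
  AND c.company_id = 90
  AND c.source = 'ACTUAL'
  AND p.id IN (3062, 3063, 3064 /* ... rest of the id list from the question ... */)
GROUP BY YEAR(c.date), MONTH(c.date), p.district_id;
```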
query optimization
– danblack, Nov 20 at 23:39

How MySQL Uses Indexes
– danblack, Nov 20 at 23:59

Adding a composite index on (customer_id, company_id, date, source) solves the performance issue (the query now takes less than 1 s). Thanks for your answer and links.
– yceruto, Nov 21 at 16:29

1
And an index cookbook
– Rick James, Nov 21 at 19:18

1
@yceruto - Your 4-column index works well if the Optimizer starts with p, but you need to move date to the end. Dan's 3-column index works well if it starts with c. Add both.
– Rick James, Nov 21 at 19:19
Here's what the Optimizer sees in your query.

Checking whether to use an index for the GROUP BY:

- Functions (YEAR()) in the GROUP BY, so no.
- Multiple tables (c and p) mentioned, so no.

For a JOIN, the Optimizer will (almost always) start with one table, then reach into the other. So, let's look at the two options:
If starting with p:

Assuming you have PRIMARY KEY(id), there is not much to think about. It will simply use that index. For each row selected from p, it will then look into c, and any variation of this INDEX would be optimal:

c: INDEX(company_id, source, customer_id, -- in any order (all are tested "=")
         date)                            -- last, since it is tested as a range
If starting with c:

c: INDEX(company_id, source, -- in any order (all are tested "=")
         date)               -- last, since it is tested as a range

-- slightly better:
c: INDEX(company_id, source, -- in any order (all are tested "=")
         date,               -- last, since it is tested as a range
         customer_id)        -- really last -- added only to make it "covering"
The Optimizer will look at "statistics" to crudely decide which table to start with. So, add all the indexes I suggested.
A "covering" index is one that contains all the columns needed anywhere in the query. It is sometimes wise to extend a 'good' index with more columns to make it "covering".
But there is a monkey wrench in here. c.customer_id = p.id means that customer_id IN (...) effectively exists. But now there are two "range-like" constraints -- one is an IN, the other is a 'range'. In some newer versions, the Optimizer will happily jump around due to the IN and still be able to do "range" scans. So, I recommend this ordering:

- Test(s) of column = constant
- Test(s) with IN
- One 'range' test (BETWEEN, >=, LIKE with trailing wildcard, etc.)
- Perhaps add more columns to make it "covering" -- but don't do this step if you end up with more than, say, 5 columns in the index.
Hence, for c, the following is optimal for the WHERE, and happens to be "covering":

INDEX(company_id, source, -- first, but in any order (all "=")
      customer_id,        -- "IN"
      date)               -- last, since it is tested as a range

p: (same as above)

Since there was an IN or "range", there is no use seeing if the index can also handle the GROUP BY.
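In DDL terms, the indexes recommended here (the 4-column one for c plus the 3-column one from the other answer, since "Add both" was the advice) could be created like this; the index names below are made up:

```sql
ALTER TABLE commission
  ADD INDEX idx_comm_eq_in_range (company_id, source, customer_id, date),
  ADD INDEX idx_comm_eq_range    (company_id, source, date);
```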
A note on COUNT(x): it checks that x is NOT NULL. It is usually just as correct to say COUNT(*), which counts the number of rows without any extra checking.
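A quick illustration of the difference (hypothetical data; owner_id is assumed to be nullable here):

```sql
-- COUNT(*) counts all rows; COUNT(owner_id) skips rows where owner_id IS NULL.
SELECT COUNT(*)        AS all_rows,
       COUNT(owner_id) AS rows_with_owner
FROM partner;
```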
This is a non-starter, since it hides the indexed column (id) in a function:

AND (',3062,3063,3064,3065,3066,...6368,6421,'
LIKE CONCAT('%,', p.id, ',%'))
Yes, I tried with (company_id, date, customer_id, source); it is slightly better. Thanks!
– yceruto, Nov 21 at 20:53

The order of the columns in the index is vital to performance. Your attempt has date too early.
– Rick James, Nov 21 at 21:40

Sorry (typo), I meant (company_id, source, customer_id, date), obviously :)
– yceruto, Nov 21 at 21:43

@yceruto - Interesting. How many rows in the result set?
– Rick James, Nov 21 at 21:59

After the composite index > EXPLAIN > rows: 4548, filtered: 50 (for the c table)
– yceruto, Nov 21 at 23:45
With your LIKE hack you are tricking the optimizer into using a different plan (most probably using the IDX_6F7146F0AA9E377A index first). You should be able to see this in EXPLAIN.

I think the real issue in your case is the second line of the EXPLAIN: the server is executing multiple functions (MONTH, YEAR) for 6716 rows and then trying to group all of them. During that time all 6716 rows have to be stored (in memory or on disk, depending on your server configuration).

SELECT COUNT(*) FROM commission WHERE (date BETWEEN '2018-01-01' AND '2018-12-31') AND company_id = 90 AND source = 'ACTUAL';

=> How many rows are we talking about?

If the number returned by the query above is much lower than 6716, I'd try adding a covering index on the columns customer_id, company_id, source and date. I'm not sure about the best order, as it depends on your data (check the cardinality of these columns); I'd start with the index (date, company_id, source, customer_id). Also, I'd add a unique index (id, district_id, owner_id) on partner.

It is also possible to add generated stored columns _year and _month (if your server is a bit old, you can add normal columns and fill them in with a trigger) to get rid of the repeated function executions.
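A sketch of that generated-column idea in MySQL 5.7 syntax (the _year/_month column names come from this answer; the index name is made up):

```sql
ALTER TABLE commission
  ADD COLUMN _year  SMALLINT AS (YEAR(date))  STORED,
  ADD COLUMN _month TINYINT  AS (MONTH(date)) STORED,
  ADD INDEX idx_comm_year_month (_year, _month);

-- The GROUP BY can then use plain indexed columns instead of function calls:
-- GROUP BY c._year, c._month, p.district_id
```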
Here's what the Optimizer sees in your query.
Checking whether to use an index for the GROUP BY
:
- Functions (
YEAR()
) in theGROUP BY
, so no. - Multiple tables (
c
andp
) mentioned, so no.
For a JOIN
, Optimizer will (almost always) start with one, then reach into the other. So, let's look at the two options:
If starting with p
:
Assuming you have PRIMARY KEY(id)
, there is not much to think about. It will simply use that index.
For each row selected from p
, it will then look into c
, and any variation of this INDEX
would be optimal.
c: INDEX(company_id, source, customer_id, -- in any order (all are tested "=")
date) -- last, since it is tested as a range
If starting with c
:
c: INDEX(company_id, source, -- in any order (all are tested "=")
date) -- last, since it is tested as a range
-- slightly better:
c: INDEX(company_id, source, -- in any order (all are tested "=")
date, -- last, since it is tested as a range
customer_id) -- really last -- added only to make it "covering".
Here's what the Optimizer sees in your query.

Checking whether to use an index for the `GROUP BY`:

- Functions (`YEAR()`) in the `GROUP BY`, so no.
- Multiple tables (`c` and `p`) mentioned, so no.
For a `JOIN`, the Optimizer will (almost always) start with one table, then reach into the other. So, let's look at the two options.

If starting with `p`:

Assuming you have `PRIMARY KEY(id)`, there is not much to think about. It will simply use that index.

For each row selected from `p`, it will then look into `c`, and any variation of this `INDEX` would be optimal:

c: INDEX(company_id, source, customer_id, -- in any order (all are tested "=")
         date)                            -- last, since it is tested as a range
If starting with `c`:

c: INDEX(company_id, source, -- in any order (all are tested "=")
         date)               -- last, since it is tested as a range
-- slightly better:
c: INDEX(company_id, source, -- in any order (all are tested "=")
         date,               -- last, since it is tested as a range
         customer_id)        -- really last -- added only to make it "covering"
The Optimizer will look at "statistics" to crudely decide which table to start with. So, add all the indexes I suggested.

A "covering" index is one that contains all the columns needed anywhere in the query. It is sometimes wise to extend a 'good' index with more columns to make it "covering".

But there is a monkey wrench in here: `c.customer_id = p.id` means that `customer_id IN (...)` effectively exists. So now there are two "range-like" constraints -- one is an `IN`, the other is a 'range'. In some newer versions, the Optimizer will happily jump around due to the `IN` and still be able to do "range" scans. So, I recommend this ordering for the index columns:
- Test(s) of `column = constant`
- Test(s) with `IN`
- One 'range' test (`BETWEEN`, `>=`, `LIKE` with a trailing wildcard, etc.)
- Perhaps more columns added to make it "covering" -- but skip this step if you would end up with more than, say, 5 columns in the index.
Hence, for `c`, the following is optimal for the `WHERE`, and happens to be "covering":

INDEX(company_id, source, -- first, but in any order (all "=")
      customer_id,        -- "IN"
      date)               -- last, since it is tested as a range

For `p`: same as above.
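As a rough sanity check of the covering-index idea, here is a toy sketch using SQLite's in-memory engine (its planner differs from MySQL's, and the table contents and the index name `idx_c_cover` are made up to mirror the query), showing that such a query can be answered from the index alone:

```python
import sqlite3

# Toy schema mirroring the commission table from the question.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE commission
               (id INTEGER PRIMARY KEY, customer_id INT, company_id INT,
                source TEXT, date TEXT, amount REAL)""")
con.execute("CREATE INDEX idx_c_cover ON commission "
            "(company_id, source, customer_id, date)")

plan = con.execute("""EXPLAIN QUERY PLAN
    SELECT customer_id, date FROM commission
    WHERE company_id = 90 AND source = 'ACTUAL'
      AND customer_id IN (3062, 3063)
      AND date BETWEEN '2018-01-01' AND '2018-12-31'""").fetchall()

# The last column of each plan row is a human-readable description;
# "COVERING INDEX" means no lookup back into the table was needed.
for row in plan:
    print(row[-1])
```

The same verification on MySQL is `EXPLAIN`, where a covering index shows up as `Using index` in the `Extra` column.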
Since there is an `IN` or "range", there is no use checking whether the index can also handle the `GROUP BY`.

A note on `COUNT(x)`: it checks that `x` is `NOT NULL`. It is usually just as correct to say `COUNT(*)`, which counts the number of rows without any extra checking.
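A minimal demonstration of the difference (run here through SQLite's bundled driver for convenience; the NULL-counting semantics are the same in MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INT)")
# Three rows, one of which has a NULL in x.
con.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

count_star = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # counts rows
count_x = con.execute("SELECT COUNT(x) FROM t").fetchone()[0]     # skips NULLs
print(count_star, count_x)  # 3 2
```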
Your `LIKE` hack, by the way, is a non-starter, since it hides the indexed column (`id`) in a function:

AND (',3062,3063,3064,3065,3066,...6368,6421,'
     LIKE CONCAT('%,', p.id, ',%'))
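To see the effect, here is a sketch in SQLite (MySQL's planner differs in detail, but the principle is the same): the plain `IN` list can seek on the primary key, while the string-matching version forces a full table scan because `id` is buried inside an expression.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE partner (id INTEGER PRIMARY KEY, district_id INT)")

plan_in = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM partner WHERE id IN (3062, 3063)"
).fetchall()
# SQLite's '||' plays the role of MySQL's CONCAT() here.
plan_like = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM partner "
    "WHERE ',3062,3063,' LIKE '%,' || id || ',%'"
).fetchall()

print([r[-1] for r in plan_in])    # a SEARCH using the primary key
print([r[-1] for r in plan_like])  # a SCAN of the whole table
```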
answered Nov 21 at 20:04
Rick James
Yes, I tried it; `(company_id, date, customer_id, source)` is slightly better. Thanks! – yceruto Nov 21 at 20:53

The order of the columns in the index is vital to performance. Your attempt has `date` too early. – Rick James Nov 21 at 21:40

Sorry (typo), I meant `(company_id, source, customer_id, date)`, obviously :) – yceruto Nov 21 at 21:43

@yceruto - Interesting. How many rows in the result set? – Rick James Nov 21 at 21:59

After the composite index, EXPLAIN shows rows: 4548, filtered: 50 (for the `c` table). – yceruto Nov 21 at 23:45
With your `LIKE` hack you are tricking the optimizer into using a different plan (most probably with the IDX_6F7146F0AA9E377A index in the first place). You should be able to see this in the EXPLAIN output.

I think the real issue in your case is the second line of the EXPLAIN: the server executes multiple functions (MONTH, YEAR) for 6716 rows and then tries to group all those rows. During this time, all 6716 rows have to be stored (in memory or on disk, depending on your server configuration).
SELECT COUNT(*) FROM commission WHERE (date BETWEEN '2018-01-01' AND '2018-12-31') AND company_id = 90 AND source = 'ACTUAL';
=> How many rows are we talking about?
If the number from the above query is much lower than 6716, I'd try adding a covering index on the columns customer_id, company_id, source and date. I'm not sure about the best order, as it depends on your data (check the cardinality of these columns); I'd start with the index (date, company_id, source, customer_id). Also, I'd add a unique index (id, district_id, owner_id) on partner.
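Cardinality can be eyeballed per column with `COUNT(DISTINCT ...)`; a small sketch on made-up sample data (run via SQLite here, but the same queries work on MySQL, and real counts will of course differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE commission "
            "(customer_id INT, company_id INT, source TEXT, date TEXT)")
con.executemany("INSERT INTO commission VALUES (?, ?, ?, ?)", [
    (3062, 90, "ACTUAL",   "2018-01-05"),
    (3062, 90, "ACTUAL",   "2018-02-07"),
    (3063, 90, "FORECAST", "2018-02-07"),
    (3064, 91, "ACTUAL",   "2018-03-01"),
])

# Count distinct values for each candidate index column.
cardinality = {
    col: con.execute(f"SELECT COUNT(DISTINCT {col}) FROM commission").fetchone()[0]
    for col in ("customer_id", "company_id", "source", "date")
}
print(cardinality)  # {'customer_id': 3, 'company_id': 2, 'source': 2, 'date': 3}
```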
It is also possible to add extra generated stored columns _year and _month (if your server is a bit old, you can add normal columns and fill them in with a trigger) to get rid of the repeated function executions.
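For illustration, a sketch of the trigger variant using SQLite (in MySQL you would instead use `GENERATED ALWAYS AS (YEAR(date)) STORED` columns, or a `BEFORE INSERT` trigger with `YEAR()`/`MONTH()` rather than `strftime()`; the trigger name below is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE commission "
            "(id INTEGER PRIMARY KEY, date TEXT, _year INT, _month INT)")
# Fill _year/_month automatically on insert, so later queries can
# GROUP BY the plain columns instead of calling YEAR()/MONTH() per row.
con.execute("""
    CREATE TRIGGER commission_set_ym AFTER INSERT ON commission
    BEGIN
        UPDATE commission
        SET _year  = CAST(strftime('%Y', NEW.date) AS INTEGER),
            _month = CAST(strftime('%m', NEW.date) AS INTEGER)
        WHERE id = NEW.id;
    END""")
con.execute("INSERT INTO commission (date) VALUES ('2018-03-15')")
print(con.execute("SELECT _year, _month FROM commission").fetchone())  # (2018, 3)
```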
edited Nov 21 at 2:51
answered Nov 21 at 1:20
fifonik
Where does the `p.id IN (....)` list come from? – danblack Nov 20 at 23:28

As far as I understand, MySQL can only use one of those indexes on commission in its WHERE; that part would be better served by a composite index on `(date, company_id, source)`. – Uueerdo Nov 20 at 23:30

Note that if your date range is all of 2018, the `GROUP BY YEAR(c.date)` is constant. It's not the largest contributor to query time, by far. – danblack Nov 20 at 23:57

The `LIKE` hack won't work well because it cannot use `PRIMARY KEY(id)` on `p`. You do have that?? – Rick James Nov 21 at 19:21

@danblack - The time taken for a function call is usually so insignificant (compared to fetching rows) as to be ignorable. – Rick James Nov 21 at 23:50