SQL performance optimization, writing high-quality SQL statements!

When writing SQL statements we are often concerned about the results of the execution of SQL, but is it really concerned about the efficiency of the execution of SQL, is it pay attention to the specification of the SQL writing method?

The following dry sharing is summarized in the actual development process, and I hope it will be helpful to you!

1. limit paging optimization

When the offset is particularly large, LIMIT can be very inefficient.

SELECT id FROM A LIMIT 1000,10

SELECT id FROM A LIMIT 90000,10

Program I:

select id from A order by id limit 90000,10;

If we use it in combination with order by. It is very fast, 0.04 seconds OK. Because of the use of id primary key for indexing! Of course, whether or not you can use the index also needs to be based on business logic to determine, here just to remind you, in the paging time also need to be careful to use!

select id from A order by id  between 90000 and 90010;

2. Use limit 1, top 1 to get a line

Some business logic performs a query operation (especially when taking the maximum amount based on a field DESC). You can use limit 1 or top 1 to terminate [database index] to continue to scan the entire table or index.

SELECT id FROM A LIKE 'abc%'

SELECT id FROM A LIKE 'abc%' limit 1

3. Do not use select * from table in any case, with a specific list of fields to replace the “*”, do not return to the field that can not be used, to avoid a full scan!

SELECT * FROM A

SELECT id FROM A

4. Batch insertion optimization

INSERT into person(name,age) values('A',24)
INSERT into person(name,age) values('B',24)
INSERT into person(name,age) values('C',24)

INSERT into person(name,age) values('A',24),('B',24),('C',24),

The optimization of sql statement mainly lies in the correct use of indexes, and we often make the mistake in the development is to do a full scan of the table, which affects the performance and consumes time!

5. Optimization of the like statement

SELECT id FROM A WHERE name like '%abc%'

As abc before the use of “%”, so the query must go to the full table query, unless necessary (fuzzy query needs to include abc), otherwise do not add % in front of the keyword

SELECT id FROM A WHERE name like 'abc%'

mysql version: 5.7.26

select nick_name from member where nick_name like '%%'

like’%Xiaoming%’ does not use an index!

select nick_name from member where nick_name like '%'

like’Xiaoming%’ Successfully used the index!

6. where clause optimization using or

Usually replacing “or” with union all or union gives better results. where clause uses the or keyword, the index will be dropped.

SELECT id FROM A WHERE num = 10 or num = 20

SELECT id FROM A WHERE num = 10 union all SELECT id FROM A WHERE num=20

7. Optimization of using IS NULL or IS NOT NULL in the where clause

SELECT id FROM A WHERE num IS NULL

Using IS NULL or IS NOT NULL judgment in the where clause, the index will be dropped and a full table lookup will be performed.

Optimized to set the default value of 0 on the num, to ensure that there is no null value in the table num, IS NULL usage in the actual business scenarios SQL usage rate is extremely high, we should pay attention to avoid full table scanning

SELECT id FROM A WHERE num=0

8. Optimization of expression operations on fields in where clauses

Do not perform functions, arithmetic operations, or other expressions to the left of the “=” in the where clause, or the system may not use the index correctly.

SELECT id FROM A WHERE datediff(day,createdate,'2019-11-30')=0

SELECT id FROM A WHERE createdate>='2019-11-30' and createdate<'2019-12-1'

SELECT id FROM A WHERE year(addate) <2020

SELECT id FROM A where addate<'2020-01-01'

9. Indexing problems with sorting

The mysql query just uses an index, so the columns in the ORDER BY are not indexed if the index is already used in the WHERE clause. Therefore, the default database sorting can meet the requirements of the case do not use the sort operation;

Try not to include multiple columns in the sort, if needed it is best to create composite indexes for these columns.

10. Replace union with union all wherever possible

The difference between union and union all is that the former needs to merge two (or more) result sets and then perform the uniqueness filtering operation, which will involve sorting, increase a large number of cpu operations, and increase resource consumption and latency. So when we can confirm that there is no possibility of duplicate result sets or we do not care about duplicate result sets, we try to use union all instead of union.

11.Inner join 和 left join、right join

First: inner join is also called equal join is, left/rightjoin is outer join.

SELECT A.id,A.name,B.id,B.name FROM A LEFT JOIN B ON A.id =B.id;

SELECT A.id,A.name,B.id,B.name FROM A RIGHT JOIN ON B A.id= B.id;

SELECT A.id,A.name,B.id,B.name FROM A INNER JOIN ON A.id =B.id;

It has been proven in many ways that inner join is faster because it is an equal join, perhaps returning fewer rows. But we have to remember that some statements invisibly use equal joins, such as:

SELECT A.id,A.name,B.id,B.name FROM A,B WHERE A.id = B.id;

Recommended: can use inner join connection try to use inner join connection

Second: The performance of subqueries is again slower than the performance of outer joins, try to replace subqueries with outer joins.

mysql is executing a full table query on outer table A first, and then executing subqueries one by one based on uuid, and if the outer table is a very large table, we can imagine that the query performance will perform even worse than that.

Select* from A where exists (select * from B where id>=3000 and A.uuid=B.uuid);

Execution time: about 2s

Select* from A inner join B ON A.uuid=B.uuid where b.uuid>=3000;

Execution time: less than 1s

Third: When using JOIN, you should use small results to drive large results

left join left table results as small as possible, if there is a condition should be put on the left first processing, right join the same reason in reverse. If there is a condition, it should be put on the left side first, right join is the same as reverse:

Select * from A left join B A.id=B.ref_id where  A.id>10

select * from (select * from A wehre id >10) T1 left join B on T1.id=B.ref_id;

12.exist & in

SELECT * from A WHERE id in ( SELECT id from B )

SELECT * from A WHERE id EXISTS ( SELECT 1 from A.id= B.id )

in is traversed in memory to compare

exist requires querying the database, so when the amount of data in B is large, exists is more efficient than in**.

in() is executed only once to cache all the id fields in table B. After that, it checks whether the ids in table A are equal to the ids in table B. If the ids are equal, then it adds the records in table A to the result set until it has traversed all the records in table A.

The flow principle of the In operation is the same as the following code

    List resultSet={};

    Array A=(select * from A);
    Array B=(select id from B);

    for(int i=0;i<A.length;i++) {
          for(int j=0;j<B.length;j++) {
          if(A[i].id==B[j].id) {
             resultSet.add(A[i]);
             break;
          }
       }
    }
    return resultSet;

As you can see, it is not appropriate to use in() when the B-table data is large, because it will traverse all the B-table data once

For example, if table A has 10000 records and table B has 1000000 records, then it is possible to traverse 10000*1000000 times at most, which is very inefficient.

Another example: table A has 10000 records, table B has 100 records, then it is possible to traverse up to 10000 * 100 times, the number of traversals is greatly reduced, the efficiency is greatly improved.

Conclusion: in() is suitable for cases where table B has smaller data than table A

exists() will execute A.length() times, the execution process code is as follows


List resultSet={};
Array A=(select * from A);
for(int i=0;i<A.length;i++) {
    if(exists(A[i].id) {  
       resultSet.add(A[i]);
    }
}return resultSet;

Exists() is appropriate when table B has larger data than table A because it doesn’t have as many traversal operations and only needs to execute the query one more time.

For example, if table A has 10000 records and table B has 1000000 records, then exists() will be executed 10000 times to determine if the ids in table A are equal to the ids in table B.

For example: A table has 10,000 records, B table has 100,000,000 records, then exists () is still executed 10,000 times, because it only executes A.length times, it can be seen that the more data in the B table, the more suitable for exists () to play a role in the effect.

Again, if table A has 10000 records and table B has 100 records, then exists() still executes 10000 times, it is better to use in() to traverse 10000*100 times, because in() traverses and compares in memory, while exists() needs to query the database.

We all know that querying the database consumes higher performance and memory comparison is fast.

Conclusion: exists() is suitable for the case where table B has larger data than table A

SQL performance optimization, writing high-quality SQL statements!

1. limit paging optimization

2. Use limit 1, top 1 to get a line

3. Do not use select * from table in any case, with a specific list of fields to replace the “*”, do not return to the field that can not be used, to avoid a full scan!

4. Batch insertion optimization

5. Optimization of the like statement

6. where clause optimization using or

7. Optimization of using IS NULL or IS NOT NULL in the where clause

8. Optimization of expression operations on fields in where clauses

9. Indexing problems with sorting

10. Replace union with union all wherever possible

11.Inner join 和 left join、right join

12.exist & in

By lzz

Related Post

Leave a Reply Cancel reply

You Missed

8 Python practical scripts, save them for future use!

Python logging library logging summary – probably the best article summarizing the logging library so far

I hear you know Python?

An article on collection manipulation functions in Kotlin