I. Background

Recently the company added new features to the project, and after they went live I found that the list queries of some features were very slow. The cause was that the new features reused the interfaces of old ones, and the SQL statements behind those old interfaces joined five or six tables and were not written in a standardized way, so MySQL's indexes failed when executing them and full table scans were performed. The colleague responsible for optimization was away on leave, so the task of optimizing these queries fell to me. After researching SQL optimization online I solved the problem, and here I record and summarize MySQL query optimization techniques from a global perspective.

 II. Optimization ideas

Slow data queries do not necessarily mean the SQL statement is badly written. First of all, we need to find the source of the problem in order to "prescribe the right medicine". I use a flow chart to show the general idea of MySQL optimization:

As the figure shows, there are many possible causes of slow data queries, for example: cache invalidation, after which the MySQL server is hit by highly concurrent access for a period of time; badly written SQL statements; unsuitable MySQL server parameters; hardware limitations on the performance of the MySQL service; and so on.

 III. Viewing the Status Values of a Running MySQL Server

If the system does not have a high number of concurrent requests and the queries are slow, you can ignore this step and proceed directly to the SQL statement tuning step.

 Execute the command:

show status

Since there are too many returned results, we will not post them all here. Among the returned values we mainly focus on "Queries", "Threads_connected" and "Threads_running", i.e. the number of queries, the number of connected threads, and the number of running threads.

We can monitor the status values of a running MySQL server by executing the following script

while true; do
mysqladmin -uroot -p"1111" ext | awk '/Queries/{q=$4}/Threads_connected/{c=$4}/Threads_running/{r=$4}END{printf("%d %d %d\n",q,c,r)}' >> status.txt
sleep 1
done

Execute the script for 24 hours, get the contents of status.txt, and awk it again to calculate the number of requests per second to the MySQL service.

awk '{q=$1-last;last=$1}{printf("%d %d %d\n",q,$2,$3)}' status.txt

Copy the calculations into Excel to generate a chart to observe the periodicity of the data.
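As an alternative to the awk pass and Excel, the same per-second delta computation can be sketched in Python (assuming the three-column status.txt format produced by the monitoring script above):

```python
# Compute per-second deltas from status.txt (columns: cumulative Queries,
# Threads_connected, Threads_running). Replicates the awk step above.

def per_second_stats(lines):
    """Yield (queries_per_second, threads_connected, threads_running)."""
    last = None
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        queries, connected, running = map(int, parts)
        if last is not None:
            # Queries is cumulative, so consecutive differences give the rate.
            yield (queries - last, connected, running)
        last = queries

# Usage (hypothetical file produced by the monitoring script):
# with open("status.txt") as f:
#     for qps, conn, run in per_second_stats(f):
#         print(qps, conn, run)
```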

If there are periodic changes in the observed data, as explained above, the cache invalidation policy needs to be modified.

Take a value at random from a set such as [3, 6, 9] as the cache expiry time; this spreads out cache expirations and thus also saves some memory.

At peak access times, some requests are then served by caches that have not yet expired, while the rest go to the MySQL database, which reduces the pressure on the MySQL server.
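A minimal sketch of this randomized expiry, assuming the [3, 6, 9]-hour choices above and a generic cache client:

```python
import random

def cache_ttl_seconds():
    """Pick the cache expiry time at random from 3, 6, or 9 hours so that
    keys written at the same moment do not all expire together."""
    return random.choice([3, 6, 9]) * 3600

# With a hypothetical cache client:
# cache.setex(key, cache_ttl_seconds(), value)
```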

 IV. Getting the SQL statements to optimize

 4.1 Method 1: Viewing running threads

 Execute the command:

show processlist

 Returns results:

mysql> show processlist;
| Id | User | Host      | db   | Command | Time | State    | Info             |
|  9 | root | localhost | test | Query   |    0 | starting | show processlist |
1 row in set (0.00 sec)

From the returned results we can see what commands/SQL statements were executed by the thread and when they were executed. In practice, the query will return N records.

Among the returned columns, the State value is the key to judging performance: if it shows one of the problematic states, the SQL statement for that row of records needs to be optimized.

The State field has many possible values; for more information see the link provided at the end of the article.

 4.2 Method 2: Enable slow query logging

Add the following parameters under the [mysqld] section of the my.cnf configuration file:

slow_query_log = 1
slow_query_log_file = /var/lib/mysql/slow-query.log
long_query_time = 2
log_queries_not_using_indexes = 1

where slow_query_log = 1 enables the slow query log; slow_query_log_file specifies where the slow query log is stored;

long_query_time = 2 means queries taking >= 2 seconds are logged; log_queries_not_using_indexes = 1 also logs SQL statements that do not use indexes.

Note: Do not choose the slow_query_log_file path casually, or the MySQL server may not have permission to write the log file to the specified directory. It is recommended to use the path shown above.

After saving the changes, restart the MySQL service. The slow-query.log log file will be created in the /var/lib/mysql/ directory. You can check the configuration by connecting to the MySQL server and executing the following commands:

show variables like 'slow_query%';

show variables like 'long_query_time';

 Test the slow query log:

mysql> select sleep(2);
| sleep(2) |
|        0 |
1 row in set (2.00 sec)

 Open the slow query log file

[root@localhost mysql]# vim /var/lib/mysql/slow-query.log
/usr/sbin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with:
Tcp port: 0  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 2017-10-05T04:39:11.408964Z
# User@Host: root[root] @ localhost []  Id:     3
# Query_time: 2.001395  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0
use test;
SET timestamp=1507178351;
select sleep(2);

We can see that the SQL statement that was just executed for 2 seconds was logged.

Although the slow query log records slow SQL statements, the entries are dense and hard to read through directly. We therefore need a tool to filter out the SQL statements of interest.

MySQL provides the mysqldumpslow utility for analyzing these logs. You can run mysqldumpslow --help to see the command's usage.

 Commonly used invocations are listed below:

# Top 10 SQL statements sorted by number of returned rows
mysqldumpslow -s r -t 10 /var/lib/mysql/slow-query.log

# Top 10 SQL statements sorted by execution count
mysqldumpslow -s c -t 10 /var/lib/mysql/slow-query.log

# Top 10 SQL statements containing "left join", sorted by query time
mysqldumpslow -s t -t 10 -g "left join" /var/lib/mysql/slow-query.log

(-s sets the sort order: r = rows returned, c = count, t = query time; -t N keeps the top N entries; -g filters by a grep-style pattern.)
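mysqldumpslow is the right tool for this job; purely as an illustration of what it does internally, here is a minimal Python sketch (assuming the slow-log format shown in the example above) that tallies per-statement counts and total query time:

```python
import re
from collections import defaultdict

QUERY_TIME_RE = re.compile(r"# Query_time: ([\d.]+)")

def summarize_slow_log(lines):
    """Group slow-log entries by statement text, returning
    {sql: [count, total_query_time]}. Assumes a '# Query_time:' header
    line followed by the SQL statement, as in the example above."""
    totals = defaultdict(lambda: [0, 0.0])
    current_time = None
    for line in lines:
        m = QUERY_TIME_RE.match(line.strip())
        if m:
            current_time = float(m.group(1))
        elif current_time is not None and not line.startswith("#"):
            sql = line.strip().rstrip(";")
            if sql.lower().startswith(("use ", "set ")):
                continue  # skip session-setup statements
            totals[sql][0] += 1
            totals[sql][1] += current_time
            current_time = None
    return dict(totals)
```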

 V. Analyzing SQL Statements

 5.1 Method 1: explain

For the problematic SQL statements we have filtered out, we can use MySQL's explain to view the SQL execution plan (the tables involved, the table query order, index usage, and so on).


explain select * from category;

 Returns results:

mysql> explain select * from category;
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
|  1 | SIMPLE      | category | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL  |
1 row in set, 1 warning (0.00 sec)

 Field Explanation:

1) id: the sequence number of the select query. When ids are the same, rows are executed in order from top to bottom; when ids differ, the row with the larger id has higher priority and is executed first.

2) select_type: the type of the query operation; its common values are:

simple: a simple select that uses no union or subquery
primary: the outermost select when the query contains subqueries
subquery: a select contained in a subquery
derived: a subquery in the from clause (a derived table)
union: the second or later select statement in a union
union result: the select that merges the results of a union

 3) table: shows which table the data is about.

 4) partitions: matching partitions

5) type: the join type of the table; its values, ordered from best to worst performance, are:

system > const > eq_ref > ref > range > index > ALL

Note: The first 5 (system through range) are the ideal cases of index usage. In practice you should optimize to at least the range level and preferably to ref.

6) possible_keys: the indexes MySQL could use to find rows in this table. If the value is NULL, there is no candidate index, and creating one may improve performance.

7) key: shows the index actually used by MySQL. If it is NULL, then no index query is used.

8) key_len: the number of bytes used in the index; the length of the index used in the query can be calculated from this column. The shorter the better, as long as accuracy is not lost.

 Note that it shows the maximum possible length of the index fields, not the actual length used.

9) ref: shows which field of which table is associated with the index field of this table

10) rows: according to the table statistics and selection, roughly estimated to find the required records or the number of rows to be read, the smaller the value the better

11) filtered: the number of rows returned as a percentage of the number of rows read, the larger the value the better

12) extra: additional information that does not fit in the other columns but is very important; common values are as follows:

using filesort: MySQL performs an external sort on the rows instead of reading them in index order
using temporary: a temporary table is used to hold intermediate results, commonly seen with order by and group by
using index: a covering index is used, so the data is read from the index alone without accessing the table rows
using where: the server filters rows with the where clause after retrieving them
using join buffer: the join buffer is used because the join columns lack a usable index

 Note: If either of the first 2 values appears, the SQL statement must be optimized.

 5.2 Method 2: Profiling

Use the profiling command to get detailed information about the resources consumed by SQL statements (the overhead of each execution step).

5.2.1 Checking whether profiling is enabled

select @@profiling;

 Returns results:

mysql> select @@profiling;
| @@profiling |
|           0 |
1 row in set, 1 warning (0.00 sec)

 0 means off, 1 means on.

 5.2.2 Enabling profile

set profiling = 1;

 Returns results:

mysql> set profiling = 1;  
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select @@profiling;
| @@profiling |
|           1 |
1 row in set, 1 warning (0.00 sec)

After the connection is closed, profiling is automatically turned off again.

5.2.3 Viewing the list of executed SQLs

show profiles;

 Returns results:

mysql> show profiles;
| Query_ID | Duration   | Query                        |
|        1 | 0.00062925 | select @@profiling           |
|        2 | 0.00094150 | show tables                  |
|        3 | 0.00119125 | show databases               |
|        4 | 0.00029750 | SELECT DATABASE()            |
|        5 | 0.00025975 | show databases               |
|        6 | 0.00023050 | show tables                  |
|        7 | 0.00042000 | show tables                  |
|        8 | 0.00260675 | desc role                    |
|        9 | 0.00074900 | select name,is_key from role |
9 rows in set, 1 warning (0.00 sec)

Only SQL statements executed after profiling was enabled show up here, so run the statements you want to measure before issuing this command.

5.2.4 Querying execution details for a specified ID

show profile for query Query_ID;

 Returns results:

mysql> show profile for query 9;
| Status               | Duration |
| starting             | 0.000207 |
| checking permissions | 0.000010 |
| Opening tables       | 0.000042 |
| init                 | 0.000050 |
| System lock          | 0.000012 |
| optimizing           | 0.000003 |
| statistics           | 0.000011 |
| preparing            | 0.000011 |
| executing            | 0.000002 |
| Sending data         | 0.000362 |
| end                  | 0.000006 |
| query end            | 0.000006 |
| closing tables       | 0.000006 |
| freeing items        | 0.000011 |
| cleaning up          | 0.000013 |
15 rows in set, 1 warning (0.00 sec)

Each row shows one state change and how long it lasted. The Status column is identical to the State column of show processlist, so the points to optimize are the same as described above.

The value of the Status field can also be found in the link at the end.

5.2.5 Obtaining CPU, Block IO, and Other Information

show profile block io,cpu for query Query_ID;

show profile cpu,block io,memory,swaps,context switches,source for query Query_ID;

show profile all for query Query_ID;

 VI. Optimization methods

The following explains optimization in terms of query optimization, index usage and table structure design.

 6.1 Query Optimization

1) Avoid SELECT * and query the corresponding field for whatever data you need.

2) Let the small table drive the big table, i.e. the small data set drives the big data set. Take two tables A and B, related by the id field, as an example:

When B's data set is smaller than A's, use in:
select * from A where id in (select id from B)

When A's data set is smaller than B's, use exists:
select * from A where exists (select 1 from B where B.id = A.id)
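As an illustration of the two forms (using Python's sqlite3 module as a stand-in for MySQL; the tables A and B follow the example above), both queries return exactly the same rows, and the difference is only in which side drives the lookup:

```python
import sqlite3

# Illustrative sketch: SQLite stands in for MySQL here; the point is the
# equivalence of the two query forms, not MySQL-specific behavior.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (1), (3);
""")

in_rows = conn.execute(
    "SELECT * FROM A WHERE id IN (SELECT id FROM B)").fetchall()
exists_rows = conn.execute(
    "SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.id = A.id)").fetchall()

# Both forms select exactly the rows of A that have a match in B.
print(in_rows == exists_rows)  # True
```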

3) In some cases, a join can be used instead of a subquery, because with a join MySQL does not need to create an in-memory temporary table.

 4) Add redundant fields as appropriate to reduce table associations.

5) Reasonable use of indexes (described below). E.g., create indexes for sorted and grouped fields to avoid filesort.

 6.2 Use of indexes

 6.2.1 Scenarios in which indexing is appropriate

 1) Automatic creation of unique indexes for primary keys

 2) Fields that are frequently used as query conditions

 3) Fields associated with other tables in a query

 4) Sorted Fields in Queries

 5) Statistical or grouping fields in a query

 6.2.2 Scenarios in which indexing is not appropriate

 1) Frequently updated fields

 2) Fields not used in the where condition

 3) Too few table records

 4) Tables that are frequently added, deleted or modified

 5) The values of the fields have little variability or high repeatability

 6.2.3 Principles of index creation and use

1) Single table query: which column is used as the query condition, the index will be created on that column.

2) Multi-table query: for a left join, add the index on the right table's join field; for a right join, add the index on the left table's join field.

3) Do not perform any operations on the indexed columns (calculations, functions, type conversions)

4) Do not use the not-equal operators != or <> on indexed columns

5) Index columns should not be empty, and do not use is null or is not null judgment

6) If the indexed field is a string type, wrap the value in the query condition in single quotes ('') to avoid implicit type conversion at the lower layer

Violation of the above principles may result in index failure, which you need to check with the explain command.

 6.2.4 Indexing failures

In addition to violating the principles of index creation and usage, the following situations can lead to index failure:

1) Fuzzy queries start with %.

2) The use of or, e.g. field 1 (non-indexed) OR field 2 (indexed) will cause the index to fail.

 3) When using a compound index, the first index column is not used.

Taking index(a,b,c), a composite index on fields a, b and c, as an example:

where a = 1                              Yes, the index on field a is used
where a = 1 and b = 2                    Yes, the indexes on fields a and b are used
where a = 1 and b = 2 and c = 3          Yes, all are used
where b = 2 / where c = 3                No, the leftmost field a is missing
where a = 1 and c = 3                    Field a is used, field c is not
where a = 1 and b > 2 and c = 3          Fields a and b are used, field c is not
where a = 1 and b like 'xxx%' and c = 3  Fields a and b are used, field c is not

 6.3 Database table structure design

 6.3.1 Choosing the right data type

 1) Use the data type that holds the least amount of data.

2) Use simple datatypes. int is easier to handle than varchar in mysql.

3) Try to use tinyint, smallint, mediumint as integer types instead of int.

4) Define fields as not null whenever possible, since null values require extra storage and complicate comparisons and index use.

5) Minimize the use of the text type; when it must be used, consider splitting it out into a separate table.

6) Try to use timestamp instead of datetime.

7) Do not have too many fields in a single table, it is recommended to be within 20

 6.3.2 Table splitting

When the data in the database is very large, and the query optimization scheme can not solve the problem of slow query speed, we can consider splitting the table, so that the amount of data in each table becomes smaller, thus improving query efficiency.

1) Vertical Split: Separate multiple columns in a table into different tables. For example, if some fields in a user’s table are frequently accessed, these fields are placed in one table, and some other fields that are not frequently accessed are placed in another table.

 When inserting data, use transactions to ensure data consistency between the two tables.

2) Horizontal Split: Split by rows. For example, in the user table, use the user ID, take the remainder of 10 for the user ID, and evenly distribute the user data to the 10 user tables from 0 to 9. When searching, the data is also queried according to this rule.
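The modulo routing rule described above can be sketched as follows (the shard count and table names user_0 through user_9 follow the example):

```python
NUM_SHARDS = 10  # the example splits the user table into user_0 .. user_9

def user_table(user_id: int) -> str:
    """Route a user to one of the split tables by taking user_id modulo 10.
    Reads and writes must both apply the same rule."""
    return f"user_{user_id % NUM_SHARDS}"

# A query for user 1234 would then target table user_4, e.g.:
# SELECT * FROM user_4 WHERE id = 1234
```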

 6.3.3 Read-write separation

Generally speaking, databases see far more reads than writes; in other words, most of the pressure on the database comes from a large volume of read operations. We can use a database cluster in which one database acts as the master and handles writes, while the others act as slaves and handle reads. This relieves the pressure of database access.

 VII. Server parameter tuning

 7.1 Memory-related

sort_buffer_size Sort buffer size

join_buffer_size Join buffer size

read_buffer_size Buffer size allocated for full table scans

 7.2 IO Related

Innodb_log_file_size Transaction log size

Innodb_log_files_in_group Number of transaction logs

Innodb_log_buffer_size Transaction log buffer size

Innodb_flush_log_at_trx_commit Transaction log flush policy, with the following values:

0: the log buffer is written to the log file and flushed to disk once per second; nothing is written at transaction commit.

1: at each transaction commit, the log buffer is written to the log file and flushed to disk (the default and safest setting).

2: at each transaction commit, the log buffer is written to the log file (the OS cache), and the file is flushed to disk once per second.

 7.3 Security-related

expire_logs_days Specifies the number of days to automatically purge binlogs.

max_allowed_packet Controls the size of packets MySQL can receive.

skip_name_resolve Disable DNS lookups

read_only disables write access for non-super privileged users.

skip_slave_start Prevents the slave from starting replication automatically at startup, which is useful during slave recovery

 7.4 Other

max_connections controls the maximum number of connections allowed

 tmp_table_size Maximum size of internal in-memory temporary tables

max_heap_table_size Maximum memory table size

I did not use these parameters to tune the MySQL server myself; for details and performance effects, please refer to the material at the end of the article or search online.

 VIII. Hardware selection and parameter optimization

The performance of the hardware directly determines the performance of the MySQL database; a hardware bottleneck directly limits the throughput and efficiency of the MySQL service.

As software developers we are mainly concerned with software-level optimization; the hardware optimizations below are provided for general understanding only.

 8.1 Memory-related

Memory I/O is much faster than disk I/O; you can increase the system's buffer capacity to keep data in memory longer and reduce disk I/O.

8.2 Disk I/O Related

1) Use SSD or PCIe SSD devices to obtain IOPS that are hundreds or even tens of thousands of times higher

2) Purchasing an array card with both CACHE and BBU modules can significantly increase IOPS.

3) Use RAID-10 instead of RAID-5 wherever possible

8.3 CPU-Related Configuration

 In the server’s BIOS setup, adjust the following configuration:

1) Select Performance Per Watt Optimized (DAPC) mode to maximize CPU performance.

2) Disable options such as C1E and C States to improve CPU efficiency.

3) Set Memory Frequency to Maximum Performance

By lzz
