Efficient Data Structures


There are six underlying data structures in Redis: simple dynamic strings (SDS), doubly linked lists, ziplists (compressed lists), hash tables, skip lists, and integer sets. They map onto the data types shown in the following figure:


Follow-up articles will analyze each of the above data structures in depth at the source-code level.

 Single Threaded vs Multi-Threaded


When studying computer operating systems, you must have come across this question: is multi-threading always faster than single-threading? I'm sure you won't fall into Ao Wei's trap like poor Nezha in the comic above.


It's true that multi-threading is sometimes faster than single-threading, but there are plenty of cases where it is not. Let's start by explaining the difference between concurrency and parallelism with a diagram simple enough for a three-year-old:


  • Concurrency: at any given instant only one instruction is executing, but instructions from multiple processes are rotated through rapidly, so that macroscopically the processes appear to run simultaneously. Microscopically they do not: time is simply divided into segments, and the processes alternate through them very quickly.

  • Parallelism: multiple instructions execute simultaneously on multiple processors at the same instant. Execution really is simultaneous, from both the microscopic and the macroscopic point of view.


It is easy to see that under concurrency only one instruction executes at any instant; the processes (threads) just switch on and off the CPU so quickly that they appear to be "running at the same time". In practice, if an application uses multiple threads, a lot of time goes into rotating between threads and switching contexts.

Talk is cheap. Show me the code.

 The following code runs the same accumulation operations serially and concurrently, and times both:

public class ConcurrencyTest {
    private static final long count = 1000000000;

    public static void main(String[] args) {
        try {
            concurrency();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        serial();
    }

    private static void concurrency() throws InterruptedException {
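        // One extra thread runs the += loop while the main thread runs the -- loop in parallel.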
        long start = System.currentTimeMillis();
        Thread thread = new Thread(new Runnable() {

            @Override
            public void run() {
            int a = 0;
            for (long i = 0; i < count; i++) {
                a += 5;
            }
            }
        });
        thread.start();
        int b = 0;
        for (long i = 0; i < count; i++) {
            b--;
        }
        thread.join();
        long time = System.currentTimeMillis() - start;
        System.out.println("concurrency : " + time + "ms,b=" + b);
    }

    private static void serial() {
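        // The same two loops, run back to back on the main thread alone.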
        long start = System.currentTimeMillis();
        int a = 0;
        for (long i = 0; i < count; i++) {
            a += 5;
        }
        int b = 0;
        for (long i = 0; i < count; i++) {
            b--;
        }
        long time = System.currentTimeMillis() - start;
        System.out.println("serial : " + time + "ms,b=" + b);
    }

}


The execution times are shown in the table below. It is easy to see that until the accumulation runs more than about a million iterations, concurrent execution is no faster than serial execution, and for small counts it is actually slower.

Number of cycles | Serial time/ms | Concurrent time/ms | Concurrency vs. serial
100 million | 130 | 77 | about 1x faster
10 million | 18 | 9 | about 1x faster
1 million | 5 | 5 | not much difference
100,000 | 4 | 3 | not much difference
10,000 | 0 | 1 | slower


Because threads carry creation and context-switching overhead, concurrent execution can end up slower than serial execution.

 Context Switch


Multiple threads can run on single-core or multi-core CPUs. Even a single-core CPU can execute multi-threaded code: the CPU does this by allocating time slices to each thread. To run multiple threads, the CPU keeps switching between them, ensuring each gets a chance to execute within a given period.


At this point, the CPU allocates each thread a period of execution called its time slice. Time slices are typically tens of milliseconds. The CPU cycles through tasks using a time-slice allocation algorithm: the current task runs for one slice, then the CPU switches to the next task.


Before switching, however, the state of the previous task is saved, so that it can be loaded again the next time the CPU switches back to it. This cycle from saving a task's state to reloading it is one context switch.


To describe it in terms of thread states: in a multi-threaded environment, when a thread goes from Runnable to a non-Runnable state (Blocked, Waiting, Timed_Waiting), its context information (the contents of the CPU registers, the program counter, and so on at that point in time) must be saved, so that when the thread later returns to the Runnable state it can build on its previous progress. Conversely, a thread moving from a non-Runnable state back to Runnable may involve restoring previously saved context. This saving and restoring of a thread's context is called a context switch.
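On Linux you can observe these switches directly. A minimal sketch, assuming a Linux/glibc environment, using the standard getrusage() call, which reports how many voluntary and involuntary context switches the calling process has accumulated:

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;
    /* RUSAGE_SELF: statistics for the calling process */
    getrusage(RUSAGE_SELF, &ru);
    /* ru_nvcsw:  voluntary switches (the process gave up the CPU, e.g. to wait for IO)
     * ru_nivcsw: involuntary switches (the scheduler took the CPU away, e.g. time slice expired) */
    printf("voluntary: %ld, involuntary: %ld\n", ru.ru_nvcsw, ru.ru_nivcsw);
    return 0;
}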

 Memory-Based


Take MySQL as an example. MySQL's data and indexes are persisted on disk, so when we execute a query, if the index of the target table has not yet been loaded into memory, it must be loaded first; then, after some addressing and disk I/O, the disk blocks holding the data are loaded into memory; only then can the data actually be read.


With a mechanical hard drive, the first step is to locate the data, i.e. find the disk address that needs to be read. Take a look at this schematic:


The first step in reading data from a hard disk is to find the desired track, which is a concentric circle centered on the spindle. The head must be moved over the track that holds the data; this process is called seeking.


Then we wait for the platter to rotate until the head points at the position where the data begins; the time spent here is called rotational delay. When we describe a hard drive's rotational speed as fast or slow, this is mainly the time it affects. Rotation is one-directional, so if you miss the start of the data you must wait for the platter to come all the way around again before reading can begin.


Finally, the head reads the data recorded on the platter. The principle is similar to that of a CD-ROM: the track carries a layer of magnetic medium, and as the head sweeps over a particular spot it senses the magnetic state there, converting the magnetic signal into an electrical one.


As you can see, both the movement of the heads and the rotation of the platters are mechanical motions, which is why this kind of drive is called a mechanical hard disk; the efficiency of those mechanical motions is the bottleneck of disk reads and writes.


That was a bit of a digression; back to Redis. When the data lives in memory, as it does with Redis, reads and writes operate on memory directly, skipping the step where a disk-based database must fetch data from disk, and that step is exactly where a computer's I/O bottleneck lies.


Reading data from memory is essentially the transfer of electrical signals, which is far faster than transferring signals through mechanical motion.


So it's safe to say that Redis being so fast has a great deal to do with it running in memory. But that is far from the whole reason.

Redis FAQ


Faced with single-threaded Redis, you may wonder: what about my multi-core CPU, is it just wasted? Don't worry, Redis answers this question specifically in its FAQ.


It is not common for the CPU to be the performance bottleneck with Redis; Redis is usually limited by memory or by the network. For example, using pipelining, Redis running on an average Linux system can serve even a million requests per second, so if your application mainly uses O(N) or O(log(N)) commands, it hardly uses much CPU.


However, to maximize CPU utilization, you can start multiple Redis instances on the same node and treat them as separate Redis servers. In some cases a single node may not be enough anyway, so if you want to use multiple CPUs you can start thinking about sharding earlier.


You can find more information about using multiple Redis instances on the Partitioning page.


However, starting with Redis 4.0 we began making Redis more threaded. For now this is limited to deleting objects in the background and to blocking commands implemented via Redis modules. For future releases, the plan is to make Redis increasingly multi-threaded.


Note: the "single-threaded" Redis we keep talking about refers to a single thread handling network requests; a complete Redis server certainly runs more than one thread!


For example, when Redis does persistence, it forks a child process to perform the persistence operation.

 The Four IO Models


When a network IO occurs (take a read as an example), it involves two system objects: the process that calls the IO, and the system kernel.

 When a READ operation occurs, it goes through two phases:

 ① Wait for data preparation;

 ② Copy data from the kernel to the process.


To address these issues in network IO, four network IO models have been proposed:

  •  Blocking IO model
  •  Non-blocking IO model
  •  Multiplexed IO model
  •  Asynchronous IO model


Blocking and non-blocking describe the way a user thread invokes a kernel IO operation: blocking means the IO operation must finish completely before control returns to user space, while non-blocking means the IO call returns a status value to the user immediately, without waiting for the operation to complete.

 Blocking IO Model


In Linux, all sockets are blocking by default, and a typical read operation is shown below:


When the application process calls the system call recvfrom, the system kernel begins the first phase of IO: preparing the data.


For network IO, the data often has not arrived yet when the call is made (for example, a complete TCP packet has not been received), so the system kernel must wait for enough data to arrive. On the user-process side, the whole process is blocked.


When the kernel has waited until the data is ready, it copies the data from kernel space into user memory and returns the result; only then is the user process released from its blocked state and able to run again. So the blocking IO model is characterized by blocking in both phases of the IO (waiting for the data and copying the data).
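A minimal sketch of the blocking model in C (port 12345 is made up for the demo): both IO phases happen inside the single recvfrom() call, and the process sleeps until they finish.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* sockets are blocking by default */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);             /* made-up demo port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[1024];
    /* Phase 1 (wait for a datagram) and phase 2 (copy it into buf)
     * both happen inside this one call; the process blocks until then. */
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    if (n > 0) printf("received %zd bytes\n", n);
    close(fd);
    return 0;
}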

 Non-blocking IO Model


In Linux, a socket can be set to be non-blocking. When a read operation is performed on a non-blocking socket, the flow looks like this:


As the figure shows, when the user process issues a read operation and the data in the kernel is not yet ready, the kernel does not block the user process but returns an error immediately.


From the user process's point of view, it does not need to wait after initiating a read; it gets a result right away. When the process sees that the result is an error, it knows the data is not ready yet, and it can issue the read operation again.


As soon as the data in the kernel is ready and the kernel receives another system call from the user process, it immediately copies the data into user memory and returns a success value.


So, under non-blocking IO, the user process actually has to keep actively asking the kernel whether the data is ready. The defining feature of non-blocking interfaces, compared with blocking ones, is that they return immediately after being called.
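A minimal sketch of this polling in C, assuming fd is an already-connected socket: the O_NONBLOCK flag makes read() report EAGAIN instead of blocking while the kernel's data is not ready.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t nonblocking_read(int fd, char *buf, size_t len) {
    /* Switch the socket to non-blocking mode. */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) return n;        /* data was ready and has been copied */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            usleep(1000);            /* kernel not ready: ask again shortly */
            continue;
        }
        return -1;                   /* a real error */
    }
}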

 Multiplexed IO Model


Multiplexed IO is sometimes called event-driven IO (the Reactor design pattern). Its basic principle is that a single function keeps polling all the sockets it is in charge of and notifies the user process when data arrives on any of them. The flow of the multiplexed IO model is shown in the figure:


When the user process calls select, the whole process blocks, and at the same time the kernel "monitors" all the sockets that select is responsible for; as soon as the data in any of those sockets is ready, select returns. The user process then calls the read operation to copy the data from the kernel into the user process.


This model does not look all that different from the blocking IO model; taken in isolation it is actually worse, because it requires two system calls (select and recvfrom) where blocking IO requires only one (recvfrom). The advantage of select, however, is that it can handle many connections at once. So if the number of connections on the system is not very high, a web server using select/epoll will not necessarily perform better than one using multi-threaded blocking IO, and may show higher latency; the strength of select/epoll is not processing a single connection faster, but handling far more connections.


If select() finds that a handle has captured a "readable event", the server program should call recv() promptly, prepare the data to send based on what it received, and add the handle to writefds so the next select() can detect a "writable event" on it. Similarly, if select() finds that a handle has captured a "writable event", the program should call send() promptly and prepare for the next round of "readable event" detection.
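A minimal sketch of such a select() loop in C, assuming listen_fd is an already-listening TCP socket (only the readable-event side is shown):

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int listen_fd) {
    int fds[FD_SETSIZE], nfds = 0, maxfd = listen_fd;

    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(listen_fd, &readfds);
        for (int i = 0; i < nfds; i++) FD_SET(fds[i], &readfds);

        /* Blocks until at least one socket reports a "readable event". */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) <= 0) continue;

        if (FD_ISSET(listen_fd, &readfds) && nfds < FD_SETSIZE) {
            int c = accept(listen_fd, NULL, NULL);        /* new connection */
            if (c >= 0) { fds[nfds++] = c; if (c > maxfd) maxfd = c; }
        }
        for (int i = 0; i < nfds; i++) {
            if (!FD_ISSET(fds[i], &readfds)) continue;
            char buf[1024];
            ssize_t n = recv(fds[i], buf, sizeof(buf), 0); /* data is ready */
            if (n <= 0) { close(fds[i]); fds[i--] = fds[--nfds]; } /* drop closed socket */
        }
    }
}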


The following figure shows the event-driven working model: when different events occur, the handler senses them and executes the corresponding handlers, like a multiway switch.


IO multiplexing is the most commonly used IO model, but it is not "completely" asynchronous, because the select system call it relies on blocks the thread. IO multiplexing is therefore better described as asynchronous blocking IO than as true asynchronous IO.

 Asynchronous IO Model


"Real" asynchronous IO requires stronger support from the operating system. The figure below shows the flow of the asynchronous IO model (the Proactor design pattern):


After the user process initiates a read, it can immediately go and do other things. From the kernel's point of view, when it receives an asynchronous read request, it returns at once, so the user process never blocks.


The kernel then waits for the data to be ready and copies it into user memory; when this is done, the kernel sends the user process a signal reporting that the read operation has completed.
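A minimal sketch of this model using POSIX AIO (on older glibc, link with -lrt; "data.txt" is a hypothetical input file):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    static char buf[1024];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = open("data.txt", O_RDONLY);  /* hypothetical input file */
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);

    if (cb.aio_fildes == -1 || aio_read(&cb) == -1) return 1;
    /* aio_read() has already returned: the process is free to do other
     * work while the kernel waits for the data and copies it into buf. */
    while (aio_error(&cb) == EINPROGRESS) {
        /* ... do something useful here ... */
    }
    printf("read %zd bytes\n", aio_return(&cb));  /* result of the completed read */
    return 0;
}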

 IO Model Summary


A call to blocking IO blocks the corresponding process until the operation is complete, while non-blocking IO returns immediately while the kernel is still preparing data.


What separates synchronous from asynchronous IO is that synchronous IO blocks the process while the actual IO operation is performed. By this definition, blocking IO, non-blocking IO, and multiplexed IO are all synchronous IO; the actual IO operation here is the recvfrom system call in the examples above.


Non-blocking IO does not block the process when recvfrom is called and the kernel's data is not ready. But once the data is ready and recvfrom copies it from the kernel into user memory, the process is blocked.


Asynchronous IO is different: when the process initiates the IO operation, the call returns directly, and the process hears nothing more until the kernel sends a signal saying the IO has completed. Throughout the whole process, the process is never blocked.

 A comparison of the various IO models, as described above:

Model | Phase 1: waiting for data | Phase 2: copying data to user space
Blocking IO | blocked | blocked
Non-blocking IO | not blocked (the process polls) | blocked
Multiplexed IO | blocked in select | blocked
Asynchronous IO | not blocked | not blocked

 Applications in Redis


The Redis server is an event-driven program, and the server needs to handle the following two types of events:


  • File events: the Redis server connects to clients (or other Redis servers) through sockets, and file events are the server's abstraction of socket operations. Communication between the server and a client (or another server) produces corresponding file events, and the server listens for and handles these events to complete a series of network communication operations.

  • Time events: some operations in the Redis server (for example, the serverCron function) need to be executed at given points in time; time events are the server's abstraction of such timed operations.

 I/O Multiplexing Program


All of the functionality of Redis' I/O multiplexing program is implemented by wrapping the common select, epoll, evport, and kqueue multiplexing libraries.


Because Redis implements the same API for each of the I/O multiplexing libraries, the underlying implementations of the I/O multiplexers are interchangeable.


Redis encodes the selection rules with #include macros in the I/O multiplexer's implementation source; at compile time the program automatically picks the highest-performing multiplexing library available on the system as the underlying implementation (the ae.c file):

/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

 File Event Handler


Redis has developed its own network event handler based on the Reactor pattern, called the file event handler:


  • The file event handler uses an I/O multiplexer to listen to multiple sockets at once, and associates different event handlers with the sockets according to the tasks they are currently performing.

  • When a listened socket becomes ready to perform a connection answer (accept), read (read), write (write), or close (close) operation, the file event corresponding to that operation is generated, and the file event handler calls the event handlers previously associated with the socket to handle it.


The following figure shows the four components of the file event handler:


A file event is an abstraction of a socket operation; an event is generated whenever a socket becomes ready to perform a connection answer, write, read, close, and so on. Because a server usually connects to multiple sockets, multiple file events can occur concurrently. The I/O multiplexer is responsible for listening to multiple sockets and passing the sockets that generated events on to the file event dispatcher.


Nezha's question is a great one. Imagine a crowd heading to the cafeteria for lunch, and the auntie behind the counter saying the thing she always says: "Line up! Line up! No one will be left behind!"


That's right, it all comes from life! Redis' I/O multiplexer always puts all the event-generating sockets into a queue and then, through this queue, passes them to the file event dispatcher in an orderly, synchronous, one-socket-at-a-time fashion. Only after the events generated by the previous socket have been processed does the I/O multiplexer pass the next socket to the dispatcher.


Redis provides several handlers for file events, each implementing a different network-communication need:


  • To answer each client connecting to the server, the server associates the connection answer handler with the listening socket;

  • To accept incoming command requests from a client, the server associates the command request processor with the client socket;

  • To return the results of command execution to a client, the server associates the command reply processor with the client socket;

  • When master and slave servers perform replication, both sides need to associate handlers written specifically for the replication function.
 Connection Answer Handler


The networking.c/acceptTcpHandler function is Redis' connection answer handler. It answers clients connecting to the server's listening socket, and is implemented as a wrapper around the sys/socket.h/accept function.


When the Redis server initializes, the program associates this connection answer handler with the AE_READABLE event of the server's listening socket. When a client connects to the listening socket with the sys/socket.h/connect function, the socket generates an AE_READABLE event, which triggers the connection answer handler and performs the corresponding socket-answering operation.
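As a rough sketch of what this registration looks like (simplified from initServer in Redis' server.c; details vary across versions):

/* For each configured listening socket, attach the connection answer
 * handler to its AE_READABLE event (simplified; j indexes the sockets). */
if (aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE,
    acceptTcpHandler, NULL) == AE_ERR)
{
    serverPanic("Unrecoverable error creating server.ipfd file event.");
}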

 Command Request Processor


The networking.c/readQueryFromClient function is Redis' command request processor. It reads in the content of a command request sent by a client from the socket, and is implemented as a wrapper around the unistd.h/read function.


When a client successfully connects to the server through the connection answer handler, the server associates the AE_READABLE event of the client socket with the command request processor. When the client sends a command request to the server, the socket generates an AE_READABLE event, which triggers the command request processor and performs the corresponding socket read operation.


The server keeps the command request processor associated with the client socket's AE_READABLE event for as long as the client stays connected.

 Command Reply Processor


The networking.c/sendReplyToClient function is Redis' command reply processor. It returns the reply produced by executing a command to the client through the socket, and is implemented as a wrapper around the unistd.h/write function.


When the server has a command reply to transmit to a client, it associates the AE_WRITABLE event of the client socket with the command reply processor. When the client is ready to receive the reply, the socket generates an AE_WRITABLE event, which triggers the command reply processor and performs the corresponding socket write operation.


Once the command reply has been fully sent, the server disassociates the command reply processor from the AE_WRITABLE event of the client socket.
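In terms of the ae event-loop API, this pairing and unpairing is just creating and deleting a file event. A sketch along these lines (after Redis' networking.c, where c is the client object; details vary by version):

/* When a reply is queued for client c, install the reply handler ... */
aeCreateFileEvent(server.el, c->fd, AE_WRITABLE, sendReplyToClient, c);

/* ... and once the output buffer has been fully written, detach it. */
aeDeleteFileEvent(server.el, c->fd, AE_WRITABLE);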


To describe the application of IO multiplexing in Redis in one sentence: Redis puts all the event-generating sockets into a queue and passes them, in an orderly, synchronous, one-socket-at-a-time manner, to the file event dispatcher, which selects the corresponding processor according to the events on each socket, thereby handling network requests efficiently.

 Custom Protocols for Redis


Redis clients communicate with the Redis server using the RESP (REdis Serialization Protocol) protocol. It is simple to implement, fast to parse, and human-readable.


RESP supports the following data types: simple strings, errors, integers, bulk strings, and arrays.


RESP is used as a request-response protocol in Redis in the following way:


  • The client sends the command to the Redis server as a RESP array of bulk strings.

  • The server replies with one of the RESP types based on the command implementation.
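For example, a client sending the command SET mykey myvalue transmits it as an array of three bulk strings, and the server answers with a simple string (a sketch of the raw bytes, with the CRLFs written out):

*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nmyvalue\r\n
+OK\r\n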


In RESP, the type of a piece of data is determined by its first byte:

  •  For simple strings, the first byte of the reply is “+”
  •  For errors, the first byte of the reply is “-”
  •  For integers, the first byte of the reply is “:”
  •  For bulk strings, the first byte of the reply is “$”
  •  For arrays, the first byte of the reply is “*”


In addition, RESP can represent Null values using special variants of the bulk string and array types described later. In RESP, each part of the protocol is always terminated with “\r\n” (CRLF).
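For example, the null bulk string, which Redis uses to signal a missing value (such as a GET on a nonexistent key), is encoded as a bulk string with a length of -1 and no body:

"$-1\r\n"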


The following only briefly introduces the encoding of simple strings and errors; see the Redis official website for the full RESP specification.

 Simple String


It is encoded as follows: a “+” sign, followed by a string that cannot contain a CR or LF, terminated by “\r\n”. Simple strings are used to transfer relatively short, non-binary-safe strings. For example, many Redis commands return “OK” on success, which encoded in RESP is these 5 bytes:

"+OK\r\n"


To send binary-safe strings, you need to use RESP bulk strings instead. When Redis returns a simple string, the client library should hand the caller the string after the “+” sign (not included) and before the CRLF (not included).

 RESP Error


RESP has a type specifically designed for errors. The error type is really just like the RESP simple string type, but with “-” as the first character. The difference between the two is that the client treats the error type as an exception, with the string carried in the error as the exception message. The format is:

"-Error message\r\n"


An error type is returned when something goes wrong, for example when you attempt an operation against the wrong data type, or when the command does not exist. The client library should raise an exception when it receives an error type. The following are examples of error replies:

-ERR unknown command 'foobar'
-WRONGTYPE Operation against a key holding the wrong kind of value


The word after the “-”, up to the first space or newline, represents the kind of error returned; this is only a convention, not part of the format RESP requires. For example, ERR is a generic error, while WRONGTYPE is a more specific one indicating that the client attempted an operation against a key holding the wrong data type. These are called error prefixes and make it easier for the client to identify the kind of error.


A client may raise different exceptions for different errors, or it may just provide a generic way to trap the error and report the error name. But you cannot rely on these conveniences, because some clients simply return a generic error value, such as false.

 High-Performance Redis Protocol Parser


Even though the Redis protocol is human-readable and simple to define, it can be implemented with performance comparable to a binary protocol.


Because the Redis protocol puts the length of the data before the data body, the program never has to scan the entire payload looking for a special character, as it would with JSON, nor does it have to escape the payload sent to the server.


The program can process the characters of the protocol text one by one, looking for the CR character while simultaneously computing the length of the bulk reply or multi-bulk reply, like this:

#include <stdio.h>

int main(void) {
    const char *p = "$123\r\n";   /* a bulk-string header from the wire */
    int len = 0;

    p++;                          /* skip the '$' type byte */
    while (*p != '\r') {
        len = (len*10) + (*p - '0');   /* accumulate the decimal digits */
        p++;
    }

    /* Now p points at '\r', and len holds the announced length (123). */
    printf("%d\n", len);
    return 0;
}


Once the length of the bulk reply or multi-bulk reply is known, the program needs only a single read call to pull the entire reply body into memory, without doing any processing on the data. The CR and LF at the end of the reply are simply discarded without processing.


An implementation of the Redis protocol can perform comparably to a binary protocol, and because the protocol is so simple, it can be implemented easily in most high-level languages, which greatly reduces the number of bugs in client software.

 Fun Fact: How Fast Is Redis?


After a successful installation, Redis ships with a command, redis-benchmark, that can be used for performance testing. Running it simulates N clients sending requests at the same time and measures how long Redis takes to process them.
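For instance, a typical invocation (these flags are part of the stock tool: -c sets the number of parallel clients, -n the total number of requests, -q asks for quiet, summary-only output):

redis-benchmark -c 50 -n 100000 -q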


According to the official documentation, Redis has been benchmarked with more than 60,000 connections and could still sustain 50,000 q/s under those conditions. Throw the same volume of requests at MySQL and it could never hold up; it would simply collapse. It is also for this reason that Redis is so often used as a cache, where it shields the database behind it.


So you can see that Redis' claimed throughput of 100,000 requests per second is no brag. And in a future interview you can casually drop this order of magnitude: plenty of people throw around "hundred-thousand level" and "million level" indiscriminately, so being able to state it accurately is a plus.

By lzz
