Let's talk about consistency


Consistency, in a distributed system, is usually understood to mean that the copies of the same piece of data held on multiple nodes agree with each other.


  • Strong consistency: this level of consistency is the most intuitive for users. It requires that whatever the system writes is exactly what is read back. The user experience is good, but implementations tend to have a large impact on system performance.

  • Weak consistency: at this level the system does not promise that a successfully written value can be read back immediately, nor does it promise how long it will take for the data to become consistent, but it tries to ensure that the data reaches a consistent state after some time interval (e.g., at the second level).

  • Eventual consistency: eventual consistency is a special case of weak consistency in which the system guarantees that the data will reach a consistent state within a certain period of time. Eventual consistency is called out separately because it is the most highly regarded of the weak consistency models, and it is also the model the industry favors most for data consistency in large distributed systems.

 Three classic caching patterns


Caching can improve performance and relieve database pressure, but using caching can also lead to data inconsistency problems. How do we generally use caching? There are three classic caching patterns:

  • Cache-Aside Pattern
  • Read-Through/Write-Through
  • Write-Behind

Cache-Aside Pattern


The Cache-Aside Pattern, also known as the bypass cache pattern, was proposed to reduce, as far as possible, data inconsistency between the cache and the database.

 Cache-Aside Read Process


The read request flow for the Cache-Aside Pattern is as follows (a minimal sketch follows the list):

  1.  When reading, read the cache first; if the cache hits, return the data directly.

  2.  If the cache misses, read the database, put the value read from the database into the cache, and return it in the response.
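
Here is a minimal Python sketch of this read path. The dicts `cache` and `db` stand in for Redis and the database, and names such as `load_user_from_db` are illustrative assumptions, not from any particular library:

```python
# Cache-Aside read path; in-memory dicts stand in for Redis and MySQL.
cache = {}
db = {"user:1": {"name": "alice"}}

def load_user_from_db(key):
    # Stand-in for a SELECT against the real database.
    return db.get(key)

def read(key):
    value = cache.get(key)          # 1. read the cache first
    if value is not None:           #    cache hit: return directly
        return value
    value = load_user_from_db(key)  # 2. cache miss: read the database
    if value is not None:
        cache[key] = value          #    put the value into the cache
    return value                    #    and return it in the response

print(read("user:1"))  # miss -> loads from db, fills cache
print(read("user:1"))  # hit  -> served from cache
```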

 Cache-Aside Write Process


The write request flow for Cache-Aside Pattern is as follows:

 When updating, update the database first, then delete the cache.
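
A matching write-path sketch, continuing the `cache` and `db` dicts from the read sketch above:

```python
# Cache-Aside write path: database first, then delete the cache entry.
def write(key, value):
    db[key] = value        # 1. update the database first
    cache.pop(key, None)   # 2. then delete the cache

write("user:1", {"name": "bob"})
print(read("user:1"))  # next read misses and reloads the new value
```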


Read-Through/Write-Through


In the Read/Write-Through pattern, the cache is treated as the primary data store: the application interacts only with an abstract cache layer, and the cache layer talks to the database on the application's behalf.

Read-Through

The brief process of Read-Through is as follows:

  1.  Read data from the cache; if it hits, return it directly.

  2.  If it misses, the cache layer loads the data from the database, writes it into the cache, and then returns the response.


Doesn't this brief process look very similar to Cache-Aside? The difference is that Read-Through adds one more layer, the cache provider, in between.


Read-Through is actually just a layer of encapsulation on top of Cache-Aside, which makes the program code cleaner and reduces the load on the data source.
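
A minimal sketch of that extra layer: the application only calls the provider, and the provider alone talks to the database. The `CacheProvider` class and its `loader` callback are illustrative assumptions, not a real library API:

```python
# Read-Through: the cache provider hides the database behind itself.
class CacheProvider:
    def __init__(self, loader):
        self._cache = {}        # stand-in for Redis
        self._loader = loader   # callback that reads the database

    def get(self, key):
        value = self._cache.get(key)
        if value is None:                 # miss: the provider itself
            value = self._loader(key)     # loads from the database...
            if value is not None:
                self._cache[key] = value  # ...and fills the cache
        return value

db = {"user:1": {"name": "alice"}}
provider = CacheProvider(loader=db.get)
print(provider.get("user:1"))  # the caller never touches the db directly
```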

Write-Through


In Write-Through mode, when a write request occurs, the cache abstraction layer updates both the data source and the cache, roughly as in the sketch below.
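
A minimal sketch under the same assumptions as the Read-Through example; the `writer` callback is illustrative:

```python
# Write-Through: the provider owns writes and updates db + cache together.
class WriteThroughProvider:
    def __init__(self, writer):
        self._cache = {}
        self._writer = writer      # callback that writes the database

    def put(self, key, value):
        self._writer(key, value)   # 1. synchronously update the data source
        self._cache[key] = value   # 2. update the cache in the same call

db = {}
provider = WriteThroughProvider(writer=db.__setitem__)
provider.put("user:1", {"name": "alice"})
print(db["user:1"])  # the write reached the database immediately
```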

Write-Behind (asynchronous cache writes)


Write-Behind is similar to Read-Through/Write-Through in that the cache provider is responsible for reading and writing both the cache and the database. However, there is a big difference: Read/Write-Through updates the cache and the database synchronously, while Write-Behind only updates the cache directly, not the database, and flushes updates to the database asynchronously in batches.


With this approach, consistency between the cache and the database is not strong, so systems with high consistency requirements should use it with caution. But it suits write-heavy scenarios; MySQL's InnoDB buffer pool mechanism works in this mode.
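
A minimal sketch of the idea: the write path touches only the cache, and a background thread flushes dirty keys to the database in batches. A real implementation would need durability and failure handling; everything here (in-memory dicts, the flush interval) is an illustrative assumption:

```python
# Write-Behind: cache-only writes, with asynchronous batch flushes.
import threading, time

cache, db = {}, {}
dirty_keys = set()
lock = threading.Lock()

def write(key, value):
    with lock:
        cache[key] = value   # only the cache is updated on the write path
        dirty_keys.add(key)  # remember what still has to reach the db

def flush_loop(interval=1.0):
    while True:
        time.sleep(interval)
        with lock:
            batch = {k: cache[k] for k in dirty_keys}
            dirty_keys.clear()
        db.update(batch)     # asynchronous batch write to the database

threading.Thread(target=flush_loop, daemon=True).start()
write("user:1", {"name": "alice"})
time.sleep(1.5)
print(db)  # the value arrives in the database only after the flush
```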

When operating on the cache, should you delete it or update it?


In general business scenarios, we use the Cache-Aside pattern. Some of you may ask why Cache-Aside deletes the cache instead of updating it when handling a write request.


When we manipulate the cache, should we delete it or update it? Let’s look at an example first:

  1.  Thread A initiates a write operation and updates the database.
  2.  Thread B initiates another write operation and also updates the database.
  3.  Due to network timing and other factors, thread B updates the cache first.
  4.  Thread A then updates the cache.


At this point, the cache stores A's data (old data) while the database stores B's data (new data); the data is inconsistent and has become dirty. If the cache were deleted instead of updated, this dirty-data problem would not occur.

 Updating the cache has two other disadvantages over deleting it:


  • If the cached value is the result of a complex computation, updating the cache on every write wastes performance, since the expensive value may be overwritten before it is ever read.

  • In write-heavy, read-light scenarios, the data is often updated again before it is ever read, which also wastes performance (in fact, in write-heavy scenarios, using a cache is often not very cost-effective at all).

When double-writing, should you operate on the database or the cache first?


In the Cache-Aside pattern, some of you may still wonder why a write request operates on the database first. Why not operate on the cache first?


Suppose there are two requests A and B. Request A does an update operation and request B does a query read operation.

  1.  Thread A initiates a write operation and, as its first step, deletes the cache.
  2.  At this point, thread B initiates a read operation, and the cache misses.
  3.  Thread B continues on to read the DB and reads out old data.
  4.  Thread B then sets the old data into the cache.
  5.  Thread A writes the latest data to the DB.


Now there is a problem: the cache and the database are inconsistent. The cache holds old data while the database holds new data. This is why the Cache-Aside pattern chooses to operate on the database first rather than on the cache.

Cache Delayed Double Deletion


Some readers may say that you don't necessarily have to operate on the database first: just use the delayed double deletion strategy. So what is delayed double deletion?

  1.  Delete the cache first.
  2.  Then update the database.
  3.  Sleep for a while (say, 1 second), then delete the cache again.

How long should this sleep usually last?


The sleep time = the time the read business logic takes to fetch its data + a few hundred milliseconds. This ensures that the read request has finished, so the write request can delete any dirty cache data the read request may have brought in.
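
A minimal sketch of delayed double deletion, with the second delete running in a background thread so the write request is not blocked; the one-second `SLEEP_SECONDS` value is the illustrative "read time plus a few hundred milliseconds":

```python
# Delayed double deletion: delete cache, update db, delete cache again later.
import threading, time

cache, db = {"user:1": "old"}, {"user:1": "old"}
SLEEP_SECONDS = 1.0

def delayed_double_delete(key, value):
    cache.pop(key, None)            # 1. delete the cache first
    db[key] = value                 # 2. then update the database
    def second_delete():
        time.sleep(SLEEP_SECONDS)   # 3. wait out any concurrent reads...
        cache.pop(key, None)        #    ...then delete the cache again
    threading.Thread(target=second_delete, daemon=True).start()

delayed_double_delete("user:1", "new")
time.sleep(1.5)
print(cache, db)  # cache is empty, db holds the new value
```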

 Delete Cache Retry Mechanism


Whether you use delayed double deletion or Cache-Aside's operate-the-database-first-then-delete-the-cache, if the second step of deleting the cache fails, the failed deletion will leave dirty data behind.


If a deletion fails, delete the cache a few more times to ensure that the deletion eventually succeeds; that is, introduce a retry mechanism for deleting the cache. The flow (sketched in code after the list) is:

  1.  A write request updates the database.
  2.  The cache deletion fails for some reason.
  3.  The key that failed to be deleted is put into a message queue.
  4.  A consumer reads the key to be deleted from the message queue.
  5.  The consumer retries the cache deletion operation.
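
A minimal sketch of this flow, using Python's `queue.Queue` as a stand-in for a real message queue; the randomly failing `delete_cache` simply simulates a flaky network:

```python
# Delete-retry flow with a queue standing in for an MQ.
import queue, random

cache = {"user:1": "old"}
retry_queue = queue.Queue()

def delete_cache(key):
    # Stand-in for a Redis DEL that can fail, e.g. on a network error.
    if random.random() < 0.5:
        raise ConnectionError("cache unreachable")
    cache.pop(key, None)

def write(key, value):
    # 1. the write request updates the database (omitted in this sketch)
    try:
        delete_cache(key)        # 2. try to delete the cache
    except ConnectionError:
        retry_queue.put(key)     # 3. on failure, enqueue the key

def retry_consumer():
    while not retry_queue.empty():
        key = retry_queue.get()  # 4. consume the key to be deleted
        try:
            delete_cache(key)    # 5. retry the cache deletion
        except ConnectionError:
            retry_queue.put(key) #    still failing: requeue it

write("user:1", "new")
retry_consumer()
print(cache)  # empty once a deletion has succeeded
```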

Read the binlog to asynchronously delete the cache


The delete-retry mechanism works, but it is intrusive: the retry logic invades the business code. Alternatively, you can evict keys asynchronously by consuming the database binlog.


Taking MySQL as an example, you can use Alibaba's canal to collect binlog events and send them to an MQ, then have a consumer delete the corresponding cache entries, acknowledging each update message through the MQ's ACK mechanism, thus keeping the cached data consistent with the database.
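
A minimal sketch of the consumer side. In a real system canal would publish parsed binlog events to the MQ; here a plain list of event dicts stands in for that stream, and the event shape is an illustrative assumption:

```python
# Binlog-driven cache eviction: consume change events, delete cache keys.
cache = {"user:1": "old"}

binlog_events = [  # stand-in for messages published by canal to the MQ
    {"table": "user", "type": "UPDATE", "key": "user:1"},
]

def handle_event(event):
    cache.pop(event["key"], None)  # evict the cache entry for the changed row
    return True                    # success: the real consumer would ACK here

for event in binlog_events:
    if not handle_event(event):
        pass  # no ACK: the MQ redelivers the message, giving a free retry

print(cache)  # the stale entry is gone
```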

By lzz
