Regular expressions are seriously slick; too bad you can't write them!

Regular expressions are available in almost every language, whether it is JavaScript on the front end or Java or C# on the back end. They all provide interfaces or functions that support regular expressions.

But here is the odd thing: whichever programming language you picked in college, there was no course on regular expressions to take. Until you learned them yourself, all you could do was watch the regex gurus write a string of alien-looking characters that replaced the big chunks of if/else code you needed for the same content check.

If you like them, then go learn them. But when you search Baidu and turn up a pile of related material, you find that all of it, without exception, is extremely dry and hard to learn from (honestly, I felt exactly the same way back then 😂😂😂).

Below, I try to explain regular expressions in a more approachable way, so that after reading you can write some simple patterns, or at the very least read the patterns other people write, which is not bad either.

1. Metacharacters

Everything is built from something, and regular expressions are no exception: metacharacters are the basic elements from which a regular expression is constructed.
Let’s start by memorizing a few common metacharacters:


.	Matches any character except a newline
\w	Matches a letter, digit, or underscore (and, in Unicode-aware engines, Chinese characters)
\s	Matches any whitespace character
\d	Matches a digit
\b	Matches a word boundary (the start or end of a word)
^	Matches the start of the string
$	Matches the end of the string
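
To see these metacharacters in action, here is a minimal sketch using Python's standard re module. The sample strings and variable names are made up for illustration, and the \w result for Chinese characters reflects Python 3's Unicode-aware default for str patterns; other engines may behave differently.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "hi_1 中文 a-b"

# \w: letters, digits, underscore (and Chinese characters in Python 3 str patterns)
word_chars = re.findall(r"\w", text)

# \d: digits
digits = re.findall(r"\d", text)

# \s: whitespace characters
spaces = re.findall(r"\s", text)

# . : any character except a newline
dots = re.findall(r".", "a\nb")

# \b: word boundary, so \bcat only matches "cat" at the start of a word
starts = [m.start() for m in re.finditer(r"\bcat", "cat concat cat")]

print(word_chars)
print(digits, spaces, dots, starts)
```

The "cat" inside "concat" is skipped because it is preceded by a word character, so there is no boundary in front of it.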

With these metacharacters in hand, we can write some simple regular expressions.
For example:

Match strings that start with abc (^abc anchors at the start of the string; \babc also matches abc at the start of any later word):

\babc or ^abc

Match an 8-digit QQ number:

^\d\d\d\d\d\d\d\d$

Match 11-digit cell phone numbers starting with 1:

^1\d\d\d\d\d\d\d\d\d\d$
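
As a rough check, the three patterns above can be tried out in Python. This is only a sketch: the helper name matches and the sample inputs are hypothetical, not part of the original article.

```python
import re

# Patterns copied from the examples above.
STARTS_WITH_ABC = re.compile(r"^abc")
QQ_8_DIGITS = re.compile(r"^\d\d\d\d\d\d\d\d$")
MOBILE_11_DIGITS = re.compile(r"^1\d\d\d\d\d\d\d\d\d\d$")


def matches(pattern, s):
    """Hypothetical helper: True if the compiled pattern is found in s."""
    return pattern.search(s) is not None


print(matches(STARTS_WITH_ABC, "abcdef"))        # abc at the start
print(matches(STARTS_WITH_ABC, "xabc"))          # abc not at the start
print(matches(QQ_8_DIGITS, "12345678"))          # exactly 8 digits
print(matches(QQ_8_DIGITS, "1234567"))           # only 7 digits
print(matches(MOBILE_11_DIGITS, "13800138000"))  # 11 digits, starts with 1
print(matches(MOBILE_11_DIGITS, "23800138000"))  # 11 digits, starts with 2
```

One caveat for Python specifically: $ also matches just before a trailing newline, so stricter validation may prefer re.fullmatch.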

2. Repetition qualifiers

Metacharacters alone let you write plenty of regular expressions, but if you look closely, you may notice that other people's patterns are concise and clear, while mine are a messy pile of repeated metacharacters. Doesn’t regex provide a way to deal with these repeated metacharacters?

The answer is yes!

To handle this repetition, regular expressions provide repetition qualifiers (usually called quantifiers): the repeated part is replaced with a suitable qualifier. Here are some of them:


*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero or one time
{n}	Repeat n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times

With these qualifiers in place, we can rewrite the earlier regular expressions. For example:

Match an 8-digit QQ number:

^\d{8}$

Match 11-digit cell phone numbers starting with 1:

^1\d{10}$

Match a bank card number of 14 to 18 digits:

^\d{14,18}$

Match strings that start with a, followed by zero or more b’s and nothing else:

^ab*$
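
The qualifiers can be tried out the same way. The sketch below uses Python's re module; the sample inputs and variable names are made up, and the colou?r pattern is an extra illustration that is not from the original text.

```python
import re

# Patterns from this section.
qq = re.compile(r"^\d{8}$")
bank_card = re.compile(r"^\d{14,18}$")
a_then_bs = re.compile(r"^ab*$")

# The {8} form should agree with the long-hand form from section 1.
long_hand_qq = re.compile(r"^\d\d\d\d\d\d\d\d$")
samples = ["12345678", "1234567", "123456789", "1234567a"]
same_results = all(
    bool(qq.search(s)) == bool(long_hand_qq.search(s)) for s in samples
)

# {14,18}: 14 and 18 digits pass, 13 and 19 do not.
card_checks = [bool(bank_card.search("1" * n)) for n in (13, 14, 18, 19)]

# * : an "a" followed by zero or more "b"s and nothing else.
ab_matches = [s for s in ["a", "ab", "abbb", "aba", "b"] if a_then_bs.search(s)]

# ? : zero or one; + : one or more; {2,} : two or more.
optional_u = re.findall(r"colou?r", "color colour")
digit_runs = re.findall(r"\d+", "a1b22c333")
long_runs = re.findall(r"\d{2,}", "a1b22c333")

print(same_results, card_checks, ab_matches)
print(optional_u, digit_runs, long_runs)
```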

3. Grouping

From the fourth example above, we can see that the * qualifier applies only to the single character immediately to its left. So what should we do if we want * to apply to ab as a whole?

Regular expressions use parentheses () for grouping: whatever is inside the parentheses is treated as a single unit.

So when we want to match ab repeated several times, we can write it like this.
For example, match strings that begin with zero or more repetitions of ab:

^(ab)*
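
The difference grouping makes is easy to see by comparing what each pattern actually consumes. A small Python sketch, with made-up inputs:

```python
import re

sample = "ababc"  # hypothetical input

# Without grouping, * applies only to the b.
without_group = re.match(r"^ab*", sample).group(0)

# With grouping, * applies to ab as a whole.
with_group = re.match(r"^(ab)*", sample).group(0)

# Zero repetitions are allowed, so the grouped pattern also
# matches an empty prefix of a string that does not start with ab.
empty_prefix = re.match(r"^(ab)*", "xyz").group(0)

print(repr(without_group), repr(with_group), repr(empty_prefix))
```

Because * permits zero repetitions, ^(ab)* succeeds on every string; use + instead of * when at least one ab is required.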

4. Escaping

We see that regular expressions use parentheses for grouping, so here’s the problem:

If the string to be matched contains parentheses itself, is that a conflict? What should I do?

For this case, regular expressions provide escaping: a metacharacter, qualifier, or other special symbol is turned into an ordinary character by putting a backslash (\) in front of it.
For example, to match strings that begin with zero or more repetitions of the literal text (ab):

^(\(ab\))*
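
Here is a short Python sketch of escaping. The inputs are made up; re.escape is a standard-library helper that adds the backslashes for you, and the 1\.5 pattern is an extra illustration that is not from the original text.

```python
import re

sample = "(ab)(ab)c"  # hypothetical input

# Escaped parentheses match literal "(" and ")".
escaped = re.match(r"^(\(ab\))*", sample).group(0)

# Unescaped, the parentheses only group, so nothing but the
# empty prefix matches a string that starts with "(".
unescaped = re.match(r"^(ab)*", sample).group(0)

# re.escape builds the escaped form of a literal string.
escaped_literal = re.escape("(ab)")

# The same idea applies to other metacharacters, such as the dot.
dot_any = re.fullmatch(r"1.5", "1x5") is not None
dot_literal = re.fullmatch(r"1\.5", "1x5") is not None

print(repr(escaped), repr(unescaped), escaped_literal, dot_any, dot_literal)
```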

5. Conditional or

Back to our cell phone number matching. As we all know, domestic numbers come from three carriers, each with its own number segments; Unicom, for example, has 130/131/132/155/156/185/186/145/176 and others. If we were asked to match a Unicom number, the rules we have learned so far would give us no way to start, because the task involves parallel conditions, that is, “or”. So how do regular expressions express “or”?

Regular expressions use the symbol | to denote “or”, which is also called a branch condition. When any one of the branches in a pattern is satisfied, the match counts as successful.

So we can use the or condition to solve this problem:

^(130|131|132|155|156|185|186|145|176)\d{8}$
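
A minimal Python sketch of this pattern follows. The helper name is_unicom and the test numbers are hypothetical. It also shows why the parentheses matter: | has the lowest precedence, so without them ^ would apply only to the first branch.

```python
import re

UNICOM = re.compile(r"^(130|131|132|155|156|185|186|145|176)\d{8}$")


def is_unicom(number):
    """Hypothetical helper: True if number matches the pattern above."""
    return UNICOM.search(number) is not None


print(is_unicom("13012345678"))  # listed prefix, 11 digits
print(is_unicom("13912345678"))  # prefix 139 is not in the list
print(is_unicom("1301234567"))   # listed prefix, but only 10 digits

# Without grouping, ^130|131 means "130 at the start, or 131 anywhere".
ungrouped_hit = re.search(r"^130|131", "x131") is not None
grouped_hit = re.search(r"^(130|131)", "x131") is not None
print(ungrouped_hit, grouped_hit)
```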

6. Interval

Looking at the example above, do you notice a pattern? Do you still feel the urge to simplify it?
In fact, you can.

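Regular expressions provide square brackets [] as a metacharacter for expressing an interval condition: the brackets match any single character from the set they describe.
Limiting a character to 0 through 9 is written [0-9].
Limiting it to A through Z is written [A-Z].
Limiting it to certain specific digits is written, for example, [165].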

So the pattern above can be rewritten like this:

^((13[0-2])|(15[56])|(18[5-6])|145|176)\d{8}$
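
To confirm that the bracket version accepts exactly the same prefixes as the version with nine branches, one option is to try every three-digit prefix. A Python sketch, with a made-up 8-digit tail:

```python
import re

alternation = re.compile(r"^(130|131|132|155|156|185|186|145|176)\d{8}$")
with_brackets = re.compile(r"^((13[0-2])|(15[56])|(18[5-6])|145|176)\d{8}$")

tail = "12345678"  # hypothetical 8-digit remainder
prefixes = [f"{p:03d}" for p in range(1000)]

# Prefixes on which the two patterns disagree (expected: none).
mismatches = [
    p for p in prefixes
    if bool(alternation.search(p + tail)) != bool(with_brackets.search(p + tail))
]

# Prefixes accepted by the bracket version.
accepted = [p for p in prefixes if with_brackets.search(p + tail)]

# Brackets on their own: each matches one character from the set.
specific_digits = re.findall(r"[165]", "123456")
capitals = re.findall(r"[A-Z]", "aBcD")

print(mismatches, accepted)
print(specific_digits, capitals)
```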

That covers the basic use of regular expressions. There is a lot more to them, including many more metacharacters; we have only listed some of the metacharacters and syntax. This is meant as a quick entry-level tutorial for people who don't know regular expressions, or who want to learn them but can't get through the documentation. After reading it, even if you can't write sophisticated patterns, you can at least write some simple ones or understand the patterns other people write. For anything more advanced, the rest is up to your own practice.

Found this article helpful? Please share it with more people!


By lzz
