Regular expressions

 1.1 Basic syntax

 A review of the basics of regular expressions through a chart

single char                                       quantifiers                             position
\d  matches a digit                               *  0 or more                            ^   beginning of line
\w  matches a word character (letter, digit, _)   +  1 or more                            $   end of line
\W  matches a non-word character                  ?  0 or 1 (optional)                    \b  word boundary
\s  matches whitespace (space, tab, etc.)         {min,max}  between min and max times
\S  matches non-whitespace                        {n}  exactly n times
.   matches any character

1.1.1. single char

 Suppose you have a passage of text such as ‘These are some phone numbers 915-555-1234…’:

  • \w

 matches every word character (letters, digits and the underscore), but not characters such as ( ) or -.

  • \w\w\w


Matches found: ‘ The se are som e pho ne number s …’ Note that a regular expression is a rule for matching a continuous run of characters, so three-letter words are matched, but so are three-character slices of longer words: a six-letter run yields two matches.

  • \s\s

 Matches two consecutive whitespace characters in a line.

quantifiers

 Suppose we have this passage:

The colors of the rainbow have many colours 
and the rainbow does not  have a single colour.


We want to find every spelling: color, colors, colour and colours.


Answer: colou?rs? Each ? makes the character before it optional, so the u and the final s may each be present or absent.
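As a quick check of the optional quantifier, here is a minimal JavaScript sketch that runs the pattern over the passage above. The variable names are illustrative only.

```javascript
// Sample passage taken from the text above.
const passage =
  "The colors of the rainbow have many colours \n" +
  "and the rainbow does not  have a single colour.";

// u? makes the "u" optional; s? makes the trailing "s" optional.
const spellings = passage.match(/colou?rs?/g);

console.log(spellings); // [ 'colors', 'colours', 'colour' ]
```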



Now suppose you want to match 4 digits in a row, or 5 letters in a row, and so on. This is where quantifiers come in handy.

 Say we are looking for words with exactly 5 letters.


  • \w{5} Is that enough? No. Look at what it matches: ‘ These are some phone numbe rs 915-555-1234…’ The pattern is very simple: it only looks for 5 consecutive word characters, wherever they occur. So it needs improving.


  • \w{5}\s To find whole words, we could ask for 5 word characters followed by whitespace. Look at the match: ‘ These are some phone nu mbers 915-555-1234…’ It still catches the tail of a longer word. With the tools covered so far this cannot be done, so we need a third tool: position.

1.1.2. position


Before returning to the earlier question, get familiar with ^, $ and \b using the following text:

This is somthing
is about
a blah
words
sequence of words
Hello and
GoodBye and 
Go gogo!

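  Take a look at what the various patterns match (with multiline matching enabled, so that ^ and $ apply to every line):


  • \w+ No surprise here: this matches every word.


  • ^\w+ The extra ^ anchors the match to the start of a line, so only the first word of each line matches: This, is, a, words, sequence, Hello, GoodBye, Go.


  • \w+$ This matches the last word of each line, provided the line ends in a word character. Lines that end in a space or in punctuation, such as the last two above, produce no match.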
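The anchors can be tried out with a short JavaScript sketch. Note one assumption specific to JavaScript: ^ and $ only match at every line break when the m flag is set, so the sketch uses both g and m. The variable names are illustrative.

```javascript
// The sample lines from above, joined with newlines.
const lines = [
  "This is somthing",
  "is about",
  "a blah",
  "words",
  "sequence of words",
  "Hello and",
  "GoodBye and ", // note the trailing space
  "Go gogo!",
].join("\n");

// g: find every match; m: let ^ and $ match at each line break.
const firstWords = lines.match(/^\w+/gm);
const lastWords = lines.match(/\w+$/gm);

console.log(firstWords);
// [ 'This', 'is', 'a', 'words', 'sequence', 'Hello', 'GoodBye', 'Go' ]
console.log(lastWords);
// [ 'somthing', 'about', 'blah', 'words', 'words', 'and' ]
```

The last two lines contribute nothing to lastWords because they end in a space and in "!" respectively.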

 Back to the earlier question: finding words with exactly 5 letters.


With the word boundary \b it becomes very simple.

 The answer: \b\w{5}\b
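The three attempts at the 5-letter-word problem can be compared side by side. This is a minimal JavaScript sketch that uses the visible part of the sample sentence as input; the variable names are illustrative.

```javascript
const text = "These are some phone numbers 915-555-1234";

const anyFive = text.match(/\w{5}/g);         // any 5 consecutive word characters
const fiveThenSpace = text.match(/\w{5}\s/g); // 5 word characters plus whitespace
const wholeWords = text.match(/\b\w{5}\b/g);  // exactly 5, bounded on both sides

console.log(anyFive);       // [ 'These', 'phone', 'numbe' ]
console.log(fiveThenSpace); // [ 'These ', 'phone ', 'mbers ' ]
console.log(wholeWords);    // [ 'These', 'phone' ]
```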

 1.1.3 Get a phone number.

 Finally, let's look for the phone number that came up earlier, in the form 123-456-1231.


The most basic regular expression is \d{3}-\d{3}-\d{4}, and that finds it. But sometimes a phone number is written as 123.456.1234 or (212)867-4233. How do we handle those?


That requires expressing "or" in a pattern, which is covered in the next section.

 1.2 Character classes


The previous section covered the most basic building blocks. Next comes the character class, written [].


A character class expresses a logical "or": for example, [abc] means a or b or c, and [-.] means the character - or the character . (note that inside [] the . stands for a literal dot, whereas outside it matches any character; to match a literal dot outside [] you have to escape it as \.).

 1.2.1 Simple uses of character classes

 Character Sequence:

The lynk is quite a link don't you think? l nk l(nk

 Regular expression: l[yi (]nk

Matches:

lynk  link  l nk   l(nk


This is easy to read: the class simply says that the second character may be y, i, a space, or (.
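The same character class can be checked in JavaScript. A minimal sketch, with illustrative variable names:

```javascript
const sentence = "The lynk is quite a link don't you think? l nk l(nk";

// Inside [], the space and the ( are literal characters.
const hits = sentence.match(/l[yi (]nk/g);

console.log(hits); // [ 'lynk', 'link', 'l nk', 'l(nk' ]
```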

 1.2.2. match all possible phone numbers


OK, back to the question left open earlier. Given the following text, match every phone number in it:

These are some phone numbers 915-134-3122. Also,
you can call me at 643.123.1333 and of course,
I'm always reachable at (212)867-5509


Step by step: we just used \d{3}-\d{3}-\d{4} to match the hyphenated form. Now we can easily add the . form to it.

 Step 1: \d{3}[-.]\d{3}[-.]\d{4}


Step 2: To match the parentheses as well, use ?, since the opening parenthesis is optional, and allow ) as a separator after the area code. You end up with

\(?\d{3}[-.)]\d{3}[-.]\d{4}


Note once more that inside [] most special characters do not need to be escaped and can be written directly, as in [.()], whereas outside [] they must be escaped: \( \. and so on.
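Here is a minimal JavaScript sketch applying the step 2 pattern to the sample text. The variable names are illustrative.

```javascript
const text =
  "These are some phone numbers 915-134-3122. Also,\n" +
  "you can call me at 643.123.1333 and of course,\n" +
  "I'm always reachable at (212)867-5509";

// \(? : optional opening parenthesis; [-.)] : separator after the area code.
const phonePattern = /\(?\d{3}[-.)]\d{3}[-.]\d{4}/g;
const numbers = text.match(phonePattern);

console.log(numbers); // [ '915-134-3122', '643.123.1333', '(212)867-5509' ]
```

The pattern is deliberately loose: it would also accept mixed forms such as 212)867.5509, which is fine for this exercise but worth tightening for real data.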

 1.2.3. Special syntax of []


The simplest and most basic usage has just been described, but there are a few special points to note.

  1. The hyphen - at the start of a class


For example, [-.] means the hyphen - or the dot . . However, when the hyphen is not the first character, as in [a-z], it denotes a range: here, any character from a to z.

  2. ^ inside []


As introduced earlier, ^ normally means the beginning of a line, but as the first character inside [] it has a different meaning. [ab] means a or b; [^ab] means any character except a and b, that is, the negation of the class.
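The two special cases can be seen in a small JavaScript sketch. The sample string is made up for illustration.

```javascript
const sample = "a-b.c_d"; // hypothetical input

const hyphenOrDot = sample.match(/[-.]/g); // a leading - is a literal hyphen
const lowercase = sample.match(/[a-z]/g);  // - between a and z is a range
const notAOrB = sample.match(/[^ab]/g);    // a leading ^ negates the class

console.log(hyphenOrDot); // [ '-', '.' ]
console.log(lowercase);   // [ 'a', 'b', 'c', 'd' ]
console.log(notAOrB);     // [ '-', '.', 'c', '_', 'd' ]
```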

 1.2.4. [] and ()


In addition to using [] for "or" logic, () can be used as well: (a|b) means a or b.

 For example, let's match all of the following email addresses:

[email protected]  
[email protected] 
[email protected]

 The first thing to work out is what exactly needs to be matched. Here it is:


  1. One or more word characters at the start: \w+

  2. Immediately followed by an @ symbol: \w+@

  3. Followed by one or more word characters: \w+@\w+

  4. Followed by a literal dot: \w+@\w+\.

  5. Followed by com, net or edu: \w+@\w+\.(com|net|edu)


Note again the escaped \. in step 4.


This matches all of the addresses above. But there is still a problem: the username part of an address can itself contain a . , such as [email protected]

 The fix is still simple: [\w.]+@\w+\.(com|net|edu)
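The difference between the two patterns shows up when the username contains a dot. A minimal JavaScript sketch follows; the address is a hypothetical one made up for this example, not one of the addresses above.

```javascript
const address = "bob.smith@example.net"; // hypothetical address

const wordOnly = address.match(/\w+@\w+\.(com|net|edu)/);
const withDots = address.match(/[\w.]+@\w+\.(com|net|edu)/);

console.log(wordOnly[0]); // smith@example.net (the "bob." part is lost)
console.log(withDots[0]); // bob.smith@example.net
console.log(withDots[1]); // net (the alternative captured by the parentheses)
```

Both patterns are teaching examples and far from a complete email validator.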

 1.2.5 Summary


  1. [] expresses a choice among single characters, that is, the logic of "or";

  2. [-.(] A hyphen - placed first in the class means the hyphen itself; placed between two characters it means "from ... to ...". For example, [a-z] means a through z;

  3. [.)] Special symbols inside the brackets stand for themselves and need no escaping;

  4. [^ab] ^ at the start of the brackets means "not": anything except a and b;

  5. (a|b) can also express a choice, but it is more powerful.


So what is the powerful feature of ()? Capturing groups, which are very helpful for substitution and for swapping parts of a match. They are covered in the next section.

1.3. capturing groups

 What is a capturing group? Let's go back to the earlier phone number example, where we want to mask everything after the area code:

212-555-1234
915-412-1333

👇👇👇👇👇👇👇👇👇👇👇👇

212-xxx-xxxx
915-xxx-xxxx


With the earlier pattern \d{3}-\d{3}-\d{4}, the whole phone number is matched as a single group. The full match, such as 212-555-1234, is called Group0.


If we now add parentheses, as in \d{3}-(\d{3})-\d{4}, the part that matches 555 is called Group1. By the same logic, with two pairs of parentheses, \d{3}-(\d{3})-(\d{4}), the groups are as follows:

212-555-1234   Group0
555            Group1
1234           Group2

  1.3.1 Selection of groups

 Now that the match has been divided into groups, how do we refer to a group?


There are two ways. The first uses the $ symbol, such as $1 for 555 and $2 for 1234; the second uses \, such as \1 for 555. The two are used in different situations. Let's start with $.

 To fulfill the requirement from the start of this section, we can do this:

reg: \(?(\d{3})[-.)]\d{3}[-.]\d{4}

replace: $1-xxx-xxxx


PS: This can be done directly with the JS replace function, but regular expressions are not exclusive to JS, so the general approach is introduced first and the JS specifics are summarized later.
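In JavaScript, the reg/replace pair above maps directly onto String.prototype.replace. A minimal sketch with illustrative variable names:

```javascript
const numbers = "212-555-1234\n915-412-1333";

// $1 in the replacement string refers to the first capturing group (the area code).
const masked = numbers.replace(/\(?(\d{3})[-.)]\d{3}[-.]\d{4}/g, "$1-xxx-xxxx");

console.log(masked);
// 212-xxx-xxxx
// 915-xxx-xxxx
```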

 1.3.2 Scenario-based training


  1. Here is a list of names in which the last name comes first, and the two need to be swapped:
shiffina, Daniel
shifafl, Daniell
shquer, Danny
...

  Solution:

reg: (\w+),\s(\w+)

replace: $2 $1


Note: $0 refers to the entire match, so the first parenthesized group is $1. (In JavaScript's replace, the entire match is written $& rather than $0.)
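The swap can be tried in JavaScript as follows. This is a minimal sketch using the first three names from the list; the variable names are illustrative.

```javascript
const names = "shiffina, Daniel\nshifafl, Daniell\nshquer, Danny";

// $1 is the last name, $2 the first name; the replacement reverses them.
const swapped = names.replace(/(\w+),\s(\w+)/g, "$2 $1");

console.log(swapped);
// Daniel shiffina
// Daniell shifafl
// Danny shquer
```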


  2. Match link tags in markdown and replace them with HTML tags
[google](http://google.com)
[itp](http://itp.nyu.edu)
[Coding Rainbow](http://codingrainbow.com)

  Answer: This one contains a pitfall, so take your time.


The first step is to match the [google] part, and the pattern that immediately comes to mind is \[.*\] . This is a big pitfall. It does match the three lines above correctly, but it goes wrong as soon as one line contains more than one bracketed item, for example [google] followed later on the same line by [test].


In that case the match runs from the first [ all the way to the last ] in the line, without distinguishing [google] from [test]. The reason is that the quantifier * is greedy: .* takes as many characters as it can, which of course includes ] , and it only stops at the last ] in the line.


So for the match to be correct, the greediness has to be turned off. This is done with ?. When ? is placed after a quantifier, the quantifier becomes non-greedy (lazy): it stops at the first point where the rest of the pattern can match.


\[.*?\] In this way, [google] and [test] are matched separately.

 Putting everything together:

reg: \[(.*?)\]\((http.*?)\)

replace: <a href="$2">$1</a>
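The greedy and lazy behaviours can be compared directly in JavaScript. In this minimal sketch the one-line input containing [test] is a hypothetical example made up to trigger the pitfall; the variable names are illustrative.

```javascript
// Hypothetical line with two bracketed items.
const line = "[google](http://google.com), [test]";

const greedy = line.match(/\[.*\]/g); // runs to the last ] on the line
const lazy = line.match(/\[.*?\]/g);  // stops at the first ] each time

console.log(greedy); // [ '[google](http://google.com), [test]' ]
console.log(lazy);   // [ '[google]', '[test]' ]

// The full conversion: $1 is the link text, $2 the URL.
const markdown = "[google](http://google.com)\n[itp](http://itp.nyu.edu)";
const html = markdown.replace(/\[(.*?)\]\((http.*?)\)/g, '<a href="$2">$1</a>');

console.log(html);
// <a href="http://google.com">google</a>
// <a href="http://itp.nyu.edu">itp</a>
```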

 1.3.3. Using the \ selector


The $ selector refers to a group in the replacement text. To refer to a group inside the regular expression itself, use \ instead. Consider the following scenario:

This is is a a dog , I think think this is is really
a a good good dog. Don't you you thinks so so ?


We want to match immediately repeated words such as is is and so so , so we use the following expression: (\w+)\s\1


It almost works, but there is a small bug. In the first sentence, This is is a , the match is in the wrong place: it pairs the is at the end of This with the is that follows. The fix is the word boundary \b from the first section, which gives \b(\w+)\s\1\b


That completes the job. The results are not shown here, so try it out for yourself.
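For readers who want to check their own results, here is a minimal JavaScript sketch. It compares where the two patterns first match, then lists every repeated word in the sample text. The variable names are illustrative.

```javascript
const start = "This is is a a dog";

const loose = /(\w+)\s\1/.exec(start);
const strict = /\b(\w+)\s\1\b/.exec(start);

console.log(loose.index);  // 2: starts inside "This"
console.log(strict.index); // 5: starts at the first standalone "is"

const full =
  "This is is a a dog , I think think this is is really\n" +
  "a a good good dog. Don't you you thinks so so ?";

const repeats = full.match(/\b(\w+)\s\1\b/g);
console.log(repeats);
// [ 'is is', 'a a', 'think think', 'is is', 'a a', 'good good', 'you you', 'so so' ]
```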

 1.3.4 Summary


  1. Capturing groups: use () to group parts of a match. Number 0 is the entire match; the groups themselves are numbered from 1.

  2. Groups can be referenced with $1 or \1, depending on the situation: \ is used inside the regular expression itself, $ in the replacement text.

  3. The ? symbol turns off greediness. Placed after .* it makes the match stop at the first point where the rest of the pattern can match; otherwise it keeps matching as far as it can.

 1.4. in JavaScript


In JS, regular expressions are mostly used together with strings.

var str = "hello"
var r = /\w+/


These are the literal syntaxes for creating a string and a regular expression respectively. The methods r.test(), str.match() and str.replace() are the ones used to work with regular expressions.

1.4.1. reg.test()


A regular expression has a test method, which only checks whether the string contains a match and returns a boolean.

var r = /\d{3}/;
var a = '123';
var b = '123ABC';
var c = 'abc';

r.test(a)  //true
r.test(b) //true
r.test(c) //false


This one is simple and not used all that much in practice, so the rest of this section focuses on the string methods.

1.4.2. str.match()


Unlike test(), match() does not just return a boolean; it returns what was matched.

var r = /compus/
var reg = /\w+/
var s = "compus, I know something about you"
r.test(s)  //true
s.match(r)  //["compus"]
s.match(reg) //["compus"]


Wait, something looks wrong. Why does the last one return only "compus"? That doesn't seem right.


In fact, match() returns only the first match by default. To get every match, you need one of the flags that JS provides for regular expressions.

1.4.2.1. flag

 A flag is specified when the regular expression is created. There are three main ones:

flag  meaning
g     global: find all matches, not just the first
i     ignore case
m     multiline: ^ and $ match at the start and end of every line


So to solve the problem, just define reg like this:

var reg = /\w+/g
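The effect of the flags can be seen in a short sketch. The second input string is made up for illustration, as are the variable names.

```javascript
const s = "compus, I know something about you";

const first = s.match(/\w+/); // no g: only the first match
const all = s.match(/\w+/g);  // g: every match

console.log(first[0]); // compus
console.log(all);      // [ 'compus', 'I', 'know', 'something', 'about', 'you' ]

// i makes the match case-insensitive; flags can be combined.
const anyCase = "Hello HELLO hello".match(/hello/gi);
console.log(anyCase);  // [ 'Hello', 'HELLO', 'hello' ]
```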

  Look at the following exercise

var str = "Here is a Phone Number 111-2313 and 133-2311"

var r = /\d{3}[-.]\d{4}/
var rg = /\d{3}[-.]\d{4}/g

console.log(str.match(r)); //["111-2313"]
console.log(str.match(rg));//["111-2313","133-2311"]


Finding phone numbers this way is convenient. But there is another question: we talked about groups earlier, so does match() return the groups?

var sr = /(\d{3})[-.]\d{4}/
var srg = /(\d{3})[-.]\d{4}/g

console.log(str.match(sr)); //["111-2313","111"]
console.log(str.match(srg)); //["111-2313","133-2311"]


So the conclusion is: with the global flag g, match() returns all the matches but no groups; without g, it returns the first match together with its groups, as an array.

 So how do you get the groups for every match?

1.4.3. reg.exec()


As the name suggests, this is the regular expression's execute method. With the g flag it can walk through every match and still return the groups.


Each call to reg.exec() returns one match as an array containing the full match and its groups. The next call returns the next match, and so on until it returns null.

var str = "Here is a Phone Number 111-2313 and 133-2311" ;
var srg = /(\d{3})[-.]\d{4}/g;
var result = srg.exec(str);
while(result !== null) {
    console.log(result);
    result = srg.exec(str);
}


The result holds more than meets the eye. It is an array with extra properties; for example, the second call returns:

["133-2311", "133", index: 36, 
input: "Here is a Phone Number 111-2313 and 133-2311", groups: undefined]
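A common way to use this is to collect the pieces of interest from each result. The following is a minimal sketch; the found array and its field names are illustrative choices, not part of the exec() API.

```javascript
const str = "Here is a Phone Number 111-2313 and 133-2311";
const pattern = /(\d{3})[-.]\d{4}/g;

const found = [];
let result;
// With the g flag, each exec() call resumes where the previous match ended.
while ((result = pattern.exec(str)) !== null) {
  found.push({ match: result[0], group1: result[1], index: result.index });
}

console.log(found);
// [ { match: '111-2313', group1: '111', index: 23 },
//   { match: '133-2311', group1: '133', index: 36 } ]
```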

1.4.4. str.split()


Now for some more powerful functions, starting with splitting. split() breaks a string apart at a given separator. For example, take the following text, which needs to be split into words.

var s = "unicorns and rainbows And, Cupcakes"


The first thing that comes to mind when splitting into words is to separate them by spaces, so this can be done in the following way

var result = s.split(' ');
var result1 = s.split(/\s/);
//["unicorns", "and", "rainbows", "And,", "Cupcakes"]


That doesn't show off the power of regular expressions, and more importantly it doesn't meet the requirement, because one of the results is "And," with the comma still attached. So let's use a regular expression whose separator is a comma or a whitespace character:

result = s.split(/[,\s]/);

//["unicorns", "and", "rainbows", "And", "", "Cupcakes"]


The result is still not what we need, because there is an extra "". We don't want to split on a single separator character; the separator should be a run of one or more of them. Adding a + to the original pattern gives /[,\s]+/ , which means exactly that:

result = s.split(/[,\s]+/);
// ["unicorns", "and", "rainbows", "And", "Cupcakes"]

  1.4.4.1. word segmentation


To expand on that, a regular expression that splits a paragraph into words is

result = s.split(/[,.!?\s]+/)

  Of course, there is an even easier way:

result = s.split(/\W+/);


Next, to separate a paragraph into its sentences and clauses, a workable expression is

result = s.split(/[.,!?]+/)


Finally, a small extra requirement: split the sentences while keeping the separators.

var s = 
"Hello,My name is Vincent. Nice to Meet you!What's your name? Haha."


There is a small trick here: to keep the separators, wrap the separator pattern in a capturing group.

var result = s.split(/([.,!?]+)/)
//["Hello", ",", "My name is Vincent", ".", " Nice to Meet you", "!", "What's your name", "?", " Haha", ".", ""]

  As you can see, this stores the separators as well.
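The three split variants can be compared in one small sketch. The short "a, b" input is made up to keep the last result readable; the variable names are illustrative.

```javascript
const s = "unicorns and rainbows And, Cupcakes";

const single = s.split(/[,\s]/); // one separator character at a time
const runs = s.split(/[,\s]+/);  // a run of separators counts once

console.log(single); // [ 'unicorns', 'and', 'rainbows', 'And', '', 'Cupcakes' ]
console.log(runs);   // [ 'unicorns', 'and', 'rainbows', 'And', 'Cupcakes' ]

// A capturing group in the separator keeps the separators in the result.
const kept = "a, b".split(/(,\s*)/);
console.log(kept);   // [ 'a', ', ', 'b' ]
```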

1.4.5. str.replace()


replace() is also a string method. Its basic usage is str.replace(reg, replace|function): the first parameter is the regular expression to match, and the second is either the replacement string or a callback function.


Note that replace() does not modify the original string; it returns a new one. Also, just like match(), a regular expression without the g flag only matches (and replaces) the first occurrence.

 1.4.5.1 Simplest substitution


Double every vowel (aeiou) in a string, e.g. x -> xx.

var s = "Hello,My name is Vincent."
var result = s.replace(/([aeiou])/g,"$1$1")
//"Heelloo,My naamee iis Viinceent."


Note that the replacement "$1$1" is passed as a string, and be careful not to forget the g flag.


1.4.5.2. Here come the awesome function parameters!


This is the most powerful part: passing a function as the second parameter. Let's look at the simplest example first.

var s = "Hello,My name is Vincent. What is your name?"
var newStr = s.replace(/\b\w{4}\b/g,replacer)
console.log(newStr)
function replacer(match) {
    console.log(match);
    return match.toUpperCase();
}
/*
name
What
your
name
Hello,My NAME is Vincent. WHAT is YOUR NAME?
*/


So the function receives the matched text as its parameter, and its return value is used as the replacement. That covers the basic usage. What about the groups discussed earlier? How are they accessed?


function replacer(match,group1,group2) {
    console.log(group1);
    console.log(group2);
    return match; // keep the matched text unchanged
}


If the regular expression contains groups, the second and third arguments of the callback are group1 and group2. With this you can do a lot of powerful things!
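As one illustration of a callback that uses its group arguments, the sketch below swaps names as in the earlier exercise and also upper-cases the last name, which a plain "$2 $1" string cannot do. The parameter names last and first are illustrative.

```javascript
const names = "shiffina, Daniel\nshquer, Danny";

// The callback receives the full match first, then one argument per group.
const formatted = names.replace(/(\w+),\s(\w+)/g, function (match, last, first) {
  return first + " " + last.toUpperCase();
});

console.log(formatted);
// Daniel SHIFFINA
// Danny SHQUER
```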

 1.4.5.3 Comprehensive exercise questions

  1.  Determine the character with the most occurrences in a string and count the number of occurrences
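var s = 'aaabbbcccaaabbbaaa';
// Sort the characters so that identical ones become adjacent
var a = s.split('').sort().join("");  //"aaaaaaaaabbbbbbccc"
// \1* (rather than \1+) also captures characters that occur only once
var ans = a.match(/(\w)\1*/g);
// Sort the runs by length, shortest first
ans.sort(function(a,b) {
    return a.length - b.length;
})
var top = ans[ans.length-1];
console.log('ans is : ' + top[0] + ', count: ' + top.length) // ans is : a, count: 9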

  1.4.6 Summary


  1. In JS, the regular expression literal /reg/ and the string literal "str" create regular expressions and strings. A regular expression has two methods: reg.test() and reg.exec().

  2. reg.test(str) returns a boolean indicating whether there is a match; reg.exec(str) works somewhat like an iterator (with the g flag), returning one match and its groups per call until it returns null.

  3. The three main string methods are str.match(reg), str.split(reg) and str.replace(reg, str|function).

  4. match: without the g flag it returns the first match together with its groups; with the g flag it returns all matches and no groups.

  5. split is mainly used to split strings; remember to wrap the separator pattern in parentheses if you want to keep the separators.

  6. replace is the most powerful method; with a callback function, the return value is the replacement and the parameters are match, group1, group2, and so on.

By hbb
