Regular expressions are an extremely useful technique for extracting matching patterns from text. They are used in many places, from simple text editors and user input validation to complex data search engines. This is a really important (hot) topic for developers nowadays. At the beginning of my career I also struggled to understand what the hell all those %#$@! characters were doing, so now I’ll share my knowledge with you. The first thing is to understand what each character is responsible for, and then just play around with it.
Anchors and Boundaries
^
Matches the beginning of the string, or the beginning of each line when multiline mode is on. (Inside brackets, as in [^…], it means “not”.)
$
Matches the end of the string, or the end of each line when multiline mode is on.
[…]
One of the characters in the brackets
-
Range indicator inside brackets ([0-9] matches any digit from 0 to 9)
[^x]
One character that is not x
[…]{n}
One of the characters in the brackets, repeated exactly n times
\b
Boundary of word, i.e., start-of-word or end-of-word
\B
Inverse of \b, non-start-of-word or non-end-of-word.
\<, \>
Start-of-word and end-of-word respectively, similar to \b (supported only by some engines, such as GNU grep and Vim)
\D
One character that is not a digit, as defined by your engine
\W
One character that is not a word character, as defined by your engine
\S
One character that is not a whitespace character
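To see these symbols in action, here is a small sketch using Python's built-in re module. The sample strings and variable names are my own, chosen only for illustration, and other engines may behave slightly differently (for example, Python has no \< or \>).

```python
import re

text = "cat concat\ncatalog"

# ^ anchors at the start of the string; with re.MULTILINE it also
# anchors at the start of every line.
starts_default = re.findall(r"^cat", text)
starts_multiline = re.findall(r"^cat", text, flags=re.MULTILINE)

# $ with re.MULTILINE anchors at the end of every line.
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \b needs a word boundary, so only the standalone word "cat" matches.
whole_words = re.findall(r"\bcat\b", text)

# \B needs the absence of a boundary: "cat" right after another word character.
inner = re.findall(r"\Bcat", text)

# [^0-9] is one character that is not a digit; \D is the shorthand for it.
non_digits_class = re.findall(r"[^0-9]", "a1b2")
non_digits_short = re.findall(r"\D", "a1b2")

# [0-9]{2} is one of the characters in the brackets, exactly two times.
pairs = re.findall(r"[0-9]{2}", "7 42 365")

print(starts_default, starts_multiline, ends_multiline)
print(whole_words, inner, non_digits_class, non_digits_short, pairs)
```

Turning multiline mode on or off is the only difference between the first two searches, which is why it is worth checking that flag whenever ^ or $ does not behave the way you expect.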
Quantifiers
+
One or more times
*
Zero or more times
?
Once or none
{3}
Exactly three times
{2,4}
Two to four times
{3,}
Three or more times
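A quick way to get a feel for the quantifiers is to test each one against strings of different lengths. This is a minimal Python sketch; the helper name accepted and the word list are made up for the demo.

```python
import re

# Strings of zero to five "a" characters.
words = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

def accepted(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(accepted(r"a+"))      # one or more
print(accepted(r"a*"))      # zero or more
print(accepted(r"a?"))      # once or none
print(accepted(r"a{3}"))    # exactly three
print(accepted(r"a{2,4}"))  # two to four
print(accepted(r"a{3,}"))   # three or more
```

I used fullmatch here so that the quantifier has to cover the whole string; with search or findall a pattern like a{3} would also find three a's inside a longer run.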
Logic
|
Alternation / OR operator
( … )
Capturing group
\
Escape character: put it before a special character such as $ . ^ + when you want to match that character literally
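Here is a short Python sketch of the three logic symbols. The sample strings are invented for the demo; re.escape is a standard helper in Python's re module that adds the backslashes for you.

```python
import re

# | matches either the pattern on its left or the one on its right.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# ( ... ) captures the text it matched so you can read it back later.
match = re.search(r"(\d+)-(\d+)", "pages 10-25")
first_page, last_page = match.group(1), match.group(2)

# An unescaped . matches any character; \. matches only a literal dot.
loose = re.findall(r"3.5", "3.5 345")
strict = re.findall(r"3\.5", "3.5 345")

# re.escape() escapes every special character in a plain string.
price_found = re.search(re.escape("$5.00"), "cost: $5.00") is not None

print(pets, first_page, last_page, loose, strict, price_found)
```

The loose and strict searches show why escaping matters: without the backslash, 345 is accepted as if it were 3.5.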
Play Time
Sample 1: How to validate an email address
Let's validate the email address user@gmail.com. We know that every email address contains one @ sign and at least one dot (.).
Step 1: Validate the text before the @ sign. We can allow any letters, numbers, and underscores, but not spaces or special characters like $%^&.
Go back to the table above and find a suitable symbol. We can use a range or a special character class, depending on your case. Here the first part will be \w+
Step 2: Validate the @ sign. It has no special meaning, so we can use a literal @
Step 3: Validate the domain. We assume the format is always name.com, name.net, name.io, and so on.
The validation will be \w+\.\w+
Joined together, this gives \w+@\w+\.\w+. We could also write something like [0-9a-z]+@[0-9a-z]+\.[0-9a-z]+ (note the escaped dot; an unescaped . matches any character). A much looser option is [a-z@.]+, which only checks that the text is made of allowed characters, not that it has the right shape.
As you can see, there are many ways to validate an email address using regex; the expression changes based on what you want to validate.
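The three steps can be tried out with a few lines of Python. This is only a sketch of the simplified pattern from this sample: the function name is_simple_email is hypothetical, and real-world addresses allow more than this pattern does (dots or hyphens in the name, several dots in the domain). Also note that in Python \w matches Unicode letters, not just a-z.

```python
import re

# Step 1: \w+   Step 2: @   Step 3: \w+\.\w+
SIMPLE_EMAIL = re.compile(r"\w+@\w+\.\w+")

def is_simple_email(text):
    """True if the whole text has the shape name@domain.tld."""
    return SIMPLE_EMAIL.fullmatch(text) is not None

for candidate in ["user@gmail.com", "user name@gmail.com", "user@gmail", "user@gmailxcom"]:
    print(candidate, is_simple_email(candidate))

# A pattern with an unescaped dot accepts any character in that position.
unescaped_dot = re.compile(r"[0-9a-z]+@[0-9a-z]+.[0-9a-z]+")
accepts_missing_dot = unescaped_dot.fullmatch("user@gmailxcom") is not None
print(accepts_missing_dot)
```

I used fullmatch so that extra text before or after the address makes the check fail. With search, the pattern would happily find name@gmail.com inside "user name@gmail.com".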
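Sample 2: Validate a mobile number. Let's validate a simple format first: 78585258. The easiest validation is \d+, but there are other ways, such as [0-9]+
Now let's validate +1-202-555-0171. It starts with +, which is normally a quantifier, so we have to escape it as \+. A hyphen followed by a 3-digit number repeats twice, so we can use (-\d{3}), which represents -202 or -555, and repeat it with (-\d{3}){n}
\+\d(-\d{3}){3}\d or else \+\d(-\d{3}){2}(-\d{4}){1} (in the first pattern the third group matches -017 and the final \d matches the last digit)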
Again, there are many ways to do that.
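Both phone patterns can be checked in Python like this. It is a sketch only: the names PHONE_A, PHONE_B, is_phone, and compiles are made up for the demo, and the leading + is written as \+ because Python's re module refuses a pattern that starts with a bare quantifier.

```python
import re

# Variant A: three groups of "-ddd" followed by one more digit.
PHONE_A = re.compile(r"\+\d(-\d{3}){3}\d")
# Variant B: two groups of "-ddd" followed by one group of "-dddd".
PHONE_B = re.compile(r"\+\d(-\d{3}){2}(-\d{4}){1}")

def is_phone(text, pattern=PHONE_B):
    """True if the whole text matches the given phone pattern."""
    return pattern.fullmatch(text) is not None

def compiles(pattern):
    """True if Python accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(is_phone("+1-202-555-0171", PHONE_A), is_phone("+1-202-555-0171", PHONE_B))
print(is_phone("1-202-555-0171"))   # missing the leading +
print(is_phone("+1-202-555-017"))   # one digit short
print(re.fullmatch(r"\d+", "78585258") is not None)
print(compiles(r"+\d"), compiles(r"\+\d"))
```

Variant B reads closer to how the number is actually written, so I find it easier to maintain, but both accept exactly the same format here.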
The golden rule here is to remember the symbols in the table and their limitations, and just play around with them. Remember that each programming language has its own slightly different syntax, but most of the symbols mentioned above are common across programming languages.