In our previous post, we covered regular expression character sets, groups, quantifiers, and shorthands, and gave a full solution for matching a 10-digit US phone number, with or without parentheses, hyphens, or dots, and with an optional 3-digit area code. In this post, we will dive deeper into some examples of pattern matching.
Regular expressions at their core are all about pattern finding and matching – strings, digits, letters or characters.
Question: Show different ways of matching digits and non-digits using regular expressions.
There are many ways to match a single digit:
- \d
- [0-9]
- [0123456789]
- [012] – matches only 0, 1, or 2, which shows that a set can list just the digits you want
Similarly, there are many ways to match a non-digit:
- \D
- [^0-9]
- [^\d]
A couple of notes:
- ^ negates a character set when it is the first character inside the square brackets.
- \D matches any character that is not a digit: letters, whitespace, punctuation, quotation marks, hyphens, forward slashes, square brackets, and other similar characters.
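As a quick check of the equivalences above, here is a minimal sketch assuming Python's `re` module (the post itself does not name a regex flavor). The sample string is made up for illustration, and `re.ASCII` is passed so that `\d` is limited to 0-9 and lines up with `[0-9]`.

```python
import re

# Three equivalent ways to match a single ASCII digit.
digit_patterns = [r"\d", r"[0-9]", r"[0123456789]"]
# Three equivalent ways to match a single non-digit.
non_digit_patterns = [r"\D", r"[^0-9]", r"[^\d]"]

sample = "Room 42, floor 7"  # hypothetical sample text

# re.ASCII restricts \d to 0-9 so it agrees with the explicit sets.
digit_results = [re.findall(p, sample, re.ASCII) for p in digit_patterns]
non_digit_results = [re.findall(p, sample, re.ASCII) for p in non_digit_patterns]

print(digit_results[0])                 # ['4', '2', '7']
print("".join(non_digit_results[0]))    # 'Room , floor '
```

Each list holds identical results for all three patterns, which is the point: the shorthand and the explicit sets are interchangeable for ASCII input.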
Question: Explain the different ways to match words and non-words.
- \w is the main character shorthand for matching letters, digits, and the underscore. Consider it to match alphanumeric characters plus _.
- [a-zA-Z0-9_] is the same as \w for ASCII text.
- Note that \D matches whitespace also; \w does not.
- \W (capital W) matches non-word characters: whitespace, punctuation, and other kinds of characters that aren’t used in words.
- [^a-zA-Z0-9_] is the same as \W (capital) for ASCII text.
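The word and non-word shorthands can be compared with their explicit sets in a short sketch. This assumes Python's `re` module, where `\w` also includes the underscore, and uses `re.ASCII` so that `\w` does not match non-English letters. The sample string is hypothetical.

```python
import re

sample = "user_name-42 ok!"  # hypothetical sample text

# \w and its explicit ASCII equivalent (note the underscore in the set).
word_chars = re.findall(r"\w", sample, re.ASCII)
explicit_word_chars = re.findall(r"[a-zA-Z0-9_]", sample)

# \W and its explicit ASCII equivalent.
non_word_chars = re.findall(r"\W", sample, re.ASCII)
explicit_non_word_chars = re.findall(r"[^a-zA-Z0-9_]", sample)

print("".join(word_chars))   # 'user_name42ok'
print(non_word_chars)        # ['-', ' ', '!']
```

Leaving the underscore out of the explicit set would make the two results differ on this input, because `\w` matches the `_` in `user_name`.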
Question: Write a regular expression to match anything that is a non-space.
- Whitespace (a space, tab, or line break) can be matched by \s. So to match a non-space character, we can use \S.
- Another equivalent way to match non-space would be [^\s]
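- Yet another way would be [^ \t\n\r]. This is close but not identical: in most flavors \s also covers form feed (\f) and vertical tab (\v), so add them to the set for an exact equivalent.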
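A minimal sketch of the three non-space patterns, assuming Python's `re` module. The sample string is hypothetical and contains only a space, a tab, and a newline as whitespace, so all three patterns agree on it.

```python
import re

sample = "a b\tc\nd"  # hypothetical text: space, tab, and newline between letters

patterns = [r"\S", r"[^\s]", r"[^ \t\n\r]"]

# Collect every non-space character found by each pattern.
results = ["".join(re.findall(p, sample)) for p in patterns]

print(results)  # ['abcd', 'abcd', 'abcd']
```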
Question: Write a regular expression to match words that are exactly 7 characters in length and start with P and end with D.
In this question, the interviewer is testing your understanding of word boundaries and repetition (quantifiers). The following expression solves this question: \bP.{5}D\b.
To dissect the expression:
- The shorthand \b matches a word boundary, without consuming any characters.
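- The literal characters P and D fix the first and last characters of the word.
- .{5} matches any five characters. The dot matches anything except a line break, including spaces and punctuation; to allow only word characters in the middle, use \w{5} instead.
- The final \b matches another word boundary.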
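The expression can be tried out with a short sketch, assuming Python's `re` module. The sample sentences are made up for illustration, and the `\w{5}` variant is an optional tightening rather than part of the original solution.

```python
import re

pattern = re.compile(r"\bP.{5}D\b")

text = "PLAYPAD and PAYLOAD are seven letters; PAD is not."  # hypothetical text

# search() stops at the first match; findall() returns every match.
first_match = pattern.search(text).group()
all_matches = pattern.findall(text)

print(first_match)   # 'PLAYPAD'
print(all_matches)   # ['PLAYPAD', 'PAYLOAD']

# Because the dot also matches spaces, .{5} accepts this non-word as well.
loose = pattern.findall("P a b D")

# A stricter variant that allows only word characters in the middle.
strict = re.findall(r"\bP\w{5}D\b", "P a b D")

print(loose)   # ['P a b D']
print(strict)  # []
```

Whether one match or all matches come back depends on the function used to apply the pattern, not on the pattern itself.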
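Note that when the text contains several 7-character words starting with P and ending in D, a single search returns only the first one. To return every such word, apply the pattern globally (for example, with a find-all function or a global flag).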