## Friday, July 27, 2012

### Regular Expressions interview questions–part 2

In our previous post, we talked about regular expression character sets, groups, quantifiers, and shorthands, and gave a full solution for matching a 10-digit US phone number, with or without parentheses, hyphens, or dots, and with an optional 3-digit area code. In this post, we will dive deeper into some examples of pattern matching.

Regular expressions at their core are all about pattern finding and matching – strings, digits, letters or characters.

Question: Show different ways of matching digits and non-digits using regular expressions.

There are many ways to match a single digit:

• \d
• [0-9]
• [0123456789]
• [012] – matches only 0, 1 or 2 (a subset of the digits)

Similarly, there are many ways to match a non-digit:

• \D
• [^0-9]
• [^\d]

A couple of notes:

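1. ^ signals negation only when it is the first character inside a character class, as in [^0-9]; outside a character class it anchors the match to the start of the string or line.
2. \D matches any character that is not a digit: letters, whitespace, punctuation, quotation marks, hyphens, forward slashes, square brackets, and other similar characters.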
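The digit and non-digit patterns listed above can be compared side by side. This is a minimal sketch in Python; the sample string is made up for illustration, and the `re.ASCII` flag is an assumption added so that `\d` is limited to 0-9 (without it, Python's `\d` also matches other Unicode digits).

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "Room 42, floor 7-B"

# Three ways to match a single digit.
digit_patterns = [r"\d", r"[0-9]", r"[0123456789]"]
digit_results = [re.findall(p, sample, flags=re.ASCII) for p in digit_patterns]

# Three ways to match a single non-digit.
non_digit_patterns = [r"\D", r"[^0-9]", r"[^\d]"]
non_digit_results = [re.findall(p, sample, flags=re.ASCII) for p in non_digit_patterns]

print(digit_results[0])               # ['4', '2', '7']
print("".join(non_digit_results[0]))  # Room , floor -B
```

All three digit patterns return the same list, and so do all three non-digit patterns. Note that the non-digit matches include letters, spaces, the comma, and the hyphen.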

Question: Explain the different ways to match words and non-words.

• \w is the main character shorthand for matching letters, digits, and the underscore. Consider it to match alphanumeric characters plus _.
• [a-zA-Z0-9_] is the same as \w for ASCII text (some engines also let \w match Unicode letters and digits).
• Note that \D matches whitespace also; \w does not.
• \W (capital W) matches non-word characters: whitespace, punctuation, and other kinds of characters that aren't used in words.
• [^a-zA-Z0-9_] is the same as \W (capital) for ASCII text.
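A quick check of what \w and \W actually cover, sketched in Python. The sample string is hypothetical, and `re.ASCII` is an assumption used to restrict \w to ASCII letters, digits, and the underscore. The point worth noticing is the underscore: \w matches it, while a class without `_` does not.

```python
import re

# Hypothetical sample text containing an underscore, digits, and punctuation.
sample = "user_name42 = ok!"

word_chars = re.findall(r"\w", sample, flags=re.ASCII)
class_chars = re.findall(r"[a-zA-Z0-9_]", sample)   # explicit class, with underscore
no_underscore = re.findall(r"[a-zA-Z0-9]", sample)  # explicit class, without underscore
non_word = re.findall(r"\W", sample, flags=re.ASCII)

print("".join(word_chars))     # user_name42ok
print("".join(no_underscore))  # username42ok
print(non_word)                # [' ', '=', ' ', '!']
```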

Question: Write a regular expression to match anything that is a non-space.

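• Whitespace (space, tab, newline, and so on) can be matched by \s. So to match a non-whitespace character, we can use \S.
• Another equivalent way to match non-space would be [^\s]
• Yet another, nearly equivalent, way would be [^ \t\n\r], which lists the most common whitespace characters explicitly; \s typically also covers form feed and vertical tab.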
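The three non-space patterns can be tried against a string with mixed whitespace. This is a small Python sketch with a made-up sample. It also shows one place where the explicit class differs from \S in Python: the form feed character counts as whitespace for \s but is not listed in [^ \t\n\r].

```python
import re

# Hypothetical sample mixing a space, a tab, and a newline.
sample = "a b\tc\nd"

patterns = [r"\S", r"[^\s]", r"[^ \t\n\r]"]
results = [re.findall(p, sample) for p in patterns]
print(results[0])  # ['a', 'b', 'c', 'd']

# Form feed is whitespace to \s, but the explicit class does not exclude it.
form_feed_as_S = re.findall(r"\S", "\f")
form_feed_as_class = re.findall(r"[^ \t\n\r]", "\f")
print(form_feed_as_S, form_feed_as_class)  # [] ['\x0c']
```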

Question: Write a regular expression to match words that are exactly 7 characters in length and start with P and end with D.

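In this question, the interviewer is testing your understanding of word boundaries and quantifiers. The following expression solves this question: \bP.{5}D\b. Note that . matches any character, including a space, so use \bP\w{5}D\b if the five middle characters must be word characters.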

To dissect the expression:

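• The shorthand \b matches a word boundary, without consuming any characters.
• The literal characters P and D fix the first and last characters of the match.
• .{5} matches any five characters (by default, anything except a newline), which together with P and D gives seven characters.
• The closing \b matches another word boundary, so the match cannot continue into a longer word.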
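The expression dissected above can be tried in Python as follows. This is a sketch with made-up sample text; the stricter variant \bP\w{5}D\b is not from the original answer and is included only to show how . and \w differ when the input contains spaces.

```python
import re

# Hypothetical word list: two 7-letter P...D words, two 6-letter ones,
# and one 7-letter word that does not start with P.
text = "PLANNED PLAYED PRINTED PARKED SPOTTED"

loose = re.findall(r"\bP.{5}D\b", text)
strict = re.findall(r"\bP\w{5}D\b", text)
print(loose)   # ['PLANNED', 'PRINTED']
print(strict)  # ['PLANNED', 'PRINTED']

# Because . matches spaces, the loose pattern also accepts this 7-character span.
tricky = "P a b D"
loose_tricky = re.findall(r"\bP.{5}D\b", tricky)
strict_tricky = re.findall(r"\bP\w{5}D\b", tricky)
print(loose_tricky)   # ['P a b D']
print(strict_tricky)  # []
```

Both patterns are case-sensitive as written, so they only match an uppercase P and D.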