Friday, July 27, 2012

Regular Expressions interview questions–part 2

In our previous post, we talked about regular expression character sets, groups, quantifiers, and shorthand, and gave a full solution for matching a 10-digit US phone number, with or without parentheses, hyphens, or dots, and with an optional 3-digit area code. In this post, we will dive deeper into some examples of pattern matching.

Regular expressions at their core are all about pattern finding and matching – strings, digits, letters or characters.

Question: Show different ways of matching digits and non-digits using regular expressions.

There are many ways to match a single digit:

  • \d
  • [0-9]
  • [0123456789]
  • [012] – matches only 0, 1 or 2, so it covers a subset of the digits
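As a quick check of the list above, here is a minimal sketch using Python's re module (Python is an assumption; the post does not name a language). In Python 3, \d on text strings also matches non-ASCII digits, so the sketch passes re.ASCII to keep \d equivalent to [0-9]. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Room 42, floor 7"

# Each pattern matches one digit at a time.
# re.ASCII limits \d to the characters 0-9.
digits_shorthand = re.findall(r"\d", text, re.ASCII)
digits_range = re.findall(r"[0-9]", text)
digits_listed = re.findall(r"[0123456789]", text)

# [012] only accepts the digits 0, 1 and 2.
digits_subset = re.findall(r"[012]", text)

print(digits_shorthand)  # ['4', '2', '7']
print(digits_subset)     # ['2']
```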

Similarly, there are many ways to match a non-digit:

  • \D
  • [^0-9]
  • [^\d]

A couple of notes:

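  1. Inside square brackets, a leading ^ negates the character class. Outside square brackets, ^ is an anchor that matches the start of the string or line.
  2. \D matches any character that is not a digit: letters, whitespace, punctuation, quotation marks, hyphens, forward slashes, square brackets, and other similar characters.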
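The non-digit patterns can be compared the same way. This is a small Python sketch (Python and the sample strings are assumptions, not part of the original post). It also shows that ^ outside square brackets acts as a start-of-string anchor rather than a negation.

```python
import re

# Hypothetical sample text containing letters, punctuation and digits.
text = "a-1 [b]/2"

# All three patterns match any single character that is not 0-9.
non_digit_shorthand = re.findall(r"\D", text, re.ASCII)
non_digit_range = re.findall(r"[^0-9]", text)
non_digit_class = re.findall(r"[^\d]", text, re.ASCII)

print(non_digit_shorthand)  # ['a', '-', ' ', '[', 'b', ']', '/']

# Outside a character class, ^ anchors the match to the start of the string.
leading_digit = re.findall(r"^\d", "1a2")
print(leading_digit)  # ['1']
```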

Question: Explain the different ways to match words and non-words.

  • \w is the main character shorthand for matching letters and numbers. Consider it to match alphanumeric characters plus the underscore.
  • [a-zA-Z0-9_] is the same as \w for ASCII text; some regex engines also let \w match Unicode letters and digits.
  • Note that \D matches whitespace also; \w does not.
  • \W (capital W) matches non-word characters: whitespace, punctuation, and other kinds of characters that
    aren’t used in words.
  • [^a-zA-Z0-9_] is the same as \W (capital) for ASCII text.
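A short Python sketch of the word and non-word shorthands follows (Python is an assumption, and the sample text is made up). One detail worth seeing in action: in Python's re module, \w also matches the underscore, so the closest explicit class is [a-zA-Z0-9_]. The re.ASCII flag keeps \w and \W from matching on Unicode rules.

```python
import re

# Hypothetical sample text with an underscore, punctuation and digits.
text = "snake_case, 42!"

word_shorthand = "".join(re.findall(r"\w", text, re.ASCII))
word_with_underscore = "".join(re.findall(r"[a-zA-Z0-9_]", text))
word_alnum_only = "".join(re.findall(r"[a-zA-Z0-9]", text))

print(word_shorthand)        # snake_case42
print(word_with_underscore)  # snake_case42
print(word_alnum_only)       # snakecase42 (the underscore is skipped)

# \W and its explicit negated class pick up everything else.
non_word_shorthand = re.findall(r"\W", text, re.ASCII)
non_word_class = re.findall(r"[^a-zA-Z0-9_]", text)
print(non_word_shorthand)    # [',', ' ', '!']
```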

Question: Write a regular expression to match anything that is a non-space.

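  • Whitespace (space, tab, newline and so on) can be matched by \s. So to match a non-space character, we can use \S.
  • Another equivalent way to match non-space would be [^\s]
  • A close approximation is [^ \t\n\r], though \s usually also covers form feed (\f) and, in many engines, vertical tab (\v).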
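The three non-space patterns can be tried side by side. This is a minimal Python sketch (Python and the sample strings are assumptions). The second part shows a case where the explicit class [^ \t\n\r] differs from \S in Python: a form feed counts as whitespace for \s but is not listed in the explicit class.

```python
import re

# Hypothetical sample text with a space, a tab and a newline.
text = "a b\tc\nd"

non_space_shorthand = re.findall(r"\S", text)
non_space_negated = re.findall(r"[^\s]", text)
non_space_explicit = re.findall(r"[^ \t\n\r]", text)

print(non_space_shorthand)  # ['a', 'b', 'c', 'd']

# A form feed (\f) is whitespace for \s, but the explicit class omits it.
with_form_feed = "x\fy"
shorthand_ff = re.findall(r"\S", with_form_feed)
explicit_ff = re.findall(r"[^ \t\n\r]", with_form_feed)
print(shorthand_ff)  # ['x', 'y']
print(explicit_ff)   # ['x', '\x0c', 'y']
```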

Question: Write a regular expression to match words that are exactly 7 characters in length and start with P and end with D.

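In this question, the interviewer is testing your ability to understand word boundaries and repetition notation. The following expression solves this question: \bP\w{5}D\b.

To dissect the expression:

  • The shorthand \b matches a word boundary, without consuming any characters.
  • The literal characters P and D fix the first and last characters of the word.
  • \w{5} matches exactly five word characters, which together with P and D gives seven characters in total. A dot (.{5}) would also accept spaces and punctuation, so it could match across several words.
  • The closing \b matches another word boundary, so the word must end at D.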
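To see how the pieces behave, the sketch below tries \bP.{5}D\b and a variant that uses \w{5} in place of .{5}, which restricts the middle to word characters. It uses Python's re module (an assumption; the post does not name a language) and a made-up sample sentence. It also contrasts re.search, which returns only the first match, with re.findall, which returns every match.

```python
import re

# Hypothetical sample sentence, chosen to include a multi-word false positive.
text = "PLANNED a PA B CD or PLEASED, not PAID"

dot_pattern = re.compile(r"\bP.{5}D\b")    # middle can be any five characters
word_pattern = re.compile(r"\bP\w{5}D\b")  # middle must be five word characters

# The dot also accepts spaces, so "PA B CD" slips through.
dot_matches = dot_pattern.findall(text)
print(dot_matches)   # ['PLANNED', 'PA B CD', 'PLEASED']

# \w keeps each match inside a single word.
word_matches = word_pattern.findall(text)
print(word_matches)  # ['PLANNED', 'PLEASED']

# search() stops at the first match; findall() above collects all of them.
first_match = word_pattern.search(text).group()
print(first_match)   # PLANNED
```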

2 comments:

  1. The above solution does not work if there are multiple 7 character words starting with P and ending in D. It only returns the first such word

  2. Use the g (global) modifier, as in m/$pattern/g;
