Saturday, July 28, 2012

Regular expressions - groups and alternations (Part 3)

This is part 3 in the series of programming interview questions and answers on regular expressions.

In this post, we will talk about groups. Groups in regular expressions let you apply operations such as alternation, subpatterns and quantifiers to part of a pattern.

Question: Find all the occurrences of the, The and THE.

Here we will utilize the notion of groups and alternations. Alternation gives you a choice of alternate patterns to match. Let’s try a few solutions to this problem.

  1. The simplest solution is (the|The|THE). This matches any of the 3 different alternatives in the group.
  2. RegEx also has the notion of options. Options let you specify how a pattern is matched. For our example, we are interested in the ignore-case option, (?i). Using this option, we can solve the above problem as (?i)the. Note that (?i)the matches every casing of the word, including mixed-case spellings such as tHe.
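  3. A solution using one character class per letter can look like [tT][hH][eE]. Written with groups and alternation, the same idea is (t|T)(h|H)(e|E). This form accepts any mix of upper and lower case, for example tHe.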
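
The three approaches can be compared side by side. The following is a minimal sketch using Python's re module; the sample sentence and the variable names are made up for illustration, and the third pattern is written with character classes as [tT][hH][eE].

```python
import re

# Illustrative sample text (not from the original question).
text = "The cat saw the dog near THE house; tHe end."

# 1. Alternation inside a group: only the three listed spellings.
alternation = re.findall(r"(the|The|THE)", text)

# 2. Inline ignore-case option: any mix of upper and lower case.
ignore_case = re.findall(r"(?i)the", text)

# 3. One character class per letter: also any mix of case.
char_classes = re.findall(r"[tT][hH][eE]", text)

print(alternation)   # ['The', 'the', 'THE']
print(ignore_case)   # ['The', 'the', 'THE', 'tHe']
print(char_classes)  # ['The', 'the', 'THE', 'tHe']
```

None of these patterns checks word boundaries, so they would also match the letters "the" inside a longer word such as "other". Wrapping the pattern in \b on both sides is one way to restrict it to whole words.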

Question: Find all the even numbers between 0 and 99.

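The solution requires two alternatives: one for the single-digit numbers (0 to 9) and one for the two-digit numbers (10 to 99). In both cases the last digit must be even, and the word boundaries (\b) keep the pattern from matching inside a longer number. One possible solution is \b[24680]\b|\b[1-9][24680]\b.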
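
As a quick check, the pattern can be run over a string of numbers. This is a small sketch using Python's re module; the name EVEN_0_TO_99 and the sample input are chosen here for illustration only.

```python
import re

# The pattern from the text: an even single digit, or a two-digit
# number (no leading zero) whose last digit is even.
EVEN_0_TO_99 = re.compile(r"\b[24680]\b|\b[1-9][24680]\b")

text = "0 7 8 10 13 42 99 100 246"
matches = EVEN_0_TO_99.findall(text)
print(matches)  # ['0', '8', '10', '42']
```

The values 100 and 246 are skipped because the \b anchors require the match to cover the whole number, not just one or two of its digits.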

Question: Identify hexadecimal numbers in a string of numbers.

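This one is simple. A single hexadecimal digit is matched by [a-fA-F0-9]. To match a whole hexadecimal number rather than one digit at a time, repeat the class and anchor it at word boundaries: \b[a-fA-F0-9]+\b.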
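
The class [a-fA-F0-9] on its own matches one hexadecimal digit at a time. The sketch below, in Python, assumes the goal is to pull out whole tokens made only of hex digits, so it adds a + quantifier and \b anchors to the class; the name HEX_NUMBER and the sample input are illustrative.

```python
import re

# One or more hex digits, bounded so that tokens such as "g7" are
# rejected instead of partially matched.
HEX_NUMBER = re.compile(r"\b[a-fA-F0-9]+\b")

text = "ff 1A3 xyz 42 g7 BEEF"
hex_tokens = HEX_NUMBER.findall(text)
print(hex_tokens)  # ['ff', '1A3', '42', 'BEEF']
```

Note that a purely decimal token such as 42 is also a valid hexadecimal number, so it is reported as a match.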

Question: Ignore all vowels in the given text.

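In this question, you need to use the negation operator ^ inside a character class. A caret (^) at the beginning of the class means “No, I don’t want these characters.” (The caret must appear at the beginning; anywhere else in the class it is a literal caret.) Given this knowledge, the answer is quite simple: [^aeiou]. This class still matches uppercase vowels; to exclude them as well, use [^aeiouAEIOU].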
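
A short Python sketch shows the negated class in action. It collects every character that is not a vowel and joins the results back into a string; the sample text and variable names are illustrative.

```python
import re

text = "Regular Expressions"

# [^aeiou] matches any single character that is NOT a lowercase vowel,
# so spaces, consonants and uppercase vowels are all kept.
no_lower_vowels = "".join(re.findall(r"[^aeiou]", text))

# Listing both cases in the class drops uppercase vowels too.
no_vowels = "".join(re.findall(r"[^aeiouAEIOU]", text))

print(no_lower_vowels)  # Rglr Exprssns
print(no_vowels)        # Rglr xprssns
```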