Saturday, July 28, 2012

Regular expressions - groups and alternations (Part 3)

In this post, we will talk about groups. Groups in regular expressions allow you to apply operations such as alternation and quantifiers to a sub pattern as a whole.

Question: Find all the occurrences of the, The and THE.

Here we will utilize the notion of groups and alternations. Alternation gives you a choice of alternate patterns to match. Let’s try a few solutions to this problem.

  1. The simplest solution is (the|The|THE). This matches any of the 3 different alternatives in the group.
  2. RegEx also has the notion of options. Options let you specify the way you would like to search for a pattern. For our example, we are interested in the ignore case option - (?i). Using this option, we can solve the above problem as (?i)the.
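  3. A solution using one character class per letter is [tT][hH][eE]. Written with groups, each letter needs its own alternation: (t|T)(h|H)(e|E). Note that both forms also match mixed-case spellings such as tHe, so they behave like the ignore-case option rather than the three-way alternation.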
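
The three approaches can be compared side by side. The following is a minimal sketch using Python's re module; the sample sentence and variable names are made up for illustration, and other regex engines may treat the (?i) option slightly differently.

```python
import re

# Hypothetical sample text containing the three target spellings plus "tHe".
text = "the cat saw The dog near THE tree, not tHe bird."

# 1. Explicit alternation: matches only the three listed spellings.
alternation = re.findall(r"(the|The|THE)", text)

# 2. Ignore-case option: matches every casing of "the".
ignore_case = re.findall(r"(?i)the", text)

# 3. One character class per letter: also matches every casing.
per_letter = re.findall(r"[tT][hH][eE]", text)

print(alternation)
print(ignore_case)
print(per_letter)
```

Only the alternation skips tHe; the other two patterns accept any mix of upper and lower case.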

Question: Find all the even numbers between 0 and 99.

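The solution needs two alternatives joined by |: one for the single-digit numbers 0 to 9 and one for the two-digit numbers 10 to 99. One possible solution is \b[24680]\b|\b[1-9][24680]\b. The word boundaries (\b) keep the pattern from matching digits inside a longer number such as 100.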
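
To check the pattern, it can be run against a short list of numbers and compared with plain arithmetic. This is a sketch in Python; the name EVEN_0_TO_99 and the sample string are illustrative assumptions.

```python
import re

# The pattern from the post: a lone even digit, or a two-digit number
# whose last digit is even.
EVEN_0_TO_99 = re.compile(r"\b[24680]\b|\b[1-9][24680]\b")

# Hypothetical sample input; 100 is out of range and should be skipped.
text = "0 7 12 45 98 100 3 60"
matches = EVEN_0_TO_99.findall(text)
print(matches)

# Cross-check: the pattern should accept exactly the even numbers below 100.
accepted = [n for n in range(100) if EVEN_0_TO_99.fullmatch(str(n))]
print(accepted == list(range(0, 100, 2)))
```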

Question: Identify hexadecimal numbers in a string of numbers.

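This one is simple. The character class [a-fA-F0-9] matches a single hexadecimal digit. To match a whole hexadecimal number rather than one digit at a time, add the + quantifier and word boundaries: \b[a-fA-F0-9]+\b.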
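
The difference between matching one hexadecimal digit and matching a whole hexadecimal number shows up quickly in practice. Below is a small Python sketch; the + quantifier and \b boundaries are one way to extend the character class, and the sample strings are invented for the example.

```python
import re

# The bare character class matches one hexadecimal digit at a time.
single_digits = re.findall(r"[a-fA-F0-9]", "1A")

# Adding + and word boundaries matches whole tokens made only of hex digits.
# Hypothetical input: "xyz" and "0G" contain non-hex characters.
text = "ff 1A2b 99 xyz 0G"
whole_numbers = re.findall(r"\b[a-fA-F0-9]+\b", text)

print(single_digits)
print(whole_numbers)
```

Note that a token such as 99 is also a valid hexadecimal number, so the pattern cannot tell decimal and hexadecimal apart on its own.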

Question: Ignore all vowels in the given text.

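In this question, you need a negated character class. A caret (^) as the first character inside the brackets means “No, I don’t want these characters.” (The caret must appear at the beginning; anywhere else in the class it is a literal caret.) Given this knowledge, the answer is quite simple: [^aeiou]. This class excludes only lowercase vowels, so use [^aeiouAEIOU] if uppercase vowels should be ignored as well.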
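
As a quick illustration, the negated class can be used to collect every character that is not a vowel. This Python sketch uses a made-up sample string; the second pattern, which adds the uppercase vowels, is an extension shown for comparison.

```python
import re

# Hypothetical sample text with one uppercase vowel.
text = "Regular Expressions"

# [^aeiou] keeps everything except lowercase vowels (spaces included).
lower_only = "".join(re.findall(r"[^aeiou]", text))

# Listing the uppercase vowels too drops the capital E as well.
both_cases = "".join(re.findall(r"[^aeiouAEIOU]", text))

print(lower_only)
print(both_cases)
```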


