
Friday, July 27, 2012

Regular Expressions interview questions–part 2

In our previous post, we talked about regular expression character sets, groups, quantifiers, and shorthand, and gave a full solution for matching a 10-digit US phone number, with or without parentheses, hyphens, or dots, and with an optional 3-digit area code. In this post, we will dive deeper into some examples of pattern matching.

Regular expressions at their core are all about pattern finding and matching – strings, digits, letters or characters.

Question: Show different ways of matching digits and non-digits using regular expressions.

There are many ways to match a single digit:

  • \d
  • [0-9]
  • [0123456789]
  • [012] – matches only 0, 1 or 2, so a class can also be narrowed to a subset of the digits
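
As a rough illustration of the list above, here is a small Python sketch. The sample string and variable names are made up for this example, and the re.ASCII flag is an assumption added so that \d is limited to 0-9 (by default Python's \d also matches digits from other scripts).

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Room 305, floor 2"

# Three spellings of "one digit"; re.ASCII restricts \d to 0-9.
patterns = [r"\d", r"[0-9]", r"[0123456789]"]
results = [re.findall(p, text, flags=re.ASCII) for p in patterns]

# A narrower class matches only the digits it lists.
subset = re.findall(r"[012]", text)

print(results[0])  # ['3', '0', '5', '2']
print(subset)      # ['0', '2']
```

All three patterns return the same list here, while [012] skips the 3 and the 5.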

Similarly, there are many ways to match a non-digit:

  • \D
  • [^0-9]
  • [^\d]

A couple of notes:

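  1. ^ negates a character class only when it is the first character inside the square brackets. Outside brackets, it anchors the match to the start of the string or line.
  2. \D matches any character that is not a digit: letters, whitespace, punctuation, quotation marks, hyphens, forward slashes, square brackets, and other similar characters.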
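
The non-digit patterns and the two roles of ^ can be sketched in Python as follows. The sample strings are invented for this example, and re.ASCII is again an assumption to keep \d and \D limited to the 0-9 definition.

```python
import re

# Hypothetical sample text for this illustration.
text = "a1-b2 c3"

# Three spellings of "one non-digit".
non_digit_patterns = [r"\D", r"[^0-9]", r"[^\d]"]
non_digits = [re.findall(p, text, flags=re.ASCII) for p in non_digit_patterns]

# Outside square brackets, ^ is an anchor, not a negation:
# this finds a digit only at the very start of the string.
anchored = re.findall(r"^\d", "7 days, 24 hours")

print(non_digits[0])  # ['a', '-', 'b', ' ', 'c']
print(anchored)       # ['7']
```

Note that the space and the hyphen both show up in the non-digit results.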

Question: Explain the different ways to match words and non-words.

  •  \w is the main character shorthand for matching word characters: letters, digits, and the underscore.
  • [a-zA-Z0-9_] is the same as \w for ASCII text. Unicode-aware engines also let \w match letters and digits from other scripts.
  • Note that \D also matches whitespace; \w does not.
  • \W (capital W) matches non-word characters: whitespace, punctuation, and other kinds of characters that
    aren’t used in words.
  • [^a-zA-Z0-9_] is the same as \W (capital) for ASCII text.
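
A quick way to check what \w really covers is to run it against a string containing an underscore. The sketch below uses Python with a made-up sample string; re.ASCII is an assumption so that \w and \W use the ASCII definition only.

```python
import re

# Hypothetical sample text for this illustration.
text = "user_name-42 ok!"

word_chars = re.findall(r"\w", text, flags=re.ASCII)
explicit = re.findall(r"[a-zA-Z0-9_]", text)     # includes the underscore
no_underscore = re.findall(r"[a-zA-Z0-9]", text)  # leaves the underscore out
non_word = re.findall(r"\W", text, flags=re.ASCII)

print(word_chars == explicit)  # True
print("_" in no_underscore)    # False
print(non_word)                # ['-', ' ', '!']
```

The underscore is matched by \w but not by [a-zA-Z0-9], so the two are not interchangeable.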

Question: Write a regular expression to match anything that is a non-space.

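  • A whitespace character (space, tab, newline, and so on) can be matched by \s. So to match a non-whitespace character, we can use \S.
  • Another equivalent way to match non-space is [^\s].
  • Yet another way is [^ \t\n\r], which is close but not identical: in most engines \s also covers form feed and vertical tab.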
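
The three non-space patterns can be compared with a short Python sketch. The sample strings are invented for this example. The second one contains a form feed, one of the characters that Python's \s covers but the hand-written class does not.

```python
import re

# Hypothetical sample text containing a tab and a newline.
text = "tab\there new\nline"

a = re.findall(r"\S+", text)
b = re.findall(r"[^\s]+", text)
c = re.findall(r"[^ \t\n\r]+", text)

# A form feed (\f) separates these two letters.
with_form_feed = "a\fb"
shorthand = re.findall(r"\S+", with_form_feed)
handwritten = re.findall(r"[^ \t\n\r]+", with_form_feed)

print(a)            # ['tab', 'here', 'new', 'line']
print(shorthand)    # ['a', 'b']
print(handwritten)  # ['a\x0cb']
```

On ordinary text the three agree; they diverge only on the less common whitespace characters.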

Question: Write a regular expression to match words that are exactly 7 characters in length and start with P and end with D.

In this question, the interviewer is testing your understanding of word boundaries and repetition quantifiers. The following expression solves this question: \bP.{5}D\b.

To dissect the expression:

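  • The shorthand \b matches a word boundary, without consuming any characters.
  • The literal characters P and D fix the first and last characters of the match.
  • .{5} matches any five characters except a newline. That includes spaces and punctuation, so \w{5} is the stricter choice if the five characters must all be word characters.
  • A final \b matches the closing word boundary.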
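
Here is a minimal Python sketch of the expression in action. The sample strings are made up, and the \w{5} variant is shown only as a possible stricter alternative, not as part of the original solution. re.findall returns every non-overlapping match, not just the first.

```python
import re

# Hypothetical sample text: three 7-letter P...D words and one shorter word.
text = "PLANNED PYRAMID PAID PLACARD"

pattern = re.compile(r"\bP.{5}D\b")
matches = pattern.findall(text)

# Because . also matches a space, the pattern can span two words.
loose = re.findall(r"\bP.{5}D\b", "PA BIRD")
strict = re.findall(r"\bP\w{5}D\b", "PA BIRD")

print(matches)  # ['PLANNED', 'PYRAMID', 'PLACARD']
print(loose)    # ['PA BIRD']
print(strict)   # []
```

The last two lines show why \w{5} may be preferable when the input contains short words next to each other.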

6 comments:

  1. The above solution does not work if there are multiple 7-character words starting with P and ending in D. It only returns the first such word.

  2. Use the g modifier, as in m/$pattern/g;

  3. This will work: [P]\B.{5}[D]\b
