Thursday, July 26, 2012

Regular expressions interview questions–Part 1

Regular expressions (RegEx for short) are special strings that define patterns for matching specific sets of strings. RegEx questions are an interview favorite for many developers because they let you quickly test an interviewee’s ability to break a problem into smaller parts without requiring a lot of code.

Several excellent online tools let you test your regular expression syntax and matches. http://regexpal.com/ is an interesting one to play around with. On the desktop, TextMate on Mac and Notepad++ are good alternatives.

In this post, we will review some of the basic regular expressions. In future posts, we will look into constructing more complex patterns.

Question: Develop a regular expression to match a US phone number.

Let us take an example phone number: 425-882-8080 (this also happens to be Microsoft’s main line number, so don’t call it unless you absolutely have to).

  1. The simplest RegEx pattern for this can be the number itself. Yes, that works too. But I don’t think the interviewer would be very happy if you give her this answer.
  2. Using character classes or sets, we can match a group of characters with or without specifying all of them. For example, [0-9] tells the processor to match any digit in the range of 0 to 9. The square brackets are not literally matched because they are treated specially as meta-characters. A meta-character has special meaning in regular expressions and is reserved. A regular expression in the form [0-9] is called a character class, or sometimes a character set. In addition, you can be more specific and specify the digits you want matched. For example, [02468] only matches if the input contains one of 0, 2, 4, 6 or 8. As a next step solution for our problem using character classes, the following RegEx will work (but don’t tell the interviewer that this is your final solution yet): [0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
  3. Let’s bump up our skills now by using character shorthand. A \d matches any digit. A \D matches any non-digit. Our answer can now be shortened to \d\d\d-\d\d\d-\d\d\d\d (which matches the hyphen (-) exactly) or, better yet, to \d\d\d\D\d\d\d\D\d\d\d\d, which accepts any non-digit as the separator. Note that instead of using a \D, we could have used a dot (.), which matches any character (by default, any character except a newline).
  4. Note that wrapping a part of a regular expression in parentheses () creates a group. We will learn more about groups in a future post.
  5. To shorten our RegEx more, we can enlist the help of quantifiers. As the name suggests, quantifiers allow you to specify how many times the preceding expression should match. There are a number of ways to specify a quantifier: \d{3} matches a digit exactly 3 times; the question mark (?) means zero or one; the plus sign (+) means one or more; and the asterisk (*) means zero or more. Given our new knowledge about quantifiers, our answer can be updated to (\d{3}[-]?){2}\d{4}, which matches two sequences of three digits (without literal parentheses), each followed by an optional hyphen, and then exactly four digits.
  6. We are almost there. It is now time to make our RegEx more robust, professional, smart and production ready. Let’s add the following features:
    • The area code can be optional
    • Literal parentheses can optionally wrap the first sequence of three digits
    • The separator character can either be a dot (.) or a hyphen (-)
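Before adding those features, here is a quick way to compare the patterns from steps 2, 3 and 5 against a few inputs. This is a minimal Python sketch, not part of the original answer: the dictionary labels and the matching_patterns helper are names made up for illustration, and re.fullmatch is used so that the whole input has to match, since these intermediate patterns carry no anchors of their own.

```python
import re

# Patterns from steps 2, 3 and 5 above, copied verbatim.
PATTERNS = {
    "character classes": r"[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]",
    "shorthand, literal hyphen": r"\d\d\d-\d\d\d-\d\d\d\d",
    "shorthand, any non-digit": r"\d\d\d\D\d\d\d\D\d\d\d\d",
    "quantifiers": r"(\d{3}[-]?){2}\d{4}",
}


def matching_patterns(text):
    """Return the labels of the patterns that match the whole of `text`."""
    return [
        label
        for label, pattern in PATTERNS.items()
        if re.fullmatch(pattern, text)
    ]


print(matching_patterns("425-882-8080"))  # all four patterns
print(matching_patterns("425.882.8080"))  # only the \D version
print(matching_patterns("4258828080"))    # only the quantifier version
```

Each rewrite trades strictness for flexibility in a different way: \D requires some separator but accepts any non-digit, while the quantifier version makes the hyphen optional but accepts nothing else in its place.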

Our final answer, which should be good enough for an interview, matches a 10-digit US phone number with or without parentheses, hyphens, or dots, and with an optional 3-digit area code: ^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$
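To check that pattern against the features listed above, a small Python sketch such as the following can help. The is_phone helper and the sample inputs are assumptions chosen for illustration; only the pattern itself comes from this post.

```python
import re

# The final pattern from this post, unchanged.
PHONE = re.compile(r"^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$")


def is_phone(text):
    """Return True if `text` matches the pattern."""
    return PHONE.search(text) is not None


samples = [
    "425-882-8080",    # hyphens: accepted
    "425.882.8080",    # dots: accepted
    "4258828080",      # no separators: accepted
    "(425)882-8080",   # parenthesized area code: accepted
    "882-8080",        # no area code: accepted
    "(425)-882-8080",  # separator after the closing parenthesis: rejected
    "425 882 8080",    # spaces are not in [.-]: rejected
    "425-882-808",     # one digit short: rejected
]

for sample in samples:
    print(f"{sample!r:18} {is_phone(sample)}")
```

One thing the samples bring out: the parenthesized branch has no separator after it, so (425)882-8080 is accepted while (425)-882-8080 is not. Whether that is acceptable is worth raising with the interviewer.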

Let’s dissect this RegEx on a character by character basis to make sure we are on the right track:

  • ^ (caret) at the beginning of the regular expression, or following the vertical bar (|), means that the phone number will be at the beginning of a line.
  • ( opens a group.
  • \( is a literal open parenthesis.
  • \d matches a digit.
  • {3} is a quantifier that, following \d, matches exactly three digits.
  • \) matches a literal close parenthesis.
  • | (the vertical bar) indicates alternation, that is, a given choice of alternatives. In other words, this says “match an area code with parentheses or without them.”
  • ^ matches the beginning of a line.
  • \d matches a digit.
  • {3} is a quantifier that matches exactly three digits.
  • [.-]? matches an optional dot or hyphen.
  • ) closes the capturing group.
  • ? makes the group optional, that is, the prefix in the group is not required.
  • \d matches a digit.
  • {3} matches exactly three digits.
  • [.-]? matches another optional dot or hyphen.
  • \d matches a digit.
  • {4} matches exactly four digits.
  • $ matches the end of a line.
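The same breakdown can be kept right next to the pattern in code. The sketch below assumes Python and its re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments; PHONE_COMPACT and PHONE_VERBOSE are illustrative names. Note that in Python, ^ and $ anchor to individual lines only when re.MULTILINE is also set; without it, as here, they anchor to the start and end of the whole string ($ also matches just before a trailing newline).

```python
import re

# The pattern exactly as written in this post.
PHONE_COMPACT = re.compile(r"^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$")

# The same pattern, spread out with one comment per piece.
PHONE_VERBOSE = re.compile(
    r"""
    ^                   # start of the input
    (                   # open the group for the area code
        \( \d{3} \)     #   three digits wrapped in literal parentheses
      |                 #   or
        ^ \d{3} [.-]?   #   three bare digits, then an optional dot or hyphen
    )?                  # close the group and make it optional
    \d{3}               # three digits
    [.-]?               # optional dot or hyphen
    \d{4}               # four digits
    $                   # end of the input
    """,
    re.VERBOSE,
)

# Both spellings should agree on every input.
for sample in ["425-882-8080", "(425)882-8080", "882.8080", "425-882-808"]:
    compact = PHONE_COMPACT.search(sample) is not None
    verbose = PHONE_VERBOSE.search(sample) is not None
    assert compact == verbose
    print(sample, verbose)
```

Writing the pattern this way costs a few lines, but it lets a reviewer check each piece against its comment instead of decoding the whole string at once.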

In future posts, we will look at some more advanced regular expressions with examples.

33 comments:

  1. Hi,
    This is very helpful. Very nicely explained. I have a question for the last step. Seems like you divided the first 3 digits in two groups separated for checking with parenthesis or without parenthesis.
    Going by your earlier logic we can check that using a ? so that will indicate 0 or 1 parenthesis instead of making two separate groups for 425 or (425).
    Thanks.
    Nads.

  2. (\-(\d)*).* : Though not perfect even it would do..?

  3. Thanks its simple and nice

  4. One more wrinkle: US phone numbers cannot have the number 1 in the first or fourth digit. (Don’t believe me? Name an area code that starts with 1.) Handle this case for super-extra bonus points :)

  5. Nice Explanation....

  6. Very nice explanation.

  7. Thank you, very helpful.

  8. Why do you need the second "^"? isn't the ^ before the parenthesis enough? Also doesn't ^ also mean negation? if so then how I distinguish between the two different meanings in a safe way? thanks. Very good article

  9. ^1?[.- ]?\(?[0-9]{3}\)[.- ]?[0-9]{3}[.- ]?[0-9]{4}
    i think this is pretty comprehensive but not fool proof

  11. How about this? "(([\(]?\d{3}[\)]?).)?(\d{3}).(\d{4})"
