4/28/2013

Arithmetic with Regexps

Let me start by telling that I'm addicted to the programming challenge site SPOJ, and I was very happy when they added a new section dedicated to code golf. However, it didn't take too long for me to realize that I can participate in the contest using Python or Haskell, but I could only compete by using Perl.

I was not fluent in Perl, so I bought the Camel Book and started to learn it. There's a secret for Perl being so good at code golf, and this secret is clever use of regexp. The regexps from Posix are just finite automata, but the Perl engine is much more powerful than that.

Naturally, I became tempted to check what are the computational limits of Perl's regexp. For example, can we do arithmetic with regexps? After thinking I while, I concluded that yes, we can do arithmetic with regexps!



Before showing how I did it, let me do a quick review of Perl. The only command I used is s///g (substitution with regexp). The first argument is a regexp that will be matched against your input string. The second is a string, possibly interpolated, that will substitute the match. The option g allows for multiple substitutions, starting at the point where the last one finished. For example, here's a regexp to translate Brazilian laughs for international laughs:

input: "kkkkkk goat kkkk yelling"
regexp: s/k+/LOL/g
output: "LOL goat LOL yelling"

Before doing any calculation, we also need to define a numeric representation. At school we learn how to do math using decimals, which have ten symbols, from 0 to 9. Later, computer guys learns binary, which has just two symbols, 0 and 1. This time I'll do math using unary, which has just one symbol: I (the symbol I was chosen because it resembles roman numbers).

In the unary system, the number is represented by repetitions of the unary symbol. For instance:

1  = I
4  = IIII
10 = IIIIIIIIII

Also, just to avoid big regexps, I'll restrict myself to operations defined over the positive integers (meaning both the arguments and the results must be positive integers).

Addition

s/\+|=//g

That's the easiest one, just remove the plus and equal signs. The result is the concatenation of the operands. Let's check how it would work with 4+7:

"IIII+IIIIIII="
s/\+|=//g

"IIIIIIIIIII"

As expected, the result is 11.

Subtraction

s/(I+)-\1=//g

In order to implement subtraction, we'll use a backreference. If a sequence is found in the right side, then you are free to remove this sequence and an equal one on the left side. All that is left is the characters that were on the left side but not on the right, which is the same as the subtraction. Let's check 7-4:

"IIIIIII-IIII="
s/(I+)-\1=//g

"III"

Multiplication

s/(I)(?=I*\*(I+))(\*I+=)?/$2/g

Multiplication is more complex, we need the operator (?=), used to perform positive lookahead: it will match, but it won't consume the input. As a result, the operator /g can make multiple passes over the same string, and each "I" found on the left side is replaced by the entire right side. Let's check a step-by-step example of 3*6 (I'm doing step-by-step here, but Perl will do it with a single call to the command):

"III*IIIIIII="
s/(I)(?=I*\*(I+))(\*I+=)?/$2/g

"IIIIII""II*IIIIIII="
s/(I)(?=I*\*(I+))(\*I+=)?/$2/g

"IIIIIIIIIIII""I*IIIIIII="
s/(I)(?=I*\*(I+))(\*I+=)?/$2/g

"IIIIIIIIIIIIIIIIII"

Division

s/(I+)(?=I*\/\1\=)(\/I+=)?/I/g

Division is similar to multiplication, but we use multiple subtractions instead of multiple additions. Let's check 12/4:

"IIIIIIIIIIII/IIII="
s/(I+)(?=I*\/\1\=)(\/I+=)?/I/g

"I""IIIIIIII/IIII="
s/(I+)(?=I*\/\1\=)(\/I+=)?/I/g

"II""IIII/IIII="
s/(I+)(?=I*\/\1\=)(\/I+=)?/I/g

"III"

GCD

s/gcd\((I+)\1*,\1+\)=/$1/g

Why stop at the four operations? We can also implement the gcd (greatest common divisor). In this case we'll ask for repeated backreferences, this will only match when both the left and right sides are a multiple of the backreference. Since the first group is greedy, it will try to make the longest valid match, and as a result the backreference will be the gcd of the two arguments. Let's check 15 and 9:

"gcd(IIIIIIIIIIIIIII,IIIIIIIII)="
s/gcd\((I+)\1*,\1+\)=/$1/g

"III"

Playing with regexps like this is fun, but in the real world never use this without the help of a responsible adult. :)

No comments:

Post a Comment