venerdì 19 maggio 2017

Arabic to Roman number converter in Perl 5

Puzzled by a post on the Planet KDE about GCompris and roman numbers, and needing an easy way to explain to my six-years old son about roman numbers, I thought it would be an easy task to make a simple program in Perl to convert an arabic number into its roman notation.
Not so easy, pal!
Well, it's not a problem about Perl, of course, rather I found it required a quite brainpower for me to write down rules to convert numbers, and I did not search for the web for a copy-and-paste alghoritm. Please note: if you need a rock-solid way to handle conversions, have a look at CPAN that is full of modules for this particular aim.
Here I'm going to discuss the solution I found and how I implemented it. It is not supposed to be the best one, or the faster one, it's just my solution from scratch.

The program

I split the problem of converting an arabic number into a roman one into three steps, with one dedicated subroutine for each step, so that the main loop reduces to something like the following:
say "$_ = " . $roman_string->( $reassemble->( $disassemble->( $_ ) ) )
              for ( 1..30 );
that produces the following output:
1 = I
2 = II
3 = III
4 = IV
5 = V
6 = VI
7 = VII
8 = VIII
9 = IX
10 = X
11 = XI
12 = XII
13 = XIII
14 = XIV
15 = XV
16 = XVI
17 = XVII
18 = XVIII
19 = XIX
20 = XX
21 = XXI
22 = XXII
23 = XXIII
24 = XXIV
25 = XXV
26 = XXVI
27 = XXVII
28 = XXVIII
29 = XXIX
30 = XXX
The steps must be read from the inner subroutine to the outer, of course, and therefore we have:
  • disassemble that translates an arabic number into roman basis, that is computes how many units, tens, hundreds and thousands are required. In this phase there is no application of roman rules, so numbers are decomposed into a linear string of letters. As an example the number 4 is translated into IIII, which is of course a non-existent roman number.
  • reassemble applies roman rules, in particular promoting numbers so that groups are translated, when needed, into higher order letters. For instance IIII is promoted into two groups: I and V.
  • roman_string compose the promoted groups into the final string. The main difficulty of this part is to understand when a letter has to be placed on the right (addition) or on the left (subtraction) of another letter. For instance, having the groups I and V the function must understand if the output have to be VI (6) or IV (4).
To speed up the writing of the code, I placed main roman letters and their correspondance with arabic numbers into a global hash:
my $roman = {
    1    => 'I',
    5    => 'V',
    10   => 'X',
    50   => 'L',
    100  => 'C',
    500  => 'D',
    1000 => 'M',
};
Each method references $roman when needing to convert from an arabic number to its roman letter. In order to allow method to cooperate together, they accept and return an hash keyed by a roman letter and the number of occurences such letter must appear in the final string. The following is an example of the hash for a few numbers:
# 4 (IV)
{ 'I' => 1, 'V' => 1 }
# 19 (XIX)
{ 'I' => 1, 'X' => 2 }
# 5 (V)
{ 'V' => 1 }
# 17 (XVII)
{ 'X' => 1, 'V' => 1, 'I' => 2 }

The disassemble function

The following is the code for the disassemble function, that accepts as only input the arabic number.
# Accepts the arabic number and provides an hash
# keyed by each letter, with the value of how many times
# such letter should be summed in order to obtain the
# starting number.
my $disassemble = sub{
    my ( $number ) = @_;
    my $items = {};

    # sort the keys, that are arabic thresolds, from
    # the greater to the smaller one
    for my $current_value ( sort { $b <=> $a } keys $roman->%* ){

        my $how_many = int( $number / $current_value );
        next unless ( $how_many );

        my $letter = $roman->%{ $current_value };
        $items->{ $letter } = $how_many;

        $number -= $current_value * $how_many;
    }

    return $items;
};
The first thing the method does it to create the hash $items that is what it will return to allow other methods to consume. Each key of the $roman hash is passed ordered by the bigger to the smaller (please note that sort has $b first!). In this way we can surely scompose the number from the thousands, hundreds, tens, and units in this exact order. The $how_many variable contains the integer part of each letter. For example the number 29 is processed as follows:
  1. 29 / 10 that drives $how_many to be 2 and the remaining to be a 9;
  2. 9 / 5 that makes $how_many to be 1 and the remaining to be a 4;
  3. 4 / 1 that makes $how_many to be 4 and there's nothing more to do.
At each step the roman letter and the $how_many value is inserted into the $items has, that in the above ecample becomes:
# 29 (XIX)
{ 'X' => 2,
  'V' => 1,
  'I' => 4
}

The reassemble method

The reassemble method takes as input the hash produced by disassemble and checks if any letter requires a promotion. Here it is the code:
# Accepts an hash with keys the letters and values the number
# of times each letter should appear.
# Traverse the hash from the smaller to the greater
# in order to "promote" smaller aggregates. For instance
# 'IIII' (4) is aggregated and therefore the hash is modified
# so there's only an 'I' and another 'V', in such case
# the quantity of the promoted letter is negative to indicate
# it has been promoted.
my $reassemble = sub{
    my ( $items ) = @_;
    my @sorted_thresolds = sort { $a <=> $b } keys $roman->%*;

    for ( my $i = 0; $i < @sorted_thresolds; $i++ ){
        my $current_value = $sorted_thresolds[ $i ];
        my $key      = $roman->%{ $current_value };
        my $how_many = $items->%{ $key };

        next unless ( $how_many );

        my $greater_value = ( $i + 1 > @sorted_thresolds ? 1000 : $sorted_thresolds[ $i + 1 ] );
        my $greater_key   = $roman->%{ $greater_value };


        my $need_to_promote = $how_many == 4
            || ( $greater_value / $current_value == $how_many );

        if ( $need_to_promote ){
            $items->{ $greater_key }++;
            $how_many = $greater_value - $how_many * $current_value;
            $items->{ $key } = $how_many * -1;

        }

    }

    return $items;
};
The promotion must be done from the smaller letter to the greater one, so this time the letters are walked in ascending order (i.e., sort has $a first!). Since to promote a letter I need to access the following one, I need a C-style for loop.
A letter requires to be promoted if its quantity is 4 or /it is 2 and the right bigger value is exactly the double of the current one~, that is while ( $greater_value / $current_value == $how_many ). This makes, for instance IIII to be promoted (the quantity is 4), and VV to be promoted into X (because the quantity is 2 and the X is exactly the double of V). The promotion manipulates the hash increasing by one the right bigger letter and leaving a single current letter. In order to flag the promoted letter, I decided to use a negative quantity (where the absolute value is the exact one).
So for instance, the 29 hash of the previous paragraph is passed as follows:
# input to the method
{ 'X' => 2,
  'V' => 1,
  'I' => 4
}


# first for step (I)
{ 'X' => 2,
  'V' => 2,
  'I' => -1  # promoted, keep 1 and increase 'V'
}

# second step (V)
{ 'X' => 3,
  'V' => 0,  # promoted, increase X by one
  'I' => -1
}
At the end of method we know the final string will be made by three X and one I, the point now is to understand how to render them in the correct order. This is the aim of the roman_string method.

The roman_string method

The method accepts the normalized hash (i.e., groups are already formed) and compose the final string placing letter on the left or the right of each other depending on their quantity. The following is the code of the method:
# Do the hard work of composing
# each group of letters in order to compose the roman string.
my $roman_string = sub {
    my ( $items ) = @_;
    my @chars;

    for my $current_value ( sort { $b <=> $a } keys $roman->%* ){
        my $letter   = $roman->%{ $current_value };
        my $how_many = $items->%{ $letter };

        next unless ( $how_many );


        if ( $how_many > 0 ){
            push @chars, $letter for ( 1 .. $how_many );
        }
        else{
            # this is promoted, so it has to be inserted as last-to-last
            # in the previous chain
            # example: @chars( X, X ) and here I've 'I' to make XIX (19)
            push @chars, ( $letter, pop @chars );
        }

    }

    return join "", @chars;
};
In order to be able to manipulate easily the final string, moving letters from left to right and vice-versa, I decided to place each single letter into the @chars array, that is then join -ed into a single string.
Let's suppose we need just to add letters: in this case we need to write letters from the greater to the smaller from left to right, and this is the order I traverse the letters of $roman (again, note that sort has $b first!). If the quantity of the letter is positive the letter has not been promoted and therefore it will not be placed to the left of another letter, so just insert into @chars the $letter for the $how_many quantity. On the other hand, if $how_many is negative, the letter has been promoted and therefore have to be printed on the left of the last printed letter. This is as easy as doing:
push @chars, ( $letter, pop @chars );
that inserts into @chars the $letter and the previous last character that has been removed via pop.
With regards to the previous example of 29 we have that:
# method input
{ 'X' => 3,
  'I' => -1
}

# first step: prints X
# with quantity 3 (not promoted)
@chars = ( 'X', 'X', 'X' );

# second step: prints I
# that has been promoted
# and must be inserted ascending
# as last-to-last
@chars = ( 'X', 'X' ,
           ( 'I', # $letter
             'X'  # pop @chars
         ) );

Conclusions

Well, it has been much code that I expected to write. Using an object notation, instead of plain hashes, could surely make the program more robust. I'm pretty sure there's a way to shrink the code down and to avoid that ugly C-style for loop, as well as the promotion part could be simplified keeping in mind that it often reduces to -1 for the current letter and +1 for the greater one. Anyway, it does what I need and seems correct!

2 commenti:

brian d foy ha detto...

You might also see my Roman::Unicode module.

Luca Ferrari ha detto...

Thank you brin, very interesting! I'm thinking you probably have written any kind of module I can guess of...