Templates II: Electric Boogaloo

Last time on this adventure writing the Template Toolkit language in Perl 6, we’d just created a small test suite that encompasses some of the problems we’re going to encounter. It’s no use without a grammar and a bunch of other parts, but it does give us an idea of what it’s going to look like.

use Test;
use Template::Toolkit::Grammar;
use Template::Toolkit::Actions;

# ... similar lines above this
is-deeply the-tree( 'xx[% name %]x' ),
    [ 'a', 'a', Directive.new( :content( 'name' ) ), 'a', ];
# ... and similar lines below this.

The list here is what we’re going to return to render(), and I’d love to make that as simple as it can be without being too simple. Let’s focus for the moment just on one bit of the test suite here, the array I’m getting back.

[ 'a', 'a', Directive.new( :content( 'name' ) ), 'a', ];

If these elements were all strings, then all render() would have to do is join the strings together, simples!

method render( Str $text ) returns Str {
  my @terms = # magic to turn text into array of terms
  @terms.join: '';

Let’s create the ‘Directive’ class and see what happens, though.

class Directive { has $.content }

my @terms = 'a', 'a', Directive.new( :content( 'name' ) ), 'a';
say @terms.join: '';
# aaDirective<94444485232315>a

Whoops, that’s not what we want. Not bad exactly, but not what we want, either. Well, not to fear. Remember that in Template Toolkit, directives will always return a string. It may be an empty string, but they’ll always return some kind of string.

As a side note, this may not always be true – some directives will even tell the renderer to stop parsing entirely. But it’s a pretty solid starting assumption. For instance, we could say that encountering the STOP directive just makes all future directives return ”.

Of course, I’m harping on the term ‘string’ for a reason. Internally, everything is an object, and every object has a method that returns a readable value. Our Directive class didn’t specify one, so we get the default that returns ‘$name<$address>’.

So, let’s supply our own method.

class Directive { has $.content; method Str { $.content } }

my @terms = 'a', 'a', Directive.new( :content( 'name' ) ), 'a';
say @terms.join: ', ';
# a, a, name, a

There. If we supply a .Str method we can make Directives do what we want. INCLUDE directives would open the file, slurp the contents and return them. Argument directives would take their argument name, look up the value, and return that. Or, more likely, would have a context object passed that does the lookup for them.

Where do we go from here?

Next time we’ll convince Grammars and Actions to work together, making processing a template as simple as:

parse-template( $text ).join( '' );

Next in this series on writing your own template language using Perl 6, you should be able to define your own Template Toolkit directives and have them return the pre-processed text. We’ll add support for context and the ability to do simple ‘[% name %]’ tags, and maybe explore how to change ‘[%’..’%]’ tags on-the-fly.

Thank you again, dear reader, for your interest, comments and critiques.

A Regex amuse-bouche

Before continuing with the Template series, I thought I’d talk briefly about an interesting (well, at least to me) solution to a little problem. System and user libraries (the kind that end in .so or .a, not Perl libraries) have a section at the top that maps a function name (‘load_user’ or whatever) to an offset into the library, say, 0x193a.

This arrangement worked fine for many years for C, Algol, FORTRAN and most other languages out there. But then along came languages that upset the apple cart, like C++ and Smalltalk, where a programmer could write two ‘load_user’ functions, call ‘load_user(1234)’ or ‘load_user(“Smith, John”)’ and expect the linker to load the right version of ‘load_user.’

The problem here is that the library, the linker and all of the other programs in the tool chain expect there to only be one function called ‘load_user’ in any given library.

Those of us that do Perl 5 and Perl 6 programming don’t have to worry about this, but if you ever want to link to a C++ library, you probably should know at least a bit about “name mangling.”

For a while, utilities like ‘CFront’ for the Macintosh (which the author actually filed bug reports on) were used to “rename” functions like ‘load_user(int)’ and ‘load_user(char*)’ to ‘i_load_user’ and ‘cs_load_user’ before being added to the library, and other tools to do the reverse.

Has Your Mother Sold Her Mangle?

Eventually things settled down, and this process of changing names to fit into the library was “baked in” to the tool chains. Not consistently, of course, couldn’t have that. But conventions arose and even today Wikipedia lists at least 12 different ways to “mangle” ‘void h(void)’ into the existing library formats.

We’ll just look at the first one, ‘_Z1hv’. The ‘_Z’ can be safely ignored, its purpose there is mainly to tell the linker something “special” is going on. ‘1h’ is the function name, and ‘v’ is its first (and only) parameter. Suppose, then, that you were tasked with writing a tool that undid this name mangling.

Your first cut at extracting something useful might look something like

'_Z9load_useri' ~~ m{ ^ '_Z' \d+ (\w+) (.) $ };

Assuming $mangle-me has ‘_Z9load_useri’ in it (The mangled version of ‘void load_user(int)’) the regex engine goes through a bunch of simple steps.

  • Read and ignore ‘_Z’
  • Read and ignore ‘9’
  • Capture ‘load_user’ into $0
  • Capture ‘i’ into $1
  • There is no fifth thing.

But the person that wrote this library is playing silly buggers with someone (obviously us in this case) and there’s also a ‘_Z9load_userss’ which comes out of the other end of the mangle looking like ‘void load_user(char*, char*)’, loading a user with first and last names.

Now we’re in a bit of a quandary. Run the same expression and see what happens:

'_Z9load_userss' ~~ m{ ^ '_Z' \d+ (\w+) (.) $ };

Sure enough, $1 is ‘s’, just as we wanted it, but what about $0? It’s now ‘load_users’, which… y’know, looks too legit to quit. But we must. And now we’re faced with the quandary. Do we make the first parameter an optional capture? ‘m{ … (.)? (.) $ }’ like so?

No, that would capture the ‘r’ of ‘_Z9load_users’. There must be something else in the name that we’re overlooking, some clue… Aha! ‘load_user’ has 9 characters, and look just before it, we’ve got the number 9! Surely that tells us the number of characters in the function name! (and thankfully it actually does.)

Regexes 201

Now, how can we use this to our advantage? First things first, let’s get rid of some dead weight. We don’t care (for the moment) about parameters, so let’s just match the name and number of characters. And because we’re getting all serious up in here, let’s create a quick test.

use Test;
'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w+) };
is $0, '9';
is $1, 'load_user';

Run the test script, see if it passes, I’m sure you know the drill. Go ahead and copy that, I’ll wait. Okay, the tests pass, so it’s time to play. I usually am working in a library that’s in git, so I’m usually on the “edit, run tests, git reset, edit…” treadmill by this point.

So… How do we make use of this number? Well, let’s pull up the Regexes page over at docs.perl.org and look around. Back in Perl 5 there used to be this feature ‘m{ a{5} }x’ that matched just 5 copies of whatever it was in front of, that might be a good place to start looking.

That’s now morphed into ‘m{ a ** 5 }’. Great, so let’s replace 5 with $0 and go for it.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** $0) };

“Quantifier quantifies nothing…” That’s weird. $0 is right there, staring me in the face. Maybe I just got the syntax wrong somehow?

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** 9) };

Nope, that works. What’s going on here? $0 is defined… Wait, it’s a variable inside a regex, that used to require the ‘e’ modifier, didn’t it? Or something like that… <read the manpage, scratch head… nothing there> Hm. Are we at a dead end?

Kick it up a notch

No, we just need to remember about how string interpolation works. In Perl 6, “Hello, {$name}!” is a perfectly fine way to interpolate variables into your expression, and it works because no matter where it is, {} signals a code block. Let’s try that, surround $0 with braces.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** {$0}) };

Weird. This time the test failed with ” instead of ‘load_user’. Maybe $0 really isn’t defined? Now that it’s just regular Perl code, let’s check.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** {warn "Got '$0'"; $0}) };

“Use of Nil in string context.” So it’s really empty. Now, we have to really do some reading. Looking at the section on general quantifiers says “only basic literal syntax for the right-hand side of the quantifier [what we want to play with] is supported,” so it looks like we’re at a dead end.

But things like ‘{$0}’ do work, so we can use variables. That means that my problem isn’t that the variable is being ignored, it’s just not being populated when I need it. Let’s look at the section on Capture numbers to see when they get populated.

Aha, you need to “publish” the capture using ‘{}’ right after it. Let’s see if that works…

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) {} (\w ** {warn "Got '$0'"; $0}) };

Nope, something else is going on. And the next block down tells us the final solution – ‘:my’. This lets us create a variable inside the scope of the regular expression and use it as well, so let’s do just that.

'_Z9load_user' ~~ m{ ^ '_Z'
                     :my $length;          # Put $length in the proper scope
                     (\d+) {$length = +$0} # Capture the length
                     (\w ** {$length})     # And extract that many chars.

And reformat things just a wee bit so we’ve got some room to work with. Now the test actually runs, and reads only as many characters of the function name as needs be.

And just one more thing…

It’s not just function names that follow this pattern, it’s also namespaces, and any special types that the function might use as parameters, so let’s package this up into something more useful.

my regexp pascalish-string {
  :my $length;
  (\d+) {$length = +$0}
  (\w ** {$length})
'_Z9load_user' ~~ m{ ^ '_Z' <pascalish-string> };
is $/<pascalish-string>[0], 9;
is $/<pascalish-string>[1], 'load_user';

Pascal implementations were done back when RAM was at more of a premium, and stored a string like ‘load_user’ as ‘\x{09}load_user’ so the compiler knew how many bytes were available immediately rather than having to guess. It was limiting, but this was on computers like the early Macs (we’re talking pre-OS X, for that matter pre-System 7, for those of you that remember that far back.)

So we can use this <pascalish-string> regular expression anywhere we want to match one of our counted terms. Because we’re using ‘my’ inside a regular expression nested inside another regular expression inside a burrito wrapped in an enigma, there are no scoping troubles.

There are probably other ways of doing this, and I would love to see them. If you do come up with a better way to solve this, let me know in the comments and I’ll work your solution into an upcoming article.

As usual, gentle reader, thank you for your time and attention, and if you have any comments, questions, clarifications or criticisms (constructive, please) let me know.

Templates and a Clean Start

Before I get into the meat of the topic, which will eventually lead to a self-modifying grammar (yes, you heard me, self-modifying…) I have a confession to make, in that a series of articles on the old site may have led people astray. I wrote that series thinking to make parsing things where no grammar existed easier.

It may have backfired. So, as a penance, I’m simultaneously pointing theperlfisher.{com,net} to this new site, and starting a new series of articles on Perl 6 programming with a different approach. This time I’ll be incorporating more of my thoughts and what hopefully will be a different approach.

Begin as you mean to go on.

I would love to dump the CMS I’m currently using for something written in Perl 6. Among the many challenges that presents is displaying HTML, and to paraphrase Clint Eastwood, I do know my limitations. So, I don’t want to write HTML. Ideally, not ever.

So, that means steal borrowing HTML from other sites and making it my own. Since those are usually Perl 5 sites, that means dealing with Template Toolkit. And already I can hear some of you screaming “Perl 6 already handles everything TT used to! Just use interpolated here-docs!”

And, for the most part, you’re absolutely correct. Instead of the clunky ‘[% variable_name %]’ notation you can use clean inline interpolation with ‘{$variable-name}’, and being able to insert blocks of code inline means you don’t have to go through many of the hoops that you’re required to jump through with Template Toolkit.

That’s all absolutely true, and I hope to be able to use all of those features and more in the final CMS, whatever that happens to be. This approach ignores the fact that most HTML out there is written with Template Toolkit, and that rewriting HTML, even if it’s just a few tiny tags, is an investment of time that could be better done elsewhere.

If only there were Template Toolkit for Perl 6…

Let’s dive in!

If you’re not familiar with Template Toolkit, it’s a fairly lightweight programming language for writing HTML templates, among others. Please don’t confuse it with a markup language, designed to be rendered into HTML. This is a language that lets you combine your own code with a template and generate dynamic displays.

<h1>Hello, [% name %]!</h1>

That is a simple bit of Template Toolkit. Doesn’t look like much, does it? It’s obviously a fragment of a proper HTML document because there’s no ‘<html>’..'</html>’ bracketing it, and obviously whatever’s between ‘[%’ and ‘%]’ is being treated specially. In this case, it’s being rendered by an engine that fills in the name, maybe something like…

$tt.render( 'hello.tt', :name( 'Jeff' ) );

where hello.tt is the name of the template file containing the previous code, and ‘Jeff’ is the name we want to substitute. We’ve got a lot of work to go through before we can get there, though. If you’ve read previous articles of mine on the subject, please try to ignore what I’ve said there.

Off the Deep End

First things first, we need a package to work in. For this, I generally rely on App::Mi6 to do the hard work for me. Start by installing the package with zef, and then we’ll get down to business. (It should be installed by default, if you’re still using rakudobrew please don’t.)

$ zef install App::Mi6
{a bit of noise}
$ mi6 new Template::Toolkit
Successfully created Template-Toolkit
$ cd Template-Toolkit

Ultimately, we want this test (in t/01-basic.t – go ahead and add it) to pass:

use Test;
use Template::Toolkit;
my $tt = Template::Toolkit.new;
is $tt.render( 'hello.tt', :name( 'Jeff' ) ), '<h1>Hello, Jeff!</h1>';

It’ll fail (and miserably, at that) but at least it’ll give us a goal. Also it should give us an idea of how others will use our API. Let’s think about that for a few moments, just to make sure we’re not painting ourselves into any obvious corners.

In order to be useful, our module has to parse Perl 5 Template Toolkit files, and process them in a way that’s useful in Perl 6. Certain things will go by the wayside, to be sure, but the core will be a module that lets us load, maybe compile, and fill in a template.

Hrm, I just said ‘fill in’ rather than ‘render’, what I said above. Should I change the method name? No, not really, the new module will still do what the Perl 5 code used to, it just won’t do it using Perl 5, so some of the old conventions won’t work. Let’s leave that decision for now, and go on.

Retrograde is all the rage

Let’s apply some basic retrograde logic to what we’ve got here, given what we know of Perl 6 tools. In order to get the string ‘<h1>Hello, Jeff!</h1>’ from ‘<h1>Hello, [% name %]!</h1>’, we need a lot of mechanics at work.

At first glance, it seems pretty obvious that ‘[% name %]’ is a substitution marker, so let’s just do a quick regexp like this:

$text ~~ s:g{ '[%' (\w+) '%]' } = %args{$0};

That should replace every marker in the text with something from an %arguments hash that render() supplies to us. End of column, end of story. But not so fast, if all Template Toolkit supplied to us was the ability to substitute values for keys, then … there’s really no need for the module. And in fact, if you look at the docs, it can do many more things for us.

For example, ‘[% INCLUDE %]’ lets us include other template files in our own, ‘[% IF %]’ .. ‘[% END %]’ lets us do things conditionally, and a whole host of other “directives” are available. But you’ll see here the one thing they have in common is they all start with ‘[%’ and end with ‘%]’.

Hold the phone

That isn’t entirely true, and in fact there’s going to be another article in the series about that. But it’s a good starting point. We may not know much about what the language itself looks like, but I can tell you that tags are balanced, not nested, and every ‘[%’ opening tag has a ‘%]’ tag that closes it.

I’ll also point out that directives ( ‘[% foo %]’ ) can occur one after another without any intervening white space, and may not occur at all. So already some special cases are starting to creep in.

In fact, let’s put this in as a separate test file entirely. So separate that we’re going to put it in a nested directory, in fact. Let’s open t/parser/01-basic.t and add this set of tests:

use Test;
use Template::Toolkit::Parser;

my $p = Template::Toolkit::Parser.new;

0000, AAAA
0001, AAAB
0010, AABA
0011, AABB
0100, ABAA
0101, ABAB
... # and so on up to
1110, BBBA
1111, BBBB

Now just HOLD THE PHONE here… we’re testing directives for Template Toolkit, not binary numbers, and whatever that other column is! Well, that’s true. We want to test text and directives, and make sure that we can get back text when we want it, and directives when we want them.

At first blush you might think it’s just enough to make sure that ‘<h1> Hello,’ is parsed as text, and that ‘[% name %]’ is parsed as a directive, and just leave it at that. But those of you that have worked with regular expressions for a while might wonder how ‘[% name %][% other %]’ gets parsed… does it end at the first ‘%]’, or continue on to the next one?

And what about text mixed with directives? Leading? Trailing text? Wow, a lot of combinations. In fact, if you wanted to be thorough, it wouldn’t hurt to cover all possible combinations of text and directives up to… say, 4 in a row.

Let’s call text ‘T’, and directives ‘D’. I’ve got 4 slots, and only two choices for each. Filling the first slot gives me ‘T_ _ _’ and ‘D_ _ _’, for two choices. I can fill the next slot with ‘T T _ _’, ‘T D _ _’, ‘D T _ _’, and ‘D D _ _’, and I think you can see where we’re going with this.

In fact, replace T with 0 and D with 1, and you’ve got the binary numbers from 0000 to 1111. So, let’s take advantage of this fact, and do some clever editing in our editor of choice:

0010, 0010                            =>
is-deeply the-tree( '0010, AABA       =>
is-deeply the-tree( '0010' ), [ AABA  =>
is-deeply the-tree( '0010' ), [ AABA ];

A few quick search-and-replace commands should get you from the first line to the last line. Now it’s looking more like a Perl 6 test, right? We’re not quite there yet, ‘0010’ still doesn’t look like a string of text and directives, and what’s this AABA thing? One more search-and-replace pass, this time global, should solve that.

is-deeply the-tree( '0010' ), [ AABA ]; =>
is-deeply the-tree( 'xx1x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
          [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
    [ 'a', 'a', Directive.new( :content( 'name' ) ), 'a', ];

Starting out with the padded binary numbers covers every combination of text and directive possible (at least 4 long). A clever bit of search-and-replace in your favorite editor gives us a working set of test cases that check a set of “real-world” strings, and a file you can almost run. Next time we’ll fill in the details, and get from zero to a minimal (albeit working) Template Toolkit implementation.

As always, dear reader, feel free to post whatever comments, questions, and/or suggestions that you may have, including ideas for future articles. I read and respond to every comment, and thank you for your time.

Logic Programming in Perl 6

This is a small example of conference-driven development. I’m sitting in the board room at TPCiP – TCP in Pittsburgh surrounded by people doing both Perl 5 and Perl 6 programming, and decided to look again at Picat, working on some simple examples. I was thinking that I might be able to translate some of the simpler backtracking examples from Picat to Perl 6, and here’s a simple example.

First the Picat code:

fib(0,F) => F=1.
fib(1,F) => F=1.
fib(N,F),N>1 => fib(N-1,F1),fib(N-2,F2),F=F1+F2.

Now here’s my equivalent Perl 6 code:

multi fib( 0, $F is rw ) { $F = 1 }
multi fib( 1, $F is rw ) { $F = 1 }
multi fib( $N is rw where * > 1, $F is rw ) {
  my ( $F1, $F2 ) = 0, 0;
  my $N1 = $N - 1;
  my $N2 = $N - 2;
  fib( $N1, $F1 ) && fib( $N2, $F2 ) && $F = $F1 + $F2

The Perl 6 version is slightly larger because I need to declare some variables that Picat would ordinarily declare for me ($F1, $F2). There may be a way to work around declaring ($N1, $N2), but otherwise the two versions are identical.

How does it work?

You’ve probably guessed based on the inputs that N is index of the Fibonacci number, and F is the Fibonacci number itself. Picat doesn’t require you to declare variables, so you could ask it for the 7th Fibonacci number by calling fib(7,F) and looking at F.

my $Fib = 0;
say $Fib     # 21

Or you could do the above in Perl 6, letting the code populate $Fib for you. This code relies on the fact that Perl 6 lets you dispatch not just on signatures, not just on argument types, but on values. Look at the base case above:

multi fib( 0, $F is rw ) { ... }

fib(…) is the function signature, and this function will get called whenever the first argument is 0, like so: fib(0, $Fib). This happens even if ‘multi fib( $N, $F )’ is the one doing the calling, everything gets run through the same dispatcher each time.

So fib(2, $Fib) calls fib(1, $Fib) which calls ‘multi fib( 1, $F )’ and gives us a base case, for example. This lets the higher-order functions call our base cases, and still get the right value.

What are we missing?

Well, the Picat code can do something the Perl 6 code can’t, at least for the moment, and this is what I want to spend some time working on. In Picat, I can call ‘fib(6,F)’ and F will be 13 when the code is done. This works in Perl 6 too.

But Picat will also let you call ‘fib(N,21)’ and N will be 7 when the calculation is finished. Take some time to let that settle. Yes, you can run the calculation both forward and backward. Give N a value, and F will be the Nth Fibonacci number. Give F a value, and it will tell you what N is.

In fact, Picat will go one step further. If you don’t specify a value for either parameter but just specify variables, like ‘fib(N,F)’, then it will generate all the Fibonacci numbers and their indexes until you tell it to stop.

This is because of the backtracking engine that it uses, which I want to see if I can mimick. ‘F=F1+F2’ doesn’t mean “Assign the sum of F1 and F2 to F”. Instead, “If any values are missing, find values that satisfy the equation, and keep generating them until you run out of possibilities.”

That’s a bit of a mouthful, so let’s look at just F1. Supposing F=8 and F2=5, the backtracking engine would search all values of F1, and return just the matching value of 3. Now of course, it can’t search all values, because that means you’d be waiting forever, so there are pruning algorithms at work here.

But the same logic can work with any combination of arguments, so if both F1 and F2 were missing, then the backtracking engine would run through all possible combinations of values (pruned appropriately) until it found a combination that would work.

In this case, since in our example F=8, it would return a bunch of combinations, starting with (F1=1, F2=7), (F1=2, F2=6) and so on. But why, then, you ask, does it only return (F1=3, F2=5)? That’s because each value F1 also has to satisfy fib(N1,F1), which means that F1 has to be a Fibonacci number, as does F2.


This is the part where Perl 6 breaks down a little bit. But what I think I might be able to do is use a trick I used a while ago, relying on the fact that operators are just functions, and they dispatch just like other functions. So I should be able to start out with something crude like:

my $F = Operator.new( :lhs(3) );
my Value ($F1, $F2);
$F = $F1 + $F2;

This way both $F1 and $F2 are bound to backtracking Values, they return a Operator, and the Operator is part of the backtracking engine. This way once the Operator engine determines the range of possible combinations of $F1 and $F2 that add to 3, it can assign them concurrently to @F1.value and @F2.value.

Smooth Operators

The Operator and Value classes, along with their overloaded operators, would look something like this:

class Operator {
  has ( $.lhs, $.rhs ); }
class PlusOperator is Operator { }
class AssignmentOperator is Operator {
  method make-combinations() {...} }
class Value {
  has @.value }
multi infix:<=>( Operator $lhs, Operator $rhs ) {
  AssignmentOperator.new( :$lhs, :$rhs );
multi infix:<+>( Value $lhs, Value $rhs ) {
  PlusOperator.new( :$lhs, :$rhs );

This is purely a sketch that I haven’t tried out at all. My idea here is that once you’ve executed ‘$F = $F1 + $F2’, $F will be an AssignmentOperator instance. You should be able to call $F.make-combinations() that will solve the equation ‘3 = $F1 + $F2’ for all (constrained) values of $F1 and $F2.

That would populate @F1.values and @F2.values with (1,2) and (2,1) respectively. I’m about to play my first game of Azul, so I’ll leave the article here. The next article will hopefully implement this so you can see this all working. It won’t quite be a true backtracking engine, but it’s a start.

Dear Reader, thank you for your attention, and please feel free to add comments, questions and suggestions.

Quantum Tunneling

Introducing the new Perl Fisher site

Before I get on to the meat of the article, welcome to the new home of The Perl Fisher. I intend to cover both Perl 5 and Perl 6 programming here, but it’ll be mostly Perl 6 content because that’s the language I find the most fun. Please excuse the dust, I’m still very much settling into the new home, and the overall look of the site is bound to change while I play with the new toys available to me.

Defeating Thanos with Perl 6

Don’t worry, no spoilers here. We’re just going to talk about a little-known feature of Perl, the quantum-tunneling variable type. If you’ve worked with Perl 6 for any length of time, you’ve probably seen or written a class declaration that looks something like below.

class Point2D {
  has Real $.x;
  has Real $.y;

While the word ‘has’ does the real work, second-sigil syndrome strikes as well, in the shape of the ‘.’ between the scalar sigil ‘$’ and the variable name. Here it’s syntactical sugar for being an attribute name, but we can enlarge that ‘.’ to a ‘*’ and open up a world of possibilities.

When we add the ‘*’ sigil to a variable name, we turn that variable into one that can quantum tunnel between scopes and solve problems that you probably used to do with a global variable. You can read more about dynamic variables and how they differ from ordinary globals at The_*_twigil at docs.perl6.org.

Testing, testing

I’m working on a project to try to augment the Perl 6 grammar debugger with an emulator. The tools we have on CPAN and modules.perl6.org respectively are wonderful, but they’re limited because Perl 6 compiles grammar rules down to single methods, which is wonderful for speed, but makes it almost impossible to look into.

The grammar I’m writing isn’t important at the moment, but the testing part is. Below is a sample subtest that I’m writing for each term of a grammar that’s probably going to have ~50 terms by the time I’m done.

subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );
    ok fails( '3g', 'binary-number' );

  is build-ast( '0101', 'binary-number' ), 5;

This tests the ‘binary-number’ rule to see if it properly fails on ‘0b’ and ‘3g’. ‘0b’ fails because it’s the prefix of a binary number, and ‘3g’ because neither 3 nor ‘g’ are binary digits. It also makes certain that ‘0101’ gets translated into the decimal number 5. All important when testing a grammar that parses … well, itself eventually. Oroborous redux, as it were.

Dry up, will you…

The test is simple, and straightforward. ‘0b’ should fail, ‘0101’ should be built into a node of an abstract syntax tree. But it’s got some flaws. It talks too much. See how ‘binary-number’ repeats itself? If I want to copy that, rename it to ‘hex-number’ and add a few changes, I have to copy the block, rename all the incidences of ‘binary-number’ to ‘hex-number’ and then fix the existing tests.

Thus I run the risk of forgetting to update the name ‘binary-number’. And there’s an even greater bugaboo there. If I don’t, the test won’t fail. Because the subtest doesn’t know that it’s supposed to be testing the ‘binary-number’ rule. There are a bunch of ways to solve this problem, of course, but for this post we’re going to use Ant-Man(tm).

Entering the Quantum Realm

I don’t want to do too much work here, I just want to get rid of the duplicate ‘binary-number’ entries. So, let’s take a look at what fails() does.

sub fails( Str $sample, Str $rule-name ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) );

The ‘!?(…)’ casts $g.parse(…) to a Boolean and negates it, so if $g can’t parse the statement, it returns True. So, first let’s make $rule-name optional.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) );

Opening the wormhole

Now, we’re going to summon Ant-Man(tm). Remember earlier I mentioned that quantum variables use a wormhole? Well, we’re going to open one end of the wormhole right here in our fails() function, just like this.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) );

Rerun our tests, and … wait, they should fail, we haven’t declared $*ANT-MAN anywhere! Well, just like in quantum physics, $*ANT-MAN doesn’t have enough energy to tunnel over the quantum barrier because we haven’t defined him yet.

So let’s do that, but remember that $*ANT-MAN is a quantum variable, so he can tunnel through the quantum barrier of a function scope. In fact, he can tunnel through any number of them. So, let’s define a new version of subtest() that looks and acts like the old one first before we go boldly where no Perl 6 programmer has gone before.

sub Subtest( Str $rule-name, Block $test-code ) {
  subtest $rule-name, $test-code;

We should be able now to replace the outer subtest() block with our new Subtest() block, and it should act just as it used to.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fail( '0b', 'binary-number' );

Tunneling through

Our test suite still works, and the output still is what we expect. Now, let’s give $*ANT-MAN enough energy to tunnel through the quantum barrier by defining him as the subtest name we want:

sub Subtest( Str $rule-name Block $test-code ) {
  my $*RULE-NAME = $rule-name;
  subtest $rule-name, $test-code;

And now run our test suite. Which… doesn’t change. Come to think of it, we don’t want it to change. If it did change, we’d have to go through and change all of our test suites, which would be bad. So, putting things together, this code works just fine.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) );
sub Subtest( Str $rule-name Block $test-code ) {
  my $*RULE-NAME = $rule-name;
  subtest $rule-name, $test-code;

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );

Notice by the way that $*ANT-MAN has tunneled through not one but two function signatures to get to where he is. And to prove it, finally, delete the inside ‘binary-number’.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b' );

So ‘binary-number’ gets passed along to $*ANT-MAN who jumps into the quantum realm, tunnels through the outer and inner pairs of braces, and finally lands in fails() where he passes the value on to the :rule() declaration. Whew, that’s a lot of work.

Oh, snap.

(sorry, couldn’t resist.) If you’re like me, and I know I am, you’ve probably come across a few cases where this technique would come in handy. Especially if you’re dealing with legacy code. Sometimes you need to add just one little flag to a function and set that flag in a top-level handler on one page of a website.

The catch is that between the lower level and the top level there’s a chain of 8 function calls where you have to add that as a parameter. Wouldn’t it be nice if there was a workaround? Well, in Perl 6 there is.

Thanks for getting all the way to the bottom of this, my inaugural article on Perl 6 on the next generation of the Perl Fisher website. Feel free to leave comments and constructive criticism in the comments section below, and come back every so often to watch the website grow over the coming months.

Spacing Out

After having had some comments about the grammar approach I’ve been using, I’ve started to rethink things. I may have isolated at least one problem people may have been having. I’m working on a grammar for a language called ‘picat’ – you can look up a quick explanation at picat.org.

It’s a constraint-based programming language that maps insanely well onto Perl 6. A fragment of the grammar I’m working on follows, done in a top-down fashion. The actual grammar rule <comment> isn’t the important thing, because this problem can occur with anything.

If you must know, it’s a C-style /* .. */ comment. Of course I ran the test to make sure this little block of code properly matched beforehand. This way I could go along making one small change at a time, simple because it’s fairly late at night and I’ve got a flight to catch tomorrow..


'go' '=>'

Breaking up is hard to do

The natural thing to do here is, of course, say to yourself “Hrm, I’ve got 3 <comment> comment blocks in a row. We all know there are only 3 important numbers in computer science, 0, 1, and Infinity. So 3 is wrong and should be replaced with <comment>+.”


'go' '=>'

I then rerun the test, because I’m sticking to my nighttime rule of “one change, one retest”, and to my horror it breaks. I’ve only changed one thing, but … why is it breaking? Surely <A>+ should at least match <A> <A> <A> … that’s how DFA equivalences work in finite automata.

That’s also one point where Perl 6 and traditional DFAs (Deterministic Finite Automata) part ways. After a few years of doing Perl 6 programming, I see Perl 6 as almost overly helpful. Tools like flex and bison made me think of grammars as something that belonged outside the language.

Where it all breaks down

Unfortunately modules like Grammar::Debugger, through no fault of their own, can’t quite help here. While it’s a great module to tell you what particular rule or token failed, the problem here is between the terms.

<A> {whitespace-optional} <A> is subtly different than <A>+ because <A> <A> lets the parser read whitespace between the two terms; <A>+ assumes the terms come one after the other, whitespace be darned.

So, the simplest solution I have to offer is to let the comment eat the whitespace after it as well, so you can insert your <comment> token anywhere you like and it’ll still eat the whitespace no matter how you add it.

Another solution proposed on Reddit would be to use <A>˽+, with a space between the closing ‘>’ and the modifier. Said user went beyond the call of duty and composed a “Seven stages of whitespace” post to make the point.

The <comment> token I have, like I said, is for C/C++ style “balanced” comments. Here they’re not balanced; /* This is a comment */ but this is not */, and/* This is a comment /* so is this */ this looks like it should but really isn’t. */

token comment
  '/*' .+? '*/' \s*

And all is well with the grammar. You can put this rule anywhere you like and it’ll behave whether you write <comment> <comment> or <comment>+. This little article was inspired by a Twitter user inspired after reading my first tutorial series. They got into the actual work of creating a grammar and problems started to happen.

Wrapping up

My original tutorial series was just that, a tutorial, I felt that getting too deep into the process interrupts the flow, so I didn’t talk about the work that went into it. Now that the series is pretty much done, I think it’ll be beneficial to talk about the actual problems of debugging one of these beasts.

And these thing can most definitely be beasts. Using my ANTLR4 to Perl 6 converter you can generate some incredibly huge grammars. But just generating them doesn’t necessarily mean they’ll compile, although a few do right out of the box, which I’m genuinely amazed at.

The full test suite actually chooses a few grammars, converts them to Perl 6, compiles them and tests against sample input. I’m not sure how faithful they are to the real grammar, but they work.

Perl 6 does amazing things with precompiling and JITing. Grammars and regular expressions are one of the hardest-working things in Perl 6, so they get compiled down to functions. This means I can’t step into them even inside NQP, the dark side of Perl 6.

I’ve got ideas, so I’m going to keep working on grammar stuff. That means when I run into problems, well, it’s time to write another article. So look forward to a new series. Likely with a prosaic name of “Perl 6 Grammars Debun^wDebugged” or something similar. Thank you again, dear reader. Comments, clarifications and questions are of course welcome.