Slide 6: Mechanism, and madness?

Parsing perl, and unparsing it again

When you run a Perl program, it goes through two stages. First perl compiles it into an optree (a tree of ops, the interpreter's internal instructions); then it executes the optree. Here's a program with its optree:

print "Hello, world!\n";
my $x = 2;
$x = $x + 21;

while ($x) { print $x-- }
print $/;

compiles to this optree (as printed by B::Concise):

1s <@> leave[t1] vKP/REFC ->(end)
z     <0> enter ->10
10    <;> nextstate(main 7 dem.pl:1) v ->11
13    <@> print vK ->14
11       <0> pushmark s ->12
12       <$> const(PV "Hello, world!\n") s ->13
14    <;> nextstate(main 7 dem.pl:2) v ->15
17    <2> sassign vKS/2 ->18
15       <$> const(IV 2) s ->16
16       <0> padsv[$x:7,end] sRM*/LVINTRO ->17
18    <;> nextstate(main 8 dem.pl:3) v ->19
1b    <2> add[$x:7,end] sK/TARGMY,2 ->1c
19       <0> padsv[$x:7,end] s ->1a
1a       <$> const(IV 21) s ->1b
1c    <;> nextstate(main 9 dem.pl:5) v ->1d
1n    <2> leaveloop vK/2 ->1o
1d       <{> enterloop(next->1j last->1n redo->1e) v ->1l
-        <1> null vK/1 ->1n
1m          <|> and(other->1e) vK/1 ->1n
1l             <0> padsv[$x:7,end] s ->1m
-              <@> lineseq vK ->-
1e                <;> nextstate(main 8 dem.pl:5) v ->1f
1i                <@> print vK ->1j
1f                   <0> pushmark s ->1g
1h                   <1> postdec[t4] sK/1 ->1i
1g                      <0> padsv[$x:7,end] sRM ->1h
1j                <0> unstack v ->1k
1k                <;> nextstate(main 9 dem.pl:5) v ->1l
1o    <;> nextstate(main 10 dem.pl:6) v ->1p
1r    <@> print vK ->1s
1p       <0> pushmark s ->1q
-        <1> ex-rv2sv sK/1 ->1r
1q          <$> gvsv(*/) s ->1r
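To get a listing like this yourself, the usual route is the shell command perl -MO=Concise dem.pl. B::Concise can also be driven from inside a program. The sketch below does that for a small anonymous sub. It uses B::Concise::compile to build a renderer and B::Concise::walk_output to capture the text in a string. The exact ops and their numbering vary between perl versions, because recent perls fuse some ops together. So don't expect it to match the listing above line for line.

```perl
use strict;
use warnings;
use B::Concise ();

# A small sub whose optree we want to look at.
my $code = sub { my $x = 2; $x = $x + 21; return $x };

# Build a renderer in B::Concise's default tree-shaped (-basic) order,
# send its output into $listing instead of STDOUT, then run it.
my $walker = B::Concise::compile('-basic', $code);
B::Concise::walk_output(\my $listing);
$walker->();

print $listing;
```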

A compiler backend is a module that walks the optree after it has been built (but before it is executed) and turns it into something else. B::C turns it into a C program; B::Bytecode serialises the optree to disk in a form that can be loaded back in later; B::Graph draws a graph of its structure; and a few, such as B::Concise, turn it into (almost) human-readable text.
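"Walks the optree" can be made concrete with a small sketch of the traversal at the heart of any backend, written against the core B module. svref_2object gives a B::CV, ROOT is its top op, and an op's children are reached through first and sibling when its OPf_KIDS flag is set. The names walk_ops and count_ops are made up for this example. A real backend would wrap such a walk in a compile function that O.pm calls when you run perl -MO=Name. This sketch just inspects a single sub instead.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Visit every op under $op, depth first, calling $visit on each one.
sub walk_ops {
    my ($op, $visit) = @_;
    return unless $$op;    # a B::NULL object wraps a null pointer
    $visit->($op);
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            walk_ops($kid, $visit);
        }
    }
    return;
}

# Count the ops, by name, in the optree of a code reference.
sub count_ops {
    my ($code) = @_;
    my %count;
    walk_ops(svref_2object($code)->ROOT, sub { $count{ $_[0]->name }++ });
    return \%count;
}

my $counts = count_ops(sub { my $x = 2; $x = $x + 21; return $x });
print "$_: $counts->{$_}\n" for sort keys %$counts;
```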
But my favourite is B::Deparse, which turns the optree back into a Perl program. It sounds insane, but it's useful for debugging, because it shows you how perl actually understood your code. (It can also be used to serialise code in a human-readable way.)
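Here is B::Deparse as a debugging aid, using only the core module. coderef2text turns a compiled sub back into source. The -p option makes Deparse add every legal pair of parentheses, so you can see exactly how perl grouped an expression.

```perl
use strict;
use warnings;
use B::Deparse;

# A sub whose meaning depends on operator precedence.
my $code = sub { my $n = shift; return 1 + 2 * $n };

# Plain deparsing: recover Perl source from the compiled optree.
my $plain = B::Deparse->new->coderef2text($code);

# With -p, Deparse parenthesises (almost) everything it legally can,
# which makes the grouping perl chose explicit.
my $parens = B::Deparse->new('-p')->coderef2text($code);

print "$plain\n$parens\n";
```

For serialising, Data::Dumper will pass code references through B::Deparse when $Data::Dumper::Deparse is set to a true value.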
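It takes a lot of work to write a useful compiler backend, because you have to deal with all of perl's bizarre little idiosyncrasies. The optree can contain some very strange optimisations! In the optree above, for instance, the statement $x = $x + 21 has no sassign op at all: the add op is flagged TARGMY and stores its result straight into $x.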
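Constant folding is another optimisation that is easy to see with B::Deparse. Perl evaluates constant subexpressions while it builds the optree. By the time a backend sees the tree, the original arithmetic is gone, so Deparse can only report the folded value.

```perl
use strict;
use warnings;
use B::Deparse;

# The source says 60 * 60 * 24, but perl folds constant subexpressions at
# compile time, so the optree only holds the product.
my $seconds = sub { return 60 * 60 * 24 * $_[0] };

my $text = B::Deparse->new->coderef2text($seconds);
print "$text\n";
```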
I plan to write a new compiler backend (perhaps derived from B::Deparse) which converts the optree into a syntax tree that represents your program in a structured way.
There'll be a simple mechanism for turning this tree back into Perl code, but you'll be able to override parts of it, so you can turn it back into different code.
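The proposed backend doesn't exist yet, but the "override parts of it" idea can be sketched with a dispatch table. There is one default renderer per node type, and caller-supplied subs replace individual entries. Everything here is hypothetical. The tree is made of plain hashes with a type field. The node types (scalar_variable, constant, add) and the functions tree_to_code and render_node are invented for illustration. A real syntax tree built from the optree would need far more node types.

```perl
use strict;
use warnings;

# Hypothetical default renderers, one per node type. Each receives the node
# and a callback for rendering child nodes.
my %DEFAULT_RENDERER = (
    scalar_variable => sub { my ($node) = @_; return '$' . $node->{name} },
    constant        => sub { my ($node) = @_; return $node->{value} },
    add             => sub {
        my ($node, $render) = @_;
        return join ' + ', map { $render->($_) } @{ $node->{kids} };
    },
);

# Render one node with whichever handler is in effect for its type.
sub render_node {
    my ($node, $handlers) = @_;
    my $handler = $handlers->{ $node->{type} }
        or die "no renderer for node type '$node->{type}'\n";
    return $handler->($node, sub { render_node($_[0], $handlers) });
}

# Turn a tree back into Perl source; %override replaces default renderers.
sub tree_to_code {
    my ($tree, %override) = @_;
    return render_node($tree, { %DEFAULT_RENDERER, %override });
}

# $x + 21, as a hypothetical syntax tree
my $tree = {
    type => 'add',
    kids => [
        { type => 'scalar_variable', name  => 'x' },
        { type => 'constant',        value => 21 },
    ],
};

print tree_to_code($tree), "\n";
print tree_to_code($tree,
    scalar_variable => sub {
        my ($node) = @_;
        return "__find_variable('$node->{name}')";
    },
), "\n";
```

Each renderer receives a callback rather than calling render_node itself. This means an overriding sub renders its children with the same overrides still in effect.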

So here's an imaginary Symbol::Approx::Scalar module, built on this imaginary backend, which I'll pretend is called B::Parse:

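# B::Parse is imaginary, so this sketches the intended interface
# and will not run as it stands.
use O 'Parse';

# Turn the parsed program back into Perl source, rendering every scalar
# variable with the sub below, and run the result.
eval B::Parse::as_code(
	'scalar_variable' => sub {
		my $node = shift();
		# Replace the variable with a run-time lookup by name.
		return "__find_variable(" . B::Parse::quote($node->name) . ")";
	}
);
die $@ if $@;    # a string eval would otherwise hide any error

exit;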