
String Scanner

2024-07-13

Ruby has a lovely StringScanner class for lexical scanning. Today we're building our own in Perl, using the power of closures to maintain state. It's a great exercise in functional programming concepts!

Part 1: THE CONCEPT

A string scanner lets you work through a string piece by piece, keeping track of where you are. Think of it like a cursor moving through text. You can:
- Check your current position
- Move the position around
- Check if you've reached the end
- Find patterns and advance past them
The Ruby version is object-oriented. Our Perl version uses closures - functions that remember their environment. Same power, different flavor.
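If closures are new to you, here's a minimal sketch of the idea before we build anything bigger. The make_counter helper is a made-up name for illustration and isn't part of the scanner; it just shows a function holding on to a variable from the scope that created it.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper, used only to illustrate closures.
sub make_counter {
    my $count = 0;                    # private to this call of make_counter
    return sub { return ++$count };   # the closure keeps $count alive
}

my $first  = make_counter();
my $second = make_counter();

$first->();
$first->();
print "first:  ", $first->(),  "\n";   # third call on $first  -> 3
print "second: ", $second->(), "\n";   # first call on $second -> 1
```

Each call to make_counter creates a fresh $count, so the two counters never interfere with each other.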

Part 2: THE STRUCTURE

Our scanner returns a hashref of closures. Each closure shares access to the same internal state:
    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;    # This state is shared by all closures

        return {
            pos       => sub { ... },
            mod_pos   => sub { ... },
            eos_check => sub { ... },
            find      => sub { ... },
        };
    }
When you call create_scanner, you get back a hashref with four function references. They all close over $pos and $string, sharing that state.
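To see the sharing in action, here's a cut-down sketch with just two closures over one variable. The names make_cursor and advance are assumptions made for this example, standing in for the real create_scanner.

```perl
use strict;
use warnings;

# make_cursor is a hypothetical, cut-down stand-in for create_scanner:
# two closures that both close over the same $pos.
sub make_cursor {
    my $pos = 0;
    return {
        pos     => sub { return $pos },
        advance => sub { my ($delta) = @_; $pos += $delta; return $pos },
    };
}

my $left  = make_cursor();
my $right = make_cursor();

$left->{advance}->(5);

print $left->{pos}->(),  "\n";   # 5: pos sees the change made through advance
print $right->{pos}->(), "\n";   # 0: a separate call, so a separate $pos
```

The closures inside one hashref share state with each other, but not with the closures from any other call.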

Part 3: THE POSITION CLOSURES

First, the simple ones - getting and modifying position:
    pos => sub {
        return $pos;
    },

    mod_pos => sub {
        my ($delta) = @_;
        $pos += $delta;
        $pos = 0 if $pos < 0;
        $pos = length($string) if $pos > length($string);
        return $pos;
    },
The pos closure just returns current position. The mod_pos closure adjusts it by a delta (positive or negative) and clamps it to valid bounds. You can't go before the start or past the end.
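The clamping is easier to see with numbers plugged in. This sketch pulls the same logic out into a standalone function; clamp_pos is a hypothetical name used only here, and it takes the position and length as arguments instead of closing over them.

```perl
use strict;
use warnings;

# clamp_pos is a hypothetical standalone version of the mod_pos logic.
sub clamp_pos {
    my ($pos, $delta, $len) = @_;
    $pos += $delta;
    $pos = 0    if $pos < 0;       # can't go before the start
    $pos = $len if $pos > $len;    # can't go past the end
    return $pos;
}

my $len = length("hello");   # 5

print clamp_pos(0,   3, $len), "\n";   # 3: within bounds, unchanged
print clamp_pos(3, -10, $len), "\n";   # 0: clamped at the start
print clamp_pos(3,  10, $len), "\n";   # 5: clamped at the end
```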

Part 4: END-OF-STRING CHECK

Simple but essential:
    eos_check => sub {
        return $pos >= length($string);
    },
Returns true when we've scanned to the end. Useful for loop conditions.

Part 5: THE FIND CLOSURE - WHERE THE MAGIC HAPPENS

This is the heart of the scanner:
    find => sub {
        my ($pattern) = @_;
        my $remainder = substr($string, $pos);

        if ($remainder =~ /$pattern/) {
            my $match_start = $-[0];             # Position where match began
            my $match       = $&;                # The matched text
            my $match_len   = length($match);

            $pos += $match_start + $match_len;   # Advance past match

            return {
                match   => $match,
                start   => $match_start,
                length  => $match_len,
                new_pos => $pos,
            };
        }

        return undef;    # No match found
    },
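A short trace may help show what find hands back. This is a sketch using a cut-down copy of the scanner with only pos and find; the input string and patterns are just examples. Two things are worth noticing: start is measured from the current position, not from the beginning of the original string, and find will happily skip over text that sits before the match.

```perl
use strict;
use warnings;

# Cut-down copy of the scanner (pos and find only) for tracing.
sub create_scanner {
    my ($string) = @_;
    my $pos = 0;

    return {
        pos  => sub { return $pos },
        find => sub {
            my ($pattern) = @_;
            my $remainder = substr($string, $pos);

            if ($remainder =~ /$pattern/) {
                my $match_start = $-[0];
                my $match       = $&;
                my $match_len   = length($match);
                $pos += $match_start + $match_len;
                return {
                    match   => $match,
                    start   => $match_start,
                    length  => $match_len,
                    new_pos => $pos,
                };
            }
            return undef;
        },
    };
}

my $scanner = create_scanner("foo = 42 + bar");

my $word   = $scanner->{find}->('\w+');   # matches "foo" right at the cursor
my $number = $scanner->{find}->('\d+');   # skips " = " to reach "42"
my $miss   = $scanner->{find}->('xyz');   # no match, position unchanged

printf "%s start=%d new_pos=%d\n", @{$word}{qw(match start new_pos)};
printf "%s start=%d new_pos=%d\n", @{$number}{qw(match start new_pos)};
print  "miss is ", (defined $miss ? "defined" : "undef"),
       ", pos is ", $scanner->{pos}->(), "\n";
```

This prints "foo start=0 new_pos=3", then "42 start=3 new_pos=8", then "miss is undef, pos is 8".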

Part 6: THE SPECIAL VARIABLE $-[0]

Here's a gem many Perl programmers don't know about: $-[0]

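After a successful regex match, $-[0] contains the offset where the match started, measured from the beginning of the string the regex was applied to. It's the start position of $& (the full match). In find, that string is $remainder, not $string, which is why the offset is added to $pos rather than assigned to it.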

There's also $+[0] which is where the match ended. Together:

    $-[0]    # Match start position
    $+[0]    # Match end position (one past last character)
    $&       # The matched text itself
If you have capture groups, $-[1], $-[2], etc. give the start offset of each group, and $+[1], $+[2], etc. give the matching end offsets.
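Here's a small sketch of those offsets with two capture groups. The input string and the pattern are made up for illustration; the leading spaces are there so the match doesn't start at offset 0.

```perl
use strict;
use warnings;

my $text = "  key=value";   # example input with two leading spaces
my @spans;

if ($text =~ /(\w+)=(\w+)/) {
    # Group 0 is the whole match; groups 1 and 2 are the captures.
    for my $group (0 .. 2) {
        my ($start, $end) = ($-[$group], $+[$group]);
        my $piece = substr($text, $start, $end - $start);
        push @spans, [$start, $end, $piece];
        printf "group %d: %d..%d '%s'\n", $group, $start, $end, $piece;
    }
}
```

This prints:

    group 0: 2..11 'key=value'
    group 1: 2..5 'key'
    group 2: 6..11 'value'

Slicing with substr and the two offsets recovers the same text you'd get from $&, $1, and $2.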

Part 7: PUTTING IT ALL TOGETHER

Here's the complete scanner:
    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;

        return {
            pos => sub {
                return $pos;
            },

            mod_pos => sub {
                my ($delta) = @_;
                $pos += $delta;
                $pos = 0 if $pos < 0;
                $pos = length($string) if $pos > length($string);
                return $pos;
            },

            eos_check => sub {
                return $pos >= length($string);
            },

            find => sub {
                my ($pattern) = @_;
                my $remainder = substr($string, $pos);

                if ($remainder =~ /$pattern/) {
                    my $match_start = $-[0];
                    my $match       = $&;
                    my $match_len   = length($match);

                    $pos += $match_start + $match_len;

                    return {
                        match   => $match,
                        start   => $match_start,
                        length  => $match_len,
                        new_pos => $pos,
                    };
                }

                return undef;
            },
        };
    }

Part 8: USING THE SCANNER

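Here's how you'd use it to tokenize a simple expression. Because find searches forward from the current position rather than matching only at it, each pattern is anchored with \A so that a token has to start exactly at the cursor. Without the anchor, the whitespace skip would jump straight over "foo" to the first space:

    my $scanner = create_scanner("foo = 42 + bar");

    while (!$scanner->{eos_check}->()) {
        # Skip whitespace
        $scanner->{find}->('\A\s+');

        # Try to match a word
        if (my $result = $scanner->{find}->('\A\w+')) {
            print "WORD: $result->{match}\n";
        }
        # Try to match an operator
        elsif ($result = $scanner->{find}->('\A[=+\-*/]')) {
            print "OP: $result->{match}\n";
        }
        # Nothing matched at the cursor: stop instead of looping forever
        else {
            last;
        }
    }

Output:

    WORD: foo
    OP: =
    WORD: 42
    OP: +
    WORD: bar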
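For comparison, Perl's regex engine has a cursor of its own: the pos() function, the \G anchor, and the /gc modifiers. The sketch below tokenizes the same example string that way. It's a different technique from our closure-based scanner, shown only to put the two side by side, and the die message is an assumption about how you'd want to handle bad input.

```perl
use strict;
use warnings;

my $input = "foo = 42 + bar";
my @tokens;

pos($input) = 0;   # start the built-in cursor at the beginning

while (pos($input) < length $input) {
    # \G anchors at the cursor; /c keeps the cursor in place on failure
    if    ($input =~ /\G\s+/gc)         { next }
    elsif ($input =~ /\G(\w+)/gc)       { push @tokens, "WORD: $1" }
    elsif ($input =~ m{\G([=+\-*/])}gc) { push @tokens, "OP: $1" }
    else  { die "Unexpected character at offset " . pos($input) . "\n" }
}

print "$_\n" for @tokens;
```

Here the position lives inside the string variable itself, while our scanner keeps it in a closed-over $pos that nothing outside can touch.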

Part 9: WHY CLOSURES?

You might wonder why we don't just use a blessed object. Closures offer:
1. No class boilerplate needed
2. True encapsulation - $pos can't be accessed directly
3. Lighter weight than full OO
4. A great way to learn functional programming concepts
Plus, there's something elegant about functions that carry their own state. It's a different way of thinking that will make you a better programmer.

This pattern - returning a hash of closures - is incredibly useful. You can build state machines, iterators, parsers, and more. The string scanner is just the beginning.
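As one example of reusing the pattern, here's a minimal sketch of a list iterator built the same way. The names make_iterator, has_next, next_item, and rewind are all made up for this example.

```perl
use strict;
use warnings;

# make_iterator is a hypothetical helper: a hash of closures over a
# private list and a private index, just like the scanner's $pos.
sub make_iterator {
    my @items = @_;
    my $index = 0;

    return {
        has_next  => sub { return $index < @items },
        next_item => sub { return $index < @items ? $items[$index++] : undef },
        rewind    => sub { $index = 0; return },
    };
}

my $it = make_iterator(qw(alpha beta gamma));
my @seen;

while ($it->{has_next}->()) {
    push @seen, $it->{next_item}->();
}

print join(", ", @seen), "\n";   # alpha, beta, gamma
```

As with the scanner, @items and $index can only be reached through the closures.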

Happy scanning!

Created By: Wildcard Wizard. Copyright 2026