String Scanner
Ruby has a lovely StringScanner class for lexical scanning. Today we're building our own in Perl, using the power of closures to maintain state. It's a great exercise in functional programming concepts!
Part 1: THE CONCEPT
A string scanner lets you work through a string piece by piece, keeping track of where you are. Think of it like a cursor moving through text. You can:
- Check your current position
- Move the position around
- Check if you've reached the end
- Find patterns and advance past them
The Ruby version is object-oriented. Our Perl version uses closures - functions that remember their environment. Same power, different flavor.
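If closures are new to you, a minimal sketch may help before we build the scanner. The make_counter helper below is hypothetical and not part of the scanner; it only shows that each call creates its own private variable, which the returned function keeps alive.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper, not part of the scanner.
sub make_counter {
    my $count = 0;                    # private to this call of make_counter
    return sub { return ++$count; };  # the closure keeps $count alive
}

my $first  = make_counter();
my $second = make_counter();

print $first->(),  "\n";   # 1
print $first->(),  "\n";   # 2
print $second->(), "\n";   # 1 - a separate $count
```

The scanner uses the same idea, except that several closures share one set of variables instead of one closure owning a single counter.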
Part 2: THE STRUCTURE
Our scanner returns a hashref of closures. Each closure shares access to the same internal state:

    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;  # This state is shared by all closures

        return {
            pos       => sub { ... },
            mod_pos   => sub { ... },
            eos_check => sub { ... },
            find      => sub { ... },
        };
    }

When you call create_scanner, you get back a hashref with four function references. They all close over $pos and $string, sharing that state.
Part 3: THE POSITION CLOSURES
First, the simple ones - getting and modifying position:

    pos => sub {
        return $pos;
    },

    mod_pos => sub {
        my ($delta) = @_;
        $pos += $delta;
        $pos = 0               if $pos < 0;
        $pos = length($string) if $pos > length($string);
        return $pos;
    },

The pos closure just returns the current position. The mod_pos closure adjusts it by a delta (positive or negative) and clamps it to valid bounds. You can't go before the start or past the end.
Part 4: END-OF-STRING CHECK
Simple but essential:

    eos_check => sub {
        return $pos >= length($string);
    },

Returns true when we've scanned to the end. Useful for loop conditions.
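To watch the position and end-of-string closures from Parts 3 and 4 work together, here is a small sketch. It assumes a hypothetical cut-down constructor, make_cursor, that carries only those three closures; the bodies are the same as the ones shown so far.

```perl
use strict;
use warnings;

# make_cursor is a hypothetical cut-down scanner with only the
# position and end-of-string closures from Parts 3 and 4.
sub make_cursor {
    my ($string) = @_;
    my $pos = 0;
    return {
        pos     => sub { return $pos; },
        mod_pos => sub {
            my ($delta) = @_;
            $pos += $delta;
            $pos = 0               if $pos < 0;
            $pos = length($string) if $pos > length($string);
            return $pos;
        },
        eos_check => sub { return $pos >= length($string); },
    };
}

my $cursor = make_cursor("hello");          # length 5
print $cursor->{mod_pos}->(3),   "\n";      # 3
print $cursor->{mod_pos}->(-10), "\n";      # 0, clamped at the start
print $cursor->{mod_pos}->(99),  "\n";      # 5, clamped at the end
print(($cursor->{eos_check}->() ? "at end" : "not at end"), "\n");  # at end
```

Because of the clamping, a caller can overshoot in either direction without ever leaving the scanner in an invalid state.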
Part 5: THE FIND CLOSURE - WHERE THE MAGIC HAPPENS
This is the heart of the scanner:

    find => sub {
        my ($pattern) = @_;
        my $remainder = substr($string, $pos);

        if ($remainder =~ /$pattern/) {
            my $match_start = $-[0];           # Where the match began, relative to $remainder
            my $match       = $&;              # The matched text
            my $match_len   = length($match);

            $pos += $match_start + $match_len; # Advance past match

            return {
                match   => $match,
                start   => $match_start,       # Relative to the old position
                length  => $match_len,
                new_pos => $pos,               # Absolute position in $string
            };
        }

        return undef;  # No match found
    },
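To make the return value concrete, here is a sketch that calls find twice on a short string. It assumes a hypothetical cut-down constructor, make_finder, holding only pos and find, with the find body copied from above. Note that start is measured from the position before the call, while new_pos is an absolute offset into the whole string.

```perl
use strict;
use warnings;

# make_finder is a hypothetical cut-down scanner: just pos and find.
sub make_finder {
    my ($string) = @_;
    my $pos = 0;
    return {
        pos  => sub { return $pos; },
        find => sub {
            my ($pattern) = @_;
            my $remainder = substr($string, $pos);
            if ($remainder =~ /$pattern/) {
                my $match_start = $-[0];
                my $match       = $&;
                my $match_len   = length($match);
                $pos += $match_start + $match_len;
                return {
                    match   => $match,
                    start   => $match_start,
                    length  => $match_len,
                    new_pos => $pos,
                };
            }
            return undef;
        },
    };
}

my $finder = make_finder("foo = 42");

my $eq = $finder->{find}->('=');          # searches "foo = 42"
printf "%s start=%d new_pos=%d\n", $eq->{match}, $eq->{start}, $eq->{new_pos};
# = start=4 new_pos=5

my $num = $finder->{find}->('\d+');       # searches " 42"
printf "%s start=%d new_pos=%d\n", $num->{match}, $num->{start}, $num->{new_pos};
# 42 start=1 new_pos=8
```

The second call reports start=1 because the search began at position 5, so the digits sit one character into the remainder.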
Part 6: THE SPECIAL VARIABLE $-[0]
Here's a gem many Perl programmers don't know about: $-[0]
After a successful regex match, $-[0] contains the offset where the match started within the string. It's the start position of $& (the full match).
There's also $+[0] which is where the match ended. Together:
    $-[0]   # Match start position
    $+[0]   # Match end position (one past last character)
    $&      # The matched text itself

If you have capture groups, $-[1], $-[2], etc. give their start offsets, and $+[1], $+[2], etc. give their end offsets.
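A short sketch can make these offsets tangible. The offsets_for helper below is hypothetical; it collects a [start, end] pair for the whole match and for each capture group from Perl's built-in @- and @+ arrays.

```perl
use strict;
use warnings;

# offsets_for is a hypothetical helper: it returns one [start, end] pair
# for the whole match, followed by one pair per matched capture group.
sub offsets_for {
    my ($text, $regex) = @_;
    return unless $text =~ $regex;
    return map { [ $-[$_], $+[$_] ] } 0 .. $#-;
}

my $text  = "width = 120px";
my @spans = offsets_for($text, qr/(\d+)(px)/);

for my $i (0 .. $#spans) {
    my ($start, $end) = @{ $spans[$i] };
    # substr with the two offsets recovers the same text as $& or $1, $2
    printf "%d: %s (%d to %d)\n",
        $i, substr($text, $start, $end - $start), $start, $end;
}
# 0: 120px (8 to 13)
# 1: 120 (8 to 11)
# 2: px (11 to 13)
```

Since the end offset is one past the last character, end minus start is always the length of the matched text.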
Part 7: PUTTING IT ALL TOGETHER
Here's the complete scanner:

    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;

        return {
            pos => sub {
                return $pos;
            },

            mod_pos => sub {
                my ($delta) = @_;
                $pos += $delta;
                $pos = 0               if $pos < 0;
                $pos = length($string) if $pos > length($string);
                return $pos;
            },

            eos_check => sub {
                return $pos >= length($string);
            },

            find => sub {
                my ($pattern) = @_;
                my $remainder = substr($string, $pos);

                if ($remainder =~ /$pattern/) {
                    my $match_start = $-[0];
                    my $match       = $&;
                    my $match_len   = length($match);

                    $pos += $match_start + $match_len;

                    return {
                        match   => $match,
                        start   => $match_start,
                        length  => $match_len,
                        new_pos => $pos,
                    };
                }

                return undef;
            },
        };
    }
Part 8: USING THE SCANNER
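Here's how you'd use it to tokenize a simple expression. Because find searches forward from the current position, each pattern is anchored with \A so that it only matches right at the cursor; without the anchor, the whitespace skip would jump straight past "foo".

    my $scanner = create_scanner("foo = 42 + bar");

    while (!$scanner->{eos_check}->()) {
        # Skip whitespace at the cursor
        $scanner->{find}->('\A\s+');

        # Try to match a word
        if (my $result = $scanner->{find}->('\A\w+')) {
            print "WORD: $result->{match}\n";
        }
        # Try to match an operator
        elsif ($result = $scanner->{find}->('\A[=+\-*/]')) {
            print "OP: $result->{match}\n";
        }
        # Nothing recognized: step over one character so the loop always ends
        else {
            $scanner->{mod_pos}->(1);
        }
    }

Output:

    WORD: foo
    OP: =
    WORD: 42
    OP: +
    WORD: bar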
Part 9: WHY CLOSURES?
You might wonder why we don't just use a blessed object. Closures offer:
1. No class boilerplate needed
2. True encapsulation - $pos can't be accessed directly
3. Lighter weight than full OO
4. A great way to learn functional programming concepts
Plus, there's something elegant about functions that carry their own state. It's a different way of thinking that will make you a better programmer.
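The encapsulation point is easiest to see side by side. The sketch below uses a hypothetical Counter class and a hypothetical make_counter function (neither is part of the scanner) to compare a blessed hash, whose fields any caller can overwrite, with a closure, whose variable is out of reach.

```perl
use strict;
use warnings;

# Counter is a hypothetical blessed-hash class, shown only for contrast.
{
    package Counter;
    sub new  { my ($class) = @_; return bless { count => 0 }, $class; }
    sub tick { my ($self)  = @_; return ++$self->{count}; }
}

# The closure version of the same hypothetical counter.
sub make_counter {
    my $count = 0;
    return { tick => sub { return ++$count; } };
}

my $object = Counter->new;
$object->tick();
$object->{count} = 100;              # outside code can reach into the hash
print $object->tick(), "\n";         # 101

my $closure = make_counter();
$closure->{tick}->();
$closure->{count} = 100;             # only adds an unrelated hash key
print $closure->{tick}->(), "\n";    # 2 - the real $count is untouched
```

The trade-off is that closure state is also harder to inspect when debugging, so this is a design choice rather than a strict improvement.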
This pattern - returning a hash of closures - is incredibly useful. You can build state machines, iterators, parsers, and more. The string scanner is just the beginning.
Happy scanning!
Created By: Wildcard Wizard. Copyright 2026