Date Created: 2024-07-13
Date Modified: 2024-07-13

============================================================================
STRING SCANNER
============================================================================

Ruby has a lovely StringScanner class for lexical scanning. Today we're
building our own in Perl, using the power of closures to maintain state.
It's a great exercise in functional programming concepts!

============================================================================
PART 1: THE CONCEPT
============================================================================

A string scanner lets you work through a string piece by piece, keeping
track of where you are. Think of it like a cursor moving through text.

You can:

- Check your current position
- Move the position around
- Check if you've reached the end
- Find patterns and advance past them

The Ruby version is object-oriented. Our Perl version uses closures -
functions that remember their environment. Same power, different flavor.

============================================================================
PART 2: THE STRUCTURE
============================================================================

Our scanner returns a hash of closures. Each closure shares access to the
same internal state:

    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;    # This state is shared by all closures

        return {
            pos       => sub { ... },
            mod_pos   => sub { ... },
            eos_check => sub { ... },
            find      => sub { ... },
        };
    }

When you call create_scanner, you get back a hashref with four function
references. They all close over $pos and $string, sharing that state.
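
If the shared-state idea feels abstract, here is a minimal sketch of the same
pattern with the scanning stripped away. The make_counter helper and its
bump/value keys are made up for illustration and are not part of the scanner.
It shows that closures from one call share a variable, while a second call
gets its own private copy:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the scanner): two closures share $count.
sub make_counter {
    my $count = 0;    # private to this particular call of make_counter

    return {
        bump  => sub { return ++$count; },    # modifies the shared variable
        value => sub { return $count; },      # reads the same variable
    };
}

my $first  = make_counter();
my $second = make_counter();

$first->{bump}->() for 1 .. 3;    # only affects $first's $count
$second->{bump}->();              # only affects $second's $count

print "first:  ", $first->{value}->(),  "\n";    # prints "first:  3"
print "second: ", $second->{value}->(), "\n";    # prints "second: 1"
```

The scanner works the same way: every call to create_scanner gets a fresh
$pos and $string, so two scanners never interfere with each other.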
============================================================================
PART 3: THE POSITION CLOSURES
============================================================================

First, the simple ones - getting and modifying position:

    pos => sub {
        return $pos;
    },

    mod_pos => sub {
        my ($delta) = @_;
        $pos += $delta;
        $pos = 0 if $pos < 0;
        $pos = length($string) if $pos > length($string);
        return $pos;
    },

The pos closure just returns current position. The mod_pos closure adjusts
it by a delta (positive or negative) and clamps it to valid bounds. You
can't go before the start or past the end.

============================================================================
PART 4: END-OF-STRING CHECK
============================================================================

Simple but essential:

    eos_check => sub {
        return $pos >= length($string);
    },

Returns true when we've scanned to the end. Useful for loop conditions.

============================================================================
PART 5: THE FIND CLOSURE - WHERE THE MAGIC HAPPENS
============================================================================

This is the heart of the scanner:

    find => sub {
        my ($pattern) = @_;
        my $remainder = substr($string, $pos);

        if ($remainder =~ /$pattern/) {
            my $match_start = $-[0];              # Position where match began
            my $match       = $&;                 # The matched text
            my $match_len   = length($match);

            $pos += $match_start + $match_len;    # Advance past match

            return {
                match   => $match,
                start   => $match_start,
                length  => $match_len,
                new_pos => $pos,
            };
        }
        return undef;    # No match found
    },

============================================================================
PART 6: THE SPECIAL VARIABLE $-[0]
============================================================================

Here's a gem many Perl programmers don't know about: $-[0]

After a successful regex match, $-[0] contains the offset where the match
started within the string. It's the start position of $& (the full match).
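
One detail worth spelling out: find matches against $remainder, so the
$-[0] it sees is relative to the current position, not to the start of the
original string. Here is a small sketch of converting between the two. The
sample string and the starting position of 6 are assumptions picked for
illustration:

```perl
use strict;
use warnings;

my $string    = "foo = 42 + bar";
my $pos       = 6;                        # pretend the scanner is already here
my $remainder = substr($string, $pos);    # "42 + bar"

my ($relative, $absolute);
if ($remainder =~ /\+/) {
    $relative = $-[0];               # offset of the match within $remainder
    $absolute = $pos + $relative;    # offset of the match within $string
}

print "relative: $relative\n";                                  # relative: 3
print "absolute: $absolute\n";                                  # absolute: 9
print "char there: ", substr($string, $absolute, 1), "\n";      # char there: +
```

That is why find adds $match_start to $pos instead of assigning it, and why
the start value it returns is measured from where the scan began.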

There's also $+[0], which is where the match ended. Together:

    $-[0]    # Match start position
    $+[0]    # Match end position (one past last character)
    $&       # The matched text itself

If you have capture groups, $-[1], $-[2], etc. give you their start
positions, and $+[1], $+[2], etc. give you their end positions.

============================================================================
PART 7: PUTTING IT ALL TOGETHER
============================================================================

Here's the complete scanner:

    sub create_scanner {
        my ($string) = @_;
        my $pos = 0;

        return {
            pos => sub {
                return $pos;
            },

            mod_pos => sub {
                my ($delta) = @_;
                $pos += $delta;
                $pos = 0 if $pos < 0;
                $pos = length($string) if $pos > length($string);
                return $pos;
            },

            eos_check => sub {
                return $pos >= length($string);
            },

            find => sub {
                my ($pattern) = @_;
                my $remainder = substr($string, $pos);

                if ($remainder =~ /$pattern/) {
                    my $match_start = $-[0];
                    my $match       = $&;
                    my $match_len   = length($match);

                    $pos += $match_start + $match_len;

                    return {
                        match   => $match,
                        start   => $match_start,
                        length  => $match_len,
                        new_pos => $pos,
                    };
                }
                return undef;
            },
        };
    }

============================================================================
PART 8: USING THE SCANNER
============================================================================

Here's how you'd use it to tokenize a simple expression. Note that find
searches forward from the current position, so an unanchored '\s+' or '\w+'
would happily jump over other tokens to reach its match. Anchoring each
pattern with \A makes it match only at the cursor:

    my $scanner = create_scanner("foo = 42 + bar");

    while (!$scanner->{eos_check}->()) {
        # Skip whitespace at the current position, if any
        $scanner->{find}->('\A\s+');

        # Try to match a word
        if (my $result = $scanner->{find}->('\A\w+')) {
            print "WORD: $result->{match}\n";
        }
        # Try to match an operator
        elsif ($result = $scanner->{find}->('\A[=+\-*/]')) {
            print "OP: $result->{match}\n";
        }
        # Unrecognized character: step over it so the loop always advances
        else {
            $scanner->{mod_pos}->(1);
        }
    }

Output:

    WORD: foo
    OP: =
    WORD: 42
    OP: +
    WORD: bar

============================================================================
PART 9: WHY CLOSURES?
============================================================================

You might wonder why we don't just use a blessed object. Closures offer:

1. No class boilerplate needed
2. True encapsulation - $pos can't be accessed directly
3. Lighter weight than full OO
4. A great way to learn functional programming concepts

Plus, there's something elegant about functions that carry their own state.
It's a different way of thinking that will make you a better programmer.

============================================================================

This pattern - returning a hash of closures - is incredibly useful. You can
build state machines, iterators, parsers, and more. The string scanner is
just the beginning.

Happy scanning!

perl.gg