String Scanner: A Ruby-Inspired Parser
In the world of Perl programming, we often encounter tasks that require parsing and manipulating strings. Today, we’re going to explore my implementation of a string scanner in Perl, inspired by Ruby’s StringScanner. This powerful tool allows us to traverse a string, find patterns, and keep track of our position - all with a clean, functional interface.
Let’s dive into the code and break down its functionality:
#!/usr/bin/env perl
use strict;
use warnings;
package strscan;
sub create
{my $string = qq|@_|;
@_ = (); # Clear @_ (assigning undef would leave a one-element list)
# start position
my $pos = 0;
# end of string position
my $eos = length ( $string );
return
{pos => sub { return $pos }, # Closure to return current position
mod_pos => sub { $pos = shift }, # Closure to modify position
eos_check => sub { return $eos == $pos ? 0 : 1; }, # Check if not at end of string
find => sub # Closure to find regex
{my $regex = shift;
# Match regex against substring from current position
if ( my ($found) = substr ( $string, $pos ) =~ m~($regex)~ )
{# $-[0] contains the start offset of the match within the substring
my ( $start, $length ) = ( $-[0], length ( $found ) );
# Update position: add start offset and length of match
$pos += $start + $length;
return # Return hash ref with match details
{pos => $pos,
match_start => $start,
match_length => $length,
match => $found,
}
}else
{$pos = $eos; # If no match, set position to end of string
return undef;
}
}
}
}
1;
__END__
This code defines a package called strscan with a single function create. Let’s break down what’s happening:
- The create function takes a string as input and initializes the scanner.
- It sets up two important variables: $pos (current position in the string) and $eos (end of string position).
- The function returns a hash reference containing four closures:
  - pos: Returns the current position
  - mod_pos: Sets the current position to the value passed in
  - eos_check: Returns 1 while there is still input left and 0 once the position has reached the end of the string
  - find: The core function that searches for a regex pattern
The find function is where the real magic of our string scanner happens:
- It accepts a regex pattern as its input, allowing for flexible searching.
- Using substr, it creates a slice of the string starting from the current position, then attempts to match the provided regex against this substring.
- If a match is found:
  - It reads $-[0], the first element of Perl’s special @- array, to get the start offset of the match. Because the regex ran against the substring, this offset is relative to the scanner’s position before the call, not to the start of the full string.
  - It calculates the length of the matched text.
  - It advances the scanner’s position by the start offset plus the match length, moving just past the matched portion.
  - It returns a hash reference with the new position (pos), the relative start offset (match_start), the length (match_length), and the matched text itself (match).
- If no match is found:
  - It moves the position to the end of the string, signaling that scanning is complete.
  - It returns undef to indicate that nothing matched.
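To make the offset arithmetic concrete, here is a minimal standalone sketch of what a single find call does internally. It assumes the scanner has already consumed "This" (so the position is 4); the variable names $abs_start and $new_pos are mine and do not appear in the package.

```perl
use strict;
use warnings;

my $string = 'This is just a test!';
my $pos    = 4;    # assumption: "This" has already been consumed

my ( $abs_start, $new_pos );

# The regex runs against " is just a test!", not the full string.
if ( my ($found) = substr( $string, $pos ) =~ m~(\w+)~ ) {
    my $rel_start = $-[0];                     # 1: offset inside the substring
    $abs_start = $pos + $rel_start;            # 5: offset inside the full string
    $new_pos   = $abs_start + length $found;   # 7: just past "is"
    print "matched '$found' at $abs_start, scanner moves to $new_pos\n";
}
```

This prints "matched 'is' at 5, scanner moves to 7". The leading space is skipped because $-[0] is 1, which is why the position update has to add the start offset as well as the match length.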
Example Use
Now, let’s look at how we can use this scanner:
# Demo
package main;
use Data::Dumper;
use feature qw|say|;
my $scan = strscan::create ( 'This is just a test!' );
say q|start position: |, $scan->{pos}(), qq|\n|; # return position
while ( $scan->{eos_check}() )
{ if ( my $match = $scan->{find}('\w+') )
{ say q|match: |, $match->{match};
say q|pos: |, $match->{pos};
say q||;
}
}
say q|end position: |, $scan->{pos}(); # return position
In This Demo:
- We create a new scanner with the string “This is just a test!”.
- We print the start position (which is 0).
- We enter a loop that continues until we reach the end of the string.
- In each iteration, we search for one or more word characters (‘\w+’).
- For each match, we print the matched word and the new position.
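- Finally, we print the end position, which is 20 (the length of the string): the last find fails on the trailing “!” and moves the position to the end, which stops the loop.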
This scanner provides a flexible way to parse strings, allowing us to easily move through the string and find patterns. It’s particularly useful for tasks like tokenizing input or parsing structured text.
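As a rough illustration of the tokenizing use case, the sketch below collects every word together with its offsets in the full string. So that it runs on its own, it re-creates a trimmed-down scanner inline; make_scanner, the more key, and the start/end fields are hypothetical names for this example, not part of the strscan package.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down stand-in for strscan::create.
sub make_scanner {
    my ($string) = @_;
    my $pos = 0;
    my $eos = length $string;
    return {
        more => sub { return $pos < $eos },
        find => sub {
            my ($regex) = @_;
            if ( my ($found) = substr( $string, $pos ) =~ m~($regex)~ ) {
                my $abs_start = $pos + $-[0];    # offset in the full string
                $pos = $abs_start + length $found;
                return { match => $found, start => $abs_start, end => $pos };
            }
            $pos = $eos;    # nothing left to match: jump to the end
            return;
        },
    };
}

my $scan = make_scanner('This is just a test!');
my @tokens;
while ( $scan->{more}->() ) {
    my $m = $scan->{find}->('\w+') or last;    # stop once nothing matches
    push @tokens, [ $m->{match}, $m->{start}, $m->{end} ];
}

# One line per token: text, start offset, end offset
printf "%-5s %2d..%2d\n", @$_ for @tokens;
```

Unlike the original find, this variant reports absolute offsets, which tends to be more convenient when the tokens are later mapped back to the source text.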
The use of closures in this implementation is a powerful Perl technique. It allows us to maintain state (the position and string) without using global variables, providing a clean and encapsulated interface.
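If the closure idea is new to you, the same pattern can be seen in isolation with a tiny counter. This is only a sketch for illustration; make_counter and its keys are hypothetical and unrelated to strscan.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of strscan.
sub make_counter {
    my $count = 0;    # private to this call; only the closures below can reach it
    return {
        increment => sub { return ++$count },
        value     => sub { return $count },
    };
}

my $first  = make_counter();
my $second = make_counter();

$first->{increment}->() for 1 .. 3;
$second->{increment}->();

print 'first: ',  $first->{value}->(),  "\n";    # first: 3
print 'second: ', $second->{value}->(), "\n";    # second: 1
```

Each call to make_counter gets its own $count, just as each call to strscan::create gets its own $string and $pos, so several scanners can be active at once without interfering with each other.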
In conclusion, this Perl string scanner demonstrates how we can create powerful, Ruby-inspired tools using Perl’s flexible syntax and functional programming capabilities. It’s a testament to Perl’s expressiveness and ability to handle complex string manipulation tasks with elegance.
Copyright ©️ 2024 perl.gg