<!-- category: hidden-gems -->
$& $` $' - The Match Variable Performance Curse
Three variables. If you use any one of them, anywhere in your entire
program, every single regex in your entire program gets slower.
Not just the regex near the variable. Every regex. In every module.
In every library you loaded. All of them. Slower. Because you typed
$& one time.

    use Some::Module;      # has 200 regexes internally
    use Another::Module;   # has 150 more

    # somewhere in YOUR code:
    my $match = $&;        # congratulations, you just slowed down
                           # 350 regexes that aren't yours

This is the match variable performance curse. It is one of the most infamous gotchas in Perl's history. It affected real production systems for decades. And it is still lurking in codebases that have not been updated.
Part 1: WHAT THE VARIABLES CONTAIN
The three match variables capture parts of the string involved in the most recent successful regex match:

    my $string = "Hello, World!";
    $string =~ m~World~;

    say $`;   # "Hello, "  (prematch - everything before the match)
    say $&;   # "World"    (match - the matched substring)
    say $';   # "!"        (postmatch - everything after the match)

Together, they let you reconstruct the full original string:

    H e l l o ,   W o r l d !
    |___________|_________|_|
         $`         $&     $'
      prematch     match   postmatch

$` . $& . $' equals the original string. They give you complete
context about where in the string the match occurred.
Handy, right? Sure. If you do not care about performance.
Part 2: THE GLOBAL PERFORMANCE PENALTY
Here is the horrifying part. Perl's regex engine is optimized to avoid unnecessary work. Normally, when a regex matches, Perl only needs to know where it matched and what the capture groups contain. It does not need to copy the parts of the string before and after the match.

But if $&, $`, or $' might be accessed, Perl has to
compute all three for every match. That means copying substrings
of the target string on every successful regex. For long strings,
that is a lot of copying.
The key word is "might." Perl checks at compile time whether any code in the entire program uses these variables. If any code does, the engine activates the expensive path for every regex operation. Not just the regexes near the variable usage. Every regex. Globally.

    Program WITHOUT $& / $` / $':
      regex match --> record position --> done (fast)

    Program WITH $& / $` / $' anywhere:
      regex match --> record position --> copy prematch substring
                  --> copy match substring
                  --> copy postmatch substring --> done (slow)

One variable used once. Every regex pays the tax forever.
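
Want to see the tax for yourself? Here is a rough timing sketch, not a rigorous benchmark. It assumes you have a Perl older than 5.20 to run it on; on newer Perls both runs should look about the same. The USE_AMPERSAND environment switch and the time_matches helper are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical switch: run with USE_AMPERSAND=1 to mention $& once.
# The string eval keeps $& out of the normally compiled code, so perl
# only sees the variable when the switch is on.
eval q{ my $unused = $&; 1 } if $ENV{USE_AMPERSAND};

# Time $count successful matches against a long target string.
sub time_matches {
    my ($count) = @_;
    my $target = ('x' x 100_000) . 'needle' . ('y' x 100_000);
    my $hits   = 0;
    my $start  = time;
    for (1 .. $count) {
        $hits++ if $target =~ m~needle~;
    }
    return ($hits, time - $start);
}

my ($hits, $elapsed) = time_matches(2_000);
printf "perl %vd: %d matches in %.4f s\n", $^V, $hits, $elapsed;
```

Run it twice, once plain and once with USE_AMPERSAND=1, and compare the two times. The regex in the loop never touches $&. It only pays for it.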
Part 3: WHY PERL DOES THIS
Perl cannot know at compile time which regexes will execute before the code that reads $&. So it has to prepare all of them.
Consider:

    sub process {
        my ($text) = @_;
        $text =~ m~important~;   # does this need to set $& ?
        return $text;
    }

    # ... 5000 lines later ...
    $x =~ m~pattern~;
    print $&;   # reads $& from the LAST successful match

Which regex sets $&? It depends on execution order, which is
unknowable at compile time. So Perl takes the conservative approach:
if $& appears anywhere, compute it everywhere.
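
A tiny sketch of why the compiler cannot work this out ahead of time. The matched_text helper and its flag are invented for illustration. Which regex runs is decided by a runtime value, so only execution can tell which match $& will describe.

```perl
use strict;
use warnings;

# Hypothetical helper: a runtime flag picks which pattern runs.
sub matched_text {
    my ($text, $use_first) = @_;
    # Both matches sit in the same block, so whichever one runs is the
    # "last successful match" that $& reports below.
    my $ok = $use_first ? $text =~ m~a+~ : $text =~ m~b+~;
    return $ok ? $& : undef;
}

print matched_text('aaabb', 1), "\n";   # aaa
print matched_text('aaabb', 0), "\n";   # bb
```

Same line reading $&, two different regexes feeding it. Before 5.20, Perl had to keep both of them ready.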
This is a global property of the program. The regex engine has one flag: "are match variables in use?" If yes, every regex pays. There is no way to say "only compute $& for this one regex."
At least, not until Perl 5.10.
Part 4: THE ENGLISH MODULE TRAP
The English module gives human-readable names to special variables.
It also activates the performance curse.

    use English;   # DANGER

    # These are now available:
    # $PREMATCH   (alias for $`)
    # $MATCH      (alias for $&)
    # $POSTMATCH  (alias for $')

Just loading the English module without the right flag imports the match variable aliases. And importing the aliases counts as "using" them. Your program gets slower just from the use statement, even
if you never reference $MATCH in your code.
The fix:

    use English qw(-no_match_vars);   # safe!

The -no_match_vars flag tells English to skip the three dangerous
aliases. You still get $INPUT_RECORD_SEPARATOR, $EVAL_ERROR,
and all the other friendly names. You just do not get the three
that destroy performance.
If you see use English; without -no_match_vars in a codebase that
may run on a Perl older than 5.20, fix it immediately. On those
versions, that bare use English is silently slowing down every regex
in the program.
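
To show that the friendly names still work with the flag on, here is a minimal sketch. The safe_divide helper is made up for this example. It leans on $EVAL_ERROR, the English name for $@.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Hypothetical helper: divide, or report what went wrong.
sub safe_divide {
    my ($numerator, $denominator) = @_;
    my $result = eval { $numerator / $denominator };
    return defined $result ? $result : "error: $EVAL_ERROR";
}

print safe_divide(6, 3), "\n";   # 2
print safe_divide(1, 0);         # error: Illegal division by zero at ...
```

Readable names, no match variable aliases.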
Part 5: THE SAFE ALTERNATIVES (5.10+)
Perl 5.10 introduced three new variables that do the same thing as $&, $`, and $' without the global penalty:

    my $string = "Hello, World!";
    $string =~ m~World~p;   # note the /p flag

    say ${^PREMATCH};    # "Hello, "
    say ${^MATCH};       # "World"
    say ${^POSTMATCH};   # "!"

The /p flag tells the regex engine to compute the match variables
for this specific match only. No global flag. No penalty on other
regexes. Just this one, right here, pays the cost.

    OLD (global penalty):
      $string =~ m~pattern~;
      say $&;           # every regex in the program is slower

    NEW (per-regex cost):
      $string =~ m~pattern~p;
      say ${^MATCH};    # only THIS regex pays the cost

On Perl 5.10 through 5.18, the ${^PREMATCH}, ${^MATCH}, and
${^POSTMATCH} variables are only guaranteed to be defined after a
regex with the /p flag. From 5.20 on, /p does nothing and they behave
like $`, $&, and $'. This is the opt-in behavior that should have
existed from the start.
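
Here is a small sketch of /p doing real work. The highlight helper is invented for this example. It wraps the first match in brackets and keeps the text on either side.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: bracket the first match of $pattern in $text.
sub highlight {
    my ($text, $pattern) = @_;
    return $text unless $text =~ m~$pattern~p;   # /p: preserve this match only
    return ${^PREMATCH} . '[' . ${^MATCH} . ']' . ${^POSTMATCH};
}

say highlight('Hello, World!', qr~World~);   # Hello, [World]!
say highlight('Hello, World!', qr~Perl~);    # Hello, World!
```

The /p sits on the match that needs the context. Nothing else in the program is affected.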
Part 6: PERL 5.20 FIXED THE CURSE (MOSTLY)
Starting with Perl 5.20, the global penalty for $&, $`, and
$' was removed. These variables are now computed on demand, only
for the regex that was most recently successful. No global flag. No
penalty on other regexes.

    # Perl 5.20+: this is fine now
    $string =~ m~pattern~;
    say $&;   # no global performance penalty

So is the curse dead? Almost. Here is why you should still care:
- Legacy code might still run on Perl before 5.20. Many production
  systems still run older versions.
- The /p flag and ${^MATCH} variables are still the explicit,
  opt-in way to ask for match context.
- If you write a CPAN module, your code might run on any Perl
  version. Using $& in a CPAN module penalizes every user on
  Perl before 5.20.
- Habits matter. Understanding why $& was dangerous teaches you
  to watch for global side effects.
Part 7: HOW TO CHECK IF YOUR CODE IS AFFECTED
Wondering if your codebase uses the dangerous variables? Grep for them:

    $ grep -rn '\$&\|\$`\|\$'"'" lib/ bin/

That grep is ugly because $' conflicts with shell quoting. An
easier approach:

    $ perl -ne 'print "$ARGV:$.: $_" if /\$[&`'"'"']/' lib/*.pm

Or use Perl::Critic, which has a policy specifically for this:

    $ perlcritic --single-policy ProhibitMatchVars lib/

The Perl::Critic policy Variables::ProhibitMatchVars
flags any use of $&, $`, $', or use English without
-no_match_vars. It ships with the core Perl::Critic distribution at
severity 4, so a plain perlcritic run at the default severity 5 does
not apply it. Name it with --single-policy as above, or run with
--severity 4 or lower.
Also check for modules you depend on. If any CPAN module you load
uses $& internally, your program pays the price. You can check
with:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';

    # load your modules
    use Some::Module;
    use Another::Module;

    # Devel::SawAmpersand (a CPAN module, not core) reports whether perl
    # has seen $&, $` or $' in any code compiled so far. The answer is
    # only meaningful on Perl before 5.20.
    use Devel::SawAmpersand qw(sawampersand);
    say "Match vars penalty active" if sawampersand();

    # Do not test with eval '$&': compiling that string mentions $&
    # itself, so the check would always report the penalty.

In practice, just run Perl::Critic. It catches the common cases.
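
If you would rather not fight shell quoting at all, the same scan fits in a short Perl script. This is a crude, line-based sketch: the find_match_vars helper is made up for this example, and the pattern will also flag things that merely look like match variables, such as a regex string ending in $'. Treat the output as a list of places to look, not a verdict.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "file:line: text" for every line that
# mentions $&, $`, $' or a use English without -no_match_vars.
sub find_match_vars {
    my (@dirs) = @_;
    return unless @dirs;
    my @hits;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ m~\.(?:pl|pm|t)\z~;
                open my $fh, '<', $path or return;
                while (my $line = <$fh>) {
                    next if $line =~ m~^\s*#~;   # skip whole-line comments
                    my $bare_english = $line =~ m~^\s*use\s+English\b~
                        && $line !~ m~-no_match_vars~;
                    push @hits, "$path:$.: $line"
                        if $bare_english || $line =~ m~\$[&`']~;
                }
                close $fh;
            },
        },
        @dirs,
    );
    return @hits;
}

# Scan lib/ and bin/ if they exist in the current directory.
print for find_match_vars(grep { -d $_ } qw(lib bin));
```

Point it at a checkout of a dependency's source tree to audit code you did not write.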
Part 8: THE HISTORICAL PAIN
This is not a theoretical problem. Real CPAN modules shipped with $& in their code. Every program that used those modules got
slower. The module authors often had no idea.
The most infamous case was English.pm itself. For years, the
standard way to get readable variable names imported the match
variable aliases by default. The -no_match_vars fix was added
later, but by then, countless tutorials and books had taught
use English; without the flag.
Thousands of Perl programs ran slower than they needed to because of a convenience module that was supposed to make code more readable. The irony is exquisite.
The Perl porters eventually fixed the problem at the engine level in 5.20. But the 15-year gap between "we know this is a problem" and "we fixed it in the engine" left a mark on Perl culture. It made the community acutely aware of global side effects and the hidden costs of convenience.
Part 9: WHAT TO USE INSTEAD
If you need the matched text, use a capture group:

    # instead of $&
    $string =~ m~(pattern)~;
    my $matched = $1;

If you need prematch and postmatch, use @- and @+ (match
position arrays) with substr:

    $string =~ m~pattern~;
    my $prematch  = substr($string, 0, $-[0]);
    my $match     = substr($string, $-[0], $+[0] - $-[0]);
    my $postmatch = substr($string, $+[0]);

Or on Perl 5.10+, use the safe variables:

    $string =~ m~pattern~p;
    my $prematch  = ${^PREMATCH};
    my $match     = ${^MATCH};
    my $postmatch = ${^POSTMATCH};

Or on Perl 5.20+, just use $& without guilt. The penalty is gone.
But add a comment so future maintainers do not panic:

    $string =~ m~pattern~;
    my $match = $&;   # safe on 5.20+, no global penalty
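
If your code has to run on whatever Perl a user happens to have, the @- and @+ route is the one that never triggers the penalty. Here is one way to wrap it up, as a sketch. The match_parts helper is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text around the first match of $pattern
# using only the offset arrays @- and @+.
sub match_parts {
    my ($text, $pattern) = @_;
    return unless $text =~ $pattern;          # empty list when nothing matches
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($text, 0, $start),             # prematch
        substr($text, $start, $end - $start), # match
        substr($text, $end),                  # postmatch
    );
}

my $string = 'Hello, World!';
my ($pre, $match, $post) = match_parts($string, qr~World~);
print join('|', $pre, $match, $post), "\n";   # Hello, |World|!
print "round trip ok\n" if $pre . $match . $post eq $string;
```

The offsets are read inside the helper, right after the match, because @- and @+ are scoped to the enclosing block just like the match variables.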
Part 10: LESSONS FROM THE CURSE

        .--.
       |o_o |      "Three little variables.
       |:_/ |       Fifteen years of pain.
      //   \ \      One -no_match_vars flag."
     (|     | )
    /'\_   _/`\
    \___)=(___/

The match variable curse is a story about the cost of global state. Three variables that look harmless. A design decision made in Perl's early days. A performance cliff that was invisible unless you knew to look for it.
The technical problem is mostly solved in modern Perl. Perl 5.20
removed the global penalty. The /p flag exists for explicit
opt-in. Perl::Critic catches the old patterns.
But the lesson is timeless. Global side effects are dangerous. Convenience features can have hidden costs. And sometimes the most harmful code in your program is a single variable you used once, in one place, that nobody thought to question.
Know the history. Use the safe alternatives. And if you see
use English; without -no_match_vars, add the flag. Your regex
engine will thank you.
perl.gg