<!-- category: regex -->
Dynamic Regex Assembly
You have a list of words. You want to match any of them in a string. You could write this:

    if ($text =~ m~foo|bar|baz|quux~) {
        say "matched!";
    }

But what if the list changes at runtime? What if it comes from a config file, a database, or user input? You can't hardcode it.
Build the regex dynamically:

    my @words = qw(foo bar baz quux);
    my $pattern = join '|', map { quotemeta } @words;
    $text =~ m~$pattern~;

Three functions. One compiled regex. Completely safe against special characters. This is how you build patterns from data.
Part 1: THE PROBLEM
Imagine you have a blocklist:

    my @blocked = ('spam.com', 'evil.org', 'bad-stuff.net');

You want to check if a URL contains any of these. Naive approach:

    for my $domain (@blocked) {
        if ($url =~ m~$domain~) {
            say "Blocked: $domain";
            last;
        }
    }

This loops. It's slow for large lists. And it's dangerous. If a domain contains regex metacharacters like ., they are treated as regex syntax instead of literal characters.

spam.com matches spamXcom because . is a wildcard. That's a bug.
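To see the false positive for yourself, here is a minimal sketch. The two URLs are made up for illustration; only the unescaped interpolation comes from the loop above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical test data: one genuine hit and one that should NOT match.
my $domain = 'spam.com';
my @urls   = ('http://spam.com/offer', 'http://spamXcom.example/');

# Unescaped interpolation: the "." in $domain acts as a wildcard.
my @hits = grep { m~$domain~ } @urls;

say "Unescaped pattern matched: $_" for @hits;
```

Both URLs are reported, including the one that only contains "spamXcom".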
Part 2: QUOTEMETA SAVES YOU
quotemeta escapes every ASCII character that is not a letter, digit, or underscore, prefixing each one with a backslash:

    say quotemeta('spam.com');     # spam\.com
    say quotemeta('C++ rocks');    # C\+\+\ rocks
    say quotemeta('price: $5');    # price\:\ \$5
    say quotemeta('[array]');      # \[array\]

Every character that could have special meaning in a regex gets escaped. Now spam\.com matches only the literal string "spam.com", not "spamXcom".
The \Q...\E escape inside a regex does the same thing:

    $text =~ m~\Q$user_input\E~;

But when you're building a pattern from an array, quotemeta with map is cleaner.
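A small side-by-side check of the two spellings. This is only a sketch: the input string and sample text are invented for the example.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input containing regex metacharacters.
my $user_input = 'C++';
my $text       = 'I write C++ daily';

# Escape inline with \Q...\E.
my $via_q = $text =~ m~\Q$user_input\E~ ? 1 : 0;

# Escape ahead of time with quotemeta, then interpolate.
my $escaped = quotemeta $user_input;
my $via_qm  = $text =~ m~$escaped~ ? 1 : 0;

say '\Q...\E matched:   ', $via_q;
say 'quotemeta matched: ', $via_qm;
```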
Part 3: THE ASSEMBLY LINE
Here's the full pipeline:

    my @words = ('foo.bar', 'baz+quux', '$price');
    my $pattern = join '|', map { quotemeta } @words;
    say $pattern;
    # foo\.bar|baz\+quux|\$price

Step by step:

    STEP                   RESULT
    ---------------------  --------------------------------
    @words                 ('foo.bar', 'baz+quux', '$price')
    map { quotemeta }      ('foo\.bar', 'baz\+quux', '\$price')
    join '|'               'foo\.bar|baz\+quux|\$price'

The map transforms each element. The join glues them together with the alternation operator. The result is a safe regex that matches any of the original strings literally.
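If you use the pipeline in more than one place, it can be wrapped in a small sub. The name alternation_for is hypothetical, chosen for this sketch; the body is the same map and join shown above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: escape each string, then join with "|".
sub alternation_for {
    my @strings = @_;
    return join '|', map { quotemeta } @strings;
}

my $pattern = alternation_for('foo.bar', 'baz+quux', '$price');
say $pattern;

# The escaped pattern matches the literal strings and nothing looser.
say 'literal hit'  if 'cost: $price' =~ m~$pattern~;
say 'no false hit' if 'fooXbar'      !~ m~$pattern~;
```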
Part 4: PRECOMPILE WITH qr
Building the string is step one. Compiling it into a regex object is step two:

    my $re = qr~$pattern~i;

Now $re is a compiled regex. Perl parses and optimizes it once. You can use it over and over without recompilation:

    my @words = qw(error warning critical);
    my $pattern = join '|', map { quotemeta } @words;
    my $re = qr~$pattern~i;

    while (<$logfile>) {
        if (m~$re~) {
            say "ALERT: $_";
        }
    }

Without qr, Perl re-interpolates the pattern string on every iteration and has to check whether it changed before it can reuse the compiled form. With qr, the regex is compiled once up front and that object is matched many times. On a million-line log file, that difference can matter.
The i flag makes it case-insensitive. You can add any flags you want: qr~$pattern~ix for case-insensitive with extended formatting.
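A compiled qr object can also be interpolated into a larger pattern, and it keeps its own flags when you do. The sketch below assumes a made-up log format of "[LEVEL] message"; the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';

my @levels   = qw(error warning critical);
my $alt      = join '|', map { quotemeta } @levels;
my $level_re = qr~$alt~i;

# Hypothetical line format: "[LEVEL] message". The outer pattern is
# case-sensitive; the embedded $level_re keeps its own i flag.
my $line_re = qr~^\[($level_re)\]\s+(.*)~;

my @parsed;
for my $line ('[ERROR] disk full', '[info] all good') {
    if (my ($level, $msg) = $line =~ $line_re) {
        push @parsed, "$level => $msg";
    }
}

say for @parsed;
```

Only the first line is parsed: "ERROR" matches the case-insensitive inner pattern, while "info" is not in the list.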
Part 5: WORD BOUNDARY CONTROL
Sometimes you want to match whole words only, not substrings:

    # Without boundaries: "warning" matches "forewarning"
    my $re = qr~$pattern~i;

    # With boundaries: "warning" only matches the word "warning"
    my $re = qr~\b(?:$pattern)\b~i;

The \b markers require a word boundary on each side. The (?:...) is a non-capturing group that keeps the alternation contained.

Without the group:

    # BROKEN: \b only applies to first and last alternatives
    qr~\berror|warning|critical\b~
    # Matches: \berror OR warning OR critical\b

With the group:

    # CORRECT: \b applies to the entire alternation
    qr~\b(?:error|warning|critical)\b~
    # Matches: \b(error OR warning OR critical)\b

Always group your alternations. It's the kind of bug that passes tests and fails in production.
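The three variants can be compared on one input. This is a sketch with an invented sample sentence; the names loose, grouped, and broken are labels for this example only.

```perl
use strict;
use warnings;

my @words   = qw(error warning critical);
my $pattern = join '|', map { quotemeta } @words;

my $loose   = qr~$pattern~i;            # no boundaries at all
my $grouped = qr~\b(?:$pattern)\b~i;    # boundaries around the whole group
my $broken  = qr~\b$pattern\b~i;        # \berror|warning|critical\b

my $text = 'a forewarning was issued';

my %result;
for my $case ([loose => $loose], [grouped => $grouped], [broken => $broken]) {
    my ($name, $re) = @$case;
    $result{$name} = $text =~ $re ? 'match' : 'no match';
    printf "%-8s %s\n", $name, $result{$name};
}
```

The ungrouped pattern still matches inside "forewarning" because its middle alternative has no boundary attached. One caveat worth knowing: \b needs a word character on one side, so a term that starts or ends with punctuation, such as C++, will not match next to a space when wrapped in \b...\b.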
Part 6: LOG FILTERING
Real-world example. You have a list of error patterns and you want to scan a log file for any of them:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';

    my @triggers = (
        'segfault',
        'out of memory',
        'disk full',
        'connection refused',
        'timeout exceeded',
    );

    my $re = qr~@{[ join '|', map { quotemeta } @triggers ]}~i;

    open my $fh, '<', '/var/log/syslog'
        or die "Can't open syslog: $!\n";

    while (<$fh>) {
        if (m~$re~) {
            chomp;
            say "LINE $.: $_";
        }
    }

Notice the baby cart operator @{[...]} inside the qr. It evaluates the expression inline and interpolates the result. You could also write it as two lines for clarity:

    my $pattern = join '|', map { quotemeta } @triggers;
    my $re = qr~$pattern~i;

Same result. Pick whichever reads better to you.
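To try the filter without touching a real log, you can read from an in-memory filehandle. The sample log text below is invented for this sketch; the matching logic is the same as in the script above.

```perl
use strict;
use warnings;
use feature 'say';

my @triggers = ('disk full', 'connection refused');
my $pattern  = join '|', map { quotemeta } @triggers;
my $re       = qr~$pattern~i;

# Hypothetical sample log, read through an in-memory filehandle.
my $sample = "boot ok\nDisk full on /dev/sda1\nlogin ok\nConnection refused by peer\n";
open my $fh, '<', \$sample
    or die "Can't open in-memory log: $!\n";

my @alerts;
while (my $line = <$fh>) {
    chomp $line;
    push @alerts, "LINE $.: $line" if $line =~ $re;
}
close $fh;

say for @alerts;
```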
Part 7: KEYWORD HIGHLIGHTING
Turn matched words bold in terminal output:

    my @keywords = qw(error fail crash abort panic);
    my $re = qr~\b(@{[ join '|', map { quotemeta } @keywords ]})\b~i;

    while (<STDIN>) {
        s~$re~\e[1;31m$1\e[0m~g;
        print;
    }

The \e[1;31m is ANSI bold red. \e[0m resets. Every keyword lights up in your terminal.

Note the capturing parentheses inside the qr: they populate $1, which the replacement uses.

Pipe any command through this:

    tail -f /var/log/syslog | perl -pe '
        BEGIN {
            @k = qw(error fail crash);
            $re = join "|", map { quotemeta } @k;
            $re = qr~\b($re)\b~i;
        }
        s~$re~\e[31m$1\e[0m~g;
    '

Live log highlighting with zero dependencies.
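The substitution is easier to test when it lives in a sub that takes the markers as arguments. This is a sketch: the helper name highlight and the visible << >> markers are assumptions made for the example, not part of the one-liner above.

```perl
use strict;
use warnings;
use feature 'say';

my @keywords = qw(error fail crash);
my $alt      = join '|', map { quotemeta } @keywords;
my $re       = qr~\b($alt)\b~i;

# Hypothetical helper: wrap each whole-word keyword in start/end markers.
sub highlight {
    my ($line, $start, $end) = @_;
    $line =~ s~$re~$start$1$end~g;
    return $line;
}

my $line = 'Error: disk crash, failover ok';
say highlight($line, "\e[1;31m", "\e[0m");   # ANSI bold red in a terminal
say highlight($line, '<<', '>>');            # visible markers for testing
```

With the visible markers you can check that "failover" is left alone, because "fail" is not followed by a word boundary there.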
Part 8: LOADING PATTERNS FROM FILES
Patterns from a config file, one per line:

    open my $fh, '<', 'blocklist.txt'
        or die "Can't open blocklist: $!\n";

    my @blocked = map { chomp; $_ } <$fh>;
    close $fh;

    # Remove blanks and comments
    @blocked = grep { /\S/ && !/^#/ } @blocked;

    my $re = qr~@{[ join '|', map { quotemeta } @blocked ]}~i;

Now your regex is data-driven. Update the blocklist file, restart the script, and new patterns are active. No code changes needed.

For a blocklist that looks like:

    # Domains to block
    spam.com
    evil.org
    # This one is particularly bad
    bad-stuff.net

The grep strips comments and blank lines. quotemeta handles the dots. Your regex matches exactly what you listed.
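One edge case the loader does not cover: if the file is empty or holds only comments, @blocked ends up empty and the joined pattern is an empty string, which is not a reliable way to say "block nothing" (an empty pattern can match any input). A hedged sketch of a guard follows. The helper name blocklist_regex and the choice of (?!) as a never-matching fallback are assumptions for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: filter the entries, and fall back to a regex
# that can never match when nothing is left.
sub blocklist_regex {
    my @entries = grep { m~\S~ && !m~^#~ } @_;
    return qr~(?!)~ unless @entries;      # (?!) always fails
    my $alt = join '|', map { quotemeta } @entries;
    return qr~$alt~i;
}

my $empty = blocklist_regex('# only a comment', '');
my $real  = blocklist_regex('# Domains to block', 'spam.com', 'evil.org');

say 'empty list blocks nothing' if 'http://spam.com/' !~ $empty;
say 'real list blocks spam.com' if 'http://spam.com/' =~ $real;
say 'dot stays literal'         if 'http://spamXcom/' !~ $real;
```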
Part 9: SORTING MATTERS
Here's a subtle gotcha. What if your word list contains words that are prefixes of other words?

    my @words = ('error', 'error_fatal', 'warn');
    my $re = qr~@{[ join '|', map { quotemeta } @words ]}~;

The alternation becomes error|error_fatal|warn. Perl's regex engine tries alternatives left to right. If it matches "error" first, it never gets to "error_fatal". For substring matching this is fine. But if you're extracting matches and want the longest one, sort by length descending:

    my $re = qr~@{[
        join '|',
        map { quotemeta }
        sort { length($b) <=> length($a) }
        @words
    ]}~;

Now error_fatal is tried before error. Longest match wins.
This is the same principle behind why most lexers sort keywords by length. You want "foreach" to match before "for". Perl won't magically pick the longest alternative. It picks the first one that works.
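The difference shows up as soon as you capture the match. A minimal sketch, using an invented input string:

```perl
use strict;
use warnings;
use feature 'say';

my @words = ('error', 'error_fatal', 'warn');
my $text  = 'status: error_fatal';

# Alternatives in list order versus longest-first order.
my $unsorted = join '|', map { quotemeta } @words;
my $sorted   = join '|', map { quotemeta }
               sort { length($b) <=> length($a) } @words;

my ($first)   = $text =~ m~($unsorted)~;
my ($longest) = $text =~ m~($sorted)~;

say "unsorted: $first";
say "sorted:   $longest";
```

The unsorted pattern captures "error"; the sorted one captures "error_fatal".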
Part 10: PERFORMANCE NOTES
Perl's regex engine is smart about alternations. For simple literal strings joined with |, it builds an optimized trie internally. This means:

    qr~cat|car|cab|can~

Gets compiled into something like:

    c - a - t
            | r
            | b
            | n

A trie. Not four separate comparisons. The engine walks the trie character by character. For large word lists, this is dramatically faster than checking each word in a loop.

You can see this optimization in action with use re 'debug':

    use re 'debug';
    my $re = qr~cat|car|cab|can~;

The debug output shows the trie structure. Perl figured out that all four words start with "ca" and optimized accordingly. You didn't have to ask.
However, there's a limit. If your word list has thousands of entries, consider whether a hash lookup might be simpler:

    my %blocked = map { lc $_ => 1 } @words;

    # For exact matches, hash is O(1)
    if ($blocked{lc $word}) {
        say "Blocked!";
    }

Use regex when you need substring matching, boundaries, or case-insensitive matching across text. Use hashes when you're checking exact values.
Part 11: THE FULL TOOLKIT
Here's a summary of everything you need:

    FUNCTION      PURPOSE
    -----------   ----------------------------------------
    quotemeta     Escape regex metacharacters in a string
    join '|'      Build alternation from list
    map { }       Transform each element
    qr~~          Compile regex once for reuse
    \b(?:...)\b   Match whole words only
    @{[...]}      Inline evaluation (baby cart)

And the pattern you'll use again and again:

    my @terms = get_terms_from_somewhere();
    my $re = qr~\b(?:@{[ join '|', map { quotemeta } @terms ]})\b~i;

One line. Safe. Fast. Reusable.
Dynamic regex assembly is the kind of technique that turns a pile of hardcoded conditionals into a single, elegant match. You feed it data, it gives you a pattern. That's Perl doing what Perl does best: making text processing feel effortless.
perl.gg

    @words ──→ map { quotemeta }
                    │
               ──→ join '|'
                    │
               ──→ qr~...\b~i
                    │
                 [regex]
                    │
               .----|----.
               | MATCH?  |
               '----+----'
                 /     \
               yes      no

        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/