perl.gg / regex


Dynamic Regex Assembly

2026-03-10

You have a list of words. You want to match any of them in a string. You could write this:

    if ($text =~ m~foo|bar|baz|quux~) {
        say "matched!";
    }

But what if the list changes at runtime? What if it comes from a config file, a database, or user input? You can't hardcode it.

Build the regex dynamically:

    my @words   = qw(foo bar baz quux);
    my $pattern = join '|', map { quotemeta } @words;
    $text =~ m~$pattern~;

Three functions. One compiled regex. Completely safe against special characters. This is how you build patterns from data.

Part 1: THE PROBLEM

Imagine you have a blocklist:

    my @blocked = ('spam.com', 'evil.org', 'bad-stuff.net');

You want to check if a URL contains any of these. Naive approach:

    for my $domain (@blocked) {
        if ($url =~ m~$domain~) {
            say "Blocked: $domain";
            last;
        }
    }

This runs one match per domain, which is slow for large lists. It's also dangerous. Each domain is interpolated as regex source, so a metacharacter such as . keeps its special meaning and matches any character instead of a literal dot.

spam.com matches spamXcom because . is a wildcard. That's a bug.
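
A quick way to see the bug for yourself. This is a minimal sketch: the URL is made up for illustration, and index() is used only as a literal-substring baseline to compare against.

```perl
use strict;
use warnings;
use feature 'say';

my $domain = 'spam.com';
my $url    = 'http://spamXcom.example/page';   # hypothetical URL, contains no literal "spam.com"

# Interpolated as regex source: the dot is a wildcard, so "spamXcom" is a hit.
my $naive = $url =~ m~$domain~ ? 1 : 0;

# Plain substring search: the dot is just a dot, so there is no hit.
my $literal = index($url, $domain) >= 0 ? 1 : 0;

say "regex match:   $naive";
say "literal match: $literal";
```

The regex reports a match on a URL that never mentions spam.com. The literal search does not.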

Part 2: QUOTEMETA SAVES YOU

quotemeta escapes every ASCII character that is not a letter, digit, or underscore, prefixing each one with a backslash:

    say quotemeta('spam.com');    # spam\.com
    say quotemeta('C++ rocks');   # C\+\+\ rocks
    say quotemeta('price: $5');   # price\:\ \$5
    say quotemeta('[array]');     # \[array\]

Every character that could have special meaning in a regex gets escaped. Now spam\.com matches only the literal string "spam.com", not "spamXcom".

The \Q...\E escape inside a regex does the same thing:

    $text =~ m~\Q$user_input\E~;

But when you're building a pattern from an array, quotemeta with map is cleaner.
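
To convince yourself the two forms behave the same, here is a small sketch. The input string and the text are invented for the example.

```perl
use strict;
use warnings;
use feature 'say';

my $user_input = 'C++ (beta)';                        # hypothetical input full of metacharacters
my $text       = 'Now hiring: C++ (beta) testers';    # hypothetical text to search

# Escape up front with the function...
my $escaped      = quotemeta $user_input;
my $via_function = $text =~ m~$escaped~ ? 1 : 0;

# ...or escape in place inside the pattern.
my $via_escape = $text =~ m~\Q$user_input\E~ ? 1 : 0;

say $escaped;
say "quotemeta: $via_function, in-pattern escape: $via_escape";
```

Both matches succeed, and neither treats the plus signs or parentheses as regex syntax.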

Part 3: THE ASSEMBLY LINE

Here's the full pipeline:

    my @words   = ('foo.bar', 'baz+quux', '$price');
    my $pattern = join '|', map { quotemeta } @words;
    say $pattern;   # foo\.bar|baz\+quux|\$price

Step by step:

    STEP                   RESULT
    ---------------------  ------------------------------------
    @words                 ('foo.bar', 'baz+quux', '$price')
    map { quotemeta }      ('foo\.bar', 'baz\+quux', '\$price')
    join '|'               'foo\.bar|baz\+quux|\$price'

The map transforms each element. The join glues them together with the alternation operator. The result is a safe regex that matches any of the original strings literally.
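
Here is a rough check of that claim, using the same word list. The candidate strings are made up, chosen so that two of them would only match if the metacharacters were still live.

```perl
use strict;
use warnings;
use feature 'say';

my @words   = ('foo.bar', 'baz+quux', '$price');
my $pattern = join '|', map { quotemeta } @words;

# 'fooXbar' would match an unescaped dot; 'bazzquux' would match an unescaped z+.
my %result;
for my $candidate ('foo.bar', 'fooXbar', 'baz+quux', 'bazzquux', 'cost: $price') {
    $result{$candidate} = $candidate =~ m~$pattern~ ? 'match' : 'no match';
    say "$candidate => $result{$candidate}";
}
```

Only the strings that contain one of the original words verbatim come back as a match.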

Part 4: PRECOMPILE WITH qr

Building the string is step one. Compiling it into a regex object is step two:

    my $re = qr~$pattern~i;

Now $re is a compiled regex. Perl parses and optimizes it once. You can use it over and over without recompilation:

    my @words   = qw(error warning critical);
    my $pattern = join '|', map { quotemeta } @words;
    my $re      = qr~$pattern~i;

    while (<$logfile>) {
        if (m~$re~) {
            say "ALERT: $_";
        }
    }

With a plain string, Perl has to re-interpolate the pattern on every iteration and check whether it changed before it can reuse the compiled form. With qr, the pattern is compiled once up front and the loop only matches. On a million-line log file, that overhead can add up.

The i flag makes it case-insensitive. You can add any flags you want: qr~$pattern~ix for case-insensitive with extended formatting.
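
A short sketch of what you get back from qr. The log lines are invented. It shows that the flags travel with the compiled object, and that the object can be dropped into a larger pattern without losing them.

```perl
use strict;
use warnings;
use feature 'say';

my @words   = qw(error warning critical);
my $pattern = join '|', map { quotemeta } @words;
my $re      = qr~$pattern~i;

say ref($re);   # a qr object is a reference blessed into the Regexp class

# Hypothetical log lines for the demo.
my @lines = (
    'ERROR: disk quota exceeded',
    'info: all good',
    'Warning: fan speed low',
    'kernel: critical temperature',
);

# The /i flag is part of $re, so no flag is needed at the match site.
my @alerts = grep { $_ =~ $re } @lines;

# Embed the compiled regex in a bigger pattern: keyword at line start, then a colon.
my @at_start = grep { m~^(?:$re):~ } @lines;

say scalar(@alerts),   " lines mention a keyword";
say scalar(@at_start), " lines start with one";
```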

Part 5: WORD BOUNDARY CONTROL

Sometimes you want to match whole words only, not substrings:

    # Without boundaries: "warning" matches inside "forewarning"
    my $re = qr~$pattern~i;

    # With boundaries: "warning" only matches the word "warning"
    my $re = qr~\b(?:$pattern)\b~i;

The \b markers require a word boundary on each side. The (?:...) is a non-capturing group that keeps the alternation contained.

Without the group:

    # BROKEN: \b only applies to first and last alternatives
    qr~\berror|warning|critical\b~
    # Matches: \berror OR warning OR critical\b

With the group:

    # CORRECT: \b applies to the entire alternation
    qr~\b(?:error|warning|critical)\b~
    # Matches: \b(error OR warning OR critical)\b

Always group your alternations. It's the kind of bug that passes tests and fails in production.
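
The difference is easy to demonstrate. A small sketch with two made-up inputs:

```perl
use strict;
use warnings;
use feature 'say';

my $broken  = qr~\berror|warning|critical\b~;
my $grouped = qr~\b(?:error|warning|critical)\b~;

my %hits;
for my $text ('forewarning issued', 'warning issued') {
    $hits{$text} = {
        broken  => ($text =~ $broken  ? 1 : 0),
        grouped => ($text =~ $grouped ? 1 : 0),
    };
    say "$text: broken=$hits{$text}{broken} grouped=$hits{$text}{grouped}";
}
```

The ungrouped pattern fires on "forewarning" because its middle alternative has no boundary on either side. The grouped one only fires on the standalone word.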

Part 6: LOG FILTERING

Real-world example. You have a list of error patterns and you want to scan a log file for any of them:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';

    my @triggers = (
        'segfault',
        'out of memory',
        'disk full',
        'connection refused',
        'timeout exceeded',
    );

    my $re = qr~@{[ join '|', map { quotemeta } @triggers ]}~i;

    open my $fh, '<', '/var/log/syslog'
        or die "Can't open syslog: $!\n";

    while (<$fh>) {
        if (m~$re~) {
            chomp;
            say "LINE $.: $_";
        }
    }

Notice the baby cart operator @{[...]} inside the qr. It evaluates the join/map/quotemeta pipeline inline and interpolates the result. One line, no intermediate variable.

You could also write it as two lines for clarity:

    my $pattern = join '|', map { quotemeta } @triggers;
    my $re      = qr~$pattern~i;

Same result. Pick whichever reads better to you.

Part 7: KEYWORD HIGHLIGHTING

Turn matched words bold in terminal output:

    my @keywords = qw(error fail crash abort panic);
    my $re = qr~\b(@{[ join '|', map { quotemeta } @keywords ]})\b~i;

    while (<STDIN>) {
        s~$re~\e[1;31m$1\e[0m~g;
        print;
    }

The \e[1;31m is ANSI bold red. \e[0m resets. Every keyword lights up in your terminal.

Note the capturing parentheses inside qr. We need $1 in the replacement to put the original matched text back (preserving its original case).

Pipe any command through this:

    tail -f /var/log/syslog | perl -pe '
        BEGIN {
            @k  = qw(error fail crash);
            $re = join "|", map { quotemeta } @k;
            $re = qr~\b($re)\b~i;
        }
        s~$re~\e[31m$1\e[0m~g;
    '

Live log highlighting with zero dependencies.
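
If you want the same substitution inside a larger script, one option is to wrap it in a closure. This is only a sketch: make_highlighter is a name invented here, not something from a module, and the sample line is made up.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: builds the regex once, returns a sub that colors one line.
sub make_highlighter {
    my @keywords    = @_;
    my $alternation = join '|', map { quotemeta } @keywords;
    my $re          = qr~\b($alternation)\b~i;

    return sub {
        my ($line) = @_;
        $line =~ s~$re~\e[1;31m$1\e[0m~g;   # $1 keeps the original case
        return $line;
    };
}

my $highlight = make_highlighter(qw(error fail crash abort panic));
my $out       = $highlight->('Disk ERROR: write did not fail over');

say $out;
```

The regex is compiled once when the closure is created, so calling it per line costs only the substitution.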

Part 8: LOADING PATTERNS FROM FILES

Patterns from a config file, one per line:

    open my $fh, '<', 'blocklist.txt'
        or die "Can't open blocklist: $!\n";
    my @blocked = map { chomp; $_ } <$fh>;
    close $fh;

    # Remove blanks and comments
    @blocked = grep { /\S/ && !/^#/ } @blocked;

    my $re = qr~@{[ join '|', map { quotemeta } @blocked ]}~i;

Now your regex is data-driven. Update the blocklist file, restart the script, and new patterns are active. No code changes needed.

For a blocklist that looks like:

    # Domains to block
    spam.com
    evil.org

    # This one is particularly bad
    bad-stuff.net

The grep strips comments and blank lines. quotemeta handles the dots. Your regex matches exactly what you listed.
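
One thing worth guarding against with file-driven lists is a file that ends up with no usable entries. An empty list produces an empty pattern, which does not mean "match nothing". The sketch below reads the sample blocklist from an in-memory filehandle (a stand-in for blocklist.txt so the example runs on its own) and falls back to (?!), a pattern that can never match. That fallback is one possible choice, not the only one.

```perl
use strict;
use warnings;
use feature 'say';

# Stand-in for blocklist.txt so the example is self-contained.
my $blocklist_text = <<'END';
# Domains to block
spam.com
evil.org

# This one is particularly bad
bad-stuff.net
END

open my $fh, '<', \$blocklist_text
    or die "Can't open in-memory blocklist: $!\n";
chomp(my @blocked = <$fh>);
close $fh;

# Remove blanks and comments
@blocked = grep { /\S/ && !/^#/ } @blocked;

# If nothing survived the filter, use a pattern that never matches.
my $re = @blocked
    ? qr~@{[ join '|', map { quotemeta } @blocked ]}~i
    : qr~(?!)~;

for my $url ('http://evil.org/x', 'http://evilXorg/x') {
    say "$url: ", ($url =~ $re ? 'blocked' : 'ok');
}
```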

Part 9: SORTING MATTERS

Here's a subtle gotcha. What if your word list contains words that are prefixes of other words?

    my @words = ('error', 'error_fatal', 'warn');
    my $re    = qr~@{[ join '|', map { quotemeta } @words ]}~;

The alternation becomes error|error_fatal|warn. Perl's regex engine tries alternatives left to right. If it matches "error" first, it never gets to "error_fatal". For substring matching this is fine. But if you're extracting matches and want the longest one, sort by length descending:

    my $re = qr~@{[ join '|', map { quotemeta } sort { length($b) <=> length($a) } @words ]}~;

Now error_fatal is tried before error. Longest match wins.

Hand-rolled, regex-based tokenizers often order their keywords longest-first for the same reason. You want "foreach" to match before "for". Perl won't magically pick the longest alternative. It picks the first one that works.
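
A small sketch of the difference when you capture the match. The input string is made up.

```perl
use strict;
use warnings;
use feature 'say';

my @words = ('error', 'error_fatal', 'warn');
my $text  = 'status: error_fatal in module';   # hypothetical input

# Alternatives in list order: "error" comes first and wins.
my $as_listed = qr~(@{[ join '|', map { quotemeta } @words ]})~;

# Alternatives sorted longest-first: "error_fatal" gets tried first.
my $longest_first = qr~(@{[
    join '|', map { quotemeta } sort { length($b) <=> length($a) } @words
]})~;

my ($short) = $text =~ $as_listed;
my ($long)  = $text =~ $longest_first;

say "as listed:     $short";
say "longest first: $long";
```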

Part 10: PERFORMANCE NOTES

Perl's regex engine is smart about alternations. For simple literal strings joined with |, it builds an optimized trie internally. This means:

    qr~cat|car|cab|can~

Gets compiled into something like:

    c - a - t
          | r
          | b
          | n

A trie. Not four separate comparisons. The engine walks the trie character by character. For large word lists, this is dramatically faster than checking each word in a loop.

You can see this optimization in action with use re 'debug':

    use re 'debug';
    my $re = qr~cat|car|cab|can~;

The debug output shows the trie structure. Perl figured out that all four words start with "ca" and optimized accordingly. You didn't have to ask.

However, there's a limit. If your word list has thousands of entries, consider whether a hash lookup might be simpler:

    my %blocked = map { lc $_ => 1 } @words;

    # For exact matches, hash is O(1)
    if ($blocked{lc $word}) {
        say "Blocked!";
    }

Use regex when you need substring matching, boundaries, or case-insensitive matching across text. Use hashes when you're checking exact values.
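
To make the distinction concrete, here is a sketch that runs both approaches over the same made-up data. The host and URL values are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';

my @words = qw(Spam.com evil.org);

my %blocked = map { lc($_) => 1 } @words;
my $re      = qr~@{[ join '|', map { quotemeta } @words ]}~i;

my $host = 'EVIL.ORG';                      # an exact value to look up
my $url  = 'http://www.evil.org/login';     # text that merely contains a blocked domain

# The hash answers "is this value exactly on the list?"
my $host_in_hash = $blocked{ lc $host } ? 1 : 0;
my $url_in_hash  = $blocked{ lc $url }  ? 1 : 0;

# The regex answers "does this text contain something on the list?"
my $url_in_regex = $url =~ $re ? 1 : 0;

say "host via hash: $host_in_hash";
say "url via hash:  $url_in_hash";
say "url via regex: $url_in_regex";
```

The hash finds the exact host but misses the URL. The regex catches the domain inside the longer string.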

Part 11: THE FULL TOOLKIT

Here's a summary of everything you need:

    FUNCTION      PURPOSE
    -----------   ----------------------------------------
    quotemeta     Escape regex metacharacters in a string
    join '|'      Build alternation from list
    map { }       Transform each element
    qr~~          Compile regex once for reuse
    \b(?:...)\b   Match whole words only
    @{[...]}      Inline evaluation (baby cart)

And the pattern you'll use again and again:

    my @terms = get_terms_from_somewhere();
    my $re    = qr~\b(?:@{[ join '|', map { quotemeta } @terms ]})\b~i;

One line. Safe. Fast. Reusable.
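
One caveat worth checking against your own data: \b is defined in terms of word characters, so a term that begins or ends with punctuation (like '$price' from Part 3) may not get the boundary you expect. A possible variation, offered as a sketch and not a drop-in replacement, swaps \b for lookarounds that only insist there is no word character right next to the match. The term list and sample text are made up.

```perl
use strict;
use warnings;
use feature 'say';

my @terms = ('$price', 'C++', 'foo');   # hypothetical terms, two with punctuation at the edges
my $alt   = join '|', map { quotemeta } sort { length($b) <=> length($a) } @terms;

my $with_b    = qr~\b(?:$alt)\b~i;
my $with_look = qr~(?<!\w)(?:$alt)(?!\w)~i;

my $text = 'total is $price today';

my $b_hit    = $text =~ $with_b    ? 1 : 0;
my $look_hit = $text =~ $with_look ? 1 : 0;

say "with \\b:          $b_hit";
say "with lookarounds: $look_hit";

# Ordinary words still match as whole words only.
say "food: ", ('food' =~ $with_look ? 'match' : 'no match');
```

For lists made only of ordinary words, the \b version behaves the same and is shorter to read.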

Dynamic regex assembly is the kind of technique that turns a pile of hardcoded conditionals into a single, elegant match. You feed it data, it gives you a pattern. That's Perl doing what Perl does best: making text processing feel effortless.

    @words ──→ map { quotemeta }
                     │
               ──→ join '|'
                     │
               ──→ qr~...\b~i
                     │
                  [regex]
                     │
                .----|----.
                | MATCH?  |
                '----+----'
                   /   \
                 yes    no

        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/
