<!-- category: regex -->
Tilde Delimiters for Regex
You are matching a file path. Here is what it looks like with the default / delimiter:

    if ($path =~ /^\/usr\/local\/bin\//) {
        # do something
    }

Count the backslashes. Four of them, just to say "starts with /usr/local/bin/". The slashes in the path collide with the slashes in the regex, so you escape everything, and the result looks like a fence fell over.
Now the same thing with a tilde delimiter:

    if ($path =~ m~^/usr/local/bin/~) {
        # do something
    }

Zero backslashes. The path reads exactly like it does on your terminal. The regex is instantly readable because the delimiter does not fight the content.
This is not a hack. It is not a secret. It is documented right there
in perlop. But most Perl programmers never change their delimiter,
and they suffer for it every single time they match a URL, a file
path, or an HTML tag.
Part 1: THE RULE
The default regex delimiter is /:

    $string =~ /pattern/;

But when you use the explicit m operator, you can pick any non-whitespace character as the delimiter:

    $string =~ m~pattern~;
    $string =~ m!pattern!;
    $string =~ m#pattern#;
    $string =~ m|pattern|;
    $string =~ m%pattern%;

They all do the same thing. The m prefix tells Perl "I am about to give you a match operator, and the next character is my delimiter." You only get to skip the m when using /. Every other character needs it.
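A quick way to convince yourself that the delimiter changes nothing is to run one pattern through several of them. This is a minimal sketch; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $string = "error: disk full";

# The same pattern written with five different delimiters.
my @results = (
    ($string =~ /^error:/  ? 1 : 0),   # default delimiter, no m needed
    ($string =~ m~^error:~ ? 1 : 0),
    ($string =~ m!^error:! ? 1 : 0),
    ($string =~ m#^error:# ? 1 : 0),
    ($string =~ m|^error:| ? 1 : 0),
);

print "results: @results\n";   # every entry is 1
```

Note that there is no space between m and the delimiter. With #, a space would turn the rest of the line into a comment.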
Part 2: WHY TILDES
Any character works, so why tildes specifically?

    CRITERION            ~     !     #     |     %
    -----------         ---   ---   ---   ---   ---
    Rare in data        yes   no    no    no    no
    Visually distinct   yes   yes   yes   meh   meh
    Not a comment char  yes   yes   NO    yes   yes
    No shell meaning    yes   no    no    no    no
    Easy to type        yes   yes   yes   yes   yes

The tilde almost never appears in the kind of data you regex against. File paths use /. URLs use /, ?, &, #. HTML uses <, >, /, ". Config files use =, #, :. Shell commands use $, !, |.
The tilde sits there on your keyboard, minding its own business, rarely needed in patterns. It is the perfect delimiter.
The # character is a trap. Inside /x mode (extended regex), #
starts a comment. Using it as a delimiter with /x leads to
confusion. The ! collides with shell history expansion in some
contexts. The | looks too much like the regex alternation operator.
Tildes have none of these problems.
Part 3: BEFORE AND AFTER
Let's see the difference on real patterns.

File paths:

    # with /
    $path =~ /^\/var\/log\/nginx\/access\.log$/

    # with ~
    $path =~ m~^/var/log/nginx/access\.log$~

URLs:

    # with /
    $url =~ /^https?:\/\/[^\/]+\/api\/v[0-9]+\//

    # with ~
    $url =~ m~^https?://[^/]+/api/v[0-9]+/~

HTML tags:

    # with /
    $html =~ /<a\s+href="([^"]+)"[^>]*>/

    # with ~
    $html =~ m~<a\s+href="([^"]+)"[^>]*>~

The HTML example is interesting. The pattern contains no slashes, so neither version needs escaping, but the tilde version is still easier to scan. With /, your brain has to parse /<a and figure out where the delimiter ends and the HTML begins. With ~, the boundary is obvious.
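To check that the two spellings really are the same regex, you can run both against a handful of inputs and compare. The following is a small sketch using the URL pattern from above; the sample URLs are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample URLs for illustration.
my @urls = (
    "https://example.com/api/v2/users",
    "http://example.com/api/v10/",
    "https://example.com/docs/v2/",
    "ftp://example.com/api/v1/",
);

my $mismatches = 0;
for my $url (@urls) {
    # Identical pattern, once with / and once with ~ as the delimiter.
    my $with_slash = $url =~ /^https?:\/\/[^\/]+\/api\/v[0-9]+\// ? 1 : 0;
    my $with_tilde = $url =~ m~^https?://[^/]+/api/v[0-9]+/~      ? 1 : 0;
    $mismatches++ if $with_slash != $with_tilde;
    print "$url => slash=$with_slash tilde=$with_tilde\n";
}
print "mismatches: $mismatches\n";
```

The first two URLs match under both spellings and the last two match under neither, so the mismatch count stays at zero.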
Part 4: SUBSTITUTION WITH TILDES
Substitution uses three delimiters instead of two. With tildes:

    $path =~ s~/old/path~/new/path~;

Compare to the escaped version:

    $path =~ s/\/old\/path/\/new\/path/;

The tilde version has zero visual noise. The slash version has four backslashes escaping four slashes.
You can add modifiers as usual:

    $text =~ s~https?://\S+~[link removed]~g;
    $html =~ s~<br\s*/?>~\n~gi;
    $log  =~ s~^(\d+\.\d+\.\d+\.\d+)~[REDACTED]~gm;

All modifiers (g, i, m, s, x, e, r) work with any delimiter. The delimiter choice is purely cosmetic.
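Here is a runnable sketch of the substitutions above, so you can see the results side by side. The input strings are hypothetical, and the /r modifier (which returns the result instead of modifying the variable) assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my $path = "/old/path/report.txt";
my $text = "see https://example.com/a and http://example.org/b now";

# Same substitution, two delimiters; /r leaves $path untouched.
my $via_slash = $path =~ s/\/old\/path/\/new\/path/r;
my $via_tilde = $path =~ s~/old/path~/new/path~r;

# A global substitution with the tilde delimiter, applied to a copy.
(my $clean = $text) =~ s~https?://\S+~[link removed]~g;

print "$via_slash\n$via_tilde\n$clean\n";
```

Both spellings produce /new/path/report.txt, and the global substitution replaces each URL with the placeholder text.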
Part 5: PAIRED DELIMITERS
Perl also supports paired delimiters using brackets and braces. These use an opening and closing character:

    $string =~ m{pattern};    # curly braces
    $string =~ m[pattern];    # square brackets
    $string =~ m(pattern);    # parentheses
    $string =~ m<pattern>;    # angle brackets

For substitution, paired delimiters get interesting. You can use the same pair for both halves, or mix them:

    # same pair for both
    $string =~ s{old}{new};

    # different pairs (legal but unusual)
    $string =~ s{old}[new];
    $string =~ s(old)<new>;

The s{}{} form is popular in some codebases, especially for long patterns where the braces provide clear visual grouping:

    $html =~ s{
        <div \s+ class="old-style" [^>]* >
        (.*?)
        </div>
    }{<section class="new-style">$1</section>}gsx;

With the /x modifier and curly braces, this almost reads like a configuration block. Note that /x applies only to the pattern: whitespace in the replacement half is literal, so the replacement stays tight against its braces.
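The same layout works for smaller patterns. Below is a minimal sketch of s{}{} with /x on a date string; the input value and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical input for illustration.
my $date = "2024-03-15";

# /x lets the pattern spread out with comments. The replacement is kept
# on one line because /x does not apply to it.
(my $us_style = $date) =~ s{
    ^ (\d{4})   # year
    - (\d{2})   # month
    - (\d{2})   # day
    \z
}{$2/$3/$1}x;

print "$us_style\n";
```

The braces inside \d{4} are balanced, so they nest cleanly inside the outer brace delimiters without any escaping.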
Part 6: QUOTE OPERATORS TOO
The tilde trick is not limited to regex. Perl's quote operators all accept custom delimiters:

    # strings
    my $path = q~/usr/local/bin~;       # single-quoted string
    my $msg  = qq~Hello, $name!~;       # double-quoted string

    # lists
    my @dirs = qw~/etc /var /tmp~;      # word list

    # regex compilation
    my $re = qr~^/api/v\d+/~;           # compiled regex

    # command execution
    my $out = qx~ls /tmp~;              # backtick equivalent

The q and qq operators are especially handy when your string contains a lot of quotes:

    # painful
    my $html = "<a href=\"$url\" class=\"link\" title=\"$title\">$text</a>";

    # better with qq and tildes
    my $html = qq~<a href="$url" class="link" title="$title">$text</a>~;

No escaped quotes anywhere. The tilde delimiter stays out of the way because HTML markup rarely contains tildes.

Paired delimiters work here too:

    my $json = q{{"key": "value", "count": 42}};
    my $sql  = q[SELECT * FROM users WHERE id = ?];
    my $xml  = q<<?xml version="1.0"?>>;
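To see how these operators behave, here is a short runnable sketch. The values of $url and $title are made up, and qx is left out so the example does not shell out.

```perl
use strict;
use warnings;

# Hypothetical values for illustration.
my $url   = "https://example.com/docs";
my $title = "Docs";

my $link = qq~<a href="$url" title="$title">read</a>~;   # interpolates
my $raw  = q~<a href="$url">~;                           # does not interpolate
my @dirs = qw~/etc /var /tmp~;                           # three words
my $re   = qr~^/api/v\d+/~;                              # compiled regex

print "$link\n$raw\n";
print scalar(@dirs), " dirs\n";
print "matched\n" if "/api/v3/users" =~ $re;
```

The only difference between q~ and qq~ is interpolation, the same as between single and double quotes.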
Part 7: THE QR OPERATOR
The qr operator compiles a regex into a reusable object. Tildes work perfectly here:

    my $url_re  = qr~^https?://([^/]+)(/.*)?$~;
    my $ipv4_re = qr~^(\d{1,3}\.){3}\d{1,3}$~;
    my $path_re = qr~^/var/log/[^/]+\.log$~;

    if ($input =~ $url_re) {
        my ($host, $path) = ($1, $2);
        print "Host: $host, Path: $path\n";
    }

Compiling with qr and using tildes means you define the pattern once, with clean delimiters, and reuse it everywhere. The compiled regex carries its modifiers with it:

    my $tag = qr~<(\w+)[^>]*>~i;    # case-insensitive, baked in

    # both of these use the /i modifier from $tag
    $html =~ $tag;
    $xml  =~ $tag;

You can build a library of compiled patterns at the top of your script. Each one clean and readable. Then use them throughout your code without ever thinking about delimiters again:

    # pattern library
    my %RE = (
        url   => qr~^https?://\S+~,
        email => qr~[\w.+-]+@[\w.-]+\.\w{2,}~,
        ip    => qr~\b\d{1,3}(?:\.\d{1,3}){3}\b~,
        path  => qr~^/[\w./-]+$~,
    );

    # usage
    if ($line =~ $RE{ip}) {
        print "Found IP address\n";
    }
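Compiled patterns can also be interpolated into larger ones, which is one way to reuse them. The sketch below assumes a made-up log line and hypothetical names ($octet, $ipv4); it is an illustration, not a robust IP validator.

```perl
use strict;
use warnings;

# Hypothetical building blocks for illustration.
my $octet = qr~\d{1,3}~;
my $ipv4  = qr~$octet(?:\.$octet){3}~;

# Compose the compiled pieces into a larger pattern.
my $line = '203.0.113.7 GET /var/log/app.log';
my ($ip, $file);
if ($line =~ m~^($ipv4)\s+GET\s+(/\S+)~) {
    ($ip, $file) = ($1, $2);
    print "ip=$ip file=$file\n";
}
```

Each piece is defined once with tilde delimiters, and the final match stays free of backslashed slashes even though it contains a path.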
Part 8: WHEN NOT TO USE TILDES
There are a few cases where tildes are the wrong choice.

If your pattern contains literal tildes (matching Unix home directories, for example):

    # tilde in the pattern - use a different delimiter
    $path =~ m{^~/Documents};    # curly braces work here
    $path =~ m!^~/Documents!;    # or exclamation marks

If your team's style guide mandates /, follow the style guide. Consistency across a codebase matters more than any individual readability gain.
If you are writing a one-liner on the command line, / is fine.
One-liners are throwaway. You are not going to read them six months
later and wonder what the backslashes mean.
USE TILDES WHEN:
  * Pattern contains / (paths, URLs)
  * Pattern contains " (HTML, JSON)
  * Pattern is long and readability matters
  * You want the cleanest possible regex

USE / WHEN:
  * Pattern is short and simple
  * No / characters in the pattern
  * Team style guide says so
  * Quick one-liner
Part 9: REAL-WORLD PATTERN LIBRARY
Here is a collection of common patterns using tilde delimiters. Copy and use them:

    # match a Unix absolute path
    m~^/[^\0]+$~

    # match a URL with protocol
    m~^https?://[^\s]+~

    # extract domain from URL
    m~^https?://([^/:]+)~

    # match an IPv4 address (loose)
    m~^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$~

    # match a date (YYYY-MM-DD)
    m~^\d{4}-\d{2}-\d{2}$~

    # strip HTML tags
    s~<[^>]+>~~g

    # normalize multiple slashes in paths
    s~/{2,}~/~g

    # extract filename from path
    m~([^/]+)$~

    # match Apache/Nginx log line (simplified)
    m~^(\S+)\s+\S+\s+\S+\s+\[([^\]]+)\]\s+"([^"]+)"\s+(\d+)\s+(\d+)~

Every pattern here that touches a path or a URL would need backslash escaping with / delimiters. With tildes, the patterns read almost like plain text.
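A few of these patterns in action, as a small sketch. The input strings are invented for illustration; the patterns are the ones listed above.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my $path = "/var//log///nginx/access.log";
my $url  = "https://example.com:8080/api/v1/";
my $html = "<p>Hello, <b>world</b></p>";

(my $normalized = $path) =~ s~/{2,}~/~g;          # collapse repeated slashes
my ($filename)  = $normalized =~ m~([^/]+)$~;     # last path component
my ($domain)    = $url =~ m~^https?://([^/:]+)~;  # host without the port
(my $plain = $html) =~ s~<[^>]+>~~g;              # strip tags

print "$normalized\n$filename\n$domain\n$plain\n";
```

Matching in list context returns the captures directly, which is why $filename and $domain are declared inside parentheses.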
Part 10: THE DELIMITER DECISION TREE
When you sit down to write a regex, take one second to think about the delimiter:

    Does your pattern contain / ?
                |
           +----+----+
           |         |
          YES        NO
           |         |
      Use m~...~     Use /.../ (it's fine)
           |
    Does it also contain ~ ?
           |
      +----+----+
      |         |
     YES        NO
      |         |
    Use m{}     You're good
    or m!!      with m~~

One second of thought. Hours of saved confusion over the lifetime of the code.
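The decision tree above can be sketched as a tiny helper. The function name suggest_delimiter and its return strings are hypothetical, invented here purely to make the logic concrete; in practice you make this choice by eye.

```perl
use strict;
use warnings;

# Hypothetical helper: suggests a delimiter for a pattern given as a string.
sub suggest_delimiter {
    my ($pattern) = @_;
    return '/.../'  if index($pattern, '/') < 0;   # no slashes: default is fine
    return 'm~...~' if index($pattern, '~') < 0;   # slashes but no tildes
    return 'm{...}';                               # both: fall back to braces
}

for my $pattern ('^\d+$', '^/usr/local/bin/', '^~/Documents') {
    printf "%-20s %s\n", $pattern, suggest_delimiter($pattern);
}
```

The three sample patterns walk the three branches of the tree: no slashes, slashes only, and slashes plus a tilde.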
        .--.
       |o_o |    "Backslash-slash-backslash-slash?
       |:_/ |     There's a better way."
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

The tilde delimiter is not clever. It is not tricky. It is not showing off. It is just choosing the right tool for the job. A regex should communicate its pattern, not its delimiters.
Every escaped slash is a tiny paper cut. Every line of
/\/usr\/local\/bin\// is a small crime against readability. Tildes
make it stop.
Your regex engine does not care which delimiter you use. Your future self, reading the code at 2 AM during an outage, absolutely does.
Pick the delimiter that disappears. That is the whole philosophy. The best delimiter is the one you do not even notice. For most patterns involving paths, URLs, and HTML, the tilde is that delimiter. It gets out of the way and lets your pattern speak.
Try it on your next script. Just once. You will not go back.
perl.gg