<!-- category: functional -->
Chained /r Substitution Pipelines
Transform text without mutating anything. Chain substitutions like Unix
pipes, each one feeding its result to the next.

    my $clean = $raw =~ s~\r\n~\n~gr
                     =~ s~<[^>]*>~~gr
                     =~ s~&amp;~&~gr
                     =~ s~\s+~ ~gr
                     =~ s~^\s+|\s+$~~gr;

Five transformations. One expression. The original $raw is untouched.
The magic is the /r flag, introduced in Perl 5.14. Instead of
modifying $_ or the bound variable, it returns a modified copy. Chain
them together and you get a functional pipeline where data flows
through a series of transformations.
Part 1: THE /r FLAG
Normally, s~~~ modifies the variable and returns the number of
substitutions made:

    my $text  = "Hello World";
    my $count = ($text =~ s~World~Perl~);
    # $count is 1, $text is "Hello Perl"
With /r, it returns the modified string and leaves the original alone:

    my $text = "Hello World";
    my $new  = ($text =~ s~World~Perl~r);
    # $new is "Hello Perl", $text is still "Hello World"

The variable is never touched. You get a fresh copy with the
substitution applied. Think of it as "r" for "return" (or
"non-destructive," if you're reading the docs).
Part 2: CHAINING THE CHAINS
Since s~~~r returns a string, you can immediately bind another
substitution to that string:

    my $result = $text =~ s~foo~bar~gr =~ s~baz~qux~gr;

    $text ── s~foo~bar~gr ──→ (copy1) ── s~baz~qux~gr ──→ $result
    Original    First          Intermediate   Second        Final
    string      substitution   copy           substitution  result

Each =~ s~~~r takes the string on its left, applies the substitution,
and returns a new string. The next =~ binds to that new string.
No temporary variables. No intermediate assignments. Just a pipeline.
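For comparison, here's the same two-step chain spelled out with
temporaries (a sketch; the input string is made up for illustration):

```perl
my $text  = "foo baz foo";
my $step1 = $text  =~ s~foo~bar~gr;   # first transformation
my $step2 = $step1 =~ s~baz~qux~gr;   # second transformation
# $step2 is "bar qux bar"; $text is untouched
```

Same result, two extra names to track. The chained form collapses
$step1 and $step2 into the data flow itself.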
Part 3: PRACTICAL PIPELINE - LOG CLEANING
You're parsing a messy log file. Each line needs several cleanups
before it's useful:

    while (my $line = <$fh>) {
        my $clean = $line =~ s~\s+$~~r              # strip trailing whitespace
                          =~ s~^\[[\d:/ ]+\]\s*~~r  # strip timestamp brackets
                          =~ s~\x1b\[[0-9;]*m~~gr   # strip ANSI color codes
                          =~ s~\s+~ ~gr;            # collapse whitespace
        push @entries, $clean if length $clean;
    }

Four transformations, one expression. The original $line survives
for debugging if you need it. Every step is documented with a comment.
Compare to the mutation approach:
    while (my $line = <$fh>) {
        $line =~ s~\s+$~~;
        $line =~ s~^\[[\d:/ ]+\]\s*~~;
        $line =~ s~\x1b\[[0-9;]*m~~g;
        $line =~ s~\s+~ ~g;
        push @entries, $line if length $line;
    }

Works fine, but $line is destroyed. If something goes wrong at step
three and you want to see the original input, it's gone.
Part 4: DATA SANITIZATION
User input is filthy. Clean it in one pass:

    sub sanitize {
        return $_[0] =~ s~<[^>]*>~~gr          # strip HTML tags
                     =~ s~&lt;~<~gr            # decode HTML entities
                     =~ s~&gt;~>~gr
                     =~ s~&amp;~&~gr
                     =~ s~['"\x00-\x1f]~~gr    # strip quotes and control chars
                     =~ s~^\s+|\s+$~~gr;       # trim
    }

    my $safe = sanitize($user_input);

Pure function. Input in, cleaned output out. Nothing mutated. Easy to
test, easy to reason about, easy to extend.
Want to add a step? Stick another =~ s~~~r in the chain. Want to
remove one? Delete the line. Order is explicit and visible.
Part 5: FORMAT CONVERSION
Converting between data formats is a natural fit:

    # Convert a compact MySQL timestamp to ISO 8601
    my $iso = $mysql_ts =~ s~^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$~$1-$2-$3T$4:$5:$6~r;

    # Convert snake_case to PascalCase
    my $pascal = $snake =~ s~^(\w)~ uc($1) ~er    # capitalize first letter
                        =~ s~_(\w)~ uc($1) ~ger;  # capitalize after underscores

Notice the /e flag combined with /r. The /e evaluates the
replacement as code. The /r returns the result. They play nicely
together.
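Here's the timestamp conversion run on a concrete value (the input
is made up for illustration):

```perl
my $mysql_ts = "20240131093015";   # hypothetical compact timestamp
my $iso = $mysql_ts =~ s~^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$~$1-$2-$3T$4:$5:$6~r;
# $iso is "2024-01-31T09:30:15"; $mysql_ts is unchanged
```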
Part 6: SLUG GENERATION
Turn a title into a URL slug. Classic web dev problem:

    sub slugify {
        return $_[0] =~ s~^\s+|\s+$~~gr        # trim
                     =~ s~['"]~~gr             # strip quotes
                     =~ s~[^a-zA-Z0-9\s-]~~gr  # strip non-alphanumeric
                     =~ s~\s+~-~gr             # spaces to hyphens
                     =~ s~-{2,}~-~gr           # collapse multiple hyphens
                     =~ s~^-|-$~~gr;           # strip leading/trailing hyphens
    }

    my $slug = lc slugify(" What's New in Perl 5.40?! ");
    # "whats-new-in-perl-540"

The lc at the call site lowercases the final result. You could put
it inside the function, but keeping it outside makes the pipeline
purely about structure, not case.
        .--.
       |o_o |    "Data flows down, clean comes out"
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/
Part 7: MAP + /r PIPELINES
Combine chained /r with map for array transformations:

    my @clean = map {
        s~^\s+|\s+$~~gr         # trim ($_ is read, not modified)
        =~ s~\.$~~r             # strip one trailing dot
        =~ s~(.+)~lc($1)~er     # lowercase
    } @raw_hostnames;

Each element flows through the pipeline. The original array is
untouched. You get a fresh, cleaned array back.
This is where /r really shines over in-place mutation. With map,
you're building a new list. You don't want side effects. The /r
flag keeps everything pure.
Without /r in a map, you'd accidentally mutate $_ (which is
aliased to the original array element) AND return the substitution
count instead of the string. Double trouble.
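To see the double trouble concretely, here's a sketch of what the
missing /r actually does (the input data is made up):

```perl
my @hosts = ("  Example.COM.  ");
# No /r: s/// runs on $_, which map aliases to the array element.
my @broken = map { s~^\s+|\s+$~~g } @hosts;
# @broken holds the substitution count (2: one leading match, one
# trailing match), and @hosts itself was mutated to ("Example.COM.").
```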
Part 8: GREP + PIPELINE COMBOS
Filter and transform in one chain:

    my @valid_emails = map  { s~^\s+|\s+$~~gr =~ s~(.+)~lc($1)~er }
                       grep { m~\@~ }
                       @raw_input;

Read bottom to top (like Unix pipes): take raw input, keep only lines
with an @, then trim and lowercase each one.
Or for log analysis:
    my @error_ips = map  { s~^.*?(\d+\.\d+\.\d+\.\d+).*$~$1~r }
                    grep { m~\b500\b~ }
                    @log_lines;

Grab lines with 500 errors, extract the IP address. Clean pipeline,
no temporaries.
Part 9: ONE-LINERS WITH /r
Chained /r works beautifully in one-liners too:

    perl -lpe '$_ = $_ =~ s~\s+$~~r =~ s~^\s+~~r =~ s~\s+~ ~gr' messy.txt

Trim and normalize whitespace for every line of a file. The $_ = at
the start looks redundant, but you need it: /r never modifies the
variable, it only returns the result, so you must assign that result
back into $_ for -p to print the cleaned line.
Or more concisely, just chain from $_ directly:
    perl -lpe '$_ = s~\s+$~~r =~ s~^\s+~~r =~ s~\s+~ ~gr' messy.txt

The first s~~~r operates on $_ implicitly and returns the copy.
The chain continues from there. Then $_ = captures the final result
back into $_ so -p prints it.
Inline CSV cleaning:
    perl -F, -lane 'print join ",", map { s~^\s+|\s+$~~gr =~ s~"~~gr } @F' data.csv

Split on commas, trim and unquote each field, rejoin. A full CSV
normalizer in one line.
Part 10: GOTCHAS
Forgetting /r breaks the chain:

    my $bad = $text =~ s~foo~bar~g =~ s~baz~qux~gr;

The first substitution (no /r) returns a count, not a string, and it
still mutates $text as a side effect. Now you're running a
substitution on a number. You'll get silence, not errors.
Always use /r on every link in the chain. No exceptions.
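The fix is mechanical. With /r on every link, the chain behaves
(a minimal sketch with made-up data):

```perl
my $text = "foo baz";
my $good = $text =~ s~foo~bar~gr =~ s~baz~qux~gr;
# $good is "bar qux"; $text is still "foo baz"
```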
Forgetting /g when you need it:
    my $result = $text =~ s~\s+~ ~r;   # Only replaces FIRST match

The /r flag doesn't imply /g. You still need /g to replace all
occurrences. Most pipeline steps want both: /gr.
Performance on huge strings:
Each /r creates a copy of the entire string. Five steps in a chain
means five copies. For most use cases this is fine. For multi-megabyte
strings, consider mutating in place or doing it in fewer steps.
    Step 1: copy entire string ──→ modify
    Step 2: copy entire string ──→ modify
    Step 3: copy entire string ──→ modify
    ...each one allocates fresh memory
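When the copies matter, one explicit copy up front plus in-place
edits keeps the original safe with a single allocation (a sketch;
$huge here stands in for a multi-megabyte string):

```perl
my $huge = "a\r\nb   c";   # stand-in for a multi-megabyte string
my $work = $huge;          # one copy; the original stays pristine
$work =~ s~\r\n~\n~g;      # in-place from here on: no per-step copies
$work =~ s~\s+~ ~g;
# $work is "a b c"; $huge is unchanged
```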
Part 11: THE FUNCTIONAL ANGLE
If you've used Unix pipes, you already think this way:

    cat file | sed 's/foo/bar/' | sed 's/baz/qux/' | tr -s ' '

Chained /r is the same idea inside Perl:

    my $result = $text =~ s~foo~bar~gr
                       =~ s~baz~qux~gr
                       =~ s~\s+~ ~gr;

Data flows left to right. Each step is a transformation. No state is
mutated. The original survives.
This is functional programming in Perl. Not with monads or functors, but with regex. Which is honestly more useful 99% of the time.
Part 12: BUILDING REUSABLE PIPELINES
You can store substitution steps as subrefs and compose them:

    my @pipeline = (
        sub { $_[0] =~ s~\r\n~\n~gr },
        sub { $_[0] =~ s~<[^>]*>~~gr },
        sub { $_[0] =~ s~\s+~ ~gr },
        sub { $_[0] =~ s~^\s+|\s+$~~gr },
    );

    sub run_pipeline {
        my ($text, @steps) = @_;
        $text = $_->($text) for @steps;
        return $text;
    }

    my $clean = run_pipeline($raw, @pipeline);

Now your transformations are data. Add steps, remove steps, reorder
them. Compose different pipelines for different contexts.
This is dispatch table thinking applied to text processing. Each step is isolated, testable, and reusable.
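One way to take that further, sketched with hypothetical step names:
keep the steps in a hash and pick a different subset per context.

```perl
# Named steps (names are made up for this sketch):
my %step = (
    crlf    => sub { $_[0] =~ s~\r\n~\n~gr },
    tags    => sub { $_[0] =~ s~<[^>]*>~~gr },
    squeeze => sub { $_[0] =~ s~\s+~ ~gr },
);

# Apply a named subset of steps, in order, without mutating the input.
sub apply {
    my ($text, @names) = @_;
    $text = $step{$_}->($text) for @names;
    return $text;
}

my $html  = apply("<b>hi</b>\r\nthere", qw(crlf tags squeeze));
my $plain = apply("hi\r\nthere",        qw(crlf squeeze));
# Both come out as "hi there"; HTML input got the extra tag-strip step.
```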
Part 13: BEFORE AND AFTER
The old way:

    my $text = $input;
    $text =~ s~\r\n~\n~g;
    $text =~ s~<[^>]*>~~g;
    $text =~ s~&amp;~&~g;
    $text =~ s~\s+~ ~g;
    $text =~ s~^\s+|\s+$~~g;
    return $text;

The /r way:

    return $input =~ s~\r\n~\n~gr
                  =~ s~<[^>]*>~~gr
                  =~ s~&amp;~&~gr
                  =~ s~\s+~ ~gr
                  =~ s~^\s+|\s+$~~gr;

Same result. But the /r version never creates a mutable variable.
There's no $text to accidentally use later in its modified state.
The input is preserved. The output is derived.
One expression. No side effects. That's the whole point.
perl.gg

       $input
          |
   ┌──────┴──────┐
   |   s~~~gr    |  step 1
   ├─────────────┤
   |   s~~~gr    |  step 2
   ├─────────────┤
   |   s~~~gr    |  step 3
   ├─────────────┤
   |   s~~~gr    |  step 4
   ├─────────────┤
   |   s~~~gr    |  step 5
   └──────┬──────┘
          |
       $result

   Immutable pipeline. Pure output.