<!-- category: functional -->
Chained /r Substitution Pipelines
Transform text without mutating anything. Chain substitutions like Unix
pipes, each one feeding its result to the next.

    my $clean = $raw =~ s~\r\n~\n~gr
                     =~ s~<[^>]*>~~gr
                     =~ s~&amp;~&~gr
                     =~ s~\s+~ ~gr
                     =~ s~^\s+|\s+$~~gr;

Five transformations. One expression. The original $raw is untouched.
The magic is the /r flag, introduced in Perl 5.14. Instead of
modifying $_ or the bound variable, it returns a modified copy. Chain
them together and you get a functional pipeline where data flows
through a series of transformations.
Part 1: THE /r FLAG
Normally, s~~~ modifies the variable and returns the number of
substitutions made:

    my $text  = "Hello World";
    my $count = ($text =~ s~World~Perl~);
    # $count is 1, $text is "Hello Perl"
With /r, it returns the modified string and leaves the original alone:

    my $text = "Hello World";
    my $new  = ($text =~ s~World~Perl~r);
    # $new is "Hello Perl", $text is still "Hello World"

The variable is never touched. You get a fresh copy with the
substitution applied. Think of it as "r" for "return" (or
"non-destructive," if you're reading the docs).
Part 2: CHAINING THE CHAINS
Since s~~~r returns a string, you can immediately bind another
substitution to that string:

    my $result = $text =~ s~foo~bar~gr =~ s~baz~qux~gr;

    $text ── s~foo~bar~gr ──→ (copy1) ── s~baz~qux~gr ──→ $result
    Original    First          Intermediate   Second        Final
    string      substitution   copy           substitution  result

Each =~ s~~~r takes the string on its left, applies the substitution,
and returns a new string. The next =~ binds to that new string.
No temporary variables. No intermediate assignments. Just a pipeline.
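For comparison, here's the same two-step chain spelled out with
temporaries (a sketch; the input string is made up for illustration):

```perl
my $text  = "foo baz foo";
my $step1 = $text  =~ s~foo~bar~gr;   # first transformation
my $step2 = $step1 =~ s~baz~qux~gr;   # second transformation
# $step2 is "bar qux bar"; $text is untouched
```

Same result, two extra names to track. The chained form collapses
$step1 and $step2 into the data flow itself.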
Part 3: PRACTICAL PIPELINE - LOG CLEANING
You're parsing a messy log file. Each line needs several cleanups
before it's useful:

    while (my $line = <$fh>) {
        my $clean = $line =~ s~\s+$~~r              # strip trailing whitespace
                          =~ s~^\[[\d:/ ]+\]\s*~~r  # strip timestamp brackets
                          =~ s~\x1b\[[0-9;]*m~~gr   # strip ANSI color codes
                          =~ s~\s+~ ~gr;            # collapse whitespace
        push @entries, $clean if length $clean;
    }

Four transformations, one expression. The original $line survives
for debugging if you need it. Every step is documented with a comment.
Compare to the mutation approach:
    while (my $line = <$fh>) {
        $line =~ s~\s+$~~;
        $line =~ s~^\[[\d:/ ]+\]\s*~~;
        $line =~ s~\x1b\[[0-9;]*m~~g;
        $line =~ s~\s+~ ~g;
        push @entries, $line if length $line;
    }

Works fine, but $line is destroyed. If something goes wrong at step
three and you want to see the original input, it's gone.
Part 4: DATA SANITIZATION
User input is filthy. Clean it in one pass:

    sub sanitize {
        return $_[0] =~ s~<[^>]*>~~gr          # strip HTML tags
                     =~ s~&lt;~<~gr            # decode HTML entities
                     =~ s~&gt;~>~gr
                     =~ s~&amp;~&~gr
                     =~ s~['"\x00-\x1f]~~gr    # strip quotes and control chars
                     =~ s~^\s+|\s+$~~gr;       # trim
    }

    my $safe = sanitize($user_input);

Pure function. Input in, cleaned output out. Nothing mutated. Easy to
test, easy to reason about, easy to extend.
Want to add a step? Stick another =~ s~~~r in the chain. Want to
remove one? Delete the line. Order is explicit and visible.
Part 5: FORMAT CONVERSION
Converting between data formats is a natural fit:

    # Convert a compact MySQL timestamp to ISO 8601
    my $iso = $mysql_ts =~ s~^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$~$1-$2-$3T$4:$5:$6~r;

    # Convert snake_case to PascalCase
    my $pascal = $snake =~ s~^(\w)~ uc($1) ~er    # capitalize first letter
                        =~ s~_(\w)~ uc($1) ~ger;  # capitalize after underscores

Notice the /e flag combined with /r. The /e evaluates the
replacement as code. The /r returns the result. They play nicely
together.
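Here's the timestamp conversion run on a concrete value (the input
is made up for illustration):

```perl
my $mysql_ts = "20240131093015";   # hypothetical compact timestamp
my $iso = $mysql_ts =~ s~^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$~$1-$2-$3T$4:$5:$6~r;
# $iso is "2024-01-31T09:30:15"; $mysql_ts is unchanged
```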
Part 6: SLUG GENERATION
Turn a title into a URL slug. Classic web dev problem:

    sub slugify {
        return $_[0] =~ s~^\s+|\s+$~~gr        # trim
                     =~ s~['"]~~gr             # strip quotes
                     =~ s~[^a-zA-Z0-9\s-]~~gr  # strip non-alphanumeric
                     =~ s~\s+~-~gr             # spaces to hyphens
                     =~ s~-{2,}~-~gr           # collapse multiple hyphens
                     =~ s~^-|-$~~gr;           # strip leading/trailing hyphens
    }

    my $slug = lc slugify(" What's New in Perl 5.40?! ");
    # "whats-new-in-perl-540"

The lc at the call site lowercases the final result. You could put
it inside the function, but keeping it outside makes the pipeline
purely about structure, not case.
        .--.
       |o_o |    "Data flows down, clean comes out"
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/
Part 7: MAP + /r PIPELINES
Combine chained /r with map for array transformations:

    my @clean = map {
        s~^\s+|\s+$~~gr         # trim ($_ is read, not modified)
        =~ s~\.$~~r             # strip one trailing dot
        =~ s~(.+)~lc($1)~er     # lowercase
    } @raw_hostnames;

Each element flows through the pipeline. The original array is
untouched. You get a fresh, cleaned array back.
This is where /r really shines over in-place mutation. With map,
you're building a new list. You don't want side effects. The /r
flag keeps everything pure.
Without /r in a map, you'd accidentally mutate $_ (which is
aliased to the original array element) AND return the substitution
count instead of the string. Double trouble.
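To see the double trouble concretely, here's a sketch of what the
missing /r actually does (the input data is made up):

```perl
my @hosts = ("  Example.COM.  ");
# No /r: s/// runs on $_, which map aliases to the array element.
my @broken = map { s~^\s+|\s+$~~g } @hosts;
# @broken holds the substitution count (2: one leading match, one
# trailing match), and @hosts itself was mutated to ("Example.COM.").
```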
Part 8: GREP + PIPELINE COMBOS
Filter and transform in one chain:

    my @valid_emails = map  { s~^\s+|\s+$~~gr =~ s~(.+)~lc($1)~er }
                       grep { m~\@~ }
                       @raw_input;

Read bottom to top (like Unix pipes): take raw input, keep only lines
with an @, then trim and lowercase each one.
Or for log analysis:
    my @error_ips = map  { s~^.*?(\d+\.\d+\.\d+\.\d+).*$~$1~r }
                    grep { m~\b500\b~ }
                    @log_lines;

Grab lines with 500 errors, extract the IP address. Clean pipeline,
no temporaries.
Part 9: ONE-LINERS WITH /r
Chained /r works beautifully in one-liners too:

    perl -lpe '$_ = $_ =~ s~\s+$~~r =~ s~^\s+~~r =~ s~\s+~ ~gr' messy.txt

Trim and normalize whitespace for every line of a file. The $_ = at
the start looks redundant, but you need it: /r never modifies the
variable, it only returns the result, so you must assign that result
back into $_ for -p to print the cleaned line.
Or more concisely, just chain from $_ directly:
    perl -lpe '$_ = s~\s+$~~r =~ s~^\s+~~r =~ s~\s+~ ~gr' messy.txt

The first s~~~r operates on $_ implicitly and returns the copy.
The chain continues from there. Then $_ = captures the final result
back into $_ so -p prints it.
Inline CSV cleaning:
    perl -F, -lane 'print join ",", map { s~^\s+|\s+$~~gr =~ s~"~~gr } @F' data.csv

Split on commas, trim and unquote each field, rejoin. A full CSV
normalizer in one line.
Part 10: GOTCHAS
Forgetting /r breaks the chain:

    my $bad = $text =~ s~foo~bar~g =~ s~baz~qux~gr;

The first substitution (no /r) returns a count, not a string, and it
still mutates $text as a side effect. Now you're running a
substitution on a number. You'll get silence, not errors.
Always use /r on every link in the chain. No exceptions.
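The fix is mechanical. With /r on every link, the chain behaves
(a minimal sketch with made-up data):

```perl
my $text = "foo baz";
my $good = $text =~ s~foo~bar~gr =~ s~baz~qux~gr;
# $good is "bar qux"; $text is still "foo baz"
```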
Forgetting /g when you need it:
    my $result = $text =~ s~\s+~ ~r;   # Only replaces FIRST match

The /r flag doesn't imply /g. You still need /g to replace all
occurrences. Most pipeline steps want both: /gr.
Performance on huge strings:
Each /r creates a copy of the entire string. Five steps in a chain
means five copies. For most use cases this is fine. For multi-megabyte
strings, consider mutating in place or doing it in fewer steps.
    Step 1: copy entire string ──→ modify
    Step 2: copy entire string ──→ modify
    Step 3: copy entire string ──→ modify
    ...each one allocates fresh memory
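When the copies matter, one explicit copy up front plus in-place
edits keeps the original safe with a single allocation (a sketch;
$huge here stands in for a multi-megabyte string):

```perl
my $huge = "a\r\nb   c";   # stand-in for a multi-megabyte string
my $work = $huge;          # one copy; the original stays pristine
$work =~ s~\r\n~\n~g;      # in-place from here on: no per-step copies
$work =~ s~\s+~ ~g;
# $work is "a b c"; $huge is unchanged
```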
Part 11: THE FUNCTIONAL ANGLE
If you've used Unix pipes, you already think this way:

    cat file | sed 's/foo/bar/' | sed 's/baz/qux/' | tr -s ' '

Chained /r is the same idea inside Perl:

    my $result = $text =~ s~foo~bar~gr
                       =~ s~baz~qux~gr
                       =~ s~\s+~ ~gr;

Data flows left to right. Each step is a transformation. No state is
mutated. The original survives.
This is functional programming in Perl. Not with monads or functors, but with regex. Which is honestly more useful 99% of the time.
Part 12: BUILDING REUSABLE PIPELINES
You can store substitution steps as subrefs and compose them:

    my @pipeline = (
        sub { $_[0] =~ s~\r\n~\n~gr },
        sub { $_[0] =~ s~<[^>]*>~~gr },
        sub { $_[0] =~ s~\s+~ ~gr },
        sub { $_[0] =~ s~^\s+|\s+$~~gr },
    );

    sub run_pipeline {
        my ($text, @steps) = @_;
        $text = $_->($text) for @steps;
        return $text;
    }

    my $clean = run_pipeline($raw, @pipeline);

Now your transformations are data. Add steps, remove steps, reorder
them. Compose different pipelines for different contexts.
This is dispatch table thinking applied to text processing. Each step is isolated, testable, and reusable.
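One way to take that further, sketched with hypothetical step names:
keep the steps in a hash and pick a different subset per context.

```perl
# Named steps (names are made up for this sketch):
my %step = (
    crlf    => sub { $_[0] =~ s~\r\n~\n~gr },
    tags    => sub { $_[0] =~ s~<[^>]*>~~gr },
    squeeze => sub { $_[0] =~ s~\s+~ ~gr },
);

# Apply a named subset of steps, in order, without mutating the input.
sub apply {
    my ($text, @names) = @_;
    $text = $step{$_}->($text) for @names;
    return $text;
}

my $html  = apply("<b>hi</b>\r\nthere", qw(crlf tags squeeze));
my $plain = apply("hi\r\nthere",        qw(crlf squeeze));
# Both come out as "hi there"; HTML input got the extra tag-strip step.
```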
Part 13: BEFORE AND AFTER
The old way:

    my $text = $input;
    $text =~ s~\r\n~\n~g;
    $text =~ s~<[^>]*>~~g;
    $text =~ s~&amp;~&~g;
    $text =~ s~\s+~ ~g;
    $text =~ s~^\s+|\s+$~~g;
    return $text;

The /r way:

    return $input =~ s~\r\n~\n~gr
                  =~ s~<[^>]*>~~gr
                  =~ s~&amp;~&~gr
                  =~ s~\s+~ ~gr
                  =~ s~^\s+|\s+$~~gr;

Same result. But the /r version never creates a mutable variable.
There's no $text to accidentally use later in its modified state.
The input is preserved. The output is derived.
One expression. No side effects. That's the whole point.
perl.gg

       $input
          |
   ┌──────┴──────┐
   |   s~~~gr    |  step 1
   ├─────────────┤
   |   s~~~gr    |  step 2
   ├─────────────┤
   |   s~~~gr    |  step 3
   ├─────────────┤
   |   s~~~gr    |  step 4
   ├─────────────┤
   |   s~~~gr    |  step 5
   └──────┬──────┘
          |
       $result

   Immutable pipeline. Pure output.