<!-- category: hidden-gems -->
$/: the Shape-Shifting Record Separator

Perl reads input one line at a time. Until you tell it not to.

```perl
local $/ = undef;
my $entire_file = <$fh>;
```

That just slurped a whole file into a single scalar. No loop. No pushing into an array. One read, one variable, everything.
The magic variable $/ controls what Perl considers a "record." By
default it's a newline, so you get lines. Change it and you change
the fundamental unit of input. Set it to undef and there are no
record boundaries at all. The whole file is one record.
But undef is just the beginning. Beyond the default, $/ has four
distinct modes, each one a different shape for your data.
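To make the shape-shifting concrete, here is a minimal sketch (using an in-memory filehandle, so it runs without any real file): the same data yields a different number of records depending on what $/ holds.

```perl
use strict;
use warnings;
use feature 'say';

# Count how many records a given $/ produces for the same text
sub count_records {
    my ($sep, $text) = @_;
    local $/ = $sep;                     # scoped change, auto-restored
    open my $fh, '<', \$text or die $!;  # in-memory filehandle
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

my $data = "alpha\nbeta\n\ngamma\n";
say count_records("\n",  $data);   # 4 -- lines
say count_records('',    $data);   # 2 -- paragraphs
say count_records(undef, $data);   # 1 -- one slurped record
```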
Part 1: THE DEFAULT (NEWLINE)
Out of the box, $/ is "\n". Every <$fh> read returns everything
up to and including the next newline:
```perl
open my $fh, '<', 'data.txt' or die $!;
while (my $line = <$fh>) {
    chomp $line;          # remove the trailing \n
    say "Got: $line";
}
close $fh;
```

This is so fundamental that most Perl programmers never think about it.
<> reads lines. Lines end with \n. Water is wet.
But $/ is just a variable. You can change it.
Part 2: UNDEF FOR SLURP MODE
Set $/ to undef and records have no delimiter. The next read
operation returns everything remaining in the filehandle:
```perl
use JSON::PP qw(decode_json);

my $content;
{
    local $/;   # undef by default when localized without assignment
    open my $fh, '<', 'config.json' or die $!;
    $content = <$fh>;
    close $fh;
}
# $content now has the entire file as one string
my $data = decode_json($content);
```

The local $/ idiom is so common it has a name: slurp mode.
When $/ is undef, Perl doesn't look for record separators at all.
It reads until EOF and gives you everything.
A slightly more compact version:
```perl
my $content = do {
    local $/;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    <$fh>;
};
```

The do block returns the value of the last expression, which is
<$fh> in slurp mode. One statement, entire file in a scalar.
Why use slurp mode? When you need to process the file as a whole. Multi-line regex matching. JSON parsing. Template processing. Anything where line-by-line processing makes things harder, not easier.
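As a sketch of the multi-line-regex case (again on an in-memory filehandle; the BEGIN/END block format is made up for illustration): a match that spans lines is awkward record by record, but trivial on a slurped scalar.

```perl
use strict;
use warnings;
use feature 'say';

my $text = "BEGIN\nline one\nline two\nEND\nother\n";
open my $fh, '<', \$text or die $!;

my $content = do { local $/; <$fh> };   # slurp the lot

# /m lets ^ and $ match at line boundaries, /s lets . cross newlines
if ($content =~ m~^BEGIN\n(.*?)^END$~ms) {
    say "Block body:\n$1";   # "line one\nline two\n"
}
```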
Part 3: EMPTY STRING FOR PARAGRAPH MODE
Set $/ to an empty string and Perl reads in paragraph mode.
A "paragraph" is a chunk of text separated by one or more blank
lines:
```perl
local $/ = '';
while (my $paragraph = <$fh>) {
    chomp $paragraph;   # remove trailing blank lines
    say "--- paragraph ---";
    say $paragraph;
    say "--- end ---\n";
}
```

Given this input:

```
This is the first paragraph.
It has two lines.

This is the second paragraph. It has one line.

This is the third.
```

You get three reads. Each returns a complete paragraph including its internal newlines but terminated at the blank line boundary.
Paragraph mode is perfect for processing structured text documents, man page sections, changelog entries, or any format where blank lines separate logical records.
```perl
# Process a changelog
local $/ = '';
while (my $entry = <$fh>) {
    if ($entry =~ m~^Version\s+(\S+)~) {
        my $version = $1;
        say "Found version: $version";
        say $entry;
    }
}
```
Part 4: REFERENCE TO INTEGER FOR FIXED-LENGTH RECORDS
Set $/ to a reference to an integer and Perl reads exactly that
many bytes:
```perl
local $/ = \1024;   # read 1024 bytes at a time
while (my $chunk = <$fh>) {
    say "Read " . length($chunk) . " bytes";
    process_chunk($chunk);
}
```

The last read may return fewer bytes if the file isn't evenly divisible. Perl reads what's left and returns it.
This is how you process binary files. Images. Executables. Network protocols with fixed-length headers. Database dumps.
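Here is a sketch of the fixed-length-header case. The wire format is invented for illustration (a 4-byte magic string plus a 32-bit big-endian payload length); the technique of pairing $/ = \N with unpack is the point.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical packet: 4-byte magic, 32-bit big-endian payload length
my $packet = "IMG1" . pack("N", 12) . ("x" x 12);
open my $fh, '<:raw', \$packet or die $!;

my ($magic, $length);
{
    local $/ = \8;                       # header is exactly 8 bytes
    my $header = <$fh>;
    ($magic, $length) = unpack "a4 N", $header;
}
say "magic=$magic length=$length";       # magic=IMG1 length=12

local $/ = \$length;                     # now read exactly the payload
my $payload = <$fh>;
say length($payload);                    # 12
```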
```perl
# Read a binary file in 4096-byte chunks
open my $fh, '<:raw', $binary_file or die $!;
local $/ = \4096;

my $offset = 0;
while (my $block = <$fh>) {
    printf "Offset 0x%08X: %d bytes\n", $offset, length($block);
    $offset += length($block);
}
close $fh;
```

Note the :raw layer on the open. When reading binary data with
fixed-length records, you want raw mode to avoid any newline
translation or encoding mangling.
You can also use this for simple hex dump tools:
```perl
open my $fh, '<:raw', $file or die $!;
local $/ = \16;   # 16 bytes per line

my $addr = 0;
while (my $chunk = <$fh>) {
    my $hex   = join ' ', map { sprintf "%02X", ord($_) } split //, $chunk;
    my $ascii = $chunk;
    $ascii =~ s~[^\x20-\x7E]~.~g;
    printf "%08X  %-48s  %s\n", $addr, $hex, $ascii;
    $addr += length($chunk);
}
```
Part 5: CUSTOM STRING DELIMITER
Set $/ to any string and Perl uses that as the record terminator:
```perl
local $/ = "---\n";   # records end with "---" on its own line
while (my $record = <$fh>) {
    chomp $record;    # removes the "---\n" delimiter
    say "Record: $record";
}
```

Given this input:

```
name: Alice
age: 30
---
name: Bob
age: 25
---
name: Carol
age: 35
---
```

Each read returns one complete record. The delimiter is consumed and included in the returned string (until you chomp it off).
This is incredibly useful for processing multi-line log entries:
```perl
# Apache error logs with multi-line stack traces;
# each new entry starts with a [timestamp] line
local $/ = "\n[";   # records end where the next entry begins
while (my $entry = <$fh>) {
    chomp $entry;   # removes the trailing "\n["
    if ($entry =~ m~error~i) {
        say "ERROR ENTRY:";
        say $entry;
    }
}
```

Or processing custom data formats:
```perl
# Records separated by "%%"
local $/ = "%%\n";
while (my $card = <$fh>) {
    chomp $card;
    next unless length $card;
    process_card($card);
}
```
Part 6: PRACTICAL LOG PROCESSING
Real-world scenario: Java stack traces in a log file. Each exception spans multiple lines. You need to find all NullPointerException entries.
Line-by-line processing is painful. You'd need to buffer lines,
detect boundaries, and assemble records manually. With $/, you
declare the boundary and let Perl do the work:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my $log_file = $ARGV[0] or die "Usage: $0 <logfile>\n";
open my $fh, '<', $log_file or die "Cannot open $log_file: $!\n";

# Stack traces are separated by blank lines
local $/ = "\n\n";

my $count = 0;
while (my $entry = <$fh>) {
    if ($entry =~ m~NullPointerException~) {
        $count++;
        say "=== NPE #$count ===";
        print $entry;
    }
}
close $fh;

say "\nTotal NullPointerExceptions: $count";
```

What would be 30 lines of stateful line-by-line processing becomes 10 lines of clean paragraph-mode reading.
Part 7: READING BINARY DATA IN CHUNKS
Processing a large binary file without loading it all into memory:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Digest::MD5;

my $file = $ARGV[0] or die "Usage: $0 <file>\n";
open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";

my $md5 = Digest::MD5->new;
my $total_bytes = 0;

{
    local $/ = \8192;   # 8 KB chunks
    while (my $chunk = <$fh>) {
        $md5->add($chunk);
        $total_bytes += length($chunk);
    }
}
close $fh;

printf "File: %s\nSize: %d bytes\nMD5: %s\n",
    $file, $total_bytes, $md5->hexdigest;
```

The file could be gigabytes. Memory usage stays constant at 8 KB plus overhead. You're streaming through it in fixed-size chunks, which is exactly what \$integer mode gives you.
Part 8: LOCAL $/ FOR SCOPED CHANGES
Always use local $/ to change the record separator. Never assign
to $/ directly in production code:

```perl
# GOOD: scoped change, automatically restored
{
    local $/;
    my $content = <$fh>;
}
# $/ is back to "\n" here

# BAD: global change, affects everything
$/ = undef;
my $content = <$fh>;
# $/ is still undef! every <> read in the rest of the program slurps!
```

The local keyword saves the current value, sets the new one, and
automatically restores the original when the enclosing block exits.
It's dynamic scoping. Any code called from within that block also
sees the changed $/, which is usually what you want.
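A small sketch of that dynamic scoping (the helper sub and in-memory filehandle are illustrative): a sub called inside the block sees the localized $/ even though it never touches it.

```perl
use strict;
use warnings;
use feature 'say';

# This sub has no idea $/ was changed -- it just reads one record
sub read_record {
    my ($fh) = @_;
    return scalar <$fh>;
}

my $text = "one\ntwo\n";
open my $fh, '<', \$text or die $!;

my $all;
{
    local $/;                   # slurp mode for this block...
    $all = read_record($fh);    # ...and for anything it calls
}
say length($all);   # 8 -- the sub slurped both lines
```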
You can nest scoped changes:
```perl
{
    local $/ = '';   # paragraph mode
    while (my $para = <$fh>) {
        # within this loop, process each paragraph's lines
        local $/ = "\n";
        open my $line_fh, '<', \$para;
        while (my $line = <$line_fh>) {
            chomp $line;
            # process individual line within the paragraph
        }
    }
}
```

Paragraph mode on the outer loop. Line mode on the inner loop. Both scoped. Both clean up after themselves.
Part 9: THE CHOMP INTERACTION
chomp removes whatever $/ is set to from the end of a string. It
doesn't just remove newlines. It removes the current record
separator.
```perl
local $/ = "---\n";
my $record = <$fh>;   # "name: Alice\nage: 30\n---\n"
chomp $record;        # "name: Alice\nage: 30\n"
```

In slurp mode ($/ = undef), chomp does nothing. There's no
separator to remove:

```perl
local $/;
my $content = <$fh>;
chomp $content;   # no-op, $/ is undef
```

In paragraph mode ($/ = ''), chomp removes all trailing newlines,
not just one:

```perl
local $/ = '';
my $para = <$fh>;   # "Hello\nWorld\n\n\n"
chomp $para;        # "Hello\nWorld"
```

This is a subtlety that bites people. In paragraph mode, chomp is
greedy with trailing newlines. It strips all of them, not just the
separator.
With fixed-length records ($/ = \N), chomp does nothing. There's
no string separator to remove from a byte-counted read.
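A quick way to see that no-op, sketched with an in-memory filehandle: chomp returns the number of characters it removed, which is zero in record mode.

```perl
use strict;
use warnings;

my $bytes = "abcdef";
open my $fh, '<:raw', \$bytes or die $!;

local $/ = \4;            # fixed-length records, 4 bytes each
my $rec = <$fh>;          # "abcd"
my $removed = chomp $rec;

# chomp returns the number of characters removed: 0 in record mode
print "removed=$removed rec=$rec\n";
```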
Part 10: THE COMPLETE REFERENCE
```
VALUE OF $/        MODE              BEHAVIOR
-----------        ----              --------
"\n" (default)     Line mode         Read until newline
undef              Slurp mode        Read entire file at once
""                 Paragraph mode    Read until blank line(s)
\1024              Fixed-length      Read exactly N bytes
"---\n"            Custom delimiter  Read until string match
```

```
$/
 |
 +-- "\n"   -> line by line (the one everyone knows)
 |
 +-- undef  -> whole file, one gulp
 |
 +-- ""     -> paragraph by paragraph
 |
 +-- \N     -> N bytes at a time
 |
 +-- "xyz"  -> read until "xyz" appears
```

"One variable controls how Perl sees the shape of your data."

```
 .--.
|o_o |
|:_/ |
//   \ \
(|     | )
/'\_   _/`\
\___)=(___/
```

Most Perl programmers use $/ in exactly one way: the default. They
read lines and never think about it. But $/ has four other modes
that can each save you dozens of lines of manual record assembly.
Before you write a loop that buffers lines and detects boundaries,
ask yourself: can I just set $/ to the boundary and let Perl do
it? The answer is usually yes.
perl.gg