<!-- category: hidden-gems -->
use open qw(:std :utf8) - The :std Trick
You write a Perl script. It handles UTF-8 data. You do everything right. Set encoding on your file opens. Decode input. Encode output. Then you print to STDOUT and get:

```
Wide character in print at script.pl line 42.
```

Your files are fine. Your data is fine. But STDOUT does not know about UTF-8. It is still in raw bytes mode because it was opened before your script even started running.
One line fixes everything:

```perl
use open qw(:std :utf8);
```

That is it. Every filehandle you open gets UTF-8 encoding. And the magic part: :std retroactively applies the encoding to STDIN, STDOUT, and STDERR, which are already open.
Part 1: THE UTF-8 PROBLEM
Perl's default is bytes. Everything is bytes. When you open a file without specifying an encoding, Perl reads and writes raw bytes. No encoding, no decoding, no opinion about what those bytes mean.

```perl
# default: raw bytes
open my $fh, '<', 'data.txt' or die $!;
my $line = <$fh>;    # $line contains raw bytes, even if the file is UTF-8
```

This means that if your file contains the character é (U+00E9, two bytes in UTF-8), Perl sees two bytes, not one character. String functions like length and substr, and regexes, operate on bytes, not characters.

```perl
use feature 'say';

my $str = "caf\xc3\xa9";    # "café" as raw UTF-8 bytes
say length($str);           # 5 (bytes), not 4 (characters)
```

To get character semantics, you need to tell Perl "this is UTF-8":

```perl
use Encode qw(decode);

my $decoded = decode('UTF-8', $str);
say length($decoded);       # 4 (characters)
```

But doing this manually for every piece of data is tedious and error-prone.
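The byte-versus-character difference goes beyond length. Here is a small sketch in plain Perl, with no files involved, that compares substr on the raw bytes with substr on the decoded string. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw   = "caf\xc3\xa9";            # "café" as raw UTF-8 bytes
my $chars = decode('UTF-8', $raw);    # the same text as 4 characters

# The last unit differs: half of the é on the byte side, the whole é after decoding.
my $raw_last  = substr($raw, -1);     # "\xa9", a lone continuation byte
my $char_last = substr($chars, -1);   # "\x{e9}", the character é

printf "raw:     length %d, last unit 0x%02x\n", length $raw,   ord $raw_last;
printf "decoded: length %d, last unit 0x%02x\n", length $chars, ord $char_last;
```

On the raw side, substr happily slices a multi-byte character in half. That is exactly the kind of silent damage byte semantics invite.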
Part 2: BASIC USE OPEN
The open pragma sets default encoding layers for all open calls in the current lexical scope:

```perl
use open ':encoding(UTF-8)';

# now every open in this scope gets UTF-8 automatically
open my $fh, '<', 'data.txt' or die $!;    # UTF-8 decoding on read
my $line = <$fh>;                           # $line is decoded characters

open my $out, '>', 'out.txt' or die $!;    # UTF-8 encoding on write
print $out "caf\x{e9}\n";                   # writes valid UTF-8 bytes
```

No explicit encoding layer is needed on each open. The pragma handles it.

But there is a gap. STDIN, STDOUT, and STDERR were opened by Perl's runtime before your use open line executes. The pragma's default layers do not affect already-open handles.

```perl
use open ':encoding(UTF-8)';

# files opened from now on: UTF-8, good
open my $fh, '<', 'data.txt' or die $!;

# but STDOUT was already open, still in bytes mode
print "\x{2603}\n";     # snowman: "Wide character in print" warning
print "caf\x{e9}\n";    # no warning, but é goes out as the single Latin-1 byte 0xE9
```

This is the gap that :std fills.
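The lexical scoping can be seen directly. This is a minimal sketch that assumes a throwaway temp file, created with the core File::Temp module, in place of data.txt. The same UTF-8 file is read once outside the pragma's scope and once inside a block that enables it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway sample file: "café" written as raw UTF-8 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "caf\xc3\xa9\n";
close $tmp or die $!;

my $outside;
{
    open my $fh, '<', $path or die "open $path: $!";   # no pragma in scope: raw bytes
    chomp($outside = <$fh>);
}

my $inside;
{
    use open ':encoding(UTF-8)';                        # applies to this block only
    open my $fh, '<', $path or die "open $path: $!";   # decoded on read
    chomp($inside = <$fh>);
}

printf "outside the pragma: %d units\n", length $outside;   # bytes
printf "inside the pragma:  %d units\n", length $inside;    # characters
```

On a typical Unix system, the first read yields 5 bytes and the second yields 4 characters.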
Part 3: WHAT :STD DOES
The :std flag retroactively applies the encoding to the three standard handles:

```perl
use open qw(:std :encoding(UTF-8));
```

When Perl sees :std, it calls binmode on STDIN, STDOUT, and STDERR with the specified encoding layer. Effectively it does:

```perl
binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');
```

But it does this at compile time, as a pragma, so it takes effect before any of your code runs. And it combines with the default layer for new filehandles, so you get complete UTF-8 coverage in one line.

```perl
use open qw(:std :encoding(UTF-8));

# STDOUT now handles UTF-8
print "caf\x{e9}\n";     # encoded as UTF-8, correct output

# new files also get UTF-8
open my $fh, '>', 'out.txt' or die $!;
print $fh "\x{2603}\n";  # snowman character, encoded properly
```

One line. Every handle. Past, present, and future.
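Both halves of that claim can be confirmed with PerlIO::get_layers. This is a sketch in which a temp file from the core File::Temp module stands in for out.txt. It lists the layers on STDOUT and on a freshly opened file, then reads the file back with :raw to show the bytes actually written for the snowman.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use open qw(:std :encoding(UTF-8));

my (undef, $path) = tempfile(UNLINK => 1);
open my $out, '>', $path or die "open $path: $!";    # new handle: gets the default layer

my @stdout = PerlIO::get_layers(*STDOUT, output => 1);  # set by :std's binmode
my @file   = PerlIO::get_layers($out, output => 1);     # set by the default layer

print "STDOUT: @stdout\n";
print "file:   @file\n";

print {$out} "\x{2603}\n";    # snowman: encoded to UTF-8 by the layer
close $out or die $!;

# An explicit :raw layer overrides the pragma, so this reads the bytes on disk.
open my $in, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;
printf "bytes on disk: %d\n", length $bytes;    # E2 98 83 0A
```

The exact layer names may vary between Perl versions. The point is that an encoding layer shows up on both handles.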
Part 4: :UTF8 VS :ENCODING(UTF-8)
You will see both forms:

```perl
use open qw(:std :utf8);
use open qw(:std :encoding(UTF-8));
```

They are NOT identical.

:utf8 tells Perl "trust that the data is valid UTF-8 and just flip the internal flag." It does not validate or repair anything. If the input contains malformed byte sequences, they go straight into your strings and you end up with garbage data.

:encoding(UTF-8) actually decodes and checks the bytes. Malformed sequences produce a "does not map to Unicode" warning and are substituted (by default with a literal \xHH escape such as \xE9) instead of being passed through.

```
LAYER              VALIDATES?   SPEED    USE WHEN
----------------   ----------   ------   -------------------
:utf8              No           Faster   You trust the input
:encoding(UTF-8)   Yes          Slower   Input might be bad
```

For data you control (your own files, your own output), :utf8 is fine and faster. For user input, network data, or anything from an untrusted source, :encoding(UTF-8) is safer.

In practice, many scripts use :utf8 because the validation overhead is measurable on large files and most UTF-8 data in the wild is valid. But if you are processing user uploads or scraping web pages, use :encoding(UTF-8).
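Here is a small sketch of the difference. It assumes a temp file, created with File::Temp to stand in for untrusted input, holding "café" in Latin-1, so its 0xE9 byte is not valid UTF-8. The sketch reads that file through each layer, counts the warnings each read emits, and asks utf8::valid whether the resulting string is well-formed internally.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for bad input: "café" saved as Latin-1, so 0xE9 is invalid UTF-8.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "caf\xe9\n";
close $tmp or die $!;

# Read the first line through the given layer, collecting any warnings.
sub read_with_layer {
    my ($layer) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $fh, "<$layer", $path or die "open $path: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return ($line, scalar @warnings);
}

my ($lax,    $lax_warn)    = read_with_layer(':utf8');
my ($strict, $strict_warn) = read_with_layer(':encoding(UTF-8)');

# The lax read is expected to keep the stray byte as-is (not well-formed);
# the strict read substitutes it, leaving a well-formed string.
printf ":utf8             well-formed: %s, warnings: %d\n",
    (utf8::valid($lax) ? 'yes' : 'no'), $lax_warn;
printf ":encoding(UTF-8)  well-formed: %s, warnings: %d\n",
    (utf8::valid($strict) ? 'yes' : 'no'), $strict_warn;
```

Whatever a given Perl version prints for the :utf8 line, only the :encoding(UTF-8) read is guaranteed to hand you a string with the bad byte replaced.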
Part 5: THE ONE-LINE FIX
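If your program has "Wide character in print" warnings, this is probably all you need:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use open qw(:std :utf8);

# everything works now
say "\x{1F600}";    # grinning face emoji
say "caf\x{e9}";    # accented e
say "\x{2603}";     # snowman
```

Put it in your boilerplate, right after use feature 'say'. One line, always there, problem gone forever.

For a module (as opposed to a script), be careful. The default layers that use open sets for new open calls are lexically scoped, so they only affect the current file and do not leak into code that uses your module. The :std part is not lexical. It calls binmode on the global STDIN, STDOUT, and STDERR, so a module that says use open qw(:std :utf8) changes those handles for the whole program, including the calling script. In a module, leave :std out:

```perl
package MyModule;
use strict;
use warnings;
use open qw(:utf8);    # default layers for open in this file only; no :std

sub process {
    my ($path) = @_;
    # files opened here get UTF-8
    open my $fh, '<', $path or die "Cannot open $path: $!";
    # ...
}

1;
```

The calling script's filehandles, including STDOUT, are then not affected unless the script also uses the pragma.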
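One thing worth verifying is that only the default layers are lexical. :std calls binmode on the global standard handles. This sketch treats a bare block with a hypothetical package name as a stand-in for a module's file, since both are lexical scopes. PerlIO::get_layers then shows that the :std inside the block changes STDOUT for main too, while a handle opened back in main gets no default layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Snapshot STDOUT at compile time, before the block below is compiled.
my @stdout_before;
BEGIN { @stdout_before = PerlIO::get_layers(*STDOUT, output => 1) }

{
    package Hypothetical::Module;    # stand-in for a module's file scope
    use open qw(:std :utf8);         # runs at compile time, as in a real module
}

# Back in main, with no 'use open' of its own in scope.
my @stdout_after = PerlIO::get_layers(*STDOUT, output => 1);

my (undef, $path) = tempfile(UNLINK => 1);
open my $fh, '<', $path or die "open $path: $!";
my @file_layers = PerlIO::get_layers($fh);
close $fh;

my $has_utf8 = sub { scalar grep { $_ eq 'utf8' } @_ };
printf "STDOUT utf8 before: %s, after: %s\n",
    ($has_utf8->(@stdout_before) ? 'yes' : 'no'),
    ($has_utf8->(@stdout_after)  ? 'yes' : 'no');
printf "main-scope file utf8: %s\n", ($has_utf8->(@file_layers) ? 'yes' : 'no');
```

In a clean environment (no PERL_UNICODE set), STDOUT goes from "no" to "yes" while the main-scope file stays "no".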
Part 6: THE BINMODE ALTERNATIVE
If you do not want to use a pragma, you can set encoding on individual handles with binmode:

```perl
binmode(STDOUT, ':utf8');
binmode(STDERR, ':utf8');
binmode(STDIN,  ':utf8');
```

This does the same thing as :std, but manually. It does not set a default for future open calls.

For one specific handle:

```perl
open my $fh, '<', 'data.txt' or die $!;
binmode($fh, ':utf8');
```

Or set it in the open call directly:

```perl
open my $fh, '<:utf8', 'data.txt' or die $!;
```

These two forms are equivalent for that one handle. The advantage of use open qw(:std :utf8) is that it covers everything at once. No chance of forgetting one handle.
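This sketch checks that the two per-handle forms read the same thing. It assumes a temp file from File::Temp in place of data.txt. It uses :encoding(UTF-8) rather than :utf8, following the advice in Part 4 for input, but the comparison works the same way with either layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file holding "café" as UTF-8 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "caf\xc3\xa9\n";
close $tmp or die $!;

# Form 1: open, then binmode the handle.
open my $fh1, '<', $path or die "open $path: $!";
binmode($fh1, ':encoding(UTF-8)');
chomp(my $via_binmode = <$fh1>);
close $fh1;

# Form 2: put the layer in the open mode string.
open my $fh2, '<:encoding(UTF-8)', $path or die "open $path: $!";
chomp(my $via_open = <$fh2>);
close $fh2;

print $via_binmode eq $via_open ? "same result\n" : "different results\n";
printf "length: %d characters\n", length $via_open;
```

No pragma is in scope here, so any other handle in this script still defaults to raw bytes.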
Part 7: PERL_UNICODE ENVIRONMENT VARIABLE
You can also set UTF-8 handling from outside the script entirely:

```sh
export PERL_UNICODE=SDA
perl script.pl
```

The letters stand for:

```
S = STDIN, STDOUT, and STDERR
D = UTF-8 as the default layer for input and output streams (files you open)
A = @ARGV (decode command-line arguments)
```

So PERL_UNICODE=SDA is roughly equivalent to:

```perl
use open qw(:std :utf8);
use Encode qw(decode);
@ARGV = map { decode('UTF-8', $_) } @ARGV;
```

On the command line, you can also use the -C flag:

```sh
perl -CSDA script.pl
perl -CI      # just STDIN
perl -CS      # STDIN, STDOUT, and STDERR
perl -CSD     # standard handles plus the default for files you open
```

This is handy for one-liners:

```sh
perl -CSDA -ne 'print if m~\p{Han}~' chinese-text.txt
```

The environment variable is great for system-wide configuration. Drop PERL_UNICODE=SDA in your shell profile and every Perl script you run gets UTF-8 handling automatically, including any script that expects raw bytes on its standard handles.
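Each -C letter is a bit flag, and a running program can read the combined value from the special variable ${^UNICODE}. The perlrun bit values are I=1, O=2, E=4, S=7, i=8, o=16, D=24, and A=32. This sketch runs a child perl ($^X, the current interpreter) with several flags and prints the number each one produces. It relies on list-form pipe open, so it is aimed at Unix-like systems.

```perl
use strict;
use warnings;

# Run a child perl with the given -C flag and capture its ${^UNICODE}.
sub unicode_bits {
    my ($flag) = @_;
    open my $child, '-|', $^X, $flag, '-e', 'print ${^UNICODE}'
        or die "cannot run $^X: $!";
    my $out = <$child>;
    close $child;
    return $out;
}

for my $flag (qw(-CI -CS -CSD -CSDA)) {
    printf "%-6s => %s\n", $flag, unicode_bits($flag);
}
```

Given those bit values, the expected numbers are 1, 7, 31, and 63. That makes it easy to see that S already covers all three standard handles.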
Part 8: COMMON PITFALLS
Double encoding. If you apply :utf8 to a handle and then manually encode before printing, you get double-encoded output:

```perl
use open qw(:std :utf8);
use Encode qw(encode);

# BAD: double encoding
print encode('UTF-8', "caf\x{e9}\n");
# encode() produces UTF-8 bytes,
# then the :utf8 layer encodes them AGAIN
```

Pick one approach. Either use the IO layer OR manual encode/decode. Not both.
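The double encoding is easy to see on disk. This sketch writes through the pragma's default :utf8 layer to a temp file, with File::Temp standing in for a real output file. It then reads the file back with an explicit :raw layer and prints each byte in hex.

```perl
use strict;
use warnings;
use open qw(:std :utf8);
use Encode qw(encode);
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# BAD: the handle gets :utf8 from the pragma AND the data is already encoded.
open my $out, '>', $path or die "open $path: $!";
print {$out} encode('UTF-8', "caf\x{e9}");
close $out or die $!;

# Explicit :raw overrides the pragma, so this shows the real bytes on disk.
open my $in, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";
# 63 61 66 c3 83 c2 a9  (é is c3 a9 once, c3 83 c2 a9 when encoded twice)
```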
Binary files. If use open qw(:std :utf8) is in effect and you then try to read a binary file (image, gzip archive), the UTF-8 layer will corrupt the data:

```perl
use open qw(:std :utf8);

# BAD: reading a binary file through a UTF-8 layer
open my $fh, '<', 'photo.jpg' or die $!;    # gets :utf8 automatically
my $data = do { local $/; <$fh> };           # $data is corrupted
```

Fix it by explicitly overriding the default layer:

```perl
open my $fh, '<:raw', 'photo.jpg' or die $!;    # override the default
my $data = do { local $/; <$fh> };               # $data is correct binary content
```

The :raw layer strips all encoding layers and gives you clean bytes. Always use it for binary files.
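The next sketch checks that override. A temp file containing every byte value from 0x00 to 0xFF stands in for photo.jpg. The script compares the layers the pragma would apply with those of an explicit :raw open, then checks that the :raw read returns the original bytes exactly.

```perl
use strict;
use warnings;
use open qw(:std :utf8);
use File::Temp qw(tempfile);

# Stand-in for photo.jpg: each byte value 0x00..0xFF once.
my $original = join '', map { chr } 0 .. 255;
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);                 # make sure the temp handle writes raw bytes
print {$tmp} $original;
close $tmp or die $!;

# Default layers from the pragma: this handle would decode UTF-8.
open my $text_fh, '<', $path or die "open $path: $!";
my @text_layers = PerlIO::get_layers($text_fh);
close $text_fh;

# Explicit :raw overrides the pragma's default.
open my $bin_fh, '<:raw', $path or die "open $path: $!";
my @bin_layers = PerlIO::get_layers($bin_fh);
my $data = do { local $/; <$bin_fh> };
close $bin_fh;

print "default layers: @text_layers\n";
print "raw layers:     @bin_layers\n";
printf "raw read matches: %s (%d bytes)\n", ($data eq $original ? 'yes' : 'no'), length $data;
```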
Mixing encoded and raw handles. Once you set a default, be aware that every open in its scope gets it. If you open a file that is Latin-1 encoded, not UTF-8, the :utf8 layer will misinterpret the bytes:

```perl
use open qw(:std :utf8);

# this file is actually Latin-1
open my $fh, '<', 'legacy.txt' or die $!;
# Perl tries to decode as UTF-8, gets garbage or errors
```

Override per file when needed:

```perl
open my $fh, '<:encoding(iso-8859-1)', 'legacy.txt' or die $!;
```
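This is a runnable version of that override. It uses a temp file holding "café" in Latin-1 as a hypothetical stand-in for legacy.txt. The explicit :encoding(iso-8859-1) layer replaces the pragma's :utf8 default for this one handle, and STDOUT still prints the result as UTF-8.

```perl
use strict;
use warnings;
use open qw(:std :utf8);
use File::Temp qw(tempfile);

# Stand-in for legacy.txt: "café" saved as Latin-1 (é is the single byte 0xE9).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "caf\xe9\n";
close $tmp or die $!;

# Override the pragma's :utf8 default for this one file.
open my $fh, '<:encoding(iso-8859-1)', $path or die "open $path: $!";
chomp(my $line = <$fh>);
close $fh;

printf "%s (chars: %d)\n", $line, length $line;    # café (chars: 4)
```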
Part 9: TESTING YOUR ENCODING
A quick diagnostic script to verify your encoding setup:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use open qw(:std :utf8);
use Encode qw(encode);

# test characters from different Unicode blocks
my @tests = (
    ['ASCII',    'Hello World'],
    ['Latin-1',  "caf\x{e9}"],
    ['Cyrillic', "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}"],
    ['CJK',      "\x{4f60}\x{597d}"],
    ['Emoji',    "\x{1F600}\x{1F389}"],
    ['Math',     "\x{221a}\x{222b}\x{2211}"],
);

for my $test (@tests) {
    my ($name, $str) = @$test;
    # count UTF-8 bytes by encoding a copy; 'use bytes' would measure
    # Perl's internal representation, which is not necessarily UTF-8
    printf "%-10s %s (chars: %d, bytes: %d)\n",
        $name, $str, length($str), length(encode('UTF-8', $str));
}
```

Expected output (on a UTF-8 terminal):

```
ASCII      Hello World (chars: 11, bytes: 11)
Latin-1    café (chars: 4, bytes: 5)
Cyrillic   Привет (chars: 6, bytes: 12)
CJK        你好 (chars: 2, bytes: 6)
Emoji      😀🎉 (chars: 2, bytes: 8)
Math       √∫∑ (chars: 3, bytes: 9)
```

If you see "Wide character" warnings or garbled output, your encoding setup is wrong. Go back and add the pragma.
Part 10: THE COMPLETE BOILERPLATE
Here is the recommended Perl script boilerplate for 2026:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use open qw(:std :utf8);
```

Four lines after the shebang. You get strict mode, warnings, say, and full UTF-8 support on every filehandle, including STDIN/STDOUT/STDERR.

For scripts that also handle binary data, add :raw overrides where needed:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use open qw(:std :utf8);

# text files: automatic UTF-8
open my $text, '<', 'input.txt' or die $!;

# binary files: explicit raw mode
open my $bin, '<:raw', 'image.png' or die $!;
```
```
   .--.
  |o_o |    ":std - because STDOUT
  |:_/ |     was already open."
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/
```

Perl's default byte-mode is a relic of a time when ASCII was enough. That time ended decades ago. Every modern Perl script should declare its encoding intent upfront.
use open qw(:std :utf8) is not optional for programs that
handle text. It is the baseline. Without it, you are one
accented character away from a warning and one emoji away from
corrupt output.
The :std flag is the key insight. Without it, the pragma only
affects handles you open yourself. The three handles that Perl
opens for you (STDIN, STDOUT, STDERR) stay in bytes mode, and
that is exactly where your output goes. The :std flag closes
that gap retroactively.
One pragma. One line. No more "Wide character in print." Put it in your boilerplate and forget about it.
perl.gg