perl.gg / hidden-gems

use open qw(:std :utf8) - The :std Trick

2026-04-21

You write a Perl script. It handles UTF-8 data. You do everything right. Set encoding on your file opens. Decode input. Encode output.

Then you print to STDOUT and get:

    Wide character in print at script.pl line 42.

Your files are fine. Your data is fine. But STDOUT does not know about UTF-8. It is still in raw bytes mode because it was opened before your script even started running.

One line fixes everything:

    use open qw(:std :utf8);

That is it. Every filehandle you open gets UTF-8 encoding. And the magic part: :std retroactively applies the encoding to STDIN, STDOUT, and STDERR, which are already open.

Part 1: THE UTF-8 PROBLEM

Perl's default is bytes. Everything is bytes. When you open a file without specifying an encoding, Perl reads and writes raw bytes. No encoding, no decoding, no opinion about what those bytes mean.

    # default: raw bytes
    open my $fh, '<', 'data.txt';
    my $line = <$fh>;
    # $line contains raw bytes, even if the file is UTF-8

This means if your file contains the character é (U+00E9, two bytes in UTF-8), Perl sees two bytes, not one character. String functions like length and substr, and regex matching, operate on bytes, not characters.

    my $str = "caf\xc3\xa9";    # "café" as raw UTF-8 bytes
    say length($str);           # 5 (bytes), not 4 (characters)

To get character semantics, you need to tell Perl "this is UTF-8":

    use Encode qw(decode);
    my $decoded = decode('UTF-8', $str);
    say length($decoded);       # 4 (characters)

But doing this manually for every piece of data is tedious and error-prone.

Part 2: BASIC USE OPEN

The open pragma sets default encoding layers for all open calls in the current lexical scope:

    use open ':encoding(UTF-8)';

    # now every open in this scope gets UTF-8 automatically
    open my $fh, '<', 'data.txt';    # UTF-8 decoding on read
    my $line = <$fh>;                # $line is decoded characters

    open my $out, '>', 'out.txt';    # UTF-8 encoding on write
    print $out "caf\x{e9}\n";        # writes valid UTF-8 bytes

No explicit encoding layer needed on each open. The pragma handles it.

But there is a gap. STDIN, STDOUT, and STDERR were opened by Perl's runtime before your use open line executes. The pragma does not affect already-open handles.

    use open ':encoding(UTF-8)';

    # files opened from now on: UTF-8, good
    open my $fh, '<', 'data.txt';

    # but STDOUT was already open, still in bytes mode
    print "caf\x{e9}\n";    # no warning, but written as the single Latin-1 byte E9
    print "\x{2603}\n";     # Wide character in print!

This is the gap that :std fills.
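
The gap can be observed without touching the real STDOUT. The sketch below is an illustration only: write_report is a hypothetical helper (not part of the open pragma) that prints a string to an in-memory handle opened with a given layer, then returns the bytes produced and any warnings raised. An empty layer stands in for a default, byte-mode STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through $layer ('' means no layer, like a
# default STDOUT) into an in-memory file; return the bytes and any warnings.
sub write_report {
    my ($layer, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";

    return ($buffer, \@warnings);
}

for my $layer ('', ':encoding(UTF-8)') {
    for my $text ("caf\x{e9}", "\x{2603}") {
        my ($bytes, $warnings) = write_report($layer, $text);
        my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
        printf "layer=%-17s bytes=%-14s warnings=%d\n",
            $layer || '(none)', $hex, scalar @$warnings;
    }
}
```

With no layer, the accented e goes out as one Latin-1 byte and nothing complains, while the snowman (a code point above 255) is what triggers "Wide character in print". With the UTF-8 layer, both strings are encoded and no warning appears.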

Part 3: WHAT :STD DOES

The :std flag retroactively applies the encoding to the three standard handles:

    use open qw(:std :encoding(UTF-8));

When Perl sees :std, it calls binmode on STDIN, STDOUT, and STDERR with the specified encoding layer. Effectively it does:

    binmode(STDIN,  ':encoding(UTF-8)');
    binmode(STDOUT, ':encoding(UTF-8)');
    binmode(STDERR, ':encoding(UTF-8)');

But it does it at compile time, as a pragma, so it takes effect before any of your code runs. And it combines with the default layer for new filehandles, so you get complete UTF-8 coverage in one line.

    use open qw(:std :encoding(UTF-8));

    # STDOUT now handles UTF-8
    print "caf\x{e9}\n";        # no warning, correct output

    # new files also get UTF-8
    open my $fh, '>', 'out.txt';
    print $fh "\x{2603}\n";     # snowman character, encoded properly

One line. Every handle. Past, present, and future.

Part 4: :UTF8 VS :ENCODING(UTF-8)

You will see both forms:

    use open qw(:std :utf8);
    use open qw(:std :encoding(UTF-8));

They are NOT identical.

:utf8 tells Perl "trust that the data is valid UTF-8 and just flip the internal flag." It does not validate. If the input contains malformed byte sequences, Perl will happily accept them and you get garbage data with no warning.

:encoding(UTF-8) actually validates the bytes. Malformed sequences produce warnings and, under the layer's default fallback setting, are replaced with a visible \xHH escape instead of being passed through silently.

    LAYER               VALIDATES?   SPEED     USE WHEN
    ------------------  -----------  --------  -------------------
    :utf8               No           Faster    You trust the input
    :encoding(UTF-8)    Yes          Slower    Input might be bad

For data you control (your own files, your own output), :utf8 is fine and faster. For user input, network data, or anything from an untrusted source, :encoding(UTF-8) is safer.

In practice, most people use :utf8 because the validation overhead is measurable on large files and most UTF-8 data in the wild is valid. But if you are processing user uploads or scraping web pages, use :encoding(UTF-8).
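
For untrusted input where a warning is not enough, one option is to reject malformed data outright. The sketch below is an assumption layered on top of this article, not something the open pragma does: it swaps the I/O layer for a manual decode of raw bytes (for example, bytes read from a :raw handle) using Encode's FB_CROAK check. strict_decode is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode $bytes as strict UTF-8 and return undef,
# rather than a partially repaired string, when the input is malformed.
sub strict_decode {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on bad input; LEAVE_SRC keeps $bytes intact.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $chars = eval { Encode::decode('UTF-8', $bytes, $check) };
    return $chars;
}

for my $input ("caf\xc3\xa9", "caf\xe9") {
    my $chars  = strict_decode($input);
    my $status = defined $chars
        ? 'valid, ' . length($chars) . ' characters'
        : 'malformed UTF-8, rejected';
    print "$status\n";
}
```

The first input is well-formed UTF-8; the second ends in a lone 0xE9 byte, which is Latin-1 for the same letter but not valid UTF-8, so it is rejected. Handles read this way should not also carry a UTF-8 layer.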

Part 5: THE ONE-LINE FIX

If your program has "Wide character in print" warnings, this is probably all you need:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use open qw(:std :utf8);

    # everything works now
    say "\x{1F600}";    # grinning face emoji
    say "caf\x{e9}";    # accented e
    say "\x{2603}";     # snowman

Put it in your boilerplate. Right after use feature 'say'. One line, always there, problem gone forever.

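For a module (as opposed to a script), be careful. Only the default layers for open are lexically scoped: they apply to the current file and do not leak into code that uses your module. The :std part is different. STDIN, STDOUT, and STDERR are global handles, so changing them inside a module changes them for the whole program. In a module, leave :std out:

    package MyModule;
    use open qw(:utf8);    # lexical: affects opens in this file only

    sub process {
        # files opened here get UTF-8
        open my $fh, '<', $_[0] or die "Cannot open $_[0]: $!";
        # ...
    }

    1;

Files the calling script opens are not affected unless the script also uses the pragma, and the standard handles stay under the script's control.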
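
A module can also skip the pragma entirely and name the layer on each open, which keeps its behavior independent of whatever the caller has or has not declared. A minimal sketch, assuming a hypothetical helper called read_utf8_lines and a temporary file created only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical module-style helper: the layer is named on the open itself,
# so it neither depends on nor changes the caller's pragma or STDOUT.
sub read_utf8_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Demonstration: write raw UTF-8 bytes, then read them back as characters.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "caf\xc3\xa9\n";
close $tmp or die "Cannot close $path: $!";

my @lines = read_utf8_lines($path);
print 'characters in first line: ', length($lines[0]), "\n";    # 4, not 5
```

The trade-off is verbosity: every open in the module has to carry the layer, but nothing global changes behind the caller's back.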

Part 6: THE BINMODE ALTERNATIVE

If you do not want to use a pragma, you can set encoding on individual handles with binmode:

    binmode(STDOUT, ':utf8');
    binmode(STDERR, ':utf8');
    binmode(STDIN,  ':utf8');

This does the same thing as :std but manually. It does not set a default for future open calls.

For one specific handle:

    open my $fh, '<', 'data.txt' or die $!;
    binmode($fh, ':utf8');

Or set it in the open call directly:

    open my $fh, '<:utf8', 'data.txt' or die $!;

These are all equivalent for that one handle. The advantage of use open qw(:std :utf8) is that it covers everything at once. No chance of forgetting one handle.

Part 7: PERL_UNICODE ENVIRONMENT VARIABLE

You can also set UTF-8 handling from outside the script entirely:

    export PERL_UNICODE=SDA
    perl script.pl

The letters stand for:

    S = STDIN, STDOUT, and STDERR
    D = default UTF-8 layer for filehandles you open (input and output)
    A = @ARGV (decode command-line arguments)

So PERL_UNICODE=SDA is roughly equivalent to:

    use open qw(:std :utf8);
    use Encode qw(decode);
    @ARGV = map { decode('UTF-8', $_) } @ARGV;

On the command line, you can also use the -C flag:

    perl -CSDA script.pl
    perl -CS script.pl     # standard handles only
    perl -CSD script.pl    # standard handles plus the default for opened files

This is handy for one-liners:

    perl -CSDA -ne 'print if m~\p{Han}~' chinese-text.txt

The environment variable is great for system-wide configuration. Drop PERL_UNICODE=SDA in your shell profile and every Perl script you run gets UTF-8 handling automatically.
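
To check which of these settings a running script actually received, Perl exposes the -C / PERL_UNICODE value as a bitmask in the special variable ${^UNICODE}. The sketch below decodes it using the bit values listed in perlrun; describe_unicode_flags is a hypothetical helper name, and only the six most common bits are covered.

```perl
use strict;
use warnings;

# Bit values for the -C / PERL_UNICODE letters, as documented in perlrun.
# S is I+O+E (7) and D is i+o (24).
my @UNICODE_BITS = (
    [  1, 'STDIN'  ],                       # I
    [  2, 'STDOUT' ],                       # O
    [  4, 'STDERR' ],                       # E
    [  8, 'default layer for input'  ],     # i
    [ 16, 'default layer for output' ],     # o
    [ 32, '@ARGV' ],                        # A
);

# Hypothetical helper: turn a -C bitmask into a list of readable names.
sub describe_unicode_flags {
    my ($flags) = @_;
    return map { $_->[1] } grep { $flags & $_->[0] } @UNICODE_BITS;
}

my @active  = describe_unicode_flags(${^UNICODE});
my $summary = @active ? join(', ', @active) : 'none';
print "Active -C / PERL_UNICODE settings: $summary\n";
```

Run it once plainly and once with perl -CSDA to see the difference; SDA corresponds to the value 63.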

Part 8: COMMON PITFALLS

Double encoding. If you apply :utf8 to a handle and then manually encode before printing, you get double-encoded output:

    use open qw(:std :utf8);
    use Encode qw(encode);

    # BAD: double encoding
    print encode('UTF-8', "caf\x{e9}");
    # The encode() produces UTF-8 bytes
    # Then the :utf8 layer encodes them AGAIN

Pick one approach. Either use the IO layer OR manual encode/decode. Not both.
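
What double encoding looks like at the byte level can be shown with an in-memory handle, so nothing depends on the terminal. This is a sketch for illustration; through_utf8_layer and hex_dump are hypothetical helper names.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: send $text through a UTF-8 output layer into an
# in-memory file and return the bytes that would reach the terminal or disk.
sub through_utf8_layer {
    my ($text) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buffer;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $chars = "caf\x{e9}";

# Correct: characters in, the layer encodes once.
print 'layer only:     ', hex_dump(through_utf8_layer($chars)), "\n";
# prints: layer only:     63 61 66 c3 a9

# Wrong: already-encoded bytes in, the layer encodes each byte again.
print 'encode + layer: ',
    hex_dump(through_utf8_layer(encode('UTF-8', $chars))), "\n";
# prints: encode + layer: 63 61 66 c3 83 c2 a9
```

The two bytes c3 a9 are treated as the characters U+00C3 and U+00A9 and each is encoded again, which is why double-encoded text shows up on screen as "cafÃ©".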

Binary files. If you use use open qw(:std :utf8) and then try to read a binary file (image, gzip archive), the UTF-8 layer will corrupt the data:

    use open qw(:std :utf8);

    # BAD: reading a binary file through a UTF-8 layer
    open my $fh, '<', 'photo.jpg';    # gets :utf8 automatically
    my $data = do { local $/; <$fh> };
    # $data is corrupted

Fix it by explicitly removing the layer:

    open my $fh, '<:raw', 'photo.jpg';    # override the default
    my $data = do { local $/; <$fh> };
    # $data is correct binary content

The :raw layer strips all encoding layers and gives you clean bytes. Always use it for binary files.
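
A quick way to convince yourself the override works is a round trip of all 256 byte values with the pragma default in effect. The sketch below assumes two hypothetical helpers, write_binary and read_binary, and uses a temporary file; the pragma is declared without :std so the demonstration leaves the standard handles alone.

```perl
use strict;
use warnings;
use open ':encoding(UTF-8)';    # default for opens in this file; no :std here
use File::Temp qw(tempfile);

# Hypothetical helpers: an explicit :raw on the open overrides the pragma default.
sub write_binary {
    my ($path, $bytes) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return;
}

sub read_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                   # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my $original = join '', map { chr($_) } 0 .. 255;    # every possible byte value
write_binary($path, $original);

my $result = read_binary($path) eq $original ? 'round trip intact' : 'bytes changed';
print "$result\n";
```

If either open dropped the :raw, the default UTF-8 layer would apply and bytes above 0x7F would no longer survive unchanged.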

Mixing encoded and raw handles. Once you set a global default, be aware that every open gets it. If you open a file that is Latin-1 encoded, not UTF-8, the :utf8 layer will misinterpret the bytes:

    use open qw(:std :utf8);

    # this file is actually Latin-1
    open my $fh, '<', 'legacy.txt';
    # Perl tries to decode as UTF-8, gets garbage or errors

Override per-file when needed:

    open my $fh, '<:encoding(iso-8859-1)', 'legacy.txt';

Part 9: TESTING YOUR ENCODING

A quick diagnostic script to verify your encoding setup:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use open qw(:std :utf8);
    use Encode qw(encode);

    # test characters from different Unicode blocks
    my @tests = (
        ['ASCII',    'Hello World'],
        ['Latin-1',  "caf\x{e9}"],
        ['Cyrillic', "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}"],
        ['CJK',      "\x{4f60}\x{597d}"],
        ['Emoji',    "\x{1F600}\x{1F389}"],
        ['Math',     "\x{221a}\x{222b}\x{2211}"],
    );

    for my $test (@tests) {
        my ($name, $str) = @$test;
        # encode() gives the real UTF-8 byte count; "use bytes" would report
        # Perl's internal storage instead (4, not 5, for the Latin-1 string)
        printf "%-10s %s (chars: %d, bytes: %d)\n",
            $name, $str, length($str), length(encode('UTF-8', $str));
    }

Expected output (on a UTF-8 terminal):

    ASCII      Hello World (chars: 11, bytes: 11)
    Latin-1    café (chars: 4, bytes: 5)
    Cyrillic   Привет (chars: 6, bytes: 12)
    CJK        你好 (chars: 2, bytes: 6)
    Emoji      😀🎉 (chars: 2, bytes: 8)
    Math       √∫∑ (chars: 3, bytes: 9)

If you see "Wide character" warnings or garbled output, your encoding setup is wrong. Go back and add the pragma.

Part 10: THE COMPLETE BOILERPLATE

Here is the recommended Perl script boilerplate for 2026:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use open qw(:std :utf8);

Four lines after the shebang. You get strict mode, warnings, say, and full UTF-8 support on every filehandle including STDIN/STDOUT/STDERR.

For scripts that also handle binary data, add :raw overrides where needed:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use feature 'say';
    use open qw(:std :utf8);

    # text files: automatic UTF-8
    open my $text, '<', 'input.txt' or die $!;

    # binary files: explicit raw mode
    open my $bin, '<:raw', 'image.png' or die $!;

        .--.
       |o_o |    ":std - because STDOUT
       |:_/ |     was already open."
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

Perl's default byte-mode is a relic of a time when ASCII was enough. That time ended decades ago. Every modern Perl script should declare its encoding intent upfront.

use open qw(:std :utf8) is not optional for programs that handle text. It is the baseline. Without it, you are one accented character away from mis-encoded output and one emoji away from a warning.

The :std flag is the key insight. Without it, the pragma only affects handles you open yourself. The three handles that Perl opens for you (STDIN, STDOUT, STDERR) stay in bytes mode, and that is exactly where your output goes. The :std flag closes that gap retroactively.

One pragma. One line. No more "Wide character in print." Put it in your boilerplate and forget about it.
