perl.gg / snippets

<!-- category: snippets -->

DATA: Embed Data Inside Your Script

2026-04-13

Everything after __DATA__ in a Perl script is readable as a filehandle:
```perl
while (<DATA>) {
    chomp;
    say "Got: $_";
}

__DATA__
alpha
bravo
charlie
```
That's it. No external files. No heredoc gymnastics. No argument parsing. The data lives inside your script, below the code, and Perl hands it to you through the DATA filehandle as if you'd opened a file.

Self-contained scripts are beautiful. One file to copy, one file to run, one file to understand. __DATA__ makes that possible for any script that needs a small embedded dataset.

Part 1: THE BASICS

When Perl's compiler hits the __DATA__ token on a line by itself, it stops compiling. Everything after that line is ignored as code but made available through the special DATA filehandle.
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# This is code. Perl compiles it.
my @lines;
while (<DATA>) {
    chomp;
    push @lines, $_;
}

say "Read ", scalar @lines, " lines";
say "First: $lines[0]";
say "Last: $lines[-1]";

# Code ends here. Data begins below.
__DATA__
server1 192.168.1.10 active
server2 192.168.1.11 active
server3 192.168.1.12 standby
server4 192.168.1.13 decommissioned
```
Output:
```
Read 4 lines
First: server1 192.168.1.10 active
Last: server4 192.168.1.13 decommissioned
```
The DATA filehandle works like any other. You can read it with <DATA>, pass it to functions that expect filehandles, or use it in a while loop. Standard file I/O semantics. No surprises.
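To back up the "pass it to functions" claim, here's a minimal sketch (the `count_lines` routine is illustrative, not from the original): a glob reference `\*DATA` goes anywhere a filehandle is expected:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# A routine that works on any filehandle, not just DATA.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

# Pass the DATA handle as a glob reference.
say "Lines: ", count_lines(\*DATA);

__DATA__
red
green
blue
```

This prints `Lines: 3`. Anything that accepts a filehandle argument — your own subs, modules that read from handles — will take DATA without knowing it's special.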

Part 2: READING WITH <DATA>

The readline operator <DATA> reads one line at a time in scalar context, or all remaining lines in list context:
```perl
# one line at a time
while (my $line = <DATA>) {
    chomp $line;
    process($line);
}

# or slurp all at once
my @all_lines = <DATA>;
chomp @all_lines;

# or into a single string
my $blob = do { local $/; <DATA> };
```
The slurp trick (local $/) localizes the input record separator to undef, so <DATA> reads the entire remaining content as one string. Handy when you need the data as a block rather than individual lines.

Part 3: EMBEDDED TEST DATA

Writing a parser? Need test input? Put it in __DATA__ and iterate without touching the filesystem:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %status_count;
while (<DATA>) {
    chomp;
    next if m~^\s*$~;    # skip blank lines
    next if m~^\s*#~;    # skip comments
    my ($timestamp, $level, $message) = split m~\s+~, $_, 3;
    $status_count{$level}++;
}

for my $level (sort keys %status_count) {
    say "$level: $status_count{$level}";
}

__DATA__
# Sample log entries for testing
2026-04-13T10:00:01 INFO Application started
2026-04-13T10:00:05 INFO Listening on port 8080
2026-04-13T10:01:12 WARN High memory usage: 87%
2026-04-13T10:02:33 ERROR Connection refused: db01
2026-04-13T10:02:34 ERROR Retry failed: db01
2026-04-13T10:03:00 INFO Failover to db02 complete
2026-04-13T10:05:44 WARN Slow query: 3.2s
```
Output:
```
ERROR: 2
INFO: 3
WARN: 2
```
No temp files to create. No cleanup to run. The test data is right there in the script. Change it, run again, see new results. Perfect for development.

Part 4: EMBEDDED TEMPLATES

Need to generate output from a template? Embed it:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %vars = (
    hostname => 'web01',
    port     => 8080,
    workers  => 4,
    env      => 'production',
);

my $template = do { local $/; <DATA> };
$template =~ s~\{\{(\w+)\}\}~$vars{$1} // "UNDEFINED"~ge;
print $template;

__DATA__
# Nginx upstream config
# Generated for {{hostname}}
upstream app_backend {
    server 127.0.0.1:{{port}};
}

server {
    listen 80;
    server_name {{hostname}}.example.com;
    location / {
        proxy_pass http://app_backend;
    }
}
```
Slurp the template, run a substitution to replace {{placeholders}} with values, print the result. The template lives in the script. No external template files, no template engine dependencies. Just Perl and __DATA__.

The e modifier on the substitution makes Perl evaluate the replacement as code, so $vars{$1} // "UNDEFINED" does a hash lookup for each placeholder.

Part 5: EMBEDDED SQL

Database scripts often need SQL queries. Instead of scattering them through your code as strings, stash them in __DATA__:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Read all SQL statements separated by blank lines
my @queries;
my $current = '';
while (<DATA>) {
    if (m~^\s*$~ && $current =~ m~\S~) {
        push @queries, $current;
        $current = '';
        next;
    }
    $current .= $_;
}
push @queries, $current if $current =~ m~\S~;

for my $i (0 .. $#queries) {
    say "--- Query ", $i + 1, " ---";
    print $queries[$i];
    say "";
}

__DATA__
CREATE TABLE IF NOT EXISTS servers (
    id       INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL,
    ip_addr  TEXT NOT NULL,
    status   TEXT DEFAULT 'active'
);

INSERT INTO servers (hostname, ip_addr, status)
VALUES ('web01', '192.168.1.10', 'active');

INSERT INTO servers (hostname, ip_addr, status)
VALUES ('web02', '192.168.1.11', 'standby');

SELECT hostname, ip_addr
FROM servers
WHERE status = 'active'
ORDER BY hostname;
```
Each SQL statement is separated by a blank line. The parser collects them into an array. In a real script, you'd pass each one to a DBI handle. Here they just print, but the pattern is the same.

Part 6: END VS DATA

Perl has two tokens that stop compilation: __DATA__ and __END__. They look similar but behave differently.

__DATA__ is tied to the current package. The DATA filehandle belongs to whatever package was active when __DATA__ appeared:

```perl
package Foo;
# __DATA__ here would create Foo::DATA

package main;
# __DATA__ here creates main::DATA (which is just DATA)
```
__END__ is the older form. In the top-level script it always creates the filehandle as main::DATA, regardless of which package is active. And the top-level script is the only place it works: in a file pulled in with require or do, content after __END__ is not accessible through a DATA filehandle at all. __DATA__ has no such restriction.

For most scripts (single-file, package main), they're identical. Use __DATA__. It's the modern convention, and it does the right thing in modules too.

```
TOKEN      PACKAGE   TYPICAL USE
--------   -------   ---------------------------
__DATA__   current   Modules, modern scripts
__END__    main::    Legacy scripts, quick hacks
```
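A quick sketch of the overlap in practice: in a single-file script running in package main, swapping __DATA__ for __END__ changes nothing observable:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# __END__ in a top-level script still feeds <DATA>,
# because it opens the handle as main::DATA.
while (<DATA>) {
    chomp;
    say "via main::DATA: $_";
}

__END__
alpha
beta
```

The difference only bites once the code moves into a module.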

Part 7: SEEKING AND REWINDING

The DATA filehandle supports seek and tell, so you can rewind and read it again:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Fcntl qw(:seek);

# Remember where DATA starts
my $data_start = tell(DATA);

# First pass: count lines
my $count = 0;
$count++ while <DATA>;
say "Total lines: $count";

# Rewind to start of DATA section
seek(DATA, $data_start, SEEK_SET);

# Second pass: process
while (<DATA>) {
    chomp;
    say "Processing: $_";
}

__DATA__
apple
banana
cherry
```
The key is tell(DATA) at the top, before you read anything. This captures the byte offset where the data section starts. Then seek(DATA, $data_start, SEEK_SET) jumps back to that position.

Why capture with tell instead of seeking to offset 0? Because offset 0 is the beginning of the file, not the beginning of the data section. The data section starts partway through the file, after all the code. Seeking to 0 would give you the shebang line and your Perl code, which is not what you want.

Part 8: SELF-CONTAINED SCRIPTS

The real power of __DATA__ is making scripts that carry their own payload. A config checker that includes its defaults. A report generator that includes its template. A deployment script that includes its manifest.

Here is a self-contained host checker:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Parse embedded host list
my @hosts;
while (<DATA>) {
    chomp;
    next if m~^\s*$~;
    next if m~^\s*#~;
    my ($name, $ip, $role) = split m~\s+~;
    push @hosts, { name => $name, ip => $ip, role => $role };
}

# Check each host
for my $h (@hosts) {
    my $up = (system("ping -c 1 -W 1 $h->{ip} > /dev/null 2>&1") == 0);
    my $status = $up ? "UP  " : "DOWN";
    say "$status $h->{name} ($h->{ip}) [$h->{role}]";
}

__DATA__
# Production servers
web01 192.168.1.10 frontend
web02 192.168.1.11 frontend
api01 192.168.1.20 backend
api02 192.168.1.21 backend

# Database tier
db01 192.168.1.30 primary
db02 192.168.1.31 replica
```
One file. Copy it to any machine, run it, get results. No config files to forget. No paths to adjust. The host list is right there, editable by anyone who can open a text file. Comments and blank lines are handled. It just works.

Part 9: MULTIPLE DATA SECTIONS (SORT OF)

Perl only supports one __DATA__ section per package. But you can fake multiple sections with delimiters:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %sections;
my $current_section = 'default';
while (<DATA>) {
    chomp;
    if (m~^@@\s*(\w+)\s*$~) {
        $current_section = $1;
        next;
    }
    push @{$sections{$current_section}}, $_;
}

# Now you have named sections
say "=== Servers ===";
say "  $_" for @{$sections{servers}};
say "=== Queries ===";
say "  $_" for @{$sections{queries}};

__DATA__
@@ servers
web01 192.168.1.10
web02 192.168.1.11
db01 192.168.1.20
@@ queries
SELECT * FROM hosts WHERE active = 1
SELECT COUNT(*) FROM connections
@@ templates
Hello, {{name}}! Your server is {{status}}.
```
The @@ lines act as section headers. The parser splits the data into named chunks. Each section becomes an array of lines in a hash. You can add as many sections as you want. Mojolicious uses a similar pattern with its __DATA__ templates, using @@ filename.html.ep to embed multiple template files inside a single Perl file.

Part 10: GOTCHAS AND LIMITATIONS

DATA is read-only. You can't write to the DATA filehandle. It's an input stream backed by your script file.

DATA is exhaustible. Once you've read to the end, it's gone unless you seek back. There is no automatic rewind.

Large data is a bad idea. Embedding a 50MB CSV in your script is technically possible but practically miserable: the file becomes unwieldy to edit, version control diffs turn to noise, and slurping the section pulls the whole payload into memory at once. Use external files for large datasets.

Binary data is tricky. __DATA__ works with text. For binary data, you'd need to encode it (Base64, for example) and decode it after reading. Doable, but ugly:

```perl
use MIME::Base64;

my $encoded = do { local $/; <DATA> };
my $binary  = decode_base64($encoded);

__DATA__
R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```
Modules and __DATA__ each get their own. If Foo.pm has a __DATA__ section, it's accessible as Foo::DATA. The main script's __DATA__ is main::DATA (or just DATA). They don't interfere with each other.
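A sketch of the module side (Foo.pm here is hypothetical, for illustration): a module can carry its own defaults the same way, because inside package Foo the bareword DATA resolves to Foo::DATA:

```perl
# Foo.pm (hypothetical module)
package Foo;
use strict;
use warnings;

# Slurp the module's own embedded section.
sub default_config {
    local $/;              # slurp mode
    return scalar <DATA>;  # this handle is Foo::DATA
}

1;

__DATA__
host = localhost
port = 5432
```

From a script, `use Foo; print Foo::default_config();` prints the embedded defaults, and the script's own main::DATA section is unaffected. Like any DATA handle, Foo::DATA is exhausted after the first read; capture the offset with tell and seek back if you need it again.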
```
YOUR SCRIPT
+------------------+
| #!/usr/bin/perl  |
| use strict;      |
|                  |
| # ...code...     |
|                  |
| while (<DATA>) { |
|   # process      |  <- code reads down here
| }                |
+~~~~~~~~~~~~~~~~~~+  <- __DATA__ boundary
| server1 10.0.0.1 |
| server2 10.0.0.2 |  <- data lives here
| server3 10.0.0.3 |
+------------------+

   .--.
  |o_o |   "One file to rule them all."
  |:_/ |
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/
```
The __DATA__ section is one of those features that seems minor until you use it. Then you start putting test data in every script. Then templates. Then config defaults. Then you realize half your scripts are self-contained single-file tools, and you wonder how you ever lived without it.

It's not glamorous. It's not clever. It's just incredibly useful. The best Perl features usually are.

perl.gg