perl.gg / snippets

<!-- category: snippets -->

DATA: Embed Data Inside Your Script

2026-04-13

Everything after __DATA__ in a Perl script is readable as a filehandle:
```perl
while (<DATA>) {
    chomp;
    say "Got: $_";
}

__DATA__
alpha
bravo
charlie
```
That's it. No external files. No heredoc gymnastics. No argument parsing. The data lives inside your script, below the code, and Perl hands it to you through the DATA filehandle as if you'd opened a file.

Self-contained scripts are beautiful. One file to copy, one file to run, one file to understand. __DATA__ makes that possible for any script that needs a small embedded dataset.

Part 1: THE BASICS

When Perl's compiler hits the __DATA__ token on a line by itself, it stops compiling. Everything after that line is ignored as code but made available through the special DATA filehandle.
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# This is code. Perl compiles it.
my @lines;
while (<DATA>) {
    chomp;
    push @lines, $_;
}

say "Read ", scalar @lines, " lines";
say "First: $lines[0]";
say "Last: $lines[-1]";

# Code ends here. Data begins below.
__DATA__
server1 192.168.1.10 active
server2 192.168.1.11 active
server3 192.168.1.12 standby
server4 192.168.1.13 decommissioned
```
Output:
```
Read 4 lines
First: server1 192.168.1.10 active
Last: server4 192.168.1.13 decommissioned
```
The DATA filehandle works like any other. You can read it with <DATA>, pass it to functions that expect filehandles, or use it in a while loop. Standard file I/O semantics. No surprises.
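To back up the "pass it to functions" claim, here's a minimal sketch (the `count_lines` routine is illustrative, not from the original): a glob reference `\*DATA` goes anywhere a filehandle is expected:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# A routine that works on any filehandle, not just DATA.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

# Pass the DATA handle as a glob reference.
say "Lines: ", count_lines(\*DATA);

__DATA__
red
green
blue
```

This prints `Lines: 3`. Anything that accepts a filehandle argument — your own subs, modules that read from handles — will take DATA without knowing it's special.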

Part 2: READING WITH <DATA>

The readline operator <DATA> reads one line at a time in scalar context, or all remaining lines in list context:
```perl
# one line at a time
while (my $line = <DATA>) {
    chomp $line;
    process($line);
}

# or slurp all at once
my @all_lines = <DATA>;
chomp @all_lines;

# or into a single string
my $blob = do { local $/; <DATA> };
```
The slurp trick (local $/) localizes the input record separator to undef, so <DATA> reads the entire remaining content as one string. Handy when you need the data as a block rather than individual lines.

Part 3: EMBEDDED TEST DATA

Writing a parser? Need test input? Put it in __DATA__ and iterate without touching the filesystem:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %status_count;
while (<DATA>) {
    chomp;
    next if m~^\s*$~;    # skip blank lines
    next if m~^\s*#~;    # skip comments
    my ($timestamp, $level, $message) = split m~\s+~, $_, 3;
    $status_count{$level}++;
}

for my $level (sort keys %status_count) {
    say "$level: $status_count{$level}";
}

__DATA__
# Sample log entries for testing
2026-04-13T10:00:01 INFO Application started
2026-04-13T10:00:05 INFO Listening on port 8080
2026-04-13T10:01:12 WARN High memory usage: 87%
2026-04-13T10:02:33 ERROR Connection refused: db01
2026-04-13T10:02:34 ERROR Retry failed: db01
2026-04-13T10:03:00 INFO Failover to db02 complete
2026-04-13T10:05:44 WARN Slow query: 3.2s
```
Output:
```
ERROR: 2
INFO: 3
WARN: 2
```
No temp files to create. No cleanup to run. The test data is right there in the script. Change it, run again, see new results. Perfect for development.

Part 4: EMBEDDED TEMPLATES

Need to generate output from a template? Embed it:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %vars = (
    hostname => 'web01',
    port     => 8080,
    workers  => 4,
    env      => 'production',
);

my $template = do { local $/; <DATA> };
$template =~ s~\{\{(\w+)\}\}~$vars{$1} // "UNDEFINED"~ge;
print $template;

__DATA__
# Nginx upstream config
# Generated for {{hostname}}
upstream app_backend {
    server 127.0.0.1:{{port}};
}

server {
    listen 80;
    server_name {{hostname}}.example.com;
    location / {
        proxy_pass http://app_backend;
    }
}
```
Slurp the template, run a substitution to replace {{placeholders}} with values, print the result. The template lives in the script. No external template files, no template engine dependencies. Just Perl and __DATA__.

The e modifier on the substitution makes Perl evaluate the replacement as code, so $vars{$1} // "UNDEFINED" does a hash lookup for each placeholder.

Part 5: EMBEDDED SQL

Database scripts often need SQL queries. Instead of scattering them through your code as strings, stash them in __DATA__:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Read all SQL statements separated by blank lines
my @queries;
my $current = '';
while (<DATA>) {
    if (m~^\s*$~ && $current =~ m~\S~) {
        push @queries, $current;
        $current = '';
        next;
    }
    $current .= $_;
}
push @queries, $current if $current =~ m~\S~;

for my $i (0 .. $#queries) {
    say "--- Query ", $i + 1, " ---";
    print $queries[$i];
    say "";
}

__DATA__
CREATE TABLE IF NOT EXISTS servers (
    id       INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL,
    ip_addr  TEXT NOT NULL,
    status   TEXT DEFAULT 'active'
);

INSERT INTO servers (hostname, ip_addr, status)
VALUES ('web01', '192.168.1.10', 'active');

INSERT INTO servers (hostname, ip_addr, status)
VALUES ('web02', '192.168.1.11', 'standby');

SELECT hostname, ip_addr
FROM servers
WHERE status = 'active'
ORDER BY hostname;
```
Each SQL statement is separated by a blank line. The parser collects them into an array. In a real script, you'd pass each one to a DBI handle. Here they just print, but the pattern is the same.

Part 6: END VS DATA

Perl has two tokens that stop compilation: __DATA__ and __END__. They look similar but behave differently.

__DATA__ is tied to the current package. The DATA filehandle belongs to whatever package was active when __DATA__ appeared:

```perl
package Foo;
# __DATA__ here would create Foo::DATA

package main;
# __DATA__ here creates main::DATA (which is just DATA)
```
__END__ is the older form. In the top-level script it always creates the filehandle as main::DATA, regardless of which package is active. And the top-level script is the only place it works: in a file pulled in with require or do, content after __END__ is not accessible through a DATA filehandle at all. __DATA__ has no such restriction.

For most scripts (single-file, package main), they're identical. Use __DATA__. It's the modern convention, and it does the right thing in modules too.

```
TOKEN      PACKAGE   TYPICAL USE
--------   -------   ---------------------------
__DATA__   current   Modules, modern scripts
__END__    main::    Legacy scripts, quick hacks
```
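A quick sketch of the overlap in practice: in a single-file script running in package main, swapping __DATA__ for __END__ changes nothing observable:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# __END__ in a top-level script still feeds <DATA>,
# because it opens the handle as main::DATA.
while (<DATA>) {
    chomp;
    say "via main::DATA: $_";
}

__END__
alpha
beta
```

The difference only bites once the code moves into a module.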

Part 7: SEEKING AND REWINDING

The DATA filehandle supports seek and tell, so you can rewind and read it again:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Fcntl qw(:seek);

# Remember where DATA starts
my $data_start = tell(DATA);

# First pass: count lines
my $count = 0;
$count++ while <DATA>;
say "Total lines: $count";

# Rewind to start of DATA section
seek(DATA, $data_start, SEEK_SET);

# Second pass: process
while (<DATA>) {
    chomp;
    say "Processing: $_";
}

__DATA__
apple
banana
cherry
```
The key is tell(DATA) at the top, before you read anything. This captures the byte offset where the data section starts. Then seek(DATA, $data_start, SEEK_SET) jumps back to that position.

Why capture with tell instead of seeking to offset 0? Because offset 0 is the beginning of the file, not the beginning of the data section. The data section starts partway through the file, after all the code. Seeking to 0 would give you the shebang line and your Perl code, which is not what you want.

Part 8: SELF-CONTAINED SCRIPTS

The real power of __DATA__ is making scripts that carry their own payload. A config checker that includes its defaults. A report generator that includes its template. A deployment script that includes its manifest.

Here is a self-contained host checker:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Parse embedded host list
my @hosts;
while (<DATA>) {
    chomp;
    next if m~^\s*$~;
    next if m~^\s*#~;
    my ($name, $ip, $role) = split m~\s+~;
    push @hosts, { name => $name, ip => $ip, role => $role };
}

# Check each host
for my $h (@hosts) {
    my $up = (system("ping -c 1 -W 1 $h->{ip} > /dev/null 2>&1") == 0);
    my $status = $up ? "UP  " : "DOWN";
    say "$status $h->{name} ($h->{ip}) [$h->{role}]";
}

__DATA__
# Production servers
web01 192.168.1.10 frontend
web02 192.168.1.11 frontend
api01 192.168.1.20 backend
api02 192.168.1.21 backend

# Database tier
db01 192.168.1.30 primary
db02 192.168.1.31 replica
```
One file. Copy it to any machine, run it, get results. No config files to forget. No paths to adjust. The host list is right there, editable by anyone who can open a text file. Comments and blank lines are handled. It just works.

Part 9: MULTIPLE DATA SECTIONS (SORT OF)

Perl only supports one __DATA__ section per package. But you can fake multiple sections with delimiters:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %sections;
my $current_section = 'default';
while (<DATA>) {
    chomp;
    if (m~^@@\s*(\w+)\s*$~) {
        $current_section = $1;
        next;
    }
    push @{$sections{$current_section}}, $_;
}

# Now you have named sections
say "=== Servers ===";
say "  $_" for @{$sections{servers}};
say "=== Queries ===";
say "  $_" for @{$sections{queries}};

__DATA__
@@ servers
web01 192.168.1.10
web02 192.168.1.11
db01 192.168.1.20
@@ queries
SELECT * FROM hosts WHERE active = 1
SELECT COUNT(*) FROM connections
@@ templates
Hello, {{name}}! Your server is {{status}}.
```
The @@ lines act as section headers. The parser splits the data into named chunks. Each section becomes an array of lines in a hash. You can add as many sections as you want. Mojolicious uses a similar pattern with its __DATA__ templates, using @@ filename.html.ep to embed multiple template files inside a single Perl file.

Part 10: GOTCHAS AND LIMITATIONS

DATA is read-only. You can't write to the DATA filehandle. It's an input stream backed by your script file.

DATA is exhaustible. Once you've read to the end, it's gone unless you seek back. There is no automatic rewind.

Large data is a bad idea. Embedding a 50MB CSV in your script is technically possible but practically miserable: the file becomes unwieldy to edit, version control diffs turn to noise, and slurping the section pulls the whole payload into memory at once. Use external files for large datasets.

Binary data is tricky. __DATA__ works with text. For binary data, you'd need to encode it (Base64, for example) and decode it after reading. Doable, but ugly:

```perl
use MIME::Base64;

my $encoded = do { local $/; <DATA> };
my $binary  = decode_base64($encoded);

__DATA__
R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```
Modules and __DATA__ each get their own. If Foo.pm has a __DATA__ section, it's accessible as Foo::DATA. The main script's __DATA__ is main::DATA (or just DATA). They don't interfere with each other.
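A sketch of the module side (Foo.pm here is hypothetical, for illustration): a module can carry its own defaults the same way, because inside package Foo the bareword DATA resolves to Foo::DATA:

```perl
# Foo.pm (hypothetical module)
package Foo;
use strict;
use warnings;

# Slurp the module's own embedded section.
sub default_config {
    local $/;              # slurp mode
    return scalar <DATA>;  # this handle is Foo::DATA
}

1;

__DATA__
host = localhost
port = 5432
```

From a script, `use Foo; print Foo::default_config();` prints the embedded defaults, and the script's own main::DATA section is unaffected. Like any DATA handle, Foo::DATA is exhausted after the first read; capture the offset with tell and seek back if you need it again.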
```
YOUR SCRIPT
+------------------+
| #!/usr/bin/perl  |
| use strict;      |
|                  |
| # ...code...     |
|                  |
| while (<DATA>) { |
|   # process      |  <- code reads down here
| }                |
+~~~~~~~~~~~~~~~~~~+  <- __DATA__ boundary
| server1 10.0.0.1 |
| server2 10.0.0.2 |  <- data lives here
| server3 10.0.0.3 |
+------------------+

   .--.
  |o_o |   "One file to rule them all."
  |:_/ |
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/
```
The __DATA__ section is one of those features that seems minor until you use it. Then you start putting test data in every script. Then templates. Then config defaults. Then you realize half your scripts are self-contained single-file tools, and you wonder how you ever lived without it.

It's not glamorous. It's not clever. It's just incredibly useful. The best Perl features usually are.

perl.gg