<!-- category: hidden-gems -->
Autovivification's Dark Side
You have an empty hash. You check if a deeply nested key exists. The key doesn't exist. But now your hash isn't empty anymore.

```perl
use feature 'say';

my %data;

if (exists $data{users}{admin}{role}) {
    say "found it";
}

use Data::Dumper;
say Dumper \%data;
```

```
$VAR1 = {
          'users' => {
                       'admin' => {}
                     }
        };
```

You never assigned anything. You only asked a question. And Perl created two levels of hash structure just to answer it.
This is the dark side of autovivification. Observation changes the data. The act of looking for a deeply nested key forces Perl to build the path to it. Your innocent exists check just polluted your data structure.
Part 1: NORMAL AUTOVIVIFICATION
First, the good kind. Autovivification is one of Perl's greatest conveniences. It lets you build deep structures without pre-declaring every level:

```perl
my %inventory;

# just assign, Perl creates intermediate hashes automatically
$inventory{web}{web01}{ip}     = '10.0.0.1';
$inventory{web}{web01}{status} = 'active';
$inventory{db}{db01}{ip}       = '10.0.0.5';
```

Without autovivification, you'd need:

```perl
$inventory{web}        = {} unless exists $inventory{web};
$inventory{web}{web01} = {} unless exists $inventory{web}{web01};
$inventory{web}{web01}{ip} = '10.0.0.1';
```

Three lines instead of one. Autovivification on write is universally loved. Nobody complains about it. You assign to a deep path, Perl creates the intermediate references. Perfect.
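The same convenience covers arrays nested inside hashes. A minimal sketch, using made-up host data (the `@hosts` list and the `hosts` key are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical sample data: [hostname, role] pairs.
my @hosts = ( [ web01 => 'web' ], [ web02 => 'web' ], [ db01 => 'db' ] );

my %by_role;
for my $pair (@hosts) {
    my ($name, $role) = @$pair;
    # On the first push for a role, both the inner hash and the array
    # reference under {hosts} are created automatically.
    push @{ $by_role{$role}{hosts} }, $name;
}

print "$_: @{ $by_role{$_}{hosts} }\n" for sort keys %by_role;
```

No initialization step, no "does this role have a list yet" check. That is write-side autovivification doing its job.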
The problem is that autovivification also triggers on certain reads.
Part 2: THE DARK SIDE DEMONSTRATED
Let's see exactly when it happens and when it doesn't.

```perl
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my %h;

# simple exists - no autovivification
my $test1 = exists $h{a};
say "After exists \$h{a}: " . Dumper(\%h);

# deep exists - AUTOVIVIFICATION
my $test2 = exists $h{a}{b}{c};
say "After exists \$h{a}{b}{c}: " . Dumper(\%h);
```

```
After exists $h{a}: $VAR1 = {};

After exists $h{a}{b}{c}: $VAR1 = {
          'a' => {
                   'b' => {}
                 }
        };
```

The first exists on a single level is safe. Nothing created. The second exists on a three-level path created two intermediate hash references. The final level (c) was not created, but everything leading up to it was.
Part 3: WHY THIS HAPPENS
To evaluate exists $h{a}{b}{c}, Perl must:

1. Look up $h{a} to get a reference
2. Dereference that to look up {b} to get another reference
3. Dereference that to check if {c} exists
Step 1 is the problem. When Perl looks up $h{a} and finds it doesn't exist, it needs a value to dereference in step 2. So it creates an empty hash reference and stores it in $h{a}. Then it does the same thing for {b}.
Perl can't check {c} without a hash to check it in. So it builds the hash. Then it builds the hash that contains that hash. Autovivification cascades upward from the point of access.
The final key ({c}) is never created because exists is the operator at the top level. Perl knows it's an existence check. But the intermediate keys are created by the dereference chain that exists rides on.
```
exists $data{a}{b}{c}
            ^^^^^^      <-- these levels get autovivified
                  ^^^   <-- this level does not (exists protects it)
```
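To make the chain concrete, here is the same check written twice: once the normal way, once with the intermediate steps spelled out by hand. The hand-written version is only an approximation for illustration, not a description of Perl's internals, and the variable names are made up.

```perl
use strict;
use warnings;

# What you write: one innocent-looking check.
my %auto;
my $found_auto = exists $auto{a}{b}{c};

# Roughly what it amounts to, spelled out by hand.
# (An approximation for illustration, not Perl's real implementation.)
my %manual;
$manual{a}    = {} unless defined $manual{a};       # need a hash to look {b} up in
$manual{a}{b} = {} unless defined $manual{a}{b};    # need a hash to look {c} up in
my $found_manual = exists $manual{a}{b}{c};         # the only step that is a pure check

for my $h (\%auto, \%manual) {
    printf "top-level keys: %s; keys under {a}{b}: %d\n",
        join(',', sort keys %$h), scalar keys %{ $h->{a}{b} };
}
```

Both hashes end up with the same shape: an a key holding a b key holding an empty hash, and neither check finds c.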
Part 4: OTHER OPERATIONS THAT TRIGGER IT
It's not just exists. Any deep dereference on a non-existent path triggers autovivification:

```perl
my %h;

# reading a deep value
my $val = $h{x}{y}{z};          # creates {x} and {x}{y}

# using defined on a deep value
if (defined $h{p}{q}{r}) { }    # creates {p} and {p}{q}

# deep value in boolean context
if ($h{m}{n}) { }               # creates {m}

# deep value as hash argument
my @keys = keys %{$h{j}{k}};    # creates {j} and {j}{k}
```

Basically, any time Perl must follow a chain of hash references and an intermediate link doesn't exist, it creates it. The "look before you leap" philosophy backfires when looking changes the ground under your feet.
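If you'd rather verify this than take it on faith, a small probe harness helps. The sketch below runs each operation against a fresh empty hash and reports what was left behind. The helper name keys_created_by and the probe list are hypothetical, invented for this example.

```perl
use strict;
use warnings;

# Run a snippet against a fresh, empty hash and list the top-level keys it left behind.
sub keys_created_by {
    my ($code) = @_;
    my %h;
    $code->(\%h);
    return join ',', sort keys %h;
}

my @probes = (
    [ 'shallow read'   => sub { my $v = $_[0]{x} } ],
    [ 'shallow exists' => sub { my $e = exists $_[0]{x} } ],
    [ 'deep read'      => sub { my $v = $_[0]{x}{y} } ],
    [ 'deep exists'    => sub { my $e = exists $_[0]{x}{y} } ],
    [ 'deep delete'    => sub { delete $_[0]{x}{y} } ],
);

for my $probe (@probes) {
    my ($name, $code) = @$probe;
    my $created = keys_created_by($code);
    printf "%-15s %s\n", $name, $created eq '' ? 'nothing created' : "created: $created";
}
```

The single-level probes leave the hash empty. Every two-level probe, including delete, leaves an x key behind.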
Part 5: THE DATA::DUMPER BEFORE/AFTER
This is a debugging nightmare. You have a function that receives a hash reference. You add a quick debugging check to inspect it. Suddenly the function's behavior changes.

```perl
sub process_config {
    my ($config) = @_;

    # debugging: let's see what's in here
    if (exists $config->{database}{replica}{host}) {
        say "Has replica config";
    }

    # later, some code checks structure
    if (exists $config->{database}) {
        say "Database section exists";  # NOW IT DOES, because you checked replica
    }
}

# call with empty config
process_config({});
```

```
Database section exists
```

You added a debug check for replica and accidentally created the database key. Now downstream code thinks there's a database configuration when there isn't one. The debugging check created the bug it was trying to diagnose.
Part 6: DEFENSIVE PATTERNS
Check each level separately:

```perl
# SAFE: check level by level
if (exists $data{a} && exists $data{a}{b} && exists $data{a}{b}{c}) {
    say "Found it: $data{a}{b}{c}";
}
```

Each exists check only proceeds if the previous level was confirmed. No intermediate levels are created because you never dereference a non-existent key.
This is verbose but correct. You can wrap it in a utility function:

```perl
sub deep_exists {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref $ref eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return 1;
}

if (deep_exists(\%data, 'users', 'admin', 'role')) {
    say "Found it";
}
```

No autovivification. Each level is checked before descending. If any level is missing, it returns 0 without touching the structure.
A similar helper for safe deep access:
```perl
sub deep_get {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        return undef unless ref $ref eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return $ref;
}

my $role = deep_get(\%data, 'users', 'admin', 'role');
say $role if defined $role;
```
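Real data often mixes hashes and arrays, and arrays autovivify the same way. One possible extension is sketched below. The name deep_fetch and the sample data are hypothetical, and the helper assumes array steps are given as non-negative integer indices.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a path of hash keys and array indices without
# creating anything. Returns the value, or undef if any step is missing.
sub deep_fetch {
    my ($node, @path) = @_;
    for my $step (@path) {
        if (ref $node eq 'HASH') {
            return undef unless exists $node->{$step};
            $node = $node->{$step};
        }
        elsif (ref $node eq 'ARRAY') {
            # assumption: only non-negative integer indices are supported
            return undef unless $step =~ /\A\d+\z/ && $step < @$node;
            $node = $node->[$step];
        }
        else {
            return undef;    # hit a non-container before the path ran out
        }
    }
    return $node;
}

my %data = ( users => [ { name => 'admin', roles => ['root'] } ] );

my $role = deep_fetch(\%data, 'users', 0, 'roles', 0);   # finds 'root'
my $none = deep_fetch(\%data, 'users', 3, 'roles', 0);   # undef, nothing created
```

The failed lookup for index 3 returns before any dereference happens, so the users array still has exactly one element afterwards.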
Part 7: THE AUTOVIVIFICATION PRAGMA
There's a CPAN module that solves this at the language level:

```perl
no autovivification;          # loads the pragma; with no arguments it covers fetch, exists, and delete

my %data;
my $val = $data{a}{b}{c};     # no longer creates {a} or {a}{b}
say exists $data{a} ? 'created' : 'nothing created';   # nothing created
```

You can be selective about what you disable:

```perl
# only disable autovivification on reads (fetches), keep it on stores
no autovivification 'fetch';

$data{x}{y} = 1;              # still works (store)
my $v = $data{p}{q}{r};       # does NOT create intermediates (fetch)
```

```perl
# disable on exists checks only
no autovivification 'exists';

exists $data{a}{b}{c};        # safe, nothing created
my $v = $data{a}{b}{c};       # still autovivifies (fetch not disabled)
```

The options:

```
no autovivification;            disable on fetch, exists, and delete (the default set)
no autovivification 'fetch';    disable on reads
no autovivification 'exists';   disable on exists checks
no autovivification 'delete';   disable on delete
no autovivification 'store';    disable on writes (rarely wanted)
```

For most code, no autovivification 'fetch'; plus no autovivification 'exists'; is the sweet spot. You keep the convenient auto-creation on assignment but stop the accidental creation on reads.
Part 8: PERL 5.36+ AND BEYOND
Starting with Perl 5.36 and the use v5.36 pragma, you get many modern features, but autovivification behavior on deep key access is still the same by default. Perl hasn't changed this core behavior.

However, awareness has grown. Modern Perl style guides increasingly recommend:

```perl
# modern defensive access
my $role = $data{users}{admin}{role} // 'default';
```

The // (defined-or) operator doesn't prevent autovivification, but it handles the "key doesn't exist" case gracefully. The intermediate keys still get created, though.
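A quick way to see the difference is to compare the plain lookup with a guarded one. This is a sketch only: it assumes each level is either missing or a hash reference, and the %clean hash and variable names are made up for the comparison.

```perl
use strict;
use warnings;

# Defined-or supplies the default, but the lookup on its left still vivifies.
my %data;
my $role = $data{users}{admin}{role} // 'default';
print "plain:   role=$role, top-level keys=", join(',', sort keys %data), "\n";

# Guarded variant: && short-circuits before any missing level is dereferenced.
# Assumes every level is either absent or a hash reference.
my %clean;
my $safe = (   $clean{users}
            && $clean{users}{admin}
            && $clean{users}{admin}{role} ) // 'default';
print "guarded: role=$safe, top-level keys=", join(',', sort keys %clean), "\n";
```

Both lookups fall back to 'default', but only the plain one leaves a users key behind.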
For truly safe deep access, the community increasingly recommends utility functions or hash libraries:

```perl
# with Hash::Util
use Hash::Util qw(lock_hash);

lock_hash(%config);

# now any modification of the top level (including autovivification) dies
my $v = $config{a}{b}{c};     # dies if {a} doesn't exist
```

Or data validation modules like Params::Validate and Type::Tiny that check structure before you access it.
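One caveat with the Hash::Util approach: lock_hash only restricts the hash you hand it, not the hashes nested inside. The sketch below contrasts it with lock_hash_recurse from the same module. The config contents are invented for the example.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash lock_hash_recurse);

my %shallow = ( database => { host => 'db01' } );
lock_hash(%shallow);

# Top level is locked: vivifying a missing top-level key dies.
my $ok_top  = eval { my $v = $shallow{cache}{ttl}; 1 };
my $err_top = $@;

# The nested hash is NOT locked, so this quietly creates {replica}.
my $v      = $shallow{database}{replica}{host};
my $leaked = exists $shallow{database}{replica};

# Recursive locking restricts the nested hashes too.
my %deep = ( database => { host => 'db01' } );
lock_hash_recurse(%deep);
my $ok_deep = eval { my $w = $deep{database}{replica}{host}; 1 };

print "top-level vivify blocked: ", ($ok_top  ? 'no' : 'yes'), "\n";
print "nested key leaked:        ", ($leaked  ? 'yes' : 'no'), "\n";
print "recursive lock blocked:   ", ($ok_deep ? 'no' : 'yes'), "\n";
```

If the structure is nested and you want reads to fail loudly at every level, the recursive variant is the one to reach for.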
Part 9: DEBUGGING MYSTERIOUS KEYS
If you ever find unexpected keys in your data structures, autovivification is the first suspect. Here's a debugging technique:

```perl
use Tie::Hash;
package WatchHash;
use parent -norequire, 'Tie::StdHash';

sub STORE {
    my ($self, $key, $value) = @_;
    if (!exists $self->{$key}) {
        warn "NEW KEY CREATED: '$key' at " . join(' ', (caller)[1,2]) . "\n";
    }
    $self->{$key} = $value;
}

package main;
my %watched;
tie %watched, 'WatchHash';

# now any new key creation prints a warning with file and line
$watched{a}{b}{c} = 1;
```

```
NEW KEY CREATED: 'a' at script.pl 18
```

The tied hash intercepts every STORE operation and warns you where new keys are born. Only the tied hash itself is watched: the nested hashes created under it are ordinary hashes, so b and c are not reported. It's heavy for production, but invaluable for tracking down autovivification bugs.
A simpler approach: dump your data structure at strategic points and diff the output:

```perl
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my $before = Dumper(\%data);

# suspicious code here
some_function(\%data);

my $after = Dumper(\%data);

if ($before ne $after) {
    warn "Data structure was modified!\n";
    warn "Before: $before\n";
    warn "After: $after\n";
}
```
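If you find yourself pasting that snapshot code around, it can be folded into a small wrapper. A minimal sketch, assuming the suspicious code can be passed in as a code reference. The name changes_made_by is hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: run $code against $data and return a report string
# if the structure changed, or undef if it is identical afterwards.
sub changes_made_by {
    my ($data, $code) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order so the comparison is meaningful
    my $before = Dumper($data);
    $code->($data);
    my $after = Dumper($data);
    return $before eq $after ? undef : "Before: ${before}After: $after";
}

my %data = ( users => {} );

# The "suspicious code" here is a deep exists check on a missing path.
my $report = changes_made_by(\%data, sub { exists $_[0]{users}{admin}{role} });
warn "Data structure was modified!\n$report" if defined $report;
```

Here the report is defined, because the deep exists check created an admin key under users.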
Part 10: THE RULE
```
    .--.
   |o_o |   "I only asked if it existed.
   |:_/ |    It didn't. So Perl built it."
  //   \ \
 (|     | )
/'\_   _/`\
\___)=(___/

$data{a}{b}{c} = 1;       # creates a, a.b, a.b.c  (GOOD)
exists $data{a}{b}{c};    # creates a, a.b         (BAD)
my $v = $data{a}{b}{c};   # creates a, a.b         (BAD)

Level checked by exists:  NOT created
Intermediate levels:      CREATED

SAFE ALTERNATIVES:
1. Check level by level
2. Use deep_exists() helper
3. no autovivification 'fetch';
4. Lock the hash with Hash::Util
```

Autovivification is one of Perl's best features and one of its sneakiest traps. On writes, it's pure convenience. On reads, it's a data-corrupting side effect disguised as an innocent lookup.
The rule is simple: never dereference a chain of hash keys unless you know the intermediate levels exist, or you don't care if they get created. A single exists $data{a}{b}{c} can turn an empty hash into a two-level tree. A debugging check can create the very keys it's trying to inspect.
Check level by level. Use helper functions. Or reach for no autovivification. The language gives you the tools to defend against its own magic. You just have to know the magic is there.
perl.gg