<!-- category: snippets -->
grep { !$seen{$_}++ } for Unique-with-Order
Deduplicate an array. Keep the original order. One line. That is it. No modules. No sorting. No temporary hashes you have to manage. Seven tokens of Perl that do what most languages need a library function for.

```perl
my @unique = grep { !$seen{$_}++ } @data;
```
The first time a value appears, it passes through. Every duplicate gets silently dropped. And the order of first appearances is preserved, which is the part that makes naive approaches fall over.
This idiom is older than some programming languages. It shows up in almost every Perl codebase. If you write Perl and you do not recognize it on sight, today is the day.
Part 1: THE TRICK DISSECTED
Let's break it apart, one piece at a time. Start from the inside and work out:

```perl
grep { !$seen{$_}++ } @data
```
- `$seen{$_}` looks up the current element in a hash called `%seen`. If this is the first time we have seen this value, the hash entry does not exist yet. In numeric context, a nonexistent hash value is `undef`, which Perl treats as 0.
- `$seen{$_}++` is the post-increment operator. The critical word is "post." It returns the old value, then increments. So the first time you see a key, this expression returns 0 (the old value) and then sets `$seen{$_}` to 1.
- `!$seen{$_}++` negates the return value. `!0` is true (1). `!1` is false (empty string). So the first time you see a value, the expression is true. Every subsequent time, it is false.
- `grep { ... } @data` keeps elements where the block returns true. First occurrence? True. Duplicate? False. Dropped.
Order preserved. Duplicates gone. One line. For `@data = (a, b, a, c, b, d, a)`:

| ELEMENT | $seen BEFORE ++ | RETURNS | !RESULT | KEPT? |
|---------|-----------------|---------|---------|-------|
| a       | undef (0)       | 0       | true    | YES   |
| b       | undef (0)       | 0       | true    | YES   |
| a       | 1               | 1       | false   | no    |
| c       | undef (0)       | 0       | true    | YES   |
| b       | 1               | 1       | false   | no    |
| d       | undef (0)       | 0       | true    | YES   |
| a       | 2               | 2       | false   | no    |

`@unique = (a, b, c, d)`
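The walkthrough in the table can be run directly. A minimal script, using the same data:

```perl
use strict;
use warnings;
use feature 'say';

my @data = qw(a b a c b d a);
my %seen;
my @unique = grep { !$seen{$_}++ } @data;

say "@unique";   # a b c d
```

As a side effect, `%seen` ends up holding the occurrence count of every element (`a` maps to 3 here), which comes in handy later.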
Part 2: WHY POST-INCREMENT IS THE KEY
The entire trick hinges on the difference between `$x++` and `++$x`:

```perl
my $x = 0;
say $x++;   # prints 0, THEN increments to 1
say $x;     # now it's 1
say ++$x;   # increments to 2, THEN prints 2
```

Post-increment returns the value before the increment. That is why the first access to a new key returns 0. The hash entry gets created and set to 1, but the expression itself evaluates to the pre-existing value of 0.
If you used pre-increment instead:

```perl
# BROKEN - this keeps nothing
my @unique = grep { !++$seen{$_} } @data;
```

Every element would return at least 1 after pre-increment. `!1` is false. Nothing passes through. You get an empty list.
The post-increment is not just a style choice. It is the mechanism. Change it and the idiom breaks.
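A quick way to convince yourself is to run both forms side by side on the same data (a minimal sketch; the variable names are arbitrary):

```perl
use strict;
use warnings;

my @data = qw(a b a c);
my (%pre, %post);

# pre-increment returns the NEW value (always >= 1), so nothing survives
my @broken = grep { !++$pre{$_}  } @data;

# post-increment returns the OLD value (0 on first sight), so firsts survive
my @works  = grep { !$post{$_}++ } @data;

# @broken is (), @works is (a, b, c)
```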
Part 3: DECLARING %SEEN
You might have noticed the examples do not declare `%seen`. In a proper script with `use strict`, you need to:

```perl
use strict;
use warnings;
use feature 'say';

my @data = qw(apple banana apple cherry banana date);
my %seen;
my @unique = grep { !$seen{$_}++ } @data;
say for @unique;
```

Some people scope the hash tighter:
```perl
my @unique = do {
    my %seen;
    grep { !$seen{$_}++ } @data;
};
```

The `do` block limits `%seen` to just the dedup operation. After the block, `%seen` goes out of scope and is reclaimed. Clean. No leftover variables.
Or inline it in a subroutine:

```perl
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = uniq(@data);
```

Now you have a reusable `uniq` function. Four lines. Done.
Part 4: ORDER PRESERVATION
This is the selling point. The alternative approaches do not preserve order:

```perl
# hash keys - DOES NOT preserve order
my %h = map { $_ => 1 } @data;
my @unique = keys %h;
# order is hash-internal, essentially random
```

```perl
# sort + uniq - CHANGES order to sorted
my @unique = do {
    my $prev = '';
    grep { $_ ne $prev && ($prev = $_) } sort @data;
};
# order is now alphabetical, not original
```

The `grep { !$seen{$_}++ }` pattern is the only one-liner that deduplicates AND preserves insertion order. The elements come out in the same sequence they first appeared.

| METHOD                        | ORDER PRESERVED?             |
|-------------------------------|------------------------------|
| `grep { !$seen{$_}++ }`       | YES                          |
| `keys %{{ map { $_ => 1 } }}` | NO (hash order)              |
| sort then adjacent dedup      | NO (sorted order)            |
| `List::Util::uniq`            | YES (same trick internally)  |
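To see the difference concretely, here is a sketch running the order-preserving idiom and the sort-based dedup over the same made-up list:

```perl
use strict;
use warnings;

my @data = qw(c a b a c b);

# order-preserving: elements in order of first appearance
my %seen;
my @first_seen = grep { !$seen{$_}++ } @data;                    # (c, a, b)

# sort-based: order becomes alphabetical
my $prev = '';
my @sorted = grep { $_ ne $prev && ($prev = $_) } sort @data;    # (a, b, c)
```

Same elements, different sequence. (Note the sort-based version also breaks on `""` or `"0"` elements, since `($prev = $_)` would then be false.)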
Part 5: VARIATIONS: CASE-INSENSITIVE
Deduplicate strings ignoring case:

```perl
my @data = qw(Perl perl PERL python Python);
my %seen;
my @unique = grep { !$seen{lc $_}++ } @data;
say for @unique;
# Perl
# python
```

The key is `lc $_` in the hash lookup. We normalize to lowercase for comparison, but grep returns the original element. So you get "Perl" (the first casing seen) and "python" (the first casing seen), with all other casings dropped.
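For full Unicode case handling, `fc` (casefolding, available via `use feature 'fc'` on perl 5.16+) is arguably a better normalizer than `lc`: it also folds cases lowercasing misses, such as German ß versus "ss". With ASCII data it behaves the same, as this sketch shows:

```perl
use strict;
use warnings;
use feature 'fc';   # Unicode casefolding, perl 5.16+

my @data = qw(Perl perl PERL python Python);
my %seen;
my @unique = grep { !$seen{fc $_}++ } @data;
# ("Perl", "python") -- same as lc here, but Unicode-correct in general
```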
Part 6: VARIATIONS: BY FIELD
Deduplicate based on a specific field, like email domain:

```perl
my @emails = qw(
    alice@gmail.com
    bob@yahoo.com
    carol@gmail.com
    dave@outlook.com
    eve@yahoo.com
);
my %seen;
my @unique = grep {
    my ($domain) = m~\@(.+)$~;
    !$seen{$domain}++;
} @emails;
say for @unique;
# alice@gmail.com
# bob@yahoo.com
# dave@outlook.com
```

One email per domain, keeping the first one seen. Change the key expression and you change what "duplicate" means. The pattern is infinitely flexible.
By file extension:

```perl
my %seen;
my @one_per_type = grep {
    my ($ext) = m~\.(\w+)$~;
    !$seen{$ext // 'none'}++;
} @files;
```

By first word:

```perl
my %seen;
my @unique_starts = grep {
    my ($first) = m~^(\S+)~;
    !$seen{$first}++;
} @lines;
```

Same pattern, different key. That is the beauty of it.
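The key expression can even be factored out into a small generic helper. The `uniq_by` name here is hypothetical, not a standard function (though CPAN's List::UtilsBy ships something similar):

```perl
use strict;
use warnings;

# Hypothetical helper: caller passes a coderef that computes the dedup key.
sub uniq_by {
    my ($key_fn, @list) = @_;
    my %seen;
    return grep { !$seen{ $key_fn->($_) }++ } @list;
}

my @files = qw(a.txt b.txt c.log d.md);
my @one_per_ext = uniq_by(sub { $_[0] =~ m~\.(\w+)$~ ? $1 : 'none' }, @files);
# (a.txt, c.log, d.md)
```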
Part 7: UNIQUE LOG ENTRIES
Real-world use. You have a log file with repeated errors. You want to see each unique error message once, in the order they first appeared:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %seen;
while (<>) {
    chomp;
    # extract the error message, ignoring timestamp and PID
    if (m~^\[[\d\-: ]+\]\s+\[(\w+)\]\s+(.+)$~) {
        my ($level, $msg) = ($1, $2);
        next unless $level eq 'ERROR';
        say $msg unless $seen{$msg}++;
    }
}
```

```
$ cat app.log
[2026-04-10 08:01:23] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:24] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:25] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:00] [ERROR] Timeout reading from cache
[2026-04-10 08:02:01] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:02] [ERROR] Timeout reading from cache
[2026-04-10 08:03:15] [ERROR] Disk space low on /var

$ perl dedup_errors.pl app.log
Connection refused to db-primary
Timeout reading from cache
Disk space low on /var
```

Three unique errors from seven log lines. In order of first appearance. The `%seen` hash remembers what you have already printed, and `unless $seen{$msg}++` is the same trick as `grep { !$seen{$_}++ }`, just written differently.
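A bonus of the `++`: by the time the loop finishes, `%seen` has quietly counted every occurrence, so a frequency summary costs nothing extra. A sketch with made-up messages:

```perl
use strict;
use warnings;
use feature 'say';

my @msgs = ('Connection refused', 'Timeout', 'Connection refused', 'Connection refused');
my %seen;
for my $msg (@msgs) {
    say $msg unless $seen{$msg}++;   # print first occurrences only
}

# %seen now maps each message to its total occurrence count
say "$seen{$_}x $_" for sort keys %seen;
```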
Part 8: THE ONE-LINER VERSION
As a command-line one-liner for deduplicating lines of input:

```
$ sort data.txt | perl -ne 'print unless $seen{$_}++'
```

Wait. If you are sorting first, you might as well use `uniq`. The real power of the Perl version is that you do NOT need to sort:

```
$ perl -ne 'print unless $seen{$_}++' data.txt
```

That preserves original file order. The Unix `uniq` command only removes adjacent duplicates, so it requires sorted input. Perl's `%seen` trick works on unsorted data.

The awk version does the same thing, by the way. Same concept, different syntax. Good ideas transcend languages.

| TOOL                                    | REQUIRES SORT? | PRESERVES ORDER? |
|-----------------------------------------|----------------|------------------|
| `sort \| uniq`                          | yes            | no (sorted)      |
| `perl -ne 'print unless $seen{$_}++'`   | no             | yes              |
| `awk '!seen[$0]++'`                     | no             | yes              |
| `sort -u`                               | yes            | no (sorted)      |
Part 9: COMPARISON TO LIST::UTIL::UNIQ
Perl 5.26 added `uniq` to List::Util (a core module):

```perl
use List::Util 'uniq';
my @unique = uniq @data;
```

Under the hood, List::Util::uniq uses the same `%seen` approach but implemented in XS (C code), so it is faster for large lists. It also handles undef correctly, which the naive `%seen` trick does not:

```perl
# the %seen trick turns undef into ""
my @data = (1, undef, 2, undef, 3);
my %seen;
my @unique = grep { !$seen{$_ // ''}++ } @data;
# you have to handle undef explicitly with //
```

```perl
# List::Util::uniq handles it natively
use List::Util 'uniq';
my @unique = uniq @data;
# just works
```

There is also `uniqstr` (string comparison) and `uniqnum` (numeric comparison) for when the distinction matters:

```perl
use List::Util qw(uniqstr uniqnum);

# "1" and "1.0" are different strings
my @s = uniqstr("1", "1.0", "1");   # ("1", "1.0")

# 1 and 1.0 are the same number
my @n = uniqnum(1, 1.0, 1);         # (1)
```

Use List::Util::uniq in production code. Use the `%seen` idiom when you want zero module dependencies, when you need a custom key function, or when you are writing a one-liner.
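The undef difference is easy to demonstrate (a small sketch; requires perl 5.26+ for core `uniq`):

```perl
use strict;
use warnings;
use List::Util 'uniq';   # core since perl 5.26

my @data = (1, undef, 2, undef, 1);
my @unique = uniq @data;
# (1, undef, 2) -- one undef survives, still undef, not ""
```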
Part 10: THE BEAUTY OF THE IDIOM
Seven tokens. `grep { !$seen{$_}++ }`. No function call. No module. No temporary array. Just the raw mechanics of Perl doing what Perl does.
It works because:
- Autovivification creates hash entries on first access
- Post-increment returns the old value before incrementing
- `!0` is true, `!N` is false for any positive N
- `grep` filters based on boolean return value
- Hash keys are unique by definition
Five Perl features conspiring to solve a common problem in one expression. This is not cleverness for its own sake. It is the language working exactly as designed, every feature pulling its weight.
Every Perl programmer should recognize this idiom on sight. It is as fundamental as `chomp` or `split`. You will see it in code reviews, in Stack Overflow answers, in modules, in one-liners.

```
  .--.
 |o_o |    "First time? Come on in.
 |:_/ |     Second time? Get lost."
//   \ \
(|     | )
/'\_   _/`\
\___)=(___/
```
Once you understand why it works, you understand something deeper about Perl: the language is full of small, composable behaviors that combine into elegant solutions. Post-increment, hash autovivification, boolean negation, and grep. None of them were designed for deduplication. Together, they make deduplication trivial.
That is the Perl way. Not one tool for every job. Every tool for the right job, combined on the fly.
perl.gg