perl.gg / snippets

<!-- category: snippets -->

grep { !$seen{$_}++ } for Unique-with-Order

2026-04-10

Deduplicate an array. Keep the original order. One line.
my @unique = grep { !$seen{$_}++ } @data;
That is it. No modules. No sorting. No temporary hashes you have to manage. Seven tokens of Perl that do what most languages need a library function for.

The first time a value appears, it passes through. Every duplicate gets silently dropped. And the order of first appearances is preserved, which is the part that makes naive approaches fall over.

This idiom is older than some programming languages. It shows up in almost every Perl codebase. If you write Perl and you do not recognize it on sight, today is the day.

Part 1: THE TRICK DISSECTED

Let's break it apart, one piece at a time:
grep { !$seen{$_}++ } @data
Start from the inside and work out.

$seen{$_} looks up the current element in a hash called %seen. If this is the first time we have seen this value, the hash entry does not exist yet. In numeric context, a nonexistent hash value is undef, which Perl treats as 0.

$seen{$_}++ is the post-increment operator. The critical word is "post." It returns the old value, then increments. So the first time you see a key, this expression returns 0 (the old value) and then sets $seen{$_} to 1.

!$seen{$_}++ negates the return value. !0 is true (1). !1 is false (empty string). So the first time you see a value, the expression is true. Every subsequent time, it is false.

grep { ... } @data keeps elements where the block returns true. First occurrence? True. Duplicate? False. Dropped.

@data = (a, b, a, c, b, d, a)

ELEMENT   $seen BEFORE ++   RETURNS   !RESULT   KEPT?
-------   ---------------   -------   -------   -----
a         undef (0)         0         true      YES
b         undef (0)         0         true      YES
a         1                 1         false     no
c         undef (0)         0         true      YES
b         1                 1         false     no
d         undef (0)         0         true      YES
a         2                 2         false     no

@unique = (a, b, c, d)
Order preserved. Duplicates gone. One line.

Part 2: WHY POST-INCREMENT IS THE KEY

The entire trick hinges on the difference between $x++ and ++$x:
my $x = 0;
say $x++;   # prints 0, THEN increments to 1
say $x;     # now it's 1
say ++$x;   # increments to 2, THEN prints 2
Post-increment returns the value before the increment. That is why the first access to a new key returns 0. The hash entry gets created and set to 1, but the expression itself evaluates to the pre-existing value of 0.

If you used pre-increment instead:

# BROKEN - this keeps nothing
my @unique = grep { !++$seen{$_} } @data;
Every element would return at least 1 after pre-increment. !1 is false. Nothing passes through. You get an empty list.

The post-increment is not just a style choice. It is the mechanism. Change it and the idiom breaks.

Part 3: DECLARING %SEEN

You might have noticed the examples do not declare %seen. In a proper script under use strict, you need to declare it (and enable say):
use strict;
use warnings;
use feature 'say';

my @data = qw(apple banana apple cherry banana date);
my %seen;
my @unique = grep { !$seen{$_}++ } @data;
say for @unique;
Some people scope the hash tighter:
my @unique = do {
    my %seen;
    grep { !$seen{$_}++ } @data;
};
The do block limits %seen to just the dedup operation. After the block, %seen goes out of scope and its memory is released. Clean. No leftover variables.

Or inline it in a subroutine:

sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = uniq(@data);
Now you have a reusable uniq function. Four lines. Done.

Part 4: ORDER PRESERVATION

This is the selling point. The alternative approaches do not preserve order:
# hash keys - DOES NOT preserve order
my %h = map { $_ => 1 } @data;
my @unique = keys %h;
# order is hash-internal, essentially random
# sort + uniq - CHANGES order to sorted
my @unique = do {
    my $prev = '';
    grep { $_ ne $prev && ($prev = $_) } sort @data;
};
# order is now alphabetical, not original
The grep { !$seen{$_}++ } pattern is the only one-liner that deduplicates AND preserves insertion order. The elements come out in the same sequence they first appeared.
METHOD                         ORDER PRESERVED?
-----------------------------  ----------------
grep { !$seen{$_}++ }          YES
keys %{{ map { $_ => 1 } }}    NO (hash order)
sort then adjacent dedup       NO (sorted order)
List::Util::uniq               YES (same trick internally)

Part 5: VARIATIONS: CASE-INSENSITIVE

Deduplicate strings ignoring case:
my @data = qw(Perl perl PERL python Python);
my %seen;
my @unique = grep { !$seen{lc $_}++ } @data;
say for @unique;
# Perl
# python
The key is lc $_ in the hash lookup. We normalize to lowercase for comparison, but grep returns the original element. So you get "Perl" (the first casing seen) and "python" (the first casing seen), with all other casings dropped.

Part 6: VARIATIONS: BY FIELD

Deduplicate based on a specific field, like email domain:
my @emails = qw(
    alice@gmail.com
    bob@yahoo.com
    carol@gmail.com
    dave@outlook.com
    eve@yahoo.com
);
my %seen;
my @unique = grep {
    my ($domain) = m~\@(.+)$~;
    !$seen{$domain}++;
} @emails;
say for @unique;
# alice@gmail.com
# bob@yahoo.com
# dave@outlook.com
One email per domain, keeping the first one seen. Change the key expression and you change what "duplicate" means. The pattern is infinitely flexible.

By file extension:

my %seen;
my @one_per_type = grep {
    my ($ext) = m~\.(\w+)$~;
    !$seen{$ext // 'none'}++;
} @files;
By first word:
my %seen;
my @unique_starts = grep {
    my ($first) = m~^(\S+)~;
    !$seen{$first}++;
} @lines;
Same pattern, different key. That is the beauty of it.

Part 7: UNIQUE LOG ENTRIES

Real-world use. You have a log file with repeated errors. You want to see each unique error message once, in the order they first appeared:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %seen;
while (<>) {
    chomp;
    # extract the level and message, ignoring the timestamp
    if (m~^\[[\d\-: ]+\]\s+\[(\w+)\]\s+(.+)$~) {
        my ($level, $msg) = ($1, $2);
        next unless $level eq 'ERROR';
        say $msg unless $seen{$msg}++;
    }
}
$ cat app.log
[2026-04-10 08:01:23] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:24] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:25] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:00] [ERROR] Timeout reading from cache
[2026-04-10 08:02:01] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:02] [ERROR] Timeout reading from cache
[2026-04-10 08:03:15] [ERROR] Disk space low on /var

$ perl dedup_errors.pl app.log
Connection refused to db-primary
Timeout reading from cache
Disk space low on /var
Three unique errors from seven log lines. In order of first appearance. The %seen hash remembers what you have already printed, and unless $seen{$msg}++ is the same trick as grep { !$seen{$_}++ }, just written differently.

Part 8: THE ONE-LINER VERSION

As a command-line one-liner for deduplicating lines of input:
$ sort data.txt | perl -ne 'print unless $seen{$_}++'
Wait. If you are sorting first, you might as well use uniq. The real power of the Perl version is that you do NOT need to sort:
$ perl -ne 'print unless $seen{$_}++' data.txt
That preserves original file order. The Unix uniq command only removes adjacent duplicates, so it requires sorted input. Perl's %seen trick works on unsorted data.
TOOL                  REQUIRES SORT?   PRESERVES ORDER?
--------------------  --------------   ----------------
sort | uniq           yes              no (sorted)
perl -ne '!$s{$_}++'  no               yes
awk '!seen[$0]++'     no               yes
sort -u               yes              no (sorted)
The awk version does the same thing, by the way. Same concept, different syntax. Good ideas transcend languages.

Part 9: COMPARISON TO LIST::UTIL::UNIQ

Perl 5.26 added uniq to List::Util (core module):
use List::Util 'uniq';
my @unique = uniq @data;
Under the hood, List::Util::uniq uses the same %seen approach but implemented in XS (C code), so it is faster for large lists. It also handles undef correctly, which the naive %seen trick does not.
# the %seen trick turns undef into ""
my @data = (1, undef, 2, undef, 3);
my %seen;
my @unique = grep { !$seen{$_ // ''}++ } @data;
# you have to handle undef explicitly with //

# List::Util::uniq handles it natively
use List::Util 'uniq';
my @unique = uniq @data;   # just works
There is also uniqstr (string comparison) and uniqnum (numeric comparison) for when the distinction matters:
use List::Util qw(uniqstr uniqnum);

# "1" and "1.0" are different strings
my @s = uniqstr("1", "1.0", "1");   # ("1", "1.0")

# 1 and 1.0 are the same number
my @n = uniqnum(1, 1.0, 1);         # (1)
Use List::Util::uniq in production code. Use the %seen idiom when you want zero module dependencies, when you need a custom key function, or when you are writing a one-liner.

Part 10: THE BEAUTY OF THE IDIOM

Seven tokens. grep { !$seen{$_}++ }. No function call. No module. No temporary array. Just the raw mechanics of Perl doing what Perl does.

It works because:

- post-increment returns the old value, then bumps the stored one
- a missing hash entry is undef, which numifies to 0
- autovivification creates the entry the moment you increment it
- ! turns that first 0 into true and every later count into false
- grep keeps exactly the elements where the block is true

Five Perl features conspiring to solve a common problem in one expression. This is not cleverness for its own sake. It is the language working exactly as designed, every feature pulling its weight.

   .--.
  |o_o |    "First time? Come on in.
  |:_/ |     Second time? Get lost."
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/
Every Perl programmer should recognize this idiom on sight. It is as fundamental as chomp or split. You will see it in code reviews, in Stack Overflow answers, in modules, in one-liners.

Once you understand why it works, you understand something deeper about Perl: the language is full of small, composable behaviors that combine into elegant solutions. Post-increment, hash autovivification, undef-to-zero coercion, boolean negation, and grep. None of them were designed for deduplication. Together, they make deduplication trivial.

That is the Perl way. Not one tool for every job. Every tool for the right job, combined on the fly.

perl.gg