perl.gg / snippets

<!-- category: snippets -->

grep { !$seen{$_}++ } for Unique-with-Order

2026-04-10

Deduplicate an array. Keep the original order. One line.
my @unique = grep { !$seen{$_}++ } @data;
That is it. No modules. No sorting. No temporary hashes you have to manage. Seven tokens of Perl that do what most languages need a library function for.

The first time a value appears, it passes through. Every duplicate gets silently dropped. And the order of first appearances is preserved, which is the part that makes naive approaches fall over.

This idiom is older than some programming languages. It shows up in almost every Perl codebase. If you write Perl and you do not recognize it on sight, today is the day.

Part 1: THE TRICK DISSECTED

Let's break it apart, one piece at a time:
grep { !$seen{$_}++ } @data
Start from the inside and work out.

$seen{$_} looks up the current element in a hash called %seen. If this is the first time we have seen this value, the hash entry does not exist yet. In numeric context, a nonexistent hash value is undef, which Perl treats as 0.

$seen{$_}++ is the post-increment operator. The critical word is "post." It returns the old value, then increments. So the first time you see a key, this expression returns 0 (the old value) and then sets $seen{$_} to 1.

!$seen{$_}++ negates the return value. !0 is true (1). !1 is false (empty string). So the first time you see a value, the expression is true. Every subsequent time, it is false.

grep { ... } @data keeps elements where the block returns true. First occurrence? True. Duplicate? False. Dropped.

@data = (a, b, a, c, b, d, a)

ELEMENT   $seen BEFORE ++   RETURNS   !RESULT   KEPT?
-------   ---------------   -------   -------   -----
a         undef (0)         0         true      YES
b         undef (0)         0         true      YES
a         1                 1         false     no
c         undef (0)         0         true      YES
b         1                 1         false     no
d         undef (0)         0         true      YES
a         2                 2         false     no

@unique = (a, b, c, d)
Order preserved. Duplicates gone. One line.

Part 2: WHY POST-INCREMENT IS THE KEY

The entire trick hinges on the difference between $x++ and ++$x:
my $x = 0;
say $x++;   # prints 0, THEN increments to 1
say $x;     # now it's 1
say ++$x;   # increments to 2, THEN prints 2
Post-increment returns the value before the increment. That is why the first access to a new key returns 0. The hash entry gets created and set to 1, but the expression itself evaluates to the pre-existing value of 0.

If you used pre-increment instead:

# BROKEN - this keeps nothing
my @unique = grep { !++$seen{$_} } @data;
Every element would return at least 1 after pre-increment. !1 is false. Nothing passes through. You get an empty list.

The post-increment is not just a style choice. It is the mechanism. Change it and the idiom breaks.

Part 3: DECLARING %SEEN

You might have noticed the examples do not declare %seen. In a proper script under use strict, you need to declare it (and enable say):
use strict;
use warnings;
use feature 'say';

my @data = qw(apple banana apple cherry banana date);
my %seen;
my @unique = grep { !$seen{$_}++ } @data;
say for @unique;
Some people scope the hash tighter:
my @unique = do {
    my %seen;
    grep { !$seen{$_}++ } @data;
};
The do block limits %seen to just the dedup operation. After the block, %seen goes out of scope and its memory is released. Clean. No leftover variables.

Or inline it in a subroutine:

sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = uniq(@data);
Now you have a reusable uniq function. Four lines. Done.

Part 4: ORDER PRESERVATION

This is the selling point. The alternative approaches do not preserve order:
# hash keys - DOES NOT preserve order
my %h = map { $_ => 1 } @data;
my @unique = keys %h;
# order is hash-internal, essentially random
# sort + uniq - CHANGES order to sorted
my @unique = do {
    my $prev = '';
    grep { $_ ne $prev && ($prev = $_) } sort @data;
};
# order is now alphabetical, not original
The grep { !$seen{$_}++ } pattern is the only one-liner that deduplicates AND preserves insertion order. The elements come out in the same sequence they first appeared.
METHOD                         ORDER PRESERVED?
-----------------------------  ----------------
grep { !$seen{$_}++ }          YES
keys %{{ map { $_ => 1 } }}    NO (hash order)
sort then adjacent dedup       NO (sorted order)
List::Util::uniq               YES (same trick internally)

Part 5: VARIATIONS: CASE-INSENSITIVE

Deduplicate strings ignoring case:
my @data = qw(Perl perl PERL python Python);
my %seen;
my @unique = grep { !$seen{lc $_}++ } @data;
say for @unique;
# Perl
# python
The key is lc $_ in the hash lookup. We normalize to lowercase for comparison, but grep returns the original element. So you get "Perl" (the first casing seen) and "python" (the first casing seen), with all other casings dropped.

Part 6: VARIATIONS: BY FIELD

Deduplicate based on a specific field, like email domain:
my @emails = qw(
    alice@gmail.com
    bob@yahoo.com
    carol@gmail.com
    dave@outlook.com
    eve@yahoo.com
);
my %seen;
my @unique = grep {
    my ($domain) = m~\@(.+)$~;
    !$seen{$domain}++;
} @emails;
say for @unique;
# alice@gmail.com
# bob@yahoo.com
# dave@outlook.com
One email per domain, keeping the first one seen. Change the key expression and you change what "duplicate" means. The pattern is infinitely flexible.

By file extension:

my %seen;
my @one_per_type = grep {
    my ($ext) = m~\.(\w+)$~;
    !$seen{$ext // 'none'}++;
} @files;
By first word:
my %seen;
my @unique_starts = grep {
    my ($first) = m~^(\S+)~;
    !$seen{$first}++;
} @lines;
Same pattern, different key. That is the beauty of it.

Part 7: UNIQUE LOG ENTRIES

Real-world use. You have a log file with repeated errors. You want to see each unique error message once, in the order they first appeared:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my %seen;
while (<>) {
    chomp;
    # extract the level and message, ignoring the timestamp
    if (m~^\[[\d\-: ]+\]\s+\[(\w+)\]\s+(.+)$~) {
        my ($level, $msg) = ($1, $2);
        next unless $level eq 'ERROR';
        say $msg unless $seen{$msg}++;
    }
}
$ cat app.log
[2026-04-10 08:01:23] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:24] [ERROR] Connection refused to db-primary
[2026-04-10 08:01:25] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:00] [ERROR] Timeout reading from cache
[2026-04-10 08:02:01] [ERROR] Connection refused to db-primary
[2026-04-10 08:02:02] [ERROR] Timeout reading from cache
[2026-04-10 08:03:15] [ERROR] Disk space low on /var

$ perl dedup_errors.pl app.log
Connection refused to db-primary
Timeout reading from cache
Disk space low on /var
Three unique errors from seven log lines. In order of first appearance. The %seen hash remembers what you have already printed, and unless $seen{$msg}++ is the same trick as grep { !$seen{$_}++ }, just written differently.

Part 8: THE ONE-LINER VERSION

As a command-line one-liner for deduplicating lines of input:
$ sort data.txt | perl -ne 'print unless $seen{$_}++'
Wait. If you are sorting first, you might as well use uniq. The real power of the Perl version is that you do NOT need to sort:
$ perl -ne 'print unless $seen{$_}++' data.txt
That preserves original file order. The Unix uniq command only removes adjacent duplicates, so it requires sorted input. Perl's %seen trick works on unsorted data.
TOOL                  REQUIRES SORT?   PRESERVES ORDER?
--------------------  --------------   ----------------
sort | uniq           yes              no (sorted)
perl -ne '!$s{$_}++'  no               yes
awk '!seen[$0]++'     no               yes
sort -u               yes              no (sorted)
The awk version does the same thing, by the way. Same concept, different syntax. Good ideas transcend languages.

Part 9: COMPARISON TO LIST::UTIL::UNIQ

Perl 5.26 added uniq to List::Util (core module):
use List::Util 'uniq';
my @unique = uniq @data;
Under the hood, List::Util::uniq uses the same %seen approach but implemented in XS (C code), so it is faster for large lists. It also handles undef correctly, which the naive %seen trick does not.
# the %seen trick turns undef into ""
my @data = (1, undef, 2, undef, 3);
my %seen;
my @unique = grep { !$seen{$_ // ''}++ } @data;
# you have to handle undef explicitly with //

# List::Util::uniq handles it natively
use List::Util 'uniq';
my @unique = uniq @data;   # just works
There is also uniqstr (string comparison) and uniqnum (numeric comparison) for when the distinction matters:
use List::Util qw(uniqstr uniqnum);

# "1" and "1.0" are different strings
my @s = uniqstr("1", "1.0", "1");   # ("1", "1.0")

# 1 and 1.0 are the same number
my @n = uniqnum(1, 1.0, 1);         # (1)
Use List::Util::uniq in production code. Use the %seen idiom when you want zero module dependencies, when you need a custom key function, or when you are writing a one-liner.

Part 10: THE BEAUTY OF THE IDIOM

Seven tokens. grep { !$seen{$_}++ }. No function call. No module. No temporary array. Just the raw mechanics of Perl doing what Perl does.

It works because:

- post-increment returns the old value, then bumps the stored one
- a missing hash entry is undef, which numifies to 0
- autovivification creates the entry the moment you increment it
- ! turns that first 0 into true and every later count into false
- grep keeps exactly the elements where the block is true

Five Perl features conspiring to solve a common problem in one expression. This is not cleverness for its own sake. It is the language working exactly as designed, every feature pulling its weight.

   .--.
  |o_o |    "First time? Come on in.
  |:_/ |     Second time? Get lost."
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/
Every Perl programmer should recognize this idiom on sight. It is as fundamental as chomp or split. You will see it in code reviews, in Stack Overflow answers, in modules, in one-liners.

Once you understand why it works, you understand something deeper about Perl: the language is full of small, composable behaviors that combine into elegant solutions. Post-increment, hash autovivification, undef-to-zero coercion, boolean negation, and grep. None of them were designed for deduplication. Together, they make deduplication trivial.

That is the Perl way. Not one tool for every job. Every tool for the right job, combined on the fly.

perl.gg