perl.gg / one-liners

Duplicate Detector

2025-12-19

Find duplicates. One of the most common tasks in sysadmin life.
perl -ne 'print if $seen{$_}++'
That's it. Prints every line that appears more than once.

The trick is the post-increment. First time a line appears, $seen{$_} is 0 (false), so nothing prints. Second time, it's 1 (true), so it prints. Third time, still true, prints again.

Want only the second occurrence? We'll get there.

Part 1: HOW IT WORKS

print if $seen{$_}++
Break it down:
PIECE           WHAT IT DOES
------------    ------------------------------------------
$seen{$_}       Hash lookup - has this line been seen?
++              Post-increment - add 1 AFTER returning value
$seen{$_}++     Returns OLD value (0 first time), then increments
print if ...    Print if the condition is true (non-zero)
First encounter: $seen{$_} is undef (0 in numeric context). Returns 0, then becomes 1. Condition false, no print.

Second encounter: $seen{$_} is 1. Returns 1, then becomes 2. Condition true, print.

   .--.
  |o_o |
  |:_/ |
 //   \ \
(|     | )
/'\_   _/`\
\___)=(___/

Part 2: VARIATIONS

Print duplicates only once (not every repeat):
perl -ne 'print if $seen{$_}++ == 1'
The == 1 means "only on the second occurrence."

Print unique lines only (no duplicates at all):

perl -ne 'print unless $seen{$_}++'
Flip the logic. Print first occurrence, skip all repeats.

Print lines that appear exactly N times (one pass to count, then report in an END block):

perl -ne '$c{$_}++; END { print for grep { $c{$_} == 3 } keys %c }'

Part 3: FIRST VS LAST OCCURRENCE

First occurrence of each line:
perl -ne 'print unless $seen{$_}++'
Last occurrence of each line:
perl -ne '$last{$_} = $_; END { print values %last }'
This overwrites each time, so only the last survives.

But order is lost. Want last occurrence in order?

perl -ne '$last{$_} = $.; END { print sort { $last{$a} <=> $last{$b} } keys %last }'
Store line numbers, sort by them at the end.

Part 4: CASE INSENSITIVE

Ignore case when detecting duplicates:
perl -ne 'print if $seen{lc $_}++'
The lc lowercases the key. "Hello" and "HELLO" are now the same.

Normalize whitespace too:

perl -ne '$k = lc; $k =~ s/\s+/ /g; print if $seen{$k}++'

Part 5: FIELD-BASED DUPLICATES

Duplicate detection on a specific column:
perl -ane 'print if $seen{$F[0]}++'
The -a splits each line into @F. This checks for duplicate first fields only.

Duplicate IPs in a log:

perl -ane 'print if $seen{$F[0]}++' access.log
Duplicate usernames in /etc/passwd:
perl -F: -ane 'print if $seen{$F[0]}++' /etc/passwd

Part 6: COUNTING DUPLICATES

How many times does each line appear?
perl -ne '$c{$_}++; END { print "$c{$_}: $_" for keys %c }'
Output like:
3: this line appeared three times
1: this line appeared once
5: this line appeared five times
Sorted by count:
perl -ne '$c{$_}++; END { print "$c{$_}: $_" for sort { $c{$b} <=> $c{$a} } keys %c }'
Most frequent first.

Part 7: ADJACENT DUPLICATES

The uniq command only removes adjacent duplicates:
perl -ne 'print unless $_ eq $last; $last = $_'
Line must equal the previous line to be skipped.

This is faster (no hash) but only catches consecutive repeats:

aaa
aaa   <- removed
bbb
aaa   <- NOT removed, not adjacent

Part 8: REAL WORLD EXAMPLES

Find duplicate lines in a config file:
perl -ne 'print "$.: $_" if $seen{$_}++' config.ini
Includes line numbers so you can find them.

Duplicate entries in /etc/hosts:

perl -ane 'print if $seen{$F[1]}++' /etc/hosts
Checks hostname field (second column).

Duplicate SSH keys:

perl -ne 'print if $seen{(split)[1]}++' ~/.ssh/authorized_keys
Keys are in the second field. Finds if someone's key is listed twice.

Duplicate cron jobs:

crontab -l | perl -ne 'print if $seen{$_}++'

Part 9: MEMORY CONSIDERATIONS

The hash stores every unique line. For huge files with many unique lines, this eats memory.

For massive files, consider:

sort file.txt | uniq -d
External sort handles files larger than RAM. But loses original order.
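For comparison on a small input (the same pipeline scales to files that don't fit in memory):

```shell
# Sorting groups the repeats; uniq -d keeps one copy of each duplicated line.
printf 'b\na\nb\nc\na\nb\n' | sort | uniq -d
# a
# b
```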

Or process in chunks if you only care about recent duplicates:

tail -10000 huge.log | perl -ne 'print if $seen{$_}++'

Part 10: THE FAMILY

These patterns are related:
perl -ne 'print if $seen{$_}++'                            # All duplicates
perl -ne 'print if $seen{$_}++ == 1'                       # Each duplicate once
perl -ne 'print unless $seen{$_}++'                        # Unique lines only
perl -ne '$c{$_}++; END{print for grep{$c{$_}>1}keys%c}'   # Dupes, one each
The post-increment idiom is the heart of all of them.

Part 11: WHY POST-INCREMENT

Why $seen{$_}++ instead of ++$seen{$_}?

Pre-increment returns the NEW value:

++$seen{$_} # Returns 1 on first encounter (true!)
Post-increment returns the OLD value:
$seen{$_}++ # Returns 0 on first encounter (false!)
With pre-increment, everything prints. The ++ happens before the return. Post-increment is what makes the logic work.

Part 12: COMBINING WITH OTHER PATTERNS

Duplicates matching a pattern:
perl -ne 'print if /error/i && $seen{$_}++'
Only errors, and only repeated ones.

Duplicates across multiple files:

perl -ne 'print "$ARGV: $_" if $seen{$_}++' *.log
Shows which file the duplicate is in.

Duplicates within a time window (log files):

perl -ane '
    $t = $F[0];
    %seen = () if $t ne $last_t;
    print if $seen{$_}++;
    $last_t = $t
' timestamped.log
Resets the seen hash when the timestamp changes.
        $seen{$_}++
             |
        +----+----+
        |         |
      first     again
        |         |
      skip      print

  The post-increment trick

Created By: Wildcard Wizard. Copyright 2026