Duplicate Detector
Find duplicates. One of the most common tasks in sysadmin life.

    perl -ne 'print if $seen{$_}++'

That's it. Prints every line that appears more than once.

The trick is the post-increment. First time a line appears, $seen{$_} is 0 (false), so nothing prints. Second time, it's 1 (true), so it prints. Third time, still true, prints again.

Want only the second occurrence? We'll get there.
Part 1: HOW IT WORKS
Break it down:

    print if $seen{$_}++

    PIECE          WHAT IT DOES
    ------------   ------------------------------------------
    $seen{$_}      Hash lookup - has this line been seen?
    ++             Post-increment - add 1 AFTER returning value
    $seen{$_}++    Returns OLD value (0 first time), then increments
    print if ...   Print if the condition is true (non-zero)

First encounter: $seen{$_} is undef (0 in numeric context). Returns 0, then becomes 1. Condition false, no print.

Second encounter: $seen{$_} is 1. Returns 1, then becomes 2. Condition true, print.
        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/
Part 2: VARIATIONS
Print duplicates only once (not every repeat):

    perl -ne 'print if $seen{$_}++ == 1'

The == 1 means "only on the second occurrence."

Deduplicate (print each distinct line once, drop all repeats):

    perl -ne 'print unless $seen{$_}++'

Flip the logic. Print the first occurrence, skip all repeats.
Print lines that appear exactly N times (count in one pass, report in the END block; here N = 3):

    perl -ne '$c{$_}++; END { print for grep { $c{$_} == 3 } keys %c }'
Part 3: FIRST VS LAST OCCURRENCE
First occurrence of each line:

    perl -ne 'print unless $seen{$_}++'

Last occurrence of each line:

    perl -ne '$last{$_} = $_; END { print values %last }'

This overwrites each time, so only the last survives.

But order is lost. Want last occurrence in order?

    perl -ne '$last{$_} = $.; END { print sort { $last{$a} <=> $last{$b} } keys %last }'

Store line numbers, sort by them at the end.
Part 4: CASE INSENSITIVE
Ignore case when detecting duplicates:

    perl -ne 'print if $seen{lc $_}++'

The lc lowercases the key. "Hello" and "HELLO" are now the same.

Normalize whitespace too:

    perl -ne '$k = lc; $k =~ s/\s+/ /g; print if $seen{$k}++'
Part 5: FIELD-BASED DUPLICATES
Duplicate detection on a specific column:

    perl -ane 'print if $seen{$F[0]}++'

The -a splits each line into @F. This checks for duplicate first fields only.

Duplicate IPs in a log:

    perl -ane 'print if $seen{$F[0]}++' access.log

Duplicate usernames in /etc/passwd:

    perl -F: -ane 'print if $seen{$F[0]}++' /etc/passwd
Part 6: COUNTING DUPLICATES
How many times does each line appear?

    perl -ne '$c{$_}++; END { print "$c{$_}: $_" for keys %c }'

Output like:

    3: this line appeared three times
    1: this line appeared once
    5: this line appeared five times

Sorted by count, most frequent first:

    perl -ne '$c{$_}++; END { print "$c{$_}: $_" for sort { $c{$b} <=> $c{$a} } keys %c }'
Part 7: ADJACENT DUPLICATES
The uniq command only removes adjacent duplicates. The Perl equivalent:

    perl -ne 'print unless $_ eq $last; $last = $_'

A line must equal the previous line to be skipped. This is faster (no hash, constant memory) but only catches consecutive repeats:

    aaa
    aaa   <- removed
    bbb
    aaa   <- NOT removed, not adjacent
Part 8: REAL WORLD EXAMPLES
Find duplicate lines in a config file:

    perl -ne 'print "$.: $_" if $seen{$_}++' config.ini

Includes line numbers so you can find them.

Duplicate entries in /etc/hosts:

    perl -ane 'print if $seen{$F[1]}++' /etc/hosts

Checks the hostname field (second column).

Duplicate SSH keys:

    perl -ne 'print if $seen{(split)[1]}++' ~/.ssh/authorized_keys

Keys are in the second field. Finds if someone's key is listed twice.

Duplicate cron jobs:

    crontab -l | perl -ne 'print if $seen{$_}++'
Part 9: MEMORY CONSIDERATIONS
The hash stores every unique line. For huge files with many unique lines, this eats memory.

For massive files, consider:

    sort file.txt | uniq -d

External sort handles files larger than RAM. But it loses original order.

Or process in chunks if you only care about recent duplicates:

    tail -10000 huge.log | perl -ne 'print if $seen{$_}++'
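The sort/uniq route on a made-up stream:

```shell
# sort groups repeats together; uniq -d keeps one copy of each repeated line.
printf 'b\na\nb\nc\n' | sort | uniq -d
# prints: b
```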
Part 10: THE FAMILY
These patterns are related:

    perl -ne 'print if $seen{$_}++'                            # All duplicates
    perl -ne 'print if $seen{$_}++ == 1'                       # Each duplicate once
    perl -ne 'print unless $seen{$_}++'                        # Unique lines only
    perl -ne '$c{$_}++; END{print for grep{$c{$_}>1}keys%c}'   # Dupes, one each

The post-increment idiom is the heart of all of them.
Part 11: WHY POST-INCREMENT
Why $seen{$_}++ instead of ++$seen{$_}?

Pre-increment returns the NEW value:

    ++$seen{$_}     # Returns 1 on first encounter (true!)

Post-increment returns the OLD value:

    $seen{$_}++     # Returns 0 on first encounter (false!)

With pre-increment, everything prints. The ++ happens before the return. Post-increment is what makes the logic work.
Part 12: COMBINING WITH OTHER PATTERNS
Duplicates matching a pattern:

    perl -ne 'print if /error/i && $seen{$_}++'

Only errors, and only repeated ones.

Duplicates across multiple files:

    perl -ne 'print "$ARGV: $_" if $seen{$_}++' *.log

Shows which file the duplicate is in.

Duplicates within a time window (log files):

    perl -ane '
        $t = $F[0];
        %seen = () if $t ne $last_t;
        print if $seen{$_}++;
        $last_t = $t
    ' timestamped.log

Resets the seen hash when the timestamp changes.
        $seen{$_}++
             |
        +----+----+
        |         |
      first     again
        |         |
      skip      print

    The post-increment trick
Created By: Wildcard Wizard. Copyright 2026