perl.gg / essentials

Sorting

2024-07-19

Sorting in Perl is weird. And wonderful. And occasionally baffling.

The sort function gives you two magic variables: $a and $b. They're the items being compared. Your job is to say which one comes first.

sort { $a <=> $b } @numbers
That's it. The spaceship operator says "compare these numerically." Perl handles the rest.

Part 1: THE TWO OPERATORS

Perl has two comparison operators for sorting:
OPERATOR TYPE RETURNS -------- -------- -------------------------------- <=> Numeric -1 if $a < $b, 0 if equal, 1 if $a > $b cmp String Same, but compares as strings
The return value tells sort which element should come first:
-1 $a comes before $b 0 They're equal (order doesn't matter) 1 $b comes before $a
You don't have to return exactly -1, 0, 1. Any negative, zero, or positive value works. But these operators give you exactly what sort needs.

Part 2: NUMERIC SORTING

Default sort is alphabetical. Numbers get mangled:
my @nums = (1, 20, 3, 100, 15); print join(", ", sort @nums);
Output:
1, 100, 15, 20, 3
That's ASCII order, not numeric. The string "100" comes before "15" because '1' < '1' is false but '0' < '5' is true... wait, that's not right either. It's '1' == '1', then '0' < '5' is true... actually '10' comes before '15' because we compare character by character.

The point: it's not what you want.

Fix it with the spaceship:

my @nums = (1, 20, 3, 100, 15); print join(", ", sort { $a <=> $b } @nums);
Output:
1, 3, 15, 20, 100
Now we're talking.
.--. |o_o | |:_/ | // \ \ (| | ) /'\_ _/`\ \___)=(___/

Part 3: DESCENDING ORDER

Swap $a and $b:
sort { $b <=> $a } @numbers # Largest first sort { $b cmp $a } @strings # Z before A
Or use reverse (less efficient but clearer):
reverse sort { $a <=> $b } @numbers
The swap trick is faster. No extra pass through the list.

Part 4: CASE-INSENSITIVE SORTING

Strings sort case-sensitively by default:
my @words = qw(apple Banana cherry APRICOT); print join(", ", sort @words);
Output:
APRICOT, Banana, apple, cherry
Uppercase letters have lower ASCII values than lowercase. 'A' is 65, 'a' is 97. So capitals sort first.

Fix it with lc():

sort { lc($a) cmp lc($b) } @words
Output:
apple, APRICOT, Banana, cherry
We're comparing lowercase versions, but keeping the original case in the output.

Part 5: THE $a AND $b MYSTERY

Where do $a and $b come from? They're package variables that sort creates for you. No declaration needed.

But beware: they're package-scoped, not lexical. This matters if you use strict:

use strict; my @sorted = sort { $a <=> $b } @nums; # Works fine
$a and $b are special-cased. Perl doesn't complain about them even under strict.

But don't do this:

my $a = 5; # Shadows the sort variables! sort { $a <=> $b } @nums; # Broken
Your lexical $a hides the magic one. Chaos ensues.

Part 6: SORTING BY EXTRACTED KEYS

Here's where it gets interesting. You can sort by anything you can compute from the elements.

Sort URLs by their title:

my @urls = ( 'https://example.com | Example Site', 'https://perl.org | The Perl Foundation', 'https://cpan.org | CPAN', ); my @sorted = sort { my ($title_a) = $a =~ /\| (.+)$/; my ($title_b) = $b =~ /\| (.+)$/; lc($title_a) cmp lc($title_b); } @urls;
The regex \| (.+)$ captures everything after the pipe. We compare those extracted titles.

Output:

https://cpan.org | CPAN https://example.com | Example Site https://perl.org | The Perl Foundation

Part 7: THE SCHWARTZIAN TRANSFORM

That URL sort has a problem. We're extracting the title twice per comparison. For n elements, sort does O(n log n) comparisons. That's a lot of repeated regex work.

Enter the Schwartzian Transform:

my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, /\| (.+)$/ ] } @urls;
Read it bottom to top:
1. map: Transform each URL into [$url, $title] 2. sort: Compare by $title (the second element) 3. map: Extract just the URL (first element)
Each regex runs once per element, not once per comparison. For large lists, this is dramatically faster.

Part 8: BREAKING IT DOWN

Let's trace through the transform:
INPUT: @urls = ( 'https://example.com | Example Site', 'https://perl.org | The Perl Foundation', 'https://cpan.org | CPAN', ) STEP 1 - First map (bottom): [ ['https://example.com | Example Site', 'Example Site'], ['https://perl.org | The Perl Foundation', 'The Perl Foundation'], ['https://cpan.org | CPAN', 'CPAN'], ] STEP 2 - Sort by index [1]: [ ['https://cpan.org | CPAN', 'CPAN'], ['https://example.com | Example Site', 'Example Site'], ['https://perl.org | The Perl Foundation', 'The Perl Foundation'], ] STEP 3 - Final map (top): ( 'https://cpan.org | CPAN', 'https://example.com | Example Site', 'https://perl.org | The Perl Foundation', )
The key insight: compute once, compare many times.

Part 9: SORTING COMPLEX DATA

Sorting hashes by value:
my %scores = ( alice => 87, bob => 92, carol => 85, ); my @ranked = sort { $scores{$b} <=> $scores{$a} } keys %scores;
We sort the keys, but compare using the values. $scores{$b} first means descending order (highest score first).

Output:

bob, alice, carol

Part 10: MULTI-KEY SORTING

Sort by multiple criteria with the or operator:
my @people = ( { name => 'Alice', age => 30 }, { name => 'Bob', age => 25 }, { name => 'Alice', age => 25 }, ); my @sorted = sort { $a->{name} cmp $b->{name} or $a->{age} <=> $b->{age} } @people;
First by name, then by age. The or short-circuits: if the name comparison returns non-zero, we're done. Only if names are equal (comparison returns 0) do we check age.

Part 11: CUSTOM COMPARISON FUNCTIONS

For complex sorts, extract the logic:
sub by_domain { my ($domain_a) = $a =~ m{://([^/]+)}; my ($domain_b) = $b =~ m{://([^/]+)}; return lc($domain_a) cmp lc($domain_b); } my @sorted = sort by_domain @urls;
Note: no block braces when using a named function. And $a/$b are still the magic variables - they just work.

Part 12: GOTCHAS

Don't modify $a or $b:
sort { $a++ <=> $b } # BAD: modifies source array
The comparison function should be pure. No side effects.

Don't return non-numbers:

sort { 'apple' } # Meaningless
The return value must be numeric. Negative, zero, or positive.

Watch out for undef:

sort { $a <=> $b } (1, undef, 3); # Warnings!
Use the // operator to handle it:
sort { ($a // 0) <=> ($b // 0) } @list;

Part 13: THE COMPLETE TOOLKIT

# Numeric ascending sort { $a <=> $b } @nums # Numeric descending sort { $b <=> $a } @nums # String ascending sort { $a cmp $b } @strings # String descending sort { $b cmp $a } @strings # Case-insensitive sort { lc($a) cmp lc($b) } @strings # By hash value sort { $hash{$a} <=> $hash{$b} } keys %hash # Multi-key sort { $a->{x} cmp $b->{x} or $a->{y} <=> $b->{y} } @list # Schwartzian (compute once) map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, expensive_computation($_)] } @list
+---------+ | UNSORTED| | 3 1 4 1 | | 5 9 2 6 | +----+----+ | V .-----. / sort \ | <=> | \ / '-----' | V +---------+ | SORTED | | 1 1 2 3 | | 4 5 6 9 | +---------+
perl.gg