perl.gg

Mastering Perl Sorting

In this blog post, we'll look at a Perl one-liner that sorts a list of HTML links by their link titles. We'll break it down, discuss how sorting works in Perl, and finish with a simple sorting example for those new to the concept.

The Script

First, let's look at the full script:


cat URLs.txt | perl -nlE '
    $urls->{s~watch/\K[^"]+~$&~r} = $_;
}{
    my @sorted = sort {
        my ($title_a) = $a =~ m{>([^<]+)</a>};
        my ($title_b) = $b =~ m{>([^<]+)</a>};
        lc($title_a) cmp lc($title_b);
    } values %{$urls};

    local $" = qq|\n|;
    say qq|@sorted|;
'

Understanding the Script

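Reading the Input: cat URLs.txt | perl -nlE pipes URLs.txt into Perl. The -n switch wraps the code in a loop that runs once per input line, -l strips the trailing newline from each line, and -E enables modern features such as say. The }{ in the middle of the script closes that implicit loop, so everything after it runs once, after the last line has been read.
Storing URLs: The first part of the script populates a hash reference $urls, with one entry per input line. The key is computed by a substitution and the value is the original line.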
```
$urls->{s~watch/\K[^"]+~$&~r} = $_;
```
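This line is easy to misread. The /r flag makes s/// return the modified string instead of changing $_, and the replacement $& is exactly the text that was matched (the part after watch/, thanks to \K). The line therefore comes back unchanged, so the key is the whole line, not just the ID. The practical effect is that identical lines collapse into a single hash entry, while the value remains the entire line.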
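If the goal is to key the hash on just the part after watch/, a capturing match is one way to do it. The following is a minimal sketch, not the original author's code: the sample lines (including the repeated ID) and the names %by_line and %by_id are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the third one repeats the ID 123.
my @lines = (
    '<li><a href="https://example.com/watch/123">Zebra Link</a></li>',
    '<li><a href="https://example.com/watch/456">Apple Link</a></li>',
    '<li><a href="https://example.com/watch/123">Zebra Link (again)</a></li>',
);

my (%by_line, %by_id);
for my $line (@lines) {
    # What the one-liner does: s///r returns the whole, unchanged line.
    my $key = $line =~ s~watch/\K[^"]+~$&~r;
    $by_line{$key} = $line;

    # Alternative: capture only the ID that follows "watch/".
    if (my ($id) = $line =~ m{watch/([^"]+)}) {
        $by_id{$id} = $line;    # a later line with the same ID replaces the earlier one
    }
}

printf "keys with s///r:     %d\n", scalar keys %by_line;
printf "keys with a capture: %d\n", scalar keys %by_id;
```

With the whole line as the key, only exact duplicate lines are merged. With the ID as the key, two links to the same ID are merged even when their titles differ.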

Sorting the URLs:


my @sorted = sort {
    my ($title_a) = $a =~ m{>([^<]+)</a>};
    my ($title_b) = $b =~ m{>([^<]+)</a>};
    lc($title_a) cmp lc($title_b);
} values %{$urls};

This block:

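Extracts the title of each line (the text between > and </a>) with a regex match in list context.
Converts both titles to lowercase so that the comparison is case-insensitive.
Uses cmp, the string comparison operator, which returns -1, 0, or 1.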
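One thing to be aware of: the sort block runs once per comparison, so each title is extracted by the regex many times. For long lists, a common alternative is to compute each sort key once up front (often called a Schwartzian transform). The sketch below is an illustration of that idea rather than part of the original script, and the sample lines are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input lines in the same shape as the demo data.
my @lines = (
    '<li><a href="https://example.com/watch/123">Zebra Link</a></li>',
    '<li><a href="https://example.com/watch/456">apple Link</a></li>',
    '<li><a href="https://example.com/watch/789">Mango Link</a></li>',
);

my @sorted =
    map  { $_->[1] }                      # 3. keep only the original line
    sort { $a->[0] cmp $b->[0] }          # 2. compare the precomputed keys
    map  {
        my ($title) = m{>([^<]+)</a>};    # 1. extract the title once per line
        [ lc($title // ''), $_ ];         #    pair: [lowercased title, line]
    } @lines;

print "$_\n" for @sorted;
```

Lines without a recognizable title get an empty key here and sort first. That choice is arbitrary and easy to change.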

Output the Sorted List:


local $" = qq|\n|;
say qq|@sorted|;

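The special variable $" is the list separator: the string Perl places between array elements when an array is interpolated into a double-quoted string (a single space by default). Setting it to a newline makes say qq|@sorted| print one sorted line per row.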
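To see the effect of $" in isolation, here is a small sketch with made-up data. It also shows join, which produces the same string without changing a global variable.

```perl
use strict;
use warnings;

my @items = ('Apple Link', 'Mango Link', 'Zebra Link');

my $default = "@items";            # default separator: a single space

my $with_newlines;
{
    local $" = "\n";               # restored automatically when the block ends
    $with_newlines = "@items";
}

my $joined = join "\n", @items;    # same result, no special variable needed

print "$with_newlines\n";
```

In a one-liner, local $" is compact. In a longer program, join is usually easier to read.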

Demo Data

Let's create a demo URLs.txt:


<li><a href="https://example.com/watch/123">Zebra Link</a></li>
<li><a href="https://example.com/watch/456">Apple Link</a></li>
<li><a href="https://example.com/watch/789">Mango Link</a></li>

When you run the script on this file, it sorts the lines by link title, producing:


<li><a href="https://example.com/watch/456">Apple Link</a></li>
<li><a href="https://example.com/watch/789">Mango Link</a></li>
<li><a href="https://example.com/watch/123">Zebra Link</a></li>

Understanding Perl's sort

Sorting with Implicit Variables $a and $b

In Perl, sort hands the two elements being compared to your comparison block in two special package variables, $a and $b. Here's a simple example:


my @numbers = (5, 3, 8, 1, 4);
my @sorted_numbers = sort { $a <=> $b } @numbers;
print "@sorted_numbers\n";  # Outputs: 1 3 4 5 8

In this example:

sort uses the block { $a <=> $b } for numeric comparison; <=> returns -1, 0, or 1.
$a and $b are set by sort to the two elements currently being compared. They are package variables, so don't declare them with my.
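The comparison block matters because sort without one compares its arguments as strings. The short sketch below, using made-up numbers, contrasts the default with numeric ascending and descending order.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                  # default: string comparison
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, smallest first
my @descending = sort { $b <=> $a } @numbers;    # swap $a and $b to reverse

print "@as_strings\n";    # 1 10 100 9
print "@ascending\n";     # 1 9 10 100
print "@descending\n";    # 100 10 9 1
```

Swapping $a and $b inside the block is the usual way to reverse the order.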

String Comparison

For string comparison, you use the cmp operator:


my @words = ('apple', 'Mango', 'banana', 'Zebra');
my @sorted_words = sort { lc($a) cmp lc($b) } @words;
print "@sorted_words\n";  # Outputs: apple banana Mango Zebra

Here, lc($a) and lc($b) convert strings to lowercase, ensuring a case-insensitive sort.
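When two strings differ only in case, lc makes them compare as equal, so their relative order is left to sort. One way to pin it down is to add a tie-breaker with or, which falls through to a second comparison whenever the first returns 0. This is a sketch with an invented word list.

```perl
use strict;
use warnings;

my @words = ('apple', 'Mango', 'Apple', 'banana', 'mango');

my @sorted_words = sort {
    lc($a) cmp lc($b)    # primary key: case-insensitive
        or
    $a cmp $b            # tie-breaker: plain string comparison
} @words;

print "@sorted_words\n";  # Apple apple banana Mango mango
```

The same pattern extends to any number of keys: chain the comparisons with or, most significant first.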

Conclusion

Sorting in Perl is powerful and flexible: you supply a comparison block, and sort calls it with the special variables $a and $b. Our example script shows a practical use case, sorting HTML links by their titles. With these concepts in hand, you can sort almost any list in Perl.

Feel free to experiment with the provided script and demo data. Happy coding!