Why do I need cmp_to_key?
Sorting is generally straightfoward; items can be sorted in number order or alphabetically. However, sometimes you need to sort items in a different order and the cmp_to_key
function in the functools
module is an easy way to implement custom sorting functions.
What is a comparison function?
According to the functools
documentation:
A comparison function is any callable that accept two arguments, compares them, and returns a negative number for less-than, zero for equality, or a positive number for greater-than.
Generally comparison functions are used as keys to other functions (like sorted
or min
or any function that takes a key
argument).
Let's see this in action
The following code block uses cmp_to_key
to make a standard sorting function. It actually doesn't do anything differently than just using sorted
by itself.
from functools import cmp_to_key
def standard_comparison(x,y):
if x > y:
return 1
elif x < y:
return -1
else:
return 0
my_list = [4,2,3,1,5,3]
sorted(my_list, key=cmp_to_key(standard_comparison))
Output:
>>> [1, 2, 3, 3, 4, 5]
Looking into standard_comparison
Let's take a look at the standard_comparison
function above. It takes two numbers, compares them, and returns "1", "-1", or "0" depending on whether the numbers are bigger, smaller or the same as each other.
But what does this actually mean?
Basically, while the sorted
function is looking through the list, it's getting responses of "1", "-1", or "0" telling it where to put certain values. For our purposes "1" can be understood as "put to the right side of the list", "-1" as "put to the left side of the list" and "0" as "do nothing".
Customizing the comparison function
I can change the above standard_comparison
to be "only consider a number to be 'bigger' if it's more than two than the next one," which leads to some interesting results as you can see above.
from functools import cmp_to_key
def weird_comparison(x,y):
if x + 2 > y:
return 1
elif x < y:
return -1
else:
return 0
my_list = [4,2,3,1,5,3]
sorted(my_list, key=cmp_to_key(weird_comparison))
Output:
>>> [2, 1, 4, 3, 3, 5]
Realistic Example
The above was a cute example of messing with the comparison function, but the real question is "when would someone actually use this?"
I think the most common example would be "almost" alphabetical sorting.
I've needed to sort metrics into certain periods and usually there's no natural ordering on which to sort. This is especially helpful when the data is output in a web application.
The following example sorts a list of dictionaries by their respective time periods. The order is 'Lifetime', 'Trailing 7 Days', 'Trailing 30 Days', 'First Day'. Without our sorting function it would be difficult to get to this order since "First Day" would always be first in the list.
from functools import cmp_to_key
def my_time_comparison(x,y):
time_list = [
'Lifetime',
'Trailing 7 Days',
'Trailing 30 Days',
'First Day'
]
first_time = x['SALES_TIME_PERIOD']
second_time = y['SALES_TIME_PERIOD']
first_time_idx = time_list.index(first_time)
second_time_idx = time_list.index(second_time)
if first_time_idx > second_time_idx:
return 1
elif first_time_idx < second_time_idx:
return -1
else:
return 0
sales_list = [
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 20 },
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 2 },
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 40 },
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 50 },
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 30 },
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 3 },
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 50 },
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 60 },
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 41 },
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 4 },
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 61 },
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 71 }]
sorted(sales_list, key=cmp_to_key(my_time_comparison))
Output:
>>> [{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 50},
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 60},
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Lifetime', 'UNITS': 71},
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 20},
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 30},
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Trailing 7 Days', 'UNITS': 41},
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 40},
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 50},
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'Trailing 30 Days', 'UNITS': 61},
{'PRODUCT_ID': 1, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 2},
{'PRODUCT_ID': 2, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 3},
{'PRODUCT_ID': 3, 'SALES_TIME_PERIOD': 'First Day', 'UNITS': 4}]
How did the above work?
- First we put the order we wanted to use in
time_list
. - As we step through the list, we access the value of
SALES_TIME_PERIOD
in each dictionary. - We compare the value in our dictionary to the order that value falls in the list. For example, "Lifetime" would return an index of 0 and "First Day" would return an index of 3.
- We compare the indices to determine which is the "smaller value" which should be at the top and the "larger value" that should be toward the bottom. Since 0 < 3, "Lifetime" gets placed above "First Day".
Conclusion
cmp_to_key
is an easy and effective way to implement your custom sorting heuristics. With a minimum amount of set-up you can retain a sort order that can be used in multiple applications and only needs to be passed as a key
argument to functions.
Comments