C++ std::sort predicate with templates

Kent - July 7, 2016

There are a couple of sorting algorithms in C++ from std::sort to lamdbas, each tailored to different use cases. As a programmer, you may or may not want to delve into the depths of sort algorithms. That’s a domain left for experts. Luckily, it’s quite easy to get started with sorting data.

Sorting methods in C++

With C++, there is a standardized sort method readily available. It’s called std::sort and works by calling it with a range it should sort, usually a begin-iterator and an end-iterator. Normal use cases are sorting a vector or an array, but any container which implements the RandomAccess-iterator can be sorted. Containers like std::list can’t be sorted with std::sort. std::sort is available in the <algorithm> header.

If you try to sort std::list with std::sort, you’ll get a compile error. The error will be a variation of not being able to use operator - on the std::list iterator type. If you want to sort a list, you’ll have to use std::list::sort.

Using qsort to sort structs or classes will not be covered.

What algorithm std::sort will use, is left to the implementation, but the complexity should not exceed O(N log N) where N is number of elements to sort. Implementations are usually implementing std::sort by combining different sort algorithms with different strengths and weaknesses. One such algorithm is IntroSort.

IntroSort is a hybrid sort algorithm, invented in 1997 by David Musser. The goal of IntroSort is to provide a sorting algorithm, which has a fast average performance and an optimal worst case performance. It combines Quicksort and Heapsort, and combines the good parts from both algorithms. Due to this, IntroSort has an average performance and a worst case performance of O(n log n), where N is number of elements.

In theory, sorting numbers and sorting any data is not different. They use the same algorithm, the same range, and the same method. The only difference is the comparison between elements. But that’s standardized too. The real difference is the method comparing the elements.

To sort a collection of any data (also known as POD, Plain Old Data) with STL we need to tell it how to sort these objects. This chapter will go through many of those methods.

Sorting “Hello World!"

Most tutorials about sorting starts with sorting some numbers. But any container with RandomIterator can be sorted with std::sort. And as it happens to be, std::string implements RandomIterator.

<span class="ot">#include <algorithm></span>
<span class="ot">#include <string></span>
<span class="ot">#include <iostream></span>

std::string hello{ <span class=“st”>“Hello World!"</span> }; std::sort(hello.begin(), hello.end()); std::cout << hello << <span class=“st”>"</span><span class=“ch”>\n</span><span class=“st”>"</span>;

Output will be !HWdellloor. Notice the space in front, which is the first character.

Sorting structs and classes

If something can be compared by the less-than operator <, it can be sorted when the data is in a container supporting random access.

Creating a container with your data is simple enough. C++ is all about ‘pay for what you use’.

Given the following data,

<span class="kw">struct</span> some_data
{
    some_data() = <span class="kw">default</span>;
    some_data(<span class="dt">int</span> val) : a(val) {}
&lt;span class="dt">int&lt;/span> a;

};

Creating a container like so is straight forward with an initializer_list.

std::vector<some_data> elements = { <span class="dv">0</span>,<span class="dv">9</span>,<span class="dv">2</span>,<span class="dv">7</span>,<span class="dv">3</span>,<span class="dv">5</span>,<span class="dv">7</span>,<span class="dv">3</span> };

When you build and run this code, you’ll get a std::vector with 8 elements of some_data. When you try to sort this using std::sort, you’ll get an error stating the compiler could not find a suitable operator < for comparing two instances of some_data.

There are three ways of solving this,

  1. In-class definition, commonly known as member overload.
  2. Out of class definition, commonly known as non-member overload.
  3. By a predicate, and that predicate can be any method matching the signature of the predicate.

Member operator < overload

The member operator < overload is the simplest and easiest if you’re able to change the class/struct itself. This will not work with other classes/structures, whose you have no control over (be classes from other libraries) or when sharing structs between C and C++.

Defining your own operators, is also known as operator overloading. And in this case, it’s relational operator overloading.

Some may argue one must use friend to overload operators, but that is not true. It’s only necessary to use friend method if and only if the comparison should have access to members declared private in the struct or class.

Using some_data from above, make a method with the following signature in the struct: bool operator<(const some_data & rhs) const. The abbreviation rhs means right hand side, while lhs is left hand side.

The struct will then look like:

<span class="kw">struct</span> some_data
{
    some_data() = <span class="kw">default</span>;
    some_data(<span class="dt">int</span> val) : a(val) {}
&lt;span class="dt">int&lt;/span> a;

&lt;span class="dt">bool&lt;/span> &lt;span class="kw">operator&lt;/span>&lt;(&lt;span class="dt">const&lt;/span> some_data & rhs) &lt;span class="dt">const&lt;/span>
{
    &lt;span class="kw">return&lt;/span> a &lt; rhs.a;
}

};

A call to std::sort will then compile and call the inline and overloaded less-than operator.

std::sort(elements.begin(), elements.end());

Non-member operator < overload

When you’re not able to change the class or struct itself, a non-member operator < overload can be used.

Outside of the class, define this method:

<span class="dt">bool</span> <span class="kw">operator</span><(<span class="dt">const</span> some_data & lhs, <span class="dt">const</span> some_data & rhs)
{
    <span class="kw">return</span> lhs.a < rhs.a;
}

Using std::sort have the exact same syntax, and the results are identical.

std::sort(elements.begin(), elements.end());

Sort predicate examples

In a written or spoken language, a predicate is a part of a sentence, which “states something” or “tells something” about a subject (or a thing). In computer science and C++, a predicate is much of the same thing. Given n inputs, it can tell something about the object or objects being compared .

With sorting, a predicate states if object A comes before object B, thus a search predicate must have two arguments of same type.

Continuing the use of some_data above, a search predicate could be the operator <-overload (both member and non-member). However, when using a matching operator < overload in the same namespace as the struct itself, all sort operations will use that overload. If you want to customize the sort order in certain cases, predicates will let you define a case-by-case sort orders.

Pretend you’re writing a system for a car dealership. A car has certain properties like make, model, year and price.

<span class="kw">struct</span> car
{
    std::string make;
    std::string model;
    <span class="dt">int</span> year;
    <span class="dt">double</span> price;
};

Sorting a car against another car does not make any sense, but sorting cars based on what make and model makes sense, also by year and how much the car costs.

There are a couple ways of sorting the car struct, here are using a lambda or a method.

Lambda as a sort predicate

Using a lambda as a sort predicate is probably the easiest and most efficient way of implementing predicates. The reason behind that logic, is that a lambda is an anonymous method which is implemented at the call site. As a bonus, it’s easier for the compiler to inline and optimize the lambda than a method defined elsewhere. It’s both easier to read and more efficient than the alternate methods.

Using the the car-struct, a c++ predicate lambda for sorting by make and model may be implemented like this.

<span class="kw">auto</span> sortByMakeAndModel = 
    [](<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) -> <span class="dt">bool</span>
{
    <span class="kw">if</span> (a.make < b.make) <span class="kw">return</span> <span class="kw">true</span>;
    <span class="kw">if</span> (a.make > b.make) <span class="kw">return</span> <span class="kw">false</span>;
    <span class="kw">if</span> (a.model < b.model) <span class="kw">return</span> <span class="kw">true</span>;
    <span class="kw">if</span> (a.model > b.model) <span class="kw">return</span> <span class="kw">false</span>;
&lt;span class="kw">return&lt;/span> &lt;span class="kw">false&lt;/span>;

};

There are two more ways of creating a predicate with multiple elements to sort with, one where operator < is always used and the third where everything is a big boolean expression.

Using only operator <.

<span class="kw">auto</span> sortByMakeAndModel_v2 = 
    [](<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) -> <span class="dt">bool</span>
{
    <span class="kw">if</span> (a.make < b.make) <span class="kw">return</span> <span class="kw">true</span>;
    <span class="kw">if</span> (b.make < a.make) <span class="kw">return</span> <span class="kw">false</span>;
    <span class="kw">if</span> (a.model < b.model) <span class="kw">return</span> <span class="kw">true</span>;
    <span class="kw">if</span> (b.model < a.model) <span class="kw">return</span> <span class="kw">false</span>;
&lt;span class="kw">return&lt;/span> &lt;span class="kw">false&lt;/span>;

};

Big boolean expression.

<span class="kw">auto</span> sortByMakeAndModel_v3 = 
    [](<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) -> <span class="dt">bool</span>
{
    <span class="kw">return</span> 
        (
            (a.make < b.make) ||
            (a.make == b.make && a.model < b.model)
        );
};

All three lambdas generate the same sort order and have identical semantics. The difference is how they are compiled to machine code. During my testing in Release builds, the lambdas sortByMakeAndModel and sortByMakeAndModel_v2 got merged into one lambda, while the third, sortByMakeAndModel_v3 got completely inlined.

Inlining in computer science, means that the contents of the method is taken out and inserted at the call site. It’s as if there is no function, only the contents of the method is copied every place the inlined method is used.

std::sort(first, last, sortByMakeAndModel);
std::sort(first, last, sortByMakeAndModel_v2);
std::sort(first, last, sortByMakeAndModel_v3);

The output in any case is:

Sort by make and model
1. Ford Mondeo 2010 (25000)
2. Porsche 911 1999 (65000)
3. Volvo V70 2007 (20000)
4. Volvo V90 2016 (55000)

Sorting by the other variables, year and price are straight forward comparison. And during my testing, both methods were completely inlined.

Sort by year.

<span class="kw">auto</span> sortByYear = [](<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) -> <span class="dt">bool</span>
{
    <span class="kw">return</span> a.year < b.year;
};

Used like:

std::sort(first, last, sortByYear);

After sorting by year, the list looks like this.

Sort by year
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Sort by price.

<span class="kw">auto</span> sortByPrice = [](<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) -> <span class="dt">bool</span>
{
    <span class="kw">return</span> a.price < b.price;
};

Used like:

std::sort(first, last, sortByPrice);

After sorting by price, the vector looks like this:

Sort by price
1. Volvo V70 2007 (20000)
2. Ford Mondeo 2010 (25000)
3. Volvo V90 2016 (55000)
4. Porsche 911 1999 (65000)

Free method (non-member method) sort predicate

A free method (non-member method) is any method not defined in a struct or a class. It behaves much like a lambda, but can also be present in multiple compilation units. For performance reasons, using a non-member method should be inline. We can give the compiler a hint to inline the method by using inline.

Sorting the car structure by year is much like the sortByYear-lambda.

<span class="kw">inline</span> <span class="dt">bool</span> sortYear(<span class="dt">const</span> car & a, <span class="dt">const</span> car & b)
{
    <span class="kw">return</span> a.year < b.year;
}

Using the non-member method is also almost identical to the lambda.

std::sort(first, last, &sortYear);

And as expected, the cars are sorted by year.

Sort by year, non-member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Member method sort predicate

Using a member method requires an instance of the class or struct. C++11 introduced std::bind (previously boost::bind). It makes it simpler to bind to a method in a class, but care must be taken when constructing the std::bind object.

We want to use a C++ sort predicate member function in the following class.

<span class="kw">struct</span> car_sort
{
    <span class="dt">bool</span> sortYear(<span class="dt">const</span> car & a, <span class="dt">const</span> car & b) <span class="dt">const</span>
    {
        <span class="kw">return</span> a.year < b.year;
    }
};

There must be an instance of the class available, car_sort sort_instance;, and the sortYear method must be public.

Then is it possible to bind to the method.

std::sort(first, last, std::bind(&car_sort::sortYear, 
  sort_instance, std::placeholders::_1, 
  std::placeholders::_2));

The syntax may look frightening, but there is a reason for each part. The descriptions are on the following source listing, where each element is on it’s own line.

std::sort(                 <span class="co">// (1) Call to std::sort</span>
  first,                   <span class="co">// (2) First iterator</span>
  last,                    <span class="co">// (3) End iterator</span>
  std::bind(               <span class="co">// (4) Predicate, which is std::bind</span>
    &car_sort::sortYear,   <span class="co">// (5) Pointer to member method</span>
    sort_instance,         <span class="co">// (6) Instance of struct</span>
    std::placeholders::_1, <span class="co">// (7) Placeholder for argument 1</span>
    std::placeholders::_2  <span class="co">// (8) Placeholder for argument 2</span>
  )
);

While 1, 2 and 3 are the same as before, 4-8 are new and requires some explanation. Item 4 tells the compiler the predicate is an object produced by std::bind, and the predicate when called will use method 5, with instance 6, and put the first argument into placeholder _1 (7) and the second argument into placeholder _2 (8).

The output is still valid.

Sort by year, member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Using a free method, std::bind or std::function is more likely to incur a runtime cost, because the sort predicate method to be called is essentially a function pointer. And as with any pointer, it may be changed during runtime. The compiler may be smart and optimize away the pointer, if and only if it can see the pointer will not be changed and can not be changed during runtime. This is a prime example for dynamic binding.

With lambdas, you’re more inclined to get static binding. That means the function to be called may and will be resolved at compile time. It’s not possible to change the static binding during runtime.

Sorting with template predicates

To the std::sort method we simply supply a template predicate! It’s just a method taking 2 arguments of that type and apply our logic to tell if the first argument is lesser than the last argument. With templates it becomes a little more complex, but still it’s very simple when you know how it works.

<span class="ot">#include <iostream>     </span><span class="co">// Provides cout</span>
<span class="ot">#include <vector>       </span><span class="co">// Provides our basic container</span>
<span class="ot">#include <algorithm>    </span><span class="co">// Provides std::sort</span>

<span class=“co”>// Structure to be sorted</span>

<span class=“kw”>template</span><<span class=“kw”>typename</span> T> <span class=“kw”>struct</span> generic_data { generic_data(size_t i, size_t t, T data) : id(i) , time(t), other_data(data) {} size_t id; size_t time; T other_data; };

<span class=“kw”>using</span> A = generic_data<<span class=“dt”>bool</span>>; <span class=“kw”>using</span> B = generic_data<std::string>; <span class=“kw”>using</span> C = generic_data<<span class=“dt”>char</span>*>;

<span class=“co”>// Our predicate</span> <span class=“co”>//</span> <span class=“kw”>template</span><<span class=“kw”>typename</span> CLASS> <span class=“dt”>bool</span> timeSortPredicate(<span class=“dt”>const</span> CLASS &a, <span class=“dt”>const</span> CLASS &b) { <span class=“kw”>return</span> a.time < b.time; }

<span class=“kw”>template</span><<span class=“kw”>typename</span> CLASS> <span class=“dt”>bool</span> idSortPredicate(<span class=“dt”>const</span> CLASS &a, <span class=“dt”>const</span> CLASS &b) { <span class=“kw”>return</span> a.id < b.id; }

<span class=“co”>// Our collection printer</span> <span class=“co”>//</span> <span class=“kw”>template</span><<span class=“kw”>typename</span> CLASS> <span class=“dt”>void</span> print(<span class=“dt”>const</span> CLASS &c) { <span class=“kw”>for</span> (<span class=“dt”>const</span> <span class=“kw”>auto</span> & item : c) { std::cout << <span class=“st”>“Id: “</span> << item.id << <span class=“st”>”, time: “</span> << item.time << <span class=“st”>”, data: “</span> << item.other_data << <span class=“st”>"</span><span class=“ch”>\n</span><span class=“st”>"</span>; } }

The method template_predicate_sort is a test method of some sort, showing how to use the templated predicates. The clue is to explicitly tell the compiler what type the predicate should have. In this case, either of type A, B or C.

<span class="dt">void</span> template_predicate_sort()
{
    <span class="co">// Construct collections by initalizer lists</span>
    std::vector<A> alist = {
        {<span class="dv">1</span>, <span class="dv">15</span>, <span class="kw">false</span>},
        {<span class="dv">2</span>,<span class="dv">20</span>, <span class="kw">true</span>},
        {<span class="dv">3</span>,<span class="dv">14</span>, <span class="kw">false</span>}
    };
std::vector&lt;B&gt; blist = {
    {&lt;span class="dv">3&lt;/span>, &lt;span class="dv">4&lt;/span>, &lt;span class="st">"first"&lt;/span>},
    {&lt;span class="dv">2&lt;/span>, &lt;span class="dv">10&lt;/span>, &lt;span class="st">"second"&lt;/span>},
    {&lt;span class="dv">1&lt;/span>, &lt;span class="dv">12&lt;/span>, &lt;span class="st">"third"&lt;/span>}
};

std::vector&lt;C&gt; clist = {
    {&lt;span class="dv">11&lt;/span>, &lt;span class="dv">0&lt;/span>, &lt;span class="st">"char literal 1"&lt;/span>},
    {&lt;span class="dv">12&lt;/span>, &lt;span class="dv">0&lt;/span>, &lt;span class="st">"char literal 2"&lt;/span>},
    {&lt;span class="dv">13&lt;/span>, &lt;span class="dv">0&lt;/span>, &lt;span class="st">"char literal 3"&lt;/span>}
};

&lt;span class="co">// Print stuff&lt;/span>
std::cout &lt;&lt; &lt;span class="st">"Original collections.&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
std::cout &lt;&lt; &lt;span class="st">"Collection A&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(alist);

std::cout &lt;&lt; &lt;span class="st">"Collection B&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(blist);

std::cout &lt;&lt; &lt;span class="st">"Collection C&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(clist);

std::cout &lt;&lt; &lt;span class="st">"Sorting collections by time&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;

sort(alist.begin(), alist.end(), timeSortPredicate&lt;A&gt;);
sort(blist.begin(), blist.end(), timeSortPredicate&lt;B&gt;);
sort(clist.begin(), clist.end(), timeSortPredicate&lt;C&gt;);

&lt;span class="co">// Print stuff&lt;/span>
std::cout &lt;&lt; &lt;span class="st">"Collection A&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(alist);

std::cout &lt;&lt; &lt;span class="st">"Collection B&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(blist);

std::cout &lt;&lt; &lt;span class="st">"Collection C&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(clist);

std::cout &lt;&lt; &lt;span class="st">"Sorting collections by id&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;

sort(alist.begin(), alist.end(), idSortPredicate&lt;A&gt;);
sort(blist.begin(), blist.end(), idSortPredicate&lt;B&gt;);
sort(clist.begin(), clist.end(), idSortPredicate&lt;C&gt;);

&lt;span class="co">// Print stuff&lt;/span>
std::cout &lt;&lt; &lt;span class="st">"Collection A&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(alist);

std::cout &lt;&lt; &lt;span class="st">"Collection B&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(blist);

std::cout &lt;&lt; &lt;span class="st">"Collection C&lt;/span>&lt;span class="ch">\n&lt;/span>&lt;span class="st">"&lt;/span>;
print(clist);

}

Compile with:

g++ sortpredicate.cpp -o sortpredicate

The output should be:

Original collections.
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by time
Collection A
Id: 3, time: 14, data: 0
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by id
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 1, time: 12, data: third
Id: 2, time: 10, data: second
Id: 3, time: 4, data: first
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3

Sorting std::list

Sorting a std::list is almost the same as sorting a vector or a string, except with std::list the sort method is a member function. Except this difference, the std::list::sort method doesn’t accept a range, and it comes with two overloads. One without any predicate, and one with a predicate. The usage is identical to what was covered earlier.

Given a list of numbers, just call sort() with no arguments to sort by the default operator <.

std::list<<span class="dt">int</span>> numbers = { <span class="dv">0</span>,<span class="dv">9</span>,<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">8</span>,<span class="dv">4</span>,<span class="dv">6</span> };
numbers.sort();

The output is:

Numbers sorted: 0 1 2 4 6 8 9

The same applies to sorting a struct. For simplicity, we’ll use the car struct from before and the sortYear predicate from before. Using lambdas would be

std::list<car> cars = {
    { <span class="st">"Volvo"</span>, <span class="st">"V70"</span>, <span class="dv">2007</span>, <span class="fl">20000.0</span> },
    { <span class="st">"Porsche"</span>, <span class="st">"911"</span>, <span class="dv">1999</span>, <span class="fl">65000.0</span> },
    { <span class="st">"Ford"</span>, <span class="st">"Mondeo"</span>, <span class="dv">2010</span>, <span class="fl">25000.0</span> },
    { <span class="st">"Volvo"</span>, <span class="st">"V90"</span>, <span class="dv">2016</span>, <span class="fl">55000.0</span> }
};

<span class=“co”>// Sort by free method predicate</span> cars.sort(&sortYear);

<span class=“co”>// Sorting a list with a lambda as predicate</span> <span class=“kw”>auto</span> sortYearPredicate = [](<span class=“dt”>const</span> car & a, <span class=“dt”>const</span> car & b) -> <span class=“dt”>bool</span> { <span class=“kw”>return</span> a.year < b.year; };

cars.sort(sortYearPredicate);

The output is:

Cars sorted in a list
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

See Also

Comments

Any comments? Create a new discussion on GitHub.
There used to be an inline comment form here, but it was removed.