Ranges

@brief Ranges are the backbone of OctaSTD iterable objects and algorithms.

What are ranges?

Standard C++ has an iterator system. The iterators are designed to mimic pointers API-wise, and if you need to represent a range from A to B, you need a pair of iterators. Various APIs that work with iterators rely on this and take two iterators as arguments.
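
As a brief illustration of the two-iterator convention, here is a minimal sketch that uses only the standard library (nothing from OctaSTD). The helper name `contains` is made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the interval to search is described by two
// separate iterators that have to be passed around together.
bool contains(std::vector<int> const &v, int x) {
    auto first = v.begin(); // start of the interval
    auto last = v.end();    // one past the end of the interval
    return std::find(first, last, x) != last;
}
```

Note how the "not found" result is also expressed through the second iterator, so both halves of the state are needed everywhere.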

However, a system like this can be both hard to use and hard to deal with in custom objects, because you suddenly have your state split into two things. That's why OctaSTD introduces ranges, inspired by D's range system but largely designed from scratch for C++.

A range is a type that represents an interval of values. Just like with C++ iterators, there are several categories of ranges, with each enhancing the previous in some way.

Implementing a range

Generally, there are two kinds of ranges, input ranges and output ranges. Input ranges are ranges you read from and output ranges are ranges you write into. There are several categories that extend input ranges, namely forward ranges, bidirectional ranges, infinite random access ranges, finite random access ranges and contiguous ranges. These actually more or less map to the corresponding iterator categories. You can also have any input range meet the requirements for an output range. These are called mutable ranges, similarly to C++ mutable iterators.

Characteristics of an input range

    #include <ostd/range.hh>

    struct my_range: ostd::input_range<my_range> {
        using range_category  = ostd::input_range_tag;
        using value_type      = T;
        using reference       = T &;
        using size_type       = size_t;
        using difference_type = ptrdiff_t;

        my_range(my_range const &);
        my_range &operator=(my_range const &);

        bool empty() const;
        void pop_front();
        reference front() const;

        // optional methods with fallbacks
        void pop_front_n(size_type n);
    };

This is what any input range is required to contain. An input range is the simplest readable range type. The main thing to consider with simple input ranges is that if you make a copy of the range, the copy is not required to be independent of the range it was copied from. Therefore, a simple input range cannot be used in elaborate multi-pass algorithms, but it's useful for things like ranges over I/O streams, where the current state is determined by the backing stream for all ranges that point to it, and thus all ranges change when the stream or any of the ranges do.
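
To make the shared-state behavior concrete, here is a standalone sketch of a stream-backed range. It deliberately does not derive from ostd::input_range so that it compiles without the library; the names `char_stream_range` and `front_after_popping_copy` are assumptions made for this example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Standalone sketch: every copy holds a pointer to the same stream,
// so all copies share one read position.
struct char_stream_range {
    using value_type      = char;
    using reference       = char; // returned by value, nothing to refer to
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    explicit char_stream_range(std::istream &s): p_stream(&s) {}

    bool empty() const {
        return p_stream->peek() == std::istream::traits_type::eof();
    }
    void pop_front() { p_stream->get(); }
    reference front() const { return char(p_stream->peek()); }

private:
    std::istream *p_stream;
};

// Pops from a copy and then reads from the original range.
char front_after_popping_copy(std::istream &s) {
    char_stream_range a{s};
    char_stream_range b = a; // the copy shares the stream with `a`
    b.pop_front();           // consumes a character from the shared stream
    return a.front();        // `a` has moved on as well
}
```

With a `std::istringstream` holding `"xyz"`, the function returns `'y'`: popping the copy advanced the original too, which is exactly why such a range is unsuitable for multi-pass algorithms.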

Ranges typically don't store the memory they represent a range over. Instead, they store a reference to some backing object, and therefore their lifetime depends on the backing object. This does not have to be the case, though; there can also be range types that are completely independent, for example ostd::number_range. But typically it is the case.
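
For contrast, a range can carry all of its state itself. The following is a hypothetical stand-in written for this example (it is not the implementation of ostd::number_range, and it does not derive from ostd::input_range so that it stays self-contained).

```cpp
#include <cstddef>

// Hypothetical self-contained range: it stores both bounds itself,
// so it does not depend on the lifetime of any backing object.
struct counting_range {
    using value_type      = int;
    using reference       = int; // values are produced on the fly
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    counting_range(int from, int to): p_cur(from), p_end(to) {}

    bool empty() const { return p_cur >= p_end; }
    void pop_front() { ++p_cur; }
    reference front() const { return p_cur; }

private:
    int p_cur, p_end;
};

// Sums the half-open interval [from, to).
int sum_interval(int from, int to) {
    int sum = 0;
    for (counting_range r{from, to}; !r.empty(); r.pop_front()) {
        sum += r.front();
    }
    return sum;
}
```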

But let's take a look at the structure first.

    struct my_range: ostd::input_range<my_range>

Any input range (forward, bidirectional etc too!) is required to derive from ostd::input_range like this. The type provides various convenience methods as well as fallbacks for optional methods. Please refer to its documentation for more information. Keep in mind that none of the provided methods are virtual, so it's not safe to call them while expecting the overridden variants to be called.
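
The caveat about non-virtual methods can be sketched with a small CRTP base. `range_base` below is a hypothetical stand-in for the role ostd::input_range plays, not its actual definition.

```cpp
#include <cstddef>

// Hypothetical CRTP base standing in for ostd::input_range.
template<typename R>
struct range_base {
    // Fallback: reaches the derived pop_front() through a static cast,
    // so no virtual dispatch is involved.
    void pop_front_n(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            static_cast<R *>(this)->pop_front();
        }
    }
};

struct skipping_range: range_base<skipping_range> {
    explicit skipping_range(int start): pos(start) {}

    void pop_front() { ++pos; }
    // Hides (does not override) the fallback in the base.
    void pop_front_n(std::size_t n) {
        pos += int(n);
        ++fast_calls;
    }

    int pos;
    int fast_calls = 0;
};

// Calls pop_front_n through the derived type, then through the base.
int count_fast_calls() {
    skipping_range r{0};
    r.pop_front_n(2);    // the derived version is chosen statically
    range_base<skipping_range> &base = r;
    base.pop_front_n(2); // the base fallback runs instead
    return r.fast_calls;
}
```

Here `count_fast_calls()` returns 1: the call made through the base reference never reaches the derived `pop_front_n`, although the range still ends up advanced by four elements in total.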

    using range_category  = ostd::input_range_tag;
    using value_type      = T;
    using reference       = T &;
    using size_type       = size_t;
    using difference_type = ptrdiff_t;

Any input range is required to have a series of types that define its traits.

The range_category alias defines the capabilities of the range. The possible values are ostd::input_range_tag, ostd::forward_range_tag, ostd::bidirectional_range_tag, ostd::random_access_range_tag, ostd::finite_random_access_range_tag and ostd::contiguous_range_tag.

The value_type alias defines the type the values have in the sequence. It's not used by plain input ranges, but it can still be used by algorithms and it's used in more elaborate range types.

The reference alias defines a reference type for value_type. It's what's returned from the front() method, among other places. Keep in mind that it does not necessarily have to be a T &. It can actually be a value type, and ostd::move_range, for example, represents it as an rvalue reference. So it really is just the type that is returned by accesses to the range, as accesses are typically not meant to copy the contents (but they totally can).
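
As a sketch of a reference alias that is not a T &, the following standalone range hands out rvalue references. It only illustrates the idea; it is not how ostd::move_range is written, and the names `moving_range` and `take_front` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical range whose accesses yield rvalue references.
struct moving_range {
    using value_type      = std::string;
    using reference       = std::string &&;
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    moving_range(std::string *b, std::string *e): p_beg(b), p_end(e) {}

    bool empty() const { return p_beg == p_end; }
    void pop_front() { ++p_beg; }
    reference front() const { return std::move(*p_beg); }

private:
    std::string *p_beg, *p_end;
};

// Move-constructs the result from the front element of the range.
std::string take_front(moving_range r) {
    return r.front();
}
```

After `take_front`, the source element is left in a valid but unspecified state, so a caller should not rely on its old contents.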

The size_type alias represents the type typically used for sizes of the range the object represents. It's typically size_t. Similarly, the difference_type alias is used for a distance within the range. Usually it's ptrdiff_t, but for example for I/O stream range types it can be the stream's offset type.

Now let's take a look at the methods.

    my_range(my_range const &);
    my_range &operator=(my_range const &);

Any range type is required to be CopyConstructible and CopyAssignable. As previously mentioned, this does not have to mean that the internal states will be independent. With input ranges, all copies can point to the same state. From forward ranges onwards, independence of state is guaranteed though.
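
The independence guarantee is what makes multi-pass algorithms possible. Below is a minimal sketch with a pointer-pair range written for this example (standalone, not derived from ostd::input_range); `int_array_range` and `has_adjacent_equal` are hypothetical names.

```cpp
#include <cstddef>

// Forward-style range over an array: the whole state is two pointers,
// so a copy is an independent position over the same data.
struct int_array_range {
    using value_type      = int;
    using reference       = int &;
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    int_array_range(int *b, int *e): p_beg(b), p_end(e) {}

    bool empty() const { return p_beg == p_end; }
    void pop_front() { ++p_beg; }
    reference front() const { return *p_beg; }

private:
    int *p_beg, *p_end;
};

// Walks the range with two positions at once; this only works because
// advancing `next` does not advance `r`.
bool has_adjacent_equal(int_array_range r) {
    if (r.empty()) {
        return false;
    }
    int_array_range next = r; // independent copy
    next.pop_front();
    for (; !next.empty(); r.pop_front(), next.pop_front()) {
        if (r.front() == next.front()) {
            return true;
        }
    }
    return false;
}
```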

    bool empty() const;

This method checks whether the range is empty, i.e. has no elements left in it. If this returns false, it means front() is safe to use. Safe code should always check; the behavior is undefined if an item is retrieved from an empty range.

    void pop_front();

This turns a range representing { a, b, c, ... } into { b, c, ... }, i.e. removes the first item from the range. The item typically still remains in the backing object for the range, but as an input range potentially modifies the internal state of its backing memory, that doesn't have to be the case. It's always the case for forward ranges onwards though.

Calling pop_front() is undefined when the range is empty. It could throw an exception but it could also be completely unchecked and cause an invalid memory access.

    reference front() const;

And finally, this retrieves the front item of the range, typically a reference to it, but it could be anything depending on the definition of the reference alias. Calling this is undefined when the range is empty(). It could be an invalid memory access, it could throw an exception, it could blow up your house and kill your cat. Safe algorithms always need to check for emptiness first.
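
One way to follow the "check first" rule is to wrap the access in a helper. The sketch below is an assumption made for illustration; `checked_front` is not an OctaSTD function, and `single_int_range` exists only to exercise it.

```cpp
#include <stdexcept>

// Hypothetical helper: turns the undefined behavior of front() on an
// empty range into a well-defined exception.
template<typename R>
typename R::reference checked_front(R const &r) {
    if (r.empty()) {
        throw std::out_of_range{"front() called on an empty range"};
    }
    return r.front();
}

// Tiny range holding exactly one value, used to exercise the helper.
struct single_int_range {
    using reference = int;

    explicit single_int_range(int v): p_val(v) {}

    bool empty() const { return p_done; }
    void pop_front() { p_done = true; }
    reference front() const { return p_val; }

private:
    int p_val;
    bool p_done = false;
};

// Returns true when the helper rejects an exhausted range.
bool throws_when_empty() {
    single_int_range r{7};
    r.pop_front();
    try {
        checked_front(r);
    } catch (std::out_of_range const &) {
        return true;
    }
    return false;
}
```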

    void pop_front_n(size_type n);

There is one more, which is optional. This pops n values from the range. It has a default implementation in ostd::input_range which merely calls the pop_front() method n times. Custom range types are allowed to override this with their own more efficient implementations.

Output ranges

Output ranges are a different kind of beast compared to input ranges. I could cover them at the end but I'll do it now instead. This is the structure of an output range:

    #include <ostd/range.hh>

    struct my_range: ostd::output_range<my_range> {
        using value_type      = T;
        using reference       = T &;
        using size_type       = size_t;
        using difference_type = ptrdiff_t;

        my_range(my_range const &);
        my_range &operator=(my_range const &);

        void put(value_type const &v);
        void put(value_type &&v);
    };

    // optional
    template<typename IR>
    void range_put_all(my_range &output, IR input);

As you can see, they're much simpler than input ranges.

    using value_type      = T;
    using reference       = T &;
    using size_type       = size_t;
    using difference_type = ptrdiff_t;

Why is there no range_category here? Well, it's already defined by the ostd::output_range it derives (and has to derive) from. We already know that it will always be ostd::output_range_tag, so we might as well avoid specifying it every time.

Output ranges are always copyable, just like input ranges. There are no rules on state preservation.

    void put(value_type const &v);
    void put(value_type &&v);

This will insert a value into the output range. Typically, it will trigger some writing into a backing container. There are no guarantees on how the position will be affected in other ranges that refer to the same backing object. Most frequently, all output ranges pointing to a container of some sort will just do some kind of append.
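
A minimal sketch of the append-style behavior, kept standalone (it does not derive from ostd::output_range, so it compiles without the library). `vector_sink` and `put_through_copies` are names invented for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical output range that appends to a std::vector it does not own.
template<typename T>
struct vector_sink {
    using value_type      = T;
    using reference       = T &;
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    explicit vector_sink(std::vector<T> &v): p_vec(&v) {}

    void put(value_type const &v) { p_vec->push_back(v); }
    void put(value_type &&v) { p_vec->push_back(std::move(v)); }

private:
    std::vector<T> *p_vec;
};

// Writes through two copies of the same sink and returns the result.
std::vector<int> put_through_copies() {
    std::vector<int> out;
    vector_sink<int> a{out};
    vector_sink<int> b = a; // the copy points at the same container
    a.put(1);
    b.put(2);
    return out;
}
```

Both copies append to the same vector, so the result holds 1 followed by 2.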

    template<typename IR>
    void range_put_all(my_range &output, IR input);

This optional function has a default implementation in the library, which simply goes over the input and calls something like output.put(input.front()) for each item of input. You're free to overload this (library calls to it perform argument-dependent lookup) with a more efficient implementation if you wish.
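
The lookup mechanism can be sketched without the library. In the example below, the `lib` namespace is a hypothetical stand-in for the library side (its `write_all` entry point is invented), and `user` holds a custom overload; the sketch demonstrates only how the overload gets picked, not a real efficiency gain.

```cpp
#include <vector>

namespace lib { // hypothetical stand-in for the library side
    // Generic fallback: one put() per element of the input.
    template<typename OR, typename IR>
    void range_put_all(OR &output, IR input) {
        for (; !input.empty(); input.pop_front()) {
            output.put(input.front());
        }
    }

    // The unqualified call lets argument-dependent lookup find
    // overloads declared next to the output range type.
    template<typename OR, typename IR>
    void write_all(OR &output, IR input) {
        range_put_all(output, input);
    }
}

namespace user {
    struct int_range {
        int_range(int const *b, int const *e): p_beg(b), p_end(e) {}
        bool empty() const { return p_beg == p_end; }
        void pop_front() { ++p_beg; }
        int const &front() const { return *p_beg; }
    private:
        int const *p_beg, *p_end;
    };

    struct int_sink {
        explicit int_sink(std::vector<int> &v): vec(&v) {}
        void put(int const &v) { vec->push_back(v); }
        std::vector<int> *vec;
        int custom_calls = 0;
    };

    // More specialized overload, found via ADL on int_sink.
    template<typename IR>
    void range_put_all(int_sink &output, IR input) {
        ++output.custom_calls;
        for (; !input.empty(); input.pop_front()) {
            output.vec->push_back(input.front());
        }
    }
}
```

Calling `lib::write_all` with a `user::int_sink` selects the overload in `user`, because it is more specialized than the generic fallback.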

Additionally, any input range is allowed to implement the output range interface. If the library ever does any checks for whether the given range is an output range, it will do a check based on the capabilities of the range rather than just its category. Input ranges that implement the output range interface are called mutable input ranges. Most frequently, their r.put(v); will be the same as r.front() = v; r.pop_front(); but that is not guaranteed.

Additionally, put(v) is always well defined. When it fails (for example when there is no more space left in the container), an exception will be thrown. The type of exception that is thrown depends on the particular range and the container backing it. When the container is unbounded, it might also never throw. Either way, the range type is required to properly specify its behavior. Throwing a custom exception type is a good thing because it lets algorithms put(v) into ranges without checking, and if an error happens and the exception propagates, the user can handle it simply as if it were an exception thrown from the container. This makes output ranges easy to work with; in many cases it would otherwise be very difficult to handle the errors. It also makes it easy to never have to handle any errors, simply by using output ranges backed by unbounded containers, for example ostd::appender_range.
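
Here is a sketch of that error model, using a mutable input range over a fixed buffer. Everything in it is an assumption made for illustration: `range_full_error`, `buffer_range`, `put_repeated` and `fill_reports_overflow` are hypothetical names, and the range is standalone rather than derived from the OctaSTD base types.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical custom exception type for this sketch.
struct range_full_error: std::runtime_error {
    range_full_error(): std::runtime_error{"no space left in range"} {}
};

// Mutable input range over a fixed buffer: put() overwrites the front
// element and advances, and throws once the buffer is exhausted.
struct buffer_range {
    using value_type      = int;
    using reference       = int &;
    using size_type       = std::size_t;
    using difference_type = std::ptrdiff_t;

    buffer_range(int *b, int *e): p_beg(b), p_end(e) {}

    bool empty() const { return p_beg == p_end; }
    void pop_front() { ++p_beg; }
    reference front() const { return *p_beg; }

    void put(value_type const &v) {
        if (empty()) {
            throw range_full_error{};
        }
        front() = v;
        pop_front();
    }

private:
    int *p_beg, *p_end;
};

// An algorithm that writes without checking for space.
template<typename OR>
void put_repeated(OR &out, int value, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out.put(value);
    }
}

// Returns true when writing n values overflowed a buffer of cap slots.
bool fill_reports_overflow(int *buf, std::size_t cap, std::size_t n) {
    buffer_range r{buf, buf + cap};
    try {
        put_repeated(r, 9, n);
    } catch (range_full_error const &) {
        return true;
    }
    return false;
}
```

The algorithm itself contains no error handling; the caller decides whether to catch the exception, and a range backed by an unbounded container would never trigger it.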