I’ve spent a few months mostly working on an enormous branch, so Subspace hasn’t seen a lot of
changes on main
. Today I merged a lot of stuff, which makes it a good time to mention some of the
new things.
Slices and Vec
Micro post: https://sunny.garden/@blinkygal/110289198753682615
Today in Subspace I finished and merged all the methods for Slice and SliceMut.
Slice<T>
is a const reference to a contiguous range of const T
(like a std::span<const T>
),
while SliceMut<T>
is a mutable reference to a contiguous range of T
(like a std::span<T>
).
Why are these different types? First, this allows the default, shorter wording to be the safer and
preferred one: const is default. Secondly, it’s not possible to sort()
on a Slice, but if they
were one type, it would require all methods that require mutable access to the pointed-to range to
be qualified with requires (!std::is_const_v<T>)
. The API gets harder to understand and to use as
a result. This way, the SliceMut type has methods that allow mutation, and Slice simply does not.
There are 95 methods in all! I started working on these two months ago on March 3. The Pull Request is here: https://github.com/chromium/subspace/pull/218/.
I’m kinda relieved to finally get through them all. There’s lots of useful stuff in there though, including:
- searching
- sorting
- splitting
- sub-slicing with ranges (including range literals)
- iterating (forward and reverse) on splits, chunks, and sliding windows
- joining
- filling or repeating
- working with prefixes and suffixes
- swapping
- reordering
These methods all appear on Vec
as well, which is an owning Slice. In addition, Vec
is usable in all places that a Slice
or
SliceMut
(if the Vec
is not const) would be usable (except as an rvalue, which would allow
references to escape from a temporary object).
And similarly SliceMut
is usuable anywhere a Slice
would be.
I am sure There’s lots of room for performance tuning, and some TODOs left in this regard. As always everything comes with unit tests, so making future changes can be done with confidence.
Range literals
In the C++ standard library, the std::span
type allows subslicing through the subspan(offset)
and subspan(offset, count)
overloads (implemented as a default argument). Then,
s.subspan(offset)
in C++ is equivalent to &s[offset..]
in Rust and s.subspan(offset, count)
in
C++ is equivalent to &s[offset..(offset + count)]
in Rust.
But Rust is more expressive here. Its RangeBounds can describe any possible range
with or without a front edge or back edge. ..
is everything, 2..
starts from 2 and goes to the
end, ..5
starts at the beginning and stops at 5 (excluding 5), and 2..5
starts at 2 and ends at
5 (excluding 5). It even allows 2..=5
to include 5 in the bounds.
Instead of providing a bunch of overloads for this type of expressivity, Subspace slices (and Vec)
can be subsliced using the operator[](sus::ops::RangeBounds<usize> auto)
operator, which aborts if
given a range out of bounds, or get_range(sus::ops::RangeBounds<usize> auto)
and
get_range_mut(sus::ops::RangeBounds<usize> auto)
which return a sus::Option
holding nothing
if out of bounds or a Slice
(or SliceMut
respectively) otherwise.
The RangeBounds
concept is satisfied by a set of types provided in sus::ops
that specify
a start, an end, both, or neither:
sus::ops::Range
has a start and end (exclusive).sus::ops::RangeFrom
has a start, with an unbounded end.sus::ops::RangeTo
has an end, with an unbounded start.sus::ops::RangeFull
represented an unbounded start and end.
For dynamic values, these can be constructed from numeric values as they are aggegrate types. For constant values, a literal syntax is provided. C++ doesn’t let us have all the nice things, so we can’t use the exact literal syntax from Rust, as C++ requires non-quoted literals to parse as numbers even if you’re going to do the parsing yourself. But we get this:
"3..7"_r
is aRange<usize>
from 3 to 7 (exclusive)."3..=7"_r
is aRange<usize>
from 3 to 7 (inclusive)."3.."_r
is aRangeFrom<usize>
that starts at 3."..7"_r
is aRangeTo<usize>
that ends at 7 (exclusive).".."_r
is aRangeFull<usize>
that represents an unbounded range.
Range types can be modified to produce new ranges with the .start_at()
and .end_at()
methods.
These change, or add, a start or end bound, changing the type of the object when a new bound is
added. For instance ::sus::ops::RangeFull().start_at(3)
makes a ::sus::ops::RangeFrom(3)
.
Mixing literal values with dynamic values is possible as well with these methods, such as a range
that starts at 5, but ends at x
: "5.."_r.end_at(x)
.
Certainly this is nowhere as nice as Rust’s 5..x
syntax. Maybe C++ can give us that one day. But I
want to provide the tools to write literal ranges when code authors would benefit from doing so, and
the range type aggregate constructors are always available too.
Bringing this back to slices, a Vec can produce a slice for any range using its operator[]
,
get_range()
or get_range_mut()
:
auto vec = sus::Vec<i32>::with_values(1, 2, 3, 4, 5);
auto s1 = vec[".."_r]; // A SliceMut over [1, 2, 3, 4, 5].
auto s2 = vec["3.."_r]; // A SliceMut over [4, 5].
auto s3 = vec["..3"_r]; // A SliceMut over [1, 2, 3].
auto s4 = vec["2..3"_r]; // A SliceMut over [3].
Lastly, for signed ranges, there’s also a _rs
literal suffix that produces ranges over isize
instead of usize
, such as "3..12"_rs
to make sus::ops::Range<isize>(3, 12)
. But slices always
work with ranges over usize
.
Iterators
While I was slogging through the slice methods, I also added some exciting iterator features.
Generator functions
Micro post: https://sunny.garden/@blinkygal/110044818048764779
It’s now possible to write an Iterator that will compose nicely with all the existing Iterator methods, but as a single function, instead of having to write a whole type. This is thanks to C++ coroutines, as they allow writing a generator function to modify iteration.
This provides a means to write “control flow” into an iteration, in the spirit of https://without.boats/blog/the-registers-of-rust/.
Here’s the generator unit tests to demonstrate what I mean.
Here the x()
function produces an iterator over the set 1, 2, 3, 4
. This is composed with the
filter()
method that keeps only things in the range (1, 4)
(exclusive). The result is 2, 3
.
TEST(IterGenerator, ComposeFromGenerator) {
auto x = []() -> Generator<i32> {
co_yield 1;
co_yield 2;
co_yield 3;
co_yield 4;
};
// Generator is trivially relocatable, so no need to box() it.
sus::iter::Iterator<i32> auto it =
x().filter([](const i32& a) { return a > 1 && a < 4; });
EXPECT_EQ(it.next().unwrap(), 2);
EXPECT_EQ(it.next().unwrap(), 3);
EXPECT_EQ(it.next(), sus::None);
}
Here the x()
function composes with an input iterator, and it does the same as the filter()
call above, discarding anything outside the range (1, 4)
(exclusive). It is chained with an
iterator from a Vec<i32>
that contains 1, 2, 3, 4
and again the result is 2, 3
.
TEST(IterGenerator, ComposeIntoGenerator) {
auto x = [](sus::iter::Iterator<i32> auto it) -> Generator<i32> {
for (i32 i : it) {
if (i > 1 && i < 4) co_yield i;
}
};
sus::iter::Iterator<i32> auto it =
sus::vec(1, 2, 3, 4).construct<i32>().into_iter().generate(x);
EXPECT_EQ(it.next().unwrap(), 2);
EXPECT_EQ(it.next().unwrap(), 3);
EXPECT_EQ(it.next(), sus::None);
}
Enumerate
Micro post: https://sunny.garden/@blinkygal/110223458907555381
I also have added Iterator::enumerate
. Adding the sus::iter::Enumerate
type was fairly trivial,
but in order to iterate backwards as a sus::iter::DoubleEndedIterator
, it needs to know the total
length of the iteration sequence. That is, the input iterator needs to also be a
sus::iter::ExactSizeIterator
. While these concepts were already present in the library, I had not
plumbed them through composing iterator types. So that is now done.
This means, for example, calling filter()
or map()
on a DoubleEndedIterator
will produce
another DoubleEndedIterator
. And when possible, such as when calling rev()
on an
ExactSizeIterator
, the resulting reversed iterator will also be an ExactSizeIterator
.
And in the case of enumerate()
, if the input iterator was a DoubleEndedIerator
and
ExactSizeIterator
, the output iterator will also be.
Notably, iterators over a Vec
, Array
or slice are double-ended and have an exact known size and
thus satisfy these traits. The same will be true for most
Here’s an example where we have an iterator over 1, 2, 3, 4, 5
. It is double-ended and its exact
size is known. The iterator is reversed with rev()
, so it will output 5, 4, 3, 2, 1
. Then it
is composed with enumerate()
which means each step includes the position in the iteration
sequence.
auto chars = Vec<char>::with_values('a', 'b', 'c', 'd', 'e');
{
auto it = sus::move(chars).into_iter().rev().enumerate();
// enumerate() makes an iterator over a Tuple of position and value.
using E = sus::Tuple<usize, char>;
static_assert(sus::iter::Iterator<decltype(it), E>);
// The output iterator of enumerate() is a DoubleSidedIterator and
// ExactSizeIterator.
static_assert(sus::iter::DoubleEndedIterator<decltype(it), E>);
static_assert(sus::iter::ExactSizeIterator<decltype(it), E>);
// The output the position paired with the reversed values.
for (auto [pos, val]: it) {
// Prints:
// (0, e)
// (1, d)
// (2, c)
// (3, b)
// (4, a)
std::cerr << "(" << size_t{pos} << ", " << val << ")\n";
}
}
Side note: In this example, the usize
position is casted to a primitive to print, as I did not yet
decide on a path for integrating with streams or
to otherwise print things.
Stdlib Ranges
Subspace’s first major compatability hook with the C++ standard library was added, with conversions
from sus::iter::Iterator
types to C++ ranges.
Calling .range()
on any iterator will produce a std::ranges::range
output, which satisfies the
std::ranges::input_range
concept.
Here are some unit tests that demonstrate the behaviour, showing that the output from .range()
is
usable as a std::ranges::viewable_range
and a std::ranges_input_range
. In these examples, the
iterator owns the values as vec
is consumed to produce the iterator through into_iter()
.
TEST(CompatRanges, ViewableRange) {
sus::Vec<i32> vec = sus::vec(1, 2, 3, 4, 5, 6);
// all() requires a `std::ranges::viewable_range`.
auto view = std::ranges::views::all(sus::move(vec).into_iter().range());
i32 e = 1;
for (i32 i : view) {
EXPECT_EQ(e, i);
e += 1;
}
EXPECT_EQ(e, 7);
}
TEST(CompatRanges, InputRange) {
sus::Vec<i32> vec = sus::vec(1, 2, 3, 4, 5, 6);
// filter() requires a `std::ranges::input_range`.
auto filter = std::ranges::views::filter(sus::move(vec).into_iter().range(),
[](const i32& i) { return i > 3; });
i32 e = 4;
for (i32 i : filter) {
EXPECT_EQ(e, i);
e += 1;
}
EXPECT_EQ(e, 7);
}
Closures/Callable concepts
Fn
, FnMut
and FnOnce
are now concepts, not concrete types, just as the same are traits in
Rust.
Most notably they manage to take a function signature syntax in their template arguments while
still producing useful errors when given a non-matching type. This was rather non-trivial as C++
concepts do not allow partial specialization, and referring to structs in a concept stops good error
propogation.
Also the template arguments allow some simple pattern matching.
sus::fn::Fn<sus::fn::Anything(i32)> auto
will match any const-callable thing that takes an i32
,
regardless of its return type. And sus:fn::Fn<sus::fn::NonVoid(i32)> auto
will match any
const-callable thing that takes an i32
input and produces some non-void output.
I spent a week on this rewrite, which also
introduced lightwight FnRef
, FnMutRef
and FnOnceRef
that are concrete types satifying the
above concepts and which refer to a function pointer or callable object like a lambda without
owning any state. These types are super useful to receive a callable that will be used during a
function but not stored beyond the lifetime of the function. And they don’t require your function
to become a template, which will help keep down compile times.
Most Subspace methods that receive a callable now use these “Ref” implementation types instead of
the concepts. Though limitations of type deduction still mean we need to use concept types in some
cases, where the callable’s type signature is templated. For example with Option::map()
where the
return type of the callable determines the output of the map()
function, using FnRef<R(T&&)>
means the compiler can not deduce the R
type:
note: 'Option<R> Option<int>::map(sus::fn::FnRef<R(T &&),>) noexcept &&':
could not deduce template argument for 'sus::fn::FnRef<R(T &&),>'
The result is some pretty unweildly C++ concept bounds, and there’s a ton of space for C++ to improve here in the future:
template <::sus::fn::FnOnce<::sus::fn::NonVoid(T&&)> MapFn, int&...,
class R = std::invoke_result_t<MapFn&&, T&&>>
constexpr Option<R> map(MapFn&& m) && noexcept {...}
The “Ref” types also can’t be used in a constexpr context (they rely on reinterpret_cast), but the concepts can be.
The old stateful, heap-allocated capturing types have been renamed to FnBox
, FnMutBox
, and
FnOnceBox
and they still exist for storing a callable past the lifetime of a function. For
instance, Iterator::map()
stores its callable as a FnMutBox
on its output iterator. These
types reason for being is to try provide guardrails for binding values instead of pointers, but
these APIs definitely need some love and/or reworking.
Hopefully I will write a longer post about how the implementation of these concepts works, and the evolution of all these types.