Welcome back again, dear reader. You may remember that so far we've gotten our editor to read, display, and scroll around text. Unfortunately we're just short of having an actual editor, due to our lack of, umm, editability. Let's have a think about what it would mean to edit text. To edit text, we need some text representation that we can arbitrarily insert text into and remove text from. The simplest approach would be a growable string. In Zig, that's std.ArrayList(u8): a heap-allocated, resizable list of u8 bytes.
It would be very simple and easy to work with a plain string, but there are certain circumstances where premature optimisation is appropriate; and that is when it is not premature but instead premeditated. See, we already know a string is going to be a performance bottleneck: it has a poor worst-case insertion performance of O(n), where n is the number of bytes. That's because if we insert a byte at an index, every following byte has to be copied one position along the string. Given that we'll likely be using our editor to make lots of insertions into lots of files (big and small), we must think of a better way.
We could instead break our string into fixed-size chunks, and then insert into individual chunks. And this is a great optimisation; it'll improve the worst-case insertion performance to, ..., hold on, let me check my notes... Ah, to O(n)! Wait, that's the same as a plain string, only now n is the number of chunks. To be fair, this approach is going to be hundreds of times faster than a plain string, but it's still not the best approach.
The best reasonable approach, come up with by people much smarter than me, is a rope. A rope is a bit like a chunked string, but the chunks are stored within a tree instead of an array. Thanks to this, our insertion time will be O(log(n)), which performs much better than the other options.
So in this post, we're going to build a rope, then we'll compare the roles of visitors and iterators in consuming these ropes. Rope mutations will be covered next time. It's going to be meaty; we'll discuss allocations, branchless programming, data structures, and other topics that can only be taken advantage of by low-level systems-programming languages like Zig.
Chunks
So as mentioned earlier, our chunks are going to be a fixed-size array of bytes. For us, this array is fixed size in memory, but we want to be able to modify its length like a normal string. We'll create a type for this called an ArrayVec, which we'll end up using for much more than just the chunks.
const std = @import("std");

pub const Error = error{OutOfRange};

pub fn ArrayVec(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Private = struct { buffer: [capacity]T = undefined };

        len: usize = 0,
        private: Private = .{},

        const Self = @This();

        pub fn isFull(self: *const Self) bool {
            return self.len == capacity;
        }

        pub fn extend(self: *Self, items: []const T) Error!void {
            if (self.len + items.len > capacity) return Error.OutOfRange;
            @memcpy(
                self.private.buffer[self.len .. self.len + items.len],
                items,
            );
            self.len += items.len;
        }

        pub fn push(self: *Self, item: T) Error!void {
            if (self.isFull()) return Error.OutOfRange;
            self.private.buffer[self.len] = item;
            self.len += 1;
        }

        pub fn insert(self: *Self, index: usize, item: T) Error!void {
            if (self.isFull()) return Error.OutOfRange;
            // The source and destination overlap, so copy from the back;
            // `@memcpy` requires non-overlapping slices.
            std.mem.copyBackwards(
                T,
                self.private.buffer[index + 1 .. self.len + 1],
                self.private.buffer[index..self.len],
            );
            self.private.buffer[index] = item;
            self.len += 1;
        }

        pub fn pop(self: *Self) ?T {
            if (self.len == 0) return null;
            self.len -= 1;
            return self.private.buffer[self.len];
        }

        pub fn lastMut(self: *Self) ?*T {
            if (self.len == 0) return null;
            return &self.private.buffer[self.len - 1];
        }

        pub fn slice(self: *const Self) []const T {
            return self.private.buffer[0..self.len];
        }

        pub fn sliceMut(self: *Self) []T {
            return self.private.buffer[0..self.len];
        }
    };
}
Simple enough right? It's basically a normal array with normal array operations, but at a fixed capacity rather than one that can grow infinitely. A lot of these operations are also cheap because all the memory is already allocated.
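If you want to convince yourself of the capacity behaviour, a quick test like this does it (a sketch; it assumes the ArrayVec type and its Error set from above are in scope):

```zig
test "ArrayVec stays within its capacity" {
    var vec: ArrayVec(u8, 4) = .{};
    try vec.extend("ab");
    try vec.push('c');
    try std.testing.expectEqualStrings("abc", vec.slice());
    // Insert shifts the existing items along by one.
    try vec.insert(0, 'x');
    try std.testing.expectEqualStrings("xabc", vec.slice());
    // The capacity is 4, so a fifth item is rejected.
    try std.testing.expectError(Error.OutOfRange, vec.push('y'));
    // Pop returns the last item, 'c'.
    try std.testing.expectEqual('c', vec.pop());
}
```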
We can start off our chunk as something like this.
pub const MAX_BASE = 128;

text: ArrayVec(u8, MAX_BASE),

pub fn init(text: []const u8) arrayVec.Error!Chunk {
    var self: Chunk = .{ .text = .{} };
    try self.text.extend(text);
    return self;
}
Summaries
Simples, right? Enjoy it while it lasts, because things will get chunky. Let's think towards the future. Once we have our rope set up, we want to be able to do things like jump to newlines or handle unicode codepoints. To support this, we want our chunk to provide a summary of this information so that we can efficiently move around the rope. Let's focus on summarising the newlines and the length of our chunk.
We'll keep track of the newlines of the text, and then return a summary that can be used by our tree. Let's have a look at this naive implementation, where we count the newlines in an array.
pub fn summary(self: *const Chunk) Summary {
    return .{ .len = self.len(), .lines = self.lines() };
}

pub fn len(self: *const Chunk) usize {
    return self.text.len;
}

pub fn lines(self: *const Chunk) Point {
    var row: usize = 0;
    var column: usize = 0;
    for (self.text.slice()) |byte| {
        if (byte == '\n') {
            row += 1;
            column = 0;
        } else {
            column += 1;
        }
    }
    return .{ .row = row, .column = column };
}
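As a sanity check, here's roughly what we'd expect the naive version to report (a sketch; it assumes the Chunk file above compiles on its own):

```zig
test "naive lines" {
    const chunk = try Chunk.init("Hi\nthere");
    const point = chunk.lines();
    // One newline, and five bytes ("there") after it.
    try std.testing.expectEqual(1, point.row);
    try std.testing.expectEqual(5, point.column);
}
```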
Here is the point where I'd like us to think about optimisations. When we navigate and edit a rope, we're going to be querying the lines in a chunk a lot; it's a hot path. We're going to start by memoising the newline positions as a list of true/false values. This isn't where any performance improvement happens yet, but it's the first step in building something performant. Focus on the init and lines methods here.
text: ArrayVec(u8, MAX_BASE),
newlines: ArrayVec(bool, MAX_BASE),

pub fn init(text: []const u8) arrayVec.Error!Chunk {
    // Store a list of booleans, where `true` at an index
    // matches a newline in `text`.
    var newlines: [MAX_BASE]bool = undefined;
    for (0.., text) |i, byte| {
        newlines[i] = byte == '\n';
    }
    var self: Chunk = .{
        .text = .{},
        .newlines = .{},
    };
    try self.text.extend(text);
    try self.newlines.extend(newlines[0..text.len]);
    return self;
}
pub fn summary(self: *const Chunk) Summary {
    return .{ .len = self.len(), .lines = self.lines() };
}

pub fn len(self: *const Chunk) usize {
    return self.text.len;
}

pub fn lines(self: *const Chunk) Point {
    const newlines = self.newlines.slice();
    // The row is the count of set newlines
    var row: usize = 0;
    for (newlines) |newline| {
        row += if (newline) 1 else 0;
    }
    // The column is the number of non-newlines after
    // the final newline
    var column: usize = 0;
    for (0..newlines.len) |i| {
        const index = newlines.len - i - 1;
        if (newlines[index]) break;
        column += 1;
    }
    return .{ .row = row, .column = column };
}
pub const Point = struct {
    row: usize,
    column: usize,

    pub fn zero() Point {
        return .{ .row = 0, .column = 0 };
    }

    pub fn add(self: *const Point, other: *const Point) Point {
        // Summaries combine left-to-right: the right-hand column only
        // extends ours if the right-hand text starts no new lines.
        return .{
            .row = self.row + other.row,
            .column = if (other.row == 0) self.column + other.column else other.column,
        };
    }

    pub fn sub(self: *const Point, other: *const Point) Point {
        return .{
            .row = self.row - other.row,
            .column = if (self.row == other.row) self.column - other.column else self.column,
        };
    }
};

pub const Summary = struct {
    len: usize,
    lines: Point,

    pub fn zero() Summary {
        return .{ .len = 0, .lines = .zero() };
    }

    pub fn add(self: *const Summary, other: *const Summary) Summary {
        return .{
            .len = self.len + other.len,
            .lines = self.lines.add(&other.lines),
        };
    }

    pub fn sub(self: *const Summary, other: *const Summary) Summary {
        return .{
            .len = self.len - other.len,
            .lines = self.lines.sub(&other.lines),
        };
    }
};
Looking at this, you're probably thinking I'm pretty silly. Why store 128 bytes of booleans in a list when we were just iterating over the text? And if you thought that, well, I'd say you're pretty mean. And then I'd introduce you to the concept of the humble bitmap. See, a list of 128 booleans is kinda like a list of 128 ones and zeros; and you know what else is a list of 128 ones and zeros? Yep, a u128! And as we'll see, we can operate much faster on an integer than on a list.
Conceptually, you can view the bitmap as follows.
Text: "Hi\nthere\n"
Bitmap: 0b00_100000_1
Little-endian: 0b100000100
Popcount: 2
Clz (Count leading zeroes): 0
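Before we wire anything into Chunk, we can check that the two intrinsics behave the way the table above suggests. This is a standalone sketch over the same nine-bit bitmap:

```zig
const std = @import("std");

test "popcount and clz on the example bitmap" {
    // The little-endian bitmap for "Hi\nthere\n" (9 bytes used).
    const bitmap: u128 = 0b100000100;
    // Two set bits means two newlines.
    try std.testing.expectEqual(2, @popcount(bitmap));
    // Shift the 9 used bits to the top of the integer; the leading
    // zeroes then count the bytes after the final newline.
    const shifted = bitmap << (128 - 9);
    // The text ends in a newline, so zero bytes follow it.
    try std.testing.expectEqual(0, @clz(shifted));
}
```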
Let's rework our init method a little bit now.
const Bitmap = u128;
const MAX_BASE = @bitSizeOf(Bitmap);

text: ArrayVec(u8, MAX_BASE),
newlines: Bitmap,

pub fn init(text: []const u8) arrayVec.Error!Chunk {
    // Create a list of `u8` bytes that maps to a `u128`
    const SUB_CHUNK_SIZE = @bitSizeOf(u8);
    var newlinesBytes = std.mem.zeroes([MAX_BASE / SUB_CHUNK_SIZE]u8);
    var chunkIndex: u8 = 0;
    var bytes = text;
    while (bytes.len > 0) {
        // Iterate through `text`, 8 bytes at a time
        const subChunk = bytes[0..@min(bytes.len, SUB_CHUNK_SIZE)];
        bytes = bytes[@min(bytes.len, SUB_CHUNK_SIZE)..];
        // Set each bit in the sub-bitmap from right to left
        var newlines: u8 = 0;
        for (0.., subChunk) |i, byte| {
            const index: u3 = @intCast(i);
            const newline: u8 = @intFromBool(byte == '\n');
            newlines |= newline << index;
        }
        newlinesBytes[chunkIndex] = newlines;
        chunkIndex += 1;
    }
    var self: Chunk = .{
        .text = .{},
        // Convert the list of `u8` to a `u128` in little-endian.
        // Note our bitmap is the reverse of `text`.
        .newlines = std.mem.readInt(Bitmap, &newlinesBytes, .little),
    };
    try self.text.extend(text);
    return self;
}
Okay, we've shrunk the newlines field a decent amount, but was all of that really worth it? Well, where bitmaps really shine is that they're just a number, which means we can use CPU instructions that are much faster than our manual implementations.
If you have a look back at our old lines method you'll notice we have some for and if blocks, which are branches. Modern CPUs try to predict branches and if they guess wrong, they have to backtrack, which is slow.
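To make "branchless" concrete: the two little functions below both count newlines, but the second trades the data-dependent branch for arithmetic. This is an illustration only, not the rope's code; the real win below comes from operating on the whole integer at once.

```zig
const std = @import("std");

// Counts newlines with a data-dependent branch per byte.
fn countBranchy(text: []const u8) usize {
    var row: usize = 0;
    for (text) |byte| {
        if (byte == '\n') row += 1;
    }
    return row;
}

// Counts newlines with arithmetic instead of a branch.
fn countBranchless(text: []const u8) usize {
    var row: usize = 0;
    for (text) |byte| {
        row += @intFromBool(byte == '\n');
    }
    return row;
}

test "both counts agree" {
    try std.testing.expectEqual(countBranchy("Hi\nthere\n"), countBranchless("Hi\nthere\n"));
    try std.testing.expectEqual(2, countBranchless("Hi\nthere\n"));
}
```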
Integers like u128 have special CPU-level instructions, which we can use to make our method branchless.
pub fn lines(self: *const Chunk) Point {
    // An empty chunk would need a 128-bit shift below, which is out
    // of range for the shift operand, so handle it up front.
    if (self.text.len == 0) return .zero();
    // Slice the bitmap by dropping the unused leading bits.
    const bitshift: std.math.Log2Int(Bitmap) =
        @intCast(MAX_BASE - self.text.len);
    const newlines = self.newlines << bitshift;
    // The row is the count of set newlines
    const row = @popcount(newlines);
    // The column is the number of non-newlines after
    // the final newline.
    // NOTE: If `@popcount` is zero, then `newlines` is zero and
    // `@clz` will need to be clamped to `self.text.len`.
    const column = @min(@clz(newlines), self.text.len);
    return .{ .row = row, .column = column };
}
Now we have a summary for our chunk that can quickly be derived from the chunk's data. You could test these different implementations if you like and compare how well they perform.
test lines {
    const chunk: Chunk = try .init("Hello\nWorld!\n");
    const before = std.time.nanoTimestamp();
    for (0..1_000_000) |_| {
        _ = chunk.lines();
    }
    const after = std.time.nanoTimestamp();
    std.debug.print("Time: {d}ns\n", .{after - before});
}
| Implementation | 0.1kb (ms) | 1kb (ms) | 2kb (ms) | 4kb (ms) |
|---|---|---|---|---|
| Iterating over source | 1399 | 2656 | 5328 | 10480 |
| Branchless bitmap | 107 | 108 | 106 | 112 |
Sum-tree
All of this summary stuff wasn't for nothing. See, we're not just working with your Aunty's boring old B-Tree; we're gonna work with a sum-tree, which is a boring old B-Tree where each node stores a summary of its children. Sum-trees are great for anything with a cursor-like problem (like an editor!). I've followed in the footsteps of Zed, who have a great write-up on sum-tree ropes and why they work well. When we can ask a node "Hey, how many lines do you have?" and it says "40", then we know whether to skip across it, back, or within it. Ours will look a little like this.
pub const TREE_MAX = 16;

pub const Error = error{EmptyInternal};

pub fn SumTree(comptime T: type) type {
    return union(enum) {
        pub const Internal = struct {
            height: usize,
            summary: T.Summary,
            childSummaries: ArrayVec(T.Summary, TREE_MAX),
            childTrees: ArrayVec(*SumTree(T), TREE_MAX),
        };

        pub const Leaf = struct {
            summary: T.Summary,
            itemSummaries: ArrayVec(T.Summary, TREE_MAX),
            items: ArrayVec(T, TREE_MAX),
        };

        internal: Internal,
        leaf: Leaf,

        const Self = @This();

        pub fn init(gpa: std.mem.Allocator) !*Self {
            const self = try gpa.create(Self);
            self.* = .{ .leaf = .{
                .summary = .zero(),
                .itemSummaries = .{},
                .items = .{},
            } };
            return self;
        }

        pub fn deinit(self: *Self, gpa: std.mem.Allocator) void {
            switch (self.*) {
                .internal => |*internal| {
                    for (internal.childTrees.slice()) |child| {
                        child.deinit(gpa);
                    }
                },
                .leaf => {},
            }
            gpa.destroy(self);
        }

        pub fn height(self: *const Self) usize {
            return switch (self.*) {
                .internal => |*internal| internal.height,
                .leaf => 0,
            };
        }

        pub fn summary(self: *const Self) T.Summary {
            return switch (self.*) {
                .internal => |*internal| internal.summary,
                .leaf => |*leaf| leaf.summary,
            };
        }
    };
}
We're going to need a way to populate our tree as well, which we'll do by pushing each chunk as in a normal B-Tree. With each push we try to add the item to the end of the last leaf in the tree, and if there's no space anywhere up the tree we increase its height. As we do so, we'll keep track of the summaries, which we can then use to navigate the tree.
pub fn push(self: *Self, gpa: std.mem.Allocator, item: T) !void {
    if (try self.pushFixed(gpa, item)) |right| {
        // There's no space in the tree, so increase the height.
        // Change self from `old-self` to `self -> old-self`
        //                                `     -> right   `
        const left = try gpa.create(Self);
        std.mem.swap(Self, self, left);
        const leftSummary = left.summary();
        const rightSummary = right.summary();
        var internal: Self.Internal = .{
            .height = left.height() + 1,
            .summary = leftSummary.add(&rightSummary),
            .childSummaries = .{},
            .childTrees = .{},
        };
        try internal.childTrees.push(left);
        try internal.childTrees.push(right);
        try internal.childSummaries.push(leftSummary);
        try internal.childSummaries.push(rightSummary);
        self.* = .{ .internal = internal };
    }
}
fn pushFixed(self: *Self, gpa: std.mem.Allocator, item: T) !?*Self {
    switch (self.*) {
        .leaf => |*leaf| {
            const itemSummary = item.summary();
            // If there's room, add it
            if (leaf.items.len < TREE_MAX) {
                try leaf.items.push(item);
                try leaf.itemSummaries.push(itemSummary);
                leaf.summary = leaf.summary.add(&itemSummary);
                return null;
            }
            // Otherwise, split it
            var newLeaf: Self.Leaf = .{
                .summary = itemSummary,
                .items = .{},
                .itemSummaries = .{},
            };
            try newLeaf.items.push(item);
            try newLeaf.itemSummaries.push(itemSummary);
            // Return the leaf to be handled by the caller
            const node = try gpa.create(Self);
            node.* = .{ .leaf = newLeaf };
            return node;
        },
        .internal => |*internal| {
            const lastChild = internal.childTrees.lastMut() orelse
                return Error.EmptyInternal;
            const itemSummary = item.summary();
            // Push and handle a split
            if (try lastChild.*.pushFixed(gpa, item)) |newChild| {
                // The split node contains only the new item, so its
                // summary is `itemSummary`.
                // If the child can fit into this node, push it
                if (internal.childTrees.len < TREE_MAX) {
                    try internal.childTrees.push(newChild);
                    try internal.childSummaries.push(itemSummary);
                    internal.summary = internal.summary.add(&itemSummary);
                    return null;
                }
                // Otherwise, create a new node and propagate it to
                // the parent
                var newInternal: Self.Internal = .{
                    .height = internal.height,
                    .summary = itemSummary,
                    .childSummaries = .{},
                    .childTrees = .{},
                };
                try newInternal.childTrees.push(newChild);
                try newInternal.childSummaries.push(itemSummary);
                // Return the node to be handled by the caller
                const node = try gpa.create(Self);
                node.* = .{ .internal = newInternal };
                return node;
            } else {
                // The item was pushed, so update the summary
                const lastChildSummary = lastChild.*.summary();
                const oldLastChildSummary = internal.childSummaries.lastMut().?;
                const difference = lastChildSummary.sub(oldLastChildSummary);
                oldLastChildSummary.* = lastChildSummary;
                internal.summary = internal.summary.add(&difference);
                return null;
            }
        },
    }
}
Rope
Our rope is essentially going to be a wrapper around a sum-tree of chunks, with some extra methods for navigating and editing it.
pub const Tree = SumTree(Chunk);

tree: *Tree,
gpa: std.mem.Allocator,

const Rope = @This();

pub fn init(gpa: std.mem.Allocator) !Rope {
    return .{
        .tree = try Tree.init(gpa),
        .gpa = gpa,
    };
}

pub fn deinit(self: *Rope) void {
    self.tree.deinit(self.gpa);
}

pub fn read(self: *Rope, reader: *std.Io.Reader) !void {
    var buffer: [Chunk.MAX_BASE]u8 = undefined;
    while (true) {
        const bytes = try reader.readSliceShort(&buffer);
        try self.push(buffer[0..bytes]);
        if (bytes < buffer.len) break;
    }
}

pub fn push(self: *Rope, text: []const u8) !void {
    var offset: usize = 0;
    while (offset < text.len) {
        const chunkSize = @min(Chunk.MAX_BASE, text.len - offset);
        const chunkText = text[offset .. offset + chunkSize];
        const chunk: Chunk = try .init(chunkText);
        offset += chunkSize;
        try self.tree.push(self.gpa, chunk);
    }
}
In order for us to do anything useful with this we need to be able to write it, slice it, and (in the next post) edit it.
Visitors
Let's compare the concepts of iterators and visitors. Iteration is great for navigating serial data, like a string, by stepping through items that sit next to each other in memory. But because trees aren't contiguous in memory, we have to maintain a stack to navigate them. We could allocate this stack as an array, but conveniently computers come with a built-in, super speedy stack that we can use just by making function calls. This is what a visitor takes advantage of: it recursively visits each item in the tree and executes a callback for each one, much like an iterator.
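For contrast, here's what the explicit-stack alternative might look like: an iterator over leaves with its pending nodes held in an array. A sketch only; the LeafIterator name and its fixed stack depth are ours, not part of the rope.

```zig
fn LeafIterator(comptime T: type) type {
    return struct {
        // The tree is only a few levels deep, so a fixed stack works.
        stack: ArrayVec(*const SumTree(T), 32) = .{},

        const Self = @This();

        pub fn init(root: *const SumTree(T)) !Self {
            var self: Self = .{};
            try self.stack.push(root);
            return self;
        }

        pub fn next(self: *Self) !?*const SumTree(T).Leaf {
            while (self.stack.pop()) |node| {
                switch (node.*) {
                    .leaf => |*leaf| return leaf,
                    .internal => |*internal| {
                        // Push children in reverse so we pop left-to-right.
                        const children = internal.childTrees.slice();
                        for (0..children.len) |i| {
                            try self.stack.push(children[children.len - i - 1]);
                        }
                    },
                }
            }
            return null;
        }
    };
}
```

Every call to next() pops and pushes that manual stack; the visitor below gets the same bookkeeping for free from the call stack.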
Let's say we want to write a format function. We'll recursively visit each node and fire a callback for each leaf we encounter.
pub fn format(self: *const Rope, writer: *std.Io.Writer) std.Io.Writer.Error!void {
    return formatTree(self.tree, writer);
}

fn formatTree(node: *const Tree, writer: *std.Io.Writer) std.Io.Writer.Error!void {
    switch (node.*) {
        .internal => |*internal| {
            for (internal.childTrees.slice()) |child| {
                try formatTree(child, writer);
            }
        },
        .leaf => |*leaf| {
            for (leaf.items.slice()) |chunk| {
                const s = chunk.text.slice();
                _ = try writer.write(s);
            }
        },
    }
}
This works, but we'll probably want to write many visitors with a similar structure. What if some want to throw errors and some don't? What if some need to provide a result or track state? A visitor type is a way to provide these arbitrary requirements without the boilerplate.
We're going to rip right into a generic visitor here. It takes the type of each item T, a context C (which can represent some state, input, or return value), and an optional error set E. We can then traverse either forwards or backwards, which will be useful when we work with cursor movements in future.
pub fn Visitor(comptime T: type, comptime C: type, comptime E: ?type) type {
    // A visitor can optionally return an error, which can act like `break`
    const R = if (E) |_E| _E!void else void;
    return struct {
        visitInternal: ?*const fn (*const SumTree(T).Internal, C) R,
        visitLeaf: ?*const fn (*const SumTree(T).Leaf, C) R,

        const Self = @This();

        pub fn visit(self: *const Self, node: *const SumTree(T), context: C) R {
            return self.visitDirection(false, node, context);
        }

        pub fn visitReverse(self: *const Self, node: *const SumTree(T), context: C) R {
            return self.visitDirection(true, node, context);
        }

        pub fn visitDirection(self: *const Self, comptime reverse: bool, node: *const SumTree(T), context: C) R {
            switch (node.*) {
                .internal => |*internal| {
                    // Visit self, then recursively visit children
                    if (self.visitInternal) |f| {
                        if (E) |_| {
                            try f(internal, context);
                        } else {
                            f(internal, context);
                        }
                    }
                    const children = internal.childTrees.slice();
                    for (0..children.len) |i| {
                        const child = if (reverse) children[children.len - i - 1] else children[i];
                        if (E) |_| {
                            try self.visitDirection(reverse, child, context);
                        } else {
                            self.visitDirection(reverse, child, context);
                        }
                    }
                },
                .leaf => |*leaf| {
                    // Visit self
                    if (self.visitLeaf) |f| {
                        return f(leaf, context);
                    }
                },
            }
        }
    };
}
This looks like it wouldn't be as fast as an iterator, because we have so many if statements, but so much of this is known at comptime that Zig should be able to compile most of it away. Imagine we create a visitor like this (blackBox stands in for some leaf callback).
const ThisVisitor = Visitor(Chunk, void, null);
const visitor: ThisVisitor = .{
    .visitLeaf = blackBox,
    .visitInternal = null,
};
visitor.visit(tree, {});
Then we can imagine Zig would generate a visitor something like this.
const ThisVisitor = struct {
    visitLeaf: *const fn (*const SumTree(Chunk).Leaf) void,

    const Self = @This();

    pub fn visit(self: *const Self, node: *const SumTree(Chunk)) void {
        switch (node.*) {
            .internal => |*internal| {
                // Recursively visit children
                for (internal.childTrees.slice()) |child| {
                    self.visit(child);
                }
            },
            .leaf => |*leaf| {
                // Visit self
                return self.visitLeaf(leaf);
            },
        }
    }
};
Now if we wanted to be able to format our rope with std.debug.print("{f}", .{ rope }) we could create the following method.
pub fn format(self: *const Rope, writer: *std.Io.Writer) std.Io.Writer.Error!void {
    const visitor: Visitor(Chunk, *std.Io.Writer, std.Io.Writer.Error) = .{
        .visitInternal = null,
        .visitLeaf = formatVisitor,
    };
    return visitor.visit(self.tree, writer);
}

fn formatVisitor(leaf: *const Tree.Leaf, context: *std.Io.Writer) std.Io.Writer.Error!void {
    for (leaf.items.slice()) |chunk| {
        const s = chunk.text.slice();
        _ = try context.write(s);
    }
}
In Summary
By now we should be able to create a rope and print it.
test Rope {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();
    // input
    const source = "Hello\nWorld!\n";
    // output
    var buffer: [13]u8 = undefined;
    var writer = std.Io.Writer.fixed(&buffer);
    // create
    var rope: Rope = try .init(gpa.allocator());
    defer rope.deinit();
    try rope.push(source);
    // format
    try writer.print("{f}", .{rope});
    try std.testing.expectEqualStrings(source, &buffer);
}
That's as far as we'll explore here, as the next step on our agenda is to slice and edit our ropes by adding methods like insert, replace, and remove. It might feel like we didn't get as far as in previous posts, but there's a lot we've covered.
We've gone deep into sum-trees and (apologies) got very code-heavy, but we've only scratched the surface of ropes so far. While looking into what's needed for writing an editor, I personally found sum-trees particularly interesting. They seem like a fairly simple data structure that's seldom used but could be useful for so many applications.
And if you haven't worked with trees much before, you'll probably find that building and visiting trees comes up frequently as you use them more and more.
I also hope you've been able to take away some of the advantages of writing low-level code and meta-programming.