
When it comes to extending user applications via plugins, the quality of an API can make or break the process. If the API is too awkward to use, it can alienate people and curb the popularity of your project.

With that in mind, when it came to Duat (the text editor that this blog is about), I had to think very deeply about what makes an API pleasant to use. This post is about the sharing of state, and how to do it in a non-intrusive way.

Since the text editor is written and configured in Rust, this will go over my perspective on how to use Rust's systems to build the API, trying to maximize usefulness and minimize friction, even for newcomers to the language (and to low-level programming as a whole).

Sharing State

There are many ways one can design an API. The method that I chose for Duat is to share as much state as possible, for the following reasons:

  • It's convenient for the end user if they can just access whatever information they want with little effort or overhead.
  • It's (probably) faster than other systems like message passing.
  • Rust as a language is particularly suited for it (memory/thread safety and all that).

In Duat, plugins are included as regular dependencies in your Cargo.toml, which means they run alongside your configuration code in main.rs, which in turn means that they should be able to access the same state. Here's a concrete example of what you should be able to do with the API:

fn apply_edit_to_all_buffers(edit: Edit) {
    let buffers = context::buffers();
    for buffer in buffers {
        buffer.text_mut().edit(edit.clone());
    }
}

In this example, I'm applying an edit to all Buffers. Note that Buffer::text_mut should return a mutable reference (or a guard of some kind), so I can inspect and modify the content of the Text in place, without making a copy.

Given that, there are a few requirements for this API to pass the smell test:

  • It should be easy to understand.
  • It should be consistent and predictable.
  • It should be hard to mess up.

With that out of the way, we may look at some examples and my general journey towards the current solution:

Straight up mutable references

For very obvious reasons to any moderately experienced Rust user, I never went with this approach, which would look roughly like this:

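use std::{cell::UnsafeCell, sync::Arc};

// Cheap cloning for shared access.
#[derive(Clone)]
struct Buffer {
    text: Arc<UnsafeCell<Text>>,
}

impl Buffer {
    fn text_mut(&mut self) -> &mut Text {
        // !!! Nothing stops two clones from handing out aliasing `&mut Text`s.
        unsafe { &mut *self.text.get() }
    }
}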

fn apply_edit_to_all_buffers(edit: Edit) {
    let buffers: Vec<Buffer> = context::buffers();
    for mut buffer in buffers {
        buffer.text_mut().edit(edit.clone());
    }
}

It should be pretty clear why this method is not really viable for Rust. For one, you can call context::buffers() from any thread, breaking thread safety. You can also acquire multiple mutable references to the same Text, breaking Rust's mutability XOR aliasing rule.

So even though this is the most readable syntax that will be explored in this article, it should be discarded on the basis that it breaks some basic rules of Rust.

Receiving/returning state

Another approach that might work is one where we get temporary ownership of state:

struct Buffer {
    text: Text,
}

impl Buffer {
    fn text_mut(&mut self) -> &mut Text {
        &mut self.text
    }
}

fn apply_edit_to_all_buffers(edit: Edit) {
    let mut buffers: Vec<Buffer> = context::get_buffers();

    for buffer in &mut buffers {
        buffer.text_mut().edit(edit.clone());
    }

    context::set_buffers(buffers);
}

This one doesn't break the rules of Rust, and at first glance it might even seem valid, but it breaks some of our requirements:

  • It should be consistent and predictable.
  • It should be hard to mess up.

This one is pretty easy to mess up. Right away, you notice that if you forget to call context::set_buffers, Duat is left without any Buffers.

Additionally, if someone calls context::get_buffers from another thread (or even the same thread), they will think there are no Buffers, since they were taken by this first call.

All of this makes this API quite difficult to use, and more importantly, it becomes difficult to share state with other plugins, since they may take state away before you're able to properly observe it.
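
To make the failure mode concrete, here is a minimal sketch of how such a take-and-return API could behave. Everything in it is a hypothetical stand-in (the Buffer struct, the thread-local BUFFERS, and the bodies of get_buffers and set_buffers); it only assumes that "getting" the state means moving it out of the global:

```rust
use std::cell::RefCell;
use std::mem;

// Hypothetical stand-in for Duat's Buffer, just for this sketch.
#[derive(Debug, Default)]
pub struct Buffer {
    pub text: String,
}

thread_local! {
    // Hypothetical global state that the editor would own.
    static BUFFERS: RefCell<Vec<Buffer>> = RefCell::new(Vec::new());
}

/// Takes every Buffer out of the global state, leaving an empty Vec behind.
pub fn get_buffers() -> Vec<Buffer> {
    BUFFERS.with(|b| mem::take(&mut *b.borrow_mut()))
}

/// Hands the Buffers back to the global state.
pub fn set_buffers(buffers: Vec<Buffer>) {
    BUFFERS.with(|b| *b.borrow_mut() = buffers);
}

/// Returns how many Buffers the first caller, a second caller, and a later
/// caller (after the state was handed back) each get to see.
pub fn take_and_return_sketch() -> (usize, usize, usize) {
    set_buffers(vec![Buffer::default(), Buffer::default()]);

    let first = get_buffers();
    // A second caller shows up before the first one has given the state back.
    let second = get_buffers();
    let seen = (first.len(), second.len());

    set_buffers(first);
    let later = get_buffers().len();

    (seen.0, seen.1, later)
}
```

In this sketch the second caller sees zero Buffers even though two exist, which is exactly the kind of order-dependent behaviour that makes the approach unpredictable.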

Much like the first approach, I also never experimented with this one, since I knew that these exact problems were a possibility, and I would lose many of the compile-time guarantees that make Rust special.

Unlike the first approach, this one also prevents you from keeping a Buffer around in order to edit it later, since you are required to return all Buffers when you're done editing them.

Given this information, it would seem like multiple people trying to access the same information at the same time is the root of our problems...

The obvious approach: Mutex/RwLock

The next approach seems very promising. It makes use of mutual exclusion in order to guarantee that no two people try to mutate state at once:

struct Buffer {
    text: Text,
}

impl Buffer {
    fn text_mut(&mut self) -> &mut Text {
        &mut self.text
    }
}

fn apply_edit_to_all_buffers(edit: Edit) {
    let buffers: Vec<Arc<RwLock<Buffer>>> = context::buffers();

    for buffer in buffers {
        let mut buf = buffer.write().unwrap();
        buf.text_mut().edit(edit.clone());
    }
}

This approach guarantees that the problems of the previous approach won't happen, since context::buffers will never return an empty Vec just because someone else is using the Buffers.

But let's say I wanted to compare and modify the Buffers with the currently active Buffer:

fn cmp_mod_with_current() {
    let current_buffer: Arc<RwLock<Buffer>> = context::current_buffer();
    let mut cur_buf = current_buffer.write().unwrap();

    for other_buffer in context::buffers() {
        // Blocks forever once `other_buffer` is the current Buffer.
        let other_buf = other_buffer.read().unwrap();
        comparison_and_modification_fn(&mut cur_buf, &other_buf);
    }
}

Aaand that's a "self-deadlock", which breaks one of our requirements:

  • It should be hard to mess up.

(Self) Deadlocks

In the Scale Of Sorrow (SOS) of programming bugs, deadlocks rank quite highly:

  1. Segmentation faults
  2. Deadlocks
  3. Off by one errors
  4. Logic bugs

This is because, if a deadlock happens for an end user, it is quite difficult for them to diagnose, and even harder for them to communicate what exactly happened that caused the deadlock.

In this case, a self deadlock (not an official term) is happening because, while iterating through the Buffers, we will attempt to acquire a read guard for a Buffer that is already mutably acquired. This will result in the program hanging forever.
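
The situation can be observed without actually hanging the program by using RwLock::try_read, which returns an error instead of waiting. This is only a sketch: Buffer is a hypothetical stand-in, and read_would_block is a helper made up for the illustration:

```rust
use std::sync::{Arc, RwLock, TryLockError};

// Hypothetical stand-in for Duat's Buffer.
pub struct Buffer {
    pub text: String,
}

/// Returns true if reading `other` would block while `current` is held
/// for writing by this same thread.
pub fn read_would_block(
    current: &Arc<RwLock<Buffer>>,
    other: &Arc<RwLock<Buffer>>,
) -> bool {
    let _cur_buf = current.write().unwrap();

    // `try_read` never waits. A plain `read()` on an RwLock that this thread
    // has already write-locked would have nobody left to release the lock.
    let would_block = matches!(other.try_read(), Err(TryLockError::WouldBlock));
    would_block
}
```

Passing two handles to the same Buffer reports that the read would block, while two distinct Buffers are fine. With a plain read() in the loop, the first case is the one that hangs.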

Now, this specific situation can be resolved by just using Arc::ptr_eq to figure out if they point to the same data. But it's still an easy mistake to run into, and it doesn't solve the multithreaded scenario with other plugins, where deadlocks could still happen.
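
A sketch of that Arc::ptr_eq workaround, again with a hypothetical Buffer and a made-up absorb_others function standing in for the comparison and modification logic:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for Duat's Buffer.
pub struct Buffer {
    pub text: String,
}

/// Appends the text of every *other* Buffer to the current one, and returns
/// how many Buffers were read.
pub fn absorb_others(
    current: &Arc<RwLock<Buffer>>,
    all: &[Arc<RwLock<Buffer>>],
) -> usize {
    let mut cur_buf = current.write().unwrap();
    let mut absorbed = 0;

    for other in all {
        // Same allocation means same lock, which is already held for writing.
        if Arc::ptr_eq(current, other) {
            continue;
        }
        let other_buf = other.read().unwrap();
        cur_buf.text.push_str(&other_buf.text);
        absorbed += 1;
    }

    absorbed
}
```

The check works, but it has to be remembered at every call site, and it does nothing about locks held by other threads.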

For quite a while, this is what the API of Duat was using. Not in this exact shape, since there were layers of abstraction that covered up the Arc<RwLock>s. But at the end of the day, the system was still backed by them, and I ran into deadlocks quite frequently, which led me to the following conclusion:

The API should be single threaded

This is because a multithreaded, state-sharing API will (inevitably, as far as I know) always lead to the possibility of deadlocks, which will really sour the development experience: instead of dealing with the logic of your plugin, you will be trying to figure out where a deadlock occurred, and in which order you're supposed to acquire each piece of state.

This realization (as well as some eureka moments) led me to the current state of affairs:

The Pass model

And thus we arrive at what the API currently looks like, which is based primarily on two structs: RwData<T> and Pass.

It works like this: If you have a &mut Pass and a RwData<T>, you can acquire a &mut T. If you have a &Pass and RwData<T>, you can acquire &T. That is, the method definitions roughly look like this:

impl<T: ?Sized> RwData<T> {
    fn read<'p>(&'p self, pa: &'p Pass) -> &'p T;

    fn write<'p>(&'p self, pa: &'p mut Pass) -> &'p mut T;
}

So here's what the function we've been writing would look like:

struct Buffer {
    text: Text
}

impl Buffer {
    fn text_mut(&mut self) -> &mut Text {
        &mut self.text
    }
}

fn apply_edit_to_all_buffers(pa: &mut Pass, edit: Edit) {
    let buffers: Vec<RwData<Buffer>> = context::buffers(pa);

    for buffer in buffers {
        let buf = buffer.write(pa);
        buf.text_mut().edit(edit.clone());
    }
}

Note here that the function now takes an additional argument: &mut Pass. In Duat, this argument indicates that "the function is allowed to modify shared state". Likewise, an argument of type &Pass indicates that "the function is allowed to access shared state".

This model solves a bunch of previous concerns:

No deadlocks

Since it doesn't rely on locking mechanisms like Mutex and RwLock, I no longer run into the problem of locking the same value twice. For example, the previous cmp_mod_with_current function would look like this:

fn cmp_mod_with_current(pa: &mut Pass) {
    let current_buffer: RwData<Buffer> = context::current_buffer(pa);

    for other_buffer in context::buffers(pa) {
        let buffers = (&current_buffer, &other_buffer);
        // None when both handles point to the same Buffer.
        if let Some((cur_buf, other_buf)) = pa.try_write_many(buffers) {
            comparison_and_modification_fn(cur_buf, other_buf);
        }
    }
}

Here, Pass::try_write_many only returns Some if the two RwData<Buffer>s point to different resources, which prevents multiple mutable references to the same Buffer from being acquired at once.
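
To give an idea of how such a check could work, here is a sketch of a two-value version. It is not Duat's implementation: Pass, RwData, new_for_sketch and try_write_two are all toy definitions written for this example, and the unsafe code assumes that only one Pass ever exists:

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;
use std::sync::Arc;

/// Toy stand-in for Duat's Pass: zero-sized, and neither Send nor Sync
/// because of the raw pointer marker.
pub struct Pass(PhantomData<*mut ()>);

/// Toy stand-in for RwData<T>.
pub struct RwData<T> {
    data: Arc<UnsafeCell<T>>,
}

impl<T> RwData<T> {
    pub fn new(value: T) -> Self {
        Self { data: Arc::new(UnsafeCell::new(value)) }
    }

    pub fn read<'p>(&'p self, _: &'p Pass) -> &'p T {
        // Safety: relies on the sketch's assumption of a single Pass.
        unsafe { &*self.data.get() }
    }
}

impl<T> Clone for RwData<T> {
    fn clone(&self) -> Self {
        Self { data: self.data.clone() }
    }
}

impl Pass {
    /// Public only so the sketch can be exercised. The unsafe code here is
    /// sound only if a single Pass exists, so a real API would hide this.
    pub fn new_for_sketch() -> Self {
        Pass(PhantomData)
    }

    /// Hypothetical two-value version of `try_write_many`.
    pub fn try_write_two<'p, T>(
        &'p mut self,
        a: &'p RwData<T>,
        b: &'p RwData<T>,
    ) -> Option<(&'p mut T, &'p mut T)> {
        // Two handles to the same allocation would alias, so refuse.
        if Arc::ptr_eq(&a.data, &b.data) {
            return None;
        }
        // Safety: the allocations are distinct, and the exclusive borrow of
        // the (assumed unique) Pass rules out any other live reference.
        unsafe { Some((&mut *a.data.get(), &mut *b.data.get())) }
    }
}
```

The pointer comparison is the same trick as Arc::ptr_eq in the RwLock version, except that it now lives inside the API, so callers get an Option back instead of having to remember the check themselves.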

Single threaded

This model, by virtue of the &mut Pass argument (and the fact that Pass: !Send), prevents users from accessing shared state from other threads.

Now, while this may seem inconvenient, multithreading is rarely the right approach in the context of a text editor. Even if you do end up needing it, Duat supports it, albeit through asynchronous message passing instead:

fn update_from_other_thread() {
    let edit = get_edit();
    context::queue(move |pa: &mut Pass| {
        apply_edit_to_all_buffers(pa, edit);
    });
}
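
For intuition, a queue like this can be sketched with a standard channel of boxed closures. This is an assumption about the general shape, not Duat's actual internals: Pass, BUFFERS, with_buffers and queue_sketch are toy definitions, and the "editor loop" is reduced to draining the channel once:

```rust
use std::cell::RefCell;
use std::marker::PhantomData;
use std::sync::mpsc::channel;
use std::thread;

/// Toy token: zero-sized and !Send, so it cannot leave the main thread.
pub struct Pass(PhantomData<*mut ()>);

thread_local! {
    // Stand-in for the editor's shared state, local to the main thread.
    static BUFFERS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Grants mutable access to this thread's buffers, given a Pass.
pub fn with_buffers<R>(_pa: &mut Pass, f: impl FnOnce(&mut Vec<String>) -> R) -> R {
    BUFFERS.with(|b| f(&mut *b.borrow_mut()))
}

/// A job to run on the main thread once a Pass is available.
type Job = Box<dyn FnOnce(&mut Pass) + Send + 'static>;

pub fn queue_sketch() -> Vec<String> {
    let mut pa = Pass(PhantomData);
    with_buffers(&mut pa, |bufs| *bufs = vec!["a".to_string(), "b".to_string()]);

    let (tx, rx) = channel::<Job>();

    // The worker thread has no Pass; it can only describe what should happen.
    let worker = thread::spawn(move || {
        let edit = String::from("!");
        let job: Job = Box::new(move |pa: &mut Pass| {
            with_buffers(pa, |bufs| bufs.iter_mut().for_each(|b| b.push_str(&edit)));
        });
        tx.send(job).unwrap();
    });
    worker.join().unwrap();

    // Back on the main thread, every queued job is handed the Pass.
    for job in rx.try_iter() {
        job(&mut pa);
    }

    with_buffers(&mut pa, |bufs| bufs.clone())
}
```

The closure itself is Send, so it can cross threads, but it only ever runs where the Pass lives.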

Borrowchecking benefits

Because of the definition of RwData::read and RwData::write, we get some of the benefits of & and &mut. This includes things like reborrowing:

fn reborrow_example(pa: &mut Pass, buffer: RwData<Buffer>) {
    let buf = buffer.write(pa);

    let buf_again = buffer.write(pa);
}

This is completely fine because of reborrowing and because rustc is able to figure out that I'm no longer making use of buf, so it should be fine to just borrow it again. If we tried this example with the RwLock approach, however...
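
Here is a self-contained version of the reborrowing example that can be compiled on its own. Pass, RwData, Buffer and new_for_sketch are toy definitions that only assume the signatures shown earlier, not Duat's real types:

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;
use std::sync::Arc;

/// Toy stand-in for Duat's Pass.
pub struct Pass(PhantomData<*mut ()>);

impl Pass {
    /// Public only so the sketch can be exercised.
    pub fn new_for_sketch() -> Self {
        Pass(PhantomData)
    }
}

/// Toy stand-in for Duat's Buffer.
pub struct Buffer {
    pub text: String,
}

/// Toy stand-in for RwData<T>.
pub struct RwData<T> {
    data: Arc<UnsafeCell<T>>,
}

impl<T> RwData<T> {
    pub fn new(value: T) -> Self {
        Self { data: Arc::new(UnsafeCell::new(value)) }
    }

    pub fn write<'p>(&'p self, _: &'p mut Pass) -> &'p mut T {
        // Safety: relies on the sketch's assumption of a single Pass.
        unsafe { &mut *self.data.get() }
    }
}

pub fn reborrow_example(pa: &mut Pass, buffer: &RwData<Buffer>) -> String {
    let buf = buffer.write(pa);
    buf.text.push_str("first ");
    // `buf` is not used past this point, so its borrow of `pa` ends here.

    let buf_again = buffer.write(pa);
    buf_again.text.push_str("second");

    // Uncommenting the next line would make the first borrow overlap the
    // second one, and the compiler would reject the function:
    // buf.text.push('!');

    buf_again.text.clone()
}
```

The mistake that deadlocks with a lock becomes a compile error here, reported before the program ever runs.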

fn relock_example(buffer: RwLock<Buffer>) {
    let buf = buffer.write().unwrap();

    let whoops = buffer.write().unwrap();
}

...we would run into a deadlock.

Speeeeeeeeeeed

Since this model involves no locking at all and makes use of a zero-sized struct (Pass) in order to borrow, the actual definition of RwData is quite close to this:

struct RwData<T: ?Sized> {
    data: Arc<UnsafeCell<T>>,
}

impl<T: ?Sized> RwData<T> {
    fn read<'p>(&'p self, _: &'p Pass) -> &'p T {
        // Safety: The reference to Pass ensures no mutable references exist.
        unsafe { &*self.data.get() }
    }

    fn write<'p>(&'p self, _: &'p mut Pass) -> &'p mut T {
        // Safety: The mutable reference to Pass ensures no references exist.
        unsafe { &mut *self.data.get() }
    }
}

impl<T: ?Sized> Clone for RwData<T> {
    fn clone(&self) -> Self {
        Self { data: self.data.clone() }
    }
}

// Safety: Due to the Pass model, access is restricted to the main thread,
// so if a value is Send, the RwData<T> can be created on a separate thread
// and be sent to the main thread, where it won't be shared with others.
unsafe impl<T: ?Sized + Send> Send for RwData<T> {}
unsafe impl<T: ?Sized + Send> Sync for RwData<T> {}

As far as I know, in optimized code, both RwData::read and RwData::write should be no-ops. Compare that to the RwLock method, which has to acquire and release a lock on every access, and you can see that this allows for far less overhead.
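
This is not a benchmark, but the bookkeeping difference is already visible in the types. The sketch below uses a toy Pass (an assumption, defined with a PhantomData marker) and compares how many bytes UnsafeCell and RwLock add on top of a plain u64:

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;
use std::mem::size_of;
use std::sync::RwLock;

/// Toy stand-in for Duat's Pass: a marker with no data in it.
pub struct Pass(PhantomData<*mut ()>);

/// Size in bytes of the toy Pass.
pub fn pass_size() -> usize {
    size_of::<Pass>()
}

/// Extra bytes that UnsafeCell and RwLock each add on top of a plain u64.
pub fn overhead_bytes() -> (usize, usize) {
    let plain = size_of::<u64>();
    (
        size_of::<UnsafeCell<u64>>() - plain,
        size_of::<RwLock<u64>>() - plain,
    )
}
```

The toy Pass takes up no space, and UnsafeCell adds nothing to the value it wraps, whereas RwLock has to carry its lock state alongside the data.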

How would you get this elusive Pass?

A question that might come to mind is "Okay, I understand the model and how it works, but fn main() doesn't have a Pass".

There are a few places where you will get access to a Pass. The most common way to do so, however, is through hook::add, map and alias. These three functions essentially trigger things automatically when conditions are met:

setup_duat!(setup);
use duat::prelude::*;

fn setup(opts: &mut Opts) {
    hook::add::<BufferOpened>(|pa: &mut Pass, buffer: Handle<Buffer>| {
        let buf = buffer.write(pa);
        match buf.filetype() {
            Some("html" | "cpp" | "c" | "lua") => buf.opts.tabstop = 2,
            Some("markdown" | "latex" | "typst") => buf.opts.wrap_lines = true,
            _ => {}
        }
    });

    // Regularly aliasing keys to other keys.
    alias::<Insert>("jk", "<Esc>");

    // Set clipboard to concatenation of all selections.
    map::<Normal>("<c-y>", |pa: &mut Pass| {
        let buffer = context::current_buffer(pa);
        let mut content = String::new();

        // Edit all cursors
        buffer.edit_all(pa, |c| content.push_str(&c.selection().to_string()));

        duat::clipboard::set(content);
    });
}

You may notice here that you're not receiving an RwData<Buffer> directly. That's because there are further abstractions on top of that, specific to widgets.

Final thoughts

Hopefully you can see why I chose the method that I did for shared state. It provides a good balance of the three requirements outlined at the beginning of this article:

  • It should be easy to understand.

The requirements of &Pass => access to shared state and &mut Pass => exclusive access to shared state are universal and (IMO) pretty straightforward to understand.

  • It should be consistent and predictable.

Unlike the other models, the behaviour of this one doesn't depend on the order in which the objects were obtained, nor does it depend on others not currently making use of the objects.

  • It should be hard to mess up.

It is actually "impossible" to mess up with this model, since every invalid access is caught at compile time, and functions like Pass::write_many will panic at runtime instead of resulting in a hard-to-diagnose deadlock. (Don't worry, the panics are caught.)

Finally, was this article readable? Was it coherent? This is the first article I've written about programming that's not just documentation, but more of a walkthrough of my journey through the various shapes of the API. So feedback in that regard would be appreciated.
