Learning WebAssembly + Rust Part 2

This writeup is a little chaotic, but hopefully it will help others who are working on Rust, WASM, and JavaScript interop.

Let's start with a coding challenge: building the foundation of a text-based adventure game engine.

  1. You have some shared state, for example, the player's current location
  2. You have a list of rooms each with a unique id and a description
  3. You have a way to take player input, mutate the state, and map the input and state to some text

We'll look at how I implemented steps 1-3 in Rust and compiled it to WASM. But before I get into that, here are a few quick notes since my last post.

First, I've learned that it's not required to use no_std when compiling to WASM. Parts of the standard library can be compiled to WASM, and from what I understand, dlmalloc is already the default allocator if you use the standard library.

Second, I've had good success using dlmalloc::GlobalDlmalloc as the global allocator when I am using no_std. See my previous post for the code and remember to enable the "global" feature on the dlmalloc crate.

Third, in my last post about WebAssembly and Rust I talked about using static data. As I've continued to experiment, I've found that if you use statics on their own, you end up fighting the compiler. A static must be safe to share between threads (Sync), so the compiler rejects a plain static RefCell, and a static mut requires unsafe on every access. What the compiler needs to know is that the data is only accessible from a single thread. To tell it that, you can declare your statics with the thread_local! macro, which wraps each one in a std::thread::LocalKey, and then rely on Cell or RefCell for interior mutability.

use std::cell::RefCell;

struct State {
    location: i32,
}

struct Location {
    id: i32,
    description: &'static str,
}

thread_local! {
    static GLOBAL_STATE: RefCell<State> = RefCell::new(State { location: 0 });
    static LOCATIONS: RefCell<Vec<Location>> = RefCell::new(vec![
        Location {
            id: 0,
            description: "loc1",
        },
        Location {
            id: 1,
            description: "loc2",
        },
    ]);
}
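The snippet above uses RefCell for both statics. For small Copy values such as a room id, Cell is often enough. Here is a minimal, self-contained sketch that uses one of each. The names CURRENT_ROOM, VISITED, enter_room, current_room, and visit_count are made up for illustration, and the LocalKey convenience methods it calls (get, set, with_borrow, with_borrow_mut) assume Rust 1.73 or later. It runs natively, no WASM needed.

```rust
use std::cell::{Cell, RefCell};

// A plain `static CURRENT_ROOM: RefCell<u32> = RefCell::new(0);` is rejected
// because RefCell is not Sync, and `static mut` needs `unsafe` on every access.
// thread_local! gives each thread its own copy, so Sync is not required.
thread_local! {
    // Cell: swap small Copy values in and out, no borrow tracking.
    static CURRENT_ROOM: Cell<u32> = const { Cell::new(0) };
    // RefCell: any type, with borrows checked at runtime.
    static VISITED: RefCell<Vec<u32>> = const { RefCell::new(Vec::new()) };
}

// Hypothetical helper: move the player and remember the room.
fn enter_room(room: u32) {
    // LocalKey<Cell<T>>::set
    CURRENT_ROOM.set(room);
    // LocalKey<RefCell<T>>::with_borrow_mut
    VISITED.with_borrow_mut(|rooms| rooms.push(room));
}

fn current_room() -> u32 {
    // LocalKey<Cell<T>>::get (requires T: Copy)
    CURRENT_ROOM.get()
}

fn visit_count() -> usize {
    VISITED.with_borrow(|rooms| rooms.len())
}
```

Cell hands values in and out by copy, so it never panics on a conflicting borrow. RefCell lets you borrow a Vec or struct in place, at the cost of a runtime borrow check.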

Let's see how we work with LocalKey by building a function that lets the player "look" and see where they are.

fn look() -> &'static str {
    LOCATIONS.with(|l| {
        GLOBAL_STATE.with(|s| {
            let locations = l.borrow();
            let state = s.borrow();
            for loc in locations.iter() {
                // Is the player at the location?
                if state.location == loc.id {
                    // Show the room description
                    return loc.description;
                }
            }
            "You don't see anything"
        })
    })
}

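The with method of LocalKey calls your closure with a reference to the value in thread-local storage and returns whatever the closure returns. The reference itself can't escape the closure, which is why look copies out the &'static str description instead of returning a reference to the Location.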

If the LocalKey contains a RefCell, you can call with_borrow_mut (available since Rust 1.73) to borrow the value mutably and change it in place.

#[unsafe(no_mangle)]
pub extern "C" fn go(location: i32) {
    GLOBAL_STATE.with_borrow_mut(|state| {
        state.location = location;
    })
}
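To make the difference between with and with_borrow_mut concrete, here is a small sketch that compiles natively. It reuses the State shape from above. current_location is a hypothetical helper, and go is left unexported because the sketch only runs natively.

```rust
use std::cell::RefCell;

struct State {
    location: i32,
}

thread_local! {
    static GLOBAL_STATE: RefCell<State> = RefCell::new(State { location: 0 });
}

// `with` lends the closure a `&RefCell<State>` and returns whatever the
// closure returns. Here that is a copied i32, not a reference.
fn current_location() -> i32 {
    GLOBAL_STATE.with(|s| s.borrow().location)
}

// `with_borrow_mut` behaves like `with(|s| f(&mut s.borrow_mut()))`:
// the mutable borrow ends when the closure returns.
fn go(location: i32) {
    GLOBAL_STATE.with_borrow_mut(|state| state.location = location);
}
```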

Now let's look at how we can get player input, mutate the state, and return some text. So far, the only function we've exposed from the WASM module is go. It does mutate the state, but it can't return a string, because strings aren't a native WASM type: a string has to be handed over as a pointer into the module's linear memory plus a length. The look function does return a static string, but we're not exposing it from the WASM module. For the caller (the browser) to get the text, we need to add two new functions: one that returns a pointer to the static string and a second that returns its length in bytes.

#[unsafe(no_mangle)]
pub extern "C" fn look_ptr() -> *const u8 {
    look().as_ptr()    
}

#[unsafe(no_mangle)]
pub extern "C" fn look_len() -> usize {
    look().len()
}

Then, using the technique I discussed in the previous post, the JavaScript side can get the memory location of the string with look_ptr() and its length in bytes with look_len(), and then decode that region of the WASM memory buffer into a JavaScript string.

// Get the memory location and byte length of the look text
const offset = instance.exports.look_ptr();
const stringBuffer = new Uint8Array(
    instance.exports.memory.buffer,
    offset,
    instance.exports.look_len()
);

// Only correct for ASCII: String.fromCharCode treats each byte as a
// separate character, so multi-byte UTF-8 sequences come out garbled.
// TextDecoder (used later in this post) handles UTF-8 properly.
let str = '';
for (let i = 0; i < stringBuffer.length; i++) {
    str += String.fromCharCode(stringBuffer[i]);
}
// Show the description of the
// player's current location
console.log(str);
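The byte-by-byte loop only works for ASCII, because look_len() returns a length in bytes, not characters. As a rough native check of the pointer-plus-length idea, the sketch below rebuilds the string from look_ptr() and look_len() with std::slice::from_raw_parts, much as the JavaScript slices memory.buffer. The room text "Café" and the read_look_text helper are assumptions for illustration: "Café" is 4 characters but 5 bytes of UTF-8.

```rust
// Hypothetical stand-in for look(): any &'static str works here.
fn look() -> &'static str {
    "Café"
}

#[unsafe(no_mangle)]
pub extern "C" fn look_ptr() -> *const u8 {
    look().as_ptr()
}

#[unsafe(no_mangle)]
pub extern "C" fn look_len() -> usize {
    // Length in bytes, which is what the caller needs to slice memory
    look().len()
}

// Native-only helper: rebuild the text from the pointer/length pair,
// the way the JavaScript side reads it out of memory.buffer.
fn read_look_text() -> String {
    // Safe here because the pointer comes from a &'static str of that length
    let bytes = unsafe { std::slice::from_raw_parts(look_ptr(), look_len()) };
    String::from_utf8_lossy(bytes).into_owned()
}
```

Decoding the bytes as UTF-8, as TextDecoder does in the browser, gets the 4-character string back. Treating each of the 5 bytes as its own character would not.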

That solves the problem of passing text from WASM to a JavaScript String, but how would we get a JavaScript string into WASM?

If you take a look at this thread in the WebAssembly repository, you'll see a technique where JavaScript can copy bytes into the WebAssembly memory.
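The general idea can be sketched natively. Rust exposes the address of a fixed-size buffer, the host copies bytes into it, and then the host reports how many bytes it wrote. In the sketch below, INPUT, INPUT_LEN, input_get_ptr, input_set_len, read_input, and host_write are hypothetical names. host_write stands in for what JavaScript would do with a Uint8Array view over memory.buffer, and the buffer is wrapped in a Cell so Rust doesn't assume its contents never change.

```rust
use std::cell::Cell;

// Hypothetical buffer size
const INPUT_CAPACITY: usize = 100;

thread_local! {
    // Cell is built on UnsafeCell, so Rust treats these bytes as mutable
    // even though they are written through a raw pointer from outside.
    static INPUT: Cell<[u8; INPUT_CAPACITY]> = const { Cell::new([0; INPUT_CAPACITY]) };
    static INPUT_LEN: Cell<usize> = const { Cell::new(0) };
}

#[unsafe(no_mangle)]
pub extern "C" fn input_get_ptr() -> *mut u8 {
    INPUT.with(|c| c.as_ptr() as *mut u8)
}

#[unsafe(no_mangle)]
pub extern "C" fn input_set_len(len: usize) {
    // Never trust a length larger than the buffer
    INPUT_LEN.set(len.min(INPUT_CAPACITY));
}

// Rust side: copy the array out of the Cell and decode the written part.
fn read_input() -> String {
    let bytes = INPUT.get();
    let len = INPUT_LEN.get();
    String::from_utf8_lossy(&bytes[..len]).into_owned()
}

// Host side (native stand-in for the JavaScript copy into WASM memory).
fn host_write(text: &str) {
    let bytes = text.as_bytes();
    let len = bytes.len().min(INPUT_CAPACITY);
    unsafe { std::ptr::copy_nonoverlapping(bytes.as_ptr(), input_get_ptr(), len) };
    input_set_len(len);
}
```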


As I was writing this, I realized that the previous code, although correct, isn't the best structure for reusability. So let's take a step back and talk about our data layout and a naming convention for the exported WASM functions.

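  1. We'll have static memory called COMMAND with type [u8; 100]. This is where JavaScript stores the player's command, e.g. "Look Around", "Go West", etc. It's a simple byte array where JS writes the bytes of a UTF-8 encoded string. Since only JavaScript writes to it, Rust never needs to borrow it mutably, so we don't need RefCell's borrow tracking. Rust does still need to know that the bytes can change behind its back, though. The compiler may assume an immutable static never changes, so it's safest to wrap the array in a Cell, which is built on UnsafeCell but has no borrow tracking.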

  2. We'll have static memory called COMMAND_LENGTH with type RefCell<usize>. This is where the byte length of the player's command is stored. Since COMMAND is a fixed-length array, we need a place to record how much of it the current command actually uses, e.g. 11 for "Look Around", which is 11 bytes of UTF-8. We'll provide a function that JavaScript calls to set this value, so we wrap the usize in a RefCell so Rust can mutate it.

  3. We'll have static memory called OUTPUT with type RefCell<Vec<u8>>. This is where WASM puts the output of the command and where JavaScript reads it from. For example, after executing "Look Around", the output will be the description of the room; for unknown commands it may respond with "Eh?". JavaScript reads this area of memory but never writes to it.

  4. With this model, JavaScript needs to know where data is in memory in order to read or write it. For that we'll use functions named (memory_location)_get_ptr(), e.g. command_get_ptr(). When reading data, JS also needs the length of the data at that location; for that we'll use functions named (memory_location)_get_len(). When writing data, JS sometimes also needs to report how much it wrote; for that we'll use functions named (memory_location)_set_len().

  5. For our text adventure game, we need a Rust function that converts COMMAND from UTF-8 bytes into a Rust String, pattern matches on the command, and then writes the result to OUTPUT, e.g. finds out which room the player is in and shows the room description.

  6. We need some general-purpose JavaScript functions to write text into WASM and read text from WASM.

Here's what the Rust code looks like. In the future I may investigate creating some macros to automate the creation of the get_ptr, get_len, and set_len functions.

use dlmalloc::GlobalDlmalloc;
use std::cell::{Cell, RefCell};

#[global_allocator]
static GLOBAL: GlobalDlmalloc = GlobalDlmalloc;

// Size of the COMMAND buffer in bytes
const COMMAND_CAPACITY: usize = 100;

thread_local! {
    // JavaScript writes into these bytes directly. Cell (built on UnsafeCell)
    // tells Rust the contents can change, without RefCell's borrow tracking.
    static COMMAND: Cell<[u8; COMMAND_CAPACITY]> = const { Cell::new([0; COMMAND_CAPACITY]) };
    static COMMAND_LENGTH: RefCell<usize> = const { RefCell::new(0) };
    static OUTPUT: RefCell<Vec<u8>> = const { RefCell::new(Vec::new()) };
}

#[unsafe(no_mangle)]
pub extern "C" fn command_get_ptr() -> *const u8 {
    COMMAND.with(|c| c.as_ptr() as *const u8)
}

#[unsafe(no_mangle)]
pub extern "C" fn command_set_len(length: usize) {
    COMMAND_LENGTH.with_borrow_mut(|c| {
        // Never read past the end of the buffer, whatever JS reports
        *c = length.min(COMMAND_CAPACITY);
    })
}

#[unsafe(no_mangle)]
pub extern "C" fn command_get_len() -> usize {
    COMMAND_LENGTH.with(|c| *c.borrow())
}

#[unsafe(no_mangle)]
pub extern "C" fn output_get_ptr() -> *const u8 {
    OUTPUT.with(|c| c.borrow().as_ptr())
}

#[unsafe(no_mangle)]
pub extern "C" fn output_get_len() -> usize {
    OUTPUT.with(|c| c.borrow().len())
}

#[unsafe(no_mangle)]
pub extern "C" fn handle_command() {
    // Copy the bytes out of the Cell, then decode only the part JS wrote
    let command = COMMAND.with(|c| c.get());
    let command_text = String::from_utf8(command[..command_get_len()].to_vec());

    let output = match command_text {
        Ok(text) => run_command(text),
        // Report why decoding failed instead of echoing the invalid bytes
        Err(err) => err.to_string(),
    };

    OUTPUT.with_borrow_mut(|o| {
        o.clear();
        o.extend_from_slice(output.as_bytes());
    });
}

fn run_command(command: String) -> String {
    match command.to_lowercase().as_str() {
        "look" => String::from("You look around"),
        "hello" => String::from("Greetings, friend!"),
        _ => String::from("Eh?"),
    }
}
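The get_ptr/get_len functions above are nearly identical for each buffer, which is where a macro could help. Below is a minimal sketch with macro_rules!, assuming the buffer is a thread-local RefCell<Vec<u8>>. export_readable_buffer is a hypothetical macro, not something from a crate. macro_rules! can't build new identifiers on stable Rust, so the macro takes the full function names as arguments instead of deriving them from the buffer name. write_output is also a hypothetical helper.

```rust
use std::cell::RefCell;

// Hypothetical macro: generates the exported `_get_ptr` / `_get_len` pair
// for a thread-local RefCell<Vec<u8>>, following the naming convention.
macro_rules! export_readable_buffer {
    ($buffer:ident, $get_ptr:ident, $get_len:ident) => {
        #[unsafe(no_mangle)]
        pub extern "C" fn $get_ptr() -> *const u8 {
            $buffer.with_borrow(|b| b.as_ptr())
        }

        #[unsafe(no_mangle)]
        pub extern "C" fn $get_len() -> usize {
            $buffer.with_borrow(|b| b.len())
        }
    };
}

thread_local! {
    static OUTPUT: RefCell<Vec<u8>> = const { RefCell::new(Vec::new()) };
}

export_readable_buffer!(OUTPUT, output_get_ptr, output_get_len);

// Replace the buffer contents, as handle_command does with its result.
fn write_output(text: &str) {
    OUTPUT.with_borrow_mut(|o| {
        o.clear();
        o.extend_from_slice(text.as_bytes());
    });
}
```

A crate such as paste could generate names like output_get_ptr from OUTPUT automatically. Passing them explicitly keeps this sketch dependency-free.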

And here's the JavaScript, which includes a runCommand function that accepts input from the player and logs the output for that command. In this example, passing in "Look" logs "You look around", passing in "Hello" logs "Greetings, friend!", and anything else logs "Eh?".

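// lib holds the { module, instance } result of e.g.
// WebAssembly.instantiateStreaming(), loaded globally
function runCommand(commandText) {
    writeTextToWasm(lib.instance, "command", commandText, 100);
    lib.instance.exports.handle_command();
    const output = readTextFromWasm(lib.instance, "output");
    console.log(output);
}

// capacity is the size in bytes of the Rust buffer (100 for COMMAND);
// writing more would overwrite whatever sits next to it in WASM memory
function writeTextToWasm(wasmInstance, locationName, text, capacity) {
    const enc = new TextEncoder();
    // Truncate to the buffer size (this can cut a multi-byte character,
    // in which case String::from_utf8 on the Rust side fails)
    const jsBytes = enc.encode(text).subarray(0, capacity);
    const length = jsBytes.length;

    // Tell WASM how long the data is (a byte length, not a character count)
    wasmInstance.exports[locationName + "_set_len"](length);

    // Create the view right before writing: if WASM memory grows,
    // memory.buffer is replaced and older views are detached
    const myWasmArrayPtr = wasmInstance.exports[locationName + "_get_ptr"]();
    const myWasmArray = new Uint8Array(wasmInstance.exports.memory.buffer, myWasmArrayPtr, length);
    myWasmArray.set(jsBytes);
}

function readTextFromWasm(wasmInstance, locationName) {
    const myWasmArrayPtr = wasmInstance.exports[locationName + "_get_ptr"]();
    const length = wasmInstance.exports[locationName + "_get_len"]();
    const myWasmArray = new Uint8Array(wasmInstance.exports.memory.buffer, myWasmArrayPtr, length);
    const dec = new TextDecoder('utf-8');
    return dec.decode(myWasmArray);
}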

One final note: in the Rust code, run_command maps the player's input to the output. With this sort of model, I would call that an "internal" function, whereas the functions that expose memory pointers and lengths are "external". As I continue to explore this, I'm hoping there can be a clear separation between the internal and external functions, so that I can build an application however I want (within reason) and then add the external functions if I want to run the logic in a WASM module. To an extent, using statics breaks that model, since it requires the internal code to also use statics. In the future I'll explore whether there are ways to avoid that.
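One possible direction, sketched under assumptions: keep the game logic in an ordinary struct with no statics (the hypothetical Game and Room types below) and confine a single static to a thin external layer. This doesn't remove the static entirely, but the internal code no longer knows about it, so it can be tested or reused on its own. handle is a hypothetical stand-in for handle_command that returns a String instead of writing to OUTPUT.

```rust
use std::cell::RefCell;

// --- Internal: plain Rust, no statics, no WASM ---
pub struct Room {
    pub id: u32,
    pub description: &'static str,
}

pub struct Game {
    location: u32,
    rooms: Vec<Room>,
}

impl Game {
    pub fn new(rooms: Vec<Room>) -> Self {
        Game { location: 0, rooms }
    }

    pub fn go(&mut self, location: u32) {
        self.location = location;
    }

    pub fn run_command(&self, command: &str) -> String {
        match command.to_lowercase().as_str() {
            "look" => self.look().to_string(),
            "hello" => String::from("Greetings, friend!"),
            _ => String::from("Eh?"),
        }
    }

    fn look(&self) -> &'static str {
        self.rooms
            .iter()
            .find(|room| room.id == self.location)
            .map_or("You don't see anything", |room| room.description)
    }
}

// --- External: the only code that knows about statics or exports ---
thread_local! {
    static GAME: RefCell<Game> = RefCell::new(Game::new(vec![
        Room { id: 0, description: "A dusty hallway" },
        Room { id: 1, description: "A quiet library" },
    ]));
}

#[unsafe(no_mangle)]
pub extern "C" fn go(location: u32) {
    GAME.with_borrow_mut(|game| game.go(location));
}

// In a WASM build this would write into OUTPUT, as handle_command does.
fn handle(command: &str) -> String {
    GAME.with_borrow(|game| game.run_command(command))
}
```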

We'll leave it at that. From what I understand, this sort of coordination and copying between WASM and JavaScript is what wasm_bindgen automates, with js_sys providing bindings to JavaScript's built-in objects.