Hackers News

Wasm GC isn’t ready for realtime graphics — dthompson

Wasm GC is a wonderful thing that is now available in all major web
browsers since slowpoke Safari/WebKit finally shipped it in December.
It provides a hierarchy of heap allocated reference types and a set of
instructions to operate on them. Wasm GC enables managed memory
languages to take advantage of the advanced garbage collectors inside
web browser engines. It’s now possible to implement a managed memory
language without having to ship a GC inside the binary. The benefits
are smaller binaries, better performance, and better integration with
the host runtime.

However, Wasm GC has some serious drawbacks when compared to linear
memory. I enjoy playing around with realtime graphics programming in
my free time, but I was disappointed to discover that Wasm GC just
isn’t a good fit for that right now. I decided to write this post
because I’d like to see Wasm GC on more or less equal footing with
linear memory when it comes to binary data manipulation.

Hello triangle

For starters, let’s take a look at what a “hello triangle” WebGL demo
looks like with Wasm GC. I’ll use Hoot, the Scheme to Wasm compiler
that I work on, to build it.

Below is a Scheme program that declares imports for the subset of the
WebGL, HTML5 Canvas, etc. APIs that are necessary and then renders a
single triangle:

(use-modules (hoot ffi))

(define-foreign get-element-by-id
  "document" "getElementById"
  (ref string) -> (ref null extern))

(define-foreign element-width
  "element" "width"
  (ref extern) -> i32)
(define-foreign element-height
  "element" "height"
  (ref extern) -> i32)

(define-foreign get-canvas-context
  "canvas" "getContext"
  (ref extern) (ref string) -> (ref null extern))

(define GL_VERTEX_SHADER 35633)
(define GL_FRAGMENT_SHADER 35632)
(define GL_COMPILE_STATUS 35713)
(define GL_LINK_STATUS 35714)
(define GL_ARRAY_BUFFER 34962)
(define GL_STATIC_DRAW 35044)
(define GL_COLOR_BUFFER_BIT 16384)
(define GL_TRIANGLES 4)
(define GL_FLOAT 5126)
(define-foreign gl-create-shader
  "gl" "createShader"
  (ref extern) i32 -> (ref extern))
(define-foreign gl-delete-shader
  "gl" "deleteShader"
  (ref extern) (ref extern) -> none)
(define-foreign gl-shader-source
  "gl" "shaderSource"
  (ref extern) (ref extern) (ref string) -> none)
(define-foreign gl-compile-shader
  "gl" "compileShader"
  (ref extern) (ref extern) -> none)
(define-foreign gl-get-shader-parameter
  "gl" "getShaderParameter"
  (ref extern) (ref extern) i32 -> i32)
(define-foreign gl-get-shader-info-log
  "gl" "getShaderInfoLog"
  (ref extern) (ref extern) -> (ref string))
(define-foreign gl-create-program
  "gl" "createProgram"
  (ref extern) -> (ref extern))
(define-foreign gl-delete-program
  "gl" "deleteProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-attach-shader
  "gl" "attachShader"
  (ref extern) (ref extern) (ref extern) -> none)
(define-foreign gl-link-program
  "gl" "linkProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-use-program
  "gl" "useProgram"
  (ref extern) (ref extern) -> none)
(define-foreign gl-get-program-parameter
  "gl" "getProgramParameter"
  (ref extern) (ref extern) i32 -> i32)
(define-foreign gl-get-program-info-log
  "gl" "getProgramInfoLog"
  (ref extern) (ref extern) -> (ref string))
(define-foreign gl-create-buffer
  "gl" "createBuffer"
  (ref extern) -> (ref extern))
(define-foreign gl-delete-buffer
  "gl" "deleteBuffer"
  (ref extern) (ref extern) -> none)
(define-foreign gl-bind-buffer
  "gl" "bindBuffer"
  (ref extern) i32 (ref extern) -> none)
(define-foreign gl-buffer-data
  "gl" "bufferData"
  (ref extern) i32 (ref eq) i32 -> none)
(define-foreign gl-enable-vertex-attrib-array
  "gl" "enableVertexAttribArray"
  (ref extern) i32 -> none)
(define-foreign gl-vertex-attrib-pointer
  "gl" "vertexAttribPointer"
  (ref extern) i32 i32 i32 i32 i32 i32 -> none)
(define-foreign gl-draw-arrays
  "gl" "drawArrays"
  (ref extern) i32 i32 i32 -> none)
(define-foreign gl-viewport
  "gl" "viewport"
  (ref extern) i32 i32 i32 i32 -> none)
(define-foreign gl-clear-color
  "gl" "clearColor"
  (ref extern) f64 f64 f64 f64 -> none)
(define-foreign gl-clear
  "gl" "clear"
  (ref extern) i32 -> none)

(define (compile-shader gl type source)
  (let ((shader (gl-create-shader gl type)))
    (gl-shader-source gl shader source)
    (gl-compile-shader gl shader)
    (unless (= (gl-get-shader-parameter gl shader GL_COMPILE_STATUS) 1)
      (let ((info (gl-get-shader-info-log gl shader)))
        (gl-delete-shader gl shader)
        (error "shader compilation failed" info)))
    shader))

(define (link-shader gl vertex-shader fragment-shader)
  (let ((program (gl-create-program gl)))
    (gl-attach-shader gl program vertex-shader)
    (gl-attach-shader gl program fragment-shader)
    (gl-link-program gl program)
    (unless (= (gl-get-program-parameter gl program GL_LINK_STATUS) 1)
      (let ((info (gl-get-program-info-log gl program)))
        (gl-delete-program gl program)
        (error "program linking failed" info)))
    program))

(define canvas (get-element-by-id "canvas"))
(define gl (get-canvas-context canvas "webgl"))
(when (external-null? gl)
  (error "unable to create WebGL context"))

(define vertex-shader-source
  "attribute vec2 position;
attribute vec3 color;
varying vec3 fragColor;

void main() {
  gl_Position = vec4(position, 0.0, 1.0);
  fragColor = color;
}")
(define fragment-shader-source
  "precision mediump float;

varying vec3 fragColor;

void main() {
  gl_FragColor = vec4(fragColor, 1);
}")
(define vertex-shader
  (compile-shader gl GL_VERTEX_SHADER vertex-shader-source))
(define fragment-shader
  (compile-shader gl GL_FRAGMENT_SHADER fragment-shader-source))
(define shader (link-shader gl vertex-shader fragment-shader))

(define stride (* 4 5))
(define buffer (gl-create-buffer gl))
(gl-bind-buffer gl GL_ARRAY_BUFFER buffer)
(gl-buffer-data gl GL_ARRAY_BUFFER
                #f32(-1.0 -1.0
                      1.0  0.0  0.0
                      1.0 -1.0
                      0.0  1.0  0.0
                      0.0  1.0
                      0.0  0.0  1.0)
                GL_STATIC_DRAW)

(gl-viewport gl 0 0 (element-width canvas) (element-height canvas))
(gl-clear gl GL_COLOR_BUFFER_BIT)
(gl-use-program gl shader)
(gl-enable-vertex-attrib-array gl 0)
(gl-vertex-attrib-pointer gl 0 2 GL_FLOAT 0 stride 0)
(gl-enable-vertex-attrib-array gl 1)
(gl-vertex-attrib-pointer gl 1 3 GL_FLOAT 0 stride 8)
(gl-draw-arrays gl GL_TRIANGLES 0 3)

Note that in Scheme, the equivalent of a Uint8Array is a
bytevector. Hoot uses a packed array, an (array i8) specifically,
for the contents of a bytevector.
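
As a rough illustration of what "bytes on the other side" means for the vertex data above, here is a small JavaScript sketch. It assumes the f32 literal reaches the host as raw bytes; the variable names are made up for this example and are not part of Hoot.

```javascript
// The same 15 floats as the #f32(...) literal: x, y, r, g, b per vertex.
const triangleFloats = new Float32Array([
  -1.0, -1.0, 1.0, 0.0, 0.0,
   1.0, -1.0, 0.0, 1.0, 0.0,
   0.0,  1.0, 0.0, 0.0, 1.0,
]);

// View the same storage as bytes, which is how a bytevector presents it.
const triangleBytes = new Uint8Array(
  triangleFloats.buffer,
  triangleFloats.byteOffset,
  triangleFloats.byteLength
);

const FLOAT_SIZE = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per f32
const strideBytes = 5 * FLOAT_SIZE;       // matches (define stride (* 4 5))
const colorOffsetBytes = 2 * FLOAT_SIZE;  // color follows the 2 position floats

console.log(triangleBytes.length, strideBytes, colorOffsetBytes);
```

This is why the calls to gl-vertex-attrib-pointer use a stride of 20 and a color offset of 8: both are byte counts, not element counts.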

And here is the JavaScript code necessary to boot the resulting Wasm
binary:

window.addEventListener("load", async () => {
  function bytevectorToUint8Array(bv) {
    let len = reflect.bytevector_length(bv);
    let array = new Uint8Array(len);
    for (let i = 0; i < len; i++) {
      array[i] = reflect.bytevector_ref(bv, i);
    }
    return array;
  }

  let mod = await SchemeModule.fetch_and_instantiate("triangle.wasm", {
    reflect_wasm_dir: 'reflect-wasm',
    user_imports: {
      document: {
        getElementById: (id) => document.getElementById(id)
      },
      element: {
        width: (elem) => elem.width,
        height: (elem) => elem.height
      },
      canvas: {
        getContext: (elem, type) => elem.getContext(type)
      },
      gl: {
        createShader: (gl, type) => gl.createShader(type),
        deleteShader: (gl, shader) => gl.deleteShader(shader),
        shaderSource: (gl, shader, source) => gl.shaderSource(shader, source),
        compileShader: (gl, shader) => gl.compileShader(shader),
        getShaderParameter: (gl, shader, param) => gl.getShaderParameter(shader, param),
        getShaderInfoLog: (gl, shader) => gl.getShaderInfoLog(shader),
        createProgram: (gl) => gl.createProgram(),
        deleteProgram: (gl, program) => gl.deleteProgram(program),
        attachShader: (gl, program, shader) => gl.attachShader(program, shader),
        linkProgram: (gl, program) => gl.linkProgram(program),
        useProgram: (gl, program) => gl.useProgram(program),
        getProgramParameter: (gl, program, param) => gl.getProgramParameter(program, param),
        getProgramInfoLog: (gl, program) => gl.getProgramInfoLog(program),
        createBuffer: (gl) => gl.createBuffer(),
        deleteBuffer: (gl, buffer) => gl.deleteBuffer(buffer),
        bindBuffer: (gl, target, buffer) => gl.bindBuffer(target, buffer),
        bufferData: (gl, target, data, usage) => {
          let bv = new Bytevector(reflect, data);
          gl.bufferData(target, bytevectorToUint8Array(bv), usage);
        },
        enableVertexAttribArray: (gl, index) => gl.enableVertexAttribArray(index),
        vertexAttribPointer: (gl, index, size, type, normalized, stride, offset) => {
          gl.vertexAttribPointer(index, size, type, normalized, stride, offset);
        },
        drawArrays: (gl, mode, first, count) => gl.drawArrays(mode, first, count),
        viewport: (gl, x, y, w, h) => gl.viewport(x, y, w, h),
        clearColor: (gl, r, g, b, a) => gl.clearColor(r, g, b, a),
        clear: (gl, mask) => gl.clear(mask)
      }
    }
  });
  let reflect = await mod.reflect({ reflect_wasm_dir: 'reflect-wasm' });
  let proc = new Procedure(reflect, mod.get_export("$load").value);
  proc.call();
});

Hello problems

There are two major performance issues with this program. One is
visible in the source above, the other is hidden in the language
implementation.

Heap objects are opaque on the other side

Wasm GC heap objects are opaque to the host. Likewise, heap objects
from the host are opaque to the Wasm guest. Thus the contents of an
(array i8) object are not visible from JavaScript and the contents
of a Uint8Array are not visible from Wasm. This is a good security
property in the general case, but it’s a hindrance in this specific
case.

Let’s say we have an (array i8) full of vertex data we want to put
into a WebGL buffer. To do this, we must make one JS->Wasm call for
each byte in the array and store it into a Uint8Array. This is
what the bytevectorToUint8Array function above is doing. Copying
any significant amount of data per frame is going to tank performance.
Hope you aren’t trying to stream vertex data!

Contrast the previous paragraph with Wasm linear memory. A
WebAssembly.Memory object can be easily accessed from JavaScript
as an ArrayBuffer. To get a blob of vertex data out of a memory
object, you just need to know the byte offset and length and you’re
good to go. There are many Wasm linear memory applications using
WebGL successfully.
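
Here is a minimal sketch of that linear memory path in JavaScript. The byte offset and the vertexView helper are hypothetical, and the data is written from JavaScript here only to stand in for a Wasm module that would normally produce it.

```javascript
// One 64 KiB page of linear memory, as a Wasm module would export it.
const linearMemory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical address the Wasm side would report for its vertex data.
// It must be a multiple of 4 to be viewed as Float32Array.
const vertexByteOffset = 1024;
const vertexValues = [
  -1, -1, 1, 0, 0,
   1, -1, 0, 1, 0,
   0,  1, 0, 0, 1,
];

// Stand-in for the Wasm module writing its vertex data.
new Float32Array(linearMemory.buffer, vertexByteOffset, vertexValues.length)
  .set(vertexValues);

// Build a typed array view over linear memory. No bytes are copied.
function vertexView(memory, byteOffset, floatCount) {
  return new Float32Array(memory.buffer, byteOffset, floatCount);
}

const view = vertexView(linearMemory, vertexByteOffset, vertexValues.length);
// In a browser, the view could be handed straight to WebGL:
//   gl.bufferData(gl.ARRAY_BUFFER, view, gl.STATIC_DRAW);
```

One caveat: growing the memory detaches the old ArrayBuffer, so views like this should be recreated after any grow rather than cached.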

Manipulating multi-byte binary data is inefficient

To read a multi-byte number such as an unsigned 32-bit integer from an
(array i8), you have to fetch each individual byte and combine them
together. Here’s a self-contained example that uses Guile-flavored
WAT format:

(module
 (type $bytevector (array i8))
 (data $init #u32(123456789))
 (func (export "main") (result i32)
       (local $a (ref $bytevector))
       (local.set $a (array.new_data $bytevector $init
                                     (i32.const 0)
                                     (i32.const 4)))
       (array.get_u $bytevector (local.get $a) (i32.const 0))
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 1))
                (i32.const 8))
       (i32.or)
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 2))
                (i32.const 16))
       (i32.or)
       (i32.shl (array.get_u $bytevector (local.get $a) (i32.const 3))
                (i32.const 24))
       (i32.or)))

By contrast, Wasm linear memory needs but a single i32.load
instruction:

(module
 (memory 1)
 (func (export "main") (result i32)
       (i32.store (i32.const 0) (i32.const 123456789))
       (i32.load (i32.const 0))))

Easy peasy. Not only is it less code, it’s a lot more efficient.
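
To make the arithmetic in the two listings above easier to follow, here is the same contrast sketched in JavaScript. This only illustrates the logic and is not a benchmark; the function names are made up, and DataView stands in for the single i32.load.

```javascript
// Four bytes holding 123456789 (0x075BCD15) in little-endian order.
const u32Bytes = new Uint8Array(4);
new DataView(u32Bytes.buffer).setUint32(0, 123456789, true);

// Byte-wise read, mirroring the (array i8) version:
// fetch each byte, shift it into place, and OR the pieces together.
function readU32ByteWise(bytes, i) {
  return (
    (bytes[i] |
      (bytes[i + 1] << 8) |
      (bytes[i + 2] << 16) |
      (bytes[i + 3] << 24)) >>> 0 // force an unsigned result
  );
}

// Direct read, mirroring the linear memory version: one load.
function readU32Direct(bytes, i) {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return dv.getUint32(i, true); // true = little-endian
}

const byteWise = readU32ByteWise(u32Bytes, 0);
const direct = readU32Direct(u32Bytes, 0);
console.log(byteWise === direct);
```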

The triangle example above uses static vertex data stuffed into
bytevector literals, so it doesn’t hit this problem, but real programs
that have dynamic buffer data will be slower than their linear memory
equivalents.

Unsatisfying workarounds

There’s no way around the multi-byte problem at the moment, but for
byte access from JavaScript there are some things we could try to work
with what we have been given. Spoiler alert: None of them are
pleasant.

Use Uint8Array from the host

This approach makes all binary operations from within the Wasm binary
slow since we’d have to cross the Wasm->JS bridge for each read/write.
Since most of the binary data manipulation is happening in the Wasm
module, this approach will just make things slower overall.

Use linear memory for bytevectors

This would require a little malloc/free implementation and a way
to reclaim memory for GC’d bytevectors. You could register every
bytevector in a FinalizationRegistry in order to be notified upon GC
and free the memory. Now you have to deal with memory
fragmentation. This is Wasm GC, we shouldn’t have to do any of this!
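
A rough sketch of the bookkeeping this implies, written in JavaScript. Everything here is hypothetical: BlockAllocator and makeBytevectorHandle are invented for illustration, a plain object stands in for the GC-managed bytevector, and fixed-size blocks are used precisely because they sidestep the fragmentation a real variable-size allocator would face.

```javascript
// Hypothetical fixed-size block allocator over Wasm linear memory.
class BlockAllocator {
  constructor(memory, blockSize) {
    this.memory = memory;
    this.blockSize = blockSize;
    this.next = 0;      // bump pointer into never-used space
    this.freeList = []; // offsets of blocks returned by free()
  }
  malloc() {
    // Reuse a freed block if there is one.
    if (this.freeList.length > 0) return this.freeList.pop();
    const offset = this.next;
    if (offset + this.blockSize > this.memory.buffer.byteLength) {
      throw new Error("out of linear memory");
    }
    this.next += this.blockSize;
    return offset;
  }
  free(offset) {
    this.freeList.push(offset);
  }
}

const heapMemory = new WebAssembly.Memory({ initial: 1 });
const allocator = new BlockAllocator(heapMemory, 256);

// The callback receives the held value (the block offset) at some point
// after the registered handle has been garbage collected.
const registry = new FinalizationRegistry((offset) => allocator.free(offset));

// Stand-in for creating a bytevector whose storage lives in linear memory.
function makeBytevectorHandle() {
  const handle = { offset: allocator.malloc(), length: allocator.blockSize };
  registry.register(handle, handle.offset);
  return handle;
}
```

Note that finalization callbacks are not guaranteed to run promptly, or at all, so memory may be reclaimed much later than the bytevector actually dies.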

Use linear memory as a scratch space

This avoids crossing the Wasm/JS boundary for each byte, but still
involves a byte-by-byte copy from (array i8) to linear memory within
the Wasm module. So far this feels like the least worst option, but
the extra copy is still going to greatly reduce throughput.

Wasm GC needs some fixin’

I’ve used realtime graphics as an example because it’s a use case that
is very sensitive to performance issues, but this unfortunate need to
copy binary data byte-by-byte is also the reason why strings are
trash on Wasm GC right now. Stringref is a good
proposal and the Wasm community group made a mistake by rejecting it.

Anyway, there has been some discussion about both multi-byte and
ArrayBuffer access on GitHub, but as far as I can tell neither issue
is anywhere close to a resolution.

Can these things be implemented efficiently? How can the need for
direct access to packed arrays from JS be reconciled with Wasm heap
object opaqueness? I hope the Wasm community group can arrive at
solutions sooner rather than later, because it will take a long time,
perhaps years, to get the proposal(s) to phase 4 and shipped in all
browsers.
I am getting by making simple things with HTML5 Canvas but it would be
a shame to be effectively shut out from using WebGPU when it finally
reaches stable browser releases.
