
Using SIMD processor extensions in OxCaml

The OxCaml compiler provides built-in 128-bit and 256-bit SIMD vector types, as well as intrinsics for amd64 SIMD instructions up to and including AVX2.

To get started with SIMD, add ocaml_simd, together with ocaml_simd_avx or ocaml_simd_sse, to your dependencies. You will also want to use ppx_simd, which provides convenient syntax for defining constants such as blend and shuffle masks.

Types

When SIMD is enabled, the following SIMD vector types are available:

int8x16         int8x32
int8x16#        int8x32#
int16x8         int16x16
int16x8#        int16x16#
int32x4         int32x8
int32x4#        int32x8#
int64x2         int64x4
int64x2#        int64x4#
float32x4       float32x8
float32x4#      float32x8#
float64x2       float64x4
float64x2#      float64x4#

The types ending with # are unboxed: they are passed between functions in XMM/YMM registers, stored in structures as flat data, and may be stored in flat arrays. The corresponding intrinsics operate on unboxed vectors. For more detail on unboxed types, see the docs.

The types without # are boxed: when passed to a non-inlined function, they are copied to a heap-allocated (abstract) block. Boxed vectors are not necessarily aligned, so accessing them generates unaligned load/store instructions.

Within a function, all SIMD vectors live in floating-point registers or 16-byte aligned stack slots.

Intrinsics

SIMD vectors are opaque: no operations on them are built into the language. Instead, the compiler translates certain “builtin” externals directly to SIMD instructions. Your code should use ocaml_simd_avx or ocaml_simd_sse, which expose OxCaml APIs for these intrinsics.

module Float32x4 = Ocaml_simd_sse.Float32x4

let v = Float32x4.set 1.0 2.0 3.0 4.0
let v = Float32x4.sqrt v
let x, y, z, w = Float32x4.splat v
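
To make the lane-wise behavior concrete, the following is a scalar reference model in plain OCaml, with a four-element float array standing in for a float32x4. It is only a sketch: it does not use ocaml_simd, the names set4 and sqrt_lanes are hypothetical, and the lanes are computed in double precision rather than the 32-bit floats a real vector holds.

```ocaml
(* Scalar model of a 4-lane float vector; NOT the ocaml_simd API. *)
let set4 a b c d = [| a; b; c; d |]

(* Apply sqrt to each lane independently, as a SIMD sqrt does. *)
let sqrt_lanes v = Array.map sqrt v

let v = sqrt_lanes (set4 1.0 4.0 9.0 16.0)
(* v is [| 1.0; 2.0; 3.0; 4.0 |] *)
```

The real intrinsic performs all four square roots with a single instruction; the model only illustrates which value ends up in which lane.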

SIMD vectors may be loaded from / stored to strings, bytes, bigstrings, and arrays of the corresponding unboxed type. Load and store operations are also provided by the intrinsics libraries rather than Base or Core.

module Int8x16 = Ocaml_simd_sse.Int8x16
module Int64x2 = Ocaml_simd_sse.Int64x2
module Float64x2 = Ocaml_simd_sse.Float64x2

let text = "abcdefghijklmnopqrstuvwxyz"
let floats = [| 1.0; 2.0 |]
let ints = [| 1; 2 |]

let _ = Int8x16.String.get text ~byte:0
let _ = Float64x2.Float_array.get floats ~idx:0 (* Float array optimization required *)
let _ = Int64x2.Immediate_array.get_tagged ints ~idx:0
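
As an illustration of what a 128-bit load from a string covers, here is a scalar model in plain OCaml. A 16-lane, 8-bit vector load reads 16 consecutive bytes starting at the given offset. The function load_int8x16 is hypothetical and is not part of ocaml_simd; returning None when fewer than 16 bytes remain is an assumption of this sketch, not a statement about how the library handles out-of-bounds access.

```ocaml
(* Scalar model of a 16-byte load from a string; NOT the ocaml_simd API.
   Bytes are returned as unsigned codes (0..255); a real int8x16 lane
   would hold the same bits interpreted as a signed 8-bit integer. *)
let load_int8x16 (s : string) ~(byte : int) : int array option =
  if byte < 0 || byte + 16 > String.length s then None
  else Some (Array.init 16 (fun i -> Char.code s.[byte + i]))

let text = "abcdefghijklmnopqrstuvwxyz"
let first = load_int8x16 text ~byte:0     (* lanes hold 'a' .. 'p' *)
let last = load_int8x16 text ~byte:10     (* lanes hold 'k' .. 'z' *)
let past_end = load_int8x16 text ~byte:11 (* None: only 15 bytes remain *)
```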

Some operations require the user to choose a specific behavior at compile time. To do so, you must provide a compile-time constant generated by ppx_simd. Refer to ppx_simd for more details.

module Int32x4 = Ocaml_simd_sse.Int32x4

let x = Int32x4.set 0 2 4 6
let y = Int32x4.set 1 3 5 7
let z = Int32x4.blend [%blend 0, 1, 0, 1] x y
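
The effect of a blend mask can be sketched with a scalar model in plain OCaml. This is not the ocaml_simd API: blend4 is a hypothetical helper, and the convention that a mask entry of 0 keeps the lane from the first operand while 1 takes it from the second is an assumption modeled on the x86 blend instructions. The order in which set assigns its arguments to lanes is likewise assumed to be left to right.

```ocaml
(* Scalar model of a 4-lane blend; NOT the ocaml_simd API.
   Assumption: mask entry 0 keeps the lane from [x], 1 takes it from [y]. *)
let blend4 (mask : int array) (x : int array) (y : int array) : int array =
  Array.init 4 (fun i -> if mask.(i) = 0 then x.(i) else y.(i))

let x = [| 0; 2; 4; 6 |]
let y = [| 1; 3; 5; 7 |]
let z = blend4 [| 0; 1; 0; 1 |] x y
(* Under the stated assumption, z is [| 0; 3; 4; 7 |] *)
```

In the model the mask is an ordinary runtime array. The real intrinsic differs in that the mask is encoded into the instruction itself, which is why it must be a compile-time constant.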

C ABI

Like floats, both boxed and unboxed SIMD vectors may be passed to C stubs. The OxCaml runtime provides several helper functions for working with SIMD vectors.

external vec128_stub : (int8x16[@unboxed]) -> (int8x16[@unboxed]) =
  "boxed_vec128_stub" "unboxed_vec128_stub"

external vec256_stub : (int8x32[@unboxed]) -> (int8x32[@unboxed]) =
  "boxed_vec256_stub" "unboxed_vec256_stub"

#include <caml/simd.h>

__m128i unboxed_vec128_stub(__m128i v) {
  return v;
}
__m256i unboxed_vec256_stub(__m256i v) {
  return v;
}

value boxed_vec128_stub(value v) {
  return caml_copy_vec128i(unboxed_vec128_stub(Vec128_vali(v)));
}
value boxed_vec256_stub(value v) {
  return caml_copy_vec256i(unboxed_vec256_stub(Vec256_vali(v)));
}

Future Work

Support for NEON and AVX512 is coming soon.