Using SIMD processor extensions in OxCaml
The OxCaml compiler provides built-in 128-bit and 256-bit SIMD vector types, as well as intrinsics for amd64 SIMD instructions up to and including AVX2.
To get started with SIMD, add ocaml_simd and ocaml_simd_avx or ocaml_simd_sse to your dependencies.
You will also want to use ppx_simd, which provides convenient syntax for
defining constants like blend and shuffle masks.
Types
When SIMD is enabled, the following SIMD vector types are available:
int8x16 int8x32
int8x16# int8x32#
int16x8 int16x16
int16x8# int16x16#
int32x4 int32x8
int32x4# int32x8#
int64x2 int64x4
int64x2# int64x4#
float32x4 float32x8
float32x4# float32x8#
float64x2 float64x4
float64x2# float64x4#
The types ending with # are unboxed: they are passed between functions in
XMM/YMM registers, stored in structures as flat data, and may be stored in flat
arrays. The corresponding intrinsics operate on unboxed vectors. For more detail
on unboxed types, see the docs.
The types without # are boxed: when passed to a non-inlined function, they
are copied to a heap-allocated (abstract) block. Boxed vectors are not
necessarily aligned, so accesses to them use unaligned load/store instructions.
Within a function, all SIMD vectors live in floating-point registers or 16-byte aligned stack slots.
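To make the alignment remark concrete, here is a minimal C sketch of what "unaligned load/store" means at the instruction level. It is not the code OxCaml emits. It uses the standard SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128, which accept any address; the helper name copy16_unaligned is hypothetical.

```c
#include <emmintrin.h> /* SSE2, baseline on amd64 */

/* Copy 16 bytes between addresses that need not be 16-byte aligned.
   The aligned variants (_mm_load_si128 / _mm_store_si128) may fault
   when given a misaligned address, so they are not used here. */
void copy16_unaligned(const unsigned char *src, unsigned char *dst) {
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
}
```

Calling it with src + 1 and dst + 3 from ordinary byte buffers is valid, which is the property a boxed vector at an arbitrary heap address relies on.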
Intrinsics
SIMD vectors are opaque: no operations on them are built into the
language. Instead, the compiler translates certain “builtin” externals directly
to SIMD instructions. Your code should use ocaml_simd_avx or ocaml_simd_sse,
which expose OxCaml APIs for these intrinsics.
module Float32x4 = Ocaml_simd_sse.Float32x4
let v = Float32x4.set 1.0 2.0 3.0 4.0
let v = Float32x4.sqrt v
let x, y, z, w = Float32x4.splat v
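For readers who know the C intrinsics, the following sketch shows roughly what a set followed by a lane-wise square root corresponds to in SSE. It is an illustration under assumptions, not the library's implementation: the helper name sqrt4 is hypothetical, and it uses _mm_setr_ps so that the first argument lands in lane 0, whereas the text above does not state how Float32x4.set orders its lanes.

```c
#include <xmmintrin.h> /* SSE */

/* Pack four floats, take the lane-wise square root, and write the
   four results to out[0..3] in the same order as the inputs. */
void sqrt4(float a, float b, float c, float d, float out[4]) {
    __m128 v = _mm_setr_ps(a, b, c, d); /* a goes in lane 0 */
    v = _mm_sqrt_ps(v);                 /* lane-wise square root */
    _mm_storeu_ps(out, v);              /* unaligned store of all 4 lanes */
}
```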
SIMD vectors may be loaded from / stored to strings, bytes, bigstrings, and arrays of the corresponding unboxed type. Load and store operations are also provided by the intrinsics libraries rather than Base or Core.
module Int8x16 = Ocaml_simd_sse.Int8x16
module Float64x2 = Ocaml_simd_sse.Float64x2
module Int64x2 = Ocaml_simd_sse.Int64x2
let text = "abcdefghijklmnopqrstuvwxyz"
let floats = [| 1.0; 2.0 |]
let ints = [| 1; 2 |]
let _ = Int8x16.String.get text ~byte:0
let _ = Float64x2.Float_array.get floats ~idx:0 (* Float array optimization required *)
let _ = Int64x2.Immediate_array.get_tagged ints ~idx:0
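As an illustration of why loading a vector straight from a string is useful, here is a small C sketch that loads 16 bytes and searches them for a character in one comparison. It is written against the standard SSE2 intrinsics rather than the OxCaml API; the function name find_byte16 is hypothetical, and the caller is assumed to guarantee that 16 bytes are readable at the given address.

```c
#include <emmintrin.h> /* SSE2 */

/* Return the index of the first occurrence of c in the 16 bytes at p,
   or -1 if it does not occur. p must point to at least 16 readable
   bytes; no alignment is required. */
int find_byte16(const char *p, char c) {
    __m128i chunk = _mm_loadu_si128((const __m128i *)p);
    __m128i eq = _mm_cmpeq_epi8(chunk, _mm_set1_epi8(c));
    int mask = _mm_movemask_epi8(eq); /* bit i set iff byte i matched */
    if (mask == 0) return -1;
    int i = 0;
    while (((mask >> i) & 1) == 0) i++; /* lowest set bit */
    return i;
}
```

With the 26-letter string from the example above, searching the first 16 bytes finds 'c' at index 2 and does not find 'z', which lies beyond the loaded chunk.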
Some operations require the user to choose a specific behavior at compile
time. To do so, you must provide a compile-time constant generated by
ppx_simd. Refer to the ppx_simd documentation for more details.
module Int32x4 = Ocaml_simd_sse.Int32x4
let x = Int32x4.set 0 2 4 6
let y = Int32x4.set 1 3 5 7
let z = Int32x4.blend [%blend 0, 1, 0, 1] x y
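The semantics of a lane blend can be sketched in C as follows. Hardware blend instructions encode the lane selector as an immediate operand, which is why the constant must be known at compile time; to stay within baseline SSE2 this sketch emulates the selection with a bit mask instead. The names blend_epi32_sse2 and blend_example are hypothetical, and the convention that a 1 selects the lane from the second operand follows Intel's blend instructions. Whether [%blend 0, 1, 0, 1] maps onto exactly this convention is an assumption.

```c
#include <emmintrin.h> /* SSE2 */

/* Emulate a 4-lane 32-bit blend: if bit i of imm is set, lane i of
   the result comes from b; otherwise it comes from a. */
__m128i blend_epi32_sse2(__m128i a, __m128i b, int imm) {
    __m128i m = _mm_setr_epi32((imm & 1) ? -1 : 0, (imm & 2) ? -1 : 0,
                               (imm & 4) ? -1 : 0, (imm & 8) ? -1 : 0);
    /* (m & b) | (~m & a) */
    return _mm_or_si128(_mm_and_si128(m, b), _mm_andnot_si128(m, a));
}

/* Blend (0, 2, 4, 6) with (1, 3, 5, 7), taking lanes 1 and 3 from the
   second vector, and write the four result lanes to out. */
void blend_example(int out[4]) {
    __m128i x = _mm_setr_epi32(0, 2, 4, 6);
    __m128i y = _mm_setr_epi32(1, 3, 5, 7);
    _mm_storeu_si128((__m128i *)out, blend_epi32_sse2(x, y, 0xA));
}
```

Under the stated convention the result lanes are 0, 3, 4, 7.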
C ABI
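Like floats, both boxed and unboxed SIMD vectors may be passed to C stubs. In each external declaration, the first C symbol is the stub that receives boxed values and the second is the native-code stub that receives the [@unboxed] arguments directly. The OxCaml runtime header caml/simd.h provides helper functions for working with SIMD vectors, such as Vec128_vali and caml_copy_vec128i.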
external vec128_stub : (int8x16[@unboxed]) -> (int8x16[@unboxed]) =
"boxed_vec128_stub" "unboxed_vec128_stub"
external vec256_stub : (int8x32[@unboxed]) -> (int8x32[@unboxed]) =
"boxed_vec256_stub" "unboxed_vec256_stub"
#include <caml/simd.h>
__m128i unboxed_vec128_stub(__m128i v) {
  return v;
}
__m256i unboxed_vec256_stub(__m256i v) {
  return v;
}
value boxed_vec128_stub(value v) {
  return caml_copy_vec128i(unboxed_vec128_stub(Vec128_vali(v)));
}
value boxed_vec256_stub(value v) {
  return caml_copy_vec256i(unboxed_vec256_stub(Vec256_vali(v)));
}
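The stubs above simply return their argument. A stub that does real work might look like the following sketch of the unboxed side only. The name unboxed_vec128_incr is hypothetical; a boxed counterpart would presumably wrap it with Vec128_vali and caml_copy_vec128i in the same way as boxed_vec128_stub, and is omitted here so that the sketch compiles without the OCaml headers.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical native-side stub: add 1 to each of the 16 bytes of v,
   wrapping around on overflow (255 + 1 becomes 0). */
__m128i unboxed_vec128_incr(__m128i v) {
    return _mm_add_epi8(v, _mm_set1_epi8(1));
}
```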
Future Work
Support for NEON and AVX-512 is coming soon.