Using SIMD processor extensions in OxCaml
The OxCaml compiler provides built-in 128-bit SIMD vector types, as well as intrinsics for amd64 SIMD instructions up to and including SSE4.2.
To get started with SIMD, add the ocaml_simd_sse library to your dependencies.
You may also want to use ppx_simd, which provides convenient syntax for
defining constants like blend and shuffle masks.
Types
When SIMD is enabled, the following 128-bit SIMD vector types are available:
int8x16
int8x16#
int16x8
int16x8#
int32x4
int32x4#
int64x2
int64x2#
float32x4
float32x4#
float64x2
float64x2#
The types ending with # are unboxed: they are passed between functions in XMM
registers, stored in structures as flat data, and may be stored in flat arrays.
The operations provided by Ocaml_simd_sse operate on unboxed vectors. For
more detail on unboxed types, see the docs.
The types without # are boxed: when passed to a non-inlined function, they
are copied to a heap-allocated (abstract) block. Boxed vectors are not
necessarily 16-byte aligned, so the compiler generates unaligned load/store
instructions for them.
Within a function, all SIMD vectors live in floating-point registers or 16-byte aligned stack slots.
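The alignment distinction above corresponds to two different SSE2 load instructions. As a rough illustration in C (not OxCaml compiler output), the sketch below uses the standard `<emmintrin.h>` intrinsics `_mm_loadu_si128` (unaligned) and `_mm_load_si128` (aligned). The function names `load_any`, `load_aligned16`, and `lane32` are hypothetical and exist only for this example.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Unaligned load: valid for any address, as needed for a vector stored in
   a block that is not guaranteed to be 16-byte aligned. */
static __m128i load_any(const void *p) {
    return _mm_loadu_si128((const __m128i *)p);
}

/* Aligned load: p must be a multiple of 16, as with a 16-byte aligned
   stack slot. A misaligned pointer here is undefined behavior and
   typically faults. */
static __m128i load_aligned16(const void *p) {
    return _mm_load_si128((const __m128i *)p);
}

/* Helper for inspecting a result: returns lane i of a 4 x int32 vector. */
static int32_t lane32(__m128i v, int i) {
    int32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

The unaligned form is always safe to use; the aligned form is only an option when the address is known to be a multiple of 16.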
Intrinsics
SIMD vectors are opaque: no operations on them are built into the
language. Instead, the compiler translates certain “builtin” externals directly
to SIMD instructions. Your code should use the ocaml_simd_sse library, which
exposes an OxCaml API for these intrinsics.
module Float32x4 = Ocaml_simd_sse.Float32x4
let v = Float32x4.set 1.0 2.0 3.0 4.0
let v = Float32x4.sqrt v
let x, y, z, w = Float32x4.splat v
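For readers more familiar with the C intrinsics, here is a minimal sketch of the kind of lane-wise operation involved. It uses the standard SSE intrinsic `_mm_sqrt_ps`, which maps to the `sqrtps` instruction. That a call such as `Float32x4.sqrt` lowers to the same instruction is an assumption, not something stated above, and `sqrt4` is a hypothetical name.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Lane-wise square root of four floats. */
static void sqrt4(const float in[4], float out[4]) {
    __m128 v = _mm_loadu_ps(in); /* load 4 floats, no alignment required */
    v = _mm_sqrt_ps(v);          /* square root of every lane at once */
    _mm_storeu_ps(out, v);       /* write the 4 results back */
}
```

Each output lane depends only on the input lane at the same position.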
SIMD vectors may be loaded from / stored to strings, bytes, bigstrings, and
arrays of the corresponding unboxed type. Load and store operations are also
provided by ocaml_simd_sse, rather than Base or Core.
module Int8x16 = Ocaml_simd_sse.Int8x16
module Int64x2 = Ocaml_simd_sse.Int64x2
module Float64x2 = Ocaml_simd_sse.Float64x2
let text = "abcdefghijklmnopqrstuvwxyz"
let floats = [| 1.0; 2.0 |]
let ints = [| 1; 2 |]
let _ = Int8x16.String.get text ~byte:0
let _ = Float64x2.Float_array.get floats ~idx:0 (* Float array optimization required *)
let _ = Int64x2.Immediate_array.get_tagged ints ~idx:0
Some operations require the user to choose a specific behavior at compile
time. To do so, you must provide a compile-time constant generated by
ppx_simd. Refer to the ppx_simd documentation for more details.
module Int32x4 = Ocaml_simd_sse.Int32x4
let x = Int32x4.set 0 2 4 6
let y = Int32x4.set 1 3 5 7
let z = Int32x4.blend [%blend 0, 1, 0, 1] x y
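To make the lane selection concrete, here is a minimal C sketch of a four-lane 32-bit blend. It assumes that a 0 in the mask keeps the lane from the first operand and a 1 takes it from the second, which is how the SSE4.1 blend immediates behave. Whether `[%blend ...]` follows the same convention should be checked against the ppx_simd documentation. The sketch emulates the blend with SSE2 and/andnot/or rather than the SSE4.1 instruction, so it builds without extra compiler flags, and it takes the selector at run time. The SSE4.1 blend instructions instead encode the selector as an immediate operand, which is consistent with the requirement for a compile-time constant. `blend4` is a hypothetical name.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per-lane select: lane i comes from y when sel[i] != 0, else from x. */
static __m128i blend4(const int sel[4], __m128i x, __m128i y) {
    /* Build an all-ones or all-zeros mask for each 32-bit lane. */
    __m128i mask = _mm_setr_epi32(sel[0] ? -1 : 0, sel[1] ? -1 : 0,
                                  sel[2] ? -1 : 0, sel[3] ? -1 : 0);
    /* (mask & y) | (~mask & x) */
    return _mm_or_si128(_mm_and_si128(mask, y), _mm_andnot_si128(mask, x));
}
```

With lanes (0, 2, 4, 6) for x, (1, 3, 5, 7) for y, and selector 0, 1, 0, 1, this sketch produces (0, 3, 4, 7).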
C ABI
Like floats, both boxed and unboxed SIMD vectors may be passed to C stubs. The OxCaml runtime provides several helper functions for working with SIMD vectors.
(* The first name is the bytecode (boxed) stub, the second the native (unboxed) stub. *)
external simd_stub : (int8x16[@unboxed]) -> (int8x16[@unboxed]) =
  "boxed_integer_simd_stub" "unboxed_integer_simd_stub"
(* ... *)
#include <caml/simd.h>

__m128i unboxed_integer_simd_stub(__m128i v) {
    return v;
}

value boxed_integer_simd_stub(value v) {
    return caml_copy_vec128i(unboxed_integer_simd_stub(Vec128_vali(v)));
}
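A stub that does real work follows the same shape. The sketch below shows only the unboxed half, written against the standard `<emmintrin.h>` header so that it compiles without the OxCaml runtime. The name `unboxed_add_one_stub` is hypothetical. A boxed counterpart would wrap it with `Vec128_vali` and `caml_copy_vec128i` in the same way as the identity stub above.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Add 1 to each of the 16 signed bytes, wrapping around on overflow. */
__m128i unboxed_add_one_stub(__m128i v) {
    return _mm_add_epi8(v, _mm_set1_epi8(1));
}
```

Because the argument and result are plain `__m128i` values, the unboxed half never touches the OCaml heap.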
Future Work
Support for wider vectors and NEON/AVX2/AVX512 intrinsics is coming soon.