The “unboxed types” extension provides users with additional control over the way their data is represented in memory and registers. To distinguish them from normal OCaml types, these new types have different layouts; a type’s layout is part of its kind.

This page gives a comprehensive overview of the extension. Unboxed types are still in active development, with new features being added frequently, and the documentation here is kept up to date.

Layouts

Every type is now classified by a layout, much like how every expression is classified by a type. The type system knows about a collection of fixed base layouts:

The type system also supports one composite layout: unboxed products:

Over time, we’ll be adding more layouts here.

Layout annotation

You can annotate type variables of type declarations with a layout, like this:

type ('a : immediate) t1
type ('a : float64) t2
type ('a : immediate, 'b : bits32) t3

If you do not annotate a type variable, we use layout inference to figure out the layout, though we know layout inference is incomplete in certain complicated scenarios with mutually recursive type definitions. If layout inference does not work, you will get an error asking you to write a layout annotation; we will never infer an incorrect layout. If layout inference does not fix the layout of a type variable, it is defaulted to have layout value.

You can also annotate a type declaration itself with a layout (for example, to mark the type as a non-value) using the following syntax:

type t4 : float64
type t5 : value  (* redundant, but you can do it if you like *)
type t6 : bits64 = int64# (* redundant since the layout can be deduced from the rhs *)

A type declared with no = signs (often in a signature) and no layout information defaults to layout value. Types with = signs deduce their layout from their right-hand sides.

Annotations can also be used within type expressions:

module type S = sig
  (* An annotation at binding sites sets the layout of the universal variable.
     Unannotated variables have layout `value`. *)
  val f1 : ('a : float64) ('b : immediate) 'c. 'a -> 'b -> 'c

  (* As shown here, annotation can't be placed directly on arbitrary types.
     [(int : immediate)] would be invalid syntax. The same can be achieved with
     [(int as (_ : immediate))]. *)
  val f2 : (int as ('a : immediate)) -> ('a : value) -> 'a
end

(* Note that annotations are always treated as upper bounds.
   The following is valid: *)
type ('a : immediate) t
type ('a : value) t2 = 'a t
(* ^ This will get typed as [type ('a : immediate) t2 = 'a t] *)

(* Here are a few more places where you can write annotations *)
let f3 (type a : immediate): a -> a = fun x -> x
let f4 (type (a : immediate) (b : float64) c) (x : a) (y: b) (z: c) = x
let f5 x = (x : (_ : immediate))
let f6: type (a: bits32). a -> a = fun x -> x

The full syntax can be found in the documentation for kinds, which also provides reasoning around some design decisions and contains additional examples. The complete annotation design is not yet implemented, so for now the syntax should be read with kind ::= layout-name.

Layouts in module inclusion

Layouts are part of kinds, and therefore work just like kinds for the purposes of module inclusion. See the kinds documentation for more.

Unboxed numbers

We now have float#, int32#, int64#, nativeint#, and unboxed 128-bit vector types. They are the types for unboxed numbers. These all are stored without pointers; working with them does not cause any allocation.

Most unboxed numeric types have their own layout: float# : float64, int32# : bits32, int64# : bits64, nativeint# : word.

All of the 128-bit vectors have the same layout: float32x4# : vec128, float64x2# : vec128, int8x16# : vec128, int16x8# : vec128, int32x4# : vec128, and int64x2# : vec128.

Using layouts, you can usefully make a synonym of float# (or any of the other unboxed types) that has layout float64, for example with module M : sig type t : float64 ... end = struct type t = float# ... end.

Each numeric type has its own library for working with it: float_u, int32_u, int64_u, and nativeint_u (all in the janestreet_shims library).
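As a rough sketch of the synonym pattern described above, the module below hides float# behind an abstract type that is still known to have layout float64. The module name Meters and its three functions are hypothetical and chosen only for illustration. Float_u.add and Float_u.to_float are used elsewhere on this page; Float_u.of_float is assumed to be the corresponding conversion from a boxed float. The sketch also assumes Float_u is in scope, as in the other examples here.

```ocaml
(* Requires the OxCaml compiler. [Meters] is a hypothetical example module. *)
module Meters : sig
  type t : float64 (* abstract, but still known to be an unboxed float *)

  val of_float : float -> t
  val add : t -> t -> t
  val to_float : t -> float
end = struct
  type t = float#

  let of_float x = Float_u.of_float x (* assumed conversion function *)
  let add x y = Float_u.add x y
  let to_float x = Float_u.to_float x
end

(* Clients only box at the edges, when converting to or from [float]. *)
let total : float =
  Meters.to_float (Meters.add (Meters.of_float 1.5) (Meters.of_float 2.0))
```

Because the signature exposes the layout, clients can pass Meters.t values around unboxed even though they cannot see that the type is float#.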

Unboxed options

We now have type 'a or_null : value_or_null, the type of unboxed options. It has constructors Null and This v. See the or_null document for more details.

Unboxed products

The unboxed product layout describes types that work like normal products (e.g., tuples or records), but which are represented without a box.

In stock OCaml, a tuple is a pointer to a block containing the elements of the tuple. If you pass a tuple to a function, it is passed by reference in one register. The function can access the tuple’s elements through the pointer. Records and their fields are treated similarly. By contrast, an unboxed product does not refer to a block at all. When used as a function argument or return type, its elements are passed separately, with no indirection: each goes in its own register, or on the call stack if the product has more elements than there are available registers.

Currently, types that have unboxed product layouts are unboxed tuples and unboxed records.

Unboxed tuples are written #(...), and may have labels just like normal tuples. So, for example, you can write:

module Flipper : sig
  val flip : #(int * float# * lbl:string) -> #(lbl:string * float# * int)
end = struct
  let flip #(x,y,~lbl:z) = #(~lbl:z,y,x)
end
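To make the contrast with boxed tuples concrete, here is a small sketch of a function that returns two results. The function names are made up for this example. The boxed version builds a tuple that may be allocated as a block (an optimizer can sometimes remove it), while the unboxed version returns its two components separately.

```ocaml
(* Requires the OxCaml compiler. All names here are illustrative. *)

(* Boxed: the result is an ordinary tuple, i.e. a pointer to a block. *)
let div_mod_boxed a b = (a / b, a mod b)

(* Unboxed: the two components are returned separately, with no block. *)
let div_mod_unboxed a b = #(a / b, a mod b)

(* Unboxed tuples are destructured with a [#( ... )] pattern. *)
let sum_boxed a b =
  let q, r = div_mod_boxed a b in
  q + r

let sum_unboxed a b =
  let #(q, r) = div_mod_unboxed a b in
  q + r
```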

Unboxed records are defined, constructed, and matched on like normal records, but with a leading hash. Fields are projected with .#. For example:

type t = #{ f : float# ; s : string }
let inc #{ f ; s } = #{ f = Float_u.add f #1.0 ; s }
let get_s t = t.#s

The field names of unboxed records occupy a different namespace from the field names of “normal” (including [@@unboxed]) records.

Unboxed tuples and records may be nested within other unboxed tuples and records. There are no limitations on the layouts of the elements of unboxed tuples, but the fields of unboxed records must be representable.
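The following sketch illustrates nesting, under the assumption that Float_u is in scope as in the example above. The types point and segment and the functions on them are hypothetical.

```ocaml
(* Requires the OxCaml compiler. [point] and [segment] are example types. *)
type point = #{ x : float# ; y : float# }

(* An unboxed record whose fields are themselves unboxed records. *)
type segment = #{ from_ : point ; to_ : point }

(* Destructure the outer record, then project fields of the inner ones. *)
let sum_x (s : segment) : float# =
  let #{ from_ ; to_ } = s in
  Float_u.add from_.#x to_.#x

(* An unboxed tuple containing two unboxed records. *)
let endpoints (s : segment) : #(point * point) = #(s.#from_, s.#to_)
```

A segment carries four float# components in total, and none of the intermediate products is represented by a block.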

Limitations and future plans:

The any layout

If all we know about a type is that its layout is any, we cannot execute code using that type.

For example, it’s fine to write this function type:

val f : ('a : any). 'a -> 'a (* valid as a type signature *)

(See the “Layout annotation” section above to learn more about the layout annotation used here.)

But it’s not possible to implement a function of that type:

let f (type a : any) (x : a) = x (* rejected by the compiler *)

This is because, without knowing a type’s concrete layout, the compiler doesn’t know how to work with data of that type (which calling convention to use, etc.):

Error: This pattern matches values of type a
      but a pattern was expected which matches values of type
        ('a : '_representable_layout_1)
      The layout of a is any, because
        of the annotation on the abstract type declaration for a.
      But the layout of a must be representable, because
        it's the type of a function argument.
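For contrast, the same function should be accepted once the annotation names a concrete layout, because the compiler then knows how the argument is passed. This is a sketch; the function names are made up for illustration.

```ocaml
(* Requires the OxCaml compiler. Function names are illustrative. *)

(* Works for every type of layout float64, e.g. float#. *)
let id_float64 (type a : float64) (x : a) : a = x

(* Works for every type of layout value, e.g. int or string. *)
let id_value (type a : value) (x : a) : a = x
```

Each definition is polymorphic in the type but fixed in the layout, which is why two separate definitions are needed here.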

The main use case for layout any in its current form is with module types. For example:

module type S = sig
  type t : any

  val add : t -> t -> t
  val one : unit -> t
  val print : t -> unit
end

module M1 : S with type t = float# = struct
  type t = float#

  let add x y = Float_u.add x y
  let one () = #1.
  let print t = Printf.printf "%f" (Float_u.to_float t)
end

module M2 : S with type t = int = struct
  type t = int

  let add x y = x + y
  let one () = 1
  let print t = Printf.printf "%d" t
end

Here, by defining module type S with layout any and using with constraints, we can reason about modules that have similar shapes but operate on different layouts. This removes code duplication and can aid ppxs in supporting unboxed types.

[@layout_poly] attribute

The attribute enables support for limited layout polymorphism on external %-primitives. This is possible because these primitives are always inlined at every use site. We can thus specialize the function implementation based on the layout information at each site.

With a [@layout_poly] external declaration like this:

external[@layout_poly] opaque_identity : ('a : any). 'a -> 'a = "%opaque"

This means that opaque_identity can operate on any concrete layout and have all of these types:

opaque_identity : ('a : float64). 'a -> 'a
opaque_identity : ('a : value). 'a -> 'a
opaque_identity : ('a : bits64). 'a -> 'a
...

The attribute changes the meaning of the layout annotation (_ : any) and turns 'a into a layout polymorphic type variable.

As a consequence of the specialization happening at every use site, this limited layout polymorphic behavior does not propagate:

let f = opaque_identity

Here f can have one and only one of the types listed above:

let _ = f #1.
(* or *)
let _ = f 100
(* but NOT BOTH *)
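One way to work with this restriction, sketched below, is to mention the primitive directly wherever a particular layout is needed, so that each mention is its own use site. The wrapper names are hypothetical.

```ocaml
(* Requires the OxCaml compiler. Wrapper names are illustrative. *)
external[@layout_poly] opaque_identity : ('a : any). 'a -> 'a = "%opaque"

(* Each wrapper applies the primitive at its own use site, so each use is
   specialized to the layout required by the surrounding annotation. *)
let id_float64 (x : float#) : float# = opaque_identity x
let id_value (x : int) : int = opaque_identity x
```

Unlike the single binding f above, the two wrappers can coexist because neither shares a specialization with the other.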

The current implementation also restricts all layout polymorphic type variables to have the same layout:

external[@layout_poly] magic : ('a : any) ('b : any). 'a -> 'b = "%obj_magic"

let f1 : int32# -> int32# = magic;; (* ok *)
let f2 : float# -> float# = magic;; (* ok *)
let f3 : float# -> int32# = magic;; (* error *)

This feature is conceptually similar to [@local_opt] for modes and is useful for array access primitives.

Here’s the list of primitives that currently support [@layout_poly]:

Arrays of unboxed elements

Arrays can store elements of any layout. You can think of array as having been declared as:

type ('a : any) array = (* ... *)

Array elements are packed according to their width. For example, arrays of elements whose layout is bits32 store two elements per word.

You can use normal array syntax for constructing such an array:

let array = [| #2l |]

Array primitives must be declared with [@layout_poly] to be usable with arrays of unboxed elements.

module Array = struct
  external[@layout_poly] get : ('a : any). 'a array -> int -> 'a = "%array_safe_get"
end

let first_elem () = array.(0)

(The above relies on the fact that array projection syntax desugars to a call to whatever Array.get is in scope.)

A limited set of primitives may be bound as [@layout_poly]; see the earlier section for more information.

Runtime representation

| Array                                     | Tag              | Layout of data                                                 |
|-------------------------------------------|------------------|----------------------------------------------------------------|
| ('a : float64) array                      | Double_array_tag | 64 bits per element                                            |
| ('a : bits64) array                       | Custom_tag       | reserved custom block word, followed by 64 bits per element    |
| ('a : float32) array, ('a : bits32) array | Custom_tag       | reserved custom block word, followed by 32 bits per element    |
| ('a : vec128) array                       | Custom_tag       | reserved custom block word, followed by 128 bits per element   |

The reserved custom block word is the standard custom block field that stores a pointer to the record of custom operations, like polymorphic equality and comparison. For unboxed 32-bit element types, like int32# and float32#, the custom operations pointer is different for odd-length arrays and even-length arrays.

Odd-length arrays of 32-bit element type have 32 bits of padding at the end. The contents of this padding are unspecified, and it is not guaranteed that the padding value will be preserved by the generated code or the runtime.
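The arithmetic implied by the table can be sketched in plain OCaml. This is a toy model, not runtime code: it assumes a 64-bit platform, does not count the block header, and all names in it are invented for this example.

```ocaml
(* Toy model in plain OCaml; assumes 64-bit words, header not counted. *)
type element_layout = Float64 | Bits64 | Bits32 | Float32 | Vec128

let bits_per_element = function
  | Float64 | Bits64 -> 64
  | Bits32 | Float32 -> 32
  | Vec128 -> 128

(* Float64 arrays use Double_array_tag and have no reserved word; the
   other layouts use Custom_tag and start with one reserved word. *)
let reserved_words = function
  | Float64 -> 0
  | Bits64 | Bits32 | Float32 | Vec128 -> 1

let word_bits = 64

(* Returns (words in the block, bits of trailing padding). *)
let array_footprint layout length =
  let data_bits = bits_per_element layout * length in
  let data_words = (data_bits + word_bits - 1) / word_bits in
  (reserved_words layout + data_words, (data_words * word_bits) - data_bits)
```

Under these assumptions, array_footprint Bits32 3 is (3, 32): one reserved word, two data words, and 32 bits of padding after the third element.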

Using unboxed types in structures

Unboxed types can usually be put in structures, though there are some restrictions.

These structures may contain unboxed types:

Unboxed numbers can’t be put in these structures:

There aren’t fundamental issues with the structures that lack support. They will just take some work to implement.

Here’s an example of a record with an unboxed field. We call such a record a “mixed record”, and it is represented at runtime by a “mixed block”.

type t =
  { str : string;
    i : int;
    f : float#;
  }
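Mixed records are built and projected with ordinary record syntax. The sketch below repeats the type so that it is self-contained; the functions make and doubled are hypothetical, and Float_u.of_float is assumed to convert a boxed float to float#.

```ocaml
(* Requires the OxCaml compiler. [make] and [doubled] are illustrative. *)
type t =
  { str : string;
    i : int;
    f : float#;
  }

(* Ordinary record construction; the [f] field is stored unboxed. *)
let make str i = { str; i; f = Float_u.of_float (float_of_int i) }

(* Projecting [f] yields a float# directly, with no boxing. *)
let doubled r = Float_u.to_float (Float_u.add r.f r.f)
```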

The “mixed block” representation

The runtime representation of mixed blocks is slightly different than normal OCaml blocks. These differences are present to accommodate the garbage collector, which must scan the fields with layout value, but not the fields containing unboxed types.

To enable this, the header word of mixed blocks remembers how many elements of the block are values, with a maximum of 254. The compiler reorders the fields of your block so that all the values are first, and the GC knows to stop scanning after it has seen that number of fields.

For example, consider this record type:

type t =
  { w : float#;
    x : string;
    y : int64#;
    z : int
  }

The compiler will represent this type with a block where the fields are in the order x, z, w, y.

The reordering is invisible to source-level OCaml programs that don’t use unsafe features, but can be relevant when writing C bindings or OCaml code that depends on the runtime representation of values. The reordering is stable in the sense that it never changes the relative order of two values, or of two non-values. Immediates count as values for this purpose (they are always moved to the prefix).
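The reordering rule amounts to a stable partition, which can be modelled in a few lines of plain OCaml. This is only a toy model with invented names, not the compiler’s implementation, and it ignores any special cases and the limit on the number of value fields.

```ocaml
(* A toy model in plain OCaml; not the compiler's actual implementation. *)
type field_kind = Value | Non_value

(* Returns the number of leading fields the GC scans, and the field order
   in the block: values first, then non-values, with each group keeping
   its declaration order. *)
let mixed_block_order (fields : (string * field_kind) list) =
  let values, non_values =
    List.partition (fun (_, kind) -> kind = Value) fields
  in
  (List.length values, List.map fst (values @ non_values))

(* The record { w : float#; x : string; y : int64#; z : int } from above. *)
let scannable_prefix, order =
  mixed_block_order
    [ ("w", Non_value); ("x", Value); ("y", Non_value); ("z", Value) ]
```

For this record the model gives a scannable prefix of 2 and the order x, z, w, y, matching the description above.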

There is a special case for records that consist solely of float and float# fields. The “flat float record optimization” applies to any such record—all of the fields are stored flat, even the float ones that will require boxing upon projection. The fields are also not reordered. This special case exists to provide a better migration story for all-float records to which the flat float record optimization currently applies.

Blocks may contain unboxed products, in which case the products are “flattened” to become individual fields of the block, and reordered to accommodate the mixed block representation. For example, consider this record type:

type t =
  { a : float#;
    b : #(w:float# * #(x:string * y:int64#) * z:(int * int));
    c : bool }

This is represented as a block with six fields, and the fields appear in the order x, z, c, a, w, y.
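The flattening step can be modelled in plain OCaml as follows. As before, this is a toy sketch with invented names rather than the compiler’s code: a field is either a value, a non-value, or an unboxed product of further fields.

```ocaml
(* A toy model in plain OCaml; not the compiler's actual implementation. *)
type shape =
  | Value of string (* scanned by the GC *)
  | Non_value of string (* e.g. an unboxed number *)
  | Product of shape list (* an unboxed tuple or record *)

(* Replace every product by its components, recursively. *)
let rec flatten = function
  | Product parts -> List.concat_map flatten parts
  | leaf -> [ leaf ]

let block_order (fields : shape list) : string list =
  let leaves = List.concat_map flatten fields in
  let name = function
    | Value n | Non_value n -> n
    | Product _ -> assert false (* [flatten] leaves no products behind *)
  in
  let values, non_values =
    List.partition (function Value _ -> true | _ -> false) leaves
  in
  List.map name (values @ non_values)

(* The record with fields a, b, and c from above. The boxed tuple z is a
   pointer, so it counts as a value. *)
let order =
  block_order
    [ Non_value "a";
      Product [ Non_value "w"; Product [ Value "x"; Non_value "y" ]; Value "z" ];
      Value "c" ]
```

Flattening yields a, w, x, y, z, c in declaration order, and moving the values to the front gives x, z, c, a, w, y.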

Generic operations aren’t supported

Some operations built in to the OCaml runtime aren’t supported for structures containing unboxed types.

These operations aren’t supported:

These operations raise an exception at runtime, similar to how polymorphic comparison raises when called on a function.

You should use ppx-derived versions of these operations instead.
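To give a sense of what such a replacement does, here is a hand-written, type-specific equality for a mixed record. It is a sketch only: the record type is an example, the function is written by hand rather than produced by any particular ppx, and it assumes Float_u.to_float is in scope as elsewhere on this page.

```ocaml
(* Requires the OxCaml compiler. A hand-written stand-in for a derived
   equality function; not the output of any specific ppx. *)
type t =
  { str : string;
    i : int;
    f : float#;
  }

(* Compare field by field, using an operation suited to each field's type
   instead of a generic runtime traversal of the block. *)
let equal a b =
  String.equal a.str b.str
  && Int.equal a.i b.i
  && Float.equal (Float_u.to_float a.f) (Float_u.to_float b.f)
```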

Depending on the layout of mixed blocks

The implementation of field layout in a mixed block is not finalized. For example, we’d like for int32# fields to be packed efficiently (two to a word) on 64-bit platforms. Currently that’s not the case: each one takes up a word.

As a result, code that depends on the way mixed blocks are represented in memory (e.g., via C bindings) may need updates in the future. To help manage this, OxCaml provides mechanisms to assert your code depends on the current representation. The mechanism depends on whether you are writing C bindings or (unsafe) OCaml code.

Note also that, while unboxed types are generally considered an “upstream compatible” extension (because they can be erased while preserving behavior), depending on the exact representation of mixed blocks is not upstream compatible. Thus, use of these mechanisms is also a sign that your code may need a custom mechanism if it is intended to work both in OxCaml and upstream OCaml.

In C bindings

To ensure that your C code fails to compile, and therefore gets revisited, when the layout changes, use the Assert_mixed_block_layout_v# family of macros. For example,

Assert_mixed_block_layout_v3;

Write the above in statement context, i.e. either at the top-level of a file or within a function.

In OCaml code

Users who write OCaml code that depends on the layout of mixed blocks (via Obj.magic or similar) should instead include a reference in the relevant modules to Stdlib_upstream_compatible.mixed_block_layout_v#. For example:

let _ = Stdlib_upstream_compatible.mixed_block_layout_v3

Example

Here’s a full example. Say you’re writing C bindings against this OCaml type:

(** foo.ml *)
type t =
  { x : int32#;
    y : int32#;
  }

Here is the recommended way to access fields:

Assert_mixed_block_layout_v3;
#define Foo_t_x(foo) (*(int32_t*)&Field(foo, 0))
#define Foo_t_y(foo) (*(int32_t*)&Field(foo, 1))

Future changes and history

We will bump the version number if we make changes to the layout of mixed blocks. For example, it will be bumped if:

When we bump the version, the C assertion for the previous version will fail at compile time, and the OCaml definition for the previous version will be removed from the standard library. This alerts maintainers of code using these mechanisms to consider whether that code needs updates.

Version history: