Block indices
This document describes the language feature and implementation for explicit indices into a block. Before reading this document, you may wish to read up through the layouts section of the main document.
As a quick example:
open Stdlib_stable
type pt = { x : int; y : int }
type line = { p : pt#; q : pt# }
(* Creating an immutable block index into a nested record *)
let mk_idx () : (line, int) idx_imm = (.q.#y)
let get_coord (line : line) (i : (line, int) idx_imm) : int =
(* If [i] is [(.q.#y)], then the below is similar to to [line.q.#y] *)
Idx_imm.unsafe_get line i
(* Creating a block index into an array *)
let first_x (): (pt# array, int) idx_mut = (.(0).#x)
let inc_coord (pts : 'a) (i : ('a, int) idx_mut) =
(* Equivalent to [pts.(i) <- pts.(i) + 1] when [i] is in bounds and [pts] is
an [int array]. But this function could also be used update a mutable
record containing an [int]. *)
Idx_mut.unsafe_set pts i (Idx_mut.unsafe_get pts i + 1)
Overview
A block index is an opaque, explicit index to an element in a larger structure. The language feature includes these predefined types:
type ('a, 'b : any) idx_imm : bits64
type ('a, 'b : any) idx_mut : bits64
Given ('a, 'b) idx_imm or ('a, 'b) idx_mut, we refer to 'a as the “base
type” and 'b as the “element type.” A block index thus represents the position
of an element type within the base type. For example,
(.q.#y) : (line, int) idx_imm in the example above represents the position of
this int in a line:
v
{ p = #{ x; y }; q = #{ x; y } }
Index creation. Block index creation uses the syntax (.foo.#bar).
Specifically, it consists of one “block access” followed by zero or more
“unboxed accesses” within parentheses.
Block accesses take the following forms:
- Record field:
.foo - Array index as an
int,int64#,int32#,int16#,int8#, ornativeint#:.(i),.L(i),.l(i),.S(i),.s(i), or.n(i) - Immutable array index as an
int,int64#,int32#,int16#,int8#, ornativeint#:.:(i),.:L(i),.:l(i),.S(i),.s(i), or.:n(i) - Mutable or immutable block index:
.idx_mut(idx)or.idx_imm(idx)
Unboxed accesses take the following forms:
- Unboxed record field:
.#bar
Index mutability. If the block access is mutable (mutable record fields,
arrays, and mutable block indices), then an idx_mut is created, and if the
block access is immutable (immutable record fields, immutable arrays, and
immutable block indices), then an idx_imm is created.
Using indices. Naturally, block indices can be used to read and write
within blocks. This can be done via the Idx_imm.unsafe_get,
Idx_mut.unsafe_get, and Idx_mut.unsafe_set functions in Stdlib_stable.
(These functions are unsafe only for array indices, which may be out of range.
Uses of block indices built without the use of an array-indexing operation are
always safe.)
A key advantage of block indices is that these accessor functions are
polymorphic in both the base type and element type. Index reading roughly
(ignoring mutability, layouts, modes) has the type signature
'a -> ('a, 'b) idx -> 'b, and index writing roughly has the type
signature 'a -> ('a, 'b) idx -> 'b -> unit.
Index deepening. Block indices themselves are included as a type of block
access so that indices can be deepened. For example, given
idx : ('a, pt#) idx_imm, one may obtain
(.idx_imm(idx).#y) : ('a, int) idx_imm.
Example use cases
Implement “interior pointers”
By packing the base type parameter, block indices can be used to implement pointers into a block. (There is ongoing work to create a standardized library for interior pointers, but we include this example for illustrative purposes.)
type ('a : any) iptr = P : #('base * ('base, 'a) idx_imm) -> 'a iptr [@@unboxed]
type ('a : any) mptr = P : #('base * ('base, 'a) idx_mut) -> 'a mptr [@@unboxed]
Allow polymorphic APIs to support fine-grained access
module Stack : sig
type 'a t
val empty : 'a t
val push : 'a t -> 'a -> unit
val pop : 'a t -> 'a
val update_top : 'a t -> ('a, 'b) idx_mut -> 'b -> unit
(* [update_top] normally isn't possible without exposing the representation.
One could use ['a t -> ('a -> unit) -> unit], but the closure is less
efficient. *)
end
(* ... *)
type pt = { x : int; y : int }
type line = { mutable p : pt#; mutable q : pt# }
let drop_last_to_y_axis (s : line Stack.t) =
Stack.update_top s (.q.#y) 0
Edge cases and limitations
- For block indices to arrays, the array type parameter must be
mod non_float. - Indices to
[@@unboxed]records cannot be taken. - An index to a
floatin a flattened float record has an element typefloat#. - Indices to some records containing both values and non-values, and occupying over 2^16 bytes, cannot be created. See Representation of block indices for details.
- Indices to structures with non-default modalities are not supported.
Specifically, the composition of modalities of the accesses of an
idx_immmust have the identity modality, while the composition of modalities of the accesses of anidx_mutmust the the modalityglobal many aliased unyielding.
Representation of block indices
We document the compilation of block indices here, but do not guarantee this representation to be stable between versions.
Consider the following type:
type a = #{ s : string; i : int64# }
type b = #{ i : int64#; a : a; s : string }
type c = { mutable b : b; s : string }
The record c presents an interesting problem for block indices: the fields of
its contained unboxed records a and b are not actually contiguous at runtime
when using the native code compiler. This problem is caused by the
mixed block representation,
which mandates that we reorder fields so that values come before unboxed types.
While the layout of c has the shape
((b_i64, (a_string, a_i64), b_string), c_string),
its representation on the native code compiler looks like this:
a_string b_string c_string b_i64 a_i64
b ^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^
a ^^^^^^^^ ^^^^^
And in the bytecode compiler, unboxed records are actually boxed, and not reordered.
Acccordingly, block indices also have two different representations. In the native compiler, they are the offset into the block and the gap between the values and non-values of the pointed-to payload, both in bytes. In the bytecode compiler, block indices are represented as a sequence of field positions.
Idx in c |
Native repr. | Bytecode repr. |
|---|---|---|
(.b) |
offset 0, gap 8 | { 0 } |
(.s) |
offset 16, gap 0 | { 1 } |
(.b.#i) |
offset 24, gap 0 | { 0; 0 } |
(.b.#a) |
offset 0, gap 24 | { 0; 1 } |
(.b.#s) |
offset 8, gap 0 | { 0; 2 } |
(.b.#a.#s) |
offset 0, gap 0 | { 0; 1; 0 } |
(.b.#a.#i) |
offset 32, gap 0 | { 0; 1; 1 } |
In-memory representation:
- In the native compiler, the offset and gap are packed into
a single
bits64. There are two subcases:- The index is to product containing both values and non-values. In this case, the offset is the lower 48 bits and the gap is the upper 16 bits.
- The index is to all values/non-values. In this case, all 64 bits are used for the offset.
- In the bytecode compiler, the field positions are stored as tagged integers
in single block with tag 0.
- Unboxed record fields in the index into singleton unboxed records are not included in the list of positions, as singleton unboxed records are erased during translation to lambda.
While marshaling data should be consistent between bytecode and native, this
case is safe because mixed blocks (the only values that can contain bits64s)
cannot be marshaled.
For a visualization of the native representation of block indices, and the implementation of deepening, see below.
Native implementation of index deepening
While from the perspective of the layout, deepening an index simply entails “moving to a subtree,” deepening the native representation is more involved. Below, we show the different cases to consider. Note that:
- The “left” and “right” of an element refers to the layout, not the native representation.
o1andg1refer to the offset and gap of the index before deepening (light orange), whileo2andg2refer to the offset and gap of the index after deepening (blue).
