TL;DR
You must not store sensitive information on the OCaml heap. Never copy a secret into any value allocated on the OCaml heap: bytes, strings, and arrays are all off limits, even as temporaries.
Introduction to the OCaml Memory Model
OCaml values are uniformly represented as tagged machine words. The least significant bit of the word is a tag that distinguishes pointers (tag = 0) from immediate values (tag = 1). Thus, a value always has a fixed size and is either a pointer or an immediate.
Immediate values store their data in the remaining, most significant bits of the word, that is, 31 bits on 32-bit systems and 63 bits on 64-bit systems. Pointers refer to blocks that are located in the so-called OCaml heap. The OCaml heap is a collection of blocks managed by the garbage collector (GC). A block is a piece of data prefixed with a header. The header records the size of the data and some other meta information used by the GC. A block may contain OCaml values (pointers or immediate values) or opaque data.
To summarize: all OCaml values are represented as machine words that either store data directly in the word or are pointers to blocks allocated on the heap. Each pointer points to exactly one block, but multiple pointers can point to the same block; such values are physically equal. Blocks that are not referenced by any pointer are called dead and are reclaimed by the GC.
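The distinction between immediate values and blocks can be observed from OCaml itself. The following is a small exploratory sketch that uses the standard `Obj` module, which is unsafe and implementation-specific; the helper name `describe` is made up for this illustration.

```ocaml
(* Inspect the runtime representation of a value.
   Obj is meant for exploration only, not for production code. *)
let describe (v : 'a) : string =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate"
  else Printf.sprintf "block (tag %d, %d words)" (Obj.tag r) (Obj.size r)

let () =
  print_endline (describe 42);        (* an int is an immediate value *)
  print_endline (describe 'x');       (* so is a char *)
  print_endline (describe (1, 2));    (* a tuple is a block with two fields *)
  print_endline (describe "secret");  (* a string is a block of opaque data *)
  (* One bit of every word is taken by the tag. *)
  Printf.printf "int width: %d of %d bits\n" Sys.int_size Sys.word_size
```

Note that the string is reported as a block: its contents live in memory that the GC manages.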
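As a brief illustration of several pointers sharing one block, the sketch below contrasts physical equality (`==`) with structural equality (`=`). The names `a`, `b`, and `c` are arbitrary.

```ocaml
let a = Bytes.of_string "key"
let b = a                (* a second pointer to the same block *)
let c = Bytes.copy a     (* a new block with the same contents *)

let () =
  assert (a == b);             (* physically equal: same block *)
  assert (a = c);              (* structurally equal: same contents *)
  assert (not (a == c));       (* but two distinct blocks *)
  Bytes.set b 0 'K';           (* a write through b ... *)
  assert (Bytes.get a 0 = 'K');  (* ... is visible through a *)
  assert (Bytes.get c 0 = 'k')   (* the copy is unaffected *)
```

The last line is worth keeping in mind for what follows: once a copy exists, changing or erasing the original does nothing to the copy.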
Introduction to the OCaml Garbage Collector
The GC manages blocks by allocating, moving, and freeing them. The GC itself uses an arena that is obtained either from the C memory allocator (malloc) or directly from the kernel via the mmap system call (this depends on the specific system and runtime).
The GC is generational, which means that values are first allocated in a special area of the heap called the minor heap (the small heap); the rest of the heap is called the major heap. Conceptually, the minor heap is a contiguous fixed-size memory region represented at runtime by three pointers: the beg pointer to the beginning of the minor heap, the end pointer to the end of the minor heap, and the cur pointer to the beginning of the free area of the minor heap. When a block is allocated, cur is increased by the size of the block, and the block is then initialized with data. When there is no more free space in the minor heap (i.e., when end - cur is smaller than the required block size), a minor GC cycle starts. The GC scans the blocks stored in the minor heap and copies every block referenced by at least one pointer to the major heap. After that, the cur pointer is reset to beg.
In the major heap, a block can also be copied several times, during a process called compaction. The compactor may rearrange blocks within their arena to achieve a more compact representation of the heap.
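The allocation scheme of the small (minor) heap described above can be sketched as a toy model. This is not the real runtime: the record type, the function names, and the 16-byte heap size are all invented for illustration, and the copying of live blocks to the main heap is left out.

```ocaml
(* Toy model of the minor heap: beg is offset 0, end is Bytes.length mem. *)
type minor_heap = {
  mem : Bytes.t;               (* the fixed-size region *)
  mutable cur : int;           (* start of the free area *)
  mutable minor_cycles : int;  (* number of minor cycles so far *)
}

let create size = { mem = Bytes.make size '\000'; cur = 0; minor_cycles = 0 }

(* A minor cycle would copy live blocks elsewhere (omitted here) and then
   reset cur to beg. The old bytes are NOT cleared. *)
let minor_cycle h =
  h.minor_cycles <- h.minor_cycles + 1;
  h.cur <- 0

(* Allocate a block holding [data]; return its offset in the region. *)
let alloc h data =
  let size = String.length data in
  if size > Bytes.length h.mem then invalid_arg "alloc: block too large";
  if Bytes.length h.mem - h.cur < size then minor_cycle h;
  let off = h.cur in
  h.cur <- h.cur + size;                   (* bump the cur pointer *)
  Bytes.blit_string data 0 h.mem off size; (* then initialize the block *)
  off

let h = create 16
let a = alloc h "hunter2!"  (* 8 bytes at offset 0 *)
let b = alloc h "abcd"      (* 4 bytes at offset 8 *)
let c = alloc h "0123456"   (* 7 bytes, only 4 free: minor cycle, offset 0 *)

let () =
  Printf.printf "offsets: %d %d %d, minor cycles: %d\n" a b c h.minor_cycles;
  (* The last byte of the first block is still sitting in the region. *)
  Printf.printf "byte 7 after reuse: %c\n" (Bytes.get h.mem 7)
```

Even in this simplified model, resetting cur does not erase anything: the old contents stay in memory until a later allocation happens to overwrite them.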
Security implications
Since the OCaml GC is a moving GC, it can copy data allocated on the heap arbitrarily. Although it is called moving, it is really just copying. That is, when a block moves from the minor heap to the major heap, it is simply copied bit by bit and thus duplicated. The phantom block left behind in the minor heap can live for an arbitrary period of time until it is overwritten by some newly allocated value. When an object moves during compaction, it is also copied, and the original may or may not be overwritten in the process. And, of course, it goes without saying that after a block becomes dead, it can still survive on the heap for an arbitrary amount of time until the GC reuses its memory.
This means that once a secret gets onto the OCaml heap, it is lost, because the GC can duplicate it several times in an arbitrary and unpredictable way. Thus, we can only keep secrets either in immediate values or in regions that are not controlled by the GC.
As mentioned earlier, all OCaml values that are pointers always point to a block in the OCaml heap. A block may contain data directly or may itself contain a pointer that points outside the OCaml heap. So-called custom blocks may or may not store their information in the OCaml heap; it depends on the specific representation of each custom block. For example, the Bigarray library provides custom blocks that store their payload outside the OCaml heap. Conceptually, a Bigarray is a custom block that has two fields: a pointer and a size. This is an opaque block, i.e., the GC will never treat these two fields as OCaml values and will never follow either the size or the pointer. The data that the pointer refers to is located outside the OCaml heap and is allocated either with malloc or with mmap (in fact, the pointer can be an arbitrary integer and can even point to the stack or to static data; it does not really matter, as long as we look at bigarrays as a ptr, len pair).
All this makes Bigarrays ideal for keeping secrets. We can be sure that the GC will not move them, and we can overwrite them before they are released, so that no information leaks afterwards.
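That a bigarray is a custom block can be checked with the standard `Obj` module. This is only an exploratory sketch; the buffer name `buf` and its size of 32 bytes are arbitrary.

```ocaml
(* A 32-byte bigarray of chars; the payload is allocated outside the OCaml heap. *)
let buf = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 32

let () =
  let r = Obj.repr buf in
  assert (Obj.is_block r);              (* the value is a pointer to a block *)
  assert (Obj.tag r = Obj.custom_tag);  (* and that block is a custom block *)
  (* The heap block only holds bookkeeping data, not the 32-byte payload. *)
  Printf.printf "heap block: %d words, payload: %d bytes\n"
    (Obj.size r) (Bigarray.Array1.dim buf)
```

The GC may still move the small heap block, but that block holds only the pointer and the dimensions, not the payload.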
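A minimal sketch of such a secret buffer follows. The names `secret`, `create_secret`, `wipe`, and `with_secret` are hypothetical helpers, not part of any library, and `Fun.protect` assumes OCaml 4.08 or later.

```ocaml
(* A byte buffer whose payload lives outside the OCaml heap. *)
type secret =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Allocate a zero-initialized buffer of [n] bytes. *)
let create_secret (n : int) : secret =
  let s = Bigarray.Array1.create Bigarray.char Bigarray.c_layout n in
  Bigarray.Array1.fill s '\000';
  s

(* Overwrite the whole payload in place. *)
let wipe (s : secret) : unit = Bigarray.Array1.fill s '\000'

(* Run [f] on a fresh buffer and wipe it afterwards, even if [f] raises. *)
let with_secret (n : int) (f : secret -> 'a) : 'a =
  let s = create_secret n in
  Fun.protect ~finally:(fun () -> wipe s) (fun () -> f s)

let () =
  let first =
    with_secret 4 (fun s ->
        Bigarray.Array1.set s 0 'k';  (* a char is an immediate value *)
        Bigarray.Array1.get s 0)
  in
  assert (first = 'k')
```

Scoping the buffer with `with_secret` is one possible design: it keeps the wipe next to the allocation, so it is harder to forget. It only helps if `f` does not copy the contents into a heap value.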
Further considerations
We must be careful never to allow a secret to be copied from our safe place to the OCaml heap. This means that even if our main storage is a safe bigarray, the information will still leak if we copy its contents to an OCaml string. Likewise, if we first read the information into an OCaml string and then copy it to a bigarray, the information will still leak. Thus, any interface that uses values allocated on the OCaml heap is unsafe and should not be used. For example, we cannot use OCaml channels to read or write secrets (we must rely on memory mapping or on the unbuffered IO provided by the Unix module). And again, whenever you obtain a string from a Bigarray, you copy your data, with all the consequences described above.
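To illustrate working with a secret without ever turning it into a string, the sketch below compares two bigarrays byte by byte, so that only immediate values (chars and ints) pass through OCaml code. The names `buf`, `equal`, and `of_public_string` are hypothetical; `of_public_string` exists only to build demonstration data and must not be used for real secrets, since its argument already lives on the OCaml heap.

```ocaml
module A1 = Bigarray.Array1

type buf = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) A1.t

(* Compare two buffers without building any string or bytes value.
   Each byte is read as a char and folded into an int accumulator. *)
let equal (a : buf) (b : buf) : bool =
  let n = A1.dim a in
  let rec go i acc =
    if i = n then acc = 0
    else
      go (i + 1)
        (acc lor (Char.code (A1.get a i) lxor Char.code (A1.get b i)))
  in
  n = A1.dim b && go 0 0

(* Demonstration only: the input string is already on the OCaml heap. *)
let of_public_string (s : string) : buf =
  let b = A1.create Bigarray.char Bigarray.c_layout (String.length s) in
  String.iteri (fun i c -> A1.set b i c) s;
  b

let () =
  let x = of_public_string "abc" and y = of_public_string "abc" in
  assert (equal x y);
  assert (not (equal x (of_public_string "abd")))
```

By contrast, something like `String.init (A1.dim a) (A1.get a)` would allocate a string on the OCaml heap and copy the whole secret into it.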