Overwriting data in memory

I wrote a password manager in OCaml. To make it as secure as possible, I would like to store a string (the encryption key) in memory in such a way that it can be overwritten. Since OCaml is pass-by-value and has a garbage collector, this has proven difficult. I encrypt every buffer and variable that I can, but to do that I still need a "session key". To keep it from being detected by automated key finders or written to swap, it is assembled from a group of random data in a buffer using a random increment. So really, I only need one variable that can be overwritten, to hold the assembled key for the few seconds before it goes to the Nocrypto library. Will a ref work?

According to this Cornell "Refs and Arrays" page, refs are mutable and work similarly to pointers in C. That said, I also found an answer discussing OCaml refs which says that "they act as pointers to a new allocated memory". Does this mean that each assignment just allocates a new thing in memory and does not actually mutate the contents already in memory? If so, you cannot "overwrite" a ref.

Other possible solutions that I came across are Bigarrays and custom blocks. I'm not sure whether custom blocks are actually allocated outside the garbage-collected area or not. They seem to be used to access external C code. Are they copied by the garbage collector? Can they be overwritten? There is also the idea of "opaque bytes" and opaque objects in memory. I have a pretty hard time wrapping my head around how it all fits together. A useful but (for me) confusing discussion of custom blocks in memory is here on Stack Overflow: Are user blocks ever copied to memory? The answer says that they can be moved. Even so, can they be overwritten?

The last possible solution is to store it in a Cstruct, as the Nocrypto library does. They discuss this in this GitHub issue: Secret erasure of material. One commenter states:

"Of course, the key material is a Cstruct.t, which is a Bigarray.Array1.t, which is allocated outside the GC heap."

Is that even right? If so, I cannot find the source file that actually does this. I am new to OCaml and to functional programming in general. If you're interested, my program is on GitHub here: ocaml-pass

+8
garbage-collection memory-management ocaml
2 answers

TL;DR

You must not store sensitive information on the OCaml heap. You should therefore never copy your secret into any value allocated on the OCaml heap, which means that neither bytes, nor strings, nor arrays can be used, not even temporarily.

Introduction to the OCaml Memory Model

OCaml values are uniformly represented as tagged machine words. The least significant bit of the word is used as a tag that distinguishes between pointers (tag = 0) and immediate values (tag = 1). Thus, a value always has a fixed size and is either a pointer or an immediate.

Immediate values store their data in the remaining upper bits of the word, that is, 31 bits on 32-bit systems and 63 bits on 64-bit systems. Pointers refer to data stored in blocks that are located in the so-called OCaml heap. The OCaml heap is a collection of blocks managed by the garbage collector (GC). A block is a piece of data prefixed with a header. The header records the size of the data and some other meta information used by the GC. A block may contain OCaml values (pointers or immediate values) or opaque data.
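The distinction between immediate values and pointers can be observed from OCaml itself. The following is a minimal sketch that uses the standard library's unsafe Obj module purely for inspection; the helper name is_immediate is made up for this illustration.

```ocaml
(* Obj is an unsafe introspection module from the standard library;
   it is used here only to look at the representation of values. *)
let is_immediate (v : 'a) : bool = Obj.is_int (Obj.repr v)

let () =
  (* Integers, characters and constant constructors are immediate values. *)
  assert (is_immediate 42);
  assert (is_immediate 'a');
  assert (is_immediate None);
  (* Strings and bytes are pointers to blocks. *)
  assert (not (is_immediate "secret"));
  assert (not (is_immediate (Bytes.create 16)));
  (* One bit of every word is the tag, so int is one bit narrower than a word. *)
  assert (Sys.int_size = Sys.word_size - 1)
```

Anything for which is_immediate returns false lives in a block, and that is the case that matters for the rest of this answer.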

To summarize: all OCaml values are represented as machine words that either store data directly in the word or are pointers to heap-allocated blocks. Each pointer points to one and only one block. Multiple pointers can point to the same block; such values are said to be physically equal. Some blocks are not referenced by any pointer. Such blocks are called dead and are reclaimed by the GC.
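Physical equality is exposed in OCaml as the == operator, so the "several pointers, one block" situation is easy to see. A small sketch, with a, b and c as illustrative names:

```ocaml
let a = Bytes.of_string "key"
let b = a                 (* a second pointer to the same block *)
let c = Bytes.copy a      (* a different block with equal contents *)

let () =
  assert (a == b);                  (* physically equal: one block *)
  assert (a = c && not (a == c));   (* structurally equal, two blocks *)
  Bytes.set a 0 'K';                (* mutating through a ... *)
  assert (Bytes.to_string b = "Key");  (* ... is visible through b *)
  assert (Bytes.to_string c = "key")   (* ... but not in the copy *)
```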

Introduction to OCaml Garbage Collector

The GC manages blocks by allocating, moving, and freeing them. The GC itself uses an arena that is obtained either from the C memory allocator (malloc) or directly from the kernel via the mmap system call (this depends on the specific system and runtime).

The GC is generational, which means that values are first allocated in a special area of the heap called the minor heap. The minor heap is a contiguous, fixed-size memory region represented in the runtime by three pointers: the beg pointer to the beginning of the minor heap, the end pointer to the end of the minor heap, and the cur pointer to the beginning of the free area of the minor heap. When a block is allocated, cur is advanced by the size of the block, and the block is then initialized with data. When there is no more free space in the minor heap (i.e., when end - cur is smaller than the required block size), a minor GC cycle starts. The GC scans all the blocks stored in the minor heap and copies every block referenced by at least one pointer to the major heap. After that, the cur pointer is reset to beg.

In the major heap, a block can also be copied several times during a process called compaction. The compactor may rearrange blocks in the arena to achieve a more compact representation of the heap.
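The bookkeeping with the beg, end and cur pointers can be sketched as a toy bump allocator. This is only a model of the description above, not the runtime's actual code: offsets stand in for addresses, the names minor_heap, alloc and minor_gc are invented here, and the real runtime differs in its details.

```ocaml
(* Toy model only: offsets stand in for addresses, no real memory is managed. *)
type minor_heap = { beg : int; end_ : int; mutable cur : int }

let create size = { beg = 0; end_ = size; cur = 0 }

(* Returns the offset of the new block, or None when the free area is too
   small, which is the point where a minor GC cycle would be triggered. *)
let alloc h size =
  if h.end_ - h.cur < size then None
  else begin
    let block = h.cur in
    h.cur <- h.cur + size;
    Some block
  end

(* After a minor cycle the survivors have been copied elsewhere, so the
   whole region is considered free again. Note that nothing is zeroed. *)
let minor_gc h = h.cur <- h.beg

let () =
  let h = create 8 in
  assert (alloc h 3 = Some 0);
  assert (alloc h 3 = Some 3);
  assert (alloc h 3 = None);     (* only 2 units left: a minor GC is needed *)
  minor_gc h;
  assert (alloc h 3 = Some 0)    (* the same region is handed out again *)
```

The last assertion is the relevant one for secrets: resetting cur makes the space reusable, but the old bytes stay where they were until something is allocated over them.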

Safety implications

Since the OCaml GC is a moving GC, it can copy heap-allocated data arbitrarily. Although it is called moving, it really is just copying. That is, when a block is moved from the minor heap to the major heap, it is copied bit for bit and thus duplicated. The phantom block in the minor heap can live for an arbitrary period of time until it is overwritten by some newly allocated value. When an object is moved during compaction, it is also copied, and the old copy may or may not be overwritten during the process. And it goes without saying that after a block becomes dead, it can still survive on the heap for an arbitrary amount of time until the GC reuses its memory.

This means that once a secret escapes to the OCaml heap, it is lost, because the GC can duplicate it several times in an arbitrary and unpredictable way. Thus, we can only keep secrets either in immediate values or in regions that are not managed by the GC. As mentioned earlier, all OCaml values that are pointers always point to a block in the OCaml heap. A block may contain data directly, or it may itself contain a pointer that points outside the OCaml heap.

So-called custom blocks may or may not store their information in the OCaml heap; it depends on the specific representation of each custom block. For example, the Bigarray library provides custom blocks that store their payload outside the OCaml heap. A Bigarray is thus a custom block that has two fields: a pointer and a size. It is an opaque block, i.e., the GC will never treat these two fields as OCaml values and will never follow either the size or the pointer. The data referenced by the pointer is located outside the OCaml heap and is allocated using either malloc or mmap (in fact, the pointer can be an arbitrary integer and can even point to the stack or to static data; it doesn't really matter, as long as we view a bigarray simply as a ptr,len pair).

All this makes Bigarrays ideal for keeping secrets. We can be sure that their contents will not be moved by the GC, and we can overwrite them to prevent information from leaking after they are freed.
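As a minimal sketch of this approach, a secret buffer can be a one-dimensional Bigarray of chars that is explicitly zeroed when no longer needed. The names secret, create_secret, wipe and is_wiped are invented for this illustration; only the Bigarray.Array1 calls come from the standard library. It says nothing about swap or about copies made by other code.

```ocaml
type secret =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* The payload is allocated outside the OCaml heap; its initial contents
   are unspecified. *)
let create_secret (len : int) : secret =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout len

(* Overwrite the payload in place with zero bytes. *)
let wipe (s : secret) : unit = Bigarray.Array1.fill s '\000'

(* Check that every byte of the payload is zero. *)
let is_wiped (s : secret) : bool =
  let ok = ref true in
  for i = 0 to Bigarray.Array1.dim s - 1 do
    if Bigarray.Array1.get s i <> '\000' then ok := false
  done;
  !ok

let () =
  let key = create_secret 32 in
  Bigarray.Array1.fill key 'k';   (* stand-in for real key material *)
  (* ... hand the buffer to the crypto code here ... *)
  wipe key;
  assert (is_wiped key)
```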

Further considerations

We must be careful never to let a secret be copied from our safe place to the OCaml heap. This means that even if our main storage is a safe bigarray, the information will still leak if we copy its contents into an OCaml string. Likewise, if we first read the information into an OCaml string and then copy it into a bigarray, the information has already leaked. Thus, any interface that uses values allocated on the OCaml heap is unsafe and should not be used. For example, we cannot use OCaml channels to read or write secrets (we must rely on memory mapping or on the unbuffered IO provided by the Unix module). And again, whenever you produce a value of the string data type from a Bigarray, you copy your data, with all the consequences.
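The leak through copying can be sketched as follows. The helper leak_to_string is hypothetical and stands for any convenience function that turns the buffer into an ordinary OCaml string.

```ocaml
module A1 = Bigarray.Array1

(* Hypothetical helper: builds an OCaml string, i.e. a block on the
   OCaml heap, from the contents of the buffer. *)
let leak_to_string buf = String.init (A1.dim buf) (fun i -> A1.get buf i)

let demo () =
  let buf = A1.create Bigarray.char Bigarray.c_layout 4 in
  A1.fill buf 's';                   (* stand-in for secret bytes *)
  let copy = leak_to_string buf in   (* the secret now also lives on the heap *)
  A1.fill buf '\000';                (* wiping the bigarray ... *)
  (A1.get buf 0, copy)               (* ... does not touch the heap copy *)

let () =
  let first, copy = demo () in
  assert (first = '\000');
  assert (copy = "ssss")
```

The bigarray is clean afterwards, but the string still holds the secret, and the GC is free to move or duplicate it.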

+14

I would use a value of type bytes , essentially a mutable byte array:

 # let buffer = Bytes.make 16 'x';;
 val buffer : bytes = "xxxxxxxxxxxxxxxx"
 # Bytes.set buffer 0 'T';;
 - : unit = ()
 # buffer;;
 - : bytes = "Txxxxxxxxxxxxxxx"
 # Bytes.fill buffer 0 16 ' ';;
 - : unit = ()
 # buffer;;
 - : bytes = "                "

You can overwrite it with Bytes.fill when you are done.

0