How do Erlang atoms work?

Looking through the documentation for details, I did not find much:

  • There is an atom table (one per Erlang runtime instance).
  • The string literal for an atom is stored only once.
  • Atoms take 1 word.

For me, this leaves a lot of things unclear.

  • Is the value of an atom's word the same regardless of the order in which modules are loaded into the runtime instance? If modules A and B define or reference some atoms, will the value of an atom change from session to session, depending on whether A or B was loaded first?

  • When matching on an atom inside a module, is there some kind of "atom literal to atom value" resolution taking place? Do modules have their own module-local atom lookup table, which is filled in when the module is loaded?

  • In a distributed scenario, where two Erlang runtime instances communicate with each other, is there some kind of "sync the atom tables" step? Or are atoms serialized as string literals rather than as words?

2 answers

An atom is simply an identifier maintained by the virtual machine. The identifier is represented as a machine integer of the underlying architecture, e.g. 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. See the usage in the LYSE book.

The same atom in the same virtual machine always maps to the same identifier (integer). For example, the following tuple:

{apple, pear, cherry, apple} 

could be stored as the following tuple in actual Erlang memory:

 {1, 2, 3, 1} 

All atoms are stored in one big table that is never garbage collected, i.e. once an atom is created in a running virtual machine, it stays in the table until the virtual machine is shut down.
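Because the table only grows, it can be useful to watch its size and to avoid creating atoms from untrusted input. The following is a minimal sketch rather than anything from the answer itself: the module and function names are made up for illustration, and it relies on `erlang:system_info(atom_count)`, which assumes OTP 20 or later.

```erlang
-module(atom_table_demo).
-export([grows_once/1, safe_atom/1]).

%% Returns {Before, AfterFirst, AfterSecond}: the atom count before and
%% after converting the same name to an atom twice. Only the first
%% conversion of a previously unseen name can add an entry.
grows_once(Name) when is_list(Name) ->
    Before = erlang:system_info(atom_count),
    _ = list_to_atom(Name),
    AfterFirst = erlang:system_info(atom_count),
    _ = list_to_atom(Name),
    AfterSecond = erlang:system_info(atom_count),
    {Before, AfterFirst, AfterSecond}.

%% Converts a string to an atom only if that atom already exists,
%% so arbitrary input cannot fill up the atom table.
safe_atom(Name) when is_list(Name) ->
    try
        {ok, list_to_existing_atom(Name)}
    catch
        error:badarg -> {error, unknown_atom}
    end.
```

The counts are VM-wide, so other processes creating atoms at the same moment can also change them. `erlang:system_info(atom_limit)` reports the maximum number of atoms the table can hold.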

Answering your questions:

1. No. The atom ID will change between VM runs. If you shut down the virtual machine and reload the tuple above, the system might end up with the following identifiers:

 {50, 51, 52, 50} 

depending on what other atoms were created before loading it. Atoms only live as long as the virtual machine.

2. No. There is only one atom table per VM. All literal atoms in a module are mapped to their identifiers when the module is loaded. If a particular atom does not yet exist in the table, it is inserted and stays there until the VM is restarted.

3. No. Atom tables are per VM and they are separate. Consider a situation where two virtual machines are started at the same time but do not know about each other. Atoms created in each virtual machine may have different identifiers in its table. If at some point one node gets to know about the other node, the same atoms will still have different identifiers on each side. The tables cannot easily be synchronized or merged. However, atoms are not simply sent as their textual representation to the other node. They are "compressed" in the form of a cache and sent all together in the header. See the distribution header in the description of the distribution protocol. Basically, the header contains the atoms used in the terms that follow, with their identifiers and textual representation. Each term then refers to an atom by the identifier specified in the header, instead of transmitting the same text every time.
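One way to see that an atom leaves the VM by name and not by table index is to encode it with `term_to_binary/1`. This is only a sketch under stated assumptions: the module and function names are hypothetical, and `term_to_binary/1` produces the plain external term format, without the distribution header and atom cache described above, which only exist on a connection between nodes.

```erlang
-module(atom_wire_demo).
-export([roundtrip/1, carries_name/1]).

%% Encodes an atom in the external term format and decodes it again.
%% The decoding side looks the name up in (or adds it to) its own atom table.
roundtrip(Atom) when is_atom(Atom) ->
    binary_to_term(term_to_binary(Atom)).

%% True if the encoded binary contains the atom's name as text
%% (for a non-empty atom name).
carries_name(Atom) when is_atom(Atom) ->
    Name = atom_to_binary(Atom, utf8),
    binary:match(term_to_binary(Atom), Name) =/= nomatch.
```

For example, `atom_wire_demo:carries_name(apple)` shows that the bytes for "apple" appear in the encoded term, so a receiver never depends on the sender's identifiers.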


To be really basic, without going into the implementation: an atom is a literal "thing" with a name. It always has itself as its value, and it knows its own name. You generally use it when you want a tag, for example the atoms ok and error. Atoms are unique in the sense that there is only one atom foo in the system, and every time I refer to foo, I am referring to this same unique foo, regardless of whether the references are in the same module or come from the same process. There is always only one foo.

A bit of implementation: atoms are stored in a global atom table, and when you create a new atom, it is inserted into the table if it is not already there. This makes comparing atoms for equality very fast, as you only check whether the two atoms refer to the same slot in the atom table.
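The uniqueness is visible from ordinary code, even though the table slot itself is not. A small sketch (module and function names are invented for illustration): the atom foo written as a literal, built from a binary, and built from a string in another process all compare as exactly equal.

```erlang
-module(atom_identity_demo).
-export([same_everywhere/0]).

same_everywhere() ->
    Parent = self(),
    %% Build the atom from a string in a different process and send it back.
    Pid = spawn(fun() -> Parent ! {self(), list_to_atom("foo")} end),
    FromOtherProcess = receive {Pid, A} -> A end,
    %% Build the same atom from a binary in this process.
    FromBinary = binary_to_atom(<<"foo">>, utf8),
    %% All of them are the one and only foo.
    (foo =:= FromOtherProcess) andalso (foo =:= FromBinary).
```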

While separate virtual machine instances (nodes) have separate atom tables, communication between nodes in distributed Erlang is optimized for this, so very often the actual atom name does not need to be sent between nodes.
