Why does my class take up so much memory?

I will have literally tens of millions of instances of some MyClass class and you want to minimize its memory size. The question of how much space an object occupies in memory was discussed in Find out the size of an .net object. I decided to follow the recommendations of Jon Skeet, and this is my code:

// Edit: This line is "dangerous and foolish" :-) // (However, commenting it does not change the result) // [StructLayout(LayoutKind.Sequential, Pack = 1)] public class MyClass { public bool isit; public MyClass nextRight; public MyClass nextDown; } class Program { static void Main(string[] args) { var a1 = new MyClass(); //to prevent JIT code mangling the result (Skeet) var before = GC.GetTotalMemory(true); MyClass[] arr = new MyClass[10000]; for (int i = 0; i < 10000; i++) arr[i] = new MyClass(); var after = GC.GetTotalMemory(true); var per = (after - before) / 10000.0; Console.WriteLine("Before: {0} After: {1} Per: {2}", before, after, per); Console.ReadLine(); } } 

I run the program on 64-bit Windows, select the “release”, the target platform: “any processor” and select “optimize the code” (the options only matter if I am explicitly targeting x86). The result, unfortunately, is 48 bytes per instance.

My calculation will be 8 bytes per link, plus 1 byte for bool plus some 8-byte overhead. What's happening? Is it a conspiracy to keep RAM prices high and / or to allow inflating code other than Microsoft? Well, okay, I think my real question is: what am I doing wrong, or how can I minimize the size of MyClass?

Edit: I apologize for being inaccurate in my question, I edited a couple of identifier names. My specific and immediate concern was to create a “2-dimensional linked list” as a sparse logical implementation of the matrices, where I can easily get an listing of the given values ​​in a given row / column. [Of course, this means that I must also store the x, y coordinates in the class, which makes my idea even less feasible]

+7
source share
3 answers

Approach the problem from the other end. Instead of asking myself: "How can I reduce this data structure and still have tens of millions of them?" ask yourself: "How can I present this data using a completely different data structure, which is much more compact?"

It looks like you are building a doubly linked bools list, which, as you noticed, uses thirty to fifty times more memory than you need. Is there a reason why you are not just using BitArray to store a list of bools?

UPDATE:

I actually tried to implement a sparse Boolean two-dimensional matrix

Well, why didn't you say that in the first place?

When I want to create a sparse boolean matrix with two huge sizes, I create an immutable constant logical quadrino with a memoized factory. If the array is sparse or even tight, but self-similar in some way, you can achieve huge compressions. Square arrays of 2 64 x 2 64 Booleans are easily represented, although, obviously, as a real array that will have more memory than exists in the world.

I worked on the idea of ​​doing a series of blog articles on this technique; I will most likely do this at the end of March.

In short, the idea is to create an abstract Quad class that has two subclasses: Single and Multi. "Single" is a doubleton - as a singleton, but with exactly two instances called True and False. Multi is a Quad that has four subframes called NorthEast, SouthEast, SouthWest, and NorthWest.

Each square has an integer "level"; the level of a single zero is zero, and a multilevel level n is required for all his children to be Quads of level n-1.

Multi factory is remembered; when you ask him to create a new Multi with four children, he turns to the cache to find out if he has done this before. If he is, he does not create a new one; he passes the old one. Since Quads are immutable, you don’t have to worry about someone swapping Quad for you after it's in the cache.

Now consider how many words of memory (a word of 4 or 8 bytes depending on the architecture) "all false" consume many levels of n. Level 1 "all false" multi consumes four words to refer to its children, a word to calculate the level (if necessary, you do not need to maintain the level in multi, although this helps for debugging) and a few words for the synchronization block and so on. Call it eight words. (Plus, the memory for the False Single quad that we can count is constant two or three words, and therefore can be ignored.)

Level 2 “all false” multi uses the same eight words, but each of its four children is the same level 1. Thus, the total consumption of level 2 “all false” multites allows you to say 16 words.

The same goes for level 3, 4, ... and so on. The total memory consumption for a layered level of 64, which logically represents a 2 64 x 2 64 square array of Boolean objects, is just 64 x 16 words of memory!

Make sense? Hope this sketch is enough for you to get started. If not, see my blog at the end of March.

+26
source

8 (link to the object) + 8 (link to the object) + 1 (bool) + 16 (heading) + 8 (link in the array itself) = 41

Even if it is offset internally, each will be aligned on the heap. So, we are looking at least 48 bytes.

I can’t understand for life why you need a linked list of bools. The list of them will take up 48 times less space, and before you go on to optimize the storage of bool per bit, which will make it 384 times smaller. And easier to manipulate.

+4
source

If these hundreds of millions of class instances are mostly copies of the class with slight variations in the values ​​of the class properties, then your system is the main candidate for using the so-called Flyweight . This pattern minimizes memory usage by using the same instances over and over again and simply changing properties as needed ...

+1
source

All Articles