Are there restrictions on allocating small chunks using numa_alloc_onnode()?

I am experimenting with NUMA on a machine that has 4 Opteron 6272 processors running CentOS. There are 8 NUMA nodes, each with 16 GB of memory.

Here is a small test program that I am running.

#include <iostream>
#include <numa.h>
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to the given core.
void pin_to_core(size_t core)
{
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core, &cpuset);
    pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
}

int main()
{
    pin_to_core( 0 );

    size_t bufSize = 100;
    for( int i = 0; i < 131000; ++i )
    {
        if( !(i % 10) )
        {
            std::cout << i << std::endl;
            long long free = 0;
            for( unsigned j = 0; j < 8; ++j )
            {
                numa_node_size64( j, &free );
                std::cout << "Free on node " << j << ": " << free << std::endl;
            }
        }

        // Allocate on node 5, touch every byte, and deliberately leak.
        char* buf = (char*)numa_alloc_onnode( bufSize, 5 );
        for( unsigned j = 0; j < bufSize; ++j )
            buf[j] = j;
    }

    return 0;
}

So, basically, a thread running on core #0 allocates 131000 100-byte buffers on NUMA node 5, initializes them with junk, and leaks them. Every 10 iterations we print out how much memory is free on each NUMA node.

At the beginning of the output, I get:

0
Free on node 0: 16115879936
Free on node 1: 16667398144
Free on node 2: 16730402816
Free on node 3: 16529108992
Free on node 4: 16624508928
Free on node 5: 16361529344
Free on node 6: 16747118592
Free on node 7: 16631336960
...

And at the end I get:

Free on node 0: 15826657280
Free on node 1: 16667123712
Free on node 2: 16731033600
Free on node 3: 16529358848
Free on node 4: 16624885760
Free on node 5: 16093630464
Free on node 6: 16747384832
Free on node 7: 16631332864
130970
Free on node 0: 15826657280
Free on node 1: 16667123712
Free on node 2: 16731033600
Free on node 3: 16529358848
Free on node 4: 16624885760
Free on node 5: 16093630464
Free on node 6: 16747384832
Free on node 7: 16631332864
mbind: Cannot allocate memory
mbind: Cannot allocate memory
mbind: Cannot allocate memory
mbind: Cannot allocate memory
mbind: Cannot allocate memory
mbind: Cannot allocate memory
mbind: Cannot allocate memory
130980
...

What is not clear to me:

1) "mbind: Can not allocate memory"? , , , , , 1000, , - .

2) Even though I asked for the memory to be allocated on node 5, the actual allocations seem to be split between node 0 and node 5.

Can anyone give me any pointers on why this is happening?

UPDATE

This is about point (2). The strange behaviour of part of the memory being allocated on node 0 instead of node 5 appears to be connected to the fact that the main thread runs on core #0 (which belongs to NUMA node 0). If I replace pin_to_core(0) with pin_to_core(8), the allocations are split between nodes 1 and 5. With pin_to_core(40), all of the memory is allocated on node 5.
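
For reference, here is a quick way to confirm which NUMA node a given core belongs to (a minimal sketch of mine, using numa_node_of_cpu() from libnuma; build with -lnuma):

#include <iostream>
#include <numa.h>

int main()
{
    if( numa_available() < 0 )
        return 1;
    int cores[] = { 0, 8, 40 };   // the cores tried above
    for( unsigned i = 0; i < sizeof(cores)/sizeof(cores[0]); ++i )
        std::cout << "core " << cores[i] << " -> node "
                  << numa_node_of_cpu( cores[i] ) << std::endl;
    return 0;
}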

UPDATE2

I have looked at the source of libnuma: numa_alloc_onnode() is a call to mmap() followed by a call to mbind(). I hacked together some code that uses move_pages() to find the true NUMA node of each page. It turns out that roughly half of the pages were never actually created (move_pages() returns ENOENT for those addresses), while the pages that do exist are split between node 0 and node 5 in a quite regular pattern: 5,0,5,0,... So the question becomes why the first ~131000 allocations work and then mbind() starts returning an error, with the memory ending up on node 0. It looks like mbind returns ENOMEM when something runs out, and when the error occurs the memory is still allocated, just in the "default" way. All of this hints that some "resource" is being exhausted, and that resource is not the 16 GB of the node.
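
The check is essentially the following sketch (it assumes the pointers returned by numa_alloc_onnode() have been collected into a vector; build with -lnuma):

#include <cerrno>
#include <iostream>
#include <vector>
#include <numaif.h>

// With nodes == NULL, move_pages() moves nothing; it fills 'status'
// with the node each page resides on, or a negative errno such as
// -ENOENT when no page has been created at that address.
void print_page_locations( std::vector<void*>& bufs )
{
    std::vector<int> status( bufs.size() );
    if( move_pages( 0, bufs.size(), bufs.data(), NULL, status.data(), 0 ) )
        return;
    for( size_t i = 0; i < status.size(); ++i )
    {
        if( status[i] >= 0 )
            std::cout << i << ": node " << status[i] << std::endl;
        else if( status[i] == -ENOENT )
            std::cout << i << ": no page" << std::endl;
        else
            std::cout << i << ": error " << -status[i] << std::endl;
    }
}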

The questions I have now:

  • Is it expected behaviour of mbind() that, when many small allocations are made, only about 50% of the pages end up bound to the requested NUMA node? If failed bindings are silently ignored, that would explain the alternating pattern, but not why it later starts failing outright...

  • What exactly is the resource that mbind runs out of? I would like to understand under what conditions mbind() starts failing.

Finally, the bigger picture, in case someone wants to suggest a different approach: what I want to build is a memory manager that hands out small chunks from pools of memory placed on specific NUMA nodes. The pools would be mlock-ed (so that they stay resident in physical memory).
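
To make the intent concrete, here is a minimal sketch of the pool idea (assumptions of mine: one big allocation per node, a simple bump pointer, nothing ever freed back; build with -lnuma):

#include <cstddef>
#include <sys/mman.h>
#include <numa.h>

struct NodePool
{
    char*  base;
    size_t size;
    size_t used;

    // Reserve one large region on 'node'; it is a single mapping, so
    // the per-process mapping limit is not an issue.
    bool init( size_t bytes, int node )
    {
        base = (char*)numa_alloc_onnode( bytes, node );
        if( !base )
            return false;
        mlock( base, bytes );   // keep the pool resident
        size = bytes;
        used = 0;
        return true;
    }

    // Hand out a small chunk from the pool.
    void* alloc( size_t bytes )
    {
        if( used + bytes > size )
            return 0;
        void* p = base + used;
        used += bytes;
        return p;
    }
};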

ANSWER 1

If I read libnuma.c correctly, every call to numa_alloc_onnode() creates a new anonymous memory mapping and then binds it to the given NUMA node. With so many invocations of mmap() you are simply hitting the per-process limit on the number of memory mappings. The current value can be read from /proc/sys/vm/max_map_count, and the superuser can raise it:

# echo 1048576 > /proc/sys/vm/max_map_count

or with sysctl:

# sysctl -w vm.max_map_count=1048576

The default on a typical Linux system is 65530 mappings. mmap() coalesces mappings, i.e. it first tries to extend an existing mapping before creating a new one; in my tests it creates a new mapping on every second invocation and extends the previous one otherwise. Before the first call to numa_alloc_onnode() my test process has 37 mappings, so mmap() should start failing somewhere around 2 * (65530-37) = 130986 allocations, which matches what you observe.
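
One way to watch the process approach this limit is to count the lines of /proc/self/maps (one line per mapping) from inside the allocation loop; a minimal sketch:

#include <fstream>
#include <iostream>
#include <string>

// Count the current process's memory mappings.
size_t count_mappings()
{
    std::ifstream maps( "/proc/self/maps" );
    std::string line;
    size_t n = 0;
    while( std::getline( maps, line ) )
        ++n;
    return n;
}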

It also appears that when mbind() is applied to part of an existing mapping, something strange happens and the newly affected region is not bound properly; I would have to dig into the kernel source to find out why. On the other hand, if you replace:

numa_alloc_onnode( bufSize, 5 )

with

numa_alloc_onnode( bufSize, i % 4 )

then no mapping coalescing takes place, mmap() starts failing around the 65500th call, and all allocations are properly bound to the requested nodes.
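
A crude way to observe the coalescing from user space is to compare consecutive return addresses; if a call returns an address exactly one page past the previous one, the previous mapping was most likely extended rather than a new one created (a heuristic sketch, not proof; build with -lnuma):

#include <iostream>
#include <numa.h>

int main()
{
    long page = numa_pagesize();
    char* prev = 0;
    for( int i = 0; i < 8; ++i )
    {
        char* p = (char*)numa_alloc_onnode( 100, 5 );
        if( prev && p == prev + page )
            std::cout << i << ": previous mapping extended" << std::endl;
        else
            std::cout << i << ": new mapping at " << (void*)p << std::endl;
        prev = p;
    }
    return 0;
}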

ANSWER 2

The first thing that jumps out from the man page of numa_alloc_onnode is:

The size argument will be rounded up to a multiple of the system page size.

So even though you ask for only 100 bytes, each allocation consumes an entire page: you are effectively allocating 131000 pages, and most of that memory is wasted.
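
Assuming 4 KiB pages, that is 131000 * 4096 bytes, i.e. roughly 512 MiB of physical memory for about 12.5 MB of requested data, which is in the same ballpark as the combined drop of about 530 MiB in free memory on nodes 0 and 5 in the output above.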

You may also want to call numa_set_strict() before numa_alloc_onnode so that the allocation fails instead of silently falling back to another node. From the man page:

numa_set_strict() sets a flag that says whether the functions allocating on specific nodes should use a strict policy. Strict means the allocation will fail if the memory cannot be allocated on the target node. The default operation is to fall back to other nodes. This doesn't apply to interleave and default.
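
A minimal usage sketch (an assumption of mine: I have not verified whether a strict-mode failure shows up as a NULL return from numa_alloc_onnode() or only through the mbind error message, hence the NULL check below; build with -lnuma):

#include <iostream>
#include <numa.h>

int main()
{
    numa_set_strict( 1 );   // fail rather than fall back to another node
    void* buf = numa_alloc_onnode( 100, 5 );
    if( !buf )
    {
        std::cerr << "strict allocation on node 5 failed" << std::endl;
        return 1;
    }
    numa_free( buf, 100 );
    return 0;
}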