How can I get around this EOutOfMemory exception when encoding a very large file?

I am using Delphi 2009 with Unicode strings.

I am trying to encode a very large file to convert it to Unicode:

    var
      Buffer: TBytes;
      Value: string;

    Value := Encoding.GetString(Buffer);

This works fine for a 40 MB buffer, which doubles in size and comes back as an 80 MB Unicode string.

When I try this with a 300 MB buffer, it gives me an EOutOfMemory exception.

Well, that was not entirely unexpected. But I decided to trace it anyway.

It goes into the DynArraySetLength procedure in the System unit. That procedure goes to the heap and calls ReallocMem. To my surprise, it successfully allocates 665,124,864 bytes!

But then, toward the end of DynArraySetLength, it calls FillChar:

    // Set the new memory to all zero bits
    FillChar((PAnsiChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);

You can see from the comment what it is supposed to do. That is not much, but it is the procedure that raises the EOutOfMemory exception. Here is FillChar from the System unit:

    procedure _FillChar(var Dest; count: Integer; Value: Char);
    {$IFDEF PUREPASCAL}
    var
      I: Integer;
      P: PAnsiChar;
    begin
      P := PAnsiChar(@Dest);
      for I := count-1 downto 0 do
        P[I] := Value;
    end;
    {$ELSE}
    asm                                  // Size = 153 Bytes
            CMP   EDX, 32
            MOV   CH, CL                 // Copy Value into both Bytes of CX
            JL    @@Small
            MOV   [EAX  ], CX            // Fill First 8 Bytes
            MOV   [EAX+2], CX
            MOV   [EAX+4], CX
            MOV   [EAX+6], CX
            SUB   EDX, 16
            FLD   QWORD PTR [EAX]
            FST   QWORD PTR [EAX+EDX]    // Fill Last 16 Bytes
            FST   QWORD PTR [EAX+EDX+8]
            MOV   ECX, EAX
            AND   ECX, 7                 // 8-Byte Align Writes
            SUB   ECX, 8
            SUB   EAX, ECX
            ADD   EDX, ECX
            ADD   EAX, EDX
            NEG   EDX
    @@Loop:
            FST   QWORD PTR [EAX+EDX]    // Fill 16 Bytes per Loop
            FST   QWORD PTR [EAX+EDX+8]
            ADD   EDX, 16
            JL    @@Loop
            FFREE ST(0)
            FINCSTP
            RET
            NOP
            NOP
            NOP
    @@Small:
            TEST  EDX, EDX
            JLE   @@Done
            MOV   [EAX+EDX-1], CL        // Fill Last Byte
            AND   EDX, -2                // No. of Words to Fill
            NEG   EDX
            LEA   EDX, [@@SmallFill + 60 + EDX * 2]
            JMP   EDX
            NOP                          // Align Jump Destinations
            NOP
    @@SmallFill:
            MOV   [EAX+28], CX
            MOV   [EAX+26], CX
            MOV   [EAX+24], CX
            MOV   [EAX+22], CX
            MOV   [EAX+20], CX
            MOV   [EAX+18], CX
            MOV   [EAX+16], CX
            MOV   [EAX+14], CX
            MOV   [EAX+12], CX
            MOV   [EAX+10], CX
            MOV   [EAX+ 8], CX
            MOV   [EAX+ 6], CX
            MOV   [EAX+ 4], CX
            MOV   [EAX+ 2], CX
            MOV   [EAX   ], CX
            RET                          // DO NOT REMOVE - This is for Alignment
    @@Done:
    end;
    {$ENDIF}

So my memory was allocated, but it crashed while trying to fill it with zeros. This makes no sense to me. As far as I can tell, the memory does not even need to be filled with zeros, and doing so is probably a waste of time anyway, since the Encoding statement is about to fill it.

Is there any way to stop Delphi from zero-filling the memory?

Or is there another way I can get Delphi to allocate this memory successfully for me?

My real goal is to get this Encoding statement to work for my very large file, so any solution that allows this will be greatly appreciated.


Conclusion: see my comments on the answers.

This is a warning to be careful when debugging assembler code. Make sure you set breakpoints on all the "RET" lines: I missed the one in the middle of the FillChar procedure and mistakenly concluded that FillChar caused the problem. Thank you, Mason, for pointing this out.

I will have to split the input into pieces in order to process a very large file.

+4
4 answers

Read a chunk from the file, encode it, write it to another file, and repeat.
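The loop described above might look roughly like the following. This is only a minimal sketch, not a tested solution: the program name, the `ConvertFileInChunks` helper, and the 64 KB chunk size are hypothetical, and it assumes the source file uses a single-byte encoding, so that a chunk boundary can never fall in the middle of a character. For UTF-8 or another multi-byte encoding, the incomplete bytes at the end of each chunk would have to be carried over to the next one.

```delphi
program ChunkedConvert;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: converts InFile (encoded with SrcEncoding) into a
// UTF-16 LE file, ChunkSize bytes at a time.
// Assumptions: ChunkSize > 0, and SrcEncoding is a single-byte encoding,
// so no character is ever split across two chunks.
procedure ConvertFileInChunks(const InFile, OutFile: string;
  SrcEncoding: TEncoding; ChunkSize: Integer);
var
  Src, Dst: TFileStream;
  Buffer, Preamble: TBytes;
  Value: string;
  BytesRead: Integer;
begin
  Src := TFileStream.Create(InFile, fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(OutFile, fmCreate);
    try
      // Start the output with the UTF-16 LE byte order mark.
      Preamble := TEncoding.Unicode.GetPreamble;
      if Length(Preamble) > 0 then
        Dst.WriteBuffer(Preamble[0], Length(Preamble));

      // One small buffer, reused for every chunk.
      SetLength(Buffer, ChunkSize);
      repeat
        BytesRead := Src.Read(Buffer[0], ChunkSize);
        if BytesRead > 0 then
        begin
          // Decode only the bytes that were actually read.
          Value := SrcEncoding.GetString(Buffer, 0, BytesRead);
          if Length(Value) > 0 then
            Dst.WriteBuffer(PChar(Value)^, Length(Value) * SizeOf(Char));
        end;
      until BytesRead = 0;
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  if ParamCount = 2 then
    // TEncoding.Default is the system ANSI code page; this sketch assumes
    // that code page is a single-byte one.
    ConvertFileInChunks(ParamStr(1), ParamStr(2), TEncoding.Default, 64 * 1024)
  else
    Writeln('Usage: ChunkedConvert <input file> <output file>');
end.
```

With this approach the peak memory use is one chunk plus its decoded string, regardless of the file size.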

+5

FillChar does not allocate any memory, so it is not your problem. Try tracing through it with breakpoints on the RET instructions, and you will see that FillChar does finish. Whatever the problem is, it probably occurs at a later step.

+6

A wild guess: could the problem be that the memory is overcommitted, so that when FillChar actually touches it, the system cannot find a page to give you? I don't know whether Windows overcommits memory at all, but I know that some operating systems do, and you do not find out about it until you actually try to use the memory.

If that is the case, it could cause the failure in FillChar.

+1

Programs are great at looping. They loop tirelessly.

Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS does not even know ahead of time whether it has the amount of contiguous memory you need. Your OS says, "Yes, I have 1 GB free." But as soon as you go to use it, your OS says, "Wait, you want it all in one piece? Let me make sure I have enough in one place." If it does not, you get an error.
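To see whether a single contiguous block of a given size is available at a particular moment, one could probe the memory manager as in the sketch below. The program name and the `CanAllocate` helper are hypothetical, and the sizes are only examples. It relies on `GetMem` raising `EOutOfMemory` when the request cannot be satisfied.

```delphi
program ProbeAlloc;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns True if the memory manager can hand out one
// contiguous block of Size bytes right now. The block is released again
// immediately, so a True result does not guarantee that a later request
// of the same size will also succeed.
function CanAllocate(Size: Integer): Boolean;
var
  P: Pointer;
begin
  Result := True;
  try
    GetMem(P, Size);
    FreeMem(P);
  except
    on EOutOfMemory do
      Result := False;
  end;
end;

begin
  Writeln('300 MB block: ', CanAllocate(300 * 1024 * 1024));
  Writeln('600 MB block: ', CanAllocate(600 * 1024 * 1024));
end.
```

The results depend on how fragmented the process address space is at the time of the call, so they can differ from run to run and from machine to machine.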

If it really does have the memory, the heap manager still has a lot of work to do preparing the memory and marking it as used.

So, obviously, it makes sense to allocate less memory and just loop over it. This saves the computer from doing a lot of work that it will only have to undo when it is done. Why not have it do a little work to set aside some memory, and then just keep reusing it?

Stack memory is allocated much faster than heap memory. If you keep your memory usage small (under 1 MB by default), the compiler may be able to use stack memory instead of heap memory, which will make your loop even faster. In addition, local variables that are allocated in registers are very fast.

Factors such as hard drive cluster and cache sizes, CPU cache sizes, and so on offer hints about the best chunk size. The key is to find a good number. I like to use 64K chunks.

+1

Source: https://habr.com/ru/post/1314142/