How to process a file in PowerShell line by line as a stream

I am working with some multi-gigabyte text files and want to do some stream processing on them using PowerShell. It is simple stuff: just parsing each line, pulling out some data, and then storing it in a database.

Unfortunately, get-content | %{ whatever($_) } appears to keep the entire set of lines at this stage of the pipeline in memory. It is also surprisingly slow, taking a very long time to actually read it all in.

So my question has two parts:

  • How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using up several gigabytes of RAM for this.
  • How can I make it run faster? PowerShell iterating over get-content appears to be 100 times slower than a C# script.

I am hoping there is something dumb I am doing here, like missing a -LineBufferSize parameter or something...

+76
stream powershell
Nov 16 2018-10-11T00:
3 answers

If you are really going to work with multi-gigabyte text files, then do not use PowerShell. Even if you find a way to read the file faster, processing a huge number of lines will be slow in PowerShell anyway, and you cannot avoid that. Even simple loops are expensive, say, for 10 million iterations (quite realistic in your case):

    # "empty" loop: takes 10 seconds
    measure-command { for($i=0; $i -lt 10000000; ++$i) {} }

    # "simple" job, just output: takes 20 seconds
    measure-command { for($i=0; $i -lt 10000000; ++$i) { $i } }

    # "more real job": 107 seconds
    measure-command { for($i=0; $i -lt 10000000; ++$i) { $i.ToString() -match '1' } }

UPDATE: If you are still not scared, then try using the .NET reader:

    $reader = [System.IO.File]::OpenText("my.log")
    try {
        for() {
            $line = $reader.ReadLine()
            if ($line -eq $null) { break }
            # process the line
            $line
        }
    }
    finally {
        $reader.Close()
    }

UPDATE 2

There are comments about possibly better or shorter code. There is nothing wrong with the original code using for, and it is not pseudocode. But a shorter (shortest?) variant of the reading loop is:

    $reader = [System.IO.File]::OpenText("my.log")
    while($null -ne ($line = $reader.ReadLine())) {
        $line
    }
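
The shorter loop above drops the try/finally, so the file handle would stay open if the per-line processing throws. A minimal sketch that combines both versions might look like the following. The temporary file, its contents, and the 'ERROR (\d+)' pattern are made up for illustration and are not part of the original question.

```powershell
# Sketch: stream a file with a .NET reader and extract one field per line.
# The sample file and the regex are hypothetical, for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.log'
[System.IO.File]::WriteAllLines($path, [string[]]@('INFO start', 'ERROR 42', 'ERROR 7', 'INFO end'))

$codes = New-Object 'System.Collections.Generic.List[int]'
$reader = [System.IO.File]::OpenText($path)
try {
    # ReadLine() returns $null at end of file, which ends the loop.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -match '^ERROR (\d+)$') {
            # $Matches[1] holds the first capture group of the last -match.
            $codes.Add([int]$Matches[1])
        }
    }
}
finally {
    # Release the file handle even if processing throws.
    $reader.Dispose()
}
$codes
```

Note that .NET methods such as OpenText resolve relative paths against the process working directory, which can differ from PowerShell's current location, so passing a full path is the safer choice.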
+76
Nov 16 '10 at 8:53

System.IO.File.ReadLines() is perfect for this scenario. It returns all the lines of a file, but lets you begin iterating over the lines immediately, which means it does not have to store the entire contents in memory.

Requires .NET 4.0 or later.

    foreach ($line in [System.IO.File]::ReadLines($filename)) {
        # do something with $line
    }

http://msdn.microsoft.com/en-us/library/dd383503.aspx
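
As a rough illustration of the lazy iteration described above, the sketch below stops at the first matching line, so the remaining lines are never handed to the loop. The temporary file and the 'row 5' search text are assumptions made up for this example.

```powershell
# Sketch: ReadLines yields lines on demand, so the loop can stop early.
# The sample file and the search text are hypothetical.
$filename = Join-Path ([System.IO.Path]::GetTempPath()) 'readlines-demo.txt'
[System.IO.File]::WriteAllLines($filename, [string[]](1..1000 | ForEach-Object { "row $_" }))

$lineNumber = 0
$found = $null
foreach ($line in [System.IO.File]::ReadLines($filename)) {
    $lineNumber++
    if ($line -eq 'row 5') {
        $found = $lineNumber
        break   # stop iterating; the remaining lines are not processed
    }
}
$found
```

Using a full path for $filename avoids surprises, since .NET resolves relative paths against the process working directory rather than PowerShell's current location.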

+46
Oct. 13

If you want to use straight PowerShell, check out the code below.

    $content = Get-Content C:\Users\You\Documents\test.txt
    foreach ($line in $content) {
        Write-Host $line
    }
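
Assigning the output of Get-Content to a variable, as above, loads every line into memory before the loop starts, which is what the question wants to avoid. A possible alternative that stays in plain PowerShell is to stream through the pipeline and use the -ReadCount parameter so that lines arrive in batches. The sketch below assumes a made-up temporary file, and the batch size of 1000 is an arbitrary choice.

```powershell
# Sketch: stream with Get-Content in batches instead of collecting all lines first.
# The sample file and the batch size of 1000 are assumptions for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value (1..2500 | ForEach-Object { "line $_" })

$total = 0
Get-Content -Path $path -ReadCount 1000 | ForEach-Object {
    # With -ReadCount, $_ is an array of up to 1000 lines, not a single line.
    foreach ($line in $_) {
        $total++
    }
}
$total
```

Batching reduces the number of objects sent down the pipeline, which often helps with speed, but the gain depends on the data and the per-line work, so it is worth measuring with Measure-Command on a real file.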
+10
Jul 07