Should we use nested goroutines?

I am trying to create a parser for a huge number of files, and I cannot find a resource about what I would call "nested goroutines" (maybe this is not the correct name?).

Given the large number of files, each with many lines, should I do:

    for file in folder:
        go do1(file)

    def do1(file):
        for line in file:
            go do2(line)

    def do2(line):
        do_something(line)

Or should I use only one level of goroutines and do the following:

    for file in folder:
        for line in file:
            go do_something(line)

My question is primarily about performance issues.

Thanks in advance for any pointers!

3 answers

If you go with the architecture you specified, you have a good chance of exhausting CPU/memory/etc., because you will be creating an unbounded number of workers. I suggest instead an architecture that allows throttling via channels. For instance:

In the main process, submit files to a channel:

    for _, file := range folder {
        fileChan <- file
    }

then in another goroutine, break the files into lines and push them onto a second channel:

    for {
        select {
        case file := <-fileChan:
            for _, line := range file {
                lineChan <- line
            }
        }
    }

then in a third goroutine, pull the lines off and do what you will with them:

    for {
        select {
        case line := <-lineChan:
            // process the line
        }
    }

The main advantage of this is that you can spawn as many or as few worker goroutines as your system can handle, all reading from the same channels; whichever worker is free picks up the next item. This lets you throttle the amount of resources you use.

Here is a working example: http://play.golang.org/p/-Qjd0sTtyP


The answer depends on how CPU-intensive the work on each line is.

If the per-line operation is short-lived, definitely do not bother spawning a goroutine for each line.

If it is expensive (think ~5 seconds or more), proceed with caution: you may run out of memory. As of Go 1.4, spawning a goroutine allocates a 2048-byte stack, so for 2 million lines you could allocate roughly 4 GB of RAM for goroutine stacks alone. Consider whether that memory is worth allocating.

In short, you are likely to get the best results with the following setup:

    for file in folder:
        go process_file(file)

If the number of files exceeds the number of CPU cores, you will probably have enough concurrency to mask the disk I/O latency of reading the files.


I cannot comment because of reputation, but thought this post was relevant: https://medium.com/@vigneshsk/how-to-write-high-performance-code-in-golang-using-go-routines-227edf979c3c

