How to avoid threads?

Question


Recently I have read a lot about how writing multi-threaded applications is a huge pain in the neck, and I have learned enough about the topic to understand, at least at some level, why that is so.

I have read that functional programming techniques can ease some of this pain, but I have never seen a simple example of functional code that runs in parallel. So what are the alternatives to using threads? Or, at least, what are some ways to abstract them away, so you don’t need to think about things like locking and whether particular library objects are thread safe?

I know that Google's MapReduce is supposed to help with the problem, but I have not seen a succinct explanation of it.

Although I give a specific example below, I'm more curious about general methods than solving this specific problem (using an example to illustrate other methods would be useful, though).

I came up with this question while writing a simple web crawler as a learning exercise. It works very well, but it is slow. Most of the bottleneck comes from downloading pages. It is currently single-threaded, so it only downloads one page at a time. If pages could be downloaded concurrently, the process would speed up significantly, even if the crawler runs on a single processor. I looked into using threads to solve the problem, but they scare me. Any suggestions on how to add concurrency to this type of problem without unleashing a terrible nightmare?

+6

language-agnostic multithreading concurrency

Tristan havelick Dec 19 '08 at 10:54


14 answers

The reason functional programming helps with concurrency is not that it avoids the use of threads.

Instead, functional programming preaches immutability and the absence of side effects.

This means that an operation can be scaled out to N threads or processes without worrying about corrupting shared state.
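As a rough illustration of that point, here is a minimal Python sketch in the map/reduce style. The function names `count_words` and `total_words` are made up for this example. Because `count_words` touches no shared state, the map step can be spread over any number of workers without locks; the only place results meet is the final reduce.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(text):
    """Pure function: the result depends only on the argument."""
    return len(text.split())


def total_words(pages, workers=4):
    # Map step: each page is counted independently, so the calls can run
    # on any number of workers without any locking.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(count_words, pages))
    # Reduce step: combine the independent results in one place.
    return reduce(lambda a, b: a + b, counts, 0)


pages = ["a b c", "d e", "f"]
print(total_words(pages))
```

Swapping the thread pool for a process pool or a cluster of machines would not change `count_words` at all, which is the property the answer is describing.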

+22

Flyswat Dec 19 '08 at 23:01


In fact, threads are pretty easy to use until you need to synchronize them. Usually you just hand a task to a thread pool and wait for it to finish.
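A minimal sketch of that "add a task and wait" pattern, using Python's standard `concurrent.futures` module. The task function `fetch_length` is a hypothetical stand-in for real work such as downloading a page.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(name):
    # Stand-in for real work such as downloading a page.
    return len(name)


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(fetch_length, "example")  # add a task to the pool
    result = future.result()                       # wait for it to finish

print(result)
```

No locks appear here because the task shares nothing with the caller except its argument and return value.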

When threads need to interact and access shared data structures, multithreading becomes very complex. Once you have two locks, you can get deadlocks, and this is where multithreading becomes really hard. Sometimes the locking code is wrong by just a few instructions. In that case you may only see the errors in production, on multi-core machines (if you developed on a single core, which is what happened to me), or they may be triggered by some other hardware or software. Unit testing does not help much here: testing finds errors, but you can never be as confident as you are with ordinary applications.
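One common way to avoid the two-lock deadlock described above is to always acquire locks in a fixed global order. The sketch below is only an illustration of that technique: the `Account` class and `transfer` function are invented for this example, and it assumes the two accounts are always distinct objects.

```python
import threading


class Account:
    _next_id = 0

    def __init__(self, balance):
        self.id = Account._next_id
        Account._next_id += 1
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Always take the lock of the account with the lower id first, so two
    # opposite transfers can never each hold one lock while waiting for
    # the other. Assumes src and dst are different accounts.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a, b = Account(100), Account(100)
threads = [
    threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)
] + [
    threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance + b.balance)
```

If `transfer` instead locked `src` first and `dst` second, the opposing transfers could each grab one lock and wait forever on the other, which is exactly the kind of bug that tends to show up only under production load.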

+9

bh213 Dec 19 '08 at 23:11


There are a few brief mentions of asynchronous models, but no one has really explained them, so I thought I would chime in. The most common alternative to multithreading that I have seen is an asynchronous architecture. All that really means is that instead of executing code sequentially in one thread, you use polling: you kick off some operations, then come back and check periodically until the data is available.

This really only works for cases like your crawler, where the real bottleneck is I/O, not the processor. In broad strokes, the asynchronous approach starts downloads on several sockets, and a polling loop periodically checks whether any download has finished; when one has, we can move on to the next step. This lets you run multiple downloads that are waiting on the network by switching context within a single thread.

A multi-threaded model would work much the same way, except that each download gets its own thread instead of a polling loop checking multiple sockets in a single thread. In an I/O-bound application, asynchronous polling performs about as well as threading for many use cases, since the real problem is waiting for I/O to complete rather than waiting for the processor to crunch data.
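The single-threaded approach described above can be sketched with Python's standard `asyncio` module, whose event loop does the polling for you. This is a simulation, not a real crawler: `asyncio.sleep` stands in for network latency, and the `download` and `crawl` names and the example URLs are assumptions made for illustration.

```python
import asyncio


async def download(url, delay):
    # asyncio.sleep stands in for waiting on the network.
    await asyncio.sleep(delay)
    return f"contents of {url}"


async def crawl(urls):
    # Start all downloads, then let the event loop switch between them
    # whenever one of them is waiting.
    tasks = [download(url, 0.05) for url in urls]
    return await asyncio.gather(*tasks)


pages = asyncio.run(crawl(["http://example.com/a", "http://example.com/b"]))
print(pages)
```

Both simulated downloads wait at the same time, so the total run time is roughly one delay rather than two, all within a single thread.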

Another real-world example is a system that had to run a number of other executables and wait for their results. This can be done with threads, but it is much simpler and almost as effective to launch several external applications as process objects and then check on them periodically until they have all finished. This puts the processor-intensive parts (the code running in the external executables) in their own processes, while the results are handled asynchronously.
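A minimal sketch of that launch-and-poll pattern with Python's `subprocess` module. The child commands here are a hypothetical workload (each one just prints a number) and assume only small amounts of output, since a child that fills its pipe would block until it is read.

```python
import subprocess
import sys
import time

# Hypothetical workload: each child process just prints a number.
commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(3)]
running = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
           for cmd in commands]

results = {}
while len(results) < len(running):
    for index, proc in enumerate(running):
        # poll() returns None while the child is still running.
        if index not in results and proc.poll() is not None:
            results[index] = proc.stdout.read().strip()
            proc.stdout.close()
    time.sleep(0.01)  # avoid spinning the CPU between checks

print([results[i] for i in range(len(running))])
```

The parent stays single-threaded; the operating system runs the children in parallel.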

The Python FTP server library I'm working on, pyftpdlib, uses Python's asyncore library to serve FTP clients with only one thread, using asynchronous socket communication for both file transfers and command/response.

For more information, see the Python Twisted library's page on Asynchronous Programming. Although somewhat specific to Twisted, it also introduces asynchronous programming from a beginner's perspective.

+4

Jay Dec 20 '08 at 1:07


Concurrency is a rather complex subject in computer science, which requires a good understanding of the hardware architecture, as well as the behavior of the operating system.

Multithreading has many implementations, depending on your hardware and your host operating system, and as hard as it already is, there are many pitfalls. It should be noted that to achieve "true" concurrency, threads are the only way. Essentially, threads are the only way for you, as a programmer, to share resources between different parts of your software while allowing them to run in parallel. As for parallelism, be aware that a standard CPU (dual/multiple cores aside) can only do one thing at a time. Concepts such as context switching then come into play, and they have their own set of rules and restrictions.

I think you should look for more general background on this subject, as you say, before embarking on the implementation of concurrency in your program.

I think the best place to start is the Wikipedia article on concurrency, and go from there.

+1

Yuval Adam Dec 19 '08 at 23:00


What usually makes multithreaded programming such a nightmare is when threads share resources and/or need to communicate with each other. If you are downloading web pages, your threads will work independently, so you may not have much trouble.

One thing you can consider is creating multiple processes instead of multiple threads. In the case you mention, downloading web pages, you can split the workload into several chunks and hand each chunk to a separate instance of a tool (for example, cURL) to do the job.
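The chunking idea could look roughly like the Python sketch below. It only builds the command lines and prints them; the helper names `split_into_chunks` and `curl_command` and the example URLs are assumptions for illustration. The `--silent` and `--remote-name-all` options are standard curl flags, but check them against your installed version.

```python
def split_into_chunks(items, n_chunks):
    """Distribute items round-robin into n_chunks lists."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def curl_command(urls):
    # One curl process per chunk; --remote-name-all saves each URL
    # under its remote file name in the current directory.
    return ["curl", "--silent", "--remote-name-all", *urls]


urls = [f"http://example.com/page{i}.html" for i in range(5)]
commands = [curl_command(chunk)
            for chunk in split_into_chunks(urls, 2) if chunk]
for command in commands:
    print(" ".join(command))
```

Each command list could then be started as its own process, for example with `subprocess.Popen(command)`, so the downloads in different chunks proceed independently.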

+1

Parappa Dec 19 '08 at 11:04


If your goal is to achieve concurrency, it will be hard to get away from using multiple threads or processes. The trick is not to avoid them, but to manage them in a way that is reliable and not error prone. Deadlocks and race conditions in particular are two aspects of concurrent programming that are easy to get wrong. One general approach to managing this is to use a producer/consumer queue: threads write work items to the queue, and workers pull items from it. You just need to make sure that access to the queue is properly synchronized, and you are set.

Also, depending on your problem, you could create a domain-specific language that takes care of concurrency issues, at least from the point of view of the person using your language. Of course, the engine that processes the language still has to handle concurrency, but if it will be used by many users, it could be a win.
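A minimal producer/consumer sketch in Python, assuming the standard `queue.Queue`, which handles its own synchronization. The work done here (squaring numbers) and the `None` sentinel used to stop the workers are illustrative choices, not the only way to structure this.

```python
import queue
import threading

work = queue.Queue()      # queue.Queue does its own locking
results = queue.Queue()
N_WORKERS = 3


def worker():
    while True:
        item = work.get()
        if item is None:  # sentinel: no more work for this worker
            break
        results.put(item * item)


threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()

for n in range(10):       # the producer adds work items
    work.put(n)
for _ in threads:         # one sentinel per worker
    work.put(None)
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)
```

The only shared objects are the two queues, so the worker code itself contains no explicit locks.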

+1

DSO Dec 19 '08 at 23:16


There are some good libraries out there.

java.util.concurrent.ExecutorCompletionService will take a collection of tasks that return values, run them on background threads, and then put the resulting Futures on a queue so that you can process the results as they complete. Of course, this is Java 5 and later, so it is not available everywhere.
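For readers who do not use Java, a loosely analogous pattern exists in Python's standard library: `concurrent.futures.as_completed`. The sketch below is not the Java API, only a similar idea, and `slow_square` is a made-up task with a simulated delay.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def slow_square(n):
    time.sleep(0.01 * n)  # simulated work of varying length
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_square, n) for n in (3, 1, 2)]
    # as_completed yields each future as soon as its task has finished,
    # regardless of the order in which the tasks were submitted.
    finished = [future.result() for future in as_completed(futures)]

print(sorted(finished))
```

The calling code stays single-threaded and simply consumes results as they arrive.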

In other words, all of your own code is single-threaded, but wherever you can identify work that is safe to run in parallel, you can farm it out to a suitable library.

The point is, if you can make the tasks independent, then thread safety is not impossible to achieve with a little thought, although it is strongly recommended that you leave the complicated bit (for example, implementing an ExecutorCompletionService) to an expert.

+1

Bill michell Feb 19 '09 at 12:42


One easy way to avoid threading in your simple scenario is to download from separate processes. The main process launches other processes with parameters that tell them to download files to a local directory. The main process can then do the real work.

I do not think there are any simple solutions to these problems. It is not a threading problem. It's concurrency that boggles the human mind.

0

Igal Serban Dec 19 '08 at 23:02


You can watch the MSDN video on F#: PDC 2008: An Introduction to F#

This includes the two things you are looking for. (Functional + Asynchronous)

0

Tom wijsman Dec 19 '08 at 23:17


For python, this looks like an interesting approach: http://members.verizon.net/olsongt/stackless/why_stackless.html#introduction

0

Tristan havelick Dec 19 '08 at 23:29


Threads are not to be avoided, nor are they "difficult." Functional programming is not necessarily the answer either. The .NET framework makes threads pretty simple. With a little thought you can write smart multi-threaded programs.

Here is an example of your web crawler (in VB.NET):

    Imports System.Threading
    Imports System.Net

    Module modCrawler

        Class URLtoDest
            Public strURL As String
            Public strDest As String

            Public Sub New(ByVal _strURL As String, ByVal _strDest As String)
                strURL = _strURL
                strDest = _strDest
            End Sub
        End Class

        Class URLDownloader
            Public id As Integer
            Public url As URLtoDest

            Public Sub New(ByVal _url As URLtoDest)
                url = _url
            End Sub

            Public Sub Download()
                Using wc As New WebClient()
                    wc.DownloadFile(url.strURL, url.strDest)
                    Console.WriteLine("Thread Finished - " & id)
                End Using
            End Sub
        End Class

        Public Sub Download(ByVal ud As URLtoDest)
            Dim dldr As New URLDownloader(ud)
            Dim thrd As New Thread(AddressOf dldr.Download)
            dldr.id = thrd.ManagedThreadId
            thrd.SetApartmentState(ApartmentState.STA)
            thrd.IsBackground = False
            Console.WriteLine("Starting Thread - " & thrd.ManagedThreadId)
            thrd.Start()
        End Sub

        Sub Main()
            Dim lstUD As New List(Of URLtoDest)
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file0.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file1.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file2.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file3.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file4.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file5.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file6.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file7.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file8.txt"))
            lstUD.Add(New URLtoDest("http://stackoverflow.com/questions/382478/how-can-threads-be-avoided", "c:\file9.txt"))

            For Each ud As URLtoDest In lstUD
                Download(ud)
            Next

            ' you will see this message in the middle of the text
            ' pressing a key before all files are done downloading aborts the threads that aren't finished
            Console.WriteLine("Press any key to exit...")
            Console.ReadKey()
        End Sub

    End Module

0

user21826 Dec 20 '08 at 12:30


Use Twisted. "Twisted is an event-driven networking engine written in Python" http://twistedmatrix.com/trac/ . With it, I could execute 100 asynchronous HTTP requests at a time without using threads.

0

yogman Dec 20 '08 at 12:54


Your specific example is seldom solved with multithreading. As many have said, this class of problem is I/O bound, meaning that the processor has very little work to do and spends most of its time waiting for data to come in over the wire before it can process it; similarly, it has to wait for disk buffers to flush so that it can put more of the recently downloaded data on disk.

The performant approach is the select() function or an equivalent system call. The basic process is to open several sockets (for the web crawler's downloads) and file handles (for storing the data on disk). You then set all the sockets and file handles to non-blocking mode, which means that instead of making your program wait until data is available after issuing a read, the call returns immediately with a special code (usually EAGAIN) indicating that the data is not ready. If you loop over all the sockets this way, you are polling, which works well but is still a waste of processor resources, because your reads and writes will almost always return EAGAIN.

To get around this, all the sockets and file handles are collected into an "fd_set", which is passed to the select system call. Your program then blocks, waiting on ANY of the sockets, and is woken up when there is data to process on any of the streams.
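Python exposes the same system call through its standard `select` module, so the idea can be sketched without C. In this illustration `socket.socketpair()` stands in for real network connections, and only one of the three "servers" has sent any data; a real crawler would register its download sockets instead.

```python
import select
import socket

# socketpair() returns two connected sockets, standing in here for
# connections to remote servers.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]
for r in readers:
    r.setblocking(False)

# Only the second "server" has sent anything so far.
pairs[1][1].sendall(b"hello")

# Block until at least one reader has data (or the timeout expires),
# instead of looping over every socket and getting EAGAIN each time.
ready, _, _ = select.select(readers, [], [], 1.0)
received = [sock.recv(1024) for sock in ready]
print(received)

for r, w in pairs:
    r.close()
    w.close()
```

`select.select` returns only the sockets that are actually readable, so the program never wastes time reading from connections that have nothing to offer.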

The other common case, compute-bound work, is without a doubt best addressed with some kind of true parallelism (as opposed to the asynchronous concurrency presented above) to access the resources of several processors. If your CPU-bound task runs on a single-threaded architecture, definitely avoid any concurrency, since the overhead will actually slow your task down.

0

Tokenmacguy Dec 26 '08 at 23:01


Joel Coehoorn · Accepted Answer · 2008-12-19T23:20:24+0000

I will add an example of how functional code can be used to safely make code concurrent.

Here is some code you might want to run in parallel, so you don’t have to wait for one file to finish before starting to download the next:

    void DownloadHTMLFiles(List<string> urls)
    {
        foreach (string url in urls)
        {
            // Download the HTML and save it to a file with a name based on the URL,
            // perhaps used for caching.
            DownloadOneFile(url);
        }
    }

If you have several files, the user may spend a minute or more waiting for all of them. We can rewrite this code functionally like this, and it does basically the same thing:

 urls.ForEach(DownloadOneFile);

Note that this still runs sequentially. However, it is not only shorter; we have also gained an important advantage. Since each call to the DownloadOneFile function is completely isolated from the others (for our purposes, available bandwidth is not an issue), you could very easily swap the ForEach function for another very similar function: one that launches each DownloadOneFile call on a separate thread from a thread pool.

It turns out that .NET has such a function available in the Parallel Extensions. Thus, by using functional programming, you can change one line of code and suddenly have something run in parallel that used to run sequentially. That is quite powerful.
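The same one-line swap can be sketched in Python rather than C#. This is an analogue using the standard `concurrent.futures` module, not the .NET Parallel Extensions API, and `download_one_file` is a hypothetical stand-in that does no real network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def download_one_file(url):
    # Hypothetical stand-in: a real version would fetch the page and
    # save it to a file named after the URL.
    return f"saved {url}"


def run_sequential(urls):
    return list(map(download_one_file, urls))


def run_parallel(urls, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Same shape as the sequential version: only map() was swapped
        # for pool.map(), which runs the calls on pool threads.
        return list(pool.map(download_one_file, urls))


urls = ["http://example.com/1", "http://example.com/2"]
print(run_sequential(urls) == run_parallel(urls))
```

The swap is only safe because `download_one_file` shares no state between calls, which is the isolation property the answer relies on.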
