Where do these 1,000+ threads come from?

I am trying to create an application that downloads images from a website on multiple threads, as an introduction to threading (I have never used threads before).

But at present it seems that 1000+ threads are being created, and I'm not sure where they are coming from.

First I queue the work items on the thread pool; for starters I only have 1 job in the job array:

foreach (Job j in Jobs) { ThreadPool.QueueUserWorkItem(Download, j); } 

This launches void Download(object obj) on a thread-pool thread, where it iterates over a certain number of pages (number of images needed / 42 images per page):

 for (var i = 0; i < pages; i++)
 {
     var downloadLink = new System.Uri("http://www." + j.Provider.ToString() +
         "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);
     using (var wc = new WebClient())
     {
         try
         {
             wc.DownloadStringAsync(downloadLink);
             wc.DownloadStringCompleted += (sender, e) =>
             {
                 response = e.Result;
                 ProcessPage(response, false, j);
             };
         }
         catch (System.Exception e)
         {
             // Unity editor equivalent of Console.WriteLine
             Debug.Log(e);
         }
     }
 }

Correct me if I am wrong, but the next method is called on the same thread:

 void ProcessPage(string response, bool secondPass, Job j)
 {
     var wc = new WebClient();
     LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
     foreach (LinkItem i in linkResponse)
     {
         if (secondPass)
         {
             if (string.IsNullOrEmpty(i.Href))
                 continue;
             else if (i.Href.Contains("http://loreipsum."))
             {
                 if (DownloadImage(i.Href, ID(i.Href)))
                     j.Downloaded++;
             }
         }
         else
         {
             if (i.Href.Contains(";id="))
             {
                 var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() +
                     "/index.php?page=post&s=view&id=" + ID(i.Href));
                 ProcessPage(alterResponse, true, j);
             }
         }
     }
 }

And finally it reaches the last function, which downloads the actual image:

 bool DownloadImage(string target, int id)
 {
     var url = new System.Uri(target);
     var fi = new System.IO.FileInfo(url.AbsolutePath);
     var ext = fi.Extension;
     if (!string.IsNullOrEmpty(ext))
     {
         using (var wc = new WebClient())
         {
             try
             {
                 wc.DownloadFileAsync(url, id + ext);
                 return true;
             }
             catch (System.Exception e)
             {
                 if (DEBUG) Debug.Log(e);
             }
         }
     }
     else
     {
         Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
         return false;
     }
     return true;
 }

I'm not sure how I end up starting this many threads, but I would like to know.

Edit

The goal of this program is to run one job at a time (a maximum of 5), each of which downloads a maximum of 42 images.

So a maximum of 210 images can / should be downloading at any one time.

2 answers

First of all, how did you measure the number of threads? What makes you think you have a thousand of them in your application? You are using the ThreadPool, so you do not create the threads yourself, and the ThreadPool will not create that many for its own needs.

Secondly, you mix synchronous and asynchronous operations in your code. Since you cannot use the TPL and async/await, walk through the code and count the units of work you create, and try to minimize them (for example, each page queues a DownloadStringAsync, each matching link on that page a synchronous DownloadString, and each image a DownloadFileAsync). After that, the number of items queued to the ThreadPool will decrease, and your application will get the performance it needs.

  • You do not call the SetMaxThreads method in your application, so, according to MSDN:

    Maximum number of thread pool threads
    The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 25 worker threads per processor and 1,000 I/O completion threads.

    So you should set the maximum to 5 (see the first sketch after this list).

  • I can't find the place in your code where you check for the 42-image limit per job; you only increment the counter in the ProcessPage method (see the second sketch after this list).

  • Check the ManagedThreadId inside the WebClient.DownloadStringCompleted handler to see whether it runs on another thread or not (see the third sketch after this list).
  • You already add a new item to the ThreadPool queue, so why are you using an asynchronous operation for the download? Use the synchronous overload, for example:

     ProcessPage(wc.DownloadString(downloadLink), false, j); 

    This will not create another element in the ThreadPool queue, and you will not have a synchronization context switch.

  • In ProcessPage, your wc variable is never disposed, so you do not free all of your resources there. Add a using statement:

     void ProcessPage(string response, bool secondPass, Job j)
     {
         using (var wc = new WebClient())
         {
             LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
             foreach (LinkItem i in linkResponse)
             {
                 if (secondPass)
                 {
                     if (string.IsNullOrEmpty(i.Href))
                         continue;
                     else if (i.Href.Contains("http://loreipsum."))
                     {
                         if (DownloadImage(i.Href, ID(i.Href)))
                             j.Downloaded++;
                     }
                 }
                 else
                 {
                     if (i.Href.Contains(";id="))
                     {
                         var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() +
                             "/index.php?page=post&s=view&id=" + ID(i.Href));
                         ProcessPage(alterResponse, true, j);
                     }
                 }
             }
         }
     }
  • In the DownloadImage method you also use an asynchronous download. This adds another item to the ThreadPool queue as well, and I think you can avoid it and use the synchronous overload:

     wc.DownloadFile(url, id + ext);
     return true;
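
For the first point, here is a minimal sketch of what the SetMaxThreads call could look like (the ConfigureThreadPool name, the place it is called from, and the values are only placeholders; note that SetMaxThreads returns false and changes nothing if you ask for fewer worker threads than the machine has processors):

    // Sketch only: cap the pool once, before any jobs are queued.
    void ConfigureThreadPool()
    {
        int workers, completionPorts;
        System.Threading.ThreadPool.GetMaxThreads(out workers, out completionPorts);
        Debug.Log("Current max: " + workers + " worker threads, " + completionPorts + " I/O threads");

        // SetMaxThreads refuses values below the processor count and returns false.
        if (!System.Threading.ThreadPool.SetMaxThreads(5, 5))
            Debug.Log("SetMaxThreads rejected the requested values");
    }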
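
For the second point, one possible way to actually enforce the 42-image limit would be to check the counter before each download; for example, the second-pass branch of ProcessPage could become something like this (assuming j.Downloaded is the per-job counter from the question):

    else if (i.Href.Contains("http://loreipsum."))
    {
        // Sketch: stop downloading once this job already has its 42 images.
        if (j.Downloaded >= 42)
            return;

        if (DownloadImage(i.Href, ID(i.Href)))
            j.Downloaded++;
    }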
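
For the third point, logging the thread IDs is a one-line check; something along these lines, placed next to the existing handler in the page loop:

    // Compare the thread that queues the request with the thread
    // that runs the completion handler.
    Debug.Log("Queued on thread " + System.Threading.Thread.CurrentThread.ManagedThreadId);
    wc.DownloadStringCompleted += (sender, e) =>
    {
        Debug.Log("Completed on thread " + System.Threading.Thread.CurrentThread.ManagedThreadId);
        ProcessPage(e.Result, false, j);
    };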

So, in general, avoid operations that cause extra context switches, and manage your resources properly.


Your wc WebClient will go out of scope and may be garbage collected at some arbitrary point before the async callback fires. Also, with every asynchronous call you have to allow both for the immediate return and for the actual delegated function running later; that is why ProcessPage ends up being needed in two places. In addition, j in the original loop may go out of scope, depending on where Download is declared relative to that loop.
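
As one hedged illustration (not the only fix) of keeping the WebClient reachable until the callback runs, and of wiring the handler up before starting the request, the page loop from the question could be reshaped roughly like this; the handler now owns the WebClient and disposes it when it finishes:

    for (var i = 0; i < pages; i++)
    {
        var downloadLink = new System.Uri("http://www." + j.Provider.ToString() +
            "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);

        // No using block here: the callback captures wc, keeps it alive,
        // and disposes it itself once the response has been processed.
        var wc = new WebClient();
        wc.DownloadStringCompleted += (sender, e) =>
        {
            try
            {
                ProcessPage(e.Result, false, j);
            }
            finally
            {
                wc.Dispose();
            }
        };
        wc.DownloadStringAsync(downloadLink);   // handler is attached before the call starts
    }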

