dispatch_sync vs. dispatch_async on the main queue

Bear with me, this will take some explanation. I have a function that looks like the one below.

Context: "aProject" is a Core Data object named LPProject with an array named "memberFiles" that contains instances of another Core Data object called LPFile. Each LPFile represents a file on disk, and we want to open each of these files and analyze its text by looking for @import instructions pointing to OTHER files. If we find the @import operators, we want to find the file that they point to, and then β€œlink” this file to this, adding a relation to the main data object that the first file represents. Since all this may take some time on large files, we will do this from the main stream using GCD.

- (void) establishImportLinksForFilesInProject:(LPProject *)aProject {
    dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    for (LPFile *fileToCheck in aProject.memberFiles) {
        if (/* some condition is met */) {
            dispatch_async(taskQ, ^{
                // Here, we do the scanning for @import statements.
                // When we find a valid one, we put the whole path to the imported file
                // into an array called 'verifiedImports'.

                // Go back to the main thread and update the model (Core Data is not thread-safe).
                dispatch_sync(dispatch_get_main_queue(), ^{
                    NSLog(@"Got to main thread.");
                    for (NSString *import in verifiedImports) {
                        // Add the relationship to the Core Data LPFile entity.
                    }
                }); //end block
            }); //end block
        }
    }
}

Now here's where things get weird:

This code works, but I'm seeing a strange problem. If I run it on an LPProject with a few files (about 20), it works fine. However, if I run it on an LPProject with more files (say 60-70), it does NOT run correctly. We never get back to the main thread, the NSLog(@"Got to main thread."); never appears, and the application hangs. BUT (and this is where things get REALLY weird), if I run the code on a small project FIRST and THEN run it on the large project, everything works fine. It's ONLY when I run the code on a large project first that the problem shows up.

And here's the kicker: if I change the second dispatch line to this:

 dispatch_async(dispatch_get_main_queue(), ^{ 

(That is, use async instead of sync to dispatch the block to the main queue.) Everything works every time, flawlessly, regardless of the number of files in the project!
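For clarity, the only difference between the two variants is whether the worker thread waits for the main queue to finish running the block (a minimal illustration, not my actual code):

// dispatch_sync: the worker thread blocks here until the main queue has executed the block.
dispatch_sync(dispatch_get_main_queue(), ^{
    NSLog(@"Main-queue work is done before the worker continues.");
});

// dispatch_async: the worker thread continues immediately;
// the block runs on the main queue whenever the main run loop gets to it.
dispatch_async(dispatch_get_main_queue(), ^{
    NSLog(@"Main-queue work happens later.");
});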

I'm at a loss to explain this behavior. Any help or ideas on what to test next would be appreciated.

+52
objective-c cocoa core-data objective-c-blocks grand-central-dispatch
Jun 30 '11 at 17:31
3 answers

This is a common problem with disk I/O and GCD. Basically, GCD is probably spawning one thread for each file, and at some point you have more threads than the system can service in a reasonable amount of time.

Every time you call dispatch_async() and in that block you attempt any I/O (for example, it looks like you're reading some files here), it's likely that the thread running that block will block (get paused by the OS) while it waits for the data to be read from the filesystem. The way GCD works is that when it sees one of its worker threads is blocked on I/O and you're still asking it to do more work concurrently, it simply spawns a new worker thread. So if you try to open 50 files on a concurrent queue, you'll probably end up with GCD spawning ~50 threads.

That's too many threads for the system to service properly, and you end up starving the main thread of CPU time.
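To make the failure mode concrete, here is a minimal, hypothetical sketch (not your code) of the pattern that triggers it: each block parks its worker thread on blocking file I/O, so GCD keeps creating new workers to keep the concurrent queue busy.

dispatch_queue_t globalQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
for (NSString *path in manyFilePaths) { // stand-in for your 60-70 files
    dispatch_async(globalQ, ^{
        // This read blocks the worker thread until the disk delivers the data,
        // so GCD spins up another worker for the next block, and so on.
        NSString *contents = [NSString stringWithContentsOfFile:path
                                                       encoding:NSUTF8StringEncoding
                                                          error:NULL];
        // ... scan 'contents' for @import statements ...
    });
}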

The way to fix this is to use a serial queue instead of a concurrent queue for your file operations. It's easy to do. You'll want to create this queue once and store it as an ivar in your object so you don't end up creating multiple serial queues. So remove this call:

dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

Add this to your init method:

taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);

Add this to your dealloc method:

dispatch_release(taskQ);

And add this as ivar to the class declaration:

dispatch_queue_t taskQ;
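Put together (a sketch assuming manual reference counting, to match the dispatch_release above, and a made-up class name), it looks something like this:

@interface LPProjectScanner : NSObject { // hypothetical class name, for illustration only
    dispatch_queue_t taskQ; // one serial queue, created once and reused
}
@end

@implementation LPProjectScanner

- (id)init {
    if ((self = [super init])) {
        taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)dealloc {
    dispatch_release(taskQ);
    [super dealloc];
}

@end

The file-scanning blocks then get dispatched onto taskQ exactly as before; because the queue is serial, only one file is read at a time and GCD never needs more than one worker thread for it.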

+53
Jul 01 '11 at 1:16

I believe Ryan is on the right track: there were simply too many threads being spawned when a project had 1,500 files (the number I happened to test with).

So, I reworked the above code to work as follows:

- (void) establishImportLinksForFilesInProject:(LPProject *)aProject {
    dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_async(taskQ, ^{
        // Create a new Core Data context on this thread using the same persistent store
        // as the main thread. Pass the objectID of aProject to access the managed object
        // for that project in this thread's context:
        NSManagedObjectID *projectID = [aProject objectID];

        for (LPFile *fileToCheck in [[backgroundContext objectWithID:projectID] memberFiles]) {
            if (/* some condition is met */) {
                // Here, we do the scanning for @import statements.
                // When we find a valid one, we put the whole path to the
                // imported file into an array called 'verifiedImports'.

                // Pass this ID to the main thread in the dispatch call below to access
                // the same file in the main thread's context.
                NSManagedObjectID *fileID = [fileToCheck objectID];

                // Go back to the main thread and update the model
                // (Core Data is not thread-safe).
                dispatch_async(dispatch_get_main_queue(), ^{
                    for (NSString *import in verifiedImports) {
                        LPFile *targetFile = [mainContext objectWithID:fileID];
                        // Add the relationship to targetFile.
                    }
                }); //end block
            }
        }
        // Easy way to tell when we're done processing all files:
        // could add a dispatch_async(main_queue) call here to do UI updates, etc.
    }); //end block
}

So basically, we now spawn one thread that reads all the files instead of one thread per file. It also turns out that calling dispatch_async() on the main queue is the right approach: the worker thread dispatches that block to the main thread and does NOT wait for it to return before moving on to scan the next file.

This implementation essentially sets up a "serial" queue, as Ryan suggested (the for loop is the serial part), but with one advantage: when the for loop finishes, we're done processing all of the files, and we can simply stick a dispatch_async(main_queue) block right there to do whatever we want. It's a very nice way to know when the concurrent processing task is finished, and it didn't exist in my old version.
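For example, the tail of the background block could look something like this (a hypothetical completion hook, not part of the code above):

        // ...end of the for loop over memberFiles...

        // Every per-file block has already been enqueued on the main queue ahead of this one,
        // so when this block runs, all the Core Data updates are done.
        dispatch_async(dispatch_get_main_queue(), ^{
            NSLog(@"Finished linking imports for all files.");
            // e.g. hide a progress indicator or post a notification here.
        });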

The downside is that it's a bit more complicated to work with Core Data on multiple threads. But this approach seems bulletproof for projects with as many as 5,000 files (the most I've tested).
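In particular, the 'backgroundContext' referenced above isn't shown. One way to set it up (a sketch using the thread-confinement pattern, with manual reference counting assumed) is to create a second NSManagedObjectContext on the background thread that shares the main context's persistent store coordinator:

// Inside the dispatch_async(taskQ, ^{ ... }) block, before touching any managed objects:
NSManagedObjectContext *backgroundContext = [[NSManagedObjectContext alloc] init];
[backgroundContext setPersistentStoreCoordinator:[mainContext persistentStoreCoordinator]];

// ... do all background-thread fetches through backgroundContext ...

[backgroundContext release]; // non-ARC; release when the background work is finished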

+5
Jul 01 '11 at 18:26

I think a diagram makes this easier to understand.

In the situation the author described:

    |taskQ|***********start|
      |dispatch_1 ***********|---------
      |dispatch_2 *************|---------
      .
      .
      |dispatch_n ***----------
    |main queue (sync)|**start dispatching to main|
    **************************|--dispatch_1--|--dispatch_2--|--dispatch_3--|******************************|--dispatch_n|

which makes the main (sync) queue so busy that it ultimately fails to finish the task.

0
Feb 12 '15 at 1:22


