My goal is to parse a large XML file (20 GB) using Swift. NSXMLParser and the bridging to Swift objects have some performance issues, so I am considering multithreading. Specifically, I have the following division in mind:
1. Main thread: analyzes the data.
2. Worker thread: casts the ObjC types to Swift types and sends them to 1. Casting an ObjC NSDictionary to [String: String] is the biggest bottleneck. This is also the main reason for splitting the work across multiple threads.
3. Worker thread: parses the XML into ObjC types and sends them to 2. NSXMLParser is a push parser; once it starts parsing, you cannot pause it.
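For reference, the cast described above looks roughly like the sketch below. The names `objcAttributes` and `bridge` are hypothetical and only stand in for whatever the parser callback actually delivers. As far as I understand, the conditional cast has to check and convert every key and value, which would explain the cost.

```swift
import Foundation

// Hypothetical stand-in for an attribute dictionary delivered by the parser.
let objcAttributes: NSDictionary = ["id": "42", "name": "node"]

// Hypothetical helper: bridge the ObjC dictionary to a Swift dictionary.
// Returns nil if any key or value is not a string.
func bridge(_ attributes: NSDictionary) -> [String: String]? {
    return attributes as? [String: String]
}

let swiftAttributes = bridge(objcAttributes)
```

This runs once per parsed element, so with a 20 GB file it is executed a very large number of times.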
The data must be analyzed sequentially, so the input order must be maintained. My idea is to run an NSRunLoop on both 1 and 2, allowing parallel processing without blocking. According to Apple's documentation, communication between threads can be achieved by calling performSelector:onThread:withObject:waitUntilDone:. However, this symbol is not available in Swift.
I don't think GCD fits as a solution. Both worker threads should be long-running, with new work arriving at random intervals.
How can one achieve the above (e.g. NSRunLoops on multiple threads) using Swift?