First, consider the structure of your XML. See "Concurrent XML parsing in Java" for criteria that make an XML structure suitable for parallel processing.
If your XML structure is suitable for parallel processing, here are a few ideas:
As far as I know, parsing XML requires a stack in order to remember the current position in the tree and to verify that nodes are opened and closed correctly.
The stack structure can be represented as a one-dimensional array with a stack pointer. The stack pointer contains the position of the top stack element in the array.
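A rough sketch of that idea in Python, only to illustrate the data structure (on a GPU this would live in a shader or kernel instead). The function name check_nesting, the fixed capacity and the numeric element ids are assumptions made up for this example:

```python
def check_nesting(events, capacity=64):
    """Check that open/close events are properly nested.

    events is a list of (kind, name_id) pairs, where kind is "open" or
    "close" and name_id is a number standing in for the element name.
    """
    stack = [0] * capacity  # fixed-size one-dimensional array
    sp = -1                 # stack pointer: position of the top element, -1 when empty
    for kind, name_id in events:
        if kind == "open":
            if sp + 1 == capacity:
                return False  # nesting deeper than the array allows
            sp += 1
            stack[sp] = name_id
        else:
            # A closing tag must match the element on top of the stack.
            if sp < 0 or stack[sp] != name_id:
                return False
            sp -= 1
    return sp == -1  # every opened element was closed
```

For example, check_nesting([("open", 1), ("open", 2), ("close", 2), ("close", 1)]) accepts the input, while a close event with a different id than the last open one is rejected.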
Reportedly, arrays can be stored in 1D textures (maximum 4,096 elements) or in 2D textures (maximum 4,096 x 4,096 = 16,777,216 elements). See the following link for more information: https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter33.html
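To use a 2D texture as one long array, a linear index has to be mapped to a texel coordinate and back. A minimal sketch in Python, assuming a row-major layout and the 4,096 texel width quoted above (the names to_texel and from_texel are hypothetical):

```python
TEXTURE_WIDTH = 4096  # assumed texture width, taken from the limit quoted above

def to_texel(index, width=TEXTURE_WIDTH):
    """Map a linear array index to an (x, y) texel coordinate, row by row."""
    return index % width, index // width

def from_texel(x, y, width=TEXTURE_WIDTH):
    """Map an (x, y) texel coordinate back to the linear array index."""
    return y * width + x
```

With this layout, element 4097 lands at x = 1, y = 1, and the last addressable element, 16,777,215, lands at x = 4095, y = 4095.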
If you assign a separate floating point number to each unique element name, you can store elements as numbers.
Likewise, if you treat the input text as an array of ASCII / UTF-8 codes, why not store it as an array of floating point numbers?
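Both encodings could look roughly like this in Python. This is a sketch under assumptions: the helper names are invented for the example, and the ids are simply handed out in order of first appearance. A 32-bit float represents every integer up to 16,777,216 exactly, so byte values (0 to 255) and small ids survive the round trip unchanged.

```python
from array import array

def build_name_ids(names):
    """Give each unique element name its own float id, in order of first use."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = float(len(ids))
    return ids

def encode_text(text):
    """Store the UTF-8 bytes of text as an array of 32-bit floats."""
    # list() matters here: passing bytes directly would reinterpret the raw
    # bytes as float data instead of converting each byte value.
    return array("f", list(text.encode("utf-8")))

def decode_text(floats):
    """Turn an array of float-encoded byte values back into a string."""
    return bytes(int(v) for v in floats).decode("utf-8")
```

Here build_name_ids(["book", "title", "book"]) gives {"book": 0.0, "title": 1.0}, and decode_text(encode_text(s)) returns the original string s.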
The last thing that is important for using the GPU is the output structure.
If you need, for example, rows of fixed-length columns, then it is only a matter of how to represent such a structure in a 1D or 2D array of floating point numbers.
Once you are confident about the previous points, and the GPU suits your case, just write functions to convert your data into textures and the textures back into your data.
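For a fixed-width output like that, one possible layout is to write the rows one after another into a single flat array, so that a cell is found at row * num_columns + col. A small Python sketch, assuming every value is already numeric (flatten_rows and get_cell are made-up names):

```python
def flatten_rows(rows, num_columns):
    """Flatten rows of fixed length into one 1D list of floats."""
    flat = []
    for row in rows:
        if len(row) != num_columns:
            raise ValueError("every row must have exactly num_columns values")
        flat.extend(float(v) for v in row)
    return flat

def get_cell(flat, row, col, num_columns):
    """Read one cell back from the flat array."""
    return flat[row * num_columns + col]
```

For two rows [1, 2, 3] and [4, 5, 6] with three columns, get_cell(flat, 1, 2, 3) reads back 6.0.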
And then, of course, the whole XML parser ...
I have never tried programming on the GPU, but I rarely consider anything impossible ...
Someone has to be the first to write the whole algorithm and find out whether the GPU can be used efficiently for it or not.