This seems like an interesting issue. Be that as it may, I had the impression that it was not easy to guess the generator from it the generated sequence of bits. What you can get is a model, which may or may not be a close approximation to the original generator. The approximation will be closer when a large number of generated sequences are processed.
A simple method would be to create a parsing tree and create a dictionary in each part of the tree.
Something like that:
Abstract |--------| |Ambient , Anisotropic,(Approximation, Attenuation) | of | xxxx yyyy | | using for
xxxx → dictionary list
yyyy → dictionary list
source share