How can I guess the grammar behind a list of generated sentences?

I have a list of sentences generated by http://www.ywing.net/graphicspaper.php , a random computer graphics paper headline generator. Some of the example sentences it produces are as follows:


  • Abstract ambient occlusion using texture mapping
  • Abstract mapping of the texture of the surrounding world
  • Abstract anisotropic soft shadows.
  • Abstract approximation
  • Abstract approximation of adaptive soft shadows using culling
  • Abstract approximation of ambient occlusion using hardware-accelerated clustering
  • Abstract approximation of distributed surfaces using estimation
  • Abstract geometry approximation of textured environmental occlusion
  • Abstract approximation of mipmaps for opacity
  • Abstract approximation of occlusion fields for subsurface scattering
  • Abstract soft shadow approximation using reflective texturing
  • Abstract custom rendering
  • Abstract attenuation and moving geometry maps
  • Abstract attenuation of ambient occlusion using image-dependent texture matching.
  • Abstract attenuation of light fields for mipmaps
  • Abstract attenuation of nonlinear ambient occlusion
  • Abstract attenuation of pre-computed mipmaps using re-meshing

  • ...

I would like to try reverse-engineering the grammar and learn how this might be done, for example with Lisp or with NLTK. Any ideas?

- Drake

3 answers

This seems like an interesting problem. That said, my impression is that it is generally not easy to recover a generator from the sequences it generates. What you can get is a model, which may or may not be a close approximation of the original generator. The approximation gets closer as more generated sequences are processed.

A simple method would be to build a parse tree and attach a dictionary at each branch of the tree.

Something like this:

    Abstract -+- ambient ...
              +- anisotropic ...
              +- approximation -+
              +- attenuation  --+-- of -- xxxx -+- using -- yyyy
                                                +- for  --- yyyy

xxxx → dictionary list

yyyy → dictionary list
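To make the idea concrete, here is a minimal pure-Python sketch of such a grammar. The nonterminal names and word lists below are my own guesses from the example headlines, not the site's actual rules; NLTK's CFG class provides the same machinery if you prefer it.

```python
import random

# Hypothetical grammar guessed from the example headlines --
# NOT the generator's real rules, just an illustration of the idea.
GRAMMAR = {
    "S":    [["Abstract", "HEAD"],
             ["Abstract", "HEAD", "of", "XXXX"],
             ["Abstract", "HEAD", "of", "XXXX", "PREP", "YYYY"]],
    "HEAD": [["approximation"], ["attenuation"]],
    "PREP": [["using"], ["for"]],
    "XXXX": [["ambient occlusion"], ["light fields"], ["mipmaps"]],
    "YYYY": [["clustering"], ["estimation"], ["opacity"]],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:          # terminal: emit as-is
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

def headline():
    return " ".join(expand("S"))

for _ in range(5):
    print(headline())
```

Fitting such a model is then a matter of filling the dictionaries (xxxx, yyyy) from the observed sentences and adding productions until every example parses.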


You might be interested in Alignment-Based Learning by Menno van Zaanen. It has been many years since I read his articles, but the main idea is:

  • find a common substring
  • assign it a grammar rule
  • rewrite text to use this rule
  • check whether the rewritten text plus the grammar is shorter than the original text.

Run this for all combinations of common substrings to find the best grammar.

This is a bit like an optimal compression algorithm. The theory underlying it is minimum description length.
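As a rough illustration of that description-length criterion (a toy scoring of my own, not van Zaanen's actual algorithm): score each repeated word n-gram by the tokens saved when all its occurrences are replaced by a nonterminal, minus the tokens needed to state the rule itself.

```python
from collections import Counter

def best_rule(sentences, min_len=2):
    """Find the common word n-gram whose replacement by a nonterminal
    most shortens text + grammar (a crude MDL-style criterion)."""
    counts = Counter()
    for sent in sentences:
        words = sent.split()
        for n in range(min_len, len(words) + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    best, best_saving = None, 0
    for gram, freq in counts.items():
        if freq < 2:
            continue
        # Each occurrence shrinks to 1 token, but the rule itself
        # costs len(gram) + 1 tokens to state (left side + right side).
        saving = freq * (len(gram) - 1) - (len(gram) + 1)
        if saving > best_saving:
            best, best_saving = gram, saving
    return best, best_saving

sentences = [
    "Abstract approximation of adaptive soft shadows using culling",
    "Abstract approximation of ambient occlusion using clustering",
    "Abstract attenuation of ambient occlusion using texture matching",
]
rule, saving = best_rule(sentences)
print(rule, saving)
```

Applied repeatedly (replace the winning n-gram with a fresh nonterminal, then search again), this greedily compresses the corpus into a grammar, which is the spirit of the steps above.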


There are also approaches to learning the grammar of a language from a set of sentences based on genetic programming; see, for example, "Learning context-free grammars using an evolutionary approach".

Wikipedia also lists some other approaches.

