How can I guess the grammar behind a list of generated sentences?

I have a list of sentences generated by http://www.ywing.net/graphicspaper.php , a random computer graphics paper headline generator. Some of the example sentences it produces are as follows:


  • Abstract ambient occlusion using texture mapping
  • Abstract mapping of the texture of the surrounding world
  • Abstract anisotropic soft shadows.
  • Abstract approximation
  • Abstract approximation of adaptive soft shadows using culling
  • Abstract approximation of ambient occlusion using hardware-accelerated clustering
  • Abstract approximation of distributed surfaces using estimation
  • Abstract geometry approximation of textured environmental occlusion
  • Abstract approximation of mipmaps for opacity
  • Abstract approximation of occlusion fields for subsurface scattering
  • Abstract soft shadow approximation using reflective texturing
  • Abstract custom rendering
  • Abstract attenuation and moving geometry maps
  • Abstract attenuation of ambient occlusion using image-dependent texture matching.
  • Abstract attenuation of light fields for mipmaps
  • Abstract attenuation of nonlinear ambient occlusion
  • Abstract attenuation of pre-computed mipmaps using re-meshing

  • ...

I would like to try reverse-engineering the grammar and learn how this might be done, for example with Lisp or with NLTK. Any ideas?

- Drake

3 answers

This seems like an interesting problem. That said, my impression is that it is generally not easy to recover a generator from the sequences it generates. What you can get is a model, which may or may not be a close approximation of the original generator. The approximation gets closer as more generated sequences are processed.

A simple method would be to build a parse tree and attach a dictionary at each branch of the tree.

Something like this:

    Abstract -+- ambient ...
              +- anisotropic ...
              +- approximation -+
              +- attenuation  --+-- of -- xxxx -+- using -- yyyy
                                                +- for  --- yyyy

xxxx → dictionary list

yyyy → dictionary list
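To make the idea concrete, here is a minimal pure-Python sketch of such a grammar. The nonterminal names and word lists below are my own guesses from the example headlines, not the site's actual rules; NLTK's CFG class provides the same machinery if you prefer it.

```python
import random

# Hypothetical grammar guessed from the example headlines --
# NOT the generator's real rules, just an illustration of the idea.
GRAMMAR = {
    "S":    [["Abstract", "HEAD"],
             ["Abstract", "HEAD", "of", "XXXX"],
             ["Abstract", "HEAD", "of", "XXXX", "PREP", "YYYY"]],
    "HEAD": [["approximation"], ["attenuation"]],
    "PREP": [["using"], ["for"]],
    "XXXX": [["ambient occlusion"], ["light fields"], ["mipmaps"]],
    "YYYY": [["clustering"], ["estimation"], ["opacity"]],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:          # terminal: emit as-is
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

def headline():
    return " ".join(expand("S"))

for _ in range(5):
    print(headline())
```

Fitting such a model is then a matter of filling the dictionaries (xxxx, yyyy) from the observed sentences and adding productions until every example parses.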


You might be interested in Alignment-Based Learning by Menno van Zaanen. It has been many years since I read his articles, but the main idea is:

  • find a common substring
  • assign it a grammar rule
  • rewrite text to use this rule
  • check whether the rewritten text plus the grammar is shorter than the original text.

Run this for all combinations of common substrings to find the best grammar.

This is a bit like an optimal compression algorithm. The theory underlying it is minimum description length.
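As a rough illustration of that description-length criterion (a toy scoring of my own, not van Zaanen's actual algorithm): score each repeated word n-gram by the tokens saved when all its occurrences are replaced by a nonterminal, minus the tokens needed to state the rule itself.

```python
from collections import Counter

def best_rule(sentences, min_len=2):
    """Find the common word n-gram whose replacement by a nonterminal
    most shortens text + grammar (a crude MDL-style criterion)."""
    counts = Counter()
    for sent in sentences:
        words = sent.split()
        for n in range(min_len, len(words) + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    best, best_saving = None, 0
    for gram, freq in counts.items():
        if freq < 2:
            continue
        # Each occurrence shrinks to 1 token, but the rule itself
        # costs len(gram) + 1 tokens to state (left side + right side).
        saving = freq * (len(gram) - 1) - (len(gram) + 1)
        if saving > best_saving:
            best, best_saving = gram, saving
    return best, best_saving

sentences = [
    "Abstract approximation of adaptive soft shadows using culling",
    "Abstract approximation of ambient occlusion using clustering",
    "Abstract attenuation of ambient occlusion using texture matching",
]
rule, saving = best_rule(sentences)
print(rule, saving)
```

Applied repeatedly (replace the winning n-gram with a fresh nonterminal, then search again), this greedily compresses the corpus into a grammar, which is the spirit of the steps above.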


There are also approaches to learning the grammar of a language from a set of sentences based on genetic programming; see, for example, "Learning context-free grammars using an evolutionary approach".

Wikipedia also lists some other approaches.

