What is a good natural language library for paraphrasing?

I am looking for an existing library to summarize or paraphrase content (I am aiming at blog posts). Does anyone have experience working with existing natural language processing libraries?

I am open to any programming language, so I am more interested in capabilities and accuracy.

+10
language-agnostic nlp
Aug 24 '08 at 20:57
5 answers

There was some discussion of Grok. It is now maintained as OpenCCG, and it will also be reimplemented in OpenNLP.

You can find OpenCCG at http://openccg.sourceforge.net/ . I would also suggest the Curran and Clark CCG parser: http://svn.ask.it.usyd.edu.au/trac/candc/wiki

Basically, to paraphrase, you need to write something that first parses the sentences of the blog posts, extracts their semantic meaning, then searches the space of words for combinations that compositionally produce the same meaning, and finally picks one that doesn't match the current sentence. That will take a lot of time, and the result may not make much sense. Don't forget that for this you will also need near-perfect anaphora resolution and the ability to draw discourse-level inferences.
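Only the last step of that pipeline is easy to show in a few lines. The sketch below assumes the hard parts (parsing, semantic extraction, generation, anaphora resolution) have already produced a list of candidate sentences from some hypothetical generator, and it uses word-set overlap (Jaccard similarity) as a stand-in for "doesn't match the current sentence". The function names are made up for illustration.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences (1.0 means identical word sets)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pick_paraphrase(original: str, candidates: list[str]) -> str | None:
    """Return the candidate sharing the fewest words with the original.

    Candidates with exactly the same word set as the original are discarded.
    Returns None when no candidate differs from the original.
    """
    different = [c for c in candidates if tokens(c) != tokens(original)]
    if not different:
        return None
    return min(different, key=lambda c: jaccard(original, c))


original = "I hate this guy, he's so dumb."
candidates = [
    "I hate this guy, he's so dumb.",
    "This guy is stupid, I hate him.",
    "I despise this dumb guy.",
    "He is dumb, I hate him.",
]
print(pick_paraphrase(original, candidates))
```

With these candidates the least-overlapping one is "He is dumb, I hate him.". Lexical distance says nothing about whether the meaning survived, which is exactly why the semantic steps above are the hard part.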

If you just want to produce blog entries that don't have machine-identifiable duplicate content, you can always use topic and focus transformations plus WordNet synonyms. There have definitely been sites that made money from AdWords by doing exactly this.
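A minimal sketch of the synonym half of that idea (topic and focus transformations are not shown). It assumes a tiny hand-written synonym table as a stand-in for WordNet; with NLTK and its WordNet corpus installed you could build such a table from `nltk.corpus.wordnet` synsets instead. Note that plain word-for-word substitution ignores word sense, which is why this kind of output often reads oddly.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for WordNet: a tiny hand-written synonym table.
SYNONYMS = {
    "dumb": "stupid",
    "hate": "despise",
    "guy": "man",
    "big": "large",
}


def swap_synonyms(text: str, table: dict[str, str] = SYNONYMS) -> str:
    """Replace every word found in `table` with its synonym.

    A leading capital letter on the original word is kept on the synonym.
    """

    def replace(match: re.Match) -> str:
        word = match.group(0)
        synonym = table.get(word.lower())
        if synonym is None:
            return word
        return synonym.capitalize() if word[0].isupper() else synonym

    return re.sub(r"[A-Za-z]+", replace, text)


print(swap_synonyms("I hate this guy, he's so dumb."))
```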

+9
Oct 10 '08 at 10:30

I think he wants to generate blog entries by automatically paraphrasing whatever appears in the blogs this system monitors.

It would be really interesting if you could take 2 to 10 similar blog posts from different sources and automatically produce a paraphrased "real" summary of them (the length of one blog post).

It would also be great for homework. Unfortunately, this is not easy to do.

The only way I can see is to decompose each sentence into its “meaning” and then randomly change the sentence structure and some of the words while preserving that meaning.

These sentences mean the same thing:

  • I hate this guy, he's so dumb.
  • This guy is stupid, I hate him.
  • I despise this dumb guy.
  • He is dumb, I hate him.

It would be non-trivial to write a program that converts any one of these sentences into the others, and these are simple sentences; real sentences from blogs are much more complicated.
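A toy sketch of the "decompose into meaning, then randomly restructure" idea, using the example sentences above. The meaning frame, templates and word alternatives are all hypothetical and filled in by hand; extracting such a frame automatically from real blog text is the non-trivial part and is not shown.

```python
from __future__ import annotations

import random

# Hypothetical "meaning" of the example sentences, filled in by hand.
MEANING = {"emotion": "hate", "target": "this guy", "trait": "dumb"}

# Hand-written sentence structures that all express the same frame.
TEMPLATES = [
    "I {emotion} {target}, he's so {trait}.",
    "{Target} is {trait}, I {emotion} him.",
    "He is {trait}, I {emotion} him.",
]

# Hypothetical word-level alternatives that keep the meaning.
ALTERNATIVES = {"trait": ["dumb", "stupid"], "emotion": ["hate", "despise"]}


def realize(meaning: dict[str, str], rng: random.Random) -> str:
    """Pick a random structure and random synonyms for the same meaning."""
    slots = dict(meaning)  # copy so the caller's frame is not modified
    for slot, options in ALTERNATIVES.items():
        slots[slot] = rng.choice(options)
    slots["Target"] = slots["target"].capitalize()  # for sentence-initial use
    template = rng.choice(TEMPLATES)
    return template.format(**slots)


rng = random.Random(42)
for _ in range(3):
    print(realize(MEANING, rng))
```

Passing a seeded `random.Random` makes the output reproducible, which helps when testing.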

+5
Oct 09 '08 at 14:25

You are venturing into a very far-out domain, close to AI. I have worked a lot on converting text into machine knowledge, mainly using Attempto Controlled English (see: http://attempto.ifi.uzh.ch/site/ ), a controlled natural language (English) that a computer can fully translate into several different ontologies, such as OWL DL.

That seems like overkill here, though...

Is there a reason not to just take the first few sentences of your blog post and add an ellipsis as your summary?
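For the record, that suggestion is only a few lines. This sketch assumes sentences end in `.`, `!` or `?` followed by whitespace, so abbreviations such as "e.g." will cause early splits; `teaser` is just an illustrative name.

```python
import re


def teaser(post: str, n_sentences: int = 2) -> str:
    """Return the first n sentences of a post, plus an ellipsis if text was cut."""
    # Naive split: a sentence ends at . ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    head = " ".join(sentences[:n_sentences])
    if len(sentences) > n_sentences:
        head += " ..."
    return head


print(teaser("One. Two! Three? Four.", 2))
```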

0
Aug 24 '08 at 21:14

Thanks for these links. It seems that Grok is dead, but it may still work for my purposes.

2 more links:

Attempto Controlled English is an interesting concept: it takes the completely opposite view of the problem. It is not very practical for what I'm trying to do, though.

@mmattax Regarding the suggestion to take the first few sentences: I am not trying to display a summary, otherwise that would be nice judo. I actually intend to summarize the content for other evaluation purposes.

0
Sep 01 '08 at 4:24

You might want to try GATE, or the proprietary, patented and expensive TextAnalyst COM API.

0
Oct 09 '08 at 14:02