Detection of "patterns" in the given text?

Question

I have a large amount of text and want to find the patterns that occur most often. I thought about solving this with the N-gram approach, and in fact it was proposed as a solution in a related question, but my requirement is slightly different. To clarify, I have text like this:

I wake up every day morning and read the newspaper and then go to work
I wake up every day morning and eat my breakfast and then go to work
I am not sure that this is the solution but I will try
I am not sure that this is the answer but I will try
I am not feeling well today but I will get the work done and deliver it tomorrow
I was not feeling well yesterday but I will get the work done and let you know by tomorrow

and I am trying to extract the "patterns" as follows:

I wake up every day morning and ... and then go to work
I am not sure that this is the ... but I will try
I ... not feeling well ... but I will get the work done and ... tomorrow
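To make the target output concrete, here is a minimal sketch of how one such pattern could be derived from a pair of similar lines. It is only an illustration, not a proposed solution: it aligns the two lines word by word with Python's standard `difflib.SequenceMatcher` and replaces every non-matching stretch with `...`. The function name `template` and the `gap` parameter are hypothetical, and deciding which lines to pair up is assumed to be handled separately (comparing all pairs would not scale to a million lines).

```python
from difflib import SequenceMatcher


def template(a: str, b: str, gap: str = "...") -> str:
    """Collapse two similar lines into one pattern with gaps.

    Words shared by both lines (in the same order) are kept;
    every stretch where the lines differ becomes a single gap marker.
    """
    words_a, words_b = a.split(), b.split()
    # autojunk=False disables the popularity heuristic so common words
    # such as "I" or "and" are never silently ignored.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(words_a[i1:i2])
        elif not out or out[-1] != gap:
            # replace / insert / delete: emit one gap, never two in a row
            out.append(gap)
    return " ".join(out)


line1 = "I wake up every day morning and read the newspaper and then go to work"
line2 = "I wake up every day morning and eat my breakfast and then go to work"
print(template(line1, line2))
# I wake up every day morning and ... and then go to work
```

Applied to the last two example lines, the same function yields the pattern with three gaps shown above.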

I'm looking for an approach that scales to a million lines of text. Can the same N-gram approach be adapted to solve this problem, or are there any alternatives?
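For reference, this is roughly what the plain N-gram baseline looks like, as a sketch using only the standard library. The function name `ngram_counts` is hypothetical. It counts contiguous word n-grams in a single pass over the lines, so it scales linearly with the input, but it only finds contiguous fragments and cannot by itself produce patterns with gaps, which is where my requirement differs.

```python
from collections import Counter
from typing import Iterable


def ngram_counts(lines: Iterable[str], n: int) -> Counter:
    """Count every contiguous word n-gram across all lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Slide a window of n words over the line.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


corpus = [
    "I wake up every day morning and read the newspaper and then go to work",
    "I wake up every day morning and eat my breakfast and then go to work",
    "I am not sure that this is the solution but I will try",
    "I am not sure that this is the answer but I will try",
    "I am not feeling well today but I will get the work done and deliver it tomorrow",
    "I was not feeling well yesterday but I will get the work done and let you know by tomorrow",
]

counts = ngram_counts(corpus, 3)
print(counts[("but", "I", "will")])
# 4
```

Frequent n-grams such as `but I will` show up clearly, but stitching them back into a single pattern per group of lines is a separate step.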

language-agnostic machine-learning nlp nltk data-mining

Legend Jun 29 '11 at 21:07

Fred Foo · Accepted Answer · 2011-06-29T21:24:11+0000

:)

, , , . n-. . Manning and Schütze (1999) .
