Detecting "patterns" in a given text?

I have a large amount of text and want to find the patterns that occur most often. I considered the N-gram approach, and it was in fact proposed as a solution in a related question, but my requirement is slightly different. To clarify, I have text like this:

I wake up every day morning and read the newspaper and then go to work
I wake up every day morning and eat my breakfast and then go to work
I am not sure that this is the solution but I will try
I am not sure that this is the answer but I will try
I am not feeling well today but I will get the work done and deliver it tomorrow
I was not feeling well yesterday but I will get the work done and let you know by tomorrow

and I am trying to extract the "patterns" as follows:

I wake up every day morning and ... and then go to work
I am not sure that this is the ... but I will try
I ... not feeling well ... but I will get the work done and ... tomorrow

I'm looking for an approach that can scale to a million lines of text, so I'm wondering whether I can adapt the same N-gram approach to this problem, or whether there are alternatives.
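
To make the desired output concrete, here is a minimal sketch of one possible way to get it, not a definitive solution. It assumes whitespace tokenization and uses word trigrams only to pick candidate line pairs, then aligns each pair with Python's standard `difflib.SequenceMatcher` and replaces the differing stretches with "...". The names `template`, `ngrams`, `mine_templates`, and the `min_shared` threshold are all made up for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import combinations

GAP = "..."


def template(a, b):
    """Align two lines word by word; keep equal runs, collapse the rest to GAP."""
    ta, tb = a.split(), b.split()
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(ta[i1:i2])
        elif not out or out[-1] != GAP:
            # Any insert/delete/replace becomes a single gap marker.
            out.append(GAP)
    return " ".join(out)


def ngrams(tokens, n=3):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def mine_templates(lines, n=3, min_shared=3):
    """Count gap templates over line pairs that share >= min_shared n-grams."""
    # Inverted index: n-gram -> indices of the lines that contain it.
    index = defaultdict(set)
    for idx, line in enumerate(lines):
        for gram in ngrams(line.split(), n):
            index[gram].add(idx)

    # Number of shared n-grams for every pair that shares at least one.
    shared = Counter()
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1

    counts = Counter()
    for (i, j), k in shared.items():
        if k >= min_shared:
            counts[template(lines[i], lines[j])] += 1
    return counts


print(template(
    "I wake up every day morning and read the newspaper and then go to work",
    "I wake up every day morning and eat my breakfast and then go to work",
))
# I wake up every day morning and ... and then go to work
```

The pair enumeration inside `mine_templates` is quadratic for any n-gram that occurs in many lines, so for a million lines this sketch would at least need a cap on bucket size or very common n-grams dropped from the index. That is exactly the part I am unsure how to scale.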


See Manning and Schütze (1999) on n-grams.
