I haven't used my own DAWG suggestion, and I'm still curious to know if anyone has a way to do this, but I was able to come up with this:
Option Explicit Sub test() Dim ABC As Scripting.Dictionary Dim v As Range Dim n As Integer n = 5 Set ABC = FindRepeatingWordChains(n, ActiveDocument) ' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example). ' Loop through this collection to make your selections/highlights/whatever you want to do. If Not ABC Is Nothing Then For Each v In ABC v.Font.Color = wdColorRed Next v End If End Sub ' This is where the real code begins. Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary Dim sChain As String Dim CurWord As Range Dim MatchCount As Integer Dim i As Integer MatchCount = 0 For Each CurWord In DocToCheck.Words ' Make sure there are enough remaining words in our document to handle a chain of the length specified. If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then ' Check for non-printing characters in the first/last word of the chain. ' This code will read a vbCr, etc. as a word, which is probably not desired. ' However, this check does not exclude these 'words' inside the chain, but it can be modified. If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _ CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _ CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _ CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then sChain = CurWord For i = 1 To ChainLenth - 1 ' Add each word from the current word through the next ChainLength
I tested this on a 258-page document ( TheStory.txt ) from this source , and it started in just a few minutes.
See the test() section for use.
To use Scripting.Dictionary objects, you will need to reference the runtime of the Microsoft Scripting Runtime scripts. If this is undesirable, small modifications can be made to use Collections instead, but I prefer Dictionary as it has a useful .Exists() method.
Gaffi
source share