Delphi spell correction code?

This question should discuss how to encode the spell corrector and not duplicate the Delphi Spell Checker .

Two years ago, I found and used Peter Norwig's spell corrector code on my website in Python. But the performance seemed low. Interestingly, several more languages ​​that perform the same task have been added to your list of web pages recently.

Some lines on the Peter page include syntax like:

[a + c + b for a, b in splits for c in alphabet] 

How to translate it to delphi?

I am interested in how a Delphi expert in SO will use the same theory and perform the same task with some suitable lines and possible mediocre or better performance. This is not to lower any language, but to learn how to compare how they implement the task in different ways.

Thanks for that in advance.

[change]

I will cite Marcelo Toledo , who contributes to version C, saying: "... Although the purpose of this article [C version] was to show the algorithms, not to highlight Python ...". Although its version of C has the second line, according to his article, its version is characterized by high performance when the dictionary file is huge. Therefore, this question does not highlight any language, but requires a delphi solution, and it is not at all intended for competition, although Peter has an influence on the leadership of Google Research.

[Update]

I was enlightened by David's suggestion and studied the theory and routine of Peter's page. A very crude and ineffective procedure has been done, slightly different from other languages, mine is a graphical interface. I am new to and a student at Delphi, I dare not publish my full code (it is poorly written). I will describe my idea of ​​how I did it. Your comment is welcome, so the procedure will be improved.

My hardware and software are out of date. This is enough for my work (my specialization is not related to a computer or program)

 AMD Athlon Dual Core Processor 2.01 Ghz, 480 Memory Windows XP SP2 IDE Delphi 7.0 

This is a snapshot and a record of the processing time of the β€œcorrect” word. I tried Gettickcount, Tdatetime and Queryperformancecounter to keep track of the correct time for the word, but gettickcount and Tdatetime will output o ms for each check, so I have to use QueryPerformanceCounter. Perhaps there are other ways to do this more accurately.

The general lines are 72, not including a function that registers the time of verification. The number of rows may not meet the criteria indicated above by Marcelo. The post discusses how to perform the task in different ways. Delphi's SO experts will, of course, use minimal lines to achieve maximum performance.

Spell checker

 procedure Tmajorform.FormCreate(Sender: TObject); begin loaddict; end; procedure Tmajorform.loaddict; var fs: TFilestream; templist: TStringlist; p1: tperlregex; w1: string; begin //load that big.txt (6.3M, is Adventures of Sherlock Holmes) //templist.loadfromstream //Use Tperlregex to tokenize ( I used regular expression by [Jan Goyvaerts][5]) //The load and tokenize time is about 7-8 seconds on my machine, Maybe there are other ways to //speed up loading and tokenizing. end; procedure Tmajorform.edits1(str: string); var i: integer; ch: char; begin // This is to simulate Peter page in order to fast generate all possible combinations. // I do not know how to use set in delphi. I used array. // Peter said his routine edits1 would generate 494 elements of 'something'. Mine will // generate 469. I do not know why. Before duplicate ignore, mine is over 500. After setting // duplicate ignore, there are 469 unique elements for 'something'. end; procedure Tmajorform.correct(str: string); var i, j: integer; begin //This is a loop and binary search to add candidate word into list. end; procedure Tmajorform.Button2Click(Sender: TObject); var str: string; begin // Trigger correct(str: string); end; 

It seems that Tfilestream can increase the load by 1-2 seconds. I tried using the CreateFileMapping method, but failed, and it seemed a bit complicated. Perhaps there are other ways to quickly download a huge file. Since this big.txt file will not be large, given the availability of the enclosure, there should be a more efficient way to download a larger and larger file.

Another point: Delphi 7.0 does not have a built-in regular expression. I look at other languages ​​that do spell checking on the Perter page, they pretty much directly call their inline regular expression. Of course, a real expert does not need any built-in class or library and can independently build it. For beginners, some classes or libraries are convenient.

Your comment is welcome.

[Update]

I continued the study and, in addition, turned on the edits2 function (edited distance 2). This will increase by about 12 lines of code. Peter said that editing distance 2 would include almost all features. "something" will have 114,324 possibilities. My function will generate 102,727 UNIQUE opportunities for this. Of course, the proposed words will also include more.

If using edits2, the response time for correction is clearly delayed, since it increases the data by about 200 times. But I believe that some of the proposed corrections are obviously impossible, because the driver will not enter the error word, which will be in the long adjusted list of words. Thus, editing distance 1 would be better if the big.txt file is large enough to include more correct words.

Below is a snapshot of the edit to track 2 correct times.

enter image description here

+8
delphi
source share
1 answer

This is an understanding of Python. It forms the Cartesian product of partitions and alphabets.

Each splitting element is a tuple that is unpacked into a and b. Each element of the alphabet is placed in the variable c. Then 3 variables are combined, assuming they are strings. The result of a list recognition expression is a list containing elements of the form a + c + b, one element for each element in a Cartesian product.

In Python, it can be written equivalently as

 res = [] for a, b in splits: for c in alphabets: res.append(a + c + b) 

In Delphi it will be

 res := TStringList.Create; for split in splits do for c in alphabets do res.Add(split.a + c + split.b); 

I suggest you read the Python list to better understand this very powerful Python feature.

+8
source share

All Articles