What is the fastest way to check a keyword in a keyword list in Delphi?

Question

What is the fastest way to check a keyword in a keyword list in Delphi?

I have a short list of keywords. What I really would like to do is akin to:

case MyKeyword of 'CHIL': (code for CHIL); 'HUSB': (code for HUSB); 'WIFE': (code for WIFE); 'SEX': (code for SEX); else (code for everything else); end;

Unfortunately, the CASE statement cannot be used in the same way as for strings.

I could use direct IF if ELSE IF construct, for example:

 if MyKeyword = 'CHIL' then (code for CHIL) else if MyKeyword = 'HUSB' then (code for HUSB) else if MyKeyword = 'WIFE' then (code for WIFE) else if MyKeyword = 'SEX' then (code for SEX) else (code for everything else);

but I heard that it is relatively inefficient.

Instead, I did the following:

 P := pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX '); case P of 1: (code for CHIL); 6: (code for HUSB); 11: (code for WIFE); 17: (code for SEX); else (code for everything else); end;

This, of course, is not the best programming style, but it works great for me and still hasn't changed.

So what is the best way to rewrite this in Delphi so that it is simple, straightforward, but also fast?

(For reference, I am using Delphi 2009 with Unicode strings.)

Follow-up:

Toby recommended that I simply use the If Then Else construct. Looking back at my examples that use the CASE statement, I see how this is a viable answer. Unfortunately, my inclusion of CASE inadvertently hid my real question.

I really don't care which particular keyword. This is just a bonus if a particular method can identify it as a POS method. I need to know if a keyword is in a set of keywords.

So I really want to know if there is anything better:

 if pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX ') > 0 then

The equivalent of If Then Else does not look better in this case:

 if (MyKeyword = 'CHIL') or (MyKeyword = 'HUSB') or (MyKeyword = 'WIFE') or (MyKeyword = 'SEX') then

In Barry's comment on Cornell's question, he mentions TDictionary Generic. I haven't found any new Generic collections yet, and it looks like I should delve into them. My question here will be, are they built for efficiency, and how would I use TDictionary to compare in views and in speed for the two lines above?

In a later profiling, I found that string concatenation as in: (`` + MyKeyword + '') is VERY expensive in time and should be avoided when possible. Virtually any other solution is better than doing this.

+7

optimization delphi lookup case-statement

lkessler Jan 23 '10 at 23:54

source share

8 answers

 type TKeyword = ( ECHIL, EHUSB, EWIFE, ESEX ) const TKeyNames : array[TKeyword] of string = ( 'CHIL', 'HUSB', 'WIFE', 'SEX' ); Key : TKeyword case Key of ECHIL : (code for CHIL); EHUSB : (code for HUSB); EWIFE : (code for WIFE); ESEX : (code for SEX); else (code for everything else); end;

In general, don't use strings as “keys,” use enumerations — they are safer, and you get most of the speed increase.

Unfortunately, Delphi (as far as I know) does not have a standard hash table implementation that will be easy to use, however you can always collapse your own.

BTW, code for SEX sounds a lot more fun than beer code: P

+6

Kornel kisielewicz Jan 24 '10 at 0:26

source share

You can use the const table (which should be sorted by alpha) and quick binary sort. It is very efficient and does not require any hashing.

Here is the function:

 function IsKeyWord(const KeyWords: array of string; const aToken: String): Boolean; // aToken must be already uppercase var First, Last, I, Compare: Integer; begin First := Low(Keywords); Last := High(Keywords); Result := False; while First <= Last do begin I := (First + Last) shr 1; Compare := CompareStr(Keywords[i],aToken); if Compare = 0 then begin Result := True; break; end else if Compare < 0 then First := I + 1 else Last := I - 1; end; end;

And here are some examples of keywords:

 const PASCALKEYWORDS: array[0..100] of string = ('ABSOLUTE', 'ABSTRACT', 'AND', 'ARRAY', 'AS', 'ASM', 'ASSEMBLER', 'AUTOMATED', 'BEGIN', 'CASE', 'CDECL', 'CLASS', 'CONST', 'CONSTRUCTOR', 'DEFAULT', 'DESTRUCTOR', 'DISPID', 'DISPINTERFACE', 'DIV', 'DO', 'DOWNTO', 'DYNAMIC', 'ELSE', 'END', 'EXCEPT', 'EXPORT', 'EXPORTS', 'EXTERNAL', 'FAR', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FORWARD', 'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INDEX', 'INHERITED', 'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY', 'MESSAGE', 'MOD', 'NAME', 'NEAR', 'NIL', 'NODEFAULT', 'NOT', 'OBJECT', 'OF', 'OR', 'OUT', 'OVERRIDE', 'PACKED', 'PASCAL', 'PRIVATE', 'PROCEDURE', 'PROGRAM', 'PROPERTY', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'RAISE', 'READ', 'READONLY', 'RECORD', 'REGISTER', 'REINTRODUCE', 'REPEAT', 'RESIDENT', 'RESOURCESTRING', 'SAFECALL', 'SET', 'SHL', 'SHR', 'STDCALL', 'STORED', 'STRING', 'STRINGRESOURCE', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE', 'UNIT', 'UNTIL', 'USES', 'VAR', 'VARIANT', 'VIRTUAL', 'WHILE', 'WITH', 'WRITE', 'WRITEONLY', 'XOR'); DFMKEYWORDS: array[0..4] of string = ( 'END', 'FALSE', 'ITEM', 'OBJECT', 'TRUE'); CKEYWORDS: array[0..47] of string = ( 'ASM', 'AUTO', 'BREAK', 'CASE', 'CATCH', 'CHAR', 'CLASS', 'CONST', 'CONTINUE', 'DEFAULT', 'DELETE', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EXTERN', 'FLOAT', 'FOR', 'FRIEND', 'GOTO', 'IF', 'INLINE', 'INT', 'LONG', 'NEW', 'OPERATOR', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'REGISTER', 'RETURN', 'SHORT', 'SIGNED', 'SIZEOF', 'STATIC', 'STRUCT', 'SWITCH', 'TEMPLATE', 'THIS', 'THROW', 'TRY', 'TYPEDEF', 'UNION', 'UNSIGNED', 'VIRTUAL', 'VOID', 'VOLATILE', 'WHILE'); CSHARPKEYWORDS : array[0..86] of string = ( 'ABSTRACT', 'AS', 'BASE', 'BOOL', 'BREAK', 'BY3', 'BYTE', 'CASE', 'CATCH', 'CHAR', 'CHECKED', 'CLASS', 'CONST', 'CONTINUE', 'DECIMAL', 'DEFAULT', 'DELEGATE', 'DESCENDING', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EVENT', 'EXPLICIT', 'EXTERN', 'FALSE', 'FINALLY', 'FIXED', 'FLOAT', 'FOR', 'FOREACH', 'FROM', 'GOTO', 'GROUP', 'IF', 'IMPLICIT', 'IN', 'INT', 'INTERFACE', 'INTERNAL', 'INTO', 'IS', 'LOCK', 'LONG', 'NAMESPACE', 'NEW', 'NULL', 'OBJECT', 'OPERATOR', 'ORDERBY', 'OUT', 'OVERRIDE', 'PARAMS', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'READONLY', 'REF', 'RETURN', 'SBYTE', 'SEALED', 'SELECT', 'SHORT', 'SIZEOF', 'STACKALLOC', 'STATIC', 'STRING', 'STRUCT', 'SWITCH', 'THIS', 'THROW', 'TRUE', 'TRY', 'TYPEOF', 'UINT', 'ULONG', 'UNCHECKED', 'UNSAFE', 'USHORT', 'USING', 'VAR', 'VIRTUAL', 'VOID', 'VOLATILE', 'WHERE', 'WHILE', 'YIELD');

And it is very easy to use:

  if IsKeyWord(PASCALKEYWORDS,UpperCase('BEGIN')) then ....

You can easily change the IsKeyWord () function to return the marker index if you need it.

 function KeyWordIndex(const KeyWords: array of string; const aToken: String): integer; // aToken must be already uppercase var First, Last, Compare: Integer; begin First := Low(Keywords); Last := High(Keywords); while First <= Last do begin result := (First + Last) shr 1; Compare := CompareStr(Keywords[result],aToken); if Compare = 0 then exit else if Compare < 0 then First := result + 1 else Last := result - 1; end; result := -1; // not found end;

+5

Arnaud bouchez Jan 24 '10 at 11:17

source share

Your series of if , to check if any of the specified keywords can be input, can be shortened by checking single characters to help out as soon as possible. Your example

 if MyKeyword = 'CHIL' then (code for CHIL) else if MyKeyword = 'HUSB' then (code for HUSB) else if MyKeyword = 'WIFE' then (code for WIFE) else if MyKeyword = 'SEX' then (code for SEX) else (code for everything else);

can be replaced by

 KW := kwOther; if MyKeyword <> '' then begin case MyKeyword[1] of 'C': if MyKeyword = 'CHIL' then KW := kwCHIL; 'H': if MyKeyword = 'HUSB' then KW := kwHUSB; 'S': if MyKeyword = 'SEX' then KW := kwSEX; 'W': if MyKeyword = 'WIFE' then KW := kwWIFE; end; end; case KW of kwCHIL: {code for CHIL}; kwHUSB: {code for HUSB}; kwSEX: {code for SEX}; kwWIFE: {code for WIFE}; else {code for everything else}; end;

For case-insensitive keywords, case will check upper and lower case, and comparison will use AnsiCompareText() . If you have several keywords with the same first letter, you can enclose these case statements, but readability is likely to suffer soon with a very small increase in speed.

Taking this maximum, you can implement a state machine that PChar uses to calculate the next state that will enter the case else as soon as the first inconsistent character is found. That would be faster than any hash solution.

+3

mghie Jan 24 '10 at 9:56

source share

I think that

 if x then else

is the best solution. Your own solution is very inelegant and is not worth it to minimize efficiency. The if then else construct is perfect for what you want.

+2

Toby allen Jan 24 '10 at 0:27

source share

For the cleanest code, it is best to use case with enumerations, or if ... with strings, as others have suggested. There are several solutions from the beaten track, though, if you really want to get there.

One of them is to use a row hash map that looks like a row indexed list. The values in the list will be procedure pointers for the code that you want to execute for each line. All procedures must have the same exact parameters - and you have to write a hash map yourself or find one that you can use, for example. in the JVCL.

 // prepare the hash map myHashMap[ 'CHIL' ] := @procToCallForChil; myHashMap[ 'HUSB' ] := @procToCallForHusb; // use the hash map type TPrototypeProc = procedure( your-parameters-here ); var procToCall : TPrototypeProc; begin @procToCall := myHashMap[ 'CHIL' ]; procToCall( your-parameters-here ); end;

Two, and this is something strange, I tried once: if and only if the lines of your identifier correspond to 4 characters (as in all your examples), and they are ansi lines (not Unicode lines), you can display the lines directly to integers. A 32-bit integer is equivalent in size to a 4-byte string, so you can do:

 function FourCharStringToInteger( const s : ansistring ) : integer; begin assert(( length( s ) = 4 ), 'String to convert must be exactly 4 characters long!'); move( s[1], result, 4 ); end;

Any string identifier with a length of less than 4 characters must be filled with spaces, and strings are case sensitive ("chil" and "CHIL" give different values). To use this with the case argument, you will have to pre-copy the values, which may or may not be suitable for your purpose:

 id_CHIL := FourCharStringToInteger( 'CHIL' );

And finally, you can have a case statement:

 case id of id_CHIL : ... id_HUSB : ... end;

This is a "special care" code and can only matter if you have hundreds or more lines of your identifier - this really should not be done at all. Of course, it’s easy to break. I agree that the statements of this case are faster than the endless processions of blocks ... else ....

+2

Marek jedliński Jan 24 '10 at 2:13

source share

disclaimer: the answer is based on an updated description of the problem, i.e. just checks if the string matches or not.

If you really want to go to the last bit of performance, additional information about your data sets can help.

How many keywords are we talking about? (which order of magnitude)
Is a set of keywords installed?
Are there many repetitions in the input set? (i.e. the same keywords X are repeated)
What is the expected hit / miss ratio? Do you expect to match one keyword for every thousand input words, or do you expect almost every word to be found?

For example,

for a small number of keywords (say about 20, depending on implementation), hashing overheads will become important.
If you get it, you can get the perfect hash scheme (see here for an example in C), you can get rid of any chain or similar scheme by resetting some important loops. Again, this will require that both your keywords and the input set be known in front, which is unlikely.
If there are many repetitions in keywords (and a large collection of hash files), a small local cache of the last X words can help.
if you expect a lot of egregious misses (or your hash algorithm is very inefficient; P), trie might be more efficient than a hash table.

Most of them are a bit extreme for common performance tuning tasks. I would probably pre-profile standard "hashed" implementations (delphi generics, jcl, etc.) to see which one is best for your task.

+1

Paul-jan Jan 24 '10 at 8:35

source share

You can also switch to a more object oriented approach and have something like

 TCommand = class public procedure Execute; virtual; abstract; end; TCommandClass = class of TCommand;

and factory create teams for you

 TCommandFactory = class public procedure RegisterKeyWord (const Keyword : String; CmdClass : TCommandClass); function CreateCommand (const Keyword : String) : TCommand; end;

The calling code will look as simple as:

 CommandFactory.CreateCommand (Keyword).Execute;

Thus, you localized and encapsulated all the keyword strings in a simple factory class, which facilitates subsequent changes to the input language. Using this team approach has other benefits, such as easy extensibility.

I know that this can be interpreted as not an answer to your question, because it is not about how quickly you can do it. But this is another approach worth considering.

0

jpfollenius Jan 24 '10 at 9:25

source share

Uwe raabe · Accepted Answer · 2010-01-24T10:25:54+0000

I mainly use the IndexText function from StrUtils for this purpose. It is similar to your pos approach, but the return value does not depend on the individual string length. As a trick, it is also case insensitive (use IndexStr if you don't want this).

 function IndexText(const AText: string; const AValues: array of string): Integer; overload;

A comment on these functions actually mentions the case construct:

{AnsiMatchText and AnsiIndexText provide a function like with strings}

What is the fastest way to check a keyword in a keyword list in Delphi?

More articles: