strcmp(), but with 0-9 AFTER A-Z? (C/C++)

For reasons I completely disagree with, but which "The Powers (of Anti-Usability) That Be" continue to decree despite my objections, I have a sorting routine that uses basic strcmp() compares to sort by name. It works great; it's hard to get that wrong. However, at the 11th hour it was decided that entries beginning with a number should come AFTER entries beginning with a letter, contrary to ASCII ordering. They cite the EBCDIC standard, which has numbers following letters, so the prior assumption isn't a universal truth, and I have no power to win this argument... but I digress.

Therein lies my problem. I've replaced all the relevant calls to strcmp with a call to a new function, nonstd_strcmp, and now need to implement the modifications that produce the sort change. I've used the FreeBSD source as my base: http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/libkern/strncmp.c.html

    if (n == 0)
        return (0);
    do {
        if (*s1 != *s2++)
            return (*(const unsigned char *)s1 -
                    *(const unsigned char *)(s2 - 1));
        if (*s1++ == 0)
            break;
    } while (--n != 0);
    return (0);

It may take me a while to really think through how this should be done, but I'm sure I'm not the only one who has experienced the brain-deadness of just-before-release spec changes.

+7
c++ c strcmp lexicographic
6 answers

In this special case, with only uppercase letters (as the OP mentioned in the comments) and the digits 0-9, you can also skip the order table and instead multiply the two differing characters by 4 and compare the results modulo 256. The range of ASCII digits (48 to 57) does not overflow 8 bits (57 × 4 = 228), but the range of uppercase letters (65 to 90) does (65 × 4 = 260). When the multiplied values are compared modulo 256, the value for every letter is less than the value of any digit: 90 × 4 % 256 = 104 < 192 = 48 × 4.

The code might look something like this:

    int my_strcmp(const char *s1, const char *s2)
    {
        for (; *s1 == *s2 && *s1; ++s1, ++s2);
        return (((*(const unsigned char *)s1) * 4) & 0xFF) -
               (((*(const unsigned char *)s2) * 4) & 0xFF);
    }

Of course, the order-table solution is far more versatile in general, since it lets you define the sort order for every character; this solution is only sensible for this special case with uppercase letters and digits. (But on microcontroller platforms, for example, saving even the small amount of memory used by the table can be a real benefit.)
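To see why the trick works, here is a small sketch that computes the key for one character and checks the ordering over both ranges. The names mul4_key and mul4_order_holds are made up for illustration, and the loops assume ASCII, where 'A'-'Z' are contiguous.

```c
/* Hypothetical helper: sort key of one character under the
 * multiply-by-4 trick. Only meaningful for '\0', 'A'-'Z' and '0'-'9'. */
int mul4_key(unsigned char c)
{
    return (c * 4) & 0xFF;   /* same as (c * 4) % 256 */
}

/* Returns 1 if NUL < every uppercase letter < every digit under mul4_key.
 * Assumes ASCII, so that 'A'..'Z' is a contiguous range. */
int mul4_order_holds(void)
{
    int max_letter = 0;
    int min_digit = 255;

    for (int c = 'A'; c <= 'Z'; ++c) {
        int k = mul4_key((unsigned char)c);
        if (k <= mul4_key('\0'))
            return 0;            /* a letter would sort before end-of-string */
        if (k > max_letter)
            max_letter = k;
    }
    for (int c = '0'; c <= '9'; ++c) {
        int k = mul4_key((unsigned char)c);
        if (k < min_digit)
            min_digit = k;
    }
    return max_letter < min_digit;
}
```

The letters land on 4..104 and the digits on 192..228, while the terminator stays at 0, so a string that is a prefix of another still sorts first.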

+4

What you need to do is create an order table that assigns a sort position to each character. This is also the easiest way to do case-insensitive comparisons.

 if (order_table[*s1] != order_table[*s2++]) 

Keep in mind that char may be signed, in which case the index into your table can be negative. This code is for signed chars only:

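    /* needs #include <ctype.h> for toupper() */
    int raw_order_table[256];
    int *order_table = raw_order_table + 128;   /* valid indices: -128..127 */
    for (int i = -128; i < 128; ++i)
        /* digits move above everything else; toupper() is only called on
         * non-negative values, since negative arguments are undefined */
        order_table[i] = (i >= '0' && i <= '9') ? i + 256
                       : (i >= 0) ? toupper(i) : i;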
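
A variant of the same idea, as a minimal sketch: if every character is read through unsigned char, the table can be indexed directly and no offset pointer is needed. The names init_order_table and digits_last_strcmp are made up for illustration, and the code assumes the default "C" locale.

```c
#include <ctype.h>

/* Indexed by unsigned char, so indices are always 0..255. */
static int order_table[256];

/* Fill the table once: digits are pushed above every other character,
 * everything else keeps its upper-cased value (case-insensitive compare). */
void init_order_table(void)
{
    for (int i = 0; i < 256; ++i)
        order_table[i] = isdigit(i) ? i + 256 : toupper(i);
}

/* strcmp-style result: negative, zero or positive. */
int digits_last_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Only '\0' maps to 0, so equal keys with *p1 != '\0' imply *p2 != '\0'. */
    while (order_table[*p1] == order_table[*p2] && *p1 != '\0') {
        ++p1;
        ++p2;
    }
    return order_table[*p1] - order_table[*p2];
}
```

init_order_table() has to run once before the first comparison; after that, "AB" sorts before "A1" and "abc" compares equal to "ABC".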
+16

If your powers-that-be are like all the other powers-that-be I've run into, you may want to make it an option (even if it's hidden):

The sort order:

o Numbers after letters

o Letters after numbers

or, even worse, they may figure out that they want the numbers sorted numerically (for example, "A123" comes after "A15"), in which case it could be

o Numbers after letters

o Letters after numbers

o Smart numbers after letters

o Letters after smart numbers

This gets at diagnosing the real problem, not the symptom. I bet there is a slight chance they will change their minds at the 11th hour and 59th minute.
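
In case the "smart numbers after letters" option ever does get requested, here is a rough sketch of what that comparison could look like. The function name smart_digits_last_cmp is hypothetical, and so are the details: runs of digits are compared by numeric value with leading zeros ignored, so "A007" and "A7" compare equal.

```c
#include <ctype.h>

/* Hypothetical compare for "smart numbers after letters":
 * digits sort after all other characters, and runs of digits
 * are compared by numeric value rather than character by character. */
int smart_digits_last_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p != '\0' && *q != '\0') {
        if (isdigit(*p) && isdigit(*q)) {
            /* Skip leading zeros, then measure both digit runs. */
            while (*p == '0') ++p;
            while (*q == '0') ++q;
            const unsigned char *ps = p;
            const unsigned char *qs = q;
            while (isdigit(*p)) ++p;
            while (isdigit(*q)) ++q;

            /* A longer run (without leading zeros) is a larger number. */
            if ((p - ps) != (q - qs))
                return (p - ps) < (q - qs) ? -1 : 1;
            /* Same length: the first differing digit decides. */
            for (; ps < p; ++ps, ++qs)
                if (*ps != *qs)
                    return *ps < *qs ? -1 : 1;
        } else {
            /* Push digits above every other character. */
            int ka = isdigit(*p) ? *p + 256 : *p;
            int kb = isdigit(*q) ? *q + 256 : *q;
            if (ka != kb)
                return ka < kb ? -1 : 1;
            ++p;
            ++q;
        }
    }
    /* Whichever string ended first sorts first. */
    return (*p != '\0') - (*q != '\0');
}
```

With this, "A15" sorts before "A123", and "AB" still sorts before "A1".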

+8

You could use a lookup table to translate ASCII to EBCDIC when comparing characters ;-)

+5

While I generally agree with the answers above, I think it's silly to do table lookups on every iteration of the loop, unless you expect most comparisons to differ in the first character, when you could instead do

    char c1, c2;
    while ((c1 = *(s1++)) == (c2 = *(s2++)) && c1 != '\0');
    return order_table[c1] - order_table[c2];

In addition, I would recommend building order_table with a static initializer, which improves speed (no need to generate it on every call, or ever) and possibly readability as well.
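
As a minimal sketch of the static-initializer idea, using C99 designated initializers (so this is C, not C++): the names static_order_table and table_strcmp are made up, and the table assumes the input contains only uppercase letters and digits. Every other character gets key 0 and would tie with the terminator.

```c
/* Built at compile time; entries not listed are 0.
 * Assumption: strings contain only 'A'-'Z' and '0'-'9'. */
static const unsigned char static_order_table[256] = {
    ['A'] = 1,  ['B'] = 2,  ['C'] = 3,  ['D'] = 4,  ['E'] = 5,  ['F'] = 6,
    ['G'] = 7,  ['H'] = 8,  ['I'] = 9,  ['J'] = 10, ['K'] = 11, ['L'] = 12,
    ['M'] = 13, ['N'] = 14, ['O'] = 15, ['P'] = 16, ['Q'] = 17, ['R'] = 18,
    ['S'] = 19, ['T'] = 20, ['U'] = 21, ['V'] = 22, ['W'] = 23, ['X'] = 24,
    ['Y'] = 25, ['Z'] = 26,
    ['0'] = 27, ['1'] = 28, ['2'] = 29, ['3'] = 30, ['4'] = 31,
    ['5'] = 32, ['6'] = 33, ['7'] = 34, ['8'] = 35, ['9'] = 36
};

/* Scan with plain character compares; consult the table only once,
 * for the first pair of characters that differ (or the terminators). */
int table_strcmp(const char *s1, const char *s2)
{
    unsigned char c1, c2;

    do {
        c1 = (unsigned char)*s1++;
        c2 = (unsigned char)*s2++;
    } while (c1 == c2 && c1 != '\0');

    return static_order_table[c1] - static_order_table[c2];
}
```

Reading the characters as unsigned char keeps the table index in 0..255 regardless of whether plain char is signed.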

+3

This should be a pretty good implementation of the string compare, similar to the one described in the other posts.

    static const unsigned char char_remap_table[256] = /* values */;

    #define char_remap(c) (char_remap_table[(unsigned char)(c)])

    int nonstd_strcmp(const char * restrict A, const char * restrict B)
    {
        while (1) {
            char a = *A++;
            char b = *B++;
            int x = char_remap(a) - char_remap(b);
            if (x) {
                return x;
            }
            /* Still using null termination, so test that from the original char,
             * but if \0 maps to \0 or you want to use a different end of string
             * then you could use the remapped version, which would probably work
             * a little better b/c the compiler wouldn't have to keep the original
             * var a around. */
            if (!a) { /* You already know b == a here, so only one test is needed */
                return x; /* x is already 0 and returning it allows the compiler to
                           * store it in the register that it would store function
                           * return values in without doing any extra moves. */
            }
        }
    }

Beyond this, you could generalize the function to take char_remap_table as a parameter, which would let you easily use different mappings later if you need to.

 int nonstd_strcmp(const char * restrict a, const char * restrict b, const char * restrict map); 
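
A possible shape for that generalized version, as a sketch only. It takes the map as const unsigned char * rather than the const char * in the prototype above, so that the keys stay in 0..255, and the names nonstd_strcmp_map, build_identity_map and build_digits_last_map are invented for this example. Each map must be a one-to-one mapping that sends '\0' to 0.

```c
/* Map that leaves every character where it is (plain strcmp order). */
void build_identity_map(unsigned char map[256])
{
    for (int i = 0; i < 256; ++i)
        map[i] = (unsigned char)i;
}

/* Map that moves '0'-'9' to the top of the range (246..255) and
 * shifts everything above '9' down by 10 to close the gap. */
void build_digits_last_map(unsigned char map[256])
{
    for (int i = 0; i < 256; ++i) {
        if (i < '0')
            map[i] = (unsigned char)i;
        else if (i <= '9')
            map[i] = (unsigned char)(246 + (i - '0'));
        else
            map[i] = (unsigned char)(i - 10);
    }
}

/* Compare two strings under the given 256-entry map. */
int nonstd_strcmp_map(const char *a, const char *b, const unsigned char *map)
{
    for (;;) {
        unsigned char ca = (unsigned char)*a++;
        unsigned char cb = (unsigned char)*b++;
        int x = map[ca] - map[cb];

        /* Stop at the first difference, or at the shared terminator. */
        if (x != 0 || ca == '\0')
            return x;
    }
}
```

The caller builds whichever map it needs once and passes it to every comparison, so switching the sort order back later is a matter of passing a different table.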
+2
