Is there a better way to define multiple character ranges?

Question

I am currently writing C code that selects certain characters and digits out of the full ASCII set. As a novice programmer, I usually write

if ((i > 25 && i < 50) || (i > 100 && i < 200)) { contents }

so that the condition is met when the variable i lies between 25 and 50 or between 100 and 200 (bounds exclusive).

If I want to test several ranges, such as 32 ~ 64 (! to @), 91 ~ 96 ([ to `) and 123 ~ 126 ({ to ~), is there a better way (meaning shorter or simpler code), or do I have to stick with this method and keep adding each range, as in the code above?

+8

c range ascii

Kagamin Jul 13 '16 at 4:55

7 answers

You can write a function that checks whether a value belongs to any of the given ranges:

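    #include <stdbool.h>
    #include <stddef.h>

    struct Range { int min; int max; };

    /* Returns true if character lies strictly inside any of the ranges. */
    bool in_ranges(int character, const struct Range *ranges, size_t num_ranges)
    {
        for (size_t i = 0; i < num_ranges; ++i) {
            if (ranges[i].min < character && character < ranges[i].max)
                return true;
        }
        return false;
    }

    int main(void)
    {
        struct Range rngs[] = {{25, 50}, {100, 200}};
        /* Let the compiler count the ranges instead of hard-coding 2. */
        bool at_sign_is_in_range = in_ranges('@', rngs, sizeof rngs / sizeof rngs[0]);
        (void)at_sign_is_in_range;
        return 0;
    }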

This simplifies editing and improves readability. Also, if you keep writing all the ranges in one conditional, as in your example, consider ordering each range check like this:

 lower_bound < value && value < upper_bound

It mirrors the mathematical notation x < a < y and is easier to read.

+2

Sergey Jul 13 '16 at 5:15

If you are working with single-byte characters, you can get better performance from a lookup table of flags, using either individual bits or whole bytes to mark which character values fall in one of the ranges.
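A minimal sketch of the bit-per-value variant, assuming 8-bit unsigned char values. The names CharSet, charset_add_range and charset_contains are hypothetical; the table is built once, after which every lookup is a shift and a mask regardless of how many ranges were added:

```c
#include <stdbool.h>
#include <stdint.h>

/* A 256-bit set: one bit per possible unsigned char value. */
typedef struct {
    uint64_t words[4];
} CharSet;

/* Mark every value in [lo, hi] (both bounds inclusive) as a member. */
static void charset_add_range(CharSet *set, unsigned lo, unsigned hi)
{
    for (unsigned c = lo; c <= hi && c < 256; ++c)
        set->words[c / 64] |= (uint64_t)1 << (c % 64);
}

/* Membership test: pick the 64-bit word, shift the bit down, mask it. */
static bool charset_contains(const CharSet *set, unsigned char c)
{
    return (set->words[c / 64] >> (c % 64)) & 1u;
}
```

Usage: start from `CharSet set = {{0}};`, call `charset_add_range` once per range (for example 32 to 64, 91 to 96 and 123 to 126), then test characters with `charset_contains(&set, c)`. Storing one byte per value instead of one bit trades 224 extra bytes for slightly simpler indexing.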

If you are writing code for an Intel processor that supports SSE 4.2, you may want to use PCMPISTRI or a similar instruction, which can compare up to 16 single-byte characters against up to 8 ranges in a single instruction.

+1

Simon spero Jul 13 '16 at 13:47

My answer will be "it depends." :)

If isalpha() and its friends from ctype.h do what you want, then by all means use them.

But if not ...

If you had only two ranges, as in your example, I don't think it looks too messy. If there are more, perhaps put the range test in an (inline) function to reduce the number of logical operators in each expression:

 if (in_range(val, a1, b1) || in_range(val, a2, b2) || ... )

(Or call it B(n,a,b) if you feel the need to save screen real estate.)
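A minimal sketch of the inline-function approach applied to the question's three ranges. It assumes the question means inclusive bounds (32 to 64, 91 to 96, 123 to 126); in_range and is_wanted are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical helper: true if lo <= val <= hi (both bounds inclusive). */
static inline bool in_range(int val, int lo, int hi)
{
    return lo <= val && val <= hi;
}

/* The question's ranges, assumed inclusive. */
static bool is_wanted(int val)
{
    return in_range(val, 32, 64)      /* ' ' to '@' */
        || in_range(val, 91, 96)      /* '[' to '`' */
        || in_range(val, 123, 126);   /* '{' to '~' */
}
```

Putting the inclusive/exclusive decision in the helper's name or comment keeps every call site unambiguous.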

If the ranges can change at run time, or there are many of them, put the limits in a struct and loop through an array of them. If there really are a lot of them, sort the list and do something smart with it, like a binary search on the lower limits (or something else). But for a small number, I would not bother.
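A minimal sketch of the binary search on lower limits, assuming the ranges are sorted by their lower bound, do not overlap, and have inclusive bounds. The struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical range type; both bounds inclusive. */
struct Range { int lo; int hi; };

/*
 * Assumes ranges[] is sorted by lo and the ranges do not overlap.
 * Find the last range whose lo is <= val, then check val against its hi.
 */
static bool in_sorted_ranges(int val, const struct Range *ranges, size_t n)
{
    size_t left = 0, right = n;          /* search window [left, right) */
    while (left < right) {
        size_t mid = left + (right - left) / 2;
        if (ranges[mid].lo <= val)
            left = mid + 1;              /* answer is at mid or later */
        else
            right = mid;
    }
    /* left now counts the ranges whose lo is <= val. */
    return left > 0 && val <= ranges[left - 1].hi;
}
```

Because the ranges do not overlap, only the last range starting at or below val can contain it, so each lookup costs O(log n) comparisons instead of O(n).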

If the overall range of possible values is small (for example, unsigned char values 0..255) but the number of separate "ranges" is large ("all prime values"), then build a table (or bitmap) of the values and look them up in it. Build the table however you like. (isalpha() is probably implemented like this.)

 unsigned char is_prime[256] = {0, 0, 1, 1, 0, 1, 0, 1, ...}; if (is_prime[val]) { ...
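One way to build such a table is to fill it once at startup instead of writing out all 256 initializers. A minimal sketch using a sieve of Eratosthenes; init_is_prime is a hypothetical name:

```c
#include <string.h>

/* is_prime[v] is 1 if v is prime, else 0, for v in 0..255. */
static unsigned char is_prime[256];

/* Fill the table once with a sieve of Eratosthenes. */
static void init_is_prime(void)
{
    memset(is_prime, 1, sizeof is_prime);
    is_prime[0] = is_prime[1] = 0;
    /* Every composite below 256 has a prime factor no larger than 15. */
    for (unsigned p = 2; p * p < 256; ++p) {
        if (!is_prime[p])
            continue;
        for (unsigned m = p * p; m < 256; m += p)
            is_prime[m] = 0;   /* multiples of a prime are composite */
    }
}
```

After `init_is_prime()` runs, `if (is_prime[val])` works exactly as in the snippet above.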

+1

ilkkachu Jul 13 '16 at 16:35

You can hide the duplication of l<x && x<h in a macro or inline function, but I have found that it is rarely worth it: it does not read as well as Python's l<x<h syntax, and it quickly gets out of hand once you start needing macros for every combination of inclusive and exclusive bounds. Either you end up with ridiculously long names (between_inc_inc, between_inc_exc, ..., which defeats the purpose of abstracting the check in the first place), or you leave the reader guessing about your range checks ("between(i, 50, 100)... is this range [,) or [,]? (checks the code) nope, it's (,)"), which is terrible when you are hunting for off-by-one errors.

OTOH, I'm known to abuse "one-letter macros", which I define exactly where and as they are needed and #undef immediately afterwards. Although they may look ugly, they are extremely local and do exactly what needs to be done, so no time is wasted looking them up, there are no mysterious parameters, and they can factor out the bulk of the repeated computation.

In your case, if the list is long enough, I might do

    #define B(l, h) ((l)<i) && (i<(h)) ||
    if (B(25, 50) B(100, 200) B(220, 240) 0) ...
    #undef B

(never do this in the header!)
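A compilable sketch of the same local-macro trick wrapped in a function, applied to the question's ranges. Unlike the snippet above, it assumes inclusive bounds; since the macro is defined right next to its only use, the reader can see that at a glance. is_wanted is a hypothetical name:

```c
#include <stdbool.h>

/* Hypothetical example: the question's three ranges, inclusive bounds. */
static bool is_wanted(int i)
{
    /* B expands to one range test followed by ||; the trailing 0 ends the chain. */
#define B(l, h) ((l) <= i && i <= (h)) ||
    bool result = B(32, 64) B(91, 96) B(123, 126) 0;
#undef B
    return result;
}
```

The macro body refers to `i` directly, which is exactly why it must stay this local: it only makes sense inside a scope where `i` is the value being tested.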

A good readability improvement is to use character literals instead of ASCII codes: for example, if you want the range a-z, write 'a'<=i && i<='z'.

It seems you want to exclude alphabetic and non-printing characters: you can do this with

 if((' '<=i && i<'A') || (i>'Z' && i<'a') || ('z'<i && i<=126))

0

Matteo italia Jul 13 '16 at 5:11

You can write a function like:

    #include <stdbool.h>

    /* True if num lies strictly between begin and end (both bounds exclusive). */
    bool withinscope(int num, int begin, int end)
    {
        return num > begin && num < end;
    }

Then you can use this function and keep the code clean and simple.

0

hexiecs Jul 13 '16 at 5:11

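    #include <cstddef>
    #include <vector>

    class RangeCollection {
        std::vector<int> ranges;  // stored as lower, upper, lower, upper, ...
    public:
        void AddRange(int lowerBound, int upperBound) {
            ranges.push_back(lowerBound);
            ranges.push_back(upperBound);
        }
        bool IsInRange(int num) const {
            // i + 1 < size() avoids the unsigned underflow of size() - 1 when empty
            for (std::size_t i = 0; i + 1 < ranges.size(); i += 2) {
                if (num > ranges[i] && num < ranges[i + 1])
                    return true;
            }
            return false;
        }
    };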

You can call AddRange to add as many ranges as you want, then check whether a number falls in any of them.

 RangeCollection rc; rc.AddRange(20,25); rc.IsInRange(22);//returns true

0

meJustAndrew Jul 13 '16 at 6:27

a3f · Accepted Answer · 2016-07-13T05:15:24+0000

In your specific case, the right combination of <ctype.h> functions would be

 if (isprint(i) && !isalpha(i))
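A small check, assuming an ASCII system running in the default "C" locale, that this test selects exactly the three ranges from the question (32 to 64, 91 to 96, 123 to 126) for values 0..127. The helper names are hypothetical; wanted_ascii is the explicit expression from the character-literal answer:

```c
#include <ctype.h>
#include <stdbool.h>

/* The accepted answer's test. */
static bool wanted_ctype(int i)
{
    return isprint(i) && !isalpha(i);
}

/* Explicit ASCII ranges: ' '..'@', '['..'`', '{'..'~'. */
static bool wanted_ascii(int i)
{
    return (' ' <= i && i < 'A') || (i > 'Z' && i < 'a') || ('z' < i && i <= 126);
}

/* Count values in 0..127 where the two tests disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int i = 0; i < 128; ++i)
        if (wanted_ctype(i) != wanted_ascii(i))
            ++mismatches;
    return mismatches;
}
```

In other locales isprint() and isalpha() may classify bytes above 127 differently, which is part of why the ctype version is more portable than hard-coded ranges.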

Added bonus: it also works on systems that do not use ASCII.
