The fastest way to map telephony prefixes using an asterisk PHP script

and in advance for help.

Background - I am writing a PHP script that should figure out where the recipient is going. Telephony prefixes are strings that determine the purpose. For each call, the program must find the longest prefix corresponding to the line. For example, the number 30561234567 would correspond to 305, but not 3057 or 304. If 3056 existed, this would be the preferred match.

After exploring several data structures, the tree in which each node stores a digit and contains pointers to the other 10 possible options seems ideal. This can be implemented as an array of arrays, where the script can check 3, find the array there, then check 0 on this new array, find another array, and so on, until it finds a value instead of an array. This value will be the destination identifier (script output).

Requirements - Performance is absolutely important. The time taken to verify these prefixes delays the call, and each server must handle large numbers of calls, so the data structure must be stored in memory. There are currently about 6,000 prefixes.

Problem - the script is run every time the server accepts the call, so the data must be stored in some cache server. After checking memcached and APC (Advanced PHP Cache) I decided to use APC because it is [much faster] [3] (this is local memory)

The problem is that an array of arrays can have up to 10 arrays in depth and will be a very complex data structure, which, if I add to the cache as an object, will take a long time to de-serialize.

However, if I add each separate array to the cache separately (with some logical naming structure so that it can easily find it as 3 for array 3, then 30 for array 30, 305 for the array following this patch, etc. ) I will have to take different arrays from the cache many times (up to 10 per call), which changes the cache too often.

Am I going to do it right? Maybe there is another solution? Or one of the methods that I suggested above the other?

Thank you for entering

Alex

+6
performance php caching asterisk
source share
6 answers

Here is an example code for an N-ary tree structure;

class PrefixCache { const EOS = 'eos'; protected $data; function __construct() { $this->data = array(); $this->data[self::EOS] = false; } function addPrefix($str) { $str = (string) $str; $len = strlen($str); for ($i=0, $t =& $this->data; $i<$len; ++$i) { $ch = $str[$i]; if (!isset($t[$ch])) { $t[$ch] = array(); $t[$ch][self::EOS] = false; } $t =& $t[$ch]; } $t[self::EOS] = true; } function matchPrefix($str) { $str = (string) $str; $len = strlen($str); $so_far = ''; $best = ''; for ($i=0, $t =& $this->data; $i<$len; ++$i) { $ch = $str[$i]; if (!isset($t[$ch])) return $best; else { $so_far .= $ch; if ($t[$ch][self::EOS]) $best = $so_far; $t =& $t[$ch]; } } return false; // string not long enough - potential longer matches remain } function dump() { print_r($this->data); } } 

it can then be called

 $pre = new PrefixCache(); $pre->addPrefix('304'); $pre->addPrefix('305'); // $pre->addPrefix('3056'); $pre->addPrefix('3057'); echo $pre->matchPrefix('30561234567'); 

which performs as needed (returns 305 if 3056 is uncommented, 3056 is returned instead).

Note that I am adding a terminal flag for each node; this avoids false partial matches, i.e. if you add the prefix 3056124, it will match 3056 correctly instead of returning 305612.

The best way to avoid a reboot every time is to turn it into a service; however, before that I will measure the runtime using APC. It can be fast enough as it is.

Alex: your answer is absolutely right - but not applicable to this question :)

+2
source share

The way I see it using a simple array structure should work fine ...

Code example: (note that for performance, prefixes are keys in the array, not values)

 // $prefixes = array(3=>1, 30=>1, 304=>1,305=>1,3056=>1,306=>1,31=>1, 40=>1); function matchNumber($number) { $prefixes = getPrefixesFromCache(); $number = "$number"; // try to find the longest prefix matching $number while ($number != '') { if (isset($keys[$number])) break; // not found yet, subtract last digit $number = substr($number, 0, -1); } return $number; } 

Another way is to request the cache directly for the number - in this case it could be optimized:

  • divide the line in 2.
  • request this line in cache.
  • if it does not exist, goto 1
  • as long as it exists, save this value as a result and add another digit.

Snippet: (note that query_cache_for () should be replaced with any function used by the caching mechanism)

 function matchNumber($number) { $temp = "$number"; $found = false; while (1) { $temp = substr($temp, 0, ceil(strlen($temp)/2) ); $found = query_cache_for($temp); if ($found) break; if (strlen($temp) == 1) return FALSE; // should not happen! } while ($found) { $result = $temp; // add another digit $temp .= substr($number, strlen($temp), 1); $found = query_cache_for($temp); } return $result; } 

This approach has the obvious advantage that each prefix is โ€‹โ€‹the only element in the cache - for example, the key can be "asterix_prefix_ <number>", the value does not matter (1 works).

+2
source share

Since you only work with numbers, it is possible that working with strings is inefficient.

You can execute the binary search algorithm. If you save all your prefixes (numerically) filled up to 15 places, and then in order, you can scan 6000 codes in about log2 (6000) ~ = 13 steps.

For example, if you have the following codes:

  • 01, 0127, 01273, 0809, 08

The following will be stored in the array:

  • 010000000000000
  • 012700000000000
  • 012730000000000
  • 080000000000000
  • 080900000000000

Stages:

  • Remove the incoming number to 15 places.
  • Perform a binary search to find the closest lowest code (and the index in the array above)
  • See code length in a separate array (using index)

Sample code to view in action:

 // Example for prefixes 0100,01,012,0127,0200 $prefixes = array('0100','0101','0120','0127','0200'); $prefix_lengths = array(4,2,3,4,4); $longest_length_prefix = 4; echo GetPrefix('01003508163'); function GetPrefix($number_to_check) { global $prefixes; global $prefix_lengths; global $longest_length_prefix; $stripped_number = substr($number_to_check, 0, $longest_length_prefix); // Binary search $window_floor = 0; $window_ceiling = count($prefixes)-1; $prefix_index = -1; do { $mid_point = ($window_floor+$window_ceiling)>>1; if ($window_floor==($window_ceiling-1)) { if ($stripped_number>=$prefixes[$window_ceiling]) { $prefix_index=$window_ceiling; break; } elseif ($stripped_number>=$prefixes[$window_floor]) { $prefix_index=$window_floor; break; } else { break; } } else { if ($stripped_number==$prefixes[$mid_point]) { $prefix_index=$mid_point; break; } elseif ($stripped_number<$prefixes[$mid_point]) { $window_ceiling=$mid_point; } else { $window_floor=$mid_point; } } } while (true); if ($prefix_index==-1 || substr($number_to_check, 0, $prefix_lengths[$prefix_index])!=substr($prefixes[$prefix_index],0, $prefix_lengths[$prefix_index])) { return 'invalid prefix'; } else { return substr($prefixes[$prefix_index], 0, $prefix_lengths[$prefix_index]); } } 
+1
source share

I do this using a row hash table where the keys are strings that represent the destination prefix. The critical factor is that the hash table must be sorted so that the longest rows are checked first. Once a suitable prefix is โ€‹โ€‹found, the destination of the call is known.

I actually also have a circle of regular expressions for more confusing recipients and validating regular expressions before destination prefixes.

I have not measured how long it takes to get a match, but I suspect a maximum of 15 ms. The whole process of checking desitnation, and then user balance and, finally, setting call time limits takes about 150 ms. In my case, I use the FastAGI service and C # Windows. As long as you take less than 500 ms, it will be unforgivable for your users.

0
source share

I am also launching a telephony application ... what I have done is provided to the internal REST API for the call, this is what caches known phone numbers and does all the prefix checking.

I also assume that you are looking for country codes. With NANP, there are only a few matching country codes. Therefore, I look primarily at NANP and do a quick match with the number of the following numbers (7) to make sure otherwise I will return to the country code. Then I have an approximate idea of โ€‹โ€‹how many numbers are in the phone number that each country should have in regular expression.

I use an XML document and map it to XPath, and then cache the XPath result if necessary.

The cool thing about using the REST API is also that it can be used to clean numbers before I save them to the billing database.

This is not an exact science, but it seems to work.

0
source share

Finding the longest common subsequence is a classic application of dynamic programming. The solution is O (n). http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

0
source share

All Articles