Best way to extract a phone number and reformat it?

The phone number data arrives in various formats (I picked these samples because the incoming data is unreliable and not in the expected formats):

    +1 480-874-4666
    404-581-4000
    (805) 682-4726
    978-851-7321, Ext 2606
    413- 658-1100
    (513) 287-7000,Toll Free (800) 733-2077
    1 (813) 274-8130
    212-363-3200,Media Relations: 212-668-2251.
    323/221-2164

My Ruby code extracts all the digits, drops a leading 1 (the US country code), then uses the first 10 digits to build the "new" phone number in the desired format:

    nums = phone_number_string.scan(/[0-9]+/)
    if nums.size > 0
      all_nums = nums.join
      all_nums = all_nums[0..0] == "1" ? all_nums[1..-1] : all_nums
      if all_nums.size >= 10
        ten_nums = all_nums[0..9]
        final_phone = "#{ten_nums[0..2]}-#{ten_nums[3..5]}-#{ten_nums[6..9]}"
      else
        final_phone = ""
      end
      puts "#{final_phone}"
    else
      puts "No number to fix."
    end

The results are very good!

    480-874-4666
    404-581-4000
    805-682-4726
    978-851-7321
    413-658-1100
    513-287-7000
    813-274-8130
    212-363-3200
    323-221-2164

But I think there is a better way. Can you refactor this to be more efficient, more readable, or more useful?

4 answers

Here's a much simpler approach using only regular expressions and substitution:

    def extract_phone_number(input)
      if input.gsub(/\D/, "").match(/^1?(\d{3})(\d{3})(\d{4})/)
        [$1, $2, $3].join("-")
      end
    end

This strips all non-digits (\D), skips an optional leading 1 (^1?), then extracts the first 10 remaining digits in chunks ((\d{3})(\d{3})(\d{4})) and joins them with hyphens.
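To make the intermediate values visible, here is a minimal sketch that splits the same logic into two steps. The helper names digits_only and format_ten_digits are hypothetical and only serve the illustration; the regular expressions are the ones from the answer.

```ruby
# Hypothetical helper names; the logic mirrors the one-liner in the answer.
def digits_only(raw)
  raw.gsub(/\D/, "") # drop every non-digit character
end

def format_ten_digits(digits)
  # ^1? consumes a leading country code if present; the three groups
  # capture the area code, the exchange, and the line number.
  match = digits.match(/^1?(\d{3})(\d{3})(\d{4})/)
  match ? match.captures.join("-") : nil
end

digits = digits_only("1 (813) 274-8130")
puts digits                            # 18132748130
puts format_ten_digits(digits).inspect # "813-274-8130"

# Digits after the first ten (such as an extension) are simply ignored.
puts format_ten_digits(digits_only("978-851-7321, Ext 2606")).inspect
```

Because everything is collapsed into one digit string first, this approach cannot tell where one number ends and the next begins, which is why it only ever returns the first number on a line.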

Here's the test:

    test_data = {
      "+1 480-874-4666" => "480-874-4666",
      "404-581-4000" => "404-581-4000",
      "(805) 682-4726" => "805-682-4726",
      "978-851-7321, Ext 2606" => "978-851-7321",
      "413- 658-1100" => "413-658-1100",
      "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000",
      "1 (813) 274-8130" => "813-274-8130",
      "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200",
      "323/221-2164" => "323-221-2164",
      "" => nil,
      "foobar" => nil,
      "1234567" => nil,
    }

    test_data.each do |input, expected_output|
      extracted = extract_phone_number(input)
      print "FAIL (expected #{expected_output}): " unless extracted == expected_output
      puts extracted
    end
+13

My approach is slightly different (and, IMHO, better :-): I did not want to miss any phone number, even if there were two on a line. I also did not want to match lines with three groups of digits that were far apart from each other (see the cookie example), and I did not want to mistake an IP address for a phone number.

The code below allows multiple numbers per line, but also requires the groups of digits to be "close" to each other:

    def extract_phone_number(input)
      result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map{|e| e.join('-')}
      # <result> is an Array of whatever phone numbers were extracted, and the remapping
      # takes care of cleaning up each number in the Array into a format of 800-432-1234

      result = result.join(' :: ')
      # <result> is now a String, with the numbers separated by ' :: '
      # ... or there is another way to do it (see text below the code) that gets
      # only the first phone number.

      # Details of the Regular Expressions and what they're doing
      # 1. (\d{3}) -- get 3 digits (and keep them)
      # 2. \D{0,3} -- allow skipping of up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
      # 3. (\d{3}) -- get 3 more digits (and keep them)
      # 4. \D{0,3} -- skip 0 to 3 non-digits
      # 5. (\d{4}) -- keep the final 4 digits

      result.empty? ? nil : result
    end

And here are the tests (with a few extra tests)

    test_data = {
      "DB=Sequel('postgres://user: username@192.168.1.101 /test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
      "100 cookies + 950 cookes = 1050 cookies" => nil, # THIS IS NEW
      "this 123 is a 456 bad number 7890" => nil, # THIS IS NEW
      "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
      "this is +1 480-874-4666" => "480-874-4666",
      "something 404-581-4000" => "404-581-4000",
      "other (805) 682-4726" => "805-682-4726",
      "978-851-7321, Ext 2606" => "978-851-7321",
      "413- 658-1100" => "413-658-1100",
      "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
      "1 (813) 274-8130" => "813-274-8130",
      "323/221-2164" => "323-221-2164",
      "" => nil,
      "foobar" => nil,
      "1234567" => nil,
    }

    def test_it(test_data)
      test_data.each do |input, expected_output|
        extracted = extract_phone_number(input)
        puts "#{extracted == expected_output ? 'good': 'BAD!'} ::#{input} => #{extracted.inspect}"
      end
    end

    test_it(test_data)

Alternative implementation: scan automatically applies the regular expression repeatedly, which is what you want when a line may contain more than one phone number. If you only want the first phone number on the line, you can also use:

    first_phone_number = begin
      m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
      [m[1], m[2], m[3]].join('-')
    rescue NoMethodError
      nil # no match: m is nil, so m[1] raises NoMethodError
    end

(just another way of doing things using the RegExp match function)
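As a possible variation on the match-based version, the nil case can be handled without rescue by using the safe navigation operator. This is a sketch that assumes Ruby 2.3 or later; the constant PHONE_PATTERN and the method name first_phone_number are illustrative choices, not part of the answer.

```ruby
# Same pattern as in the answer: three digit groups separated by at most
# three non-digits each.
PHONE_PATTERN = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/

# Hypothetical helper: returns the first phone number on the line, or nil.
def first_phone_number(input)
  # Regexp#match returns nil when nothing matches; &. then short-circuits
  # instead of raising NoMethodError.
  PHONE_PATTERN.match(input)&.captures&.join('-')
end

puts first_phone_number("(513) 287-7000,Toll Free (800) 733-2077").inspect
puts first_phone_number("100 cookies + 950 cookes = 1050 cookies").inspect
```

Avoiding rescue here means an unrelated NoMethodError elsewhere in the expression would not be silently swallowed.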

+2

For numbers in the North American Numbering Plan, you could extract the first number using phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)[1]

For instance:

    test_phone_numbers = ["+1 480-874-4666",
                          "404-581-4000",
                          "(805) 682-4726",
                          "978-851-7321, Ext 2606",
                          "413- 658-1100",
                          "(513) 287-7000,Toll Free (800) 733-2077",
                          "1 (813) 274-8130",
                          "212-363-3200,Media Relations: 212-668-2251.",
                          "323/221-2164",
                          "foobar"]

    test_phone_numbers.each do | phone_number_string |
      match = phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)
      puts(
        if (match)
          "#{match[1][0..2]}-#{match[1][3..5]}-#{match[1][6..9]}"
        else
          "No number to fix."
        end
      )
    end

As with the original code, this does not capture multiple numbers, for example "(513) 287-7000,Toll Free (800) 733-2077".

FWIW, in the long run I found it easier to capture and store complete numbers, that is, including the country code and separators. I guess during capture which numbering plan a number without a prefix belongs to, and choose the format (for example NANP vs. DE) when rendering.

0

This is an old thread, but I thought I would share a solution to the problem.

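    def extract_phone_number(input)
      # Note: this mutates input, and delete! returns nil when there is nothing
      # to remove, so an all-digit input such as "4045814000" also yields nil.
      input.delete!('^0-9').gsub!(/^1?(\d{3})(\d{3})(\d{4})/, '\1-\2-\3')[0..11]
    rescue NoMethodError => e
      nil
    end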

delete! removes all non-numeric characters.

gsub! matches the digits, then rewrites them as a string separated by hyphens.

[0..11] keeps the first 12 characters, that is, the formatted number, and drops any trailing digits (in case of extensions).

The rescue block protects against the bang methods being called on nil.
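One detail worth illustrating: bang methods such as delete! and gsub! return nil when they make no change, which is what the rescue relies on, but it also means an input that is already all digits comes back as nil. The sketch below shows that behavior and a non-mutating variant; the name extract_phone_number_safe is hypothetical and not part of the answer.

```ruby
# Bang methods return nil when they change nothing.
with_separators = "404-581-4000".dup
all_digits      = "4045814000".dup

p with_separators.delete!('^0-9') # a String: the hyphens were removed
p all_digits.delete!('^0-9')      # nil: there was nothing to remove

# Hypothetical non-mutating variant: String#delete always returns a new
# String, so frozen or all-digit inputs are handled without rescue.
def extract_phone_number_safe(input)
  digits = input.delete('^0-9')
  m = digits.match(/\A1?(\d{3})(\d{3})(\d{4})/)
  m && m.captures.join('-')
end

p extract_phone_number_safe("4045814000")
p extract_phone_number_safe("978-851-7321, Ext 2606")
p extract_phone_number_safe("1234567")
```

With this variant the caller no longer needs to dup frozen strings before passing them in.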

Using the tests published above.

    tests = {
      '+1 480-874-4666' => '480-874-4666',
      '404-581-4000' => '404-581-4000',
      '(805) 682-4726' => '805-682-4726',
      '978-851-7321, Ext 2606' => '978-851-7321',
      '413- 658-1100' => '413-658-1100',
      '(513) 287-7000,Toll Free (800) 733-2077' => '513-287-7000',
      '1 (813) 274-8130' => '813-274-8130',
      '212-363-3200,Media Relations: 212-668-2251.' => '212-363-3200',
      '323/221-2164' => '323-221-2164',
      '' => nil,
      'foobar' => nil,
      '1234567' => nil
    }

    tests.each do |input, expected_output|
      input = input.dup if input.frozen?
      result = extract_phone_number(input)
      if result == expected_output
        print "PASS: #{result}\n"
      else
        print "FAIL (expected #{expected_output})\n"
      end
    end

    # Console
    # => PASS: 480-874-4666
    # => PASS: 404-581-4000
    # => PASS: 805-682-4726
    # => PASS: 978-851-7321
    # => PASS: 413-658-1100
    # => PASS: 513-287-7000
    # => PASS: 813-274-8130
    # => PASS: 212-363-3200
    # => PASS: 323-221-2164
    # => PASS:
    # => PASS:
    # => PASS:
0