How should I handle digits from different Unicode digit sets in the same string?

Question

I am writing a function that transliterates Unicode digits to ASCII digits, and I am not quite sure what to do if a string contains digits from different Unicode digit sets. For example, suppose I have the string "\x{2463}\x{24F6}" ("④⓶"). Should my function

1. return 42?
2. croak that the string contains mixed sets?
3. carp that the string contains mixed sets and return 42?
4. give the user an additional argument to select one of the three behaviors above?
5. do something else?
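
The first four options can be sketched behind a single function, with option 4 as the extra argument. This is only an illustration: the name `to_ascii_digits`, the `$on_mixed` argument, and the `%SETS` table (which lists just a few digit sets) are assumptions made for this sketch, not part of any existing module.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

# Hypothetical table of digit sets:
# name => [first code point, value of first digit, number of digits].
# Only a few sets are listed; a real table would need many more.
my %SETS = (
    ascii          => [0x0030, 0, 10],   # DIGIT ZERO .. DIGIT NINE
    arabic_indic   => [0x0660, 0, 10],   # ARABIC-INDIC DIGIT ZERO .. NINE
    circled        => [0x2460, 1, 9],    # CIRCLED DIGIT ONE .. NINE
    double_circled => [0x24F5, 1, 9],    # DOUBLE CIRCLED DIGIT ONE .. NINE
);

# Map each known character to [ASCII digit value, set name].
my %DIGIT;
for my $set (keys %SETS) {
    my ($first, $value, $count) = @{ $SETS{$set} };
    $DIGIT{ chr($first + $_) } = [ $value + $_, $set ] for 0 .. $count - 1;
}

# $on_mixed is 'allow' (option 1, the default), 'croak' (option 2)
# or 'carp' (option 3).
sub to_ascii_digits {
    my ($str, $on_mixed) = @_;
    $on_mixed = 'allow' unless defined $on_mixed;
    my %seen;
    my $out = '';
    for my $ch (split //, $str) {
        if (my $info = $DIGIT{$ch}) {
            $out .= $info->[0];
            $seen{ $info->[1] } = 1;    # remember which set this digit came from
        }
        else {
            $out .= $ch;                # non-digits pass through unchanged
        }
    }
    if ($on_mixed ne 'allow' and scalar(keys %seen) > 1) {
        my $msg = 'string contains mixed digit sets: ' . join(', ', sort keys %seen);
        croak $msg if $on_mixed eq 'croak';
        carp $msg;
    }
    return $out;
}

print to_ascii_digits("\x{2463}\x{24F6}"), "\n";    # prints 42
```

Note that this sketch treats ASCII as just another set, so a string such as "④2" also counts as mixed; whether that is desirable is a separate decision.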


perl unicode

Chas. Owens May 21, '09 at 14:02


4 answers

Alnitak · Answer 1 · 2009-05-21T14:17:53+0000

It seems your current function already does #1.

I suggest you also write another function for #4, but only when that requirement actually arises, and not before.

I'm sure Joel recently wrote about “premature implementation” in a blog article, but I can't find it.

user82238 · Answer 2 · 2009-05-21T14:24:33+0000

I'm not sure I see the problem.

You support numeric conversion from a number of digit sets, that is, you know the Unicode code points of your numeric characters.

If you find an unknown code point in your input, that is an error.

What you do in case of an error is up to you; you can insert a space or an underscore, or you can abort the conversion. The right choice depends on the environment your function runs in; that is not something we can tell you.
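
A minimal sketch of those two error policies, assuming a hand-built lookup table. The names `convert_known_digits` and `%VALUE_OF` are made up for illustration, and the table covers only two digit sets.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical lookup table covering only two digit sets.
my %VALUE_OF;
$VALUE_OF{ chr(0x0030 + $_) } = $_ for 0 .. 9;    # ASCII digits
$VALUE_OF{ chr(0x0660 + $_) } = $_ for 0 .. 9;    # ARABIC-INDIC DIGIT ZERO .. NINE

# $placeholder is the string substituted for each unknown character;
# leaving it undefined means "abort the conversion".
sub convert_known_digits {
    my ($str, $placeholder) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        if (exists $VALUE_OF{$ch}) {
            $out .= $VALUE_OF{$ch};
        }
        elsif (defined $placeholder) {
            $out .= $placeholder;
        }
        else {
            croak(sprintf('unknown code point U+%04X in input', ord($ch)));
        }
    }
    return $out;
}
```

With this sketch, `convert_known_digits("\x{0664}\x{2463}2", '_')` yields `4_2` because U+2463 is not in the table, while the same call without a placeholder croaks at that character.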

Fran corpier · Answer 3 · 2009-05-21T16:11:31+0000

My initial thought was #4, purely because I like options. However, I changed my mind when I looked at your function.

The purpose of the function is simply to produce the digits 0..9. Users may find it useful to pass in mixed sets (a feature :). I would use it.

Novelocrat · Answer 4 · 2009-05-30T05:43:17+0000

If you ever have to process input in bases greater than 10, you may need to handle the many variants of the first six letters of the Latin alphabet ("ABCDEF") as digits in all their forms.
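
As one small illustration of that point, the fullwidth forms of the hexadecimal digits can be folded to ASCII with `tr`. The function name `fold_hex_digits` is hypothetical, and the sketch handles only the fullwidth block (U+FF10..U+FF19, U+FF21..U+FF26, U+FF41..U+FF46), not every form a letter can take.

```perl
use strict;
use warnings;

# Fold fullwidth digits and fullwidth A-F / a-f to their ASCII equivalents.
# Everything else is left untouched.
sub fold_hex_digits {
    my ($str) = @_;
    $str =~ tr/\x{FF10}-\x{FF19}\x{FF21}-\x{FF26}\x{FF41}-\x{FF46}/0-9A-Fa-f/;
    return $str;
}

# FULLWIDTH DIGIT ONE, FULLWIDTH LATIN CAPITAL LETTER A, ASCII "f"
my $folded = fold_hex_digits("\x{FF11}\x{FF21}f");
print "$folded = ", hex($folded), "\n";    # prints 1Af = 431
```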
