How to check if a character is a Unicode (not just ASCII) newline character in Rust?

Each programming language has its own interpretation of \n and \r. Unicode supports multiple characters that can represent a new line.

From the Rust reference:

A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR), or U+0009 (HT) respectively.

Based on this statement, I would say that a Rust character is a newline if it is either \n or \r. On Windows, a newline can be the combination of \r and \n. However, I am not sure.

What about the following?

  • Next line character (U+0085)
  • Line separator character (U+2028)
  • Paragraph separator character (U+2029)

In my opinion, we are missing something like char.is_new_line(). I looked through the Unicode character categories, but could not find a definition for newlines.

Should I come up with my own definition of what a Unicode newline character is?

1 answer

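Languages such as Java, Python, Go, and JavaScript disagree considerably, in practice, about what counts as a newline character and how that translates into "new lines". The disagreement shows in how their built-in regex engines treat a pattern like $ against a string like \r\r\n\n in multi-line mode: are there two lines (\r\r\n, \n), three lines (\r, \r\n, \n, as Unicode says), or four (\r, \r, \n, \n, as JS sees it)? Go and Python do not treat \r\n as a single $, and neither does Rust's regex crate; Java's does. Support for the additional Unicode newline characters is, as far as I know, not something these built-in engines agree on either.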

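So the takeaway here is:

  • everyone agrees that \n is a newline
  • \r\n may be a single newline
  • unless \r\n is treated as two newlines
  • unless \r\n is treated as some character followed by a newline
  • beyond that, there are no further newlines.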
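To make the two, three, and four line readings of \r\r\n\n concrete, here is a minimal sketch of a splitter with a selectable convention. The names Convention and split_lines are hypothetical and exist only for this illustration; this is not how any of the regex engines mentioned above are implemented.

```rust
/// How a splitter interprets carriage returns and line feeds.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Clone, Copy)]
enum Convention {
    /// Only `\n` ends a line; a single `\r` directly before it is dropped.
    LfOnly,
    /// `\r\n`, a lone `\r`, and a lone `\n` each end exactly one line.
    CrLfPair,
    /// Every `\r` and every `\n` ends a line on its own.
    EachChar,
}

/// Splits `text` into lines under the given convention.
/// Terminators are not included in the returned lines.
fn split_lines(text: &str, convention: Convention) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();

    while let Some(c) = chars.next() {
        let ends_line = match (convention, c) {
            (Convention::LfOnly, '\n') => {
                // Drop at most one carriage return that precedes the line feed.
                if current.ends_with('\r') {
                    current.pop();
                }
                true
            }
            (Convention::CrLfPair, '\r') => {
                // Consume a directly following `\n` so the pair counts once.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                true
            }
            (Convention::CrLfPair, '\n') => true,
            (Convention::EachChar, '\r') | (Convention::EachChar, '\n') => true,
            _ => false,
        };

        if ends_line {
            lines.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }

    // Keep a final line that has no terminator.
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Calling split_lines("\r\r\n\n", ...) gives two lines with Convention::LfOnly, three with Convention::CrLfPair, and four with Convention::EachChar. The LfOnly variant is meant to behave like str::lines in the standard library, which also splits on \n and strips one preceding \r.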

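If you really need more Unicode characters to be treated as newlines, you will have to define a function that does that for you. Do not expect real-world input to rely on them. After all, ASCII has had a Record Separator for ages, and everybody uses \t instead.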
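A hand-written predicate of that kind could look like the sketch below. The function names is_unicode_newline and newline_len are hypothetical; the standard library has no such method on char. Including VT (U+000B) and FF (U+000C) next to the characters named in the question is an assumption, based on the newline functions listed in the Unicode standard; drop them if your input should not treat them as line ends.

```rust
/// Returns true if `c` is one of the single characters that Unicode lists
/// as a newline function. (Hypothetical helper; VT and FF are included by
/// assumption and can be removed.)
fn is_unicode_newline(c: char) -> bool {
    matches!(
        c,
        '\u{000A}'   // LF, line feed
        | '\u{000B}' // VT, vertical tab
        | '\u{000C}' // FF, form feed
        | '\u{000D}' // CR, carriage return
        | '\u{0085}' // NEL, next line
        | '\u{2028}' // LS, line separator
        | '\u{2029}' // PS, paragraph separator
    )
}

/// Returns the length in bytes of the newline sequence at the start of `s`,
/// or None if `s` does not start with one. CR LF is reported as a single
/// two-byte sequence, since a per-character test cannot express that.
fn newline_len(s: &str) -> Option<usize> {
    if s.starts_with("\r\n") {
        return Some(2);
    }
    let c = s.chars().next()?;
    if is_unicode_newline(c) {
        Some(c.len_utf8())
    } else {
        None
    }
}
```

The second function exists because \r\n is two characters: a check on a single char can say that \r and \n are each newline characters, but only a check on the string can treat the pair as one line end.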

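Update: see http://www.unicode.org/reports/tr14/tr14-32.html#BreakingRules, rule LB5, for why \r\r\n should be treated as three lines. You could read the entire page to get a grip on how your original question would have to be implemented. My guess is that by the time you reach the part on South East Asian scripts, where line breaks need language-specific analysis, you will close the tab :-)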
