How to split a string into paragraphs

Hi guys, I really need your help. I am trying to take a multi-line string made up of several paragraphs and split it into several separate strings.

I noticed that wherever there is a blank line, the sequence \n\r appears. From that I assumed that each new line begins with \n and ends with \r, so I wrote the following code.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication15
    {
        class Program
        {
            struct ParagraphInfo
            {
                public ParagraphInfo(string text)
                {
                    int i;
                    Text = text;
                    i = text.IndexOf('.');
                    FirstSentence = text.Substring(0, i);
                }

                public string Text, FirstSentence;
            }

            static void Main(string[] args)
            {
                int tmp = 0;
                int tmp1 = 0;
                string MultiParagraphString = @"AA.aa.

    BB.bb.

    CC.cc.

    DD.dd.

    EE.ee.";
                List<ParagraphInfo> Paragraphs = new List<ParagraphInfo>();
                Regex NewParagraphFinder = new Regex(@"[\n][\r]");
                MatchCollection NewParagraphMatches = NewParagraphFinder.Matches(MultiParagraphString);

                for (int i = 0; i < NewParagraphMatches.Count; i++)
                {
                    if (i == 0)
                    {
                        Paragraphs.Add(new ParagraphInfo((MultiParagraphString.Substring(0, NewParagraphMatches[0].Index))));
                    }
                    else if (i == (NewParagraphMatches.Count - 1))
                    {
                        tmp = NewParagraphMatches[i].Index + 3;
                        tmp1 = MultiParagraphString.Length - NewParagraphMatches[i].Index - 3;
                        Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
                    }
                    else
                    {
                        tmp = NewParagraphMatches[i].Index + 3;
                        tmp1 = NewParagraphMatches[i + 1].Index - NewParagraphMatches[i].Index + 3;
                        Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
                    }
                }

                Console.WriteLine(MultiParagraphString);
                foreach (ParagraphInfo Paragraph in Paragraphs)
                {
                    Console.WriteLine(Paragraph.Text);
                }
            }
        }
    }

When I printed each paragraph one by one, along with the whole text, something rather strange appeared. The list of paragraphs came out as follows:

AA.aa.


CC.cc.

DD.


DD.dd.

EE.


EE.ee.


I can't understand why this is happening, and besides, I can't understand why the output is different every time.

Sorry if this is a mess, but I really need help here. By the way, if anyone knows a better way to do this, feel free to share.

thanks

3 answers

You can try the following:

 MultiParagraphString.Split(new [] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries); 

This returns a string[]. If you want to convert the pieces into your struct, just use Select:

 MultiParagraphString.Split(new [] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
     .Select(s => new ParagraphInfo(s)).ToList();
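
For reference, here is a minimal self-contained sketch of the approach above. The ParagraphSplitter class and its Split method are hypothetical names used only for illustration. The struct mirrors the one in the question, with one assumed change: it falls back to the whole paragraph when there is no period, so that Substring does not throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct ParagraphInfo
{
    public readonly string Text;
    public readonly string FirstSentence;

    public ParagraphInfo(string text)
    {
        Text = text;
        int i = text.IndexOf('.');
        // Assumption: a paragraph without a period is treated as one sentence.
        FirstSentence = i >= 0 ? text.Substring(0, i) : text;
    }
}

// Hypothetical helper, not part of the original question or answer.
public static class ParagraphSplitter
{
    public static List<ParagraphInfo> Split(string text)
    {
        // Split on the platform's line terminator and drop the empty
        // entries produced by blank lines between paragraphs.
        return text
            .Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => new ParagraphInfo(s))
            .ToList();
    }
}
```

Note that Environment.NewLine depends on the platform, so this only works when the text uses the same line terminator as the system the code runs on.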

I thought every new line starts with \n and ends with \r

No. \r\n is a two-character sequence used to indicate a new line on Windows (and other non-Unix systems). It does not signal the “beginning” and “end” of a paragraph.
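
A small sketch may help show why the pattern from the question seemed to work. With Windows line endings, \n followed by \r only occurs where the \r\n that ends one line is followed by the \r\n of an empty line, so the pattern matches at blank lines and never at a single line break. LineEndingDemo and CountMatches are hypothetical names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LineEndingDemo
{
    // Counts occurrences of a line feed immediately followed by a
    // carriage return, using the same pattern as the question.
    public static int CountMatches(string text)
    {
        return Regex.Matches(text, @"[\n][\r]").Count;
    }
}
```

For example, CountMatches("AA.aa.\r\nBB.bb.") finds no match, while CountMatches("AA.aa.\r\n\r\nBB.bb.") finds one, at the \n that ends the first line.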

To split the text into paragraphs, you can use string.Split():

 string[] paragraphs = MultiParagraphString.Split(new string[]{"\r\n"}, StringSplitOptions.RemoveEmptyEntries); 
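
If the text might also contain Unix-style line endings, one possible variation is to pass both separators to Split. This is a sketch under that assumption, not something the original answer covers, and LineSplitter and SplitParagraphs are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class LineSplitter
{
    public static string[] SplitParagraphs(string text)
    {
        // "\r\n" is listed first so a Windows line ending is consumed as a
        // whole; a bare "\n" covers Unix-style text. Empty entries from
        // blank lines are dropped.
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

A lone \r (old Mac-style line ending) is not treated as a separator here.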

If the text comes from a RichTextBox:

 string text = richTextBox1.Text;

You can remove the paragraph breaks like this (character 10 is the line feed, \n):

 text = text.Replace((char)10, ' ');

Alternatively, you can detect the paragraph breaks like this:

 string[] words = text.Split(' ');
 foreach (string word in words)
 {
     if (word.IndexOf((char)10) >= 0)
     {
         MessageBox.Show("A paragraph is here (with brilliant English accent)");
     }
 }

Note: this code works only when the paragraphs were separated by pressing the Enter key in the text box.
