I have snippets of text that I would like to split into entries. The problem is that the text is pre-formatted, with entries wrapping across several lines, so I can't simply split on newlines as I usually do:
_text = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
.ToArray();
Here is a sample text:
adj 1: around the middle of a scale of evaluation of physical
measures; "an orange of average size"; "intermediate
capacity"; "a plane with intermediate range"; "medium
bombers" [syn: {average}, {intermediate}]
2: (of meat) cooked until there is just a little pink meat
inside
n 1: a means or instrumentality for storing or communicating
information
2: the surrounding environment; "fish require an aqueous
medium"
3: an intervening substance through which signals can travel as
a means for communication
4: (bacteriology) a nutrient substance (solid or liquid) that
is used to cultivate micro-organisms [syn: {culture medium}]
5: an intervening substance through which something is
achieved; "the dissolving medium is called a solvent"
6: a liquid with which pigment is mixed by a painter
7: (biology) a substance in which specimens are preserved or
displayed
8: a state that is intermediate between extremes; a middle
position; "a happy medium"
The format is always the same:
- an optional word of 1-3 letters
- a number from 1 to 10
- a colon
- a space
- text that can continue over multiple lines.
So, in this case, each entry should start at something like an optional word of 1-3 characters, followed by a number of 1-2 digits and then a colon.
Can someone give me some advice on how I can do this using split or using another method?
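// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Text.RegularExpressions;
public List<string> Parse(string text)
{
    // Optional 1-3 character word, a 1-2 digit number, a colon and a space, then the entry text.
    // Following lines that start with whitespace are captured into the same "line" group.
    string pattern = @"(\w{1,3} )?1?\d: (?<line>[^\r\n]+)(\r?\n\s+(?<line>[^\r\n]+))*";
    var entries = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        // Regex.Matches only yields successful matches, so no Success check is needed.
        string entry = string.Join(Environment.NewLine,
            m.Groups["line"].Captures.Cast<Capture>().Select(x => x.Value));
        entries.Add(entry);
    }
    return entries;
}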
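Since the question also asks whether this can be done with a split, here is a minimal sketch of that variant. It uses Regex.Split with a zero-width lookahead, so the text is cut just before each entry header and nothing is consumed. The names EntrySplitter, EntryStart and SplitEntries are made up for illustration, and the sketch assumes that every header (optional 1-3 letter word, 1-2 digit number, colon, space) begins at the start of a line, possibly after indentation. Unlike the pattern above, it does not rely on continuation lines being indented.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntrySplitter
{
    // Zero-width split point: start of a line, followed by optional indentation,
    // an optional 1-3 letter word, a 1-2 digit number, a colon and a space.
    // The group is non-capturing so Split does not return it as an extra item.
    private static readonly Regex EntryStart = new Regex(
        @"^(?=[ \t]*(?:[A-Za-z]{1,3} )?\d{1,2}: )",
        RegexOptions.Multiline);

    public static string[] SplitEntries(string text)
    {
        return EntryStart.Split(text)
            // Join the wrapped lines of each entry into a single line.
            .Select(chunk => Regex.Replace(chunk.Trim(), @"\s*\n\s*", " "))
            // Drop the empty piece produced when the text starts with a header.
            .Where(entry => entry.Length > 0)
            .ToArray();
    }
}
```

Calling EntrySplitter.SplitEntries(text) should return one string per entry, with the wrapped lines of that entry joined by single spaces. Any text before the first header (such as a headword line) comes back as its own first item.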