Notepad++ - trying to reformat some things

I have a CSV that basically has lines that look like this:

06444|WidgetAdapter 6444|Description:
Here is a description.
Maybe some more.
|0

The text in the third field is different on every row, and I'm trying to replace all the line breaks in it with <br>, so it ends up as

 06444|WidgetAdapter 6444|Description: <br>Here is a description.<br>Maybe some more.<br>|0 

Edit:

I basically need to get rid of all the line breaks inside fields, so that each line is one complete row: VALUE|VALUE|VALUE|VALUE. In other words: normalize and clean the data.

None of my tools can import this correctly, phpMyAdmin chokes, etc. There are line breaks in the field, there are double quotes that are not escaped, etc.

An example of another row:

 08681|Book 08681|"Testimonial" - Person You should buy this.| 

And one more:

 39338|Itemizer|| 
2 answers

If you know that you have 4 columns, you can parse your data quite easily. For example, here is a PHP snippet that produces an array with all the data. Each element of the array is itself an array holding the capture groups: [0] is the full match and [1] through [4] are the individual columns:

 $pattern = '/^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$/m';
 preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

The pattern is very simple: it takes 4 values (runs of non-pipe characters), separated by 3 pipes. Once you have the data, you can easily rebuild it the way you want, for example using nl2br.
Please note: you cannot parse the data reliably if the first and last columns can also contain newlines.

Working example: http://ideone.com/gG0K3
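
For readers who are not using PHP, here is a rough Python counterpart of the same idea. It is only a sketch: the function name normalize_rows is made up for illustration, it replaces each newline in the third column with <br> (unlike nl2br, which keeps the newline), and it trims whitespace around the first, second and last columns.

```python
import re

# Same 4-column pattern as the PHP snippet: 4 runs of non-pipe characters
# separated by 3 pipes. [^|]* also matches newlines, so a multi-line third
# column is captured as part of a single row.
ROW = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$", re.MULTILINE)
NEWLINE = re.compile(r"\r\n?|\n")


def normalize_rows(data, br="<br>"):
    """Hypothetical helper: return the data with one row per line."""
    rows = []
    for match in ROW.finditer(data):
        first, second, third, fourth = match.groups()
        # Only the third column is allowed to contain line breaks.
        third = NEWLINE.sub(br, third)
        # The other columns may pick up stray whitespace at row boundaries.
        rows.append("|".join([first.strip(), second.strip(), third, fourth.strip()]))
    return "\n".join(rows)


if __name__ == "__main__":
    sample = (
        "06444|WidgetAdapter 6444|Description: \n"
        "Here is a description.\n"
        "Maybe some more.\n"
        "|0\n"
        "39338|Itemizer||\n"
    )
    print(normalize_rows(sample))
```

The same caveat applies here: if the first or last column can contain newlines, the rows cannot be separated reliably this way.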


If necessary, you can target these newlines directly with a regular expression. The idea is to match only newlines that are followed by one more value and then nothing but complete rows. In other words, we check that the number of values after the current one is 1 modulo 4, so we know we are in the third column:

 (?:\r\n?|\n)(?=[^|]*\|[^\n\r|]*\s*(?:^(?:[^|]*\|){3}[^\n\r|]*$\s*)*\Z) 

Or, with (some) explanations:

 (?:\r\n?|\n)                          # Match a newline
 (?=                                   # that is before...
   [^|]*\|[^\n\r|]*\s*                 # one more separator and value
   (?:^(?:[^|]*\|){3}[^\n\r|]*$\s*)*   # and some lines with 4 values.
   \Z                                  # until the end of the string.
 )

I couldn't get it to work in Notepad++ (it didn't even match [\r\n]), but it seems to work well in other regex engines.
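
As one way to try the expression outside Notepad++, here is a minimal Python sketch. It assumes the input uses plain \n line endings (with \r\n the $ anchor would not match before the \r), and the function name replace_field_newlines is just an illustrative choice. The lookahead rescans the rest of the input at every newline, so this is meant for small files.

```python
import re

# The expression from above, compiled in multiline mode so that ^ and $
# match at line boundaries. \Z matches only at the very end of the string.
FIELD_NEWLINE = re.compile(
    r"(?:\r\n?|\n)(?=[^|]*\|[^\n\r|]*\s*(?:^(?:[^|]*\|){3}[^\n\r|]*$\s*)*\Z)",
    re.MULTILINE,
)


def replace_field_newlines(data, replacement="<br>"):
    """Hypothetical helper: replace newlines inside the third column only.

    Assumes every row has exactly 4 columns and \n line endings.
    """
    return FIELD_NEWLINE.sub(replacement, data)


if __name__ == "__main__":
    sample = (
        "06444|WidgetAdapter 6444|Description: \n"
        "Here is a description.\n"
        "Maybe some more.\n"
        "|0\n"
        "39338|Itemizer||\n"
    )
    print(replace_field_newlines(sample), end="")
```

Newlines that end a row are left alone, because what follows them is a whole number of 4-value rows rather than one extra value.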
