Combine line with next line if last character is not a semicolon using batch file

I have a file with the following 4 lines:

    A;1;abc;<xml/>;
    ;2;def;<xml
    >hello world</xml>;
    ;3;ghi;<xml/>;

Using a batch file, I need to concatenate the lines so that, if a line does not end with a semicolon (;), the next line is appended to it.

So the desired result is:

    A;1;abc;<xml/>;
    ;2;def;<xml>hello world</xml>;
    ;3;ghi;<xml/>;

I am not very familiar with batch scripting. I tried FOR /F, but so far without success.

As I understand it, the logic should be: check the last character of each line, and if it is not a semicolon, append the next line to the current one.
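To make this logic concrete outside of batch syntax, here is a minimal sketch in plain JavaScript, the same ECMAScript family as the JScript used by REPL.BAT. It works on an in-memory string instead of reading myfile.txt. The function name joinUnterminatedLines is hypothetical. Flushing trailing text that never reaches a semicolon is an assumption about the desired behavior.

```javascript
// Join every line that does not end with ";" to the line that follows it.
// Assumptions: input is a string with CRLF or LF line endings, and any
// trailing text without a terminating ";" is flushed rather than dropped.
function joinUnterminatedLines(text) {
  const lines = text.split(/\r?\n/);
  const out = [];
  let pending = ""; // text collected from lines not yet terminated by ";"
  for (const line of lines) {
    if (line === "") continue; // skip empty lines, as FOR /F does
    pending += line;
    if (line.endsWith(";")) {
      out.push(pending); // record complete: emit it and start a new one
      pending = "";
    }
  }
  if (pending !== "") out.push(pending); // last line lacked a ";"
  return out;
}

const input = "A;1;abc;<xml/>;\r\n;2;def;<xml\r\n>hello world</xml>;\r\n;3;ghi;<xml/>;\r\n";
console.log(joinUnterminatedLines(input).join("\n"));
```

The batch answers implement this same "accumulate until the terminator" idea. They differ mainly in how they carry the growing line through the loop.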

I have also managed to get the last character of each line, but my script only reads the lines that do not start with ;. Any ideas?

    @echo off
    for /f "tokens=*" %%i in (myfile.txt) do (
        set var=%%i
        echo %%i
        if "%var:~-1%"==";" (
            echo test
        )
    )

Note: with the above script, only lines 1 and 3 are displayed.

3 answers

You have a number of problems with your code :)

1) As you said, your code ignores lines that begin with ;. This is due to the default FOR /F EOL option, which is ;. Your code also strips leading spaces from each line because of "TOKENS=*". You need to set both EOL and DELIMS to nothing. The syntax is strange, but it works:

 for /f delims^=^ eol^= %%i ... 

2) You are trying to set and expand var within a parenthesized block of code. That cannot work, because %var% expansion occurs when a line is parsed, and the entire block of code is parsed at once. So %var% gives the value that existed before the loop started executing, which is of course not what you want. The solution is to use delayed expansion. Type SET /? from the command line for more information about delayed expansion (about halfway down the help text).

3) FOR variable content containing ! will be corrupted if it is expanded while delayed expansion is enabled. The solution is to toggle delayed expansion on and off as needed within the loop. This causes a complication, because you need to preserve the value of the growing line across the ENDLOCAL barrier. I use FOR /F to transport the value across the barrier.

Here is a complete batch script that should do the job. It is limited in that it cannot process lines longer than about 8191 bytes.

This code has been rewritten to fix a significant error.

    @echo off
    setlocal disableDelayedExpansion
    set "ln="
    set "print=0"
    for /f delims^=^ eol^= %%i in (myfile.txt) do (
        set "var=%%i"
        setlocal enableDelayedExpansion
        for /f delims^=^ eol^= %%A in ("!ln!!var!") do (
            if "!var:~-1!"==";" (
                endlocal
                echo %%A
                set "ln="
            ) else (
                endlocal
                set "ln=%%A"
            )
        )
    )

SET /P solution

There is a much simpler solution that prints each line immediately, so you don't have to worry about passing the variable across ENDLOCAL. Lines that do not end with ; are printed without a trailing newline using SET /P.

This solution has the following limitations:

1) Leading spaces are stripped from lines printed through SET /P. This limitation only applies to Vista and later versions of Windows. It is not a problem on XP.

2) Thanks to David Rumann, I now know that SET /P fails if the line begins with =. Very unfortunate :(

    @echo off
    setlocal disableDelayedExpansion
    set "ln="
    for /f delims^=^ eol^= %%i in (myfile.txt) do (
        set "var=%%i"
        setlocal enableDelayedExpansion
        if "!var:~-1!"==";" (echo !var!) else (<nul set /p ="!var!")
        endlocal
    )
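The streaming idea behind this SET /P version can be sketched in JavaScript as follows. This is only an illustration. The write callback stands in for stdout, and the function name streamJoin is hypothetical. A line ending with ; is written followed by a newline, which is the job ECHO does in the batch code. Any other line is written without one, which is what <nul set /p does, so the next line continues on the same output line.

```javascript
// Write each line as soon as it is read; only lines ending with ";"
// are followed by a newline, so unterminated lines run into the next one.
// Assumption: `write` is any function that appends text to the output.
function streamJoin(lines, write) {
  for (const line of lines) {
    if (line.endsWith(";")) {
      write(line + "\n"); // record complete: end the output line
    } else {
      write(line); // record continues on the next input line
    }
  }
}

let result = "";
streamJoin([";2;def;<xml", ">hello world</xml>;"], (s) => { result += s; });
console.log(JSON.stringify(result));
```

Nothing is buffered, so there is no growing value to carry across ENDLOCAL. That is what makes this variant simpler than the accumulating one.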

Hybrid batch/JScript regex solution (bulletproof?)

I wrote REPL.BAT, a hybrid batch/JScript utility that makes it easy to perform regular expression search and replace on the content of a file. It makes this job much easier.

The following command should work on any input without restrictions. It has been updated to support both Windows (CRLF) and Unix (LF) line endings, and it is much faster than a pure batch solution.

 findstr "^." myfile.txt|repl "([^;\r])\r?\n" "$1" m >"outFile.txt" 
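To see what the search/replace pair does, here is a sketch that applies the same pattern with plain JavaScript String.prototype.replace. With the m option, REPL.BAT reads all of stdin and calls replace once with a RegExp built from the "g" and "m" flags. The function name joinWithRegex and the sample string are assumptions for illustration. Each line break (\n, optionally preceded by \r) whose preceding character is neither ; nor \r is deleted, and the captured character is kept through $1.

```javascript
// Same search/replace pair as the REPL command: "([^;\r])\r?\n" -> "$1",
// with the "g" and "m" flags that REPL.BAT's M option ends up using.
const pattern = /([^;\r])\r?\n/gm;

function joinWithRegex(text) {
  // A line break is removed whenever the character before it is not ";".
  // replace() resets lastIndex for a global regex, so reuse is safe.
  return text.replace(pattern, "$1");
}

const sample = "A;1;abc;<xml/>;\r\n;2;def;<xml\r\n>hello world</xml>;\r\n;3;ghi;<xml/>;\r\n";
console.log(joinWithRegex(sample));
```

Because \r is excluded from the character class, a CRLF that follows a ; can never be matched. The terminated lines therefore keep their original line endings.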

Here is the REPL.BAT utility. Full documentation is built into the script.

    @if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

    ::************ Documentation ***********
    :::
    :::REPL Search Replace [Options [SourceVar]]
    :::REPL /?
    :::
    ::: Performs a global search and replace operation on each line of input from
    ::: stdin and prints the result to stdout.
    :::
    ::: Each parameter may be optionally enclosed by double quotes. The double
    ::: quotes are not considered part of the argument. The quotes are required
    ::: if the parameter contains a batch token delimiter like space, tab, comma,
    ::: semicolon. The quotes should also be used if the argument contains a
    ::: batch special character like &, |, etc. so that the special character
    ::: does not need to be escaped with ^.
    :::
    ::: If called with a single argument of /? then prints help documentation
    ::: to stdout.
    :::
    ::: Search  - By default this is a case sensitive JScript (ECMA) regular
    :::           expression expressed as a string.
    :::
    :::           JScript syntax documentation is available at
    :::           http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
    :::
    ::: Replace - By default this is the string to be used as a replacement for
    :::           each found search expression. Full support is provided for
    :::           substitution patterns available to the JScript replace method.
    :::           A $ literal can be escaped as $$. An empty replacement string
    :::           must be represented as "".
    :::
    :::           Replace substitution pattern syntax is documented at
    :::           http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
    :::
    ::: Options - An optional string of characters used to alter the behavior
    :::           of REPL. The option characters are case insensitive, and may
    :::           appear in any order.
    :::
    :::           I - Makes the search case-insensitive.
    :::
    :::           L - The Search is treated as a string literal instead of a
    :::               regular expression. Also, all $ found in Replace are
    :::               treated as $ literals.
    :::
    :::           E - Search and Replace represent the name of environment
    :::               variables that contain the respective values. An undefined
    :::               variable is treated as an empty string.
    :::
    :::           M - Multi-line mode. The entire contents of stdin is read and
    :::               processed in one pass instead of line by line. ^ anchors
    :::               the beginning of a line and $ anchors the end of a line.
    :::
    :::           X - Enables extended substitution pattern syntax with support
    :::               for the following escape sequences:
    :::
    :::               \\     - Backslash
    :::               \b     - Backspace
    :::               \f     - Formfeed
    :::               \n     - Newline
    :::               \r     - Carriage Return
    :::               \t     - Horizontal Tab
    :::               \v     - Vertical Tab
    :::               \xnn   - Ascii (Latin 1) character expressed as 2 hex digits
    :::               \unnnn - Unicode character expressed as 4 hex digits
    :::
    :::               Escape sequences are supported even when the L option is used.
    :::
    :::           S - The source is read from an environment variable instead of
    :::               from stdin. The name of the source environment variable is
    :::               specified in the next argument after the option string.
    :::

    ::************ Batch portion ***********
    @echo off
    if .%2 equ . (
      if "%~1" equ "/?" (
        findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
        exit /b 0
      ) else (
        call :err "Insufficient arguments"
        exit /b 1
      )
    )
    echo(%~3|findstr /i "[^SMILEX]" >nul && (
      call :err "Invalid option(s)"
      exit /b 1
    )
    cscript //E:JScript //nologo "%~f0" %*
    exit /b 0

    :err
    >&2 echo ERROR: %~1. Use REPL /? to get help.
    exit /b

    ************* JScript portion **********/
    var env=WScript.CreateObject("WScript.Shell").Environment("Process");
    var args=WScript.Arguments;
    var search=args.Item(0);
    var replace=args.Item(1);
    var options="g";
    if (args.length>2) {
      options+=args.Item(2).toLowerCase();
    }
    var multi=(options.indexOf("m")>=0);
    var srcVar=(options.indexOf("s")>=0);
    if (srcVar) {
      options=options.replace(/s/g,"");
    }
    if (options.indexOf("e")>=0) {
      options=options.replace(/e/g,"");
      search=env(search);
      replace=env(replace);
    }
    if (options.indexOf("l")>=0) {
      options=options.replace(/l/g,"");
      search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
      replace=replace.replace(/\$/g,"$$$$");
    }
    if (options.indexOf("x")>=0) {
      options=options.replace(/x/g,"");
      replace=replace.replace(/\\\\/g,"\\B");
      replace=replace.replace(/\\b/g,"\b");
      replace=replace.replace(/\\f/g,"\f");
      replace=replace.replace(/\\n/g,"\n");
      replace=replace.replace(/\\r/g,"\r");
      replace=replace.replace(/\\t/g,"\t");
      replace=replace.replace(/\\v/g,"\v");
      replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      replace=replace.replace(/\\B/g,"\\");
    }
    var search=new RegExp(search,options);

    if (srcVar) {
      WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
    } else {
      while (!WScript.StdIn.AtEndOfStream) {
        if (multi) {
          WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
        } else {
          WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
        }
      }
    }

No delayed expansion

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    for /f "tokens=* eol=" %%L in (myfile.txt) do (
        <nul set /p ="%%L" 2>nul %= Fixed Limitation 3 =%
        set "xLine=%%L"
        call set "xLine=%%xLine:"=%%" %= Fix for Limitation 2 =%
        call :NewLine
    )
    endlocal
    pause >nul
    goto :eof

    :NewLine
    if "%xLine:~-1%"==";" echo.
    goto :eof

Delayed expansion

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    for /f "tokens=* eol=" %%L in (myfile.txt) do (
        <nul set /p ="%%L" 2>nul %= Fixed Limitation 3 =%
        setlocal EnableDelayedExpansion
        set "xLine=%%L"
        set "xLine=!xLine:"=!" %= Fix for Limitation 2 =%
        if "!xLine:~-1!"==";" echo.
        endlocal
    )
    endlocal
    pause >nul

Limitations (same for both versions):

  • Lines may not begin with the = character, because of the <nul set /p "=%%L" command.
  • Lines may not end with a double quote (") because of the if "<var>"==";" echo. comparison.
  • Double-quote characters (") at the beginning of a line will be lost because of the <nul set /p "=%%L" command. (Solved by dbenham)
  • Leading spaces are trimmed from each line because of the "tokens=* eol=" option. The same problem occurs on Windows Vista or later with the delims^=^ eol^= option, because of the set /p command. I decided to use the tokens method for consistency across all versions of Windows.
  • Batch lines are limited to 8191 bytes. See "String Length Limit in xp batch file?" and http://support.microsoft.com/kb/830473

Note: none of these limitations will cause the script to crash. Instead, 1 and 3 will cause those lines to be displayed incorrectly, and 4 will simply trim the leading spaces from the line.

Update

I found a solution (for display only!) to the = error and to the leading-space trimming of the set /p command. However, it requires a non-printable character to be embedded in the batch script, which must be done by editing the script's hex data. Place any non-space character that is not affected by the problem (illustrated as .), followed by a backspace character 0x08 (illustrated as 0x08), and only the %Var% value will be displayed, because the backspace moves the cursor back over the placeholder character. NOTE: this will not work as a solution for file output, since the non-printable characters will also be written to the file.

 set /p =".0x08%Var%" 

The underlying cause is that the SET command has trouble parsing variable names: it does not allow the equal sign to be part of a variable name.

This problem has always existed, but it was made worse by the leading-whitespace trimming introduced in Vista+. Good analysis: http://www.dostips.com/forum/viewtopic.php?f=3&t=4209


Here is a solution that does not use the set /P command, because that command introduces some limitations. Instead, consecutive lines are collected in a variable and output with echo, which has no such limitations, as soon as a line ending with a semicolon occurs. The code contains explanatory comments:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion

    rem // Define constants here:
    set "FILE=%~1" & rem // (input file from command line argument)
    set "CHAR=;" & rem // (character that marks the end of line)

    rem // Initialise variables:
    set "PREV=" & rem // (variable to collect lines to combine)

    rem // Iterate through the lines of the given file:
    for /F usebackq^ delims^=^ eol^= %%L in ("%FILE%") do (
        set "LINE=%%L"
        rem // Toggle delayed expansion to not lose `!` in text:
        setlocal EnableDelayedExpansion
        rem // Check last character of current line:
        if "!LINE:~-1!"=="%CHAR%" (
            rem /* Last character marks end of line, so output
            rem    collected previous lines and current one: */
            echo !PREV!!LINE!
            rem // Clear cached previous lines:
            endlocal
            set "PREV="
        ) else (
            rem /* Last character does not mark end of line, so
            rem    do not output it but cache it in a variable;
            rem    the `for /F` loop lets the data pass `endlocal`: */
            for /F delims^=^ eol^= %%K in ("!PREV!!LINE!") do (
                endlocal
                set "PREV=%%K"
            )
        )
    )

    rem /* Output all remaining cached data in case the last line
    rem    is not terminated by an end-of-line marker: */
    if defined PREV (
        setlocal EnableDelayedExpansion
        echo !PREV!
        endlocal
    )

    endlocal
    exit /B
