First line of data skipped during import

I am using an XML format file to import a CSV file, and the first line of data is skipped. I can't understand why.

Format file

    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <RECORD>
        <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' />
        <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='\n' />
      </RECORD>
      <ROW>
        <COLUMN SOURCE="1" NAME="COLUMN1" xsi:type="SQLVARYCHAR" />
        <COLUMN SOURCE="2" NAME="COLUMN2" xsi:type="SQLVARYCHAR" />
      </ROW>
    </BCPFORMAT>

CSV

    COLUMN1,COLUMN2
    "ABC","ABC123456"
    "TNT","TNT123456"

Query

    SELECT *
    FROM OPENROWSET(BULK 'C:\sample.csv', FORMATFILE = 'C:\sample.xml', FIRSTROW = 2) AS a

Result

    COLUMN1 COLUMN2
    ------- ----------
    "TNT    TNT123456"

    (1 row(s) affected)

If FIRSTROW is changed to 1, the result is:

    COLUMN1               COLUMN2
    --------------------- ----------
    COLUMN1,COLUMN2 "ABC  ABC123456"
    "TNT                  TNT123456"

If the header line is removed from the CSV and FIRSTROW is changed to 1, the result is returned as expected:

    COLUMN1 COLUMN2
    ------- ----------
    "ABC    ABC123456"
    "TNT    TNT123456"

Since this is an automated report that comes with headers, are there other options to fix this?

+8
import sql-server csv
6 answers

There are several issues here:

  • I suspect the \n terminator is not valid on the first line; otherwise SQL Server would not merge the first two lines into a single row when you change to FIRSTROW = 1.

  • Using "," as the column delimiter works fine for every column except the first and the last. This leaves a leading " on the first column and a trailing " on the last column. You can deal with the latter by changing the row terminator to "\n, but that only works if you can also add a trailing " to the header row (so the "\n is found there too). At that point you may as well make the header row match the data rows in every respect, so:

     "COLUMN1","COLUMN2"
     ------------------^ this character has to be followed by \n

Honestly, I think you could spend a week fighting all of these BCP and BULK INSERT quirks and still not have a perfect solution that doesn't require post-processing (such as trimming leading/trailing characters from certain columns). My recommendation: spend 20 minutes writing a parser in C# that fixes these files automatically - removing the header row, ensuring the correct delimiters are in place, stripping all the silly " characters, etc. - before SQL Server ever sees the file. Cleaning the file will be far less work than the hoops you are currently jumping through. I'm sure there are existing tools for this, but IIRC you have been struggling with this for quite some time...
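The pre-cleaning step this answer describes (it suggests C#) can be sketched in a few lines of Python instead. This is an illustrative sketch, not the answer's actual code: the file paths and the pipe delimiter are assumptions, and the format file used for the subsequent import would have to match the delimiter chosen here.

```python
import csv

def clean_csv(src_path: str, dst_path: str) -> None:
    """Rewrite a quoted, headered CSV as a plain pipe-delimited file
    with no header and no quotes, so the bulk import needs no quote
    handling at all."""
    # utf-8-sig also transparently strips a UTF-8 BOM if one is present
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
         open(dst_path, "w", newline="", encoding="ascii") as dst:
        reader = csv.reader(src)   # handles the "..." quoting properly
        next(reader, None)         # drop the header row
        # QUOTE_NONE: emit bare values (fields must not contain '|')
        writer = csv.writer(dst, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            writer.writerow(row)
```

After this, a format file with a plain | field terminator and \r\n row terminator imports the data without any leading or trailing quote debris.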

+2

The terminator of the first field should be just ',' and not '","'.

Replace the first FIELD line with:

 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=',' /> 

Here's what happens with your original format file...

The first field is terminated by "," ... this means SQL Server reads through the header line and into the second line of the file before finding that terminator, so the first field it gets is:

    COLUMN1,COLUMN2
    "ABC

It continues reading and gets the second field (remember, we are still on the second line of the file):

 ABC123456" 

Now it has its first row...

It then reads the next line:

 "TNT TNT123456" 

So when you skip the first row, it really does skip what it considers the first row - which has already swallowed the header line along with the "ABC" record, because your header line does not contain the "," pattern...
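The walk-through above can be reproduced with a small simulation. This is a naive model of CharTerm parsing (each field is read up to its own terminator, cycling through the record definition) - an assumption about BCP's behavior for illustration, not its actual code:

```python
def parse(stream: str, terminators: list[str]) -> list[list[str]]:
    """Naive model of CharTerm parsing: read each field up to its
    terminator, cycling through the record's field definitions."""
    rows, row, pos = [], [], 0
    while pos < len(stream):
        term = terminators[len(row)]       # terminator of the next field
        end = stream.find(term, pos)
        if end == -1:
            break
        row.append(stream[pos:end])
        pos = end + len(term)
        if len(row) == len(terminators):   # record complete
            rows.append(row)
            row = []
    return rows

# Field 1 ends at '","', field 2 at '\n', as in the original format file.
data = 'COLUMN1,COLUMN2\n"ABC","ABC123456"\n"TNT","TNT123456"\n'
rows = parse(data, ['","', '\n'])
# rows[0] == ['COLUMN1,COLUMN2\n"ABC', 'ABC123456"']  <- header swallowed
# rows[1] == ['"TNT', 'TNT123456"']
```

Under this model, FIRSTROW = 2 discards the merged header-plus-"ABC" record, leaving only the TNT row - exactly the result observed in the question.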

Hope this helps.

+2

Also, check the character encoding of the .csv file. It could be UTF-8 with a three-byte signature (a byte order mark). Bcp.exe does not seem to understand this format. Make sure your file is saved as plain ASCII instead. You can also try specifying the UTF-8 code page parameter (-C 65001), but AFAIR that does not work on some older versions of SQL Server.

Edit: when importing a CSV file with a UTF-8 signature, I observed the same problem: the first line of data (the one carrying the signature) is skipped.
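The three-byte signature mentioned here is the UTF-8 byte order mark, EF BB BF. A small helper (hypothetical, for illustration) can detect and remove it before handing the file to bcp or OPENROWSET:

```python
def strip_utf8_bom(path: str) -> bool:
    """Remove a UTF-8 byte order mark (EF BB BF) from the start of
    a file, returning True if one was found and removed."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xef\xbb\xbf"):
        with open(path, "wb") as f:
            f.write(data[3:])      # rewrite the file without the BOM
        return True
    return False
```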

+1
    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <RECORD>
        <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
        <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
        <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='"\r\n' />
      </RECORD>
      <ROW>
        <COLUMN SOURCE="2" NAME="COLUMN1" xsi:type="SQLVARYCHAR" />
        <COLUMN SOURCE="3" NAME="COLUMN2" xsi:type="SQLVARYCHAR" />
      </ROW>
    </BCPFORMAT>

The format file provided here works by skipping the first field, which in effect discards the opening " before the COLUMN1 value. FIELD 1 in the record is terminated by a single double quote and is not mapped to any column, so it is thrown away. FIELD 2 is terminated by a double quote, a comma, and another double quote: ",". FIELD 3 is terminated by a double quote followed by CR (carriage return) and LF (line feed): "\r\n.

The parser reads the first field up to the first double quote and discards it, reads the second field up to "," and assigns it to COLUMN1, reads the third field up to "\r\n and assigns it to COLUMN2, then moves on to the next record, and so on. This should handle your CSV files cleanly.
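Assuming each field is read up to its terminator (a model of the behavior, not BCP's actual implementation), the three terminators carve up each record the way this regex does. The sample data assumes CRLF line endings, as the "\r\n terminator in the format file implies:

```python
import re

# One record per the three CharTerm terminators in the format file:
#   field 1: everything up to the first  "      (unmapped, discarded)
#   field 2: everything up to the next   ","    -> COLUMN1
#   field 3: everything up to the next   "\r\n  -> COLUMN2
record = re.compile(r'(.*?)"(.*?)","(.*?)"\r\n', re.S)

data = 'COLUMN1,COLUMN2\r\n"ABC","ABC123456"\r\n"TNT","TNT123456"\r\n'
rows = [(m.group(2), m.group(3)) for m in record.finditer(data)]
# rows == [('ABC', 'ABC123456'), ('TNT', 'TNT123456')]
```

Under this model the header line is absorbed into the unmapped first field of the first record, so it disappears without any FIRSTROW handling, and the values arrive with no stray quotes.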

0

Place a dummy field with ID "0" in the XML format file, but do not map it to any column:

 <FIELD ID="0" xsi:type="CharTerm" TERMINATOR='"' /> 

This worked for me using the following query:

    SELECT *
    FROM OPENROWSET(
        BULK 'C:\sample.txt',
        FIRSTROW = 0,
        FORMATFILE = 'C:\sample.xml'
    ) AS a;
0

I had exactly the same problem. The CSV header must also be wrapped in double quotes (even when you try to skip the first line with BCP or BULK INSERT - unbelievable, but true), or else just delete the header (the first line of your CSV file):

    "COLUMN1","COLUMN2"
    "ABC","ABC123456"
    "TNT","TNT123456"

0
