How to import a CSV file with ";" as the delimiter and "," as the decimal separator in SAS?

I have received (and will keep receiving) many CSV files that use a semicolon as the delimiter and a comma as the decimal separator. So far, I have not found a way to import these files into SAS with PROC IMPORT, or in any other automated way, without having to handle the variable names manually.

Create sample data:

    %let filename = %sysfunc(pathname(work))\sap.csv;

    data _null_;
        file "&filename";
        put 'a;b';
        put '12345,11;67890,66';
    run;

Import Code:

    proc import out = sap01
        datafile = "&filename"
        dbms = dlm;
        delimiter = ";";
        getnames = yes;
    run;

After the import, a value of the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in US notation) looks like 35,058 (that is, thirty-five thousand and fifty-eight) in SAS, and after re-export to a German Excel it looks like 35.058,00. A simple but dirty workaround is the following:

    data sap02;
        set sap01;
        AMOUNT = AMOUNT/100;
        format AMOUNT best15.2;
    run;

I wonder whether there is an easy way to define the decimal separator for CSV imports (similar to specifying the delimiter), or any other "clean" solution compared to my workaround. Thank you very much in advance!

2 answers

Technically, you should use dbms=dlm rather than dbms=csv, although SAS copes with either. CSV means "comma-separated values", while DLM means "delimited", which is the correct description here.

I don't think there is a direct way to make PROC IMPORT read numbers with a decimal comma. You need to tell SAS to use the NUMXw.d informat when reading the data, and I see no way to force that setting on PROC IMPORT. (There is an option for writing output with a decimal comma, NLDECSEPARATOR, but I don't think it works here.)
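To see what the NUMXw.d informat does on its own, here is a minimal sketch that converts the two sample strings from the question with the INPUT function. The variable names a and b are only chosen to match the sample header; the values should be stored as the numbers 12345.11 and 67890.66.

```sas
data _null_;
    /* NUMXw.d reads a comma where the standard informat expects a decimal point */
    a = input('12345,11', numx10.);
    b = input('67890,66', numx10.);
    /* write both converted values to the log */
    put a= b=;
run;
```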

Your best option is either to write the data step code yourself, or to run PROC IMPORT, go to the log, and copy and paste the generated read-in code into your program. Then, for each numeric variable in the INPUT statement, add :NUMX10. (or whatever maximum field width is appropriate). It will look something like this:

    data want;
        infile "whatever.txt" dlm=';' lrecl=32767 missover;
        input
            firstnumvar :NUMX10.
            secondnumvar :NUMX10.
            thirdnumvar :NUMX10.
            fourthnumvar :NUMX10.
            charvar :$15.
            charvar2 :$15.
        ;
    run;

PROC IMPORT also generates a lot of INFORMAT and FORMAT statements. Instead of adding the informat in the INPUT statement, you can change the generated informats from BEST. to NUMX10., as in the second variant here. If you do put the informats in the INPUT statement, you can simply delete the generated INFORMAT statements, provided you have no date fields.

    data want;
        infile "whatever.txt" dlm=';' lrecl=32767 missover;
        informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
        informat charvar $15.;
        format firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
        format charvar $15.;
        input
            firstnumvar
            secondnumvar
            thirdnumvar
            fourthnumvar
            charvar $
        ;
    run;
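Applied to the sample file from the question, a hand-written data step could look like the sketch below. It assumes the &filename macro variable from the question is still defined and that the header line is "a;b"; the informat width 12 is an arbitrary choice for illustration.

```sas
data sap01;
    /* firstobs=2 skips the header line "a;b"; names are given in INPUT instead */
    infile "&filename" dlm=';' dsd firstobs=2 truncover lrecl=32767;
    /* the colon modifier reads up to the next delimiter using NUMX12. */
    input a :numx12. b :numx12.;
run;
```

Because the variable names are hard-coded here, this approach trades the convenience of GETNAMES for control over the informats.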

"It's best to either write the data step code yourself, or run PROC IMPORT, go to the log and copy / paste the read code into your program"

This has a drawback. If the structure of the CSV file changes, for example the column order, you have to change the code in the SAS program.
It is therefore safer to preprocess the input, replacing the comma with a dot in the numeric fields, and to pass the modified input to SAS.

My first idea was to use a Perl program for this and then read the modified input in SAS through a FILENAME statement with the PIPE access method.
Unfortunately, PROC IMPORT has a restriction: the IMPORT procedure does not support device types or access methods for the FILENAME statement other than DISK.
Therefore, you need to create a work file on disk that holds the adjusted input.

I used the Text::CSV_PP package to read the CSV file.
testdata.csv contains the CSV data to read.
substitute_commasep.perl is the name of the Perl program.

Perl code:

    # use lib "/........";   # specify if Text::CSV_PP is locally installed.
                             # Otherwise error message: Can't locate Text/CSV_PP.pm in ....;
    use Text::CSV_PP;
    use strict;

    my $csv = Text::CSV_PP->new({ binary => 1, sep_char => ';' })
        or die "Error creating CSV object: " . Text::CSV_PP->error_diag();

    open my $fhi, "<", "$ARGV[0]" or die "Error reading CSV file: $!";

    while ( my $colref = $csv->getline($fhi) ) {
        foreach (@$colref) {                 # analyze each column value
            s/,/\./ if /^\s*[\d,]*\s*$/;     # substitute if the field contains only digits and ","
        }
        $csv->print(\*STDOUT, $colref);
        print "\n";
    }
    $csv->eof or $csv->error_diag();
    close $fhi;

SAS Code:

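    /* run the Perl filter and read its output through a pipe */
    filename readcsv pipe "perl substitute_commasep.perl testdata.csv";
    filename dummy "dummy.csv";

    /* copy the piped output to a disk file, because PROC IMPORT only accepts DISK */
    data _null_;
        infile readcsv;
        file dummy;
        input;
        put _infile_;
    run;

    proc import datafile=dummy out=data1 dbms=dlm replace;
        delimiter=';';
        getnames=yes;
        guessingrows=32767;
    run;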

Source: https://habr.com/ru/post/1212711/

