When I export a dataset to Stata format using PROC EXPORT, SAS 9.4 automatically widens every string variable by one extra (empty) byte per observation. For example, in this dataset:
data test1;
    input cust_id $ 1 month 3-8 category $ 10-12 status $ 14-14;
    datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
run;

proc export data = test1 file = "test1.dta" dbms = stata replace;
run;
the cust_id, category and status variables should be str1, str3 and str1 in the final Stata file and therefore occupy 1 byte, 3 bytes and 1 byte, respectively, for each observation. However, SAS automatically adds an extra empty byte to each string, which widens their types to str2, str4 and str2 in the output Stata file.
This is very problematic because the extra byte is added to every observation of every string variable. For large datasets (I have about 530 million observations and numerous string variables), this can add several gigabytes to the exported file.
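For a rough sense of scale, here is a back-of-the-envelope sketch of the overhead. The observation count is the one mentioned above; the count of 10 string variables is a purely hypothetical figure chosen for illustration, since the real number varies by dataset.

```sas
/* Rough estimate of the padding overhead from one extra byte per string value. */
data _null_;
    n_obs     = 530e6;  /* observations (figure from the post)               */
    n_strvars = 10;     /* ASSUMPTION: hypothetical number of string variables */
    extra_bytes = n_obs * n_strvars;        /* one extra byte per value */
    extra_gib   = extra_bytes / 1024**3;    /* convert bytes to GiB     */
    put "Extra bytes: " extra_bytes comma20. " (GiB: " extra_gib 8.2 ")";
run;
```

Under that assumption the padding alone comes to 5.3 billion bytes, and it grows linearly with the number of string variables.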
Once the file is loaded into Stata, the compress command can automatically remove these empty bytes and shrink the file. For large datasets, however, PROC EXPORT adds so many extra bytes that I don't always have enough memory to load the dataset into Stata in the first place.
Is there a way to stop SAS from padding string variables in the first place? When I export a file containing, for example, a one-character string variable, I want it to be stored as a one-character string variable (str1) in the output file.
sas stata
Michael a