UNIX Shell Script Solution for formatting a pipe-delimited, segmented file

The input file has up to 34 different record types on the same line.

The file is pipe-delimited, and each record type is separated by “~” (except for the originating record type).

Not all 34 record types are contained in each line, and I do not need all of them.

All record types will be sent in a specified order, but not all record types will always be sent. The first record type is mandatory and will always be sent. Of the 34 types, only 7 are mandatory.

Each record type has a predefined number of fields, which should never change without proper lead time between the client and our load.

An Oracle table will be built with all of the needed columns, based on the required record types. So one row will contain the information from each record type, just like the input line, but will additionally include nulls for the columns that would come from record types that were not included in the input.

The end result I'm looking for is a way to conditionally format the input file so that the output can simply be loaded from a shell script via sqlldr instead of going through PL/SQL (I want my co-workers who don't know PL/SQL to be able to troubleshoot and fix any problems that come up during loads).

A small example with three records (the data types do not matter for this example):

Record Types:  AA, BB, CC, DD, EE, FF, GG  
AA has 5 fields (Mandatory)  
BB has 2 fields (Optional)  
CC has 3 fields (Optional)  
DD has 6 fields (Optional)  
EE has 4 fields (Optional)  
FF has 2 fields (Not needed.  Skipping in output)  
GG has 4 fields (Optional)

Input:

AA|12345|ABCDE|67890|FGHIJ|~BB|12345|~CC|ABCDE|12345|~DD|A|B|C|D|E|~EE|1|2|3|~FF|P|~GG|F|R|T
AA|23456|BCDEF|78901|GHIJK|~CC|BCDEF|23456|~EE|2|3|4|~GG|R|F|G
AA|34567|CDEFG|89012|HIJKL|~DD|B|C|D||~FF|Q

Record 1 contains every record type, while records 2 and 3 are each missing some of them. The desired output is:

AA|12345|ABCDE|67890|FGHIJ|~BB|12345|~CC|ABCDE|12345|~DD|A|B|C|D|E|~EE|1|2|3|~GG|F|R|T
AA|23456|BCDEF|78901|GHIJK|~BB||~CC|BCDEF|23456|~DD||||||~EE|2|3|4|~GG|R|F|G
AA|34567|CDEFG|89012|HIJKL|~BB||~CC|||~DD|B|C|D||~EE||||~GG|||

As a start, I tried writing each line to its own file, with one record type per line:

typeset -i count=0
while IFS= read -r record
do
newfile="$file.$count.dat"
printf '%s\n' "$record" | sed 's/|~/\n/g' > "$newfile"
count=$count+1
done < "$file"
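One portability note on this attempt: POSIX leaves the meaning of "\n" in a sed replacement unspecified, so `s/|~/\n/g` is only guaranteed to insert a newline with GNU sed. A minimal sketch of the same split using awk's gsub(), where "\n" in a string is always a newline; split_record is a hypothetical helper name, not part of the original question:

```shell
#!/bin/sh
# Split one input line into one record per line, mirroring the
# sed 's/|~/\n/g' attempt: each "|~" separator becomes a newline.
# split_record is a hypothetical helper name used only for illustration.
split_record() {
    printf '%s\n' "$1" | awk '{ gsub(/\|~/, "\n"); print }'
}

# Example: prints AA|1, BB|2 and CC|3| on three separate lines.
split_record 'AA|1|~BB|2|~CC|3|'
```

Like the sed version, this drops the "|" in front of each "~", so every record except the last loses its trailing pipe.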

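That gives me each record type on its own line, but I don't know how to add the missing record types and then put everything back on one line in the right order.

Any ideas?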


Here is an awk script that, as far as I can tell, does what you need:

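#!/usr/bin/awk -f

# Both input files use "~" as the field separator.
BEGIN { FS=OFS="~" }

# First file (known_flds): store each record type's empty form,
# the output order of the wanted types, and each type's rule.
FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) fld_order[++fld_cnt] = $1
    fld_rule[$1] = $3
    next
}

# Second file (data): rebuild each line with every wanted record type.
{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )    # step over records marked "skp"

        # Use the record if it is present, otherwise its empty default.
        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = $j; j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

# Return name followed by cnt "|" characters, e.g. BB||
function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

# Return the index of the first field at or after fnum whose
# record type is not marked "skp".
function skip_flds(fnum,     name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    while(fld_rule[name] == "skp") {
        fnum++
        name = $fnum
        sub(/\|.*$/, "", name)
    }
    return( fnum )
}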

The record definitions live in a separate file, here called "known_flds":

AA~5~req
BB~2~opt
CC~3~opt
DD~6~opt
EE~4~opt
FF~2~skp
GG~4~opt

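Each line of known_flds holds a record name, its number of fields, and a rule, separated by "~" (the same FS the script uses). The rules are:

  • req → required (the script currently treats it exactly like opt)
  • opt → optional (replaced by empty fields when missing)
  • skp → skip (never written to the output)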
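The script trusts known_flds completely: a typo such as "skip" instead of "skp" is not reported, it just quietly changes the output. A minimal sketch of a sanity check to run on the definitions first, assuming every line must be NAME~COUNT~RULE with a numeric count; check_known is a hypothetical helper name:

```shell
#!/bin/sh
# check_known is a hypothetical helper; usage: check_known known_flds
# Flags lines that are not NAME~COUNT~RULE, where COUNT is numeric and
# RULE is one of req, opt, skp. Exits with status 1 if any line is bad.
check_known() {
    awk '
        BEGIN { FS = "~"; bad = 0 }
        NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^(req|opt|skp)$/ {
            print FILENAME ":" FNR ": bad definition: " $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running `check_known known_flds && ./awk.script known_flds data` would then refuse to format the data while the definitions are broken.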

With the script saved as awk.script (and made executable), running ./awk.script known_flds data on the sample input gives:

AA|12345|ABCDE|67890|FGHIJ|~BB|12345|~CC|ABCDE|12345|~DD|A|B|C|D|E|~EE|1|2|3|~GG|F|R|T
AA|23456|BCDEF|78901|GHIJK|~BB||~CC|BCDEF|23456|~DD||||||~EE|2|3|4|~GG|R|F|G
AA|34567|CDEFG|89012|HIJKL|~BB||~CC|||~DD|B|C|D||~EE||||~GG||||

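Note that the empty GG in the last line has one more "|" than in your expected output. known_flds defines GG with 4 fields and create_empty_field() appends one "|" per field, while the GG records in the sample data, being last on the line, have no trailing "|".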

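A few caveats:

  • Record types in the data must appear in the same order as in known_flds - the script only moves forward, so an out-of-order record (and everything after it on the line) ends up replaced by empty fields.
  • req and opt are handled the same way, so a missing required record is not reported; it just gets empty fields.
  • Records that are present are copied as-is, so their field counts are not checked.
  • Every record type that can occur in the data must be listed in known_flds. An unlisted type is neither skipped nor matched, so it and every record after it on the line are replaced by empty fields. To drop a record type, list it with skp rather than leaving it out.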
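Present records are copied through unchanged, so a record with the wrong number of "|" characters (like the DD record in the third sample line, which has 5 instead of 6) would reach sqlldr as-is. A minimal sketch of a separate pre-load check, assuming the same known_flds file and counting a missing trailing "|" on the last record of a line as present (as with GG in the sample); check_counts is a hypothetical helper name:

```shell
#!/bin/sh
# Compare each record's "|" count with the field count in known_flds.
# check_counts is a hypothetical helper; usage: check_counts known_flds data
# Exits with status 1 if any record is unknown or has the wrong count.
check_counts() {
    awk '
        BEGIN { FS = "~"; bad = 0 }
        # First file: expected number of fields per record type.
        FNR == NR { want[$1] = $2; next }
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/\|.*$/, "", name)       # record name only
                tmp = $i
                have = gsub(/\|/, "", tmp)   # gsub returns the number of "|"
                # Assumption: the last record on a line may omit its trailing "|".
                if (i == NF && $i !~ /\|$/) have++
                if (!(name in want)) {
                    print FILENAME ":" FNR ": unknown record type " name
                    bad = 1
                } else if (have != want[name]) {
                    print FILENAME ":" FNR ": " name " has " have " field(s), expected " want[name]
                    bad = 1
                }
            }
        }
        END { exit bad }
    ' "$1" "$2"
}
```

Counting is done on a copy (tmp) so that gsub() does not modify $i and cause awk to rebuild the record.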

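How the script works:

  • FNR==NR - true only while reading the first file (known_flds). For each definition, create_empty_field() builds the empty version of the record, stored in dflts under the record name. Record types marked req or opt are appended, in file order, to fld_order, and every type's rule ("req", "opt" or "skp") is stored in fld_rule.
  • The second block runs for every line of data. It loops fld_cnt times, once per wanted record type, in the order the types appear in known_flds.
  • Before each check, skip_flds() moves the data field index j past any records marked skp.
  • If $j starts with the expected record name, it is added to flds and j moves on to the next record; otherwise the empty default from dflts is added instead.
  • Once every wanted type has been handled, flds is printed, joined with OFS ("~").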
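Since req and opt types both just go into fld_order, a line that lacks a mandatory record still produces output, with that record's dflts entry in its place. If missing mandatory records should be caught before the load, a separate pass could report them. A minimal sketch, assuming the same known_flds layout; check_required is a hypothetical helper name:

```shell
#!/bin/sh
# Report data lines that lack a record type marked "req" in known_flds.
# check_required is a hypothetical helper; usage: check_required known_flds data
# Exits with status 1 if any required record type is missing.
check_required() {
    awk '
        BEGIN { FS = "~"; bad = 0 }
        # First file: collect the required record types.
        FNR == NR { if ($3 == "req") req[$1] = 1; next }
        # Second file: note which record types this line contains.
        {
            split("", seen)                  # portable way to empty an array
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/\|.*$/, "", name)       # keep only the record name
                seen[name] = 1
            }
            for (r in req)
                if (!(r in seen)) {
                    print FILENAME ":" FNR ": missing required record " r
                    bad = 1
                }
        }
        END { exit bad }
    ' "$1" "$2"
}
```

When several required types are missing from one line, they are reported in awk's unspecified `for (r in req)` order.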

create_empty_field():

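  • name and cnt are the parameters; fld and i are local variables (the extra spaces before them are the usual awk convention for marking locals).
  • fld starts out as name ($1 in known_flds)
  • and gets one "|" appended for each of the cnt fields ($2 in known_flds).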
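To see what create_empty_field() produces on its own, the function can be called from a BEGIN block that reads no input. A small sketch reusing the function body from the script; empty_record is a hypothetical wrapper name:

```shell
#!/bin/sh
# Print the empty form of a record type, as create_empty_field() builds it.
# empty_record is a hypothetical wrapper; usage: empty_record NAME COUNT
empty_record() {
    awk -v rec="$1" -v n="$2" '
        function create_empty_field(name, cnt,     fld, i) {
            fld = name
            for (i = 1; i <= cnt; i++) { fld = fld "|" }
            return( fld )
        }
        BEGIN { print create_empty_field(rec, n) }
    '
}

empty_record GG 4    # prints GG||||
empty_record BB 2    # prints BB||
```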

skip_flds():

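  • fnum is the parameter (the index of the field to start from); name is a local variable.
  • name is set to $fnum, and sub() strips everything from the first "|" onward, leaving just the record name.
  • The while loop keeps going as long as fld_rule[name] == "skp".
  • Inside the loop, fnum is incremented and name is reset to the next field, stripped the same way.
  • When the loop ends, fnum is returned: the index of the first record that is not skipped.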
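A small sketch of skip_flds() in isolation, with the FF rule from known_flds hard-coded in a BEGIN block (an assumption for the demo only, since the real script fills fld_rule from the first file); skip_index is a hypothetical wrapper that prints the field index skip_flds() returns for a given starting index:

```shell
#!/bin/sh
# skip_index is a hypothetical wrapper: skip_index LINE START
# prints the index of the first record at or after START that is not "skp".
skip_index() {
    printf '%s\n' "$1" | awk -v start="$2" '
        BEGIN { FS = "~"; fld_rule["FF"] = "skp" }   # demo rule: skip FF
        function skip_flds(fnum,     name) {
            name = $fnum
            sub(/\|.*$/, "", name)
            while (fld_rule[name] == "skp") {
                fnum++
                name = $fnum
                sub(/\|.*$/, "", name)
            }
            return( fnum )
        }
        { print skip_flds(start) }
    '
}

skip_index 'AA|1|~FF|P|~GG|F' 2    # prints 3: field 2 is FF, so it is skipped
skip_index 'AA|1|~FF|P|~GG|F' 1    # prints 1: AA is not skipped
```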

If record types are added, removed or reordered, only known_flds needs to change; awk.script and the data file stay as they are.
