I am converting some variables in the GermanCredit dataset from caret into factors. Using factors reduces the number of variables from 62 to 21.
The problem is that I get inconsistent results when summarizing the data for the Purpose.X columns:
    for (i in 20:30) { print(c(colnames(GermanCredit)[i], length(which(GermanCredit[,i] == 1)))) }
    [1] "Purpose.NewCar" "234"
    [1] "Purpose.UsedCar" "103"
    [1] "Purpose.Furniture.Equipment" "181"
    [1] "Purpose.Radio.Television" "280"
    [1] "Purpose.DomesticAppliance" "12"
    [1] "Purpose.Repairs" "22"
    [1] "Purpose.Education" "50"
    [1] "Purpose.Vacation" "0"
    [1] "Purpose.Retraining" "9"
    [1] "Purpose.Business" "97"
    [1] "Purpose.Other" "12"
and the results from prop.table:
    prop.table(table(Purpose))
                 NewCar             UsedCar Furniture.Equipment    Radio.Television
                  0.234               0.103               0.181               0.280
      DomesticAppliance             Repairs           Education            Vacation
                  0.012               0.022               0.050               0.009
             Retraining            Business               Other
                  0.097               0.012               0.000
The results for Vacation through Other appear to be shifted by one position for some reason. Any help in determining why the results are inconsistent would be greatly appreciated. Thanks.
- The Purpose factor was obtained using the following loop:
    pcolnamerepeat = c("CheckingAccountStatus.", "CreditHistory.", "Purpose.",
                       "SavingsAccountBonds.", "EmploymentDuration.", "Personal.",
                       "OtherDebtorsGuarantors.", "Property.", "OtherInstallmentPlans.",
                       "Housing.", "Job.")
    for (i in pcolnamerepeat) {
      rpt = grep(i, colnames(GermanCredit))
      tempfac <- factor(apply(GermanCredit[,rpt], 1, function(x) which(x == 1)))
      levels(tempfac) <- substr(colnames(GermanCredit[,rpt]), nchar(i)+1, nchar(colnames(GermanCredit[,rpt])))
      GermanCredit <- cbind(GermanCredit[-c(rpt)], tempfac)
      names(GermanCredit)[length(GermanCredit)] <- substr(i, 1, nchar(i)-1)
    }
    attach(GermanCredit)
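A minimal sketch of one plausible cause is below. It uses a hypothetical toy data frame, not the real GermanCredit data, and the names `toy`, `labs`, `idx`, `shifted` and `fixed` are made up for illustration. The idea is that `factor()` only creates levels for values that actually occur. If one dummy column is never 1 (as with Purpose.Vacation above), assigning the full list of names with `levels<-` relabels the remaining levels by position and leaves the last name empty.

```r
# Toy one-hot columns (hypothetical data): the middle category never occurs
toy <- data.frame(
  Purpose.A = c(1, 0, 0),
  Purpose.B = c(0, 0, 0),
  Purpose.C = c(0, 1, 1)
)
labs <- sub("^Purpose\\.", "", colnames(toy))

# Position of the 1 in each row: 1, 3, 3
idx <- apply(toy, 1, function(x) which(x == 1))

# Same pattern as the loop above: factor() keeps only the observed
# values ("1" and "3") as levels, so the three names are matched by position
shifted <- factor(idx)
levels(shifted) <- labs
print(table(shifted))   # the rows belonging to C are counted under B

# Declaring every level up front keeps each label tied to its column
fixed <- factor(idx, levels = seq_along(labs), labels = labs)
print(table(fixed))
```

If this is what is happening, passing `levels = seq_along(rpt)` together with `labels = ...` to `factor()` inside the loop, instead of assigning `levels(tempfac)` afterwards, may be enough to keep the labels aligned. This has not been checked against the actual data.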