If I recall correctly, when you call factor(x) without the levels argument, the levels default to sort(unique(x)).
You can override this behavior by passing levels = unique(x).
For example:
set.seed(1)
x = sample(letters, 100, replace = TRUE)
head(x, 5)
[1] "g" "j" "o" "x" "f"
levels(factor(x))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
levels(factor(x, levels = unique(x)))
[1] "g" "j" "o" "x" "f" "y" "r" "q" "b" "e" "u" "m" "s" "z" "d" "k" "a" "w" "i"
[20] "p" "v" "c" "n" "t" "l" "h"
You can see that setting levels = unique(x) keeps the levels in the order in which the values first appear in the data.
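If the random sample makes the difference hard to follow, here is a smaller sketch of the same idea. The vector x below is made up for illustration and is not from the example above; the names f_sorted and f_seen are also just placeholders.

```r
# Hypothetical input, chosen so that sorted order and order of
# first appearance are clearly different.
x <- c("pear", "apple", "pear", "fig", "apple")

# Default behavior: levels are the sorted unique values.
f_sorted <- factor(x)
levels(f_sorted)      # "apple" "fig" "pear"

# Override: levels follow the order of first appearance in x.
f_seen <- factor(x, levels = unique(x))
levels(f_seen)        # "pear" "apple" "fig"

# The underlying integer codes change with the level order,
# but the labels you get back are the same.
as.integer(f_sorted)  # 3 1 3 2 1
as.integer(f_seen)    # 1 2 1 3 2
identical(as.character(f_sorted), as.character(f_seen))  # TRUE
```

Only the level order (and therefore the integer codes) differs between the two factors. This matters mostly for things that follow level order, such as table() output or plot axes.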
Greg