I have a data frame; a sample of its rows is shown below.
Company          Category  Margin
SBI              BK        34.5
PNB              BK        39.5
UCO BANK         BK        39.9
ANDHRA BANK      BK        41.3
INDIAN BANK      BK        42.3
DENA BANK        BK        44.5
VIJAYA BANK      BK        44.5
UNION BANK       BK        47.6
CENTRAL BANK     BK        49.8
INFOSYS          IT        5.6
HCL TECH         IT        5.9
TCS              IT        6.9
CMC              IT        12.6
TECHMAHINDRA     IT        12.6
COGNIZANT        IT        15.8
IGATE            IT        22.4
WIPRO            IT        22.9
HEXAWARE         IT        34.8
MAHINDRA SATYAM  IT        34.8
DR. REDDYS       PH        14.5
SUN PHARMA       PH        19.2
CIPLA            PH        23.9
LUPIN            PH        23.9
DIVIS LABS       PH        29
A careful look at the data frame shows that it is sorted by Category, then by Margin, and then by Company.
I need to add a new column called Ranking that numbers the rows within each Category. The numbering should restart at 1 whenever a new Category appears in the list.
Expected output:
Company          Category  Margin  Ranking
SBI              BK        34.5    1
PNB              BK        39.5    2
UCO BANK         BK        39.9    3
ANDHRA BANK      BK        41.3    4
INDIAN BANK      BK        42.3    5
DENA BANK        BK        44.5    6
VIJAYA BANK      BK        44.5    7
UNION BANK       BK        47.6    8
CENTRAL BANK     BK        49.8    9
INFOSYS          IT        5.6     1
HCL TECH         IT        5.9     2
TCS              IT        6.9     3
CMC              IT        12.6    4
TECHMAHINDRA     IT        12.6    5
COGNIZANT        IT        15.8    6
IGATE            IT        22.4    7
WIPRO            IT        22.9    8
HEXAWARE         IT        34.8    9
MAHINDRA SATYAM  IT        34.8    10
DR. REDDYS       PH        14.5    1
SUN PHARMA       PH        19.2    2
CIPLA            PH        23.9    3
LUPIN            PH        23.9    4
DIVIS LABS       PH        29      5
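For reference, the per-category numbering described above could be sketched in base R roughly as follows. This is only a minimal sketch, not a settled solution: it uses a small, deliberately shuffled subset of the sample rows, and it assumes that ties in Margin are broken alphabetically by Company, as the sample ordering suggests.

```r
# Small subset of the sample data, shuffled on purpose
df <- data.frame(
  Company  = c("TCS", "SBI", "CIPLA", "PNB", "INFOSYS", "LUPIN", "HCL TECH"),
  Category = c("IT", "BK", "PH", "BK", "IT", "PH", "IT"),
  Margin   = c(6.9, 34.5, 23.9, 39.5, 5.6, 23.9, 5.9),
  stringsAsFactors = FALSE
)

# Sort by Category, then Margin, then Company (assumed tie-breaker)
df <- df[order(df$Category, df$Margin, df$Company), ]
rownames(df) <- NULL

# ave() applies FUN to each Category group; seq_along numbers the rows 1..n
df$Ranking <- ave(seq_len(nrow(df)), df$Category, FUN = seq_along)

print(df)
```

Because `ave()` keeps the original row order, the numbering is only meaningful if the data frame has been sorted first.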
Additional requirements
Assume that the input data set is completely unsorted. Then
unique(df$Category)
[1] "BK" "IT" "PH" "MT" "EG"
After cleaning, the same call returns
unique(df$Category)
[1] "BK" "IT" "PH"
Note: while cleaning the input data set to handle missing values, some categories were dropped entirely.
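I need to sort the data frame and add the Ranking column.
I also need a function, companyRanks, that takes a rank and returns, for every category in the original list, the company holding that rank, or NA if the category has no company at that rank.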
head(companyRanks(3), 4) # returns:
   COMPANY   CATEGORY
BK UCO BANK  BK
IT TCS       IT
PH CIPLA     PH
MT <NA>      MT
EG <NA>      EG
head(companyRanks(10), 4) # returns:
   COMPANY          CATEGORY
BK <NA>             BK   # no company has rank 10 in category BK, so NA is returned
IT MAHINDRA SATYAM  IT
PH <NA>             PH
MT <NA>             MT
EG <NA>             EG
How can I achieve this?
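A possible starting point for the lookup shown in the companyRanks examples, again as a hedged base R sketch. The names `ranked` and `all_categories`, and the extra `data` and `categories` arguments, are assumptions made for illustration. The sketch presumes the original category list was saved before cleaning and that the Ranking column already exists.

```r
# Hypothetical inputs: categories recorded BEFORE cleaning, plus a small
# already-ranked subset of the cleaned data
all_categories <- c("BK", "IT", "PH", "MT", "EG")
ranked <- data.frame(
  Company  = c("SBI", "PNB", "UCO BANK", "INFOSYS", "HCL TECH", "TCS",
               "DR. REDDYS", "SUN PHARMA", "CIPLA"),
  Category = c("BK", "BK", "BK", "IT", "IT", "IT", "PH", "PH", "PH"),
  Ranking  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

companyRanks <- function(rank, data = ranked, categories = all_categories) {
  # Keep only the rows holding the requested rank
  hits <- data[data$Ranking == rank, ]
  # match() yields NA for categories that have no row at this rank,
  # and indexing with NA produces an NA company
  idx <- match(categories, hits$Category)
  data.frame(
    COMPANY   = hits$Company[idx],
    CATEGORY  = categories,
    row.names = categories,
    stringsAsFactors = FALSE
  )
}

print(companyRanks(3))
```

Keeping the pre-cleaning category list in a separate vector is what lets dropped categories such as MT and EG still appear in the result with NA.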