There is no need to split the population at all.
If you take a sample of 1000 from a population spread across hundreds of course codes, it stands to reason that many of those course codes will not be picked at all in any one sample.
If the population is uniform (say, a continuous series of student IDs), a uniformly distributed sample will automatically reflect the population's weighting by course code. Since newid() is a uniform random sampler, this works out of the box.
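A minimal sketch of the simple case, in T-SQL. The table Enrollment and its columns StudentId and CourseCode are hypothetical names chosen for illustration, and the sketch assumes one row per student:

```sql
-- Hypothetical table: Enrollment(StudentId, CourseCode), one row per student.
-- ORDER BY NEWID() gives every row a random sort key, so TOP (1000)
-- returns a uniform random sample of the rows.
SELECT TOP (1000) StudentId, CourseCode
FROM Enrollment
ORDER BY NEWID();
```

Because every row is equally likely to be chosen, a course code with more students contributes proportionally more rows to the sample on average, with no explicit partitioning by course code.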
The only wrinkle you might encounter is a student ID that is associated with several course codes. In that case, build a unique list (temporary table or subquery) containing a sequential ID, student ID and course code, then sample the sequential ID from it, grouping by student ID to remove duplicates.
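One possible reading of that step, sketched in T-SQL. The table and column names are again hypothetical, and keeping a single random row per student is an assumption about how the duplicates should be resolved; an explicit GROUP BY on the student ID would be another way to do it:

```sql
-- Hypothetical table: Enrollment(StudentId, CourseCode); a student may
-- appear under several course codes.
WITH Numbered AS (
    SELECT
        -- Sequential ID for the list; carried along to mirror the
        -- description, not needed for the sampling itself.
        ROW_NUMBER() OVER (ORDER BY StudentId, CourseCode) AS SeqId,
        -- Random rank of each student's rows, so one row can be kept.
        ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY NEWID()) AS PickOrder,
        StudentId,
        CourseCode
    FROM Enrollment
)
SELECT TOP (1000) SeqId, StudentId, CourseCode
FROM Numbered
WHERE PickOrder = 1      -- one row per student removes duplicates
ORDER BY NEWID();        -- uniform sample over the deduplicated list
```

With this approach each student can appear at most once in the sample, and the course code reported for a multi-course student is chosen at random among that student's codes.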