Sample from Oracle, exact number of results required (example)

I am trying to pull a random sample of a population from the PeopleSoft database. Searching online suggested that the SAMPLE clause of a SELECT statement might be a viable option for us. However, I have a hard time understanding how the SAMPLE clause determines the number of rows returned. I looked at the Oracle documentation here: http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#i2065953

But that page only covers the syntax of the sample clause. I need to understand how the sample percentage determines the size of the returned sample. It looks like it applies a random number to the percentage you ask for, and then uses a seed to pick every "n"th record. Our requirement is to pull an exact number of samples, chosen randomly and representative of the entire table (or at least of the subset of data we select with filters).

In a population of 10,200 elements, if I need a sample of about 100 elements, I could use this statement:

SELECT * FROM PS_LEDGER SAMPLE(1) --1 % of my total population
WHERE DEPTID = '700064' 
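To make the sample-size question concrete, here is a small Python simulation. It assumes (an idealization, not Oracle's documented implementation) that SAMPLE(percent) keeps each row independently with probability percent/100. Under that assumption the row count is binomial: SAMPLE(1) on 10,200 rows averages 102 rows with a standard deviation of about 10, so the count changes from run to run. The helper names are hypothetical.

```python
import math
import random
import statistics


def simulated_sample_size(population: int, percent: float, rng: random.Random) -> int:
    # ASSUMPTION: idealized SAMPLE(percent) = independent coin flip for every row.
    p = percent / 100.0
    return sum(1 for _ in range(population) if rng.random() < p)


def theoretical_mean_and_sd(population: int, percent: float) -> tuple[float, float]:
    # Binomial(n, p): mean n*p, standard deviation sqrt(n*p*(1-p)).
    p = percent / 100.0
    return population * p, math.sqrt(population * p * (1 - p))


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [simulated_sample_size(10_200, 1.0, rng) for _ in range(200)]
    mean, sd = theoretical_mean_and_sd(10_200, 1.0)
    print(f"theory: mean={mean:.1f} sd={sd:.1f}")
    print(f"simulated: min={min(sizes)} mean={statistics.mean(sizes):.1f} max={max(sizes)}")
```

Under this model the percentage fixes only the expected size, never the exact size, which is why a cutoff is needed when exactly 100 rows are required.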

However, because we need an exact number of samples (in this case 100), I could choose a sample percentage that almost always returns more rows than I need and then trim the result, i.e.:

SELECT * FROM PS_LEDGER SAMPLE(2.5) --this percent must always give > 100 items
WHERE DEPTID = '700064' and rownum < 101
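Under the same idealized per-row model (an assumption, not Oracle's documented algorithm), the chance that a given percentage returns fewer than 100 rows can be computed exactly from the binomial distribution. The hypothetical helper prob_fewer_than sums the probability mass in log space so the large binomial coefficients do not overflow.

```python
import math


def prob_fewer_than(n: int, percent: float, k: int) -> float:
    """P(X < k) for X ~ Binomial(n, percent/100).

    ASSUMPTION: models SAMPLE(percent) as independent per-row selection.
    """
    p = percent / 100.0
    total = 0.0
    for x in range(k):
        # log of C(n, x) * p^x * (1-p)^(n-x), using lgamma for the binomial coefficient
        log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * math.log(p) + (n - x) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    for pct in (1.0, 2.5):
        shortfall = prob_fewer_than(10_200, pct, 100)
        print(f"SAMPLE({pct}): P(fewer than 100 rows) = {shortfall:.3g}")
```

Under this model SAMPLE(1) falls short of 100 rows roughly 40% of the time, while SAMPLE(2.5) (expected 255 rows) falls short with an astronomically small probability, so oversampling and trimming does reliably yield 100 rows. Whether those 100 rows are representative is a separate question.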

My concern is that my sample will not evenly represent the entire population. For example, if the sample function simply picks every Nth record after generating its own random seed, then filtering on rownum < 101 will cut off all the records selected from the bottom of the table. What I'm looking for is a way to get exactly 100 records from a table, randomly selected and representative of the whole table. Please help!
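The ROWNUM concern can be checked with a simulation under stated assumptions: rows are read in storage order 0..N-1, SAMPLE is modeled as an independent per-row coin flip, and ROWNUM stops the scan once 100 rows have been returned. The model and the helper name first_k_of_bernoulli_sample are hypothetical.

```python
import random


def first_k_of_bernoulli_sample(population: int, percent: float, k: int,
                                rng: random.Random) -> list[int]:
    """Row positions kept by a per-row coin flip, truncated to the first k in scan order.

    ASSUMPTION: mimics `SAMPLE(percent) ... AND ROWNUM < k + 1` with no ORDER BY,
    reading rows in storage order.
    """
    p = percent / 100.0
    kept = []
    for pos in range(population):
        if rng.random() < p:
            kept.append(pos)
            if len(kept) == k:  # ROWNUM limit reached: the rest of the table is never read
                break
    return kept


if __name__ == "__main__":
    rows = first_k_of_bernoulli_sample(10_200, 2.5, 100, random.Random(7))
    print(len(rows), max(rows))
```

With p = 0.025 the 100th kept row typically appears around position 100 / 0.025 = 4,000, so under this model rows past that point never reach the result. Randomizing the order before applying the row limit avoids this.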

3 answers

Using jonearles' test table (11gR2 on OEL), where a is 1 for the fat rows and 2 for the skinny rows, I ran this query:

select a, count(*) from (
    select * from test1 sample (1)
    order by dbms_random.value
)
where rownum < 101
group by a;

Running it several times gave results such as:

         A   COUNT(*)
---------- ----------
         1         71
         2         29

         A   COUNT(*)
---------- ----------
         1        100

         A   COUNT(*)
---------- ----------
         1         64
         2         36

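In the second run, 100% of the rows had A = 1. The split varies a lot from run to run; this seems to be related to how the rows are spread across blocks.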
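A baseline for those numbers: if SAMPLE kept each row independently (an idealized assumption, not a statement about Oracle's implementation), then shuffling the sampled rows and keeping the first 100 would be a uniform random 100 from this table, giving close to a 50/50 split with typical deviations of about 5 rows. The sketch below simulates that idealized case; splits like 71/29 or 100/0 would be very unlikely under it. The helper name is hypothetical.

```python
import random
from collections import Counter


def ideal_sample_then_first_k(values: list[int], percent: float, k: int,
                              rng: random.Random) -> Counter:
    # ASSUMPTION: idealized row sampling, every row kept independently with probability percent/100.
    p = percent / 100.0
    sampled = [v for v in values if rng.random() < p]  # stands in for SAMPLE(percent)
    rng.shuffle(sampled)                                # stands in for ORDER BY dbms_random.value
    return Counter(sampled[:k])                         # stands in for ROWNUM < k + 1


if __name__ == "__main__":
    table = [1] * 10_000 + [2] * 10_000  # same A values as jonearles' test1
    rng = random.Random(2024)
    for _ in range(3):
        counts = ideal_sample_then_first_k(table, 1.0, 100, rng)
        print(counts[1], counts[2])
```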

If you skip SAMPLE and instead order the whole table randomly, the distribution is much more even; for example:

select a, count(*) from (
    select a, b from (
        select a, b, row_number() over (order by dbms_random.value) as rn
        from test1
    )
    where rn < 101
)
group by a;
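The query above gives every row a random key, ranks the whole table by it, and keeps ranks 1 to 100. Here is a Python analogue of that idea, for illustration only (the helper name is hypothetical); it returns exactly k distinct rows, and every subset of size k is equally likely.

```python
import random


def random_rank_sample(rows: list, k: int, rng: random.Random) -> list:
    # One random key per row, like dbms_random.value evaluated for every row.
    keyed = [(rng.random(), row) for row in rows]
    # Order the whole set by the key (this full sort is the cost of the approach).
    keyed.sort(key=lambda pair: pair[0])
    # Keep ranks 1..k, like WHERE rn < k + 1.
    return [row for _, row in keyed[:k]]


if __name__ == "__main__":
    table = [1] * 10_000 + [2] * 10_000
    sample = random_rank_sample(table, 100, random.Random(1))
    print(len(sample), sample.count(1), sample.count(2))
```

In plain Python, random.sample(rows, k) does the same job directly; the explicit key-and-sort version is shown because it mirrors the SQL.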

Results:

         A   COUNT(*)
---------- ----------
         1         48
         2         52

         A   COUNT(*)
---------- ----------
         1         57
         2         43

         A   COUNT(*)
---------- ----------
         1         49
         2         51

... which is much more evenly distributed. YMMV, of course.


Oracle , ora_hash, , "" .


SAMPLE does not necessarily return a representative sample. For example, with this test case:

create table test1(a number, b char(2000));

--Insert 10K fat records.  A is always 1.
insert into test1 select 1, level from dual connect by level <= 10000;

--Insert 10K skinny records.  A is always 2.
insert into test1 select 2, null from dual connect by level <= 10000;

--Select about 10 rows.
select * from test1 sample (0.1) order by a;

, 2-. , , .
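jonearles' table mixes fat rows (A = 1) with skinny rows (A = 2), so the two kinds occupy very different numbers of blocks. The sketch below models Oracle's SAMPLE BLOCK variant (whole blocks kept or dropped), which is not the row sampling used in the query above, and the rows-per-block figures are assumptions, not measurements. It illustrates how selecting by block rather than by row makes the A = 1 / A = 2 split swing between runs even though the table holds 10,000 of each.

```python
import random
from collections import Counter

ROWS_PER_FAT_BLOCK = 3       # ASSUMPTION: roughly 2 KB rows in an 8 KB block
ROWS_PER_SKINNY_BLOCK = 400  # ASSUMPTION: tiny rows; real density depends on block size and overhead


def build_blocks() -> list[list[int]]:
    """Lay out 10K rows with A=1 (fat) and 10K rows with A=2 (skinny) as blocks of A values."""
    blocks = []
    for start in range(0, 10_000, ROWS_PER_FAT_BLOCK):
        blocks.append([1] * min(ROWS_PER_FAT_BLOCK, 10_000 - start))
    for start in range(0, 10_000, ROWS_PER_SKINNY_BLOCK):
        blocks.append([2] * min(ROWS_PER_SKINNY_BLOCK, 10_000 - start))
    return blocks


def block_sample(blocks: list[list[int]], percent: float, rng: random.Random) -> Counter:
    p = percent / 100.0
    counts = Counter()
    for block in blocks:
        if rng.random() < p:  # keep or drop the whole block at once
            counts.update(block)
    return counts


if __name__ == "__main__":
    blocks = build_blocks()
    rng = random.Random(3)
    for _ in range(5):
        c = block_sample(blocks, 1.0, rng)
        print(c[1], c[2])
```

With these assumed densities most runs contain only 1s, and a run that happens to pick a skinny block gains 400 2s at once.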

If you need a truly random sample and can afford to read (and sort) the whole table, use ORDER BY DBMS_RANDOM.VALUE.


I had a similar requirement and handled it with strata. Each stratum is a department (e.g. '700064'). First:

Select DEPTID, Count(*) SAMPLE_ONE 
FROM PS_LEDGER  Sample(1)
WHERE DEPTID = '700064' 
Group By DEPTID

This gives the size of a 1% sample for the stratum; store the result in TABLE_1.

Then:

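Select 
-- random rank within each DEPTID, divided by that DEPTID's 1% sample size from TABLE_1
Ceil (Rank() over (Partition by A.DEPTID Order by DBMS_RANDOM.VALUE)
      / (Select T.SAMPLE_ONE From TABLE_1 T Where T.DEPTID = A.DEPTID)) STRATUM_GROUP
,A.*
FROM PS_LEDGER A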

This splits each stratum into random groups (STRATUM_GROUP), each of which is a Random Sample Set of approx. 1%.

For example, a stratum with 1000 records is split into 100 groups of 10 records each.
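The grouping step can be sketched in Python to check the arithmetic: shuffle each stratum, number its rows 1..n, and divide by that stratum's sample size, rounding up. The function name, the row layout, and the sample size of 10 are hypothetical; shuffling gives distinct ranks, so the ties that RANK() could in principle produce on DBMS_RANDOM.VALUE are ignored.

```python
import math
import random
from collections import Counter, defaultdict


def stratum_groups(rows, stratum_of, sample_one, rng):
    """Map each row to (stratum, group number).

    Mirrors CEIL(RANK() OVER (PARTITION BY stratum ORDER BY DBMS_RANDOM.VALUE) / SAMPLE_ONE).
    """
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[stratum_of(row)].append(row)
    assignment = {}
    for stratum, members in by_stratum.items():
        rng.shuffle(members)        # random order within the stratum
        size = sample_one[stratum]  # hypothetical: 1% sample size for this stratum
        for rank, row in enumerate(members, start=1):
            assignment[row] = (stratum, math.ceil(rank / size))
    return assignment


if __name__ == "__main__":
    rows = [("700064", i) for i in range(1000)]  # hypothetical stratum of 1000 rows
    groups = stratum_groups(rows, lambda r: r[0], {"700064": 10}, random.Random(5))
    sizes = Counter(group for _, group in groups.values())
    print(len(sizes), set(sizes.values()))  # 100 {10}
```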

Any one of these groups can then be used as the sample for that stratum.

Not sure I explained it very well, but it worked for me. I had 168 strata configured on a table with over 10 million records, and it worked well.

If you want more explanation or can improve it, please feel free to.
