How do I select a limited number of rows with the same attribute?

I have a table of urls and one attribute is a domain.

Suppose I have 100 URLs from Google, 100 from Facebook, 100 from Ebay and the same for other domains, but I want to get the first 30 URLs from Google, 30 from Facebook, 30 from Ebay and 30 from others, limiting up to 500 URLs.

How am i doing this?

+5
source share
3 answers

The following SQL resolves my case, but the URLs fail because row_number do not match orders. I think this SQL needs improvement.

SELECT url,row_number FROM(
     SELECT url,row_number() OVER (PARTITION BY domain) FROM website
     WHERE domain IN
    (SELECT DISTINCT domain FROM link)  
) AS links 
WHERE row_number <= 10
LIMIT 25
+2
source

How about something like this:

SELECT url FROM link WHERE domain='Google' LIMIT 30
UNION
SELECT url FROM link WHERE domain='Facebook' LIMIT 30
UNION
SELECT ...

and etc.

+1
source

My solution is in PL / pgSQL. I found a better way that you do not need an additional temporary table (using functions INSERTin this table in the body), that is, you can use this function to directly get the result:

CREATE OR REPLACE FUNCTION getURLs(singleLimit integer, totalLimit integer)
RETURNS SETOF RECORD AS $$
DECLARE
    dom text;
    nrOfDomains integer;
    i integer;
    lim integer;
    remainder integer;
BEGIN
    nrOfDomains := totalLimit/singleLimit; -- integer division (truncate)
    remainder := totalLimit%singleLimit;

    IF remainder <> 0 THEN
        nrOfDomains := nrOfDomains + 1;
    END IF;

    i := 1;
    FOR dom IN SELECT DISTINCT domain FROM website LIMIT nrOfDomains
    LOOP
        IF i = nrOfDomains AND remainder <> 0 THEN
            lim := remainder;
        ELSE
            lim := singleLimit;
        END IF;

        RETURN QUERY SELECT * FROM website WHERE domain = dom LIMIT lim;

        i := i + 1;
    END LOOP;
    RETURN;
END $$
LANGUAGE 'plpgsql';

Here is a test drive:

postgres=> CREATE TABLE website(url text, domain text);
CREATE TABLE

postgres=> INSERT INTO website
    SELECT 'http://' || d.column1  ||'/' || n, d.column1
    FROM generate_series(1, 100) n CROSS JOIN
    (VALUES ('google'), ('facebook'), ('ebay')) d;
INSERT 0 300

postgres=> SELECT * FROM getURLs(10, 25) website(url text, domain text);

Result:

        url         |  domain
--------------------+----------
 http://google/1    | google
 http://google/2    | google
 http://google/3    | google
 http://google/4    | google
 http://google/5    | google
 http://google/6    | google
 http://google/7    | google
 http://google/8    | google
 http://google/9    | google
 http://google/10   | google
 http://facebook/1  | facebook
 http://facebook/2  | facebook
 http://facebook/3  | facebook
 http://facebook/4  | facebook
 http://facebook/5  | facebook
 http://facebook/6  | facebook
 http://facebook/7  | facebook
 http://facebook/8  | facebook
 http://facebook/9  | facebook
 http://facebook/10 | facebook
 http://ebay/1      | ebay
 http://ebay/2      | ebay
 http://ebay/3      | ebay
 http://ebay/4      | ebay
 http://ebay/5      | ebay
(25 rows)
+1
source