Postgres: DISTINCT, but only for one column

I have a PostgreSQL table of names (more than 1 million rows), and it contains a lot of duplicates. I select 3 fields: id, name, metadata.

I want to select them randomly using ORDER BY RANDOM() and LIMIT 1000, fetching them in many batches like this to save memory in my PHP script.

But how can I do this so that it gives me a list without duplicate names?

For example, [1,"Michael Fox","2003-03-03,34,M,4545"] may be returned, but then [2,"Michael Fox","1989-02-23,M,5633"] must not be. The name field is the most important: it must be unique within the list every time I make a selection, and the selection must be random.

I tried GROUP BY name, but then it expects id and metadata to appear either in the GROUP BY or inside an aggregate function, and I do not want them to be filtered or aggregated in any way.

Does anyone know how to select many columns but apply DISTINCT to only one of them?

+61
select postgresql distinct
Jun 04 '13 at 9:14
3 answers

Make it DISTINCT ON one (or n) columns:

 select distinct on (name) name, col1, col2 from names 

This will return an arbitrary one of the rows for each name. If you want to control which row is returned, you need to add an ORDER BY:

 select distinct on (name) name, col1, col2 from names order by name, col1 

This returns, for each name, the first row when ordered by col1.

From the PostgreSQL documentation for DISTINCT ON:

SELECT DISTINCT ON (expression [, ...]) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
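To make the "first row of each set" rule concrete, here is a minimal pure-Python sketch of what DISTINCT ON (name) ... ORDER BY name, id computes: sort by the DISTINCT ON key first (the leftmost ORDER BY expression), then by the tie-breaking expression, and keep only the first row of each key group. This is an emulation for illustration, not how PostgreSQL implements it; the distinct_on helper and the "Jane Roe" row are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order_by):
    """Hypothetical helper emulating DISTINCT ON (key) ... ORDER BY key, order_by.

    Rows are sorted by the key first (as DISTINCT ON requires), then by
    order_by; only the first row of each group with an equal key is kept.
    """
    ordered = sorted(rows, key=lambda r: (key(r), order_by(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# Sample rows (id, name, metadata); "Jane Roe" is made-up illustration data.
rows = [
    (1, "Michael Fox", "2003-03-03,34,M,4545"),
    (2, "Michael Fox", "1989-02-23,M,5633"),
    (3, "Jane Roe", "1970-01-01,F,1234"),
]

# Roughly: SELECT DISTINCT ON (name) id, name, metadata FROM names ORDER BY name, id
result = distinct_on(rows, key=itemgetter(1), order_by=itemgetter(0))
print(result)
```

Changing only the tie-breaker changes which row survives: ordering by descending id (order_by=lambda r: -r[0]) keeps the "Michael Fox" row with id 2 instead of id 1.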

+127
Jun 04 '13 at 12:36

Does anyone know how to select many columns but apply DISTINCT to only one of them?

You want DISTINCT ON.

You didn't provide sample data or a complete query, so I have nothing concrete to show you. You want to write something like:

 SELECT DISTINCT ON (name) id, name, metadata FROM the_table;

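This will return an unpredictable (but not "random") set of rows. If you want it to be predictable, add an ORDER BY as in Clodoaldo's answer. If you want it to be truly random, you need ORDER BY random(), placed after name because the DISTINCT ON expression must be the leftmost ORDER BY expression: ORDER BY name, random().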
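For the original question (one random row per name, then a random sample of names), the PostgreSQL form would be along the lines of SELECT * FROM (SELECT DISTINCT ON (name) id, name, metadata FROM names ORDER BY name, random()) AS t ORDER BY random() LIMIT 1000; the outer query is needed because the inner ORDER BY has to start with name, so its output is sorted by name rather than shuffled. The sketch below tries the same idea in SQLite, which ships with Python and needs no server. SQLite has no DISTINCT ON, so this swaps in a different technique, ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()), a window function that PostgreSQL also supports. The table layout, the sample rows and the random_distinct_names helper are assumptions; SQLite 3.25 or newer is required for window functions.

```python
import sqlite3


def random_distinct_names(conn, limit):
    """Hypothetical helper: up to `limit` rows, one random row per name, shuffled.

    ROW_NUMBER() numbers the rows of each name in random order; keeping
    rn = 1 picks one random row per name. The outer ORDER BY random()
    then shuffles the names before LIMIT is applied.
    """
    sql = """
        SELECT id, name, metadata
        FROM (
            SELECT id, name, metadata,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) AS rn
            FROM names
        ) AS t
        WHERE rn = 1
        ORDER BY random()
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


# Assumed table layout and sample data for the demo.
data = [
    (1, "Michael Fox", "2003-03-03,34,M,4545"),
    (2, "Michael Fox", "1989-02-23,M,5633"),
    (3, "Jane Roe", "1970-01-01,F,1234"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)")
conn.executemany("INSERT INTO names (id, name, metadata) VALUES (?, ?, ?)", data)

rows = random_distinct_names(conn, 1000)
print(rows)  # two rows, one per name, in random order
```

Each call is an independent random draw, so names are unique within one result set; nothing here prevents the same name from reappearing in a later batch.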

+12
Jun 04 '13 at 12:35
 SELECT name, MAX(id) AS id, MAX(metadata) AS metadata FROM sometable GROUP BY name;
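One caveat with the GROUP BY approach: each MAX() is computed independently per column, so id and metadata can come from different source rows. The sketch below shows this with the two "Michael Fox" rows from the question, using SQLite (bundled with Python) as a stand-in for PostgreSQL; the aggregate behaviour shown is standard SQL, but the table name sometable is just the placeholder from the answer.

```python
import sqlite3

# The two duplicate rows from the question: (id, name, metadata).
rows = [
    (1, "Michael Fox", "2003-03-03,34,M,4545"),
    (2, "Michael Fox", "1989-02-23,M,5633"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER, name TEXT, metadata TEXT)")
conn.executemany("INSERT INTO sometable VALUES (?, ?, ?)", rows)

# Each MAX() picks its own maximum, independently of the other columns.
grouped = conn.execute(
    "SELECT name, MAX(id) AS id, MAX(metadata) AS metadata "
    "FROM sometable GROUP BY name"
).fetchall()
print(grouped)  # [('Michael Fox', 2, '2003-03-03,34,M,4545')]

# id 2 is paired with row 1's metadata: this combination never existed.
name, max_id, max_meta = grouped[0]
print((max_id, name, max_meta) in rows)  # False
```

If you need each returned row to be a real row from the table, DISTINCT ON (or a window function) is the safer choice.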
+2
Jun 04 '13 at 9:17