Selecting columns with DISTINCT in PostgreSQL

Question

I query bus stops from the database and want it to return only one stop per bus line / direction. This query does just that:

Stop.select("DISTINCT line_id, direction")

Except that it won't give me any attributes other than those two. I tried a couple of other queries to return the id in addition to the line_id and direction fields (ideally, all columns), with no luck:

 Stop.select("DISTINCT line_id, direction, id")

and

 Stop.select("DISTINCT(line_id || '-' || direction), id")

In both cases, the query loses its effect as a DISTINCT clause and all rows are returned.

Someone helpfully suggested using a subquery to return all the ids:

 Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id FROM stops a2 WHERE a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) AS id FROM stops a1")

Then I can extract all the ids and run a second query to get the full attributes for each stop.

Is there a way to do all of this in a single query and have it return all attributes?

+6

ruby-on-rails activerecord postgresql

samvermette Feb 15 '11 at 20:46

2 answers

Not so fast: the other answer chooses the stop_id arbitrarily.

That is why your question does not quite make sense. We can pull out a stop_id for each distinct line_id and direction, but we have no idea why we got that particular stop_id.

  create temp table test(line_id integer, direction char(1), stop_id integer);
  insert into test values
    (1, 'N', 1), (1, 'N', 2),
    (1, 'S', 1), (1, 'S', 2),
    (2, 'N', 1), (2, 'N', 2),
    (2, 'S', 1), (2, 'S', 2);
  select distinct on (line_id, direction) * from test;

  -- do this again, but reverse the order of the stop_ids
  -- could it possibly change our Robust Query?!!!
  drop table test;
  create temp table test(line_id integer, direction char(1), stop_id integer);
  insert into test values
    (1, 'N', 2), (1, 'N', 1),
    (1, 'S', 2), (1, 'S', 1),
    (2, 'N', 2), (2, 'N', 1),
    (2, 'S', 2), (2, 'S', 1);
  select distinct on (line_id, direction) * from test;

First select:

  line_id | direction | stop_id
 ---------+-----------+---------
        1 | N         |       1
        1 | S         |       1
        2 | N         |       1
        2 | S         |       1

Second select:

  line_id | direction | stop_id
 ---------+-----------+---------
        1 | N         |       2
        1 | S         |       2
        2 | N         |       2
        2 | S         |       2

So we got away without grouping by stop_id, but we have no guarantee about why we got the one we did. All we know is that it is a valid stop_id. Any updates, inserts, or other operations can change the physical order of the rows, and no RDBMS guarantees that order.

This is what I meant in the top comment. There is no known reason to pick one stop_id over another, but for some reason you desperately need this stop_id (or something else).
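If any stop will do but you want the same one every time, the usual way to pin the choice down in PostgreSQL is an ORDER BY that starts with the DISTINCT ON columns and then names a tie-breaker, for example ORDER BY line_id, direction, stop_id. The plain-Ruby sketch below is only an illustration of that rule, run against the rows from the test table above: it does not touch the database, and the helper first_stop_per_line is a made-up name, not an ActiveRecord method.

```ruby
# Rows in the shape of the test table above: [line_id, direction, stop_id].
ROWS = [
  [1, "N", 1], [1, "N", 2], [1, "S", 1], [1, "S", 2],
  [2, "N", 1], [2, "N", 2], [2, "S", 1], [2, "S", 2]
].freeze

# Hypothetical helper (an assumption for illustration, not part of any library).
# Keeps one row per (line_id, direction) and always picks the smallest stop_id,
# mimicking DISTINCT ON (line_id, direction) combined with
# ORDER BY line_id, direction, stop_id.
def first_stop_per_line(rows)
  rows
    .group_by { |line_id, direction, _stop_id| [line_id, direction] }
    .map { |_key, group| group.min_by { |_line, _dir, stop_id| stop_id } }
    .sort
end

# The input order no longer matters: both calls print the same four rows.
p first_stop_per_line(ROWS)
p first_stop_per_line(ROWS.reverse)
```

With an explicit tie-breaker, the row kept for each group depends on the data rather than on the physical order of the rows, which is the guarantee missing from the two selects above.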

+3

nate c Feb 16 '11 at 4:30

Pier-olivier thibault · Accepted Answer · 2011-02-15T21:09:01+0000

 Stop.select("DISTINCT ON (line_id, direction) *")
