Rails expands fields with scope PG dislikes

Question

I have a Widget model. Widgets belong to a Store, which belongs to an Area, which belongs to a Company. Given a company, I need to find all of its widgets. Easy:

    class Widget < ActiveRecord::Base
      def self.in_company(company)
        includes(:store => {:area => :company}).where(:companies => {:id => company.id})
      end
    end

This generates a nice query:

    > Widget.in_company(Company.first).count
    SQL (50.5ms) SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets"
      LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id"
      LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id"
      LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id"
      WHERE "companies"."id" = 1
    => 15088

But later I need to use this scope inside a more complex scope. The problem is that AR expands the query to select every individual column, which fails in PG because each selected column must appear in the GROUP BY clause or inside an aggregate function.

Here is the more complex scope.

    def self.sum_amount_chart_series(company, start_time)
      orders_by_day = Widget.in_company(company).archived.not_void.
        where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
        group(pg_print_date_group).
        select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
    end

    def self.pg_print_date_group
      "CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
    end

And this is the SELECT it throws at PG:

    > Widget.sum_amount_chart_series(Company.first, 1.day.ago)
    SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...>
      FROM "widgets"
      LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id"
      LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id"
      LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id"
      WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f'
        AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999')
      GROUP BY CAST((print_datetime + interval '-7 hours') AS date)

Which generates this error:

    PGError: ERROR: column "widgets.id" must appear in the GROUP BY clause or be used in an aggregate function
    LINE 1: SELECT "widgets"."id" AS t0_r0, "widgets"."user_id ...

How can I rewrite the Widget.in_company scope so that AR does not expand the SELECT to include every field of the Widget model?

ruby-on-rails activerecord postgresql

Karl Apr 25 '11 at 18:27

5 answers

Denis de Bernardy · Answer 1 · 2011-05-24T05:45:21+0000

As Frank explained, PostgreSQL will reject any query that does not return a reproducible set of rows.

Suppose you have a query like:

 select a, b, agg(c) from tbl group by a

PostgreSQL will reject it because b is left unspecified in the group by clause. Run this in MySQL, by contrast, and it will be accepted. In the latter case, however, run a few inserts, updates, and deletes, and the order of the rows on the disk pages ends up different.

If memory serves, the implementation details are such that MySQL actually sorts by a, b and returns the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified, and sure enough, PostgreSQL does not always sort before running aggregate functions.

This could potentially lead to different b values in the result set in PostgreSQL. So PostgreSQL gives an error if you are not more specific:

 select a, b, agg(c) from tbl group by a, b
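
To make the ambiguity concrete, here is a minimal plain-Ruby sketch. It is not SQL and says nothing about how either database is implemented; the columns a, b, c mirror the queries above, and the sample rows and method names are invented for illustration. Grouping by a alone leaves more than one candidate value of b in a group, while grouping by a and b leaves no choice to make.

```ruby
# GROUP BY a: collect the distinct b values found inside each group.
# More than one b in a group means "select a, b, agg(c)" has no single answer.
def b_candidates_by_a(rows)
  rows.group_by { |r| r[:a] }
      .transform_values { |group| group.map { |r| r[:b] }.uniq }
end

# GROUP BY a, b: each group holds exactly one (a, b) pair, so the result
# does not depend on the order in which the rows happen to be stored.
def sum_c_by_a_and_b(rows)
  rows.group_by { |r| [r[:a], r[:b]] }
      .transform_values { |group| group.sum { |r| r[:c] } }
end

# Hypothetical sample rows for tbl(a, b, c).
rows = [
  { a: 1, b: "x", c: 10 },
  { a: 1, b: "y", c: 5 },
  { a: 2, b: "z", c: 7 }
]

p b_candidates_by_a(rows)  # group a = 1 offers both "x" and "y"
p sum_c_by_a_and_b(rows) == sum_c_by_a_and_b(rows.reverse)
```

The second call compares the grouped sums for two different row orders, which is the reproducibility property PostgreSQL insists on.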

What Frank highlights is that in PostgreSQL 9.1, if a is the primary key, you can leave b unspecified: the planner has been taught to ignore the subsequent group by fields when the applicable primary keys imply a unique row.

For your problem in particular, you need to specify your group by as you currently do, plus every field that you are selecting alongside the aggregate, i.e. "widgets"."id", "widgets"."user_id", [snip], but not things like sum(amount), which are the aggregate function calls.

As a side note, I'm not sure how your ORM / model works, but the SQL it generates is not optimal. Many of those left outer joins look like they should be inner joins. That would let the planner pick an appropriate join order, where applicable.
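
The point about the joins can be sketched in plain Ruby. This is an illustration only, reduced to a single widgets-to-stores join with invented data and hypothetical method names; it is not what ActiveRecord or PostgreSQL do internally. Because the query filters on a column of the joined table, rows that the outer join padded with NULL are thrown away by the WHERE clause anyway, so the outer and inner join return the same rows.

```ruby
# LEFT OUTER JOIN: keep every widget; :store is nil when nothing matches.
def left_outer_join(widgets, stores)
  stores_by_id = stores.map { |s| [s[:id], s] }.to_h
  widgets.map { |w| { widget: w, store: stores_by_id[w[:store_id]] } }
end

# INNER JOIN: keep only the widgets that have a matching store.
def inner_join(widgets, stores)
  left_outer_join(widgets, stores).reject { |row| row[:store].nil? }
end

# WHERE stores.id = ?: a NULL store never satisfies the comparison,
# so unmatched rows are dropped whichever join produced them.
def where_store_id(rows, store_id)
  rows.select { |row| row[:store] && row[:store][:id] == store_id }
end

# Hypothetical data: widget 2 has no store and survives only the outer join.
widgets = [
  { id: 1, store_id: 10 },
  { id: 2, store_id: nil },
  { id: 3, store_id: 10 }
]
stores = [{ id: 10 }]

outer = where_store_id(left_outer_join(widgets, stores), 10)
inner = where_store_id(inner_join(widgets, stores), 10)
puts outer == inner
```

The same reasoning applies to the three-table chain in the question, where the filter sits on "companies"."id" at the far end of the joins.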

Frank Heikens · Answer 2 · 2011-05-18T07:46:09+0000

PostgreSQL version 9.1 (in beta at the moment) might fix your problem, but only if there is a functional dependency on the primary key.

From the release notes:

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)
Some other database systems already allowed this behavior, and because of the primary key, the result is unambiguous.

You could run a test and see whether it fixes your problem. If you can wait for the production release, this may fix the problem without changing your code.

thomasfedb · Answer 3 · 2011-05-23T12:12:18+0000

First, simplify your life by saving all dates in a standard time zone. Changing dates with time zones should really be done in the view as a convenience to the user. That alone should save you a lot of pain.

If you are already in production, write a migration to create a normalised_date column wherever it would be helpful.

I suggest that the other issue here is the use of raw SQL, which Rails cannot adjust for you. To avoid this, try a gem called Squeel (aka MetaWhere 2) http://metautonomo.us/projects/squeel/

If you use this, you can remove the hard-coded SQL and let Rails get back to doing its magic.

For example:

 .select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")

becomes (after removing the need to normalize the date):

 .select{sum(amount).as(total_amount)}

Karl · Answer 4 · 2011-05-27T18:47:33+0000

Sorry to answer my own question, but I figured it out.

First, let me apologize to those who thought I might have a SQL or Postgres problem. It is not one. The problem is ActiveRecord and the SQL that it generates.

Answer: use .joins instead of .includes. So I just changed the line in the top code and it works as expected.

    class Widget < ActiveRecord::Base
      def self.in_company(company)
        joins(:store => {:area => :company}).where(:companies => {:id => company.id})
      end
    end

I assume that when using .includes ActiveRecord tries to be smart and use JOINS in SQL, but it is not smart enough for this particular case and generates this ugly SQL to select all related columns.

Still, all the answers taught me a few things about Postgres that I did not know, so thank you very much.

ilgam · Answer 5 · 2014-11-05T16:44:44+0000

Sort in MySQL:

    > ids = [11,31,29]
    => [11, 31, 29]
    > Page.where(id: ids).order("field(id, #{ids.join(',')})")

In Postgres:

    def self.order_by_ids(ids)
      order_by = ["case"]
      # One WHEN branch per id; its position in the list becomes the sort key.
      ids.each_with_index do |id, index|
        order_by << "WHEN id='#{id}' THEN #{index}"
      end
      order_by << "end"
      order(order_by.join(" "))
    end

    User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
    #=> [3,2,1]
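
As a variation on the Postgres snippet above, the CASE expression can be built by a small standalone helper that coerces each id with Integer(), so no arbitrary text is interpolated into the SQL. This is a sketch under assumptions: the helper name order_by_ids_sql and its column parameter are hypothetical and not part of Rails, and it assumes integer primary keys.

```ruby
# Builds a CASE expression that sorts rows in the order of the given ids.
# Integer() raises ArgumentError for strings that are not integers and
# TypeError for nil, so only numeric literals reach the SQL string.
# The column name is interpolated as-is and must come from trusted code.
def order_by_ids_sql(ids, column = "id")
  raise ArgumentError, "ids must not be empty" if ids.empty?

  clauses = ids.each_with_index.map do |id, index|
    "WHEN #{column} = #{Integer(id)} THEN #{index}"
  end
  "CASE #{clauses.join(' ')} END"
end

puts order_by_ids_sql([3, 2, 1])
```

The resulting string would be passed to order in the same way as in the snippet above; on recent Rails versions a raw SQL string like this probably needs to be wrapped in Arel.sql first.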
