Rails / SQL: order or group query results so that an object repeats only after the others have appeared

In my application, let's say animals have many photos. I am querying photos because I want to show all photos of all animals. However, I want every animal to appear once, as one photo, before any animal repeats.

Example: animal instance 1, 'cat', has four photos, and animal instance 2, 'dog', has two photos. The photos should appear ordered like so:

 #photo        belongs to #animal
 tiddles.jpg   cat
 fido.jpg      dog
 meow.jpg      cat
 rover.jpg     dog
 puss.jpg      cat
 felix.jpg     cat (no more dogs, so two consecutive cats)

  • Pagination is required, so I cannot do the ordering on an in-memory array.
  • The file name structure / convention does not help, although animal_id exists on every photo.
  • Although there are two types of animals in this example, it is an ActiveRecord model with hundreds of records.
  • The animals may be queried selectively.

If this is not possible with ActiveRecord, I will happily use SQL; I am using PostgreSQL.

My brain is fried, so if anyone can come up with a better title, please go ahead and edit it or suggest one in the comments.

+4
6 answers

Here is a PostgreSQL-specific solution:

 batch_id_sql = "RANK() OVER (PARTITION BY animal_id ORDER BY id ASC)"
 Photo.paginate(
   :select => "DISTINCT photos.*, (#{batch_id_sql}) batch_id",
   :order  => "batch_id ASC, photos.animal_id ASC",
   :page   => 1)

Here is a database-agnostic solution:

 batch_id_sql = "
   SELECT COUNT(bm.*)
   FROM   photos bm
   WHERE  bm.animal_id = photos.animal_id AND
          bm.id <= photos.id
 "
 Photo.paginate(
   :select => "photos.*, (#{batch_id_sql}) batch_id",
   :order  => "batch_id ASC, photos.animal_id ASC",
   :page   => 1)

Both queries work even if you have a where clause. Benchmark them against your expected dataset to check whether they meet your throughput and latency requirements.
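
To make the batch_id idea concrete, here is a minimal plain-Ruby sketch (no database) that computes the same ordering for the photos in the question. The `PhotoRow` struct, the `order_by_batch` helper and the sample ids are assumptions made for illustration only; in the real application the work is done in SQL by the queries above.

```ruby
# Hypothetical in-memory stand-in for rows of the photos table.
PhotoRow = Struct.new(:id, :animal_id, :file_name)

# Returns the photos ordered by (batch_id, animal_id), where batch_id is the
# position of a photo among its own animal's photos when ordered by id.
# This mirrors RANK() OVER (PARTITION BY animal_id ORDER BY id ASC).
def order_by_batch(photos)
  batch_ids = {}
  photos.group_by(&:animal_id).each_value do |group|
    group.sort_by(&:id).each_with_index do |photo, index|
      batch_ids[photo.id] = index + 1
    end
  end
  # ORDER BY batch_id ASC, animal_id ASC
  photos.sort_by { |photo| [batch_ids[photo.id], photo.animal_id] }
end

# Sample data taken from the question; animal 1 is the cat, animal 2 the dog.
SAMPLE_PHOTOS = [
  PhotoRow.new(1, 1, "tiddles.jpg"),
  PhotoRow.new(2, 2, "fido.jpg"),
  PhotoRow.new(3, 1, "meow.jpg"),
  PhotoRow.new(4, 2, "rover.jpg"),
  PhotoRow.new(5, 1, "puss.jpg"),
  PhotoRow.new(6, 1, "felix.jpg")
].freeze

puts order_by_batch(SAMPLE_PHOTOS).map(&:file_name)
```

The sketch sorts the whole set in memory, which is exactly what the SQL versions avoid on the Ruby side; it is only meant to show why sorting by (batch_id, animal_id) yields the interleaved order.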

Reference

PostgreSQL window functions

+3

New solution

Add an integer column named batch_id to the photos table.

 class AddBatchIdToPhotos < ActiveRecord::Migration
   def self.up
     add_column :photos, :batch_id, :integer
     set_batch_id
     # :null => false (not :nil) is the option that forbids NULL values
     change_column :photos, :batch_id, :integer, :null => false
     add_index :photos, :batch_id
   end
   def self.down
     remove_column :photos, :batch_id
   end
   def self.set_batch_id
     # set the batch id to existing rows
     # implement this
   end
 end

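Now add callbacks to the Photo model to keep the batch identifier up to date.

 class Photo < ActiveRecord::Base
   belongs_to :animal
   before_create :batch_photo_add
   # before_update rather than after_update, so that the new batch_id
   # assigned below is written together with the row
   before_update :batch_photo_update
   after_destroy :batch_photo_remove

   private

   # Give the photo the next free batch_id of its animal.
   def batch_photo_add
     self.batch_id = next_batch_id_for_animal(animal_id)
     true
   end

   # A photo moved to another animal: close the gap under the old animal
   # and append the photo to the new animal's sequence.
   def batch_photo_update
     return true unless animal_id_changed?
     batch_photo_remove(batch_id, animal_id_was)
     batch_photo_add
   end

   # Shift down the batch_ids that come after the removed photo.
   def batch_photo_remove(b_id=batch_id, a_id=animal_id)
     Photo.update_all("batch_id = batch_id - 1",
       ["animal_id = ? AND batch_id > ?", a_id, b_id])
     true
   end

   def next_batch_id_for_animal(a_id)
     (Photo.maximum(:batch_id, :conditions => {:animal_id => a_id}) || 0) + 1
   end
 end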

Now you can get the desired result with a simple paginate call:

 @animal_photos = Photo.paginate(:page => 1, :per_page => 10, :order => :batch_id) 

How does it work?

Consider the data below:

 id  Photo Description  Batch Id
 1   Cat_photo_1        1
 2   Cat_photo_2        2
 3   Dog_photo_1        1
 4   Cat_photo_3        3
 5   Dog_photo_2        2
 6   Lion_photo_1       1
 7   Cat_photo_4        4

Now, if we execute the query ordered by batch_id, we get this:

 # batch 1 (cat, dog, lion)
 Cat_photo_1
 Dog_photo_1
 Lion_photo_1
 # batch 2 (cat, dog)
 Cat_photo_2
 Dog_photo_2
 # batch 3,4 (cat)
 Cat_photo_3
 Cat_photo_4

The batch assignment is not random; each animal's photos are numbered from the top (1, 2, 3, ...). The number of photos displayed per page is determined by the per_page parameter passed to the paginate method (and not by the batch size).
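
The bookkeeping done by the callbacks can be tried without a database. The following is a minimal sketch under stated assumptions: `BatchBook` is a hypothetical in-memory class, rows are plain hashes, and the animal_id tie-breaker in `ordered_names` is my addition (the paginate call above orders by batch_id only).

```ruby
# Hypothetical in-memory model of the batch_id bookkeeping.
class BatchBook
  def initialize
    @rows = [] # each row: { animal_id:, name:, batch_id: }
  end

  # Mirrors batch_photo_add: next batch_id = current maximum for the animal + 1.
  def add(animal_id, name)
    max = @rows.select { |r| r[:animal_id] == animal_id }
               .map { |r| r[:batch_id] }.max || 0
    row = { animal_id: animal_id, name: name, batch_id: max + 1 }
    @rows << row
    row
  end

  # Mirrors batch_photo_remove: close the gap left by the removed row.
  def remove(name)
    index = @rows.index { |r| r[:name] == name }
    return nil unless index
    removed = @rows.delete_at(index)
    @rows.each do |r|
      next unless r[:animal_id] == removed[:animal_id]
      r[:batch_id] -= 1 if r[:batch_id] > removed[:batch_id]
    end
    removed
  end

  # Display order: batch_id first, animal_id as an assumed tie-breaker.
  def ordered_names
    @rows.sort_by { |r| [r[:batch_id], r[:animal_id]] }.map { |r| r[:name] }
  end
end

book = BatchBook.new
# Same insertion order as the table above (1 = cat, 2 = dog, 3 = lion).
[[1, "Cat_photo_1"], [1, "Cat_photo_2"], [2, "Dog_photo_1"], [1, "Cat_photo_3"],
 [2, "Dog_photo_2"], [3, "Lion_photo_1"], [1, "Cat_photo_4"]].each do |animal_id, name|
  book.add(animal_id, name)
end
puts book.ordered_names
```

Note that `remove` touches every later row of the same animal, which is the write cost this approach accepts in exchange for cheap reads.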

Old solution

Have you tried this?

If you use the will_paginate gem:

 # assuming you want to order by animal name
 animal_photos = Photo.paginate(:include => :animal, :page => 1,
                                :order => "animals.name")
 animal_photos.each do |animal_photo|
   puts animal_photo.file_name
   puts animal_photo.animal.name
 end
+1

I have no experience with ActiveRecord. Using plain PostgreSQL, I would try something like this:

Define a window function over all previous rows that counts how many times the current animal has appeared so far, and then order by this count.

 SELECT filename, animal_id,
        COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
 FROM photos
 ORDER BY cnt, animal_id, filename

Filtering on specific animal_id values will still work. The result always comes back in the same order. I don't know whether you want some randomness in there, but it should be easy to add.

+1

I would recommend a hybrid / corrected approach based on KandadaBoggu's input.

First, the proper way to do this on paper is row_number() over (partition by animal_id order by id). The suggested rank() will generate a global row number, but you want the number within its partition.

Using a window function is also the most flexible solution (in fact, the only solution) if you expect the sort order to change from time to time.

Note that this will not necessarily scale well, because to sort the results you need to:

  • select the entire result set that matches your criteria
  • sort the whole result set to build the partitions and compute rank_id
  • run a top-n sort / limit over the result set a second time to get the rows in their final order

The right way to do this in practice, if your sort order is immutable, is to maintain a pre-calculated rank_id. KandadaBoggu's other suggestion points in the right direction in that sense.

When it comes to deletes (and possibly updates, if you don't want them sorted by id), you may run into problems because you end up trading faster reads for slower writes. If deleting the cat at index 1 means updating the next 50k cats, you will have problems.

If you have very small sets, the overhead can be very reasonable (don't forget to index animal_id).

If not, there is a workaround, provided the order in which a given animal's rows appear does not matter. It goes as follows:

  • Begin a transaction.

  • If the rank_id is going to change (i.e. on insert or delete), obtain an advisory lock to ensure that two sessions cannot affect the rank_id of the same class of animal, for example:

     SELECT pg_try_advisory_lock('the_table'::regclass, the_animal_id); 

    (Sleep for 0.05s and retry if you did not get it.)

  • On insert, find max(rank_id) for this animal_id, assign the new row that value + 1, then insert it.

    On delete, select the row with the same animal_id and the largest rank_id. Delete your row and assign its old rank_id to the selected row (unless, of course, you are deleting the last one).

  • Release the advisory lock.

  • Commit the work.
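
The insert and delete steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `RankedRows` is a hypothetical in-memory class, an array of hashes stands in for the table, and a single Mutex stands in for the per-animal advisory lock and the surrounding transaction.

```ruby
# Hypothetical in-memory sketch of the "move the last row into the hole" idea.
class RankedRows
  def initialize
    @lock = Mutex.new # stand-in for the advisory lock (per animal_id in the real design)
    @rows = []        # each row: { name:, animal_id:, rank_id: }
  end

  # Insert: new rank_id = max(rank_id) for this animal_id + 1.
  def insert(animal_id, name)
    @lock.synchronize do
      max = rows_for(animal_id).map { |r| r[:rank_id] }.max || 0
      row = { name: name, animal_id: animal_id, rank_id: max + 1 }
      @rows << row
      row
    end
  end

  # Delete: the row with the largest rank_id takes over the freed rank_id,
  # so at most one other row is updated, however many rows the animal has.
  def delete(name)
    @lock.synchronize do
      index = @rows.index { |r| r[:name] == name }
      return nil unless index
      removed = @rows.delete_at(index)
      last = rows_for(removed[:animal_id]).max_by { |r| r[:rank_id] }
      last[:rank_id] = removed[:rank_id] if last && last[:rank_id] > removed[:rank_id]
      removed
    end
  end

  # [rank_id, name] pairs of one animal, ordered by rank_id.
  def ranks(animal_id)
    rows_for(animal_id).sort_by { |r| r[:rank_id] }.map { |r| [r[:rank_id], r[:name]] }
  end

  private

  def rows_for(animal_id)
    @rows.select { |r| r[:animal_id] == animal_id }
  end
end

table = RankedRows.new
%w[a b c d].each { |name| table.insert(1, name) }
table.delete("b")
p table.ranks(1)
```

The price of the cheap delete is that rows of one animal no longer keep their insertion order, which is why this only applies when that order does not matter.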

Note that the above benefits from an index on (animal_id, rank_id) and can be done using plpgsql triggers:

 create trigger "__animals_rank_id__ins"
   before insert on animals
   for each row execute procedure lock_animal_id_and_assign_rank_id();
 create trigger "_00_animals_rank_id__ins"
   after insert on animals
   for each row execute procedure unlock_animal_id();
 create trigger "__animals_rank_id__del"
   before delete on animals
   for each row execute procedure lock_animal_id();
 create trigger "_00_animals_rank_id__del"
   after delete on animals
   for each row execute procedure reassign_rank_id_and_unlock_animal_id();

You can then create a multi-column index on your sort criteria, e.g. (rank_id, name), as long as you are not joining all over the place. You end up with a site that is fast for both reads and writes.

+1

You should be able to get images (or file names, anyway) using ActiveRecord sorted by name.

Then you can use Enumerable#group_by and Enumerable#zip to combine all the arrays.
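
A minimal sketch of that idea, assuming the photos are already loaded into memory. `PhotoStub` and `interleave_by_animal` are hypothetical names used only for illustration. Because zip stops at the receiver's length, the first group is padded with nil up to the longest group before zipping.

```ruby
# Hypothetical stand-in for Photo records already fetched from the database.
PhotoStub = Struct.new(:file_name, :animal)

# Interleaves the photos so that every animal appears once per round.
def interleave_by_animal(photos)
  return [] if photos.empty?
  groups  = photos.group_by(&:animal).values
  longest = groups.map(&:size).max
  # Pad the first group with nil so that zip covers the longest group.
  head = Array.new(longest) { |i| groups.first[i] }
  # zip builds one array per round; missing entries are nil and dropped below.
  head.zip(*groups.drop(1)).flatten(1).compact
end

photos = [
  PhotoStub.new("tiddles.jpg", "cat"),
  PhotoStub.new("fido.jpg", "dog"),
  PhotoStub.new("meow.jpg", "cat"),
  PhotoStub.new("rover.jpg", "dog"),
  PhotoStub.new("puss.jpg", "cat"),
  PhotoStub.new("felix.jpg", "cat")
]
puts interleave_by_animal(photos).map(&:file_name)
```

This works on a fully loaded array, so it only fits when the result set is small enough to paginate in memory.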

If you give me more information on how your file names are really organized (i.e. are they all consistent, with an underscore before the number and a constant name before the underscore for each "type", etc.), then I can give you an example. I will write one in a moment, showing how you would do this for your current example.

0

You can run two sorts and build one array as follows:

result1 = the first photo of each type of animal. Use Ruby's find method for this lookup.

result2 = all animals sorted by group. Use find to locate the first occurrence of each animal again, and then use drop to remove these first occurrences from result2.

Then: markCustomResult = result1 + result2

Then: you can use will_paginate on markCustomResult

0