Best way to write a query with a lot of possible joins

In the project I'm working on, we have an activity table, and each activity can be associated with one of about 20 different "activity details" tables.

E.g. if the activity was of type "work", it would have a corresponding record in activity_details_work; if it was of type "sick leave", it would have a corresponding record in activity_details_sickleave, etc.

We currently load the activities, and then for each type of activity we run a separate query to get the details from the corresponding table. This clearly does not scale once you have thousands of activities.

So my initial thought was to have one query that retrieves the activities and joins in the details in one go, for example:

    SELECT *
    FROM activity
    LEFT JOIN activity_details_1_work ON ...
    LEFT JOIN activity_details_2_sickleave ON ...
    LEFT JOIN activity_details_3_travelwork ON ...
    ...etc...
    LEFT JOIN activity_details_20_yearleave ON ...

But this means each row will have 100 fields, most of them empty, which seems ugly.

Lazy-loading the details is not really an option, since the details are almost always needed in the main logic, at least for the main types.

Is there a super-smart way to do this that I'm not thinking of?

Thank you in advance

+4
3 answers

My suggestion is to define a view for each ActivityType, designed specifically for that activity type.

Then add an index on the ActivityType field of the activity table. Make that index the clustered one, unless something else needs to be clustered (or performance benchmarking shows that some other clustering choice would be more effective).

Is there a specific reason why this degree of denormalization was designed in? Is that reason well understood?
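To make the view-per-type idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names (activity_type, started_on, project, doctor_note) and the view and index names are invented for illustration and are not from the question. SQLite has no CLUSTERED keyword, so the clustering part of the suggestion is engine-specific and is not shown; only a plain index on the type column is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only two of the ~20 detail tables are shown.
    CREATE TABLE activity (
        activity_id   INTEGER PRIMARY KEY,
        activity_type TEXT NOT NULL,
        started_on    TEXT NOT NULL
    );
    CREATE TABLE activity_details_work (
        activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        project     TEXT
    );
    CREATE TABLE activity_details_sickleave (
        activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        doctor_note INTEGER
    );

    -- Plain index on the type column (clustering is not available here).
    CREATE INDEX idx_activity_type ON activity(activity_type);

    -- One view per activity type, exposing only the columns that type has.
    CREATE VIEW v_activity_work AS
        SELECT a.activity_id, a.started_on, d.project
        FROM activity a
        JOIN activity_details_work d ON d.activity_id = a.activity_id
        WHERE a.activity_type = 'work';

    CREATE VIEW v_activity_sickleave AS
        SELECT a.activity_id, a.started_on, d.doctor_note
        FROM activity a
        JOIN activity_details_sickleave d ON d.activity_id = a.activity_id
        WHERE a.activity_type = 'sickleave';

    INSERT INTO activity VALUES (1, 'work', '2024-01-02');
    INSERT INTO activity VALUES (2, 'sickleave', '2024-01-03');
    INSERT INTO activity_details_work VALUES (1, 'intranet');
    INSERT INTO activity_details_sickleave VALUES (2, 1);
""")

# Each view returns a narrow row shape with no empty columns from other types.
work_rows = conn.execute("SELECT * FROM v_activity_work").fetchall()
sick_rows = conn.execute("SELECT * FROM v_activity_sickleave").fetchall()
print(work_rows)
print(sick_rows)
```

The calling code still issues one query per type, but each result set contains only the columns that type actually uses.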

+2

Most likely your activity detail tables all look like (date_from, date_to, with_who, descr) or something similar. As Peter suggested, consider adding a type field (varchar or enum) so that you can work with a single details table.

If there are rational reasons to keep the tables separate, consider adding triggers that maintain boolean / tinyint fields ( has_work , has_sickleave , etc.), or a bit string ( has_activites_of_type , where the first position corresponds to has_work , the next to has_sickleave , etc.).

In any case, you will probably be better off retrieving the activity details in one or more separate queries, if only to avoid field name collisions.
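As a rough illustration of the single-details-table idea, the sketch below uses Python's sqlite3 module. It assumes the detail tables really do share the columns guessed above; the employee column and the sample data are made up. SQLite has no enum type, so a CHECK constraint stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        activity_id INTEGER PRIMARY KEY,
        employee    TEXT NOT NULL          -- hypothetical column
    );

    -- One details table for all types; activity_type plays the enum role.
    CREATE TABLE activity_details (
        activity_id   INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        activity_type TEXT NOT NULL
            CHECK (activity_type IN ('work', 'sickleave', 'travelwork')),
        date_from     TEXT,
        date_to       TEXT,
        with_who      TEXT,
        descr         TEXT
    );

    INSERT INTO activity VALUES (1, 'ann');
    INSERT INTO activity VALUES (2, 'bob');
    INSERT INTO activity_details VALUES
        (1, 'work', '2024-01-02', '2024-01-02', NULL, 'intranet');
    INSERT INTO activity_details VALUES
        (2, 'sickleave', '2024-01-03', '2024-01-05', NULL, 'flu');
""")

# A single join now covers every activity type.
rows = conn.execute("""
    SELECT a.activity_id, a.employee, d.activity_type, d.date_from, d.date_to
    FROM activity a
    JOIN activity_details d ON d.activity_id = a.activity_id
    ORDER BY a.activity_id
""").fetchall()
print(rows)
```

This only works well if the per-type columns genuinely overlap; if each type has its own distinct fields, the single table ends up with the same mostly-empty columns the question wanted to avoid.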
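A minimal sketch of the trigger approach, again with Python's sqlite3 module. Only the has_work flag is wired up; the other flags would follow the same pattern. The trigger names and the project column are assumptions, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        activity_id   INTEGER PRIMARY KEY,
        has_work      INTEGER NOT NULL DEFAULT 0,
        has_sickleave INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE activity_details_work (
        activity_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        project     TEXT
    );

    -- Set the flag when a details row appears.
    CREATE TRIGGER trg_work_insert AFTER INSERT ON activity_details_work
    BEGIN
        UPDATE activity SET has_work = 1 WHERE activity_id = NEW.activity_id;
    END;

    -- Clear the flag when the details row is removed.
    CREATE TRIGGER trg_work_delete AFTER DELETE ON activity_details_work
    BEGIN
        UPDATE activity SET has_work = 0 WHERE activity_id = OLD.activity_id;
    END;
""")

FLAG_QUERY = "SELECT has_work FROM activity WHERE activity_id = 1"

conn.execute("INSERT INTO activity (activity_id) VALUES (1)")
conn.execute("INSERT INTO activity_details_work VALUES (1, 'intranet')")
flag_after_insert = conn.execute(FLAG_QUERY).fetchone()[0]

conn.execute("DELETE FROM activity_details_work WHERE activity_id = 1")
flag_after_delete = conn.execute(FLAG_QUERY).fetchone()[0]

print(flag_after_insert, flag_after_delete)
```

With the flags in place, the application can read the activity rows first and only query the detail tables whose flag is set.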

+2

I don't think an enum is the way to go because, as you say, there may be 1000 types of activities, and then altering the activity table becomes a problem.

It makes no sense to do a left join on that many tables.

So, you have the following options:

  • See this. First comment may be helpful.

  • I assume that your activity table has a field called activity_type_id. Create a table called activity_types containing the fields activity_type_id, activity_name, activity_details_table_name. The first query is as follows:

    SELECT *
    FROM activity
    INNER JOIN activity_types
    USING (activity_type_id)

This query gives you, for each activity, the name of the table that holds its details. This way, you can add any new type of activity by simply adding a row to the activity_types table.
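Here is one way the lookup-table approach could be sketched with Python's sqlite3 module. The activity_types columns follow the answer; the detail table columns and sample rows are invented. Table names cannot be passed as bound parameters, so this sketch checks each name against the schema before building the query; that check is my addition, not part of the answer.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE activity_types (
        activity_type_id            INTEGER PRIMARY KEY,
        activity_name               TEXT NOT NULL,
        activity_details_table_name TEXT NOT NULL
    );
    CREATE TABLE activity (
        activity_id      INTEGER PRIMARY KEY,
        activity_type_id INTEGER NOT NULL
            REFERENCES activity_types(activity_type_id)
    );
    -- Hypothetical detail tables.
    CREATE TABLE activity_details_work (
        activity_id INTEGER PRIMARY KEY, project TEXT);
    CREATE TABLE activity_details_sickleave (
        activity_id INTEGER PRIMARY KEY, doctor_note INTEGER);

    INSERT INTO activity_types VALUES (1, 'work', 'activity_details_work');
    INSERT INTO activity_types VALUES (2, 'sick leave', 'activity_details_sickleave');
    INSERT INTO activity VALUES (1, 1);
    INSERT INTO activity VALUES (2, 2);
    INSERT INTO activity_details_work VALUES (1, 'intranet');
    INSERT INTO activity_details_sickleave VALUES (2, 1);
""")

# First query: find which details table each activity belongs to.
rows = conn.execute("""
    SELECT activity_id, activity_details_table_name
    FROM activity
    INNER JOIN activity_types USING (activity_type_id)
""").fetchall()

ids_by_table = defaultdict(list)
for row in rows:
    ids_by_table[row["activity_details_table_name"]].append(row["activity_id"])

# Only tables that really exist in the schema may be interpolated into SQL.
known_tables = {
    r["name"]
    for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}

# Then one query per details table, instead of one per activity.
details_by_table = {}
for table, ids in ids_by_table.items():
    if table not in known_tables:
        raise ValueError(f"unknown details table: {table}")
    placeholders = ",".join("?" * len(ids))
    cursor = conn.execute(
        f'SELECT * FROM "{table}" WHERE activity_id IN ({placeholders})', ids
    )
    details_by_table[table] = [dict(r) for r in cursor]

print(details_by_table)
```

The number of queries grows with the number of activity types present in the result, not with the number of activities. Very large ID lists may need to be split into chunks, since engines limit how many bound parameters a statement can take.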

+1