If user_resources (t1) was a "normalized table" with one row for each combination of user => resource , then a query to get an answer would be simple as soon as joining tables together.
Alas, it is denormalized , having a resources column in the form: 'a list of a resource identifier', separated by a ';' character.
If we could convert the βresourcesβ column into rows, then many difficulties would disappear when the table join became simple.
A query to generate the result specified for:
SELECT user_resource.user, resource.data FROM user_resource JOIN integerseries AS isequence ON isequence.id <= COUNT_IN_SET(user_resource.resources, ';') JOIN resource ON resource.id = VALUE_IN_SET(user_resource.resources, ';', isequence.id) ORDER BY user_resource.user, resource.data
Output:
user data ---------- -------- sampleuser abcde sampleuser azerty sampleuser qwerty stacky qwerty testuser abcde testuser azerty
how
A "trick" is to have a table that contains numbers from 1 to some limit. I call it integerseries . It can be used to convert "horizontal" things, such as: ';' delimited strings ';' delimited strings in rows .
How it works, when you join integerseries , you do cross join , which happens "naturally" with "internal joins".
Each row is duplicated with a different "serial number" from the integerseries table, which we use as the "index" for the "resource" in the list that we want to use for this row .
The idea is as follows:
- count the number of items in the list.
- extract each item based on its position in the list.
- Use
integerseries to convert a single row into a set of rows that extract an individual resource identifier from user . resources as you move.
I decided to use two functions:
which sets the "list of delimited rows" and "index" will return the value at the position in the list. I call it: VALUE_IN_SET . that is, given "A; B; C" and the "index" of 2, then it returns "B".
Functionwhich sets the "list of delimited strings" will return a count of the number of items in the list. I call it: COUNT_IN_SET . that is, if "A; B; C" returns 3
It seems that these two functions and integerseries should provide a common solution for the delimited items list in a column .
It works?
A query to create a "normalized" table from ';' delimited string in column ';' delimited string in column . It shows all columns, including generated values ββdue to "cross_join" ( isequence.id as resources_index ):
SELECT user_resource.user, user_resource.resources, COUNT_IN_SET(user_resource.resources, ';') AS resources_count, isequence.id AS resources_index, VALUE_IN_SET(user_resource.resources, ';', isequence.id) AS resources_value FROM user_resource JOIN integerseries AS isequence ON isequence.id <= COUNT_IN_SET(user_resource.resources, ';') ORDER BY user_resource.user, isequence.id
"Normalized" table output:
user resources resources_count resources_index resources_value ---------- --------- --------------- --------------- ----------------- sampleuser 1;2;3 3 1 1 sampleuser 1;2;3 3 2 2 sampleuser 1;2;3 3 3 3 stacky 2 1 1 2 testuser 1;3 2 1 1 testuser 1;3 2 2 3
Using the normalized user_resources table user_resources , this is a simple join to provide the desired result:
Necessary functions (these are general functions that can be used anywhere)
Note: the names of these functions are associated with the mysql FIND_IN_SET function. those. Do they perform similar functions with respect to string lists?
Function COUNT_IN_SET : returns the counter character delimited items in the column.
DELIMITER $$ DROP FUNCTION IF EXISTS `COUNT_IN_SET`$$ CREATE FUNCTION `COUNT_IN_SET`(haystack VARCHAR(1024), delim CHAR(1) ) RETURNS INTEGER BEGIN RETURN CHAR_LENGTH(haystack) - CHAR_LENGTH( REPLACE(haystack, delim, '')) + 1; END$$ DELIMITER ;
Function VALUE_IN_SET : treats a delimited list as one based array and returns the value in the specified "index".
DELIMITER $$ DROP FUNCTION IF EXISTS `VALUE_IN_SET`$$ CREATE FUNCTION `VALUE_IN_SET`(haystack VARCHAR(1024), delim CHAR(1), which INTEGER ) RETURNS VARCHAR(255) CHARSET utf8 COLLATE utf8_unicode_ci BEGIN RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(haystack, delim, which), delim, -1); END$$ DELIMITER ;
Additional Information:
Tables (with data):
CREATE TABLE `integerseries` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; insert into `integerseries`(`id`) values (1); insert into `integerseries`(`id`) values (2); insert into `integerseries`(`id`) values (3); insert into `integerseries`(`id`) values (4); insert into `integerseries`(`id`) values (5); insert into `integerseries`(`id`) values (6); insert into `integerseries`(`id`) values (7); insert into `integerseries`(`id`) values (8); insert into `integerseries`(`id`) values (9); insert into `integerseries`(`id`) values (10);
Resource:
CREATE TABLE `resource` ( `id` int(11) NOT NULL, `data` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; insert into `resource`(`id`,`data`) values (1,'abcde'); insert into `resource`(`id`,`data`) values (2,'qwerty'); insert into `resource`(`id`,`data`) values (3,'azerty');
User_resource:
CREATE TABLE `user_resource` ( `user` varchar(50) COLLATE utf8_unicode_ci NOT NULL, `resources` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL, PRIMARY KEY (`user`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; insert into `user_resource`(`user`,`resources`) values ('sampleuser','1;2;3'); insert into `user_resource`(`user`,`resources`) values ('stacky','3'); insert into `user_resource`(`user`,`resources`) values ('testuser','1;3');