How to implement a filtering system in SQL?

Question


Now I plan to add a filter system to my site.

Examples:

    (ID=apple, COLOR=red, TASTE=sweet, ORIGIN=US)
    (ID=mango, COLOR=yellow, TASTE=sweet, ORIGIN=MEXICO)
    (ID=banana, COLOR=yellow, TASTE=bitter-sweet, ORIGIN=US)

I want to be able to run queries like the following:

    SELECT ID FROM thisTable WHERE COLOR = 'yellow' AND TASTE = 'sweet'

But my problem is that I am doing this for several categories on my site, and the columns are NOT consistent. (for example, if the table is for hand-held phones, then it will be BRAND, 3G-ENABLED, PRICE, COLOR, WAVELENGTH, etc.).

How can I create a generic schema that allows this?

Right now I am planning to use:

 table(ID, KEY, VALUE)

This allows an arbitrary number of columns, but the query I use for it returns an empty set:

    SELECT ID FROM table WHERE (KEY = X1 AND VALUE = V1) AND (KEY = X2 AND VALUE = V2)
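The empty result follows from how WHERE works: both conditions are tested against the same row, and a single row holds only one key. The sketch below reproduces this, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the table name stuff and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        ("mango", "color", "yellow"),
        ("mango", "taste", "sweet"),
        ("banana", "color", "yellow"),
        ("banana", "taste", "bitter-sweet"),
    ],
)

# AND: one row would need key = 'color' and key = 'taste' at the same time.
and_rows = conn.execute(
    "SELECT id FROM stuff "
    "WHERE (key = 'color' AND value = 'yellow') "
    "AND (key = 'taste' AND value = 'sweet')"
).fetchall()

# OR: every attribute row that satisfies either condition comes back.
or_rows = conn.execute(
    "SELECT id, key FROM stuff "
    "WHERE (key = 'color' AND value = 'yellow') "
    "OR (key = 'taste' AND value = 'sweet') "
    "ORDER BY id, key"
).fetchall()

print(and_rows)
print(or_rows)
```

With OR, mango appears twice (once per matching attribute) and banana once, which is why the answers count matches per id.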

Can someone recommend a good solution? Please note that the number of columns will change regularly.


sql mysql

crapbag May 04 '10 at 4:29


4 answers

Daniel Vassallo · Answer 1 · 2010-05-04T04:56:17+0000

The entity-attribute-value model that you propose can fit in this scenario.

As for the filtering query, you should understand that with the EAV model you will sacrifice a lot of query power, so this can become quite complicated. However, this is one way to tackle your problem:

    SELECT    stuff.id
    FROM      stuff
    JOIN     (SELECT    COUNT(*) matches, id
              FROM      `table`
              WHERE     (`key` = X1 AND `value` = V1) OR
                        (`key` = X2 AND `value` = V2)
              GROUP BY  id
             ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
    GROUP BY  stuff.id;

One inelegant aspect of this approach is that you need to specify the number of attribute/value pairs you expect to match in sub_t.matches = 2. With three conditions, we would have to specify sub_t.matches = 3, and so on.
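One way to keep the expected count in step with the conditions is to generate both from the same list. The following is a minimal sketch, not a definitive implementation: build_filter_query and find_ids are hypothetical helper names, SQLite (via Python's sqlite3) stands in for MySQL, and it assumes each (id, key) pair is stored at most once so that COUNT(*) is not inflated by duplicates.

```python
import sqlite3


def build_filter_query(filters):
    """Return (sql, params) for ids matching every (key, value) pair."""
    if not filters:
        raise ValueError("at least one filter is required")
    # One placeholder group per pair; values are bound, never interpolated.
    pairs = " OR ".join("(key = ? AND value = ?)" for _ in filters)
    sql = (
        "SELECT stuff.id FROM stuff "
        "JOIN (SELECT COUNT(*) AS matches, id FROM stuff "
        f"WHERE {pairs} GROUP BY id) sub_t "
        "ON (sub_t.matches = ? AND sub_t.id = stuff.id) "
        "GROUP BY stuff.id ORDER BY stuff.id"
    )
    # The required match count is derived from the number of pairs.
    params = [part for pair in filters for part in pair] + [len(filters)]
    return sql, params


def find_ids(conn, filters):
    sql, params = build_filter_query(filters)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        ("apple", "color", "red"), ("apple", "taste", "sweet"),
        ("apple", "origin", "US"),
        ("mango", "color", "yellow"), ("mango", "taste", "sweet"),
        ("mango", "origin", "MEXICO"),
        ("pear", "color", "yellow"), ("pear", "taste", "sweet"),
        ("pear", "origin", "somewhere"),
    ],
)

two = find_ids(conn, [("color", "yellow"), ("taste", "sweet")])
three = find_ids(
    conn, [("color", "yellow"), ("taste", "sweet"), ("origin", "MEXICO")]
)
print(two, three)
```

Adding or removing a filter then changes the OR list and the match count together, so the two cannot drift apart.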

Let's build a test case:

    CREATE TABLE stuff (`id` varchar(20), `key` varchar(20), `value` varchar(20));

    INSERT INTO stuff VALUES ('apple',  'color',  'red');
    INSERT INTO stuff VALUES ('mango',  'color',  'yellow');
    INSERT INTO stuff VALUES ('banana', 'color',  'yellow');

    INSERT INTO stuff VALUES ('apple',  'taste',  'sweet');
    INSERT INTO stuff VALUES ('mango',  'taste',  'sweet');
    INSERT INTO stuff VALUES ('banana', 'taste',  'bitter-sweet');

    INSERT INTO stuff VALUES ('apple',  'origin', 'US');
    INSERT INTO stuff VALUES ('mango',  'origin', 'MEXICO');
    INSERT INTO stuff VALUES ('banana', 'origin', 'US');

Query:

    SELECT    stuff.id
    FROM      stuff
    JOIN     (SELECT    COUNT(*) matches, id
              FROM      stuff
              WHERE     (`key` = 'color' AND `value` = 'yellow') OR
                        (`key` = 'taste' AND `value` = 'sweet')
              GROUP BY  id
             ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
    GROUP BY  stuff.id;

Result:

    +-------+
    | id    |
    +-------+
    | mango |
    +-------+
    1 row in set (0.02 sec)

Now add another fruit with color=yellow and taste=sweet:

    INSERT INTO stuff VALUES ('pear', 'color',  'yellow');
    INSERT INTO stuff VALUES ('pear', 'taste',  'sweet');
    INSERT INTO stuff VALUES ('pear', 'origin', 'somewhere');

The same query now returns:

    +-------+
    | id    |
    +-------+
    | mango |
    | pear  |
    +-------+
    2 rows in set (0.00 sec)

If we want to restrict this result to entities with origin=MEXICO, we need to add another OR condition and check for sub_t.matches = 3 instead of 2:

    SELECT    stuff.id
    FROM      stuff
    JOIN     (SELECT    COUNT(*) matches, id
              FROM      stuff
              WHERE     (`key` = 'color' AND `value` = 'yellow') OR
                        (`key` = 'taste' AND `value` = 'sweet') OR
                        (`key` = 'origin' AND `value` = 'MEXICO')
              GROUP BY  id
             ) sub_t ON (sub_t.matches = 3 AND sub_t.id = stuff.id)
    GROUP BY  stuff.id;

Result:

    +-------+
    | id    |
    +-------+
    | mango |
    +-------+
    1 row in set (0.00 sec)

As with every approach, there are certain advantages and disadvantages to using the EAV model. Make sure you research the topic carefully in the context of your application. You may even want to consider alternatives to relational databases, such as Cassandra, CouchDB, MongoDB, Voldemort, HBase, SimpleDB, or other key-value stores.

Tarlog · Answer 2 · 2010-12-19T15:51:38+0000

The following worked for me:

    SELECT t.id
    FROM mytable t
    WHERE t.key = "key" AND t.value = "value"
       OR t.key = "key" AND t.value = "value"
       OR ....
          t.key = "key" AND t.value = "value"
    GROUP BY t.id
    HAVING COUNT(*) = 3;

COUNT(*) = 3 must match the number of t.key = "key" AND t.value = "value" conditions.
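A concrete run of this GROUP BY / HAVING form is sketched below, again using Python's sqlite3 as a stand-in for MySQL. The sample rows and the variable names are assumptions for illustration. The last query deliberately passes a count that does not match the number of conditions, to show that nothing is returned in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("apple", "color", "red"), ("apple", "taste", "sweet"),
        ("mango", "color", "yellow"), ("mango", "taste", "sweet"),
        ("banana", "color", "yellow"), ("banana", "taste", "bitter-sweet"),
    ],
)

filters = [("color", "yellow"), ("taste", "sweet")]
where = " OR ".join("t.key = ? AND t.value = ?" for _ in filters)
sql = (
    "SELECT t.id FROM mytable t "
    f"WHERE {where} "
    "GROUP BY t.id HAVING COUNT(*) = ? ORDER BY t.id"
)
values = [part for pair in filters for part in pair]

# Count equals the number of conditions: ids matching all of them.
matching_ids = [row[0] for row in conn.execute(sql, values + [len(filters)])]

# Count larger than the number of conditions: no id can reach it.
wrong_count_ids = [row[0] for row in conn.execute(sql, values + [3])]

print(matching_ids, wrong_count_ids)
```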

Thomas · Answer 3 · 2010-05-04T04:37:41+0000

What you are proposing is known as an Entity-Attribute-Value (EAV) structure, and it is strongly discouraged. One of the (big) problems with EAV designs, for example, is data integrity. How do you enforce that colors consist only of "red", "yellow", "blue", etc.? In short, you cannot without a lot of hacks. Another problem arises when querying (as you have seen) and searching the data.

Instead, I would recommend creating one table per entity type, so that each table can have the attributes (columns) that are specific to that type of entity.
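As a rough illustration of the per-entity-type design, the sketch below creates a dedicated fruit table whose color column is restricted by a CHECK constraint. The table layout and the allowed color list are assumptions, and SQLite (via Python's sqlite3) stands in for MySQL. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward; on older versions an ENUM column or a lookup table with a foreign key would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fruit (
        id     TEXT PRIMARY KEY,
        color  TEXT NOT NULL CHECK (color IN ('red', 'yellow', 'blue')),
        taste  TEXT NOT NULL,
        origin TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?, ?)",
    [
        ("apple", "red", "sweet", "US"),
        ("mango", "yellow", "sweet", "MEXICO"),
        ("banana", "yellow", "bitter-sweet", "US"),
    ],
)

# Filtering is a plain WHERE clause on real columns.
yellow_sweet = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM fruit WHERE color = ? AND taste = ? ORDER BY id",
        ("yellow", "sweet"),
    )
]

# A color outside the allowed list is rejected by the database itself.
try:
    conn.execute("INSERT INTO fruit VALUES ('kiwi', 'greenish', 'sour', 'NZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(yellow_sweet, rejected)
```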

To convert the data into columns in a result set, as you would for searching, you need to build what is often called a crosstab query. There are reporting engines that will do this, and you can do it yourself, but most database products will not do it natively (that is, without building the SQL string manually). Of course, performance will not be good if you have a lot of data, and you will run into problems filtering on the data. For example, suppose some of the values are supposed to be numeric. Since the value part of the EAV is likely to be a string, you will have to cast those values to an integer before you can filter on them, and that assumes the data is convertible to an integer.
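The crosstab and casting issues can be sketched as follows. This uses conditional aggregation (MAX over CASE) to build the pivot, which is one common technique and not necessarily what the answer had in mind. build_crosstab_sql is a hypothetical helper, the phone rows are made-up sample data, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3


def build_crosstab_sql(keys):
    """Build the pivot SQL string by hand, one output column per key."""
    for k in keys:
        # Keys are interpolated into the SQL text, so allow plain names only.
        if not k.isidentifier():
            raise ValueError(f"unsafe key name: {k!r}")
    cols = ", ".join(
        f"MAX(CASE WHEN key = '{k}' THEN value END) AS {k}" for k in keys
    )
    return f"SELECT id, {cols} FROM stuff GROUP BY id"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        ("phone_a", "brand", "Acme"), ("phone_a", "price", "99"),
        ("phone_b", "brand", "Globex"), ("phone_b", "price", "250"),
        ("phone_c", "brand", "Acme"), ("phone_c", "price", "1000"),
    ],
)

pivot = build_crosstab_sql(["brand", "price"])
pivot_rows = conn.execute(f"{pivot} ORDER BY id").fetchall()

# Comparing as text: '99' sorts after '100', so phone_a wrongly qualifies.
text_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM ({pivot}) AS p WHERE price > ? ORDER BY id", ("100",)
    )
]

# Casting first gives the numeric comparison that was intended.
cast_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM ({pivot}) AS p "
        "WHERE CAST(price AS INTEGER) > ? ORDER BY id",
        (100,),
    )
]

print(pivot_rows, text_ids, cast_ids)
```

The column list has to be regenerated whenever a key is added, which is the manual SQL-string building the answer warns about.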

cbednarski · Answer 4 · 2010-05-04T07:00:00+0000

The price you pay for a simplistic table design at this stage will cost you in terms of performance in the long run. Using an ORM to reduce the cost of modifying the database to fit the data into a proper structure is probably a good investment of time, even given the ORM's performance cost.

Otherwise, you may want to look for a "reverse ORM" that maps code from your database, which has the advantage of being less expensive and having better performance. (Slightly higher starting cost compared to an ORM, but better long-term performance and reliability.)

This is a costly problem no matter how you slice it. Do you want to pay now with development time, or pay later when your performance tanks? ("Pay later" is the wrong answer.)
