Designing an Order Scheme in which there are scattered product definition tables

Question

Designing an Order Scheme in which there are scattered product definition tables

This is a script that I have seen in various places over the years; I am wondering if anyone has any better solution than mine ...

My company sells a relatively small number of products, however, the products we sell are highly specialized (i.e. a significant amount of parts must be provided to select this product). The problem is that, although the number of parts required to select a given product is relatively constant, the types of parts required vary greatly between products. For example:

Product X may have identifying characteristics (hypothetically)

'Colour',
'Material'
"Average time to failure"

but product Y may have characteristics

'Thickness',
'Diameter
'Source of power'

The problem (one of them, anyway) when creating an ordering system that uses both product X and product Y is that at some point the order line should refer to what it is “selling”. Since product X and product Y are defined in two different tables - and denormalizing products using a wide table layout is not an option (product definitions are quite deep) - it is difficult to see a clear way to define an order line in this way of entering, editing and reporting orders. In practice.

Things I've tried in the past

Create a parent table called “Product” with columns common to Product X and Product Y, then use “Product” as a reference for the OrderLine table and create FK relationships with “Product” as the primary side between the tables for Product X and Product Y This basically places the Product table as the parent of both OrderLine and all disparate product tables (for example, Products X and Y). It works great for entering an order, but causes problems with submitting or editing orders, as the “Product” record must keep track of which product it is designed to determine how to join the “Product” with its more young child, Product X or Product Y Benefits : Key relationships maintained. Disadvantages : reporting, editing at the order level / product level.
Create the Product Type and Product Key columns at the order line level, then use some CASE logic or views to determine the customized product to which the line belongs. This is similar to paragraph (1), without a common “Product” table. I find this a more “quick and dirty” solution, as it completely eliminates foreign keys between order lines and their product definitions. Advantages : quick solution. Disadvantages : the same as element (1), plus the lost RI.
Homogenize product definitions by creating a common header table and using key / value pairs for custom attributes (OrderLine [n] <- [1] Product [1] <- [n] ProductAttribute). Advantages : key relationships are maintained; There is no ambiguity regarding the definition of a product. Disadvantages : report (getting a list of products with their attributes, for example), entering data of attribute values, performance (selecting product attributes, inserting or updating product attributes, etc.).

If someone else tried to use another strategy with great success, I would like to hear about it.

Thanks.

+7

database database-design entity-attribute-value

Jared Sep 23 '08 at 13:16

source share

5 answers

Bill karwin · Answer 1 · 2008-09-23T18:26:21+0000

The first solution that you describe is the best if you want to maintain data integrity, and if you have relatively few product types and rarely add new product types. This is the design that I would choose in your situation. Reporting is only difficult if your reports require product-specific attributes. If your reports only need attributes in a common product table, that’s fine.

The second solution you described is called Polymorphic Associations, and that’s not good. Your "foreign key" is not a real foreign key, so you cannot use the DRI constraint to ensure data integrity. OO polymorphism has no counterpart in the relational model.

The third solution that you describe, including storing the attribute name as a string, is a project called Entity-Attribute-Value, and you can say that it is a painful and expensive solution. There is no way to guarantee data integrity, in no way to make one NOT NULL attribute, without being able to verify that this product has a specific set of attributes. It is not possible to limit a single attribute to a lookup table. Many types of aggregate queries become impossible in SQL, so you need to write a lot of application code for reporting. Use the EAV design only if you need it, for example, if you have an unlimited number of product types, the list of attributes may differ for each line, and your schema should often post new product types without changing the code or schema.

Another solution is "one-way inheritance." It uses an extremely wide table with a column for each attribute of each product. Leave NULL in columns that are not related to the product in the given row. This means that you cannot declare an attribute as NOT NULL (unless it is common to all products). In addition, most RDBMS products have a limit on the number of columns in a single table or the total width in bytes of a row. Thus, you are limited by the number of types of products that you can present in this way.

There are hybrid solutions, for example, you can usually store common attributes in columns, but product-specific attributes in the attribute-attribute-value table. Or you can store product-specific attributes in some other way, such as XML or YAML, in the BLOB column of the Products table. But these hybrid solutions suffer because some attributes need to be retrieved differently now.

The ultimate solution for these situations is to use a semantic data model using RDF instead of a relational database. It has some characteristics with EAV, but it is much more ambitious. All metadata is stored in the same way as data, so each object is self-describing, and you can request a list of attributes for this product in the same way as requesting data. There are special products such as Jena or Sesame , an implementation of this data model, and a special query language other than SQL.

Chris cudmore · Answer 2 · 2008-09-23T13:37:42+0000

It can make you start. This will require some refinement.

Table Product ( id PK, name, price, units_per_package) Table Product_Attribs (id FK ref Product, AttribName, AttribValue)

This will allow you to attach a list of attributes to products. - This is essentially your option 3

If you know the maximum number of attributes, you can go

 Table Product (id PK, name, price, units_per_package, attrName_1, attrValue_1 ...)

That would, of course, de-normalize the database, but making queries easier.

I prefer the first option because

It supports an arbitrary number of attributes.
Attribute names can be stored in another table, and referential integrity is enforced so that these damned Canadians do not insert a “color” there and interrupt reporting.

S. Lott · Answer 3 · 2008-09-23T18:44:05+0000

There is no magic bullet that you missed.

You have what are sometimes called "disjoint subclasses." There is a superclass (Product) with two subclasses (ProductX) and (ProductY). This is a problem that - for relational databases - is really difficult. [Another difficult issue is Bill Materials. Another difficult problem is the graphs of nodes and arcs.]

You really need polymorphism, where OrderLine is associated with a subclass of Product but doesn't know (or doesn't care) about which particular subclass.

You do not have many modeling options. You have largely identified the bad traits of each. This is almost the entire universe of choice.

Press everything to the superclass. This is a unified approach when you have a Product with discriminator (type = "X" and type = "Y") and a million columns. Product columns are a combination of columns in ProductX and ProductY. Due to unused columns, zeros will be displayed.
Insert everything into subclasses. In this case, you need a view that is a union of ProductX and ProductY. This view is what I joined to create a complete order. This is similar to the first solution, except that it is built dynamically and not optimized.
Attach an instance of a superclass to an instance of a subclass. In this case, the Product table is the intersection of the ProductX and ProductY columns. Each product has a key link in either ProductX or ProductY.

In fact, there is no bold new direction. In the world of relational databases, this is the choice.

If, however, you decide to change the way you create application software, you can exit this trap. If the application is object-oriented, you can do everything with first-class polymorphic objects. You need to map due to awkward relational processing; this happens twice: once when you extract material from the database to create objects and once when you save objects back to the database.

The advantage is that you can describe your processing concisely and correctly. Like objects, with subclass relationships.

The downside is that your SQL moves to simplified fetching, updating, and pasting.

This becomes an advantage when SQL is isolated at the ORM level and managed as a kind of trivial implementation detail. Java programmers use iBatis (or Hibernate or TopLink or Cocoon), Python programmers use SQLAlchemy or SQLObject. ORM retrieves and saves the database; Your application directly manages orders, lines and products.

Aj. · Answer 4 · 2008-09-23T13:44:56+0000

Is your product line changing?
If so, then creating a table for the product will cost you dearly, and the idea of a key / value pair will serve you well. This is the direction that I naturally drew.

I would create tables as follows:

 Attribute(attribute_id, description, is_listed) -- contains values like "colour", "width", "power source", etc. -- "is_listed" tells us if we can get a list of valid values: AttributeValue(attribute_id, value) -- lists of valid values for different attributes. Product (product_id, description) ProductAttribute (product_id, attribute_id) -- tells us which attributes apply to which products Order (order_id, etc) OrderLine (order_id, order_line_id, product_id) OrderLineProductAttributeValue (order_line_id, attribute_id, value) -- tells us things like: order line 999 has "colour" of "blue"

SQL to pull this together is not trivial, but it is not too complicated ... and most of it will be written once and stored (either in stored procedures or at the data access level).

We do similar things with several types of entities.

Jared · Answer 5 · 2008-09-23T14:01:54+0000

Chris and AJ: Thanks for your answers. The product line may change, but I would not call it “changeable."

The reason I don't like the third option is because it comes from the cost of metadata for the product attribute values. It essentially turns columns into rows, losing most of the benefits of a database column in the process (data type, default, constraints, foreign key relationships, etc.).

I actually participated in a past project where the product definition was implemented in this way. We essentially created a complete system for determining product / product attributes (data types, minimum or maximum occurrences, default values, "required" flags, usage scenarios, etc.). Ultimately, the system worked with significant overhead and productivity costs (for example, materialized views for visualizing products, user-friendly smart components for presenting and validating the data entry user interface to determine the product, another smart component for representing custom attributes of the product instance on the order line, blah blah blah blah).

Thanks again for your answers!

Designing an Order Scheme in which there are scattered product definition tables

More articles: