Efficient way to find sequential values

Question

Each Product may contain up to 10,000 Segment rows. Segments have a sort column that starts at 1 for each product (1, 2, 3, 4, 5, ...) and a value column that can contain any values, such as (323.113, 5423.231, 873.42, 422.64, 763.1, ...).

I would like to identify potential matches for products, given a subset of segments. For example, if I have 5 segment values in the correct order, how can I efficiently find all the products that have all 5 segments in the same order somewhere in the Segment table?

Update

I posted a follow-up question here: Find a series of data using inaccurate measurements (fuzzy logic)

+2

sql sql-server tsql sql-server-2008 sql-server-2005

adam0101 Nov 04 '11 at 15:04

3 answers

What I'm about to suggest may take up too much space: since the segments are static once a product has been added, you can index them by storing all the "suffixes" of the product's list of segments.

For example, a list of segments:

 34 57 67 34

will produce:

 34 57 67 34
 57 67 34
 67 34
 34

You may need to store them in files on disk, because with 10,000 segments per product you will get a lot of suffixes (up to 10,000 per product). The good news is that you can store them contiguously, so reading them does not require many disk seeks. You can then simply scan the list of suffixes linearly and compare the first k values of each against a query that contains k segment values. So if you search for 57 67 in the list above, it will return this product because the query matches the second suffix.

You could also use a tree index for faster matching, but that may get too complicated.

Edit: As I said in a comment, this is an adaptation of substring matching. You should also keep the list of suffixes sorted numerically; then you can run a binary search over it, which in theory should report a match or a mismatch in about log(10000) steps.
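The suffix list and the binary search from the edit can be sketched in Python as follows. This is only an illustration under assumptions: the names `build_suffix_index` and `find_products` are made up here, everything is held in memory rather than in files on disk, and values are compared for exact equality.

```python
from bisect import bisect_left


def build_suffix_index(products):
    """Return a sorted list of (suffix, product_id) pairs.

    `products` maps a product id to its list of segment values in sort order.
    Storing full suffixes is quadratic in space; a real index would more
    likely store (product_id, start_offset). Kept simple on purpose.
    """
    index = []
    for product_id, segments in products.items():
        for start in range(len(segments)):
            index.append((tuple(segments[start:]), product_id))
    index.sort()
    return index


def find_products(index, query):
    """Return the ids of products that contain `query` as a contiguous run."""
    query = tuple(query)
    k = len(query)
    # Binary search for the first entry whose suffix is >= the query.
    pos = bisect_left(index, (query,))
    matches = set()
    # Suffixes that start with the query sit next to each other once sorted.
    while pos < len(index) and index[pos][0][:k] == query:
        matches.add(index[pos][1])
        pos += 1
    return matches


products = {1: [34, 57, 67, 34], 2: [57, 34, 67]}
index = build_suffix_index(products)
print(find_products(index, [57, 67]))  # {1}
```

Product 1 is returned because `57 67` is a prefix of its second suffix, `57 67 34`; no suffix of product 2 starts with that pair.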

+1

Tudor Nov 04 '11 at 15:31

Probably the best way to do this would be to keep a denormalized version of the data.

 ProductId  DelimitedList
 1          ,323.113,5423.231,873.42,422.64,763.1,

Then your search is simply:

 WHERE DelimitedList LIKE '%,323.113,5423.231,873.42,%'

You can first run a standard relational division query to return only those ProductId values that contain every search value (not necessarily adjacent or in the correct order), which reduces the number of rows the LIKE has to scan.

Complete Script Demo

 /* Set up test tables */
 CREATE TABLE Products (
     ProductId int PRIMARY KEY
 )

 CREATE TABLE ProductSegments (
     ProductId int REFERENCES Products,
     Sort      int,
     Value     decimal(10,3),
     PRIMARY KEY (ProductId, Sort)
 )

 CREATE NONCLUSTERED INDEX ix ON ProductSegments (ProductId, Value)

 CREATE TABLE ProductSegmentsDenormalized (
     ProductId     int REFERENCES Products,
     DelimitedList varchar(max)
 )

 /* Insert some initial data to Products... */
 INSERT INTO Products VALUES (1), (2), (3)

 /* ... and for ProductSegments */
 ;WITH numbers(N) AS (
     SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT 0))
     FROM master..spt_values v1, master..spt_values v2
 )
 INSERT INTO ProductSegments (ProductId, Sort, Value)
 SELECT ProductId AS Product,
        n1.N Sort,
        (ABS(CHECKSUM(NEWID())) % 1000000000) / 1000.00
 FROM numbers n1, Products

 /* Set up table for search data */
 DECLARE @SearchValues TABLE (
     Sequence int PRIMARY KEY,
     Value    decimal(10,3)
 )

 INSERT INTO @SearchValues
 VALUES (1, 323.113), (2, 5423.231), (3, 873.420), (4, 422.640), (5, 763.100)

 /* Fiddle the test data so we have some guaranteed matches */
 UPDATE ps
 SET ps.Value = sv.Value
 FROM ProductSegments ps
      JOIN @SearchValues sv
        ON ProductId = 1 AND Sort = 100 + Sequence

 UPDATE ps
 SET ps.Value = sv.Value
 FROM ProductSegments ps
      JOIN @SearchValues sv
        ON ProductId = 3 AND Sort = 987 + Sequence

 /* Create the denormalised data */
 INSERT INTO ProductSegmentsDenormalized
 SELECT ProductId, '|' + DelimitedList
 FROM Products p
      CROSS APPLY (SELECT CAST(Value AS varchar) + '|'
                   FROM ProductSegments ps
                   WHERE ps.ProductId = p.ProductId
                   ORDER BY Sort
                   FOR XML PATH('')) D (DelimitedList)

 /* Do the search */
 SELECT ProductId
 FROM ProductSegmentsDenormalized psd
 WHERE psd.ProductId IN (SELECT p.ProductId
                         FROM Products p
                         WHERE NOT EXISTS (SELECT *
                                           FROM @SearchValues sv
                                           WHERE NOT EXISTS (SELECT *
                                                             FROM ProductSegments ps
                                                             WHERE ps.ProductId = p.ProductId
                                                               AND sv.Value = ps.Value)))
   AND DelimitedList LIKE '%|' + (SELECT CAST(Value AS varchar) + '|'
                                  FROM @SearchValues sv
                                  ORDER BY Sequence
                                  FOR XML PATH('')) + '%'
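To make the two steps of the search easier to follow outside the database, here is a rough Python analogue. It is only a sketch under assumptions: `to_delimited` and `search` are hypothetical names, values are formatted to three decimals to mirror `decimal(10,3)`, and a plain substring test stands in for `LIKE '%...%'`.

```python
def to_delimited(values):
    """Format values as '|v1|v2|...|vn|' with three decimals each.

    Having a delimiter on both sides of every value stops '23.113'
    from matching inside '323.113'.
    """
    return "|" + "".join(f"{v:.3f}|" for v in values)


def search(products, query):
    """Return ids of products whose segments contain `query` in order.

    `products` maps a product id to its list of segment values.
    """
    pattern = to_delimited(query)
    needed = {f"{v:.3f}" for v in query}
    hits = []
    for product_id, segments in products.items():
        # Step 1 (relational division): every search value must be present.
        present = {f"{v:.3f}" for v in segments}
        if not needed <= present:
            continue
        # Step 2 (the LIKE): the values must be adjacent and in order.
        if pattern in to_delimited(segments):
            hits.append(product_id)
    return hits
```

The first step is cheap and discards most products; the substring test then only runs on the products that contain all of the values somewhere.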

+1

Martin Smith Nov 04 '11 at 15:44

Philip Kelley · Accepted Answer · 2011-11-04T19:07:11+0000

Assume the following tables:

 CREATE TABLE Products
 (
   ProductId  int           not null
     constraint PK_Products primary key
  ,Name       varchar(100)  not null
 )

 CREATE TABLE Segments
 (
   ProductId  int    not null
     constraint FK_Segments__Products foreign key references Products (ProductId)
  ,OrderBy    int    not null
  ,Value      float  not null
  ,constraint PK_Segments primary key (ProductId, OrderBy)
 )

Then set up your search data in a temp table:

 CREATE TABLE #MatchThis
 (
   Position  int    not null
  ,Value     float  not null
 )

For N search items, this should be populated like so:

 First item   0    <value 1>
 Second item  1    <value 2>
 Third item   2    <value 3>
 ...
 Nth item     N-1  <value N>

Now set up some important values. (These could be folded into the final query, but doing it this way makes it easier to read and may improve performance slightly.)

 DECLARE
   @ItemCount   int
  ,@FirstValue  float

 -- How many items to be matched ("N", above)
 SELECT @ItemCount = count(*) from #MatchThis

 -- The value of the first item in the search set
 SELECT @FirstValue = Value from #MatchThis where Position = 0

And then it's just a query:

 SELECT
    pr.Name
   ,fv.OrderBy  -- Required by the Group By, but otherwise can be ignored
  from #MatchThis mt
   cross join (-- All Segments that match the first value in the set
               select ProductId, OrderBy
                from Segments
                where Value = @FirstValue) fv
   inner join Products pr  -- Just to get the Product name
    on pr.ProductId = fv.ProductId
   inner join Segments se
    on se.ProductId = fv.ProductId
     and se.OrderBy = fv.OrderBy + mt.Position  -- Lines them up based on the first value
     and se.Value = mt.Value                    -- No join if the values don't match
  group by
    pr.Name
   ,fv.OrderBy
  having count(*) = @ItemCount  -- Only include if as many segments pulled for this product/segment.OrderBy as are required

I'm confident this will work, but I don't have time to test it in detail right now. To optimize performance, in addition to the primary keys indicated, you could add a regular index on Segments.Value.
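For readers who want to check the logic of the query away from SQL Server, the same idea (anchor on every segment equal to the first search value, then count how many offsets line up) can be sketched in Python. The name `find_matches` and the in-memory dictionary are assumptions made for illustration. Exact equality on floats is used here just as in the query, which may be too strict for measured data.

```python
def find_matches(segments_by_product, match_this):
    """Return (product_id, start_position) pairs where `match_this` occurs.

    `segments_by_product` maps a product id to {OrderBy: Value};
    `match_this` is the list of search values (Position 0 .. N-1).
    """
    if not match_this:
        return []
    item_count = len(match_this)
    first_value = match_this[0]
    results = []
    for product_id, segments in segments_by_product.items():
        # Candidate anchors: every segment that equals the first search value.
        anchors = [pos for pos, val in segments.items() if val == first_value]
        for anchor in sorted(anchors):
            # Count the offsets whose segment lines up with the search value.
            lined_up = sum(
                1
                for offset, value in enumerate(match_this)
                if segments.get(anchor + offset) == value
            )
            # Plays the same role as HAVING count(*) = @ItemCount.
            if lined_up == item_count:
                results.append((product_id, anchor))
    return results
```

Only segments equal to the first search value are ever used as starting points, which is why an index on the value column is likely to matter most for this approach.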
