I am trying to build a poor man's recommendation system for an online store. I want to replicate Amazon's "Customers Who Bought This Item Also Bought" feature, and I have read a lot about it. I know there is Apache Mahout, but I cannot configure this server for it. Then there is the Google Prediction API, but it costs money, so I started experimenting myself.
I have an order history with over 250,000 order items, and I wrote a MySQL query with a subquery that finds the orders containing the current article, counts the other items in those orders, and sorts them by that count. The result is the set of products that other people ordered together with the current article.
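To make the ranking idea concrete, here is a minimal in-memory sketch of the same co-occurrence count. The function name `also_bought` and the sample rows are hypothetical; only the `(ID_order, ArtNr)` pairing is taken from the real table.

```python
from collections import Counter

def also_bought(order_items, article, limit=10):
    """Rank articles that share at least one order with `article`."""
    # Orders that contain the current article.
    orders = {order_id for order_id, art in order_items if art == article}
    # Count every other article that appears in those orders.
    counts = Counter(
        art for order_id, art in order_items
        if order_id in orders and art != article
    )
    return counts.most_common(limit)

# Hypothetical sample rows: (ID_order, ArtNr)
rows = [
    (1, "TT-PV0005"), (1, "TT-PV0040"),
    (2, "TT-PV0005"), (2, "TT-PV0040"), (2, "TT-PV0355"),
    (3, "TT-PV0355"),
]
print(also_bought(rows, "TT-PV0005"))
# prints [('TT-PV0040', 2), ('TT-PV0355', 1)]
```

Order 3 does not contain TT-PV0005, so its items are ignored for that article.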
The problem is that the query can take up to 10 seconds, so it cannot be run directly on each page view. I thought of a caching table, but the query that fills it aborts after 20 minutes (there are 60,000 products and 250,000 order items). Therefore, I cannot populate this table.
My current solution is the following: the recommendation HTML is loaded via AJAX on document ready, so the page loads first and the recommendations load in the background. The recommendations are computed once and stored in a file cache (simple PEAR cache), so they load faster the next time. Thus, a cache entry is created on demand when someone visits the page, and it is kept for a day or possibly a week.
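The on-demand cache described above can be sketched roughly as follows. This is not the PEAR cache API; `cached`, `cache_dir`, and `compute` are hypothetical names, and the sketch assumes the key (for example an article number such as TT-PV0005) is safe to use as a file name.

```python
import json
import os
import time

def cached(cache_dir, key, ttl_seconds, compute):
    """Return the cached value for `key`; rebuild it when missing or older than the TTL."""
    path = os.path.join(cache_dir, key + ".json")
    try:
        if time.time() - os.path.getmtime(path) < ttl_seconds:
            with open(path, encoding="utf-8") as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing, unreadable, or corrupt entry: fall through and rebuild
    value = compute()
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a per-process temp file, then rename it into place, so that
    # parallel readers never see a half-written cache file.
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(value, fh)
    os.replace(tmp, path)
    return value
```

A call such as `cached("/tmp/reco", "TT-PV0005", 86400, run_query)` would run the slow query at most once per day per article, where `run_query` stands for whatever function executes the ranking query. Two visitors hitting an empty entry at the same moment would both compute it, which is wasteful but harmless here.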
So I ask myself, and you: is this an acceptable approach, or is it stupid and a dead end? Would it be better to store the cached data in the database or in files? (I am thinking about performance and parallel hits.) I mean, in the worst case, I end up with 60,000 cache files.
I would prefer a pre-computed table with all the data, but, as I said, building it takes too long, and I don't know how to optimize it. (Waiting for the SQL guy to return from the holidays ^^)
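One possible way to build such a pre-computed table is a single grouped self-join over all order items instead of one query per product. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for MySQL; the table name `also_bought`, the column `cnt`, and the sample rows are assumptions, and whether this finishes quickly enough on the real data has not been measured.

```python
import sqlite3

def build_pairs(conn):
    """Rebuild the pair table for every article in one pass."""
    conn.executescript("""
        DROP TABLE IF EXISTS also_bought;
        CREATE TABLE also_bought AS
        SELECT s.ArtNr AS parent_artnr,
               c.ArtNr AS artnr,
               COUNT(DISTINCT s.ID_order) AS cnt  -- orders containing both articles
        FROM net_orderposition s
        JOIN net_orderposition c
          ON c.ID_order = s.ID_order AND c.ArtNr <> s.ArtNr
        GROUP BY s.ArtNr, c.ArtNr;
        CREATE INDEX also_bought_parent ON also_bought (parent_artnr, cnt);
    """)

def top_for(conn, article, limit=10):
    """Read the best-ranked companions of one article from the pair table."""
    return conn.execute(
        "SELECT artnr, cnt FROM also_bought "
        "WHERE parent_artnr = ? ORDER BY cnt DESC, artnr LIMIT ?",
        (article, limit),
    ).fetchall()

# Hypothetical sample data: (ID_order, ArtNr)
rows = [
    (1, "TT-PV0005"), (1, "TT-PV0040"),
    (2, "TT-PV0005"), (2, "TT-PV0040"), (2, "TT-PV0355"),
    (3, "TT-PV0355"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE net_orderposition (ID_order INTEGER, ArtNr TEXT)")
conn.executemany("INSERT INTO net_orderposition VALUES (?, ?)", rows)
build_pairs(conn)
print(top_for(conn, "TT-PV0005"))
# prints [('TT-PV0040', 2), ('TT-PV0355', 1)]
```

Note that this counts distinct orders, whereas the per-article query counts matching rows; the two differ only if the same article appears more than once in one order.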
Thanks for any hints or opinions.
By the way, this is the query:
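SELECT c.ArtNr AS artnr, COUNT(c.ArtNr) AS `rank`, s.ArtNr AS parent_artnr
FROM (
    -- all order positions that contain the current article
    SELECT a.ID_order, a.ArtNr
    FROM net_orderposition a
    WHERE a.ArtNr = 'TT-PV0005'
) s
JOIN net_orderposition c
    -- other articles in the same orders
    ON s.ID_order = c.ID_order AND s.ArtNr != c.ArtNr
GROUP BY c.ArtNr, s.ArtNr
-- `rank` is a reserved word in MySQL 8.0, hence the backticks;
-- MAX() makes the tie-break on Stamp deterministic within each group
ORDER BY `rank` DESC, MAX(c.Stamp) DESC
LIMIT 10;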
EDIT:
I thought about these answers, and I think they are similar to my original idea. The above code leads to the following table:
ID,ParentID , ChildID , Rank
1, TT-PV0005, TT-PV0040, 220
2, TT-PV0005, TT-PV0355, 135
3, TT-PV0005, TT-PV0450, 134
4, TT-PV0005, TT-PV0451, 89
5, TT-PV0005, RH-01V2 , 83
6, TT-PV0005, TT-PV0041, 83
7, TT-PV0005, TT-PV0353, 82
8, TT-PV0005, TT-PV0037, 80
ParentID is the current article, ChildID is an article that was ordered together with ParentID, and Rank is the number of times that happened.