The optimal sequence of non-overlapping purchases

I think this is a planning problem, but I'm not even sure about it! I want to find the optimal sequence of disjoint purchase decisions when I fully get acquainted with their cost and opportunities for the future.

Imagine a wholesaler who sells various products that I want to buy for my store. At any time, they can run several special offers; I will sell at full price, so their discount is my profit.

I want to maximize profits, but the catch is that I can only buy one thing at a time, and there is no such thing as a loan, and, even worse, there is a delivery delay. The good news is that I will sell items as soon as they are delivered and I can spend the money again. Thus, one way through all the options could be: I buy 100 kg of apples on Monday, they are delivered on Tuesday. Then I buy 20 monastic costumes, respectively, on Sunday. I miss a couple of days, as I know, on Wednesday they will have a Ferrari at a big discount. Therefore, I buy one of them, it will be delivered next Tuesday. Etc.

You can think about multiplying profits or not. The algorithm comes down to a decision at each stage between choosing one of today's special offers or waiting for the day, because tomorrow something better will come.

Let it be abstract a little. Purchase and delivery are becoming days-s-era. Profit is written as the selling price divided by the purchase price. That is, 1.00 means break-even, 1.10 means 10% profit, 2.0 means that I doubled my money.

buy delivery profit 1 2 1.10 Apples 1 3 1.15 Viagra 2 3 1.15 Notebooks 3 7 1.30 Nun costumes 4 7 1.28 Priest costumes 6 7 1.09 Oranges 6 8 1.11 Pears 7 9 1.16 Yellow shoes 8 10 1.15 Red shoes 10 15 1.50 Red Ferrari 11 15 1.40 Yellow Ferrari 13 16 1.25 Organic grapes 14 19 1.30 Organic wine 

NOTES: Opportunities exist only on the day of purchase (for example, organic grapes turn into wine if nobody buys it!), And I get the opportunity to sell on the same day as delivery, but I can’t buy my next product until bye the next day. Therefore, I cannot sell my monastic costumes at t = 7 and immediately buy yellow shoes at t = 7.

I was hoping that a well-known better algorithm exists, and that an R-module already exists for it, but the algorithms or academic literature would also be good, as in any other language. Speed ​​matters, but mainly when the data gets big, so I would like to know if it is O (n 2 ) or something else.

By the way, does a better algorithm improve if there is the maximum possible delivery delay? For instance. if delivery - buy <= 7

Here is the above data as a CSV:

 buy,delivery,profit,item 1,2,1.10,Apples 1,3,1.15,Viagra 2,3,1.15,Notebooks 3,7,1.30,Nun costumes 4,7,1.28,Priest costumes 6,7,1.09,Oranges 6,8,1.11,Pears 7,9,1.16,Yellow shoes 8,10,1.15,Red shoes 10,15,1.50,Red Ferrari 11,15,1.40,Yellow Ferrari 13,16,1.25,Organic grapes 14,19,1.30,Organic wine 

Or like JSON:

 {"headers":["buy","delivery","profit","item"],"data":[[1,2,1.1,"Apples"],[1,3,1.15,"Viagra"],[2,3,1.15,"Notebooks"],[3,7,1.3,"Nun costumes"],[4,7,1.28,"Priest costumes"],[6,7,1.09,"Oranges"],[6,8,1.11,"Pears"],[7,9,1.16,"Yellow shoes"],[8,10,1.15,"Red shoes"],[10,15,1.5,"Red Ferrari"],[11,15,1.4,"Yellow Ferrari"],[13,16,1.25,"Organic grapes"],[14,19,1.3,"Organic wine"]]} 

Or as a data frame R:

 structure(list(buy = c(1L, 1L, 2L, 3L, 4L, 6L, 6L, 7L, 8L, 10L, 11L, 13L, 14L), delivery = c(2L, 3L, 3L, 7L, 7L, 7L, 8L, 9L, 10L, 15L, 15L, 16L, 19L), profit = c(1.1, 1.15, 1.15, 1.3, 1.28, 1.09, 1.11, 1.16, 1.15, 1.5, 1.4, 1.25, 1.3), item = c("Apples", "Viagra", "Notebooks", "Nun costumes", "Priest costumes", "Oranges", "Pears", "Yellow shoes", "Red shoes", "Red Ferrari", "Yellow Ferrari", "Organic grapes", "Organic wine")), .Names = c("buy", "delivery", "profit", "item"), row.names = c(NA, -13L), class = "data.frame") 

LINK

Are there any R-packages for graphs (shortest path, etc.)? ( igraph offers the function shortest.paths and, in addition to the C library, has an R package and a python interface)

+6
source share
4 answers

The easiest way to think about this problem is similar to the shortest path (although considering it as a maximum flow problem is probably technically better). The number of days, 1 ... 19, can be used as node names; each node j has a link to node j + 1 with a weight of 1, and each product (b, d, g, p) in the list adds a link from day b to day d + 1 with a weight g. When we move through the nodes when we find the path, we track the best multiplication values ​​obtained so far on each node.

The Python code shown below runs in O (V + E) time, where V is the number of vertices (or days) and E is the number of edges. In this implementation, E = V + the number of products sold. Note added: loop for i, t in enumeration (graf): processes each vertex once. In this loop for e in edges: processes edges from the current vertex once. Thus, no edge is processed more than once, therefore the performance is O (V + E).

Edited Note 2: krjampani argued that O (V + E) is slower than O (n log n), where n is the number of products. However, the two orders are not comparable unless we make assumptions about the number of days considered. Please note that if delivery delays are limited and product dates overlap, then the number of days is O (n), hence O (V + E) = O (n), which is faster than O (n log n).

However, for a given set of assumptions, the runtime orders of my method and krjampani may be the same: for a large number of days, change my method to create graph nodes only for days in a sorted union of x [0] and x [1], and using day references [i-1] and day [i + 1] instead of i-1 and i + 1. Within a few days, change the krjampani method to use counter sorting O (n).

The output of the program is as follows:

  16 : 2.36992 [11, 15, 1.4, 'Yellow Ferrari'] 11 : 1.6928 [8, 10, 1.15, 'Red shoes'] 8 : 1.472 [4, 7, 1.28, 'Priest costumes'] 4 : 1.15 [1, 3, 1.15, 'Viagra'] 

which indicates that we arrived on the 16th day with a compounded profit of 2.36992, after the sale of the Yellow Ferrari on the 15th day; arrived on the 11th day with a profit of 1,628, after the sale of red shoes; etc. Please note that a dummy entry at the top of the product list and removing quotes around numbers are the main differences from JSON data. The entry in the list element graf[j] begins with [1, j-1, 0, [[j+1,1,0]]] , that is, it has the form [best-value-so-far, best-from-node #, best-from-product-key, edge-list]. Each edge list is a list of lists that take the form of [next-node #, edge-weight, product-key]. Having product 0 as a fictitious product simplifies initialization.

 products = [[0,0,0,""],[1,2,1.10,"Apples"],[1,3,1.15,"Viagra"],[2,3,1.15,"Notebooks"],[3,7,1.30,"Nun costumes"],[4,7,1.28,"Priest costumes"],[6,7,1.09,"Oranges"],[6,8,1.11,"Pears"],[7,9,1.16,"Yellow shoes"],[8,10,1.15,"Red shoes"],[10,15,1.50,"Red Ferrari"],[11,15,1.40,"Yellow Ferrari"],[13,16,1.25,"Organic grapes"],[14,19,1.30,"Organic wine"]] hiDay = max([x[1] for x in products]) graf = [[1, i-1, 0, [[i+1,1,0]]] for i in range(2+hiDay)] for i, x in enumerate(products): b, d, g, p = x[:] graf[b][3] += [[d+1, g, i]] # Add an edge for each product for i, t in enumerate(graf): if i > hiDay: break valu = t[0] # Best value of path to current node edges = t[3] # List of edges out of current node for e in edges: link, gain, prod = e[:] v = valu * gain; if v > graf[link][0]: graf[link][0:3] = [v, i, prod] day = hiDay while day > 0: if graf[day][2] > 0: print day, ":\t", graf[day][0], products[graf[day][2]] day = graf[day][1] 
+6
source

This problem naturally affects the problem of finding the maximum weighted independent intervals among the set of weighted intervals. Each element in your input data set corresponds to an interval, the starting and ending points of which are the dates of purchase and delivery, and the profit from the goods represents the weight of the interval. The problem with the maximum weight interval is to find a set of disjoint intervals whose total weight is maximum. enter image description here

The problem can be solved in O(n log n) as follows. Sort the intervals by their endpoints (see Figure). Then we go through each interval i in the sorted list and calculate the optimal solution for the subtask, which consists of intervals of 1...i in the sorted list. The optimal solution to the problem for intervals 1...i is the maximum:

 1. The optimal solution of the problem for intervals `1...(i-1)` in the sorted list or 2. Weight of interval `i` + the optimal solution of the problem for intervals `1...j`, where j is the last interval in the sorted list whose end-point is less than the start-point of `i`. 

Note that this algorithm works in O(n log n) and calculates the value of the optimal solution for each prefix of the sorted list. After starting this algorithm, we can move through the sorted list in the reverse order and find the intervals present in the optimal solution based on the values ​​calculated for each prefix.

EDIT:

For proper operation, the weight of the intervals must be the actual profits of the corresponding positions (i.e. they must be sell_price - buy_price).

Update 2: Runtime

Let V be the number of days (after the designation jwpat7). If V much less than O(n log n) , we can use count sorting to sort intervals in O(n + V) time and use an array of size V to write the solutions of the subtasks. This approach leads to a time complexity of O(V + n) . Thus, the running time of the algorithm is min(O(V+n), O(n log n)) .

+4
source

This is a dynamic programming problem. To achieve the optimal optimal choice, only the optimal choice at each stage is required. You can create a table that describes the optimal choice at each step based on the previous state and the profit from the various steps from this state. You can collapse a large feature set into a smaller set by eliminating features that are clearly not optimal when you go.

In your problem, the only condition that affects the choice is the delivery date. For example, on the first day you have three options: you can buy apples, set your profit at 1.10 and set the delivery date to 2; buy viagra, set your profit to 1.15 and set the delivery date to 3; or buy nothing, set your profit to zero and set the delivery date to 2. We can present these alternatives as follows:

 (choices=[apples], delivery=2, profit=1.10) or (choices=[viagra], delivery=3, profit=1.15) or (choices=[wait], delivery=2, profit=0.00) 

It makes no difference whether you buy Viagra or buy nothing on the first day to make future decisions. In any case, the next day you can make a purchase on the second day, so that you can refuse to wait as an alternative, since the profit is lower. However, if you buy apples, it will affect future decisions differently than if you buy Viagra or wait, so this is another alternative that you should consider. It just leaves you with these alternatives at the end of the day.

 (choices=[apples], delivery=2, profit=1.10) or (choices=[viagra], delivery=3, profit=1.15) 

On the second day, you need to consider your alternatives based on what the alternatives were on the first day. This provides three possibilities:

 (choices=[apples,notebooks], delivery=3, profit=2.25) or (choices=[apples,wait], delivery=3, profit=1.10) or (choices=[viagra,wait], delivery=3, profit=1.15) 

All three of these options will put you in the same condition as in future decisions, since they all set the delivery date to 3, so you just choose one of them with maximum profit:

 (choices=[apples,notebooks], delivery=3, profit=2.25) 

On the third day you have two alternatives

 (choices=[apples,notebooks,wait], delivery=4, profit=2.25) (choices=[apples,notebooks,nun costumes], delivery=7, profit=3.55) 

both of these alternatives should be retained as they will affect future decisions in different ways.

Please note that we simply make future decisions based on delivery dates and profits. We only track choices so that we can communicate the best set of options at the end.

Now, perhaps you can see the template. You have a set of alternatives, and whenever you have several alternatives having the same delivery date, you simply choose the one that has the maximum profit and eliminates the others. This process of folding your alternatives prevents exponential exponential work, which allows you to effectively solve the problem.

+2
source

You can solve this problem as a linear programming problem. This is a standard approach to solving logistics problems, such as those faced by airlines and corporations, with much larger problem spaces than yours. I will not formally define your problem here, but in a broad sense: your objective function is to maximize profits. You can represent purchase days and “only one purchase per day” as linear constraints.

The standard linear programming algorithm is the simplex method . Although it has exponential worse behavior, in practice it is usually very effective in real-world problems. There are many good freely available implementations. My favorite is the GNU linear programming suite . R has an interface for GLPK . Lp_solve is another well-known project that also has an R-interface . The basic approach in each case is to formally identify your problem, and then pass it on to the third solver to complete your task.

To learn more, I recommend that you take a look at the Algorithm Development Guide , which should give you enough experience to further explore online. p. 412 onwards is a detailed summary of linear programming and its variations (for example, integer programming, if you have integrity constraints).

If you've never heard of linear programming before, you can take a look at some examples of how it can be used. I really like this simple set of tutorial problems in Python. These include maximizing profits on jars of cat food and solving sudoku problems.

+1
source

Source: https://habr.com/ru/post/926716/


All Articles