IEnumerable<IDisposable>: who manages what and when - did I get it right?

Question


Here is a hypothetical scenario.

I have a very large number of usernames (for example, 10,000,000,000,000,000,000,000,000,000. Yes, we are at an intergalactic age :)). Each user has his own database. I need to iterate over a list of users and execute an SQL query on each of the databases and print the results.

Since I have learned about the goodness of functional programming, and because I am dealing with so many users, I decided to implement this using F# and pure sequences (a.k.a. IEnumerable). And here I go.

    // gets the list of user names
    let users() : seq<string> = ...

    // maps user name to the SqlConnection
    let mapUsersToConnections (users: seq<string>) : seq<SqlConnection> = ...

    // executes some sql against the given connection and returns some result
    let mapConnectionToResult (conn) : seq<string> = ...

    // print the result
    let print (result) : unit = ...

    // and here is the main program
    users()
    |> mapUsersToConnections
    |> Seq.map mapConnectionToResult
    |> Seq.iter print

Beautiful? Elegant? That's right.

But who disposes of the SqlConnections, and at what moment?

I don't think the answer "mapConnectionToResult should do it" is right, because that function knows nothing about the lifetime of the connection it is given. Things may or may not work, depending on how mapUsersToConnections is implemented and on various other factors.

Since mapUsersToConnections is the only other place that has access to the connection, it must be the one responsible for disposing of the SQL connection.

In F#, this can be done as follows:

    // implementation where we return the same connection for each user
    let mapUsersToConnections (users) : seq<SqlConnection> =
        seq {
            use conn = new SqlConnection()
            for u in users do
                yield conn
        }

    // implementation where we return new connection for each user
    let mapUsersToConnections (users) : seq<SqlConnection> =
        seq {
            for u in users do
                use conn = new SqlConnection()
                yield conn
        }

C# equivalent:

    // C# -- same connection for all users
    IEnumerable<SqlConnection> mapUsersToConnections(IEnumerable<string> users)
    {
        using (var conn = new SqlConnection())
            foreach (var u in users)
            {
                yield return conn;
            }
    }

    // C# -- new connection for each user
    IEnumerable<SqlConnection> mapUsersToConnections(IEnumerable<string> users)
    {
        foreach (var u in users)
            using (var conn = new SqlConnection())
            {
                yield return conn;
            }
    }

The tests I ran suggest that the objects are disposed of at the right points, even when things are executed in parallel: once at the end of the whole iteration for the shared connection, and after each iteration cycle for the non-shared connections.

So QUESTION: Did I get it right?

EDIT:

Some answers kindly pointed out errors in the code, and I have made some corrections. A complete working example that compiles is below.
SqlConnection is used for example purposes only; it could really be any IDisposable.

An example that compiles

    open System

    // Stand-in for SqlConnection
    type SimpeDisposable() =
        member this.getResults() = "Hello"
        interface IDisposable with
            member this.Dispose() = printfn "Disposing"

    // Alias SqlConnection to our dummy
    type SqlConnection = SimpeDisposable

    // gets the list of user names
    let users() : seq<string> =
        seq {
            for i = 0 to 100 do
                yield i.ToString()
        }

    // maps user names to the SqlConnections
    // this one uses one shared connection for each user
    let mapUsersToConnections (users: seq<string>) : seq<SqlConnection> =
        seq {
            use c = new SimpeDisposable()
            for u in users do
                yield c
        }

    // maps user names to the SqlConnections
    // this one uses new connection per each user
    let mapUsersToConnections2 (users: seq<string>) : seq<SqlConnection> =
        seq {
            for u in users do
                use c = new SimpeDisposable()
                yield c
        }

    // executes some "sql" against the given connection and returns some result
    let mapConnectionToResult (conn:SqlConnection) : string = conn.getResults()

    // print the result
    let print (result) : unit = printfn "%A" result

    // and here is the main program - using shared connection
    printfn "Using shared connection"
    users()
    |> mapUsersToConnections
    |> Seq.map mapConnectionToResult
    |> Seq.iter print

    // and here is the main program - using individual connections
    printfn "Using individual connection"
    users()
    |> mapUsersToConnections2
    |> Seq.map mapConnectionToResult
    |> Seq.iter print

Results:

Shared connection: "Hello" "Hello" ... "Disposing"

Individual connections: "Hello" "Disposing" "Hello" "Disposing"

+8

c# functional-programming idisposable f# sequences

Komrade P. Jul 21 '11 at 15:51


8 answers

    // C# -- new connection for each users
    IEnumerable<SqlConnection> mapUserToConnection(string user)
    {
        while (true)
            using (var conn = new SqlConnection())
            {
                yield return conn;
            }
    }

This does not look right to me: you would dispose of a connection as soon as the next user asks for a new one (on the next iteration cycle). That means these connections can only be used one after another: as soon as user B starts working with his connection, user A's connection is disposed. Is this really what you want?

+4

Brokenglass Jul 21 '11 at 15:57


The F# sample does not type-check (even if you add dummy implementations to your functions, for example using failwith). I assume that your userToConnection and connectionToResult functions actually take a single user to a single connection and a single connection to a single result (instead of working with sequences, as in your example):

    // gets the list of user names
    let users() : seq<string> = failwith "!"

    // maps user name to the SqlConnection
    let userToConnection (user:string) : SqlConnection = failwith "!"

    // executes some sql against the given connection and returns some result
    let connectionToResult (conn:SqlConnection) : string = failwith "!"

    // print the result
    let print (result:string) : unit = ()

Now, if you want to keep disposal of the connection private to userToConnection, you can change it so that it does not return a SqlConnection. Instead, it can return a higher-order function that provides the connection to some function (to be specified in the next step) and, after calling that function, disposes of the connection. Something like:

    let userToConnection (user:string) (action:SqlConnection -> 'R) : 'R =
        use conn = new SqlConnection("...")
        action conn

You can use currying, so when you write userToConnection user, you get a function that expects a function and returns the result: (SqlConnection -> 'R) -> 'R. Then you can compose your overall function like this:

    // and here is the main program
    users()
    |> Seq.map userToConnection
    |> Seq.map (fun f ->
        // We got a function that we can provide with our specific behavior
        // it runs it (giving it the connection) and then closes connection
        f connectionToResult)
    |> Seq.iter print

I'm not quite sure whether you want to map one user to one connection, etc., but you can use exactly the same principle (returning functions) even if you work with collections of collections.

+4

Tomas Petricek Jul 21 '11 at 16:17


I think there is a lot of room for improvement here. It does not look like your code should compile, since mapUserToConnection returns a sequence while mapConnectionToResult accepts a single connection (changing your map to collect would fix this).

It is not clear to me whether a user should map to multiple connections or whether there is one connection per user. In the latter case, returning a singleton sequence for each user seems like overkill.

It is usually a bad idea to return an IDisposable from a sequence, since you cannot control when an item is disposed. A better approach is to limit the scope of an IDisposable to one function. This "controlling" function can accept a callback that uses the resource, and once the callback has run, it can dispose of the resource (the using function is an example of this). In your case, combining mapUserToConnection and mapConnectionToResult could fix the problem entirely, since the combined function can control the connection's lifetime.

You would end up with something like this:

    users
    |> Seq.map mapUserToResult
    |> Seq.iter print

where mapUserToResult is string -> string (it accepts a user and returns a result, thereby controlling the lifetime of each connection).
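A minimal sketch of that shape, under stated assumptions: FakeConnection is a hypothetical stand-in for SqlConnection, openConnection is an invented helper, and the "query result" is made up purely for illustration.

```fsharp
open System

/// Hypothetical stand-in for SqlConnection (not a real ADO.NET type).
type FakeConnection(user: string) =
    member this.Query() = sprintf "result for %s" user
    interface IDisposable with
        member this.Dispose() = printfn "Disposing connection for %s" user

/// Assumed helper: opens a connection to the given user's database.
let openConnection (user: string) = new FakeConnection(user)

/// string -> string: the connection never escapes this function,
/// so it is disposed as soon as the result has been computed.
let mapUserToResult (user: string) : string =
    use conn = openConnection user
    conn.Query()

let print (result: string) : unit = printfn "%s" result

let users : seq<string> = seq { yield "alice"; yield "bob" }

users
|> Seq.map mapUserToResult
|> Seq.iter print
```

Because the `use` binding ends when mapUserToResult returns, each connection is disposed before its result reaches print, and no consumer of the sequence ever sees a connection at all.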

+3

Daniel Jul 21 '11 at 16:14


None of this looks quite right to me. For example, why are you returning a sequence of connections for a single username? Didn't you want your signature to look something like this (written as an extension method for LINQ-ness):

    IEnumerable<SqlConnection> mapUserToConnection(this IEnumerable<string> Usernames)

Anyway, moving on - in the first example:

    using (var conn = new SqlConnection())
    {
        while (true)
        {
            yield return conn;
        }
    }

This will work, but only if the entire collection is enumerated. If (for example) only the first element is consumed, then the connection will not be disposed (at least in C#); see "Yield and usings - your Dispose may not be called!".
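Whether the cleanup runs after a partial enumeration seems to hinge on whether the consumer disposes the enumerator. The sketch below illustrates this for an F# sequence expression; TrackedConnection and sharedForever are hypothetical names introduced only for this example, and the behavior shown is for F# `seq` with `use`, not a claim about every C# consumer.

```fsharp
open System

/// Hypothetical stand-in for a connection that remembers whether it was disposed.
type TrackedConnection() =
    let mutable disposed = false
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

/// One shared connection, yielded forever (the shape of the first example).
let sharedForever (conn: TrackedConnection) : seq<TrackedConnection> =
    seq {
        use c = conn
        while true do
            yield c
    }

// Case 1: Seq.head disposes the enumerator it creates, so the cleanup runs
// even though only the first element was consumed.
let conn1 = new TrackedConnection()
let first = sharedForever conn1 |> Seq.head
let disposedAfterHead = conn1.IsDisposed

// Case 2: a hand-driven enumerator that has not been disposed leaves the
// connection alive until someone calls Dispose on the enumerator.
let conn2 = new TrackedConnection()
let e = (sharedForever conn2).GetEnumerator()
e.MoveNext() |> ignore
let disposedWhileSuspended = conn2.IsDisposed
e.Dispose()
let disposedAfterEnumeratorDispose = conn2.IsDisposed
```

In this sketch the risk lies with consumers that drive the enumerator by hand and forget to dispose it; library functions such as Seq.head, and `for ... in` loops, take care of that step themselves.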

The second example seems fine to me, but I have had problems with code that did a similar thing, where connections ended up being disposed when they shouldn't have been.

As a rule, I have found that combining dispose and yield return is a tricky business, and I try to avoid it in favor of implementing my own enumerator, which explicitly implements both IDisposable and IEnumerable. That way you can be sure when the objects will be disposed.

+1

Justin Jul 21 '11 at 16:12


Dispose should be called by someone who can guarantee that the object is no longer in use. If you can make that guarantee (say, the object is only ever used inside your method), then it is your job to dispose of it. If you cannot guarantee that the object is finished with (say, you are exposing the objects through an iterator), then it is your job not to worry about it and to let the consumer deal with it.

As a possible design, you can follow what the CLR does for Stream instances. Many constructors that take a Stream also accept a bool. If this bool is true, the object knows that it is responsible for disposing of the Stream once it is done with it. If you return an iterator, you could return a tuple of type (Disposable, bool) instead.
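The tuple idea could be sketched roughly as follows. Everything here is an assumption made for illustration: TrackedResource is a hypothetical stand-in, and the convention that `true` means "the consumer owns the item and must dispose it" is just one possible reading of the flag.

```fsharp
open System

/// Hypothetical stand-in resource that remembers whether it was disposed.
type TrackedResource(name: string) =
    let mutable disposed = false
    member this.Name = name
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

let sharedResource = new TrackedResource("shared")
let privateResource = new TrackedResource("private")

/// Each item carries a flag: true means "the consumer owns this and must dispose it".
let resources : seq<TrackedResource * bool> =
    seq {
        yield (sharedResource, false) // producer keeps ownership
        yield (privateResource, true) // ownership passes to the consumer
    }

/// A consumer that honors the flag: it disposes only the items it was told it owns.
let consume (items: seq<TrackedResource * bool>) : string list =
    items
    |> Seq.map (fun (resource, consumerOwns) ->
        try
            resource.Name
        finally
            if consumerOwns then (resource :> IDisposable).Dispose())
    |> Seq.toList

let names = consume resources
```

After consume runs, only the resource flagged as consumer-owned has been disposed; the producer remains responsible for disposing sharedResource at a time of its own choosing.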

However, I would take a deeper look at the actual problem you are facing. Perhaps instead of worrying about such things, you should change your architecture to avoid these problems. For example, instead of having a database per user, have a single database. Or perhaps you need connection pooling to reduce the burden of live but inactive connections (I am not 100% sure about this last one; research it).

+1

Guvante Jul 21 '11 at 17:09


Trying to solve this problem using only functional constructs is, IMO, a good example of a pitfall of F#. Purely functional languages usually use immutable data structures. F#, being based on .NET, often cannot, which is frequently a good thing for concerns like performance.

My solution to this problem is to isolate the imperative bit of creating and disposing of the SqlConnection object in its own function. In this case, we will use useUserConnection for that:

    let users() : seq<string> = // ...

    /// Takes a function that uses a user connection to the database
    let useUserConnection connectionUser user =
        use conn = // ...
        connectionUser conn

    let mapConnectionToResult conn = // ... *conn is not disposed of here*

    // Function currying is used here
    let mapUserToResult = useUserConnection mapConnectionToResult

    let print result = // ...

    // Main program
    users()
    |> Seq.map mapUserToResult
    |> Seq.iter print

0

Tristan st-cyr Jul 21 '11 at 18:12


I believe there is a design problem here. If you look at the problem statement, it is about getting information about users. A user is represented as a string, and the information is also represented as a string. So what we need is a function like:

    let getUserInfo (u:string) : string = <some code here>

Using this is as simple as:

    users() |> Seq.map getUserInfo

Now, how this function obtains the information about the user is up to the function itself. Whether it uses a SqlConnection, a file stream, or any other object that may or may not be disposable, the function is responsible for creating the connection and handling the resources properly. In your code, you completely separate creating the connection from fetching the information, which causes the confusion about who disposes of the connection.

Now, if you want to use a single connection shared by all getUserInfo calls, you can write the function as

    let getUserInfoFromConn (c:SqlConnection) (u:string) : string = <some code here>

Now this function accepts the connection (or it could accept any other disposable object). In this case, the function does not dispose of the connection object; the caller of the function disposes of it. We can use it like this:

    use conn = new SqlConnection()
    users() |> Seq.map (conn |> getUserInfoFromConn)

All this makes it clear who handles the resources.
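One caveat with the shared-connection shape is that Seq.map is lazy, so the mapped sequence has to be consumed while the connection is still in scope. A minimal sketch, assuming a hypothetical FakeConnection stand-in (with an invented Fetch method) and a wrapper function getAllUserInfo that is not part of the original answer:

```fsharp
open System

/// Hypothetical stand-in for SqlConnection that refuses to work once disposed.
type FakeConnection() =
    let mutable disposed = false
    member this.Fetch(user: string) : string =
        if disposed then raise (ObjectDisposedException("FakeConnection"))
        sprintf "info for %s" user
    interface IDisposable with
        member this.Dispose() = disposed <- true

let getUserInfoFromConn (c: FakeConnection) (u: string) : string = c.Fetch u

let users () : seq<string> = seq { yield "alice"; yield "bob" }

/// The caller owns the connection: it opens it, forces the lazy sequence
/// while the connection is still open, and disposes it on the way out.
let getAllUserInfo () : string list =
    use conn = new FakeConnection()
    users ()
    |> Seq.map (getUserInfoFromConn conn)
    |> Seq.toList // force evaluation before conn leaves scope

let infos = getAllUserInfo ()
```

Returning the unevaluated sequence from getAllUserInfo instead would defer every Fetch call until after conn had already been disposed.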

0

Ankur Jul 22 '11 at 6:17


Dax fohl · Accepted Answer · 2011-07-21T16:30:24+0000

I would avoid this approach, because the structure would fail if an unwitting user of your library did something like

    users()
    |> Seq.map userToCxn
    |> Seq.toList // oops, disposes connections
    |> List.map .... // uses disposed cxns
    . . ..

I am not an expert on the matter, but I would presume it is best practice not to have sequences / IEnumerables mess with things after they have yielded them, because an intermediate ToList() call will then produce different results than acting on the sequence directly: DoSomething(GetMyStuff()) will behave differently from DoSomething(GetMyStuff().ToList()).
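The difference can be made concrete with a small sketch. It reuses the per-user shape from the question; TrackedConnection is a hypothetical stand-in introduced only so that the disposal state can be observed.

```fsharp
open System

/// Hypothetical stand-in for a connection that remembers whether it was disposed.
type TrackedConnection() =
    let mutable disposed = false
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

/// Same shape as the per-user sequence in the question: one connection per item,
/// disposed when the consumer asks for the next item.
let connections (users: seq<string>) : seq<TrackedConnection> =
    seq {
        for _ in users do
            use c = new TrackedConnection()
            yield c
    }

let users = [ "a"; "b"; "c" ]

// Streaming: each connection is still alive at the moment it is used.
let aliveWhileStreaming =
    connections users
    |> Seq.map (fun c -> not c.IsDisposed)
    |> Seq.forall id

// Materializing first: by the time the list exists, every connection is disposed.
let disposedAfterToList =
    connections users
    |> Seq.toList
    |> List.forall (fun c -> c.IsDisposed)
```

Both flags come out true here: the same sequence hands out live connections when streamed and dead ones when buffered, which is exactly the surprise described above.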

Actually, why not just use sequence expressions for the whole thing, since that sidesteps the problem entirely:

    seq {
        for user in users do
            use cxn = userToCxn user
            yield cxnToResult cxn
    }

(Where userToCxn and cxnToResult are both simple one-to-one, non-disposing functions.) This seems more readable than anything else and should give the desired results; it is parallelizable and works for any disposable. It can be translated into C# LINQ using the following technique: http://solutionizing.net/2009/07/23/using-idisposables-with-linq/

    from user in users
    from cxn in UserToCxn(user).Use()
    select CxnToResult(cxn)

Another take on this is to first define a "getSomethingForAUserAndDisposeTheResource" function and then use it as your basic building block:

    let getUserResult selector user =
        use cxn = userToCxn user
        selector cxn

Once you have this, you can easily build from there:

    // first create a selector
    let addrSelector cxn = cxn.Address()
    // then use it like this:
    let user1Address1 = getUserResult addrSelector user1
    // or more idiomatically:
    let user1Address2 = user1 |> getUserResult addrSelector
    // or just query dynamically!
    let user1Address3 = user1 |> getUserResult (fun cxn -> cxn.Address())

    // it can be used with Seq.map easily too.
    let addresses1 = users |> Seq.map (getUserResult (fun cxn -> cxn.Address()))
    let addresses2 = users |> Seq.map (getUserResult addrSelector)

    // if you are tired of Seq.map everywhere, it's easy to create your own map function
    let userCxnMap selector = Seq.map <| getUserResult selector
    // use it like this:
    let addresses3 = users |> userCxnMap (fun cxn -> cxn.Address())
    let addresses4 = users |> userCxnMap addrSelector

This way you are not committed to retrieving the entire sequence if all you want is a single user. I think the lesson learned here is to keep your core functions simple, which makes it easier to build abstractions on top of them. And note that none of these options will fail if you do a ToList somewhere in the middle.
