F#: How to use Seq.unfold in the context of a larger pipeline?

I have a CSV file with two columns, text and counter. The goal is to convert the file from this:

    some text once,1
    some text twice,2
    some text thrice,3

To this:

    some text once,1
    some text twice,1
    some text twice,1
    some text thrice,1
    some text thrice,1
    some text thrice,1

repeating each row as many times as its count says, so that the count is spread over that many rows (each output row carries a count of 1).

This seems like a good candidate for Seq.unfold, generating extra lines when we read the file. I have the following generator function:

    let expandRows (text:string, number:int32) =
        if number = 0
        then None
        else
            let element = text                  // "element" will be in the generated sequence
            let nextState = (element, number-1) // threaded state replacing looping
            Some (element, nextState)

FSI gives the following function signature:

 val expandRows : text:string * number:int32 -> (string * (string * int32)) option 

Doing the following in FSI:

 let expandedRows = Seq.unfold expandRows ("some text thrice", 3) 

gives the expected value:

 val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"] 

The question arises: how to connect this in the context of a larger ETL pipeline? For example:

    File.ReadLines(inFile)
    |> Seq.map createTupleWithCount
    |> Seq.unfold expandRows // type mismatch here
    |> Seq.iter outFile.WriteLine

The compiler reports the following error on expandRows in the pipeline context:

    Type mismatch. Expecting a
        'seq<string * int32> -> ('a * seq<string * int32>) option'
    but given a
        'string * int32 -> (string * (string * int32)) option'
    The type 'seq<string * int32>' does not match the type 'string * int32'

I expected expandRows to yield a seq of strings, as in my isolated test. Since that is neither the "Expecting" nor the "given" type, I am confused. Can someone point me in the right direction?

A gist of the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498

3 answers

Seq.map produces a sequence, but Seq.unfold does not take a sequence; it takes a single initial state value. Thus, you cannot pipe the output of Seq.map directly into Seq.unfold. You need to apply it element by element.

But then, for each element, your Seq.unfold will generate a sequence, so the end result will be a sequence of sequences. You can gather all these "subsequences" into one sequence using Seq.collect:

    File.ReadLines(inFile)
    |> Seq.map createTupleWithCount
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.iter outFile.WriteLine

Seq.collect takes a function and an input sequence. For each element of the input sequence, the function is expected to produce another sequence, and Seq.collect concatenates all of those sequences into one. You can think of Seq.collect as Seq.map and Seq.concat combined into one function. Also, if you are coming from C#, Seq.collect is called SelectMany there.
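To see why the types line up, here is a minimal sketch run against in-memory data instead of a file. The sample list `rows` and the name `expandOne` are made up for illustration. Partially applying Seq.unfold to expandRows gives a function from a single tuple to a seq<string>, which is the shape Seq.collect expects:

```fsharp
// expandRows as defined in the question (condensed)
let expandRows (text: string, number: int32) =
    if number = 0 then None
    else Some (text, (text, number - 1))

// Partial application: a function from ONE tuple to a sequence of strings
let expandOne : string * int32 -> seq<string> = Seq.unfold expandRows

// Hypothetical sample input standing in for the parsed CSV rows
let rows = [ ("some text once", 1); ("some text twice", 2) ]

// Seq.collect maps each tuple to a sequence and flattens the results
let collected = rows |> Seq.collect expandOne |> Seq.toList

// The same computation spelled out as Seq.map followed by Seq.concat
let mappedThenConcat = rows |> Seq.map expandOne |> Seq.concat |> Seq.toList
```

Both `collected` and `mappedThenConcat` should hold the same flattened list of strings.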


In this case, since you just want to repeat the value several times, there is no reason to use Seq.unfold. Instead, you can use Seq.replicate:

    // 'a * int -> seq<'a>
    let expandRows (text, number) = Seq.replicate number text

You can use Seq.collect to compose it into the pipeline:

    File.ReadLines(inFile)
    |> Seq.map createTupleWithCount
    |> Seq.collect expandRows
    |> Seq.iter outFile.WriteLine
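Note that the desired output in the question keeps a count of 1 on every row, while the pipeline above writes only the text. Below is a minimal sketch of the whole pipeline on in-memory lines. createTupleWithCount is not shown in the question, so the parser here is a hypothetical stand-in that splits on the last comma, and the final Seq.map that appends ",1" is likewise an assumption about the intended output format.

```fsharp
// Hypothetical stand-in for the question's createTupleWithCount:
// splits "text,count" on the last comma (assumes well-formed input).
let createTupleWithCount (line: string) =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), int (line.Substring(i + 1)))

// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text

// In-memory lines standing in for File.ReadLines(inFile)
let input = [ "some text once,1"; "some text twice,2"; "some text thrice,3" ]

let output =
    input
    |> Seq.map createTupleWithCount
    |> Seq.collect expandRows
    |> Seq.map (fun text -> text + ",1") // assumed output format: count reset to 1
    |> Seq.toList
```

In the real pipeline, the `Seq.toList` at the end would be replaced by `Seq.iter outFile.WriteLine`, which keeps the processing lazy and line by line.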

In fact, the only work done by this version of expandRows is to "unpack" the tuple and pass its values on in curried form.

While F# does not have such a generic function in its core library, you can easily define it (along with other similarly useful functions):

    module Tuple2 =
        let curry f x y = f (x, y)
        let uncurry f (x, y) = f x y
        let swap (x, y) = (y, x)

This lets you compose your pipeline from well-known functional building blocks:

    File.ReadLines(inFile)
    |> Seq.map createTupleWithCount
    |> Seq.collect (Tuple2.swap >> Tuple2.uncurry Seq.replicate)
    |> Seq.iter outFile.WriteLine
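As a quick sanity check of that composition, here is a small sketch with the Tuple2 module repeated so the snippet is self-contained. swap turns (text, count) into (count, text), and uncurry lets Seq.replicate, which takes the count first and the value second, accept that pair. The name `expand` and the sample tuple are made up for illustration.

```fsharp
module Tuple2 =
    let curry f x y = f (x, y)
    let uncurry f (x, y) = f x y
    let swap (x, y) = (y, x)

// (text, count) -> (count, text) -> Seq.replicate count text
let expand : string * int -> seq<string> =
    Tuple2.swap >> Tuple2.uncurry Seq.replicate

// Hypothetical sample tuple
let result = expand ("some text twice", 2) |> Seq.toList
```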

It looks like what you really want to do is

    File.ReadLines(inFile)
    |> Seq.map createTupleWithCount
    |> Seq.map (Seq.unfold expandRows) // Map each tuple to a seq<string>
    |> Seq.concat                      // Flatten the seq<seq<string>> to seq<string>
    |> Seq.iter outFile.WriteLine

since it seems like you want to convert each (text, count) tuple in your sequence to a seq<string> via Seq.unfold and expandRows. This is done by mapping.

Then you want to flatten your seq<seq<string>> into one large seq<string>, which is done via Seq.concat.
