I have a CSV file with two columns, text and counter. The goal is to convert the file from this:
some text once,1
some text twice,2
some text thrice,3
to this:
some text once,1
some text twice,1
some text twice,1
some text thrice,1
some text thrice,1
some text thrice,1
that is, repeating each row as many times as its count says and spreading the count across that many rows.
This seems like a good candidate for Seq.unfold, generating the additional rows as the file is read. I have the following generator function:
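let expandRows (text:string, number:int32) =
    if number = 0
    then None
    else
        let element = text                   // value emitted into the generated sequence
        Some (element, (text, number - 1))   // emit it, then continue with the count reduced by one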
FSI gives the following function signature:
val expandRows : text:string * number:int32 -> (string * (string * int32)) option
Doing the following in FSI:
let expandedRows = Seq.unfold expandRows ("some text thrice", 3)
gives the expected value:
val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"]
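For reference, here is a minimal sketch of how I understand Seq.unfold to work: it takes a generator function and one initial state, and keeps calling the generator until it returns None. The `countdown` function below is a hypothetical example, not part of my pipeline.

```fsharp
// Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>
// Hypothetical generator, for illustration only.
let countdown (n: int) =
    if n = 0 then None        // None ends the sequence
    else Some (n, n - 1)      // emit n, then continue with state n - 1

// The second argument is a single seed state, not a sequence of states.
let result = Seq.unfold countdown 3 |> Seq.toList
// result = [3; 2; 1]
```

Note that the seed here is a single value, just as ("some text thrice", 3) is a single tuple in the sandbox test.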
The question is: how do I plug this into a larger ETL pipeline? For example:
File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.unfold expandRows
In the pipeline context, the compiler reports the following error on expandRows:
Type mismatch. Expecting a
    'seq<string * int32> -> ('a * seq<string * int32>) option'
but given a
    'string * int32 -> (string * (string * int32)) option'
The type 'seq<string * int32>' does not match the type 'string * int32'
I expected expandRows to return a seq of string, as in my sandbox test. Since that is neither the "Expecting" nor the "given" type, I am confused. Can someone point me in the right direction?
A gist with the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498
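One possible direction, sketched under assumptions rather than offered as a confirmed fix: the error suggests that Seq.unfold treats whatever is piped into it as the single seed state, so here it receives the whole seq<string * int32> instead of one tuple. Unfolding each row separately and flattening the results with Seq.collect might be what is needed. In the sketch below, `parseRow` is a hypothetical stand-in for createTupleWithCount, and an in-memory sequence stands in for File.ReadLines(inFile).

```fsharp
// Generator from the question, repeated so the sketch is self-contained.
let expandRows (text: string, number: int32) =
    if number = 0 then None
    else Some (text, (text, number - 1))

// Hypothetical stand-in for createTupleWithCount:
// splits "text,count" on the last comma.
let parseRow (line: string) =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), int (line.Substring(i + 1)))

// In-memory lines standing in for File.ReadLines(inFile).
let lines = seq [ "some text once,1"; "some text twice,2" ]

let expanded =
    lines
    |> Seq.map parseRow
    // Run the unfold once per row, then flatten the per-row sequences.
    |> Seq.collect (fun row -> Seq.unfold expandRows row)
    |> Seq.map (fun text -> sprintf "%s,1" text)
    |> Seq.toList
// expanded = ["some text once,1"; "some text twice,1"; "some text twice,1"]
```

Seq.collect is used here because each input row produces its own sequence, and the pipeline needs them concatenated into one.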