Here's a solution with an emphasis on custom validation and error handling for each field. This may be overkill for a data file consisting only of numerical data!
First, for this kind of thing I like to use the parser in Microsoft.VisualBasic.dll, as it is already available without using NuGet.
For each row, we can return the line number (for error messages) and an array of fields.
    #r "Microsoft.VisualBasic.dll"

    // for each row, return the line number and the fields
    let parserReadAllFields fieldWidths textReader =
        let parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader=textReader)
        parser.SetFieldWidths fieldWidths
        parser.TextFieldType <- Microsoft.VisualBasic.FileIO.FieldType.FixedWidth
        seq {
            while not parser.EndOfData do
                yield parser.LineNumber, parser.ReadFields()
        }
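To make the shape of the parser output concrete, here is a rough sketch of fixed-width splitting done by hand, without TextFieldParser. The helpers `splitFixedWidth` and `splitAllLines` are hypothetical names used only for illustration. The sketch assumes every line is at least as long as the sum of the widths, and it has none of the malformed-line handling that the real parser provides.

```fsharp
// Hypothetical helper, for illustration only: cut one line into fields by width.
// Assumes the line is at least as long as the sum of the widths.
let splitFixedWidth (fieldWidths: int[]) (line: string) =
    // running start offset of each field, e.g. [|0; 2; 7; 18|] for widths [|2; 5; 11|]
    let offsets = Array.scan (+) 0 fieldWidths
    fieldWidths |> Array.mapi (fun i width -> line.Substring(offsets.[i], width))

// Pair each line with a 1-based line number, giving (lineNo, fields) tuples
let splitAllLines (fieldWidths: int[]) (lines: string seq) =
    lines |> Seq.mapi (fun i line -> int64 (i + 1), splitFixedWidth fieldWidths line)

let demo = splitAllLines [| 2; 5; 11 |] [ "01name1description1" ] |> Seq.toList
printfn "%A" demo
```

Characters beyond the last field width are simply dropped here, so the trailing `1` of `description1` does not appear in the third field.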
Next, we need a small error handling library (for more details see http://fsharpforfunandprofit.com/rop/ )
    type Result<'a> =
        | Success of 'a
        | Failure of string list

    module Result =

        let succeedR x =
            Success x

        let failR err =
            Failure [err]

        // apply f to the value inside a Success; pass a Failure through unchanged
        let mapR f xR =
            match xR with
            | Success a -> Success (f a)
            | Failure errs -> Failure errs

        // apply a wrapped function to a wrapped value, collecting errors from both sides
        let applyR fR xR =
            match fR,xR with
            | Success f,Success x -> Success (f x)
            | Failure errs,Success _ -> Failure errs
            | Success _,Failure errs -> Failure errs
            | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)
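The key property of `applyR` is that it keeps the errors from both sides instead of stopping at the first one. A minimal sketch of that behaviour follows. It repeats the type and the two functions so that it runs on its own, and the `add` function and the error strings are made up for the example.

```fsharp
// Repeats the definitions so the snippet runs on its own
type Result<'a> =
    | Success of 'a
    | Failure of string list

let mapR f xR =
    match xR with
    | Success a -> Success (f a)
    | Failure errs -> Failure errs

let applyR fR xR =
    match fR, xR with
    | Success f, Success x -> Success (f x)
    | Failure errs, Success _ -> Failure errs
    | Success _, Failure errs -> Failure errs
    | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)

// made-up two-argument function to lift over Result values
let add x y = x + y

// both inputs good: the function is applied to the unwrapped values
let good : Result<int> = applyR (mapR add (Success 1)) (Success 2)

// both inputs bad: the two error lists are concatenated
let bad : Result<int> = applyR (mapR add (Failure ["bad x"])) (Failure ["bad y"])

printfn "%A" good   // Success 3
printfn "%A" bad    // Failure ["bad x"; "bad y"]
```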
Then define your domain model. In this case, it is a record type with a field for each field in the file.
type MyRecord = {id:int; name:string; description:string}
Then you can define your domain-specific parsing code. For each field, I created a validation function (validateId, validateName, etc.). Fields that do not need validation can pass the raw data through (validateDescription).
In fieldsToRecord, the validated fields are combined using applicative style (<!> and <*>). See http://fsharpforfunandprofit.com/posts/elevated-world-3/#validation for more on this.
Finally, readRecords maps each input row to a Result and keeps only the successful ones. The errors for bad rows are printed in handleResult.
    module MyFileParser =
        open Result

        let createRecord id name description =
            {id=id; name=name; description=description}

        let validateId (lineNo:int64) (fields:string[]) =
            let rawId = fields.[0]
            match System.Int32.TryParse(rawId) with
            | true, id -> succeedR id
            | false, _ -> failR (sprintf "[%i] Can't parse id '%s'" lineNo rawId)

        let validateName (lineNo:int64) (fields:string[]) =
            let rawName = fields.[1]
            if System.String.IsNullOrWhiteSpace rawName then
                failR (sprintf "[%i] Name cannot be blank" lineNo)
            else
                succeedR rawName

        let validateDescription (lineNo:int64) (fields:string[]) =
            let rawDescription = fields.[2]
            succeedR rawDescription // no validation

        let fieldsToRecord (lineNo,fields) =
            let (<!>) = mapR
            let (<*>) = applyR
            let validatedId = validateId lineNo fields
            let validatedName = validateName lineNo fields
            let validatedDescription = validateDescription lineNo fields
            createRecord <!> validatedId <*> validatedName <*> validatedDescription

        /// print any errors and only return good results
        let handleResult result =
            match result with
            | Success record -> Some record
            | Failure errs -> printfn "ERRORS %A" errs; None

        /// return a sequence of records
        let readRecords parserOutput =
            parserOutput
            |> Seq.map fieldsToRecord
            |> Seq.choose handleResult
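The applicative chain in fieldsToRecord can be hard to read at first, so here is a sketch that breaks it into steps on a smaller case. The `Point` record and `createPoint` are hypothetical stand-ins for MyRecord and createRecord, and the definitions are repeated so the snippet runs on its own. The operators are written with explicit arguments here, which should behave the same as the aliases used above.

```fsharp
type Result<'a> =
    | Success of 'a
    | Failure of string list

let mapR f xR =
    match xR with
    | Success a -> Success (f a)
    | Failure errs -> Failure errs

let applyR fR xR =
    match fR, xR with
    | Success f, Success x -> Success (f x)
    | Failure errs, Success _ -> Failure errs
    | Success _, Failure errs -> Failure errs
    | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)

let (<!>) f xR = mapR f xR
let (<*>) fR xR = applyR fR xR

// Hypothetical two-field record standing in for MyRecord
type Point = { x: int; y: int }
let createPoint x y = { x = x; y = y }

// Step 1: mapping a two-argument function over the first value
// leaves a partially applied function inside the Result
let step1 : Result<int -> Point> = createPoint <!> Success 1

// Step 2: applying the wrapped function to the second value finishes the record
let step2 : Result<Point> = step1 <*> Success 2

// The same thing as a single left-to-right chain
let oneLine : Result<Point> = createPoint <!> Success 1 <*> Success 2

// A failure in any position is carried through to the end
let failed : Result<Point> = createPoint <!> Failure ["bad x"] <*> Success 2

printfn "%A" oneLine
printfn "%A" failed
```

Because both operators are left-associative with the same precedence, a chain with three fields such as the one in fieldsToRecord simply adds one more `<*>` step.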
Here is an example of the parsing in practice:
    // Set up some sample text
    let text = """01name1description1
    02name2description2
    xxname3badid-------
    yy     badidandname
    """

    // create a low-level parser
    let textReader = new System.IO.StringReader(text)
    let fieldWidths = [| 2; 5; 11 |]
    let parserOutput = parserReadAllFields fieldWidths textReader

    // convert to records in my domain
    let records =
        parserOutput
        |> MyFileParser.readRecords
        |> Seq.iter (printfn "RECORD %A") // print each record
The result will look like this:
    RECORD {id = 1; name = "name1"; description = "description";}
    RECORD {id = 2; name = "name2"; description = "description";}
    ERRORS ["[3] Can't parse id 'xx'"]
    ERRORS ["[4] Can't parse id 'yy'"; "[4] Name cannot be blank"]
This is by no means the most efficient way to parse a file (I think there are some CSV parsing libraries available on NuGet that can do validation while parsing), but it does show how you can have complete control over validation and error handling if you need it.