Automatically cast a column in HtmlProvider<...>.Table.Row

I am using the FSharp.Data HtmlProvider to retrieve table rows:

    open FSharp.Data

    let [<Literal>] URL = "../DailyResultsType.html"
    type RawResults = HtmlProvider<URL>
    let results = RawResults.Load(URL).Tables
    let dailySeq = results.Table2.Rows |> Seq.tail

Looping through the rows (dailySeq is a seq<HtmlProvider<...>.Table2.Row>):

    for row in dailySeq do
        printfn "%A" row

Results in:

    (1, nan, nan, 2)
    (1, nan, nan, 3)
    ~~~

The provider automatically infers columns #2 and #3 as decimal and double, since the HTML contains strings such as "$ 12.00" or "$ 12".

  • Can I change the type of a column in the type returned from HtmlProvider<URL> (e.g. double to string)? I would prefer a numeric type, though, so that I can divide the results.

  • Or can I apply a string conversion at runtime to the values in these columns, removing non-numeric characters so that they become valid decimal / double / int values?

  • Or am I missing a basic concept (most likely, since I am an F# newbie)?

1 answer

I think you should try setting PreferOptionals to true, so that N/A values become None (printed as null in the output below) and the rest of the numbers are wrapped as Some decimal.

    type HtmlType = HtmlProvider<URL, PreferOptionals=true>

or

    type HtmlType = HtmlProvider<URL, PreferOptionals=true, Culture="en-US">

    let results = HtmlType.Load(URL)
    results.Tables.Table1.Rows
    // val it : HtmlProvider<...>.Table1.Row [] =
    //   [|("Jill", "Smith", Some 50.0M); ("Eve", "Jackson", Some 100000M);
    //     ("John", "Doe", Some 100M); ("Jane", "Doe", null)|]
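Since the question mentions dividing the results, here is a minimal sketch of how the optional values could be used afterwards. It does not call the provider: the rows are copied by hand from the output above, and the tuple shape (string * string * decimal option) is an assumption based on that output. The names rows, amounts, total, average and halved are only illustrative.

```fsharp
// Rows copied from the sample output above; with the provider these would
// come from results.Tables.Table1.Rows (tuple shape assumed to match).
let rows : (string * string * decimal option) list =
    [ ("Jill", "Smith", Some 50.0M)
      ("Eve", "Jackson", Some 100000M)
      ("John", "Doe", Some 100M)
      ("Jane", "Doe", None) ]

// Keep only the values that are present (drops the N/A row).
let amounts = rows |> List.choose (fun (_, _, amount) -> amount)

// Plain decimal arithmetic works once the options are unwrapped.
let total = List.sum amounts
let average = List.average amounts

// Alternatively, keep every row and divide inside the option,
// so a missing value simply stays None.
let halved =
    rows
    |> List.map (fun (first, last, amount) ->
        (first, last, amount |> Option.map (fun a -> a / 2M)))
```

List.choose drops the missing values, while Option.map keeps them as None; which one fits depends on whether the N/A rows should still show up in the result.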

If the table has no missing values and I omit PreferOptionals and Culture, I get the following output:

    // val it : HtmlProvider<...>.Table1.Row [] =
    //   [|("Jill", "Smith", 50.0M); ("Eve", "Jackson", 100000M);
    //     ("John", "Doe", 100M)|]

By the way, I could be wrong, but I could not find any way to specify the table schema as you can in the CSV provider. So once I have the data, I would just work with the array of tuples; if there are not many elements, that should be simple. You can convert values with string if necessary, or pass the rows directly to Deedle ( rows |> Frame.ofRecords ).
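If you do end up with the cell text as strings, the clean-up asked about in the question could look roughly like the sketch below. tryParseAmount is a hypothetical helper, not part of FSharp.Data; it assumes the values use '.' as the decimal separator, and it simply throws away every character that is not a digit, '.' or '-' before parsing.

```fsharp
open System
open System.Globalization

// Hypothetical helper (not part of FSharp.Data): strip everything except
// digits, '.' and '-', then try to parse what is left as a decimal.
// Assumes '.' is the decimal separator; thousands separators are dropped.
let tryParseAmount (raw: string) : decimal option =
    let cleaned =
        raw |> String.filter (fun c -> Char.IsDigit c || c = '.' || c = '-')
    match Decimal.TryParse(cleaned, NumberStyles.Number, CultureInfo.InvariantCulture) with
    | true, value -> Some value
    | false, _ -> None

// Strings in the style of the sample table and the question.
let parsed =
    [ "$50.0"; "$100,000"; "$ 12.00"; "N/A" ]
    |> List.map tryParseAmount
// parsed = [Some 50.0M; Some 100000M; Some 12.00M; None]
```

Returning an option instead of raising keeps values such as N/A from breaking the loop; the caller can then decide whether to skip them or replace them with a default.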

I used the following sample table.

    <table style="width:100%">
      <tr>
        <th>Firstname</th>
        <th>Lastname</th>
        <th>Age</th>
      </tr>
      <tr>
        <td>Jill</td>
        <td>Smith</td>
        <td>$50.0</td>
      </tr>
      <tr>
        <td>Eve</td>
        <td>Jackson</td>
        <td>$100,000</td>
      </tr>
      <tr>
        <td>John</td>
        <td>Doe</td>
        <td>$100</td>
      </tr>
      <tr>
        <td>Jane</td>
        <td>Doe</td>
        <td>N/A</td>
      </tr>
    </table>