Get data type from values passed as a string

I am writing a framework that will connect to many different types of data sources and return values from them. The easy ones are SQL, Access, and Oracle. The harder ones are SharePoint and CSV.

If I return values from text-based sources, I would like to determine the data type of the data.

Since a CSV is all text, there is no metadata to interrogate, so I will need to analyze the data itself to determine the data type.

Example:

The list "true", "true", "false", "false" would be boolean.
The list "1", "0", "1", "0" would be boolean.
The list "1", "4", "-10", "500" would be integer.
The list "15.2", "2015.5896", "1.0245", "500" would be double.
The list "2001/01/01", "2010/05/29 12:00", "1989/12/25 10:34:21" would be datetime.

What I have so far is based on https://stackoverflow.com/a/312947/

    object ParseString(string str)
    {
        Int32 intValue;
        Int64 bigintValue;
        double doubleValue;
        bool boolValue;
        DateTime dateValue;

        // Place checks higher in if-else statement to give higher priority to type.
        if (Int32.TryParse(str, out intValue))
            return intValue;
        else if (Int64.TryParse(str, out bigintValue))
            return bigintValue;
        else if (double.TryParse(str, out doubleValue))
            return doubleValue;
        else if (bool.TryParse(str, out boolValue))
            return boolValue;
        else if (DateTime.TryParse(str, out dateValue))
            return dateValue;
        else
            return str;
    }

Edit: I only need to cater for the following types:

    BIT
    DATETIME
    INT
    NVARCHAR(255)
    NVARCHAR(MAX)
    BIGINT
    DECIMAL(36, 17)

Can you see any possible improvement to the order of the checks?
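One way to read the examples and the type list above is as a per-column decision: a type is chosen only if every value in the column parses as that type. The following is a minimal sketch under that assumption, not a definitive implementation. The names `SqlTypeGuesser` and `Guess` are hypothetical, the invariant culture is an arbitrary choice, and the code uses C# 7 `out _` discards. Note that .NET `decimal` holds fewer digits than `DECIMAL(36, 17)`, so very long numbers would fall through to a string type here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SqlTypeGuesser
{
    // Hypothetical helper: returns the first SQL type, tried from narrowest
    // to widest, that every non-blank value in the column can be parsed as.
    public static string Guess(IEnumerable<string> values)
    {
        var items = values.Where(v => !string.IsNullOrWhiteSpace(v))
                          .Select(v => v.Trim())
                          .ToList();

        // Assumption: a column with no usable values is treated as short text.
        if (items.Count == 0) return "NVARCHAR(255)";

        var inv = CultureInfo.InvariantCulture;

        // "0"/"1" and "true"/"false" are both treated as bit values.
        if (items.All(v => v == "0" || v == "1" || bool.TryParse(v, out _)))
            return "BIT";
        if (items.All(v => int.TryParse(v, NumberStyles.Integer, inv, out _)))
            return "INT";
        if (items.All(v => long.TryParse(v, NumberStyles.Integer, inv, out _)))
            return "BIGINT";
        if (items.All(v => decimal.TryParse(v, NumberStyles.Number, inv, out _)))
            return "DECIMAL(36, 17)";
        if (items.All(v => DateTime.TryParse(v, inv, DateTimeStyles.None, out _)))
            return "DATETIME";

        // Nothing narrower fits, so pick the string type by the longest value.
        return items.Max(v => v.Length) <= 255 ? "NVARCHAR(255)" : "NVARCHAR(MAX)";
    }
}
```

Because each check runs over the whole column, a single "-10" among "0" and "1" values moves the column from BIT to INT instead of producing mixed per-value results.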

+7
5 answers

I came up with the following solution that works:

    enum dataType
    {
        System_Boolean = 0,
        System_Int32 = 1,
        System_Int64 = 2,
        System_Double = 3,
        System_DateTime = 4,
        System_String = 5
    }

    private dataType ParseString(string str)
    {
        bool boolValue;
        Int32 intValue;
        Int64 bigintValue;
        double doubleValue;
        DateTime dateValue;

        // Place checks higher in if-else statement to give higher priority to type.
        if (bool.TryParse(str, out boolValue))
            return dataType.System_Boolean;
        else if (Int32.TryParse(str, out intValue))
            return dataType.System_Int32;
        else if (Int64.TryParse(str, out bigintValue))
            return dataType.System_Int64;
        else if (double.TryParse(str, out doubleValue))
            return dataType.System_Double;
        else if (DateTime.TryParse(str, out dateValue))
            return dataType.System_DateTime;
        else
            return dataType.System_String;
    }

    /// <summary>
    /// Gets the datatype for the Datacolumn column
    /// </summary>
    /// <param name="column">Datacolumn to get datatype of</param>
    /// <param name="dt">DataTable to get datatype from</param>
    /// <param name="colSize">ref value to return size for string type</param>
    /// <returns></returns>
    public Type GetColumnType(DataColumn column, DataTable dt, ref int colSize)
    {
        Type T;
        DataView dv = new DataView(dt);

        // get smallest and largest values
        string colName = column.ColumnName;

        dv.RowFilter = "[" + colName + "] = MIN([" + colName + "])";
        DataTable dtRange = dv.ToTable();
        string strMinValue = dtRange.Rows[0][column.ColumnName].ToString();
        int minValueLevel = (int)ParseString(strMinValue);

        dv.RowFilter = "[" + colName + "] = MAX([" + colName + "])";
        dtRange = dv.ToTable();
        string strMaxValue = dtRange.Rows[0][column.ColumnName].ToString();
        int maxValueLevel = (int)ParseString(strMaxValue);
        colSize = strMaxValue.Length;

        // get the max type level of the first rows (at most 50);
        // Math.Min keeps the loop inside the table when it has fewer than 50 rows
        int sampleSize = Math.Min(dt.Rows.Count, 50);
        int maxLevel = Math.Max(minValueLevel, maxValueLevel);

        for (int i = 0; i < sampleSize; i++)
        {
            maxLevel = Math.Max((int)ParseString(dt.Rows[i][column].ToString()), maxLevel);
        }

        string enumCheck = ((dataType)maxLevel).ToString();
        T = Type.GetType(enumCheck.Replace('_', '.'));

        // if type level = int32, check for bit-only data and cast to bool
        if (maxLevel == 1 && Convert.ToInt32(strMinValue) == 0 && Convert.ToInt32(strMaxValue) == 1)
        {
            T = Type.GetType("System.Boolean");
        }

        if (maxLevel != 5)
            colSize = -1;

        return T;
    }
+12

Since Dimi set a bounty and wants a more "modern" solution, I will try to provide one. First, what do we need from a reasonable class that converts strings to different things?

Reasonable behavior with basic types.

Respect for culture information, especially when converting numbers and dates.

The ability to extend logic with custom converters, if necessary.

As a bonus, avoid long if-chains as they are quite error prone.

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using System.Text.RegularExpressions;

    public class StringConverter
    {
        // delegate for TryParse(string, out T)
        public delegate bool TypedConvertDelegate<T>(string value, out T result);

        // delegate for TryParse(string, out object)
        private delegate bool UntypedConvertDelegate(string value, out object result);

        private readonly List<UntypedConvertDelegate> _converters = new List<UntypedConvertDelegate>();

        // default converter, lazily initialized
        private static readonly Lazy<StringConverter> _default = new Lazy<StringConverter>(CreateDefault, true);

        public static StringConverter Default => _default.Value;

        private static StringConverter CreateDefault()
        {
            var d = new StringConverter();
            // add reasonable default converters for common .NET types. Don't forget to take culture
            // into account, that's important when parsing numbers and dates.
            d.AddConverter<bool>(bool.TryParse);
            d.AddConverter((string value, out byte result) => byte.TryParse(value, NumberStyles.Integer, d.Culture, out result));
            d.AddConverter((string value, out short result) => short.TryParse(value, NumberStyles.Integer, d.Culture, out result));
            d.AddConverter((string value, out int result) => int.TryParse(value, NumberStyles.Integer, d.Culture, out result));
            d.AddConverter((string value, out long result) => long.TryParse(value, NumberStyles.Integer, d.Culture, out result));
            d.AddConverter((string value, out float result) => float.TryParse(value, NumberStyles.Number, d.Culture, out result));
            d.AddConverter((string value, out double result) => double.TryParse(value, NumberStyles.Number, d.Culture, out result));
            d.AddConverter((string value, out DateTime result) => DateTime.TryParse(value, d.Culture, DateTimeStyles.None, out result));
            return d;
        }

        // culture used by the default converters above
        public CultureInfo Culture { get; set; } = CultureInfo.CurrentCulture;

        public void AddConverter<T>(Predicate<string> match, Func<string, T> converter)
        {
            // create converter from match predicate and convert function
            _converters.Add((string value, out object result) =>
            {
                if (match(value))
                {
                    result = converter(value);
                    return true;
                }
                result = null;
                return false;
            });
        }

        public void AddConverter<T>(Regex match, Func<string, T> converter)
        {
            // create converter from match regex and convert function
            _converters.Add((string value, out object result) =>
            {
                if (match.IsMatch(value))
                {
                    result = converter(value);
                    return true;
                }
                result = null;
                return false;
            });
        }

        public void AddConverter<T>(TypedConvertDelegate<T> constructor)
        {
            // create converter from typed TryParse(string, out T) function
            _converters.Add(FromTryPattern<T>(constructor));
        }

        public bool TryConvert(string value, out object result)
        {
            if (this != Default)
            {
                // if this is not the default converter, first try to convert with the default one
                if (Default.TryConvert(value, out result))
                    return true;
            }

            // then use local converters. Any will return after the first match
            object tmp = null;
            bool anyMatch = _converters.Any(c => c(value, out tmp));
            result = tmp;
            return anyMatch;
        }

        private static UntypedConvertDelegate FromTryPattern<T>(TypedConvertDelegate<T> inner)
        {
            return (string value, out object result) =>
            {
                T tmp;
                if (inner.Invoke(value, out tmp))
                {
                    result = tmp;
                    return true;
                }
                else
                {
                    result = null;
                    return false;
                }
            };
        }
    }

Use it like this:

    static void Main(string[] args)
    {
        // set culture to invariant
        StringConverter.Default.Culture = CultureInfo.InvariantCulture;

        // add custom converter to default, it will match strings starting with CUSTOM: and return MyCustomClass
        StringConverter.Default.AddConverter(c => c.StartsWith("CUSTOM:"), c => new MyCustomClass(c));

        var items = new[] {"1", "4343434343", "3.33", "true", "false", "2014-10-10 22:00:00", "CUSTOM: something"};
        foreach (var item in items)
        {
            object result;
            if (StringConverter.Default.TryConvert(item, out result))
            {
                Console.WriteLine(result);
            }
        }

        // create new non-default converter
        var localConverter = new StringConverter();

        // add custom converter to parse json which matches schema for MySecondCustomClass
        localConverter.AddConverter((string value, out MySecondCustomClass result) =>
            TryParseJson(value, @"{'value': {'type': 'string'}}", out result));
        {
            object result;
            // check if that works
            if (localConverter.TryConvert("{value: \"Some value\"}", out result))
            {
                Console.WriteLine(((MySecondCustomClass) result).Value);
            }
        }
        Console.ReadKey();
    }

    static bool TryParseJson<T>(string json, string rawSchema, out T result) where T : new()
    {
        // we are using Newtonsoft.Json here
        var parsedSchema = JsonSchema.Parse(rawSchema);
        JObject jObject = JObject.Parse(json);
        if (jObject.IsValid(parsedSchema))
        {
            result = JsonConvert.DeserializeObject<T>(json);
            return true;
        }
        else
        {
            result = default(T);
            return false;
        }
    }

    class MyCustomClass
    {
        public MyCustomClass(string value)
        {
            this.Value = value;
        }

        public string Value { get; private set; }
    }

    public class MySecondCustomClass
    {
        public string Value { get; set; }
    }
+8
    List<Type> types = new List<Type>(new Type[] {
        typeof(Boolean),
        typeof(int),
        typeof(double),
        typeof(DateTime)
    });
    string t = "true";
    object retu = null;
    foreach (Type type in types)
    {
        TypeConverter tc = TypeDescriptor.GetConverter(type);
        if (tc != null)
        {
            try
            {
                // the first type that converts without throwing gives your return value
                retu = tc.ConvertFromString(t);
                break;
            }
            catch (Exception)
            {
                continue;
            }
        }
    }
+3

Wouldn't it be easier to store the value in a generic data type and use .ToInt16(), .ToInt32(), .ToBool(), etc.? If you write an application that expects an int and it receives a boolean, it will fail, so it would be better to let the programmer explicitly convert to the expected data type.

The problem with your approach is that you don't know whether a column containing 0 as its first item will contain -100000 as item number 100. This means that you cannot perform a successful conversion until every row has been run through TryParse for every candidate data type. That is a very expensive operation!

If anything, I would use precompiled regular expressions and/or custom logic to process the data: for example, iterating over all rows to find the highest and lowest number, the occurrence of strings, and so on.
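As a rough illustration of that idea, the sketch below profiles a column in a single pass with one precompiled regular expression, tracking the lowest and highest integer seen and whether any non-integer value occurred. It is a sketch under stated assumptions, not the answerer's code: the `ColumnProfile` class, its members, and the type names returned by `Describe` are hypothetical, and only integer-like values are profiled to keep it short.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public sealed class ColumnProfile
{
    // Compiled once and reused for every row. At most 18 digits,
    // so a matching value always fits in a long.
    private static readonly Regex IntegerPattern =
        new Regex(@"^[+-]?[0-9]{1,18}\z", RegexOptions.Compiled);

    public int Count { get; private set; }
    public int MaxLength { get; private set; }
    public bool AllIntegers { get; private set; } = true;
    public long Min { get; private set; } = long.MaxValue;
    public long Max { get; private set; } = long.MinValue;

    // Feed one raw value from the column into the profile.
    public void Add(string value)
    {
        value = value ?? string.Empty;
        Count++;
        MaxLength = Math.Max(MaxLength, value.Length);

        // Once a non-integer has been seen, numeric tracking stops.
        if (!AllIntegers) return;
        if (!IntegerPattern.IsMatch(value))
        {
            AllIntegers = false;
            return;
        }

        long n = long.Parse(value, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
        Min = Math.Min(Min, n);
        Max = Math.Max(Max, n);
    }

    // Decide only after every row has been added.
    public string Describe()
    {
        if (Count == 0 || !AllIntegers) return "string";
        if (Min >= 0 && Max <= 1) return "bool";
        if (Min >= int.MinValue && Max <= int.MaxValue) return "int";
        return "long";
    }
}
```

Each row is touched once, and the type decision is deferred to `Describe`, so a late value such as -100000 simply widens the recorded range instead of invalidating an earlier guess.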

+1

Starting with the narrowest types and working towards the widest may not be the best approach. If I knew anything about the data, I would start with the most frequently occurring type and work towards the least frequent. If I did not know, I might do some research to find out what the distribution is likely to be statistically, if that is possible; otherwise I would just make my best guess. Why test for bit or datetime early if you only expect them to occur once in every 10,000 records?
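When the distribution is not known in advance, one option is to count matches while processing and periodically move the most frequently matching check to the front. The sketch below assumes checks that cannot overlap (each regular expression requires a character the others forbid), because reordering overlapping checks would change the results. `AdaptiveClassifier`, its members, and the type labels are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class AdaptiveClassifier
{
    private sealed class Check
    {
        public string Name;
        public Regex Pattern;
        public int Hits;
    }

    private List<Check> _checks;
    private readonly int _resortEvery;
    private int _seen;

    public AdaptiveClassifier(int resortEvery = 1000)
    {
        if (resortEvery <= 0) throw new ArgumentOutOfRangeException(nameof(resortEvery));
        _resortEvery = resortEvery;

        // The patterns are mutually exclusive by construction, so the
        // order in which they are tried affects speed, not results.
        _checks = new List<Check>
        {
            new Check { Name = "bool", Pattern = Make(@"^(true|false)\z") },
            new Check { Name = "integer", Pattern = Make(@"^[+-]?[0-9]+\z") },
            new Check { Name = "decimal", Pattern = Make(@"^[+-]?[0-9]*\.[0-9]+\z") },
            new Check
            {
                Name = "datetime",
                Pattern = Make(@"^[0-9]{4}/[0-9]{2}/[0-9]{2}( [0-9]{2}:[0-9]{2}(:[0-9]{2})?)?\z")
            },
        };
    }

    private static Regex Make(string pattern)
    {
        return new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    // Names of the checks in the order they are currently tried.
    public IEnumerable<string> CurrentOrder => _checks.Select(c => c.Name);

    public string Classify(string value)
    {
        string result = "string";
        foreach (var check in _checks)
        {
            if (check.Pattern.IsMatch(value))
            {
                check.Hits++;
                result = check.Name;
                break;
            }
        }

        // Every _resortEvery values, try the most frequent type first.
        // OrderByDescending is a stable sort, so ties keep their order.
        if (++_seen % _resortEvery == 0)
            _checks = _checks.OrderByDescending(c => c.Hits).ToList();

        return result;
    }
}
```

With a column that is mostly decimals, the decimal check migrates to the front after the first resort, so rare types such as bool or datetime are only tested when the common check fails.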

+1
