I have a large list of GUIDs and other data from which I need to pull a subset, what is the fastest way to do this?

I have a list of audit data from Dynamics CRM 2013 that I deserialised and stuck in a HashSet that is defined as:

private class AuditCache
{
    public Guid ObjectId;
    public int HistoryId;
    public DateTime? DateFrom;
    public DateTime? DateTo;
    public string Value;
};
private HashSet<AuditCache> _ac = new HashSet<AuditCache>();

I am adding such data (from the SQL Server recordset):

_ac.Add(new AuditCache{ 
    ObjectId = currentObjectId,
    HistoryId = Convert.ToInt32(dr["HistoryId"]),
    DateTo = Convert.ToDateTime(dr["CreatedOn"]),
    Value = value});

As a result, I get about half a million records.

Next, I need to iterate through each Guid and pull a subset of the data from my audit data that matches. I have a list of Guides that I create elsewhere, and their processing is about 300,000. I store them in this:

var workList = new Dictionary<Guid, DateTime>();

... and iterate over them like this:

foreach (var g in workList)

Then I need to do this to pull out a subset for each Guid:

List<AuditCache> currentSet = _ac.Where(v => v.ObjectId == g.Key).ToList();

But he is slow.

1 , ( , 1% ), , .

, , , , Guid. , : / (?) / ?

: , /, - , Dynamics CRM. , "" - , , ?

, , (371 901 Guids), 1000 . , /INSERT SQL Server, .

Method #0 - List with Lambda    ~30.00s per 1,000 rows (I never benchmarked this precisely)
Method #1 - IntersectWith        40.24s per 1,000 rows (cloning my Hashset spoilt this)
Method #2 - BinarySearch          3.20s per 1,000 rows
Method #3 - Generic Dictionary    2.19s per 1,000 rows

, , , , , , , .

, . BinarySearch , , , , .

, IntersectWith "" , , .

+4
3

AuditCache GUID ( ), List<T>.BinarySearch ?

( 15 i3-3110M @2.4Ghz). :

  • : 892
  • : 9285
  • : 12055

BigInteger System.Numerics, 128- .

- , . , , , ( -1). :

class AuditCache
{
    public Guid ObjectId;
    public int HistoryId;
    public DateTime? DateFrom;
    public DateTime? DateTo;
    public string Value;
};

class AuditCacheComparer : IComparer<AuditCache>
{
    public int Compare(AuditCache x, AuditCache y)
    {
        BigInteger intx = new BigInteger(x.ObjectId.ToByteArray());
        BigInteger inty = new BigInteger(y.ObjectId.ToByteArray());
        if (intx < inty)
        {
            return -1;
        }
        else if (intx > inty)
        {
            return 1;
        }

        return 0;
    }
}

class Program
{

    static void Main(string[] args)
    {
        List<AuditCache> testCollection = new List<AuditCache>();
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i != 1000000; ++i)
        {
            testCollection.Add(new AuditCache() { ObjectId = Guid.NewGuid(), HistoryId = i });
        }

        Console.WriteLine("Collection created: {0} ms", sw.ElapsedMilliseconds);

        AuditCacheComparer comparer = new AuditCacheComparer();
        testCollection.Sort(comparer);

        Console.WriteLine("Collection sorted: {0} ms", sw.ElapsedMilliseconds);

        for(int i = 0; i != 300000; ++ i)
        {
            var index = testCollection.BinarySearch(new AuditCache() {ObjectId = Guid.NewGuid()}, comparer);
            if (index > 0)
            {
                Console.WriteLine("Found: {0} ms", sw.ElapsedMilliseconds);
            }
        }

        Console.WriteLine("Lookup: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadLine();
    }
}
+3

1 1,2 P4

HashSet
.

, , HashSet , HashSet , O (n). O (n + m), n - , m - .

, AuditCache Object GetHashCode, Equals Object.GetHashCode

GUID hash ( ), .

WorkList AuditCache ( AuditCache)
, Guid ObjectId (Equal GetHashCode)

GUID , ( hashset) - , ( ) . Guid, . .

1 1,2 P4 0,5 0,7

using System.Diagnostics;
namespace HashSetIntersect
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            Stopwatch sw = new Stopwatch();
            sw.Start();
            HashSet<AuditCache> TestHashKeys1 = new HashSet<AuditCache>();
            HashSet<AuditCache> TestHashKeys2 = new HashSet<AuditCache>();
            for (UInt32 i = 0; i < 1000000; i++)
            {
                Guid g = Guid.NewGuid();
                TestHashKeys1.Add(new AuditCache(g, 1, (DateTime?)null, (DateTime?)null, "value1"));
                if (i % 2 == 0) TestHashKeys2.Add(new AuditCache(g, 0, (DateTime?)null, (DateTime?)null, "value2"));
            }            
            Debug.WriteLine(TestHashKeys1.Count.ToString() + " " + TestHashKeys2.Count.ToString());
            sw.Stop();
            Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
            sw.Restart();
            HashSet<AuditCache> TestHashKeys3 = new HashSet<AuditCache>(TestHashKeys1);
            sw.Stop();
            Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
            sw.Restart();
            TestHashKeys3.IntersectWith(TestHashKeys2);
            sw.Stop();
            Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
            foreach (AuditCache ac in TestHashKeys3)
            {
                Debug.WriteLine(ac.Value);
            }
        }
    }
    public abstract class HashKey : Object
    {
        public Guid ObjectId { get; private set; }
        public override bool Equals(object obj)
        {
            if (!(obj is HashKey)) return false;
            HashKey comp = (HashKey)obj;
            return this.ObjectId == comp.ObjectId;
        }

        public override int GetHashCode()
        {
            return ObjectId.GetHashCode();
        }
        public HashKey(Guid objectId)
        {
            ObjectId = objectId;
        }
    }
    public class TestHashKey : HashKey
    {
        public TestHashKey(Guid ObjectId)
            : base(ObjectId)
        { }
    }
    public class AuditCache : HashKey
    {
        public int HistoryId { get; private set; }
        public DateTime? DateFrom { get; private set; }
        public DateTime? DateTo { get; private set; }
        public string Value { get; private set; }
        public AuditCache(Guid objectId, int historyId, DateTime? dateFrom, DateTime? dateTo, string value)
            : base(objectId)
        {
            HistoryId = historyId;
            DateFrom = dateFrom;
            DateTo = dateTo;
            Value = value;
        }
    }
}
+4

, , :

private Dictionary<Guid, List<AuditCache>> _crmAudit = new Dictionary<Guid, List<AuditCache>>();

"" _ac list/hashset Guid :

_crmAudit.Add(lastObjectId, _ac);

, :

List<AuditCache> currentSet;
if (_crmAudit.TryGetValue(g.Key, out currentSet))
{
    List<AuditCache> sortedSet = currentSet.OrderBy(o => o.HistoryId).ToList();

.

, ; 3 - ( !).

, , , , . :

  • Guids;
  • 0-n Guid ( .

:

  • ;
  • Guid 0-n ( );
  • .

Indeed my solution should be:

  • ignore the first list;
  • sort the second list in Guid order;
  • keep popping lines, sequentially, from the second list until Guid changes;
  • process the data for this Guide, and then move on to the next Guide until the audit record is complete.

I will probably give it too.

Now I will start to try other offers (many thanks for them).

0
source

All Articles