Is there a faster way to compare data between dynamic arrays in VBA?

I have the code below that works successfully, however it should work on two arrays of 130k + lines each. The current run time in the full data set is about 24 minutes, and adding an account at one point, it bypasses 9.8 billion times.

I read articles about using Match, Vlookup, and they all all seem to suggest that the iterative loop (as I used) is the fastest method, however I couldn’t figure out how to make other methods work with dynamic arrays and therefore the test is appropriate way.

Can someone tell me if there is a faster way to complete this activity, and if so, demonstrate how?

Sub TESTVLOOKUPARRAY() Dim PSORG1() As Variant Dim PSORG1Tot As Variant Dim PSORG1RT As Variant Dim PSORG2() As Variant Dim PSORG2Tot As Variant Dim PSORG2RT As Variant Sheets("Sheet1").Select PSORG2RT = Application.CountA(Range("A:A")) PSORG2Tot = "A1:B" & PSORG2RT PSORG2 = Range(PSORG2Tot) ' PSORG2 is now an allocated array Sheets("Sheet2").Select PSORG1RT = Application.CountA(Range("A:A")) PSORG1Tot = "A1:B" & PSORG1RT PSORG1 = Range(PSORG1Tot) ' PSORG1 is now an allocated array a = 2 ' to increment ORG values in PSORG1 Do Finish = "No" b = 1 ' to increment ORG values in PSORG2 Do If PSORG1(a, 1) = PSORG2(b, 1) Then PSORG1(a, 2) = PSORG2(b, 2) Finish = "True" ElseIf b = PSORG2RT Then PSORG1(a, 2) = "NULL" Finish = "True" End If b = b + 1 Loop Until Finish = "True" a = a + 1 Loop Until a = PSORG1RT + 1 Sheets("Sheet2").Select Set Destination = Range("A1") Destination.Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)).Value = PSORG1 End Sub 
+5
source share
3 answers

I agree with the Scripting.Dictionary method.

This procedure uses Scripting.Dictionry. You need to go to VBE Tools ► Links and add a link to the Microsoft Scripting Runtime executable file.

 Sub TESTVLOOKUPARRAY() Dim PSORG1 As Variant, PSORG2 As Variant Dim a As Long, b As Long Dim dPSORG2 As New Scripting.dictionary dPSORG2.CompareMode = TextCompare Debug.Print Timer With Sheets("Sheet1") a = .Cells(Rows.Count, 1).End(xlUp).Row PSORG2 = .Cells(1, 1).Resize(a, 2).Value2 ' PSORG2 is now an allocated array For b = LBound(PSORG2, 1) To UBound(PSORG2, 1) dPSORG2.Item(PSORG2(b, 1)) = PSORG2(b, 2) Next b End With With Sheets("Sheet2") a = .Cells(Rows.Count, 1).End(xlUp).Row PSORG1 = .Cells(1, 1).Resize(a, 2).Value2 ' PSORG1 is now an allocated array End With Debug.Print dPSORG2.Count Debug.Print LBound(PSORG2, 1) & ":" & UBound(PSORG2, 1) Debug.Print LBound(PSORG2, 2) & ":" & UBound(PSORG2, 2) Debug.Print LBound(PSORG1, 1) & ":" & UBound(PSORG1, 1) Debug.Print LBound(PSORG1, 2) & ":" & UBound(PSORG1, 2) For b = LBound(PSORG1, 1) To UBound(PSORG1, 1) If dPSORG2.Exists(PSORG1(b, 1)) Then PSORG1(b, 2) = dPSORG2.Item(PSORG1(b, 1)) Else PSORG1(b, 2) = "NULL" End If Next b With Sheets("Sheet2") .Cells(1, 1).Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)) = PSORG1 End With Debug.Print Timer End Sub 

FWIW, my sample data from 110 thousand lines in the lines Sheet1 and 95K in Sheet2 was executed in 20 minutes, 40 seconds with the source code. The above data was 1.72 seconds with the same data.

+3
source

I think using dictionaries will make the code faster.

Below is code that does the same task, but it uses a Dictionary object. On my computer, it is about 100 times faster than your own code (it is tested on two sheets of 5 thousand lines each, for large data sets the gain should be even better).

 Public Function TestVLookupArray2() Dim dict As Object Dim result As Variant Dim i As Long Dim destination As Excel.Range 'Load values from Sheet1 into Dictionary. Set dict = getDataFromSheetAsDictionary(Sheets("Sheet1")) result = getDataFromSheet(Sheets("Sheet2")) For i = LBound(result, 1) To UBound(result, 1) With dict If .exists(result(i, 1)) Then result(i, 2) = .Item(result(i, 1)) Else result(i, 2) = "NULL" End If End With Next i With Sheets("Sheet2") Set destination = .Range(.Cells(1, 1), .Cells(UBound(result, 1), UBound(result, 2))) destination = result End With End Function Private Function getDataFromSheet(wks As Excel.Worksheet) As Variant Dim lastRow As Long With wks lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row getDataFromSheet = .Range(.Cells(1, 1), .Cells(lastRow, 2)) End With End Function Private Function getDataFromSheetAsDictionary(wks As Excel.Worksheet) As Object Dim i As Long Dim key As String Dim value As Variant Dim arr As Variant Set getDataFromSheetAsDictionary = VBA.CreateObject("Scripting.Dictionary") arr = getDataFromSheet(wks) With getDataFromSheetAsDictionary For i = LBound(arr, 1) To UBound(arr, 1) If Not .exists(arr(i, 1)) Then Call .Add(arr(i, 1), arr(i, 2)) End If Next i End With End Function 

Please note that this code consists of 3 separate functions, you need to include all of them.

Here is an article introducing dictionaries: http://www.techbookreport.com/tutorials/vba_dictionary.html

If you have any questions about this code, let me know in the comments.

+2
source

I used the for next loop as a pose for the do loop you use yourself.

 Sub speed_up2() Dim PSORG1() As Variant, PSORG2() As Variant Dim PSORG1Tot As Range, PSORG2Tot As Range, Destination As Range Dim PSORG1RT As Long, PSORG2RT As Long Dim wb As Workbook, ws_1 As Worksheet, ws_2 As Worksheet Dim i As Byte, j As Byte Set wb = ThisWorkbook Set ws_1 = wb.Sheets("Sheet1") Set ws_2 = wb.Sheets("Sheet2") with ws_1 PSORG2RT = .Cells(Rows.Count, 1).End(xlUp).Row ' Get last row Set PSORG2Tot = .Range("A1:B" & PSORG2RT) PSORG2 = PSORG2Tot ' PSORG2 is now an allocated array End With With ws_2 PSORG1RT = .Cells(Rows.Count, 1).End(xlUp).Row Set PSORG1Tot = .Range("A1:B" & PSORG1RT) PSORG1 = PSORG1Tot ' PSORG1 is now an allocated array End With For i = 1 To UBound(PSORG1) For j = 1 To UBound(PSORG2) PSORG1(i, 2) = "NULL" If PSORG1(i, 1) = PSORG2(j, 1) Then PSORG1(i, 2) = PSORG2(j, 2) Exit For End If Next j Next i Set Destination = ws_2.Range("A1") Destination.Resize(UBound(PSORG1, 1), UBound(PSORG1, 2)).Value = PSORG1 End Sub 

Please see here for the speed test that was performed between the for next loop and the do loop.

As stated in the article, the for next loop calculates the next iteration for you, while with the do loop you need to increase the iteration each time. This can save a lot of time.

I also made corrections to the method of getting the last line, which is used to create a range. This is my personal preference; It can also be more secure than Application.COUNTA .

+1
source

All Articles