VBA, advanced filter and duplicate removal

I have a list full of different paths in col A. I have a list of parts in B and C.

As I can on a new sheet: 1) pull out each unique path, 2) for each path compile the values โ€‹โ€‹from B * C and remove duplicates. 3) repeat the next path after they are executed on the last line.

I have a faulty macro, but for the sake of brevity and accuracy I will not publish. If someone does not want to read it, please

enter image description here

Any help would be greatly appreciated.

Here is what I have (I understand it for a long time, I will try to clear it abit):

Sub FileDetail() 'Does not fill down, go to bottom to unleased fill down 'Skips unreadable files 'This Macro retrieves data from files picked. The data is based on header. Data is also filtered for unique values. 'You must make sure headers are in the first row and delimted. Dim wb As Workbook, fileNames As Object, errCheck As Boolean Dim ws As Worksheet, wks As Worksheet, wksSummary As Worksheet Dim y As Range, intRow As Long, i As Integer Dim r As Range, lr As Long, myrg As Range, z As Range Dim boolWritten As Boolean, lngNextRow As Long Dim intColNode As Integer, intColScenario As Integer Dim intColNext As Integer, lngStartRow As Long Dim lngLastNode As Long, lngLastScen As Long Dim intColinstrument As Integer, lngLastinstrument As Long 'Skipped worksheet for file names Dim wksSkipped As Worksheet Set wksSkipped = ThisWorkbook.Worksheets("Skipped") ' Turn off screen updating and automatic calculation With Application .ScreenUpdating = False .Calculation = xlCalculationManual End With ' Create a new worksheet, if required On Error Resume Next Set wksSummary = ActiveWorkbook.Worksheets("Unique data") On Error GoTo 0 If wksSummary Is Nothing Then Set wksSummary = ActiveWorkbook.Worksheets.Add(After:=ActiveWorkbook.Worksheets(ActiveWorkbook.Worksheets.Count)) wksSummary.Name = "Unique data" End If ' Set the initial output range, and assign column headers With wksSummary Set y = .Cells(.Rows.Count, 3).End(xlUp).Offset(1, 0) Set r = y.Offset(0, 1) Set z = y.Offset(0, -2) lngStartRow = y.Row .Range("A1:E1").Value = Array("File Name", "Sheet Name", "Node", "Book", "Instrument") End With 'get user input for files to search Set fileNames = CreateObject("Scripting.Dictionary") errCheck = UserInput.FileDialogDictionary(fileNames) If errCheck Then Exit Sub End If ''' For Each Key In fileNames 'loop through the dictionary On Error Resume Next Set wb = Workbooks.Open(fileNames(Key)) If Err.Number <> 0 Then Set wb = Nothing ' or set a boolean error flag End If On Error GoTo 0 ' or your custom error handler If wb Is Nothing Then wksSkipped.Cells(wksSkipped.Cells(wksSkipped.Rows.Count, "A").End(xlUp).Row + 1, 1) = fileNames(Key) Else Debug.Print "Successfully loaded " & fileNames(Key) wb.Application.Visible = False 'make it not visible ' more working with wb ' Check each sheet in turn For Each ws In ActiveWorkbook.Worksheets With ws ' Only action the sheet if it not the 'Unique data' sheet If .Name <> wksSummary.Name Then boolWritten = False ''''''''''''''''''testing additional column..trouble here ' Find the Anchor Date intColScenario = 0 On Error Resume Next intColScenario = WorksheetFunction.Match("instrument.instrumentType", .Rows(1), 0) On Error GoTo 0 If intColScenario > 0 Then ' Only action if there is data in column E If Application.WorksheetFunction.CountA(.Columns(intColScenario)) > 1 Then lr = .Cells(.Rows.Count, intColScenario).End(xlUp).Row ' Copy unique values from the formula column to the 'Unique data' sheet, and write sheet & file details .Range(.Cells(1, intColScenario), .Cells(lr, intColScenario)).AdvancedFilter xlFilterCopy, , r, True r.Offset(0, -2).Value = ws.Name r.Offset(0, -3).Value = ws.Parent.Name ' Delete the column header copied to the list r.Delete Shift:=xlUp boolWritten = True End If End If ''''''''''''''''''''''''''''''''''''below is working''''''''''''''''''''''' ' Find the Desk column intColNode = 0 On Error Resume Next intColNode = WorksheetFunction.Match("book.reportingLine.pathName", .Rows(1), 0) On Error GoTo 0 If intColNode > 0 Then ' Only action if there is data in column A If Application.WorksheetFunction.CountA(.Columns(intColNode)) > 1 Then lr = .Cells(.Rows.Count, intColNode).End(xlUp).Row ' Copy unique values from column A to the 'Unique data' sheet, and write sheet & file details (if not already written) .Range(.Cells(1, intColNode), .Cells(lr, intColNode)).AdvancedFilter xlFilterCopy, , y, True If Not boolWritten Then y.Offset(0, -1).Value = ws.Name y.Offset(0, -2).Value = ws.Parent.Name End If ' Delete the column header copied to the list y.Delete Shift:=xlUp End If End If ' Find the Intrument intColinstrument = 0 On Error Resume Next intColinstrument = WorksheetFunction.Match("instrument.instrumentType", .Rows(1), 0) On Error GoTo 0 If intColinstrument > 0 Then ' Only action if there is data in column A If Application.WorksheetFunction.CountA(.Columns(intColinstrument)) > 1 Then lr = .Cells(.Rows.Count, intColinstrument).End(xlUp).Row ' Copy unique values from column A to the 'Unique data' sheet, and write sheet & file details (if not already written) .Range(.Cells(1, intColinstrument), .Cells(lr, intColinstrument)).AdvancedFilter xlFilterCopy, , z, True If Not boolWritten Then z.Offset(0, -3).Value = ws.Name z.Offset(0, -4).Value = ws.Parent.Name End If ' Delete the column header copied to the list z.Delete Shift:=xlUp End If End If ' Identify the next row, based on the most rows used in columns C & D lngLastNode = wksSummary.Cells(wksSummary.Rows.Count, 3).End(xlUp).Row lngLastScen = wksSummary.Cells(wksSummary.Rows.Count, 4).End(xlUp).Row lngLastinstrument = wksSummary.Cells(wksSummary.Rows.Count, 5).End(xlUp).Row lngNextRow = WorksheetFunction.Max(lngLastNode, lngLastScen) + 1 If (lngNextRow - lngStartRow) > 1 Then ' Fill down the workbook and sheet names z.Resize(lngNextRow - lngStartRow, 2).FillDown ''''''''Optional if you want headers to be filled down. 'If (lngNextRow - lngLastNode) > 1 Then ' Fill down the last Node value 'wksSummary.Range(wksSummary.Cells(lngLastNode, 3), wksSummary.Cells(lngNextRow - 1, 3)).FillDown 'End If 'If (lngNextRow - lngLastScen) > 1 Then ' Fill down the last Scenario value 'wksSummary.Range(wksSummary.Cells(lngLastScen, 4), wksSummary.Cells(lngNextRow - 1, 4)).FillDown 'End If End If Set y = wksSummary.Cells(lngNextRow, 3) Set r = y.Offset(0, 1) Set z = y.Offset(0, -2) lngStartRow = y.Row End If End With Next ws wb.Close savechanges:=False 'close the workbook do not save Set wb = Nothing 'release the object End If Next 'End of the fileNames loop Set fileNames = Nothing ' Autofit column widths of the report wksSummary.Range("A1:E1").EntireColumn.AutoFit ' Reset system settings With Application .Calculation = xlCalculationAutomatic .ScreenUpdating = True .Visible = True End With End Sub 

Thus, this code gets the file name, sheet name and columns that I specify.

1) However, I was having problems adding extra columns. (I am currently getting 2 selected columns) as well

2) I am having problems placing in a format in which columns are based on each other. ex This will give me a unique value for each path, but not unique values โ€‹โ€‹for each sport.

Modify to include data (I would also like to include 4th and 5th columns, but for simplicity I saved it to 3):

 +-------------------------------+------------+--------------+ | path | sport | Teams | +-------------------------------+------------+--------------+ | stack/over/flow/larrybird | basketball | celtics | +-------------------------------+------------+--------------+ | stack/over/flow/michaeljordan | basketball | bulls | +-------------------------------+------------+--------------+ | stack/over/flow/tigerwoods | golf | pga | +-------------------------------+------------+--------------+ | stack/over/flow/josebautista | baseball | bluejays | +-------------------------------+------------+--------------+ | stack/over/flow/jordanspeith | golf | pga | +-------------------------------+------------+--------------+ | stack/over/flow/kevinlove | basketball | timberwolves | +-------------------------------+------------+--------------+ | stack/over/flow/lebronjames | basketball | cavs | +-------------------------------+------------+--------------+ | stack/over/flow/stephencurry | basketball | warriors | +-------------------------------+------------+--------------+ | stack/over/flow/larrybird | baseball | redsox | +-------------------------------+------------+--------------+ | stack/over/flow/michaeljordan | baseball | whitesox | +-------------------------------+------------+--------------+ | stack/over/flow/michaeljordan | chess | knight | +-------------------------------+------------+--------------+ | stack/over/flow/michaeljordan | basketball | hornets | +-------------------------------+------------+--------------+ | stack/over/flow/kevinlove | basketball | cavs | +-------------------------------+------------+--------------+ | stack/over/flow/tigerwoods | golf | pga | +-------------------------------+------------+--------------+ 

And the expected result (I included padding in it)

 +-------------------------------+------------+--------------+ | path | sport | teams | +-------------------------------+------------+--------------+ | stack/over/flow/larrybird | basketball | celtics | +-------------------------------+------------+--------------+ | | baseball | red sox | +-------------------------------+------------+--------------+ | stack/over/flow/tigerwoods | golf | pga | +-------------------------------+------------+--------------+ | stack/over/flow/michaeljordan | basketball | bulls | +-------------------------------+------------+--------------+ | | | hornets | +-------------------------------+------------+--------------+ | | baseball | whitesox | +-------------------------------+------------+--------------+ | | chess | knight | +-------------------------------+------------+--------------+ | stack/over/flow/kevinlove | basketball | timberwolves | +-------------------------------+------------+--------------+ | | | cavs | +-------------------------------+------------+--------------+ | stack/over/flow/josebautista | baseball | bluejays | +-------------------------------+------------+--------------+ 

This seems to be a problem for the 3rd (4th and 5th) columns with getting unique values.

+6
source share
4 answers

If you don't mind sorting the results, not in the original order, the following code will do this. It should "automatically adapt" to any number of columns.

(If you need the results in the original order, I would use the Collections or Dictionaries and User Defined Object approach)

Your data should start with A1 (with row 1 being the column label), and you can see where in the code you define the sheets for your source data and results.

Since most of the โ€œworkโ€ is done in the VBA array, not in the worksheet, it should be pretty fast.

enter image description here

 Option Explicit Sub SortFormat() Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range Dim vRes As Variant Dim R As Range, C As Range Dim V As Variant Dim I As Long, J As Long 'Set source and results worksheets, ranges Set wsSrc = Worksheets("Sheet1") Set wsRes = Worksheets("Sheet2") wsRes.Cells.Clear Set rRes = wsRes.Cells(1, 1) Application.ScreenUpdating = False 'Copy source data to results worksheet Dim LastRow As Long, LastCol As Long With wsSrc LastRow = .Cells(.Rows.Count, 1).End(xlUp).Row LastCol = .Cells(1, .Columns.Count).End(xlToLeft).Column Set R = .Range(.Cells(1, 1), .Cells(LastRow, LastCol)) R.Copy rRes Application.CutCopyMode = False End With 'Go to Results sheet With wsRes .Select .UsedRange.EntireColumn.AutoFit End With rRes.Select 'Sort the data With wsRes.Sort.SortFields .Clear Set R = wsRes.UsedRange.Columns For Each C In R .Add Key:=C, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal Next C End With With wsRes.Sort .SetRange R .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .Apply End With 'Remove any completely duplicated rows 'Create array of columns ReDim V(0 To R.Columns.Count - 1) For I = 0 To UBound(V) V(I) = I + 1 Next I R.RemoveDuplicates Columns:=(V), Header:=xlYes 'Remove Duplicated items in each row 'Work in VBA array for more speed vRes = R For I = UBound(vRes, 1) To 3 Step -1 If vRes(I, 1) = vRes(I - 1, 1) Then vRes(I, 1) = "" For J = 2 To UBound(vRes, 2) If vRes(I, J) = vRes(I - 1, J) And _ vRes(I, J - 1) = "" Then vRes(I, J) = "" Next J Next I R = vRes Application.ScreenUpdating = True End Sub 
+1
source

A simple way would be to copy the entire range, sort it, and do some calculations:

 Sub Macro1() Application.ScreenUpdating = False Dim str As String With Sheet1 str = .Range(.Cells(.Rows.Count, 1).End(xlUp), .Cells(1, 3)).Address .Range(str).Copy Sheet2.Cells(1, 1) End With Application.CutCopyMode = False With Sheet2 .Activate Dim str2 As String str2 = .Range(str).Offset(1).Resize(.Range(str).Rows.Count - 1).Address .Range(str2).Value = Evaluate("if(" & str2 & "="""",-1E+99," & str2 & ")") .Sort.SortFields.Clear .Sort.SortFields.Add .Range(str).Offset(1).Resize(, 1), 0, 1, , 0 .Sort.SortFields.Add .Range(str).Offset(1, 1).Resize(, 1), 0, 1, , 0 .Sort.SortFields.Add .Range(str).Offset(1, 2).Resize(, 1), 0, 1, , 0 .Sort.SetRange .Range(str).Offset(1) .Sort.Header = 2 .Sort.Apply .Range(str2).Value = Evaluate("if(" & str2 & "=-1E+99,""""," & str2 & ")") Dim val As Variant, i As Long, rng2 As Range val = .Range(str).Value Set rng2 = .Range(str).Offset(.Range(str).Rows.Count).Resize(1) For i = 3 To UBound(val) If val(i - 1, 1) = val(i, 1) And val(i - 1, 2) = val(i, 2) And val(i - 1, 3) = val(i, 3) Then Set rng2 = Union(rng2, .Range(str).Rows(i)) Next i = .Range(str).Rows.Count - rng2.Rows.Count rng2.EntireRow.Delete xlShiftUp With .Range(str).Offset(1).Resize(i - 1, 1) .Value = Evaluate("if(" & .Address & "=" & .Offset(-1).Address & ",""""," & .Address & ")") With .Offset(, 1) .Value = Evaluate("if((" & .Address & "=" & .Offset(-1).Address & ")*(" & .Offset(, -1).Address & "=""""),""""," & .Address & ")") End With End With End With End Sub 

Done by phone, may contain errors!
Now a lot has changed, copy all the code and check it again.

EDIT

Well, a completely different solution. Should be quick, but may not be very clear on how it works: P

 Sub Macro2() Dim inVal As Variant, outVal() As Variant, orderArr() As Variant Dim startRng As Range Dim i As Long, j As Long, k As Long, iCount As Long Set startRng = Sheet1.Range("A2:C2") 'upmost row-range of the range to be copied (exclude headers!) With startRng.Parent inVal = .Range(startRng, .Cells(.Rows.Count, startRng.Column).End(xlUp)).Value End With ReDim orderArr(1 To UBound(inVal)) For i = 1 To UBound(inVal) iCount = 1 For j = 1 To UBound(inVal) For k = 1 To UBound(inVal, 2) If StrComp(inVal(i, k), inVal(j, k), 1) = 1 Then iCount = iCount + 1 If StrComp(inVal(i, k), inVal(j, k), 1) <> 0 Then Exit For Next Next orderArr(i) = iCount Next k = 1 ReDim outVal(1 To UBound(inVal, 2), 1 To UBound(inVal)) For i = 0 To Application.Max(orderArr) If IsNumeric(Application.Match(i, orderArr, 0)) Then iCount = Application.Match(i, orderArr, 0) For j = 1 To UBound(inVal, 2) outVal(j, k) = inVal(iCount, j) Next k = k + 1 End If Next ReDim Preserve outVal(1 To UBound(inVal, 2), 1 To k - 1) For i = 1 To UBound(outVal) For j = UBound(outVal, 2) To 2 Step -1 If outVal(i, j - 1) = outVal(i, j) Then If i = 1 Then outVal(i, j) = "" ElseIf outVal(i - 1, j) = "" Then outVal(i, j) = "" End If End If Next Next 'upper left cell of the output-range Sheet2.Range("A2").Resize(UBound(outVal, 2), UBound(outVal)).Value = Application.Transpose(outVal) End Sub 

Remember to set the initial range ( Sheet1.Range("A2:C2") ) to Selection , and then just select the range and run the macro. Works with any size (while VERY large ranges can freeze for some time).

As always: if you have questions, just ask :)

+2
source

One effective solution could be:

  • Fisrt copy values โ€‹โ€‹using Range.Copy
  • Then sort the rows using Range.Sort
  • Then remove duplicate rows using Range.RemoveDuplicates
  • Finally, remove duplicate branches using a loop

This procedure removes duplicate rows and formats them as a tree view:

 Sub RemoveDuplicates() Dim rgSource As Range, rgTarget As Range, data(), r&, c& ' define the source, the target and the number of columns Const columnCount = 3 Set rgSource = Range("Sheet1!A3") Set rgTarget = Range("Sheet1!F3") ' copy the values to the targeted range Set rgSource = rgSource.Resize(rgSource.End(xlDown).Row - rgSource.Row + 1, columnCount) Set rgTarget = rgTarget.Resize(rgSource.Rows.Count, columnCount) rgSource.Copy rgTarget ' sort the rows on each column For c = columnCount To 1 Step -1 rgTarget.Sort rgTarget.Columns(c) Next ' build the array of columns for RemoveDuplicates Dim rdColumns(0 To columnCount - 1) For c = 1 To columnCount: rdColumns(c - 1) = c: Next ' remove the duplicated rows rgTarget.RemoveDuplicates rdColumns Set rgTarget = rgTarget.Resize(rgTarget.End(xlDown).Row - rgTarget.Row + 1, columnCount) ' format as a tree view by removing the duplicated branches data = rgTarget.Value For r = UBound(data) To 2 Step -1 For c = 1 To columnCount - 1 If data(r, c) <> data(r - 1, c) Then Exit For data(r, c) = Empty Next Next rgTarget.Value = data End Sub 
+2
source

If you want to create a unique list, use the Dictionary object .

Be sure to add a link to the Runtime Scripting controls! Just some kind of quick and dirty code (as well as completely untested) based on your sample data:

 Sub GetUniques() Dim oDic as New Dictionary Dim r as Integer Dim strKey as String Dim varValue(2) as Variant 'Get a unique list of Column A values r = 3 'Your data starts on row 3 Do While Cells(r,1).value <> "" 'Run until the first blank line strKey = Cells(r,1).value varValue(0) = Cells(r,2).Value varValue(1) = Cells(r,3).Value If Not oDic.Exists(strKey) Then oDic.Add strKey, varValue End If r = r +1 Loop 'Now display your list of unique values Dim K as Variant Dim myArray as Variant r = 3 'We'll start on row 3 again but move over to column I (9) For Each K in oDic.Keys Cells(r,9).Value = K myArray = oDic.Item(K) Cells(r,10).Value = myArray(0) Cells(r,11).Value = myArray(1) r = r + 1 Next K End Sub 
+1
source

All Articles