:sort u, but only on one column in a CSV?

I am looking for a more specific version of the :sort u command, which removes all duplicate lines from a file. I am working with a CSV file and want to delete every row whose second-column entry duplicates that of another row. An example should help clarify:

 a,1,b
 g,1,f
 c,1,x
 i,2,l
 m,1,k
 o,2,p
 u,1,z
The command should return:

 a,1,b
 i,2,l

Note: which particular rows are kept does not matter, as long as the entries in the second column are unique after sorting.

Which Vim command will produce the output above?

Thanks!

vim
3 answers

Since the transformation in question cannot be done in a single run of the :sort command, treat it as a two-step process.

The first step is to sort the lines by the values in the second comma-separated column. To do this, we can use the :sort command, passing it a regular expression that matches the first column and the comma that follows it. Since :sort compares the text starting immediately after the match of the specified pattern on each line, this gives us the desired sort order.

 :sort/^[^,]*,/ 

To compare the values numerically rather than lexicographically, use the n flag:

 :sort n/^[^,]*,/ 
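As a cross-check of the ordering this step is meant to produce, here is a minimal sketch that uses the external sort utility instead of Vim's :sort. It assumes a POSIX sort is available and reuses the sample rows from the question; the variable names rows and sorted are only illustrative. Rows that tie on the second field may come out in a different relative order than they would in Vim.

```shell
# Sample rows from the question, one CSV record per line.
rows='a,1,b
g,1,f
c,1,x
i,2,l
m,1,k
o,2,p
u,1,z'

# -t, sets the field separator to a comma; -k2,2n limits the sort key to
# field 2 and compares it numerically. LC_ALL=C keeps tie-breaking byte-wise.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2n)

# All rows with 1 in the second field are printed first, then those with 2.
printf '%s\n' "$sorted"
```

From inside Vim, the same filter can be applied to the whole buffer with :%!sort -t, -k2,2n.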

The second step is to go through the sorted lines and delete all but one of those that share the same value in the second column. It is convenient to build this on the :global command, which executes a given Ex command on every line matching a pattern. By definition, a line can be deleted if it contains the same value in the second column as the next line. This formalization (together with the initial assumption that commas cannot occur inside column values) gives us the following pattern:

 ^[^,]*,\([^,]*\),.*\n[^,]*,\1,.* 

So, if we run the :delete command on every line that matches this pattern, going from top to bottom, only one line will remain for each distinct value in the second column.

 :g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,.*/d_ 
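The rule "delete a line if the next line has the same second column" can also be sketched outside Vim. The awk filter below is an illustration under the same assumption (no commas inside values), not a replacement for the :global command; the names sorted, deduped, held and key are hypothetical. Like the pattern above, it keeps the last line of each run of equal values.

```shell
# Input that is already sorted by the second field (the state after step one).
sorted='a,1,b
g,1,f
c,1,x
m,1,k
u,1,z
i,2,l
o,2,p'

# Hold each line back until the next one has been read. The held line is
# printed only if the following line has a different second field, or if it
# is the last line of input. Appending "" forces a string comparison.
deduped=$(printf '%s\n' "$sorted" | awk -F, '
  NR > 1 && ($2 "") != key { print held }
  { held = $0; key = $2 "" }
  END { if (NR > 0) print held }
')

printf '%s\n' "$deduped"
# prints:
# u,1,z
# o,2,p
```

The surviving rows differ from the ones shown in the question, which is acceptable there because only the uniqueness of the second column matters.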

Both of these steps can be combined into a single Ex command:

 :sort/^[^,]*,/|g/^[^,]*,\([^,]*\),.*\n[^,]*,\1,.*/d_ 
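If piping the buffer through an external program is acceptable, the two steps can also be collapsed into one call to the sort utility, since its -u option keeps a single line per distinct key when a key is given with -k. This is a sketch assuming a POSIX sort; which row of each group survives depends on the implementation, and the variable names rows and unique are illustrative.

```shell
# Sample rows from the question, one CSV record per line.
rows='a,1,b
g,1,f
c,1,x
i,2,l
m,1,k
o,2,p
u,1,z'

# -t, splits fields on commas, -k2,2n uses only field 2 as a numeric key,
# and -u suppresses all but one line in each set of lines with equal keys.
unique=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2n -u)

# Two lines remain: one whose second field is 1 and one whose second field is 2.
printf '%s\n' "$unique"
```

From inside Vim this would be run as :%!sort -t, -k2,2n -u.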
 :sort /\([^,]*,\)\{1}/
 :g/\%(\%([^,]*,\)\{1}\1.*\n\)\@<=\%([^,]*,\)\{1}\([^,]*\)/d

The first command sorts by the column with index 1. The second matches any line whose column at index 1 has the same value as that column on the preceding line, and deletes it.

The column index is the 1 in \{1}, which appears three times in the commands above.


Using the second column:

 (visual + !sort) 

Using the third column:

 sort -k 3 

or

 :sort /.*\%3v/ 

or select the lines you wish to sort with V (capital V), then enter:

 !sort -k 3n

or skip the first two words on each line and do the following:

 :%sort /^\S\+\s\+\S\+\s\+/ 

or sort by the last column:

 :%sort /\<\S\+\>$/ r 

Or use another program, such as MS Office or OpenOffice.
