In Perl, this takes a single line using the core List::Util module, which is highly optimized:
my $newpaths = join ';', uniq split /;/, $paths;
How it works: split breaks the string into a list of paths at each ; character; uniq removes the duplicates; join puts the paths back together into a single string separated by ;.
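For instance, on a small made-up input (the paths below are just for illustration), the intermediate steps look like this:

use strict;
use warnings;
use List::Util qw( uniq );

my $paths = 'C:\a;C:\b;C:\a;C:\c';

my @all      = split /;/, $paths;   # ('C:\a', 'C:\b', 'C:\a', 'C:\c')
my @unique   = uniq @all;           # ('C:\a', 'C:\b', 'C:\c') -- first occurrence wins, order preserved
my $newpaths = join ';', @unique;   # 'C:\a;C:\b;C:\c'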
If case is not significant in the paths, then:
my $newpaths = join ';', uniq split /;/, lc $paths;
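Note that lc lowercases the paths in the result as well. If you want to compare case-insensitively but keep the original spelling of the first occurrence, one option (a sketch, not part of the one-liner above) is a seen-hash keyed on the lowercased path:

use strict;
use warnings;

my $paths = 'C:\Foo;c:\foo;C:\Bar';   # made-up example input

my %seen;
my $newpaths = join ';', grep { !$seen{ lc $_ }++ } split /;/, $paths;
print $newpaths, "\n";                # C:\Foo;C:\Bar -- first-seen spelling kept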
A complete program might look like this:
use strict;
use warnings;
use List::Util qw( uniq );

my $paths = 'C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path5;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path6;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path3;C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path3;';

my $newpaths = join ';', uniq split /;/, $paths;
print $newpaths, "\n";
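Run as-is, this keeps the first occurrence of each path (split drops the empty field left by the trailing ;) and should print:

C:\Users\user\Desktop\TESTING\path1;C:\Users\user\Desktop\TESTING\path5;C:\Users\user\Desktop\TESTING\path6;C:\Users\user\Desktop\TESTING\path3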
To make things more interesting, let's time this solution against one that uses a temporary hash. Here is a benchmark program:
use strict;
use warnings;
use List::Util qw( uniq );
use Time::HiRes qw( time );

# build one million random paths, many of them duplicates
my @p;
for ( my $i = 0; $i < 1000000; $i++ ) {
    push @p, 'C:\This\is\a\random\path' . int(rand(250000));
}
my $paths = join ';', @p;

# time the List::Util::uniq solution
my $t = time();
my $newpaths = join ';', uniq split /;/, $paths;
$t = time() - $t;
print 'Time with uniq: ', $t, "\n";

# time the temporary-hash solution
$t = time();
my %temp = map { $_ => 1 } split /;/, $paths;
$newpaths = join ';', keys %temp;
$t = time() - $t;
print 'Time with temporary hash: ', $t, "\n";
It generates one million random paths with roughly a 4:1 duplication ratio (about 4 copies of each path on average, since one million paths are drawn from 250,000 distinct suffixes). The timings on the server where I tested this:
Time with uniq: 0.849196910858154
Time with temporary hash: 1.29486703872681
So uniq comes out faster than the temporary hash. With a 100:1 duplication ratio:
Time with uniq: 0.526581048965454
Time with temporary hash: 0.823433876037598
With a 10000:1 duplication ratio:
Time with uniq: 0.423808097839355
Time with temporary hash: 0.736939907073975
Both approaches do less work as the number of duplicates grows, and uniq stays consistently ahead as the duplication ratio increases.
Feel free to play with the range of the random-number generator.
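For example, to approximate the 100:1 and 10000:1 ratios above with the same one million paths, you only need to shrink the range of the random suffix in the push line inside the loop (the specific values here are just illustrative):

# ~100:1 duplication: 1,000,000 paths drawn from ~10,000 distinct suffixes
push @p, 'C:\This\is\a\random\path' . int(rand(10000));

# ~10000:1 duplication: 1,000,000 paths drawn from ~100 distinct suffixes
push @p, 'C:\This\is\a\random\path' . int(rand(100));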