Shuffling sparse sorted arrays

Question

Shuffling sparse sorted arrays

I have a list of event lists. Events always occur in a specific order, but not every event always occurs. Here is an example input:

[[ do, re, fa, ti ], [ do, re, mi ], [ do, la, ti, za ], [ mi, fa ], [ re, so, za ]]

Input values do not have a built-in order. These are actually messages such as "creating symbolic links" and "reindexing the search." They are sorted in a separate list, but there is no way to look only at “fa” in the first list and “mi” in the second and determine what happens before the other.

I would like to be able to accept this input and create a sorted list of all events:

 [ do, re, mi, fa, so, la, ti, za ]

or better yet, some information about each event, for example count:

 [ [do, 3], [re, 3], [mi, 2], [fa, 2], [so, 1], [la, 1], [ti, 1], [za, 2] ]

Is there a name for what I'm doing? Are there any accepted algorithms? I write this in Perl if that matters, but the pseudo code will do.

I know that, given my input example, I probably cannot guarantee the "correct" order. But my real contribution has more point data, and I am sure that with some trick he will be 95% right (this is really all I need). I just don't want to reinvent the wheel if I don't need it.

+7

list merge perl pseudocode

Plutor Jul 9 '10 at 18:32

source share

10 answers

Theoretically, let me suggest the following algorithm:

Create a directed graph.
For each input [X, Y, Z], create edges X-> Y and Y-> Z, if they do not already exist.
Perform a topological sort on the graph.
Voila!

PS
This assumes that all events occur in a specific order (always!). If this is not the case, the problem becomes NP-Complete.

PPS
And just so that you have something useful: Sort :: Topological (I don’t know if this works, but it seems correct)

+3

adamk Jul 9 '10 at 18:48

source share

If you are not using a lot of code, you can use the unix tsort command-line tsort :

 $ tsort - do re re fa fa ti do re re mi do la la ti ti za mi fa re so so za

What is the list of all pairs in your input example. This produces as output:

 do la re so mi fa ti za

which you basically need.

+2

adamk Jul 9 '10 at 20:06

source share

Use a hash to aggregate.

 my $notes= [[qw(do re fa ti)], [qw(do re mi)], [qw(do la ti za)], [qw(mi fa)], [qw(re so za)]]; my %out; foreach my $list (@$notes) { $out{$_}++ foreach @$list; } print "$_: $out{$_}\n" foreach sort keys %out;

Productivity

 do: 3 fa: 2 la: 1 mi: 2 re: 3 so: 1 ti: 2 za: 2

The% out hash is easily converted to a list if that is what you want.

 my @newout; push @newout,[$_,$out{$_}] foreach sort keys %out;

+1

user719252 Apr 21 '11 at 15:54

source share

 perl -de 0 DB<1> @a = ( ['a','b','c'], ['c','f'], ['h'] ) DB<2> map { @m{@{$_}} = @$_ } @a DB<3> p keys %m chabf

A quick shortcut that I can think of. In any case, you must go through things at least once ...

0

eruciform Jul 9 '10 at 18:42

source share

This is the perfect candidate for sorting Merge. Go to the wikipedia page here for a pretty good presentation of the http://en.wikipedia.org/wiki/Merge_sort algorithm

What you described is actually a subset / small merge sort setting. Instead of starting with an unsorted array, you have a set of sorted arrays that you want to combine together. Just call the "merge" function, as described on the wikipedia page, on the pairs of your arrays and the results of the merge function until you select one array (which will be sorted).

To configure the output the way you want, you need to define a comparison function that can be returned if one event is less than, equal to, or more than another event. Then, when your merge function detects two identical events, you can collapse them into one event and save the score for that event.

0

J teller Jul 9 '10 at 18:45

source share

Roughly speaking, the name I would give him is "hashing." You put things in name value pairs. If you want to preserve some semblance of order, you must supplement the hash with an array that preserves the order. This order is a “clash” for me.

 use strict; use warnings; my $all = [[ 'do', 're', 'fa', 'ti' ], [ 'do', 're', 'mi' ], [ 'do', 'la', 'ti', 'za' ], [ 'mi', 'fa' ], [ 're', 'so', 'za' ] ]; my ( @order, %counts ); foreach my $list ( @$all ) { foreach my $item ( @$list ) { my $ref = \$counts{$item}; # autovivs to an *assignable* scalar. push @order, $item unless $$ref; $$ref++; } } foreach my $key ( @order ) { print "$key: $counts{$key}\n"; } # do: 3 # re: 3 # fa: 2 # ti: 2 # mi: 2 # la: 1 # za: 2 # so: 1

There are other answers like this one, but mine contains this neat auto-output trick.

0

Axeman Jul 9 '10 at 19:31

source share

I'm not sure if this will be caused, but I figured out a way to find the order given by the array of arrays as input. Essentially pseudo code:

10 Find the earliest element in all arrays
20 Paste this into the list
30 Remove this item from all arrays
40 Go to 10 if there are any remaining items

Here's a working prototype:

 #!/usr/bin/perl use strict; sub InList { my ($x, @list) = @_; for (@list) { return 1 if $x eq $_; } return 0; } sub Earliest { my @lists = @_; my $earliest; for (@lists) { if (@$_) { if (!$earliest || ($_->[0] ne $earliest && InList($earliest, @$_))) { $earliest = $_->[0]; } } } return $earliest; } sub Remove { my ($x, @lists) = @_; for (@lists) { my $n = 0; while ($n < @$_) { if ($_->[$n] eq $x) { splice(@$_,$n,1); } else { $n++ } } } } my $list = [ [ 'do', 're', 'fa', 'ti' ], [ 'do', 're', 'mi' ], [ 'do', 'la', 'ti', 'za' ], [ 'mi', 'fa' ], [ 're', 'so', 'za' ] ]; my @items; while (my $earliest = Earliest(@$list)) { push @items, $earliest; Remove($earliest, @$list); } print join(',', @items);

Output:

re, mi, fa, la, ty, tak, za

0

Bizangles Jul 9 '10 at 20:42

source share

Decision:

This solves the original question before it is changed by demand.

 #!/usr/local/bin/perl -w use strict; main(); sub main{ # Changed your 3-dimensional array to a 2-dimensional array my @old = ( [ 'do', 're', 'fa', 'ti' ], [ 'do', 're', 'mi' ], [ 'do', 'la', 'ti', 'za' ], [ 'mi', 'fa' ], [ 're', 'so', 'za' ] ); my %new; foreach my $row (0.. $#old ){ # loop through each record (row) foreach my $col (0..$#{$old[$row]} ){ # loop through each element (col) $new{ ${$old[$row]}[$col] }{count}++; push @{ $new{${$old[$row]}[$col]}{position} } , [$row,$col]; } } foreach my $key (sort keys %new){ print "$key : $new{$key} " , "\n"; # notice each value is a hash that we use for properties } }

How to get information:

  local $" = ', '; # pretty print ($") of array in quotes print $new{za}{count} , "\n"; # 2 - how many there were print "@{$new{za}{position}[1]} \n"; # 4,2 - position of the second occurrence # remember it starts at 0

Basically, we create a unique list of items in the hash. For each of these elements, we have a hash property that contains a scalar count and an array for position . The number of elements in the array should vary depending on how many occurrences of the element were in the original.

The scalar property is really not required, since you can always take the scalar of the position array to get the same number. Note. If you ever add / remove elements from the count and position array, they will not be correlated by their value.

example: print scalar @{$new{za}{position}}; will give you the same as print $new{za}{count};

0

vol7ron Jul 9 '10 at 21:20

source share

Just realized that your question says that they are not a predefined order, so this may not be relevant.

Perl Code:

 $list = [ ['do', 're', 'fa', 'ti' ], ['do', 're', 'mi' ], ['do', 'la', 'ti', 'za' ], ['mi', 'fa' ], ['re', 'so', 'za' ] ]; %sid = map{($_,$n++)}qw/do re mi fa so la ti za/; map{map{$k{$_}++}@$_}@$list; push @$result,[$_,$k{$_}] for sort{$sid{$a}<=>$sid{$b}}keys%k; print "[@$_]\n" for(@$result);

exit:

 [do 3] [re 3] [mi 2] [fa 2] [so 1] [la 1] [ti 2] [za 2]

0

pdehaan Jul 10 '10 at 15:32

source share

Greg bacon · Accepted Answer · 2010-07-09T20:22:41+0000

You can use tsort to output a reasonable, though not necessarily unique sort order (known as topological order ) from the order you observed. You may be interested in reading tsort original usage , which is similar in structure to your problem.

Note that tsort requires an acyclic graph. In terms of your example, this means that you could not see that they were followed by re in one sequence, and after repetition in another.

 #! /usr/bin/perl use warnings; use strict; use IPC::Open2; sub tsort { my($events) = @_; my $pid = open2 my $out, my $in, "tsort"; foreach my $group (@$events) { foreach my $i (0 .. $#$group - 1) { print $in map "@$group[$i,$_]\n", $i+1 .. $#$group; } } close $in or warn "$0: close: $!"; chomp(my @order = <$out>); my %order = map +(shift @order => $_), 0 .. $#order; wantarray ? %order : \%order; }

Since you described the data as sparse, the above code provides tsort as much information as possible about the event adjacency matrix.

With this information, calculating the histogram and sorting its components is simple:

 my $events = [ ... ]; my %order = tsort $events; my %seen; do { ++$seen{$_} for @$_ } for @$events; my @counts; foreach my $event (sort { $order{$a} <=> $order{$b} } keys %seen) { push @counts => [ $event, $seen{$event} ]; print "[ $counts[-1][0], $counts[-1][1] ]\n"; }

To enter in your question that you indicated, exit

  [do, 3]
 [la, 1]
 [re, 3]
 [so, 1]
 [mi, 2]
 [fa, 2]
 [ti, 2]
 [za, 2]

This looks funny because we know the order of solfège, but re and la are not comparable in the partial order defined by $events : we only know that they should come after that.

Shuffling sparse sorted arrays

Decision:

How to get information:

More articles: