How to match numbers in one list (for example, 2 and 3) to an approximate sum in another list (for example, 5)?

I am trying to correlate some audio files with some written passages of text.

I started with a single audio file of someone reading the typed passage. I then split the audio at each period of silence with sox, and similarly split the typed text so that each sentence is on its own line.

The splits did not fall exactly at every full stop, but wherever the speaker paused. I need to create a list of which audio files match which typed sentences, for example:

0001.wav This is a sentence.
0002.wav This is another sentence.

Please note that sometimes two or more audio files correspond to one sentence, for example:

  • 0001.wav ("This is a") + 0002.wav ("sentence") = "This is a sentence."

To help match the audio to the text, I used software to count the syllables in each audio file and in each typed sentence.

I have two files with this data. The first, "sentences.txt", lists every sentence of the text, one per line, preceded by its syllable count, for example:

5 This is a sentence.
7 This is another sentence.
8 This is yet another sentence.
9 This is still yet another sentence.

I can strip the sentence text with awk -F" " '{ print $1 }' sentences.txt to get this syllables_in_text.txt:

5
7
8
9

The second, "syllables_in_audio.txt", lists the audio files in order, each with its syllable count:

0001.wav 3
0002.wav 2
0003.wav 4
0004.wav 5
0005.wav 7
0006.wav 3
0007.wav 2
0008.wav 3

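I would like to produce a file ("output.txt") in which each line lists the audio files that belong to the sentence on the same line of "sentences.txt", like this: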

0001.wav 0002.wav
0003.wav 0004.wav
0005.wav
0006.wav 0007.wav 0008.wav

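The syllable counts are approximate, so the sums only need to be close. Here "0001.wav" and "0002.wav" together make up "This is a sentence." Line 1 of "output.txt" corresponds to line 1 of "sentences.txt", and so on: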

Contents of "output.txt":    | Contents of "sentences.txt":
0001.wav 0002.wav            | 5 This is a sentence.
0003.wav 0004.wav            | 7 This is another sentence.
0005.wav                     | 8 This is yet another sentence.
0006.wav 0007.wav 0008.wav   | 9 This is still yet another sentence.

You could do this with an awk script. Roughly:

BEGIN {
    # Read the per-sentence syllable counts into want[1..n].
    n = 0
    while ((getline line < "syllables_in_text.txt") > 0)
        want[++n] = line + 0

    # Walk the audio files in order, filling one output line per sentence.
    k = 1; sum = 0
    while ((getline line < "syllables_in_audio.txt") > 0) {
        split(line, f, " ")          # f[1] = file name, f[2] = syllable count
        # Move on to the next sentence once adding this file would not
        # bring the running total any closer to the current target.
        if (k < n && sum > 0 && abs(sum + f[2] - want[k]) >= abs(sum - want[k])) {
            k++; sum = 0
        }
        out[k] = (out[k] == "" ? "" : out[k] " ") f[1]
        sum += f[2]
    }

    # Print the groups in sentence order.
    for (i = 1; i <= n; i++)
        print out[i]
}
function abs(x) { return x < 0 ? -x : x }

...
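
As a rough sketch of how this kind of matching could be run end to end: the function below walks the audio files in order and starts a new sentence whenever adding the next file would not bring the running syllable total any closer to that sentence's count. The name match_audio, the tab-separated output and the tie-breaking rule are my own assumptions, not something from the question; the sample data is copied from the question.

```shell
# Hypothetical helper: match_audio SENTENCES_FILE AUDIO_FILE
# Prints one line per sentence: "<audio files><TAB><sentence line>".
match_audio() {
    awk '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN { k = 1 }
    NR == FNR {                      # first file: "count sentence text"
        want[++n] = $1
        text[n] = $0
        next
    }
    {                                # second file: "name count"
        # Start the next sentence when adding this file would not bring
        # the running total any closer to the current target (assumed rule).
        if (k < n && sum > 0 && abs(sum + $2 - want[k]) >= abs(sum - want[k])) {
            k++
            sum = 0
        }
        group[k] = (group[k] == "" ? "" : group[k] " ") $1
        sum += $2
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s\t%s\n", group[i], text[i]
    }' "$1" "$2"
}

# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/sentences.txt" <<'EOF'
5 This is a sentence.
7 This is another sentence.
8 This is yet another sentence.
9 This is still yet another sentence.
EOF
cat > "$dir/syllables_in_audio.txt" <<'EOF'
0001.wav 3
0002.wav 2
0003.wav 4
0004.wav 5
0005.wav 7
0006.wav 3
0007.wav 2
0008.wav 3
EOF

result=$(match_audio "$dir/sentences.txt" "$dir/syllables_in_audio.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

On this sample it groups the files as 0001 + 0002, 0003 + 0004, 0005 alone, and 0006 + 0007 + 0008. The approach is greedy and never backtracks, so one badly miscounted file can shift every line after it; the result is probably worth spot-checking by ear.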


Given input files like these, where the syllable count of each audio file already equals that of its sentence:

$ cat sentences.txt
5 This is a sentence.
7 This is another sentence.
8 This is yet another sentence.
9 This is still yet another sentence.

$ cat syllables_in_audio.txt
0001.wav 5
0002.wav 5
0003.wav 7
0004.wav 7
0005.wav 8
0006.wav 9
0007.wav 9
0008.wav 9

this awk one-liner groups the audio files by sentence:

awk 'NR==FNR{a[$1]=$0;next}{b[$2]=b[$2]==""?$1:b[$2] FS $1}END{for (i in a) printf "%-40s|%s\n", b[i], a[i]}' sentences.txt syllables_in_audio.txt

0001.wav 0002.wav                       |5 This is a sentence.
0003.wav 0004.wav                       |7 This is another sentence.
0005.wav                                |8 This is yet another sentence.
0006.wav 0007.wav 0008.wav              |9 This is still yet another sentence.