The matrix contains [at the end] the editing distance between all word1 prefixes and all word2 prefixes.
d[i][j] = edit distance between word1[0..(i-1)] and word2[0..(j-1)]
You are interested in d[l1][l2] . In general, to calculate d[i][j] , you need to look at the three smaller neighbors d[i-1][j] , d[i-1][j-1] and d[i][j-1] . Thus, transitively, d[i][j] requires all records where at least one coordinate is less than i resp j (and the other is not greater). The exception is that the two characters word1[i-1] and word2[j-1] are equal, in which case you only need a smaller diagonal neighbor.
If you first filled in the -1 matrix to indicate that the corresponding editing distance between the prefixes was not estimated, and recursively compute d[l1][l2] using the cached value of the required d[i][j] , if it has already been calculated, computing it recursively and preserving its value, if not, some areas of the matrix may remain intact. Probably large areas, if there are many pairs of identical characters [only the diagonal will be evaluated if two words are equal], only small areas, if any, if several pairs of equal characters.
In the general case, most of the matrix is necessary for calculating d[l1][l2] , therefore it is faster to calculate the matrix completely using a simple algorithm than to repeat and calculate only the necessary values.
If you do not store the values of shorter prefixes, since they are necessary for calculating d[i][j] , they must be recalculated for each path d[ia][jb] from d[i][j] . Since d[ia][jb] can be achieved in many respects from d[i][j] , which will lead to a recalculation of the lot , leading to a grossly inefficient algorithm as a whole.
Since the calculation for each line uses only the previous line, you can only do with two arrays of length min{l1, l2} + 1 to save some memory, but if the words are not very long, it does not matter much, and the code is simpler with full array.