I have two multiline strings. I use the following code, which implements the Levenshtein distance algorithm, to measure how similar they are.
    public static double similarity(String s1, String s2) {
        // Normalize by the length of the longer string.
        String longer = s1, shorter = s2;
        if (s1.length() < s2.length()) {
            longer = s2;
            shorter = s1;
        }
        int longerLength = longer.length();
        if (longerLength == 0) {
            return 1.0; // both strings are empty
        }
        return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
    }

    // Case-insensitive Levenshtein distance that keeps a single row of costs.
    public static int editDistance(String s1, String s2) {
        s1 = s1.toLowerCase();
        s2 = s2.toLowerCase();
        int[] costs = new int[s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            int lastValue = i;
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0) {
                    costs[j] = j;
                } else if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1)) {
                        newValue = Math.min(Math.min(newValue, lastValue), costs[j]) + 1;
                    }
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
            if (i > 0) {
                costs[s2.length()] = lastValue;
            }
        }
        return costs[s2.length()];
    }
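For comparison, the single-array loop above is a space-saving form of the standard dynamic-programming recurrence. The sketch below (the class name LevenshteinMatrix is just for illustration) keeps the whole table so each step is visible: d[i][j] is the edit distance between the first i characters of one string and the first j characters of the other. Like the code above, it lowercases both inputs first.

```java
public class LevenshteinMatrix {

    // Classic O(n*m) dynamic-programming edit distance.
    // d[i][j] = minimum number of insertions, deletions and substitutions
    // needed to turn the first i chars of a into the first j chars of b.
    public static int editDistance(String s1, String s2) {
        String a = s1.toLowerCase();
        String b = s2.toLowerCase();
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,    // deletion
                                 d[i][j - 1] + 1),   // insertion
                        d[i - 1][j - 1] + cost);     // substitution or match
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting")); // prints 3
        System.out.println(editDistance("Flaw", "lawn"));      // prints 2
    }
}
```

The single-row version in the question computes the same values; it just overwrites the previous row in place to use O(m) memory instead of O(n*m).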
But the above code does not work as expected.
For example, let's say that we have the following two strings, s1 and s2:
S1 -> How do we optimize the performance? . What should we do to compare both strings to find the percentage of similarity between both? How do we optimize the performance? . What should we do to compare both strings to find the percentage of similarity between both?
S2-> How do we optimize tje performance? What should we do to compare both strings to find the percentage of similarity between both? How do we optimize tje performance? What should we do to compare both strings to find the percentage of similarity between both?
Then I pass these two strings to the similarity method, but it does not give the exact percentage difference. How can I optimize the algorithm?
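The similarity method returns (length of longer string - edit distance) / length of longer string, a value between 0 and 1; multiplying it by 100 gives the percentage of similarity, and 100 minus that gives the percent difference. For S1 and S2, an explicit edit script needs only six single-character edits (two h-to-j substitutions for "tje", and deleting the extra space and period twice), and S1 is 4 characters longer than S2, so the distance is between 4 and 6 out of roughly 260 characters and the similarity comes out at roughly 98%. The self-contained sketch below (the class name SimilarityDemo is hypothetical) reproduces the same formula so the numbers can be checked directly.

```java
public class SimilarityDemo {

    // Two-row dynamic-programming Levenshtein distance, case-insensitive
    // like the editDistance() method in the question.
    public static int editDistance(String s1, String s2) {
        String a = s1.toLowerCase();
        String b = s2.toLowerCase();
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the old row as the next scratch row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Same normalization as similarity() in the question: 1.0 means identical.
    public static double similarity(String s1, String s2) {
        int longerLength = Math.max(s1.length(), s2.length());
        if (longerLength == 0) {
            return 1.0;
        }
        return (longerLength - editDistance(s1, s2)) / (double) longerLength;
    }

    public static void main(String[] args) {
        String s1 = "How do we optimize the performance? . What should we do to compare both strings to find the percentage of similarity between both? How do we optimize the performance? . What should we do to compare both strings to find the percentage of similarity between both?";
        String s2 = "How do we optimize tje performance? What should we do to compare both strings to find the percentage of similarity between both? How do we optimize tje performance? What should we do to compare both strings to find the percentage of similarity between both?";
        double sim = similarity(s1, s2);
        System.out.printf("distance   = %d%n", editDistance(s1, s2));
        System.out.printf("similarity = %.2f%%%n", sim * 100);
        System.out.printf("difference = %.2f%%%n", (1 - sim) * 100);
    }
}
```

Because the distance is divided by the length of the whole string, a handful of typos in a long paragraph barely moves the score; whether that counts as the "exact" percentage depends on which normalization is wanted.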
Update: below is my main method.
    public static boolean authQuestion(String question) throws SQLException {
        // ~* is PostgreSQL's case-insensitive regex match, so `question` is used
        // as a regex pattern here; characters such as '?' and '.' are metacharacters.
        String query = "SELECT * FROM WORDBANK where WORD ~* ?;";
        // try-with-resources closes the result set, statement and connection
        // exactly once, even if an exception is thrown.
        try (Connection dbCon = MyResource.getConnection();
             PreparedStatement checkStmt = dbCon.prepareStatement(query)) {
            checkStmt.setString(1, question);
            try (ResultSet rs = checkStmt.executeQuery()) {
                while (rs.next()) {
                    double re = similarity(rs.getString("question"), question);
                    // Return on the first sufficiently similar row; assigning
                    // true/false on every iteration would let a later
                    // non-matching row overwrite an earlier match.
                    if (re > 0.6) {
                        return true;
                    }
                }
            }
        } catch (URISyntaxException e1) {
            e1.printStackTrace();
        } catch (SQLException sqle) {
            sqle.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
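On the performance side: authQuestion only needs a yes/no answer for the 0.6 threshold, so it does not have to compute every full distance. A common optimization, sketched below under the assumption that the similarity > 0.6 rule stays unchanged, turns the threshold into a maximum allowed edit distance, rejects pairs whose length difference already exceeds it, and abandons the dynamic-programming table as soon as a whole row is over the limit. The class and method names (BoundedSimilarity, withinDistance, maxDistanceFor, isSimilar) are hypothetical.

```java
public class BoundedSimilarity {

    // True if the case-insensitive Levenshtein distance between s1 and s2 is
    // at most maxDistance. Gives up early once that has become impossible.
    public static boolean withinDistance(String s1, String s2, int maxDistance) {
        String a = s1.toLowerCase();
        String b = s2.toLowerCase();
        // The distance can never be smaller than the difference in lengths.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return false;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so if every cell in this row is
            // already over the limit, the final distance is too.
            if (rowMin > maxDistance) {
                return false;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()] <= maxDistance;
    }

    // Largest distance d with (longerLength - d) / longerLength > threshold,
    // i.e. the same test as similarity(...) > threshold; -1 if there is none.
    public static int maxDistanceFor(int longerLength, double threshold) {
        int d = (int) Math.floor(longerLength * (1 - threshold));
        // Nudge d to absorb floating-point rounding in the estimate above.
        while (d >= 0 && (longerLength - d) / (double) longerLength <= threshold) {
            d--;
        }
        while (d < longerLength
                && (longerLength - (d + 1)) / (double) longerLength > threshold) {
            d++;
        }
        return d;
    }

    // Intended to give the same answer as similarity(s1, s2) > threshold.
    public static boolean isSimilar(String s1, String s2, double threshold) {
        int longerLength = Math.max(s1.length(), s2.length());
        if (longerLength == 0) {
            return 1.0 > threshold; // similarity() returns 1.0 for two empty strings
        }
        return withinDistance(s1, s2, maxDistanceFor(longerLength, threshold));
    }

    public static void main(String[] args) {
        System.out.println(isSimilar("How do we optimize the performance?",
                "How do we optimize tje performance?", 0.6)); // prints true
        System.out.println(isSimilar("kitten", "sitting", 0.6)); // prints false
    }
}
```

Inside the rs.next() loop, isSimilar(rs.getString("question"), question, 0.6) could stand in for the similarity(...) > 0.6 test. The worst case is still O(n*m); the savings come from clearly dissimilar rows, which are rejected early.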
Stanly moses