Matching substrings from a dictionary against other strings: suggestions?

Hello Stack Overflow. I would like some suggestions regarding the following problem. I am using Java.

I have array #1 containing a number of strings. For example, two of the strings could be: "An apple fell on Newton’s head" and "Apples grow on trees."

On the other hand, I have another array #2 with terms such as (Fruits => Apple, Orange, Peach; Items => Pen, Book; ...). I would call this array my "dictionary".

By comparing the strings in one array against the other, I need to see which category from #2 each string in #1 falls under. For example, both strings in #1 would fall under Fruits.

My most important consideration is speed. I need to do these operations quickly. A structure providing constant-time lookup would be good.

I read about HashSet and its contains() method, but it does not match substrings. I also tried a regex (apple|orange|peach|... etc.) with the case-insensitive flag, but I read that it will not be fast as the number of terms grows (a minimum of 200 is expected). Finally, I searched around and am considering an ArrayList with indexOf(), but I don't know about its performance. I also need to know which of the terms actually matched; in this case it would be "Apple".

Please provide your opinions, ideas and suggestions on this issue.

Aho-Corasick [...]
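Since most of the note above did not survive, here is a minimal sketch of what an Aho-Corasick matcher could look like in Java for this problem. The class name AhoCorasickSketch and the method findAll are made up for illustration and are not taken from the thread or from any library. The idea is to build a trie of all dictionary terms once, add failure links, and then scan each sentence a single time, reporting every term that occurs as a substring (case-insensitively).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

/** Minimal Aho-Corasick automaton; all names here are illustrative. */
final class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                    // longest proper suffix that is also a trie path
        final List<String> hits = new ArrayList<>();  // terms ending here (own and inherited)
    }

    private final Node root = new Node();

    AhoCorasickSketch(Iterable<String> terms) {
        // 1. Build a trie of the lower-cased terms.
        for (String term : terms) {
            Node node = root;
            for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.hits.add(term);
        }
        // 2. Breadth-first pass: set failure links and inherit hits from them.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(e.getKey())) {
                    f = f.fail;                       // root.fail is null, which ends the walk
                }
                child.fail = (f == null) ? root : f.next.get(e.getKey());
                child.hits.addAll(child.fail.hits);
                queue.add(child);
            }
        }
    }

    /** Returns every dictionary term that occurs as a substring of text. */
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Node node = root;
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            found.addAll(node.hits);
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasickSketch matcher =
            new AhoCorasickSketch(Arrays.asList("Apple", "Orange", "Peach", "Pen", "Book"));
        System.out.println(matcher.findAll("An apple fell on Newton's head"));
        System.out.println(matcher.findAll("Apples grow on trees."));
    }
}
```

The automaton is built once for the whole dictionary, and each sentence is then scanned in time proportional to its length plus the number of matches, regardless of how many terms there are. To get the category, each returned term can be looked up in a separate term-to-category map.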

Thanks, Stack Overflow people! :)

+5
3

Google [...] ( [...] { "Fruits" => [Apple] }, { "Apple" => ["Fruits"] }). [...]
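Only the two map shapes survive from this answer, so the following is just one reading of it: invert the dictionary so that each term points to its categories, which gives a constant-time lookup per word. The names InvertedDictionarySketch, invert and categoriesOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Inverted dictionary lookup; all names here are illustrative. */
final class InvertedDictionarySketch {
    /** Turns category -> terms into lower-cased term -> categories. */
    static Map<String, List<String>> invert(Map<String, List<String>> categoryToTerms) {
        Map<String, List<String>> termToCategories = new HashMap<>();
        for (Map.Entry<String, List<String>> e : categoryToTerms.entrySet()) {
            for (String term : e.getValue()) {
                termToCategories
                    .computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                    .add(e.getKey());
            }
        }
        return termToCategories;
    }

    /** Looks up each whole word of the sentence; only exact word matches count. */
    static Set<String> categoriesOf(String sentence, Map<String, List<String>> termToCategories) {
        Set<String> categories = new LinkedHashSet<>();
        // Split on anything that is not a letter or a digit.
        for (String word : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            List<String> hit = termToCategories.get(word);
            if (hit != null) {
                categories.addAll(hit);
            }
        }
        return categories;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = new HashMap<>();
        dictionary.put("Fruits", Arrays.asList("Apple", "Orange", "Peach"));
        dictionary.put("Items", Arrays.asList("Pen", "Book"));
        Map<String, List<String>> inverted = invert(dictionary);
        System.out.println(categoriesOf("An apple fell on Newton's head", inverted));
        System.out.println(categoriesOf("Apples grow on trees.", inverted));
    }
}
```

Note the limitation: this matches whole words only, so "Apples" does not hit the term "Apple". If substring matches are required, as in the question, the words would need stemming first, or a substring matcher would be needed instead.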

+3

[...]? O(m), m [...], O(n^2) [...] BioJava [...]

[...] O(n^2). [...]

+2

200 [...] #1 [...]

[...] #2 [...] #1 [...]

(Regular expressions are compiled into a state machine: for each character of the string, the matcher simply looks up the next state in a table. If the regular expression is complex, you might get backtracking that increases the running time, but your regular expression has a very simple structure.)
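For reference, the regex approach from the question might look roughly like this in Java. The class and method names are made up for illustration. One caveat worth stating: as far as I know, java.util.regex is a backtracking matcher rather than a purely table-driven one, so with a few hundred alternatives it is worth measuring before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Dictionary matching with a single alternation regex; names are illustrative. */
final class RegexDictionarySketch {
    /** Builds one case-insensitive alternation such as apple|orange|peach. */
    static Pattern compile(List<String> terms) {
        String alternation = terms.stream()
            .map(Pattern::quote)                 // treat each term as literal text
            .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the text of every match, exactly as it appears in the sentence. */
    static List<String> matches(Pattern pattern, String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(sentence);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Compile once and reuse the pattern for every sentence.
        Pattern fruits = compile(Arrays.asList("Apple", "Orange", "Peach"));
        System.out.println(matches(fruits, "An apple fell on Newton's head"));
        System.out.println(matches(fruits, "Apples grow on trees."));
    }
}
```

The matched text comes back in the sentence's own casing, so to recover the dictionary term or its category you would lower-case it and look it up in a term-to-category map.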

0