Comparing two English lines for similarities

Here is my problem: I have two paragraphs of text and need to determine whether they are similar, not in the sense of string metrics but in meaning. The two paragraphs below are clearly related, but I need to find out whether they cover the same topic. Any help or pointers toward solving this would be greatly appreciated.

Fossil fuels are fuels formed by natural processes such as anaerobic decomposition of buried dead organisms. The age of the organisms and the resulting fossil fuels is typically millions of years, and sometimes exceeds 650 million years. Fossil fuels contain high percentages of carbon and include coal, oil, and natural gas. They range from volatile materials with low carbon-to-hydrogen ratios, such as methane, to liquid petroleum, to non-volatile materials composed of almost pure carbon, such as anthracite coal. Methane can be found in hydrocarbon fields, alone or associated with oil, or in the form of methane clathrates. It is generally accepted that fossil fuels formed from the fossilized remains of dead plants under heat and pressure in the Earth's crust over millions of years. This biogenic theory was first introduced by Georg Agricola in 1556, and later by Mikhail Lomonosov in the 18th century.

Secondly:

Fossil fuel reforming is a method of producing hydrogen or other useful products from fossil fuels such as natural gas. This is achieved in a processing device called a reformer, which reacts steam at high temperature with the fossil fuel. The steam methane reformer is widely used in industry to produce hydrogen. There is also interest in developing much smaller units based on similar technology to produce hydrogen as a feedstock for fuel cells. Small-scale steam reforming units to supply fuel cells are currently the subject of research and development, typically involving the reforming of methanol or natural gas, but other fuels are also being considered, such as propane, gasoline, autogas, diesel, and ethanol.


Relevant starting points: NLP, and the Wikipedia article on Text Analytics.


Latent Dirichlet Allocation (LDA) may help here. It is a topic model that represents each document as a mixture of hidden (latent) topics.

If you run LDA on your collection of paragraphs, you can then compare the hidden-topic vectors of two paragraphs to find out whether they are related.
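As a rough illustration of that idea, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by a comparison of two topic vectors using the Hellinger distance. Everything in it is an assumption made for illustration: the names `lda_gibbs` and `hellinger`, the prior values `alpha` and `beta`, the iteration count, the choice of Hellinger distance, and the toy documents. In practice you would likely use a library implementation and a much larger collection.

```python
import math
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs is a list of token lists. Returns one topic distribution
    (a list of n_topics probabilities) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # count of word v in topic k
    topic_total = [0] * n_topics                            # tokens assigned to topic k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def hellinger(p, q):
    """Hellinger distance between two distributions: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Toy, made-up documents; real input would be your tokenized paragraphs.
docs = [
    "fossil fuels coal oil natural gas carbon".split(),
    "hydrogen steam reformer natural gas fuel cells".split(),
    "cat dog pet animal vet food".split(),
]
topics = lda_gibbs(docs, n_topics=2)
print("distance between paragraph 1 and 2:", hellinger(topics[0], topics[1]))
```

A small distance suggests the two paragraphs draw on the same topics. With so few documents the result is noisy and depends on the random seed, so treat the number as indicative only.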

Of course, the baseline is to skip LDA and instead measure similarity with term-frequency vectors weighted by tf-idf (the vector space model).
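The baseline can be sketched in a few lines with only the standard library. This is one possible version, not a definitive one: the function names (`tokenize`, `tfidf_vectors`, `cosine`), the simple regex tokenizer, the smoothed idf formula, and the example sentences are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse tf-idf vector (dict term -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed idf, so terms present in every text keep a nonzero weight.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in df}
    return [{term: count * idf[term] for term, count in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example texts; replace them with the paragraphs you want to compare.
texts = [
    "Fossil fuels include coal, oil and natural gas.",
    "Hydrogen is produced from natural gas in a reformer.",
    "The cat sat on the mat.",
]
vectors = tfidf_vectors(texts)
print("related pair:  ", cosine(vectors[0], vectors[1]))
print("unrelated pair:", cosine(vectors[0], vectors[2]))
```

Idf weights are only meaningful relative to a collection, so compute them over as many paragraphs as you have available, not just the two being compared. Removing stop words or stemming would likely help, but both are left out to keep the sketch short.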
