NLP for legal texts?

I have a corpus of several hundred thousand legal documents (mainly from the European Union): laws, commentary, court documents, etc. I am trying to make sense of them algorithmically.

I have modeled the known relationships (temporal, this-changes-that, etc.). But at the level of a single document, I would like better tools for quick comprehension. I am open to ideas, but here is a more specific question:

For example: are there NLP methods for identifying the relevant or controversial parts of a document, as opposed to the boilerplate? The recently leaked TTIP documents are thousands of pages of data tables, but one sentence somewhere in there could destroy an industry.

I have played with Google's new Parsey McParseface and other NLP solutions in the past, but although they work impressively well, I'm not sure how good they are at isolating meaning.

+6
3 answers

To make sense of the documents, you need to perform some kind of semantic analysis. There are two main options, each with an example:

Use frame semantics: http://www.cs.cmu.edu/~ark/SEMAFOR/

Use semantic role labeling (SRL): http://cogcomp.org/page/demo_view/srl

Once you can extract information from the documents, you can apply post-processing to determine which information is relevant. What counts as relevant depends on the task, and I don’t think you can find a generic tool that extracts the “relevant” information.
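
As a rough illustration of such a post-processing step, the sketch below filters extracted predicate-argument frames against a task-specific list of predicates. The `Frame` class, the sample frames, and the `RELEVANT_PREDICATES` set are all hypothetical; a real SRL or frame-semantic parser has its own output format, which would have to be mapped into something like this first.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Hypothetical, simplified stand-in for one SRL result."""
    predicate: str  # the verb that evokes the frame
    agent: str      # who does it (roughly ARG0)
    patient: str    # what it is done to (roughly ARG1)

# Made-up frames, as if already extracted from a document.
frames = [
    Frame("publish", "the Commission", "an annual report"),
    Frame("prohibit", "Member States", "the import of certain products"),
    Frame("notify", "each Party", "the Committee"),
]

# Task-dependent assumption: predicates a domain expert cares about.
RELEVANT_PREDICATES = {"prohibit", "ban", "restrict", "repeal"}

def relevant_frames(frames, predicates):
    """Keep only the frames whose predicate is on the task-specific list."""
    return [f for f in frames if f.predicate in predicates]

hits = relevant_frames(frames, RELEVANT_PREDICATES)
for f in hits:
    print(f"{f.agent} -> {f.predicate} -> {f.patient}")
```

The predicate list is the task-dependent part: a different question about the same documents would need a different list, which is why a generic “relevance” tool is unlikely.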

+3

I see you have an interesting use case. You also mentioned that you already have a corpus (which is a really good plus). Let me describe a solution that I sketched for extracting the essence of research texts.

To make sense of the documents, you need triggers, and you need to tell (or train) the computer to look for them. You can approach this with a supervised learning algorithm, implementing the problem as simple text classification at the most basic level. But this requires preliminary work: first, the help of domain experts to identify the “triggers” in the text data. There are tools for extracting the entities of a sentence, for example by taking the noun phrases in a sentence, assigning them weights based on co-occurrence, and representing them as vectors. This is your training data. It can be a really good start for bringing NLP into your domain.
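
A minimal sketch of the text-classification idea, using only the Python standard library. It assumes domain experts have already labeled some sentences; the four training sentences and the labels `trigger` / `boilerplate` are invented for illustration, and plain word counts stand in for weighted noun-phrase vectors. The classifier is a small multinomial naive Bayes with add-one smoothing, not a recommendation of a specific model.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        tokens = tokenize(sentence)
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented, expert-labeled examples (assumption: real data would be far larger).
train_sentences = [
    "The supplier shall be liable for all damages",
    "Member States shall prohibit the import of such products",
    "This Regulation shall enter into force on the twentieth day",
    "Having regard to the Treaty on the Functioning of the European Union",
]
train_labels = ["trigger", "trigger", "boilerplate", "boilerplate"]

clf = NaiveBayes().fit(train_sentences, train_labels)
print(clf.predict("Member States shall prohibit the sale of such products"))
print(clf.predict("This Regulation shall enter into force"))
```

With a real corpus you would likely swap in a library classifier and richer features, but the shape stays the same: expert labels in, a per-sentence prediction out.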

+1

Do not use triggers. What you need is word sense disambiguation and domain adaptation. You want to make sense of what is in the documents, i.e. understand the semantics in order to work out the meaning. You can build a legal ontology of terms in SKOS or JSON-LD format, represent it ontologically in a knowledge graph, and use it together with dependency parsing, for example with TensorFlow / Parsey McParseface. Or you can stream your documents through a kappa-style architecture, something like Kafka-Flink-Elasticsearch with intermediate NLP layers added using CoreNLP / TensorFlow / UIMA, and cache the indexing step between Flink and Elasticsearch with Redis to speed up the process.

To understand relevance, you can apply case-specific boosting in your search. Also apply sentiment analysis to work out intent and truthfulness. Your use case is one of information extraction, summarization, and semantic web / linked data. Since the EU has a different legal system, you first need to generalize what a legal document really is, and then narrow it down to specific legal concepts as they relate to a topic or region.

You can also use topic modeling techniques such as LDA, or Word2Vec / Sense2Vec, here. In addition, Lemon can help with converting lexical to semantic and semantic to lexical, i.e. NLP → ontology and ontology → NLP. Essentially, feed the clustering into your named entity recognition classification. You can also use clustering to help you build the ontology, or to see which word vectors are in a document or set of documents, using cosine similarity. But to do all of that, it is best to first visualize the wording of your documents. Something like commonsense reasoning + deep learning may help in your case as well.
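
To make the cosine similarity part concrete, here is a minimal standard-library sketch. It uses raw term counts as the vectors, which is an assumption made to keep the example self-contained; with Word2Vec or Sense2Vec you would compare learned embeddings instead. The three sample texts are invented.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent a text as a sparse vector of lowercase term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

# Invented snippets standing in for documents or sentences.
doc_a = term_vector("import duties on steel products")
doc_b = term_vector("duties on steel imports")
doc_c = term_vector("fishing quotas in the North Sea")

print(cosine_similarity(doc_a, doc_b))  # shares "duties", "on", "steel"
print(cosine_similarity(doc_a, doc_c))  # no shared terms
```

Note that "import" and "imports" count as different terms here; lemmatization or embeddings would close that gap, which is part of why the answer points at word vectors rather than raw counts.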

-4