NLP to classify / label the content of a sentence (Ruby binding required)

I am analyzing a few million emails. My goal is to classify them into groups. The groups could be, for example:

  • Delivery problems (slow delivery, slow processing before shipping, incorrect availability information, etc.).
  • Problems with customer service (slow e-mail response time, intrusive response, etc.).
  • Return problems (slow processing of the return request, lack of support from customer service, etc.).
  • Pricing complaint (hidden fee, etc.)

To perform this classification, I need an NLP tool that can recognize combinations of word groups such as:

  • "[they | the company | the firm | the website | the merchant]"
  • "[did not | not | no]"
  • "[answer | reply]"
  • "[until the next day | fast enough | at all]"
  • etc.

A few of the groups above, taken in combination, should then match sentences such as:

  • "They did not answer"
  • "They didn't answer at all."
  • "There was no answer at all."
  • "I did not receive a response from the website"

The sentence should then be classified as a customer service problem.
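The matching described above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not a recommendation over a real NLP library: the class name `WordGroupMatcher`, the contents of the two groups, and the rule "one phrase from every group" are all assumptions made for the example.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a sentence matches when it contains at least one
// phrase from every word group. Group contents are illustrative only.
final class WordGroupMatcher {
    static final List<String> NEGATION = List.of("did not", "didn't", "no", "not");
    static final List<String> REPLY = List.of("answer", "reply", "response");

    // True if the sentence contains some phrase of the group as whole words.
    static boolean containsAny(String sentence, List<String> group) {
        // Lower-case, replace everything except letters and apostrophes with
        // spaces, and pad so that phrases only match on word boundaries.
        String cleaned = sentence.toLowerCase(Locale.ROOT).replaceAll("[^a-z']+", " ").trim();
        String padded = " " + cleaned + " ";
        for (String phrase : group) {
            if (padded.contains(" " + phrase + " ")) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: a negation plus a reply word signals a customer service problem.
    static boolean isCustomerServiceProblem(String sentence) {
        return containsAny(sentence, NEGATION) && containsAny(sentence, REPLY);
    }

    public static void main(String[] args) {
        System.out.println(isCustomerServiceProblem("They didn't answer at all."));
        System.out.println(isCustomerServiceProblem("Delivery was slow."));
    }
}
```

Hand-written groups like these get brittle quickly (typos, synonyms, word order), which is the main reason to look at a trained classifier instead.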

Which NLP tool can handle this task? From what I have read, these are the most relevant:

  • Stanford CoreNLP
  • OpenNLP

See also these suggested NLP tools.


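You can use the OpenNLP doccat API: create training data, train a model from it, and then use the model to classify sentences.

The training data looks like this, with the category first and the sample text after it: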

customerserviceproblems They did not respond
customerserviceproblems They didn't respond 
customerserviceproblems They didn't respond at all
customerserviceproblems They did not respond at all
customerserviceproblems I received no response from the website
customerserviceproblems I did not receive response from the website

etc. Put each sample on its own line, terminated by a \n newline.

Then train a model from that file with Java code like this:

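import java.io.*;
import opennlp.tools.doccat.*;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Note: this appears to target the OpenNLP 1.5.x API; later releases changed
// some of these signatures. yourFileOfSamplesLikeAbove and modelOutFile are
// placeholders for your own paths.
DoccatModel model = null;
try (InputStream dataIn = new FileInputStream(yourFileOfSamplesLikeAbove)) {
  // Each line of the file is "<category> <sample text>"
  ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
  ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

  // Train the categorizer, then write the model to disk
  model = DocumentCategorizerME.train("en", sampleStream);
  try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelOutFile))) {
    model.serialize(modelOut);
  }
  System.out.println("Model complete!");
} catch (IOException e) {
  // Failed to read or parse training data, training failed
  e.printStackTrace();
}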

Then load the model and use it to score new text:

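import java.io.File;
import java.util.HashMap;
import java.util.Map;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;

// Load the model trained above (pathToModelYouJustMade is a placeholder;
// the DoccatModel constructor can throw an IOException)
DoccatModel doccatModel = new DoccatModel(new File(pathToModelYouJustMade));
DocumentCategorizerME documentCategorizerME = new DocumentCategorizerME(doccatModel);

/**
 * Returns a map from each category to its score for the given text.
 *
 * @param text the sentence to classify
 * @return category name mapped to its score
 * @throws Exception
 */
private Map<String, Double> getScore(String text) throws Exception {
  Map<String, Double> scoreMap = new HashMap<>();
  // One score per category known to the model
  double[] categorize = documentCategorizerME.categorize(text);
  int catSize = documentCategorizerME.getNumberOfCategories();
  for (int i = 0; i < catSize; i++) {
    String category = documentCategorizerME.getCategory(i);
    scoreMap.put(category, categorize[documentCategorizerME.getIndex(category)]);
  }
  return scoreMap;
}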
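A score map like the one returned by getScore still has to be turned into a decision. One possible way, sketched here with plain Java collections only, is to take the highest-scoring category and reject it when it falls below a threshold. The class `BestCategory`, the method `pick` and the `minScore` cut-off are hypothetical names introduced for this example; a sensible threshold would have to be found by testing on your own data.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of OpenNLP.
final class BestCategory {
    // Returns the highest-scoring category, or empty if no score reaches minScore.
    static Optional<String> pick(Map<String, Double> scores, double minScore) {
        return scores.entrySet().stream()
                .filter(e -> e.getValue() >= minScore)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only
        Map<String, Double> scores = Map.of(
                "customerserviceproblems", 0.8,
                "deliveryproblems", 0.2);
        System.out.println(pick(scores, 0.5).orElse("unclassified"));
    }
}
```

Returning an empty Optional for low scores leaves room for an "unclassified" bucket, which may be useful when many emails fit none of the categories.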



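Another option is a rule-based approach: define the word groups, then write rules that combine them into categories, using something in the style of lex-yacc. A rule could look something like this: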

    {organization}{negative}{verb} :- delivery problems
    

    organization = [they|the company|the firm|the website|the merchant] ..
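To make the rule notation above concrete, here is a minimal sketch that expands a template such as {organization}{negative}{verb} into a regular expression. It is a stand-in for a real lex-yacc grammar, not an implementation of one. The class `RuleCompiler` is hypothetical, and the contents of the `negative` and `verb` groups are assumptions; only the `organization` group comes from the text.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compiles a "{group}{group}..." template into a regex.
final class RuleCompiler {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Each {group} becomes a non-capturing alternation; the groups must appear
    // in the given order, separated by whitespace.
    static Pattern compile(String template, Map<String, String> groups) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder regex = new StringBuilder("\\b");
        boolean first = true;
        while (m.find()) {
            String alternatives = groups.get(m.group(1));
            if (alternatives == null) {
                throw new IllegalArgumentException("Unknown group: " + m.group(1));
            }
            if (!first) {
                regex.append("\\s+");
            }
            regex.append("(?:").append(alternatives).append(")");
            first = false;
        }
        regex.append("\\b");
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Map<String, String> groups = Map.of(
                "organization", "they|the company|the firm|the website|the merchant",
                "negative", "did not|didn't|never",   // assumed group contents
                "verb", "answer|reply|respond");      // assumed group contents
        Pattern rule = compile("{organization}{negative}{verb}", groups);
        System.out.println(rule.matcher("They did not answer").find());
    }
}
```

Because the groups must occur in order, a sentence like "There was no answer at all." does not match this rule; covering such variants means adding more rules, which is the usual cost of the rule-based route.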
