I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not just the single most common phrase, and ideally without requiring word-for-word matches).
My example is any business page on Yelp.com, which shows 3 snippets drawn from hundreds of reviews of a given restaurant, in the format:
"Try the hamburger" (in 44 reviews)
For example, see the Highlights section of this page:
http://www.yelp.com/biz/sushi-gen-los-angeles/
I have NLTK installed and have played around with it a bit, but honestly, I'm overwhelmed by the options. This seems like a fairly common problem, yet I could not find a straightforward solution by searching here.
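To make the goal concrete, here is a minimal sketch of one possible starting point. It uses only the Python standard library rather than NLTK, and every name in it (`strip_html`, `tokenize`, `common_phrases`, the sample `reviews`) is hypothetical, not taken from any existing code. It strips the HTML, splits each entry into lowercase words, and ranks n-word phrases by the number of entries that contain them, which mirrors the "(in 44 reviews)" style of count.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw_html):
    """Return the visible text of an HTML fragment, tags removed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def common_phrases(documents, n=2, top=5):
    """Rank n-word phrases by how many documents contain them."""
    doc_counts = Counter()
    for raw in documents:
        words = tokenize(strip_html(raw))
        # A set, so each phrase is counted at most once per document.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_counts.update(phrases)
    return doc_counts.most_common(top)


# Made-up sample data standing in for rows fetched from the database.
reviews = [
    "<p>You have to try the hamburger here.</p>",
    "<div>Try the hamburger, it is great!</div>",
    "<p>The fries were cold, but do try the hamburger.</p>",
]
result = common_phrases(reviews, n=3, top=1)
print(result)
```

This still matches phrases word for word. One way to loosen that, as an untested suggestion, would be to normalize the tokens before building the n-grams, for instance by dropping stopwords or applying a stemmer such as NLTK's `PorterStemmer`.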
nlp nltk text-extraction text-analysis
arronsky Mar 16 2018-10-10T00: 00Z