I would like to do some natural language processing on cooking recipes, in particular on the ingredient lists (and maybe on other parts later). Mainly, I want to create my own set of POS tags to help me determine the meaning of an ingredient line.
For example, suppose one of the ingredients is: 3/4 cup (lightly packed) leafy parsley leaves, separated
I would like the tags to identify the ingredient being listed and its quantity, which is usually a number followed by a unit of measure. For instance:
3\NUM-QTY /\FRACTION 4\NUM-QTY cup\N-MEAS (lightly\ADV packed\VD) [leafy\ADJ parsley\N]\INGREDIENT leaves\N, separated\VD
I found the tags here.
I am not sure about some things:
- Should I use custom tags, or should I run a pre-existing tagger and post-process its output?
- If I use custom tags, is the best way to build the training text simply to list ingredient lines and tag everything manually?
I feel this kind of language is so domain-specific that it would be worth training a tagger on a relevant data set, but I'm not quite sure how to proceed.
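To make the manual-tagging idea concrete, here is a minimal sketch of how I imagine a hand-tagged line could be turned into (token, tag) pairs for training. It assumes each token is written as token\TAG with no spaces around the backslash, and it simply skips chunk labels such as ]\INGREDIENT. The function name `parse_tagged_line` is made up for illustration and does not come from any library.

```python
import re

# One tagged token: a run of characters that are not whitespace, backslash,
# brackets, parentheses, or commas, then a backslash, then an upper-case tag.
# Chunk labels such as "]\INGREDIENT" have no token before the backslash,
# so this pattern does not match them (assumption: that is acceptable here).
TAGGED_TOKEN = re.compile(r"([^\s\\()\[\],]+)\\([A-Z][A-Z-]*)")


def parse_tagged_line(line):
    """Return a list of (token, tag) pairs from a hand-tagged ingredient line."""
    return TAGGED_TOKEN.findall(line)


# Raw string so the backslashes are kept literally.
example = (
    r"3\NUM-QTY /\FRACTION 4\NUM-QTY cup\N-MEAS (lightly\ADV packed\VD) "
    r"[leafy\ADJ parsley\N]\INGREDIENT leaves\N, separated\VD"
)

pairs = parse_tagged_line(example)
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

A list of such lists, one per ingredient line, is the shape I would expect to feed to a trainable tagger, though I have not checked that against any particular toolkit.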
Thanks!