We have some unstructured text data in our application's data warehouse. I want to create a tag cloud from a single property of one type of object, across a subset of the data warehouse objects. After looking around, I do not see any existing framework that would let me do this without writing it myself.
Here is what I have in mind:
- Write a map step (as in map-reduce) that goes through each object of a certain type in the data warehouse
- Split the text string into words
- Increment a counter for each word
- Use the final counts to create a tag cloud using third-party software (offline; any suggestions are welcome)
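To make the counting steps concrete, here is a minimal in-memory sketch in Python of what I mean. It is only an illustration: `map_words`, `count_words`, and `sample_texts` are hypothetical names, the sample strings stand in for the text property read from each stored object, and a real job would run the map step inside whatever map-reduce facility the data warehouse offers.

```python
import re
from collections import Counter


def map_words(text):
    """Map step: split one text property into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_words(texts):
    """Reduce step: sum word occurrences across all objects."""
    counts = Counter()
    for text in texts:
        counts.update(map_words(text))
    return counts


# Hypothetical stand-in for the text property of each stored object.
sample_texts = ["The quick brown fox", "the lazy dog and the fox"]

counts = count_words(sample_texts)
# The most frequent words would be the largest tags in the cloud.
top_words = counts.most_common(2)
```

The resulting word-to-count mapping is what I would hand to the offline tag cloud tool.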
As I have never done this before, I wondered first whether there is some kind of framework around that does this for me (please), or whether I am approaching this the wrong way. In other words, please feel free to point out the gaping holes in the plan.