We were assigned a new project in my class of data structures - creating text with chains of marks.
Overview
Given the input text file, we create an initial seed of length n characters. We add this to our output line and select our next character based on frequency analysis.
This is a cat, and there are two dogs.
Initial seed: "Th" Possible next letters -- i, e, e Therefore, probability of choosing i is 1/3, e is 2/3. Now, say we choose i. We add "i" to the output string. Then our seed becomes hi and the process continues.
My decision
I have 3 classes, Node, ConcreteTrie and Driver
Of course, the ConcreteTrie class is not a Trie of traditional meaning. Here's how it works:
Given a sentence with k = 2:
This is a cat, and there are two dogs.
I generate nodes Th, hi, is, ... + ..., gs, s. Each of these nodes has children, which are the letter that follows them. For example, Node Th will have children of i and e. I maintain calculations in each of these nodes so that I can then generate probabilities of choosing the next letter.
My question is:
First of all, what is the most effective way to complete this project? My decision seems very quick, but I really want to bring down my professors. (In my last project. Changing the Change Distance task, I did A *, the genetic algorithm, BFS and Simulated Annealing - and I know that the problem is NP-Hard)
Secondly, what is the point of this assignment? In fact, this does not look like much of what we covered in the class. What should we learn?