Java: How to think about Markov chain modeling?

I am trying to write a Markov text generator. I plan to split the text at a given interval and store each piece in an instance of a class. The problem I don't know how to solve is how to name the instances of that class. I planned to create the instances in a for loop. The user will pass the method some amount of text (the length of which is not known in advance). Pseudocode below:

  create vector for sets and tail letters;
  for (int c = 0; c < text.length; c++) {
      check to make sure overflow doesn't happen;
      create instance of set named c;
      store set and tailLetter into vector;
  }

  public class set {
      String characters;
      char tailLetter;
  }

Sorry if this is not entirely clear. I am teaching myself Java, and this is my first post here.

+4
4 answers

If you are learning Java, I would suggest focusing first on modeling problems with Java classes and methods.

A Markov chain is a model, or statistical derivation, of the seed text, right? When used to model text, it usually describes how often each word is followed by each other word (typically you would split the text at word boundaries). That sounds like it wants a class; it could be called MarkovChain.

In the MarkovChain class, you need something that holds every word appearing in the text and maps each word to the words that follow it, along with the number of times each one follows.

Take the word "and". Suppose that in the text “and” is followed by “the” four times and by “then” three times. So you will need some data structure to hold something like this:

  and --> the (4) then (3) 

One way to do this is to use an ArrayList to store all the words, and then a Map<T1,T2> that holds the relationship between each word and the frequency of the words that follow it. In this case, T1 is probably a String, and T2 is probably an ArrayList of pairs, each pair being a String and an (integer) count for that string.

But wait, now you no longer need the separate ArrayList for storing the words, because they are simply the keys of the map.
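As a rough illustration of the structure described above, here is a minimal sketch. It swaps the ArrayList of pairs for a nested Map (following word to count), which is an assumption made for brevity, not the layout the answer prescribes. The class name FollowerCounts and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: word -> (following word -> number of times it follows).
class FollowerCounts {
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    // Record one occurrence of `next` directly after `word`.
    void record(String word, String next) {
        followers.computeIfAbsent(word, k -> new HashMap<>())
                 .merge(next, 1, Integer::sum);
    }

    // How many times `next` was recorded after `word` (0 if never).
    int count(String word, String next) {
        Map<String, Integer> counts = followers.get(word);
        return counts == null ? 0 : counts.getOrDefault(next, 0);
    }
}
```

Calling record("and", "the") four times and record("and", "then") three times would reproduce the "and" entry shown earlier.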

... and so on. The next step would be to figure out how to populate this data structure. This is probably an internal (private) method that is called when the caller creates a MarkovChain instance from the source text.

You probably also want the MarkovChain class to expose another public method that callers invoke when they want to generate a random sequence from the chain, with the choices weighted by the recorded frequencies.
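Putting those pieces together, one possible shape for such a class is sketched below. It is only a sketch under several assumptions: words are split on whitespace, the chain is first-order (each word depends only on the previous one), a nested Map is used instead of a list of pairs, and the names populate, generate, and pickNext are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class MarkovChain {
    // word -> (following word -> number of times it follows)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();
    private final Random random;

    MarkovChain(String sourceText, Random random) {
        this.random = random;
        populate(sourceText);
    }

    // Private: split on whitespace and count every (word, next word) pair.
    private void populate(String sourceText) {
        String[] words = sourceText.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new HashMap<>())
                     .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Public: walk the chain from startWord, producing at most maxWords words.
    List<String> generate(String startWord, int maxWords) {
        List<String> out = new ArrayList<>();
        String current = startWord;
        while (current != null && out.size() < maxWords) {
            out.add(current);
            current = pickNext(current);
        }
        return out;
    }

    // Frequency-weighted random choice; null if the word has no recorded follower.
    private String pickNext(String word) {
        Map<String, Integer> counts = followers.get(word);
        if (counts == null) {
            return null;
        }
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        int r = random.nextInt(total);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        return null; // not reached: r is always below the sum of the counts
    }
}
```

Passing the Random in through the constructor is a design choice that makes the output reproducible with a fixed seed, for example new MarkovChain(text, new Random(42)).generate("and", 20).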

...

This is just one way to think about modeling a problem.

In any case, I would focus on this modeling / design exercise before writing code.

+4

Can't you use a Map<String, Set>, where the key is the generated name?

+3

You can use an ArrayList to manage the instances. I like the idea of a map better, though, because you can set the names dynamically instead of accessing the instances by index number.
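The two options can be compared side by side in a small sketch. Everything here is an assumption for illustration: the class TextSet stands in for the question's set class, the text is cut with a sliding window of interval characters, tailLetter is taken to be the character right after each window, and the generated keys have the form "set0", "set1", and so on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's `set` class.
class TextSet {
    final String characters;
    final char tailLetter;

    TextSet(String characters, char tailLetter) {
        this.characters = characters;
        this.tailLetter = tailLetter;
    }
}

class TextSplitter {
    // Option 1: instances are reached by index in a list.
    static List<TextSet> asList(String text, int interval) {
        List<TextSet> sets = new ArrayList<>();
        // The loop bound keeps c + interval inside the text (no overflow).
        for (int c = 0; c + interval < text.length(); c++) {
            sets.add(new TextSet(text.substring(c, c + interval),
                                 text.charAt(c + interval)));
        }
        return sets;
    }

    // Option 2: instances are reached by a generated name.
    static Map<String, TextSet> asMap(String text, int interval) {
        Map<String, TextSet> sets = new LinkedHashMap<>();
        for (int c = 0; c + interval < text.length(); c++) {
            sets.put("set" + c, new TextSet(text.substring(c, c + interval),
                                            text.charAt(c + interval)));
        }
        return sets;
    }
}
```

Either way, no variable has to be named inside the loop; the collection holds the instances, and the index or key plays the role of the name.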

0

I do not see the point of the names:

  • If they are only meant to give the set objects a distinct string for debugging, the default implementation of toString() will give you that.

  • If you need to look up these "set" objects, then a numeric identifier or sequence number will work better.

If you explained the purpose of the names and how you intend to use them, perhaps we could give you better advice.
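Both points can be illustrated with a short sketch. The names Chunk and ChunkRegistry are hypothetical, and the sequence number is assumed to be simply the position in a list.

```java
import java.util.ArrayList;
import java.util.List;

class Chunk {
    final String characters;

    Chunk(String characters) {
        this.characters = characters;
    }
    // No toString() override: Object.toString() returns
    // getClass().getName() + "@" + Integer.toHexString(hashCode()),
    // which is usually (not strictly always) different per instance.
}

class ChunkRegistry {
    private final List<Chunk> chunks = new ArrayList<>();

    // Stores a new chunk and returns the sequence number assigned to it.
    int add(String characters) {
        chunks.add(new Chunk(characters));
        return chunks.size() - 1;
    }

    // Looks a chunk up by its sequence number.
    Chunk get(int id) {
        return chunks.get(id);
    }
}
```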

0