Oh. The classic textbook. My copy is a bit outdated, but it looks like my section 1.2.4 discusses the same topics as yours.
Firstly, this is an introductory chapter that tries to be general and non-intimidating, but as a result it is also quite abstract and a bit vague. At this point, I would not worry too much about not understanding the concepts; most likely you are overthinking it. Subsequent chapters will make concrete what seems obscure now.
Value in this context should be understood as a measure of the quality or desirability of a particular state, not as "value" in the everyday sense. In the checkers example, a high-value state is a board position that is good, i.e. advantageous, for the computer player.
The basic idea is that if you can assign a value to every state that may occur, and there is a set of rules that determines which states can be reached from the current state by taking actions, then you can make an informed decision about which action to take: pick the action that leads to the most valuable reachable state.
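A minimal sketch of that idea (not from the book; `legal_actions`, `result`, and `value` are hypothetical stand-ins for your game's rules and your current value estimate):

    def choose_action(state, legal_actions, result, value):
        # Look at every state reachable from the current one and act greedily,
        # i.e. take the action whose successor state has the highest estimated value.
        return max(legal_actions(state), key=lambda a: value(result(state, a)))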
But assigning values to states is only trivial for the terminal states of a game. The value attached to a terminal state is often called the reward, and the goal is to maximize reward. Estimating training values refers to the process of assigning estimated values to intermediate states, based on outcomes observed later in the game.
So, while playing many training games, you keep track of which states you encounter, and if you find that some state X leads to state Y, you can adjust the estimated value of X based on the current estimates for X and Y. That is what "estimating training values" means. With repeated training the model accumulates experience, and the estimates should converge to reliable values: it will start to avoid moves that lead to losses and prefer moves that lead to wins. There are many different ways to make such updates and many different ways to represent the state of the game, but that is what the rest of the book is about.
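As a rough illustration of one such update (a temporal-difference-style rule under my own assumptions, not necessarily the book's exact formula; `values` is a dict from hashable states to estimates, and `alpha` is a hypothetical step size):

    def update(values, x, y, alpha=0.1):
        vx = values.get(x, 0.0)             # current estimate for state X
        vy = values.get(y, 0.0)             # current estimate for its successor Y
        values[x] = vx + alpha * (vy - vx)  # nudge X's estimate toward Y's

    # Terminal states just get their reward directly, e.g. values[final_state] = 1.0
    # for a win and -1.0 for a loss; repeated play then propagates these back
    # through the intermediate states.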
Hope this helps!