Search Algorithm
-
class AlphaZero.search.mcts.MCTreeNode(parent, prior_prob)
Tree node in MCTS.
-
expand(policy, value)
Expand a leaf node according to the network evaluation. No visit count is updated in this function; make sure it is updated externally.
Parameters: - policy – a list of (action, prob) tuples returned by the network
- value – the value of this node returned by the network
Returns: None
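The expansion contract above can be illustrated with a minimal sketch. It assumes that actions are (x, y) tuples (matching the move format returned by calc_move) and that expand stores the network value on the node. The `Node` class is a hypothetical, stripped-down stand-in for MCTreeNode, not its actual implementation.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[int, int]  # assumption: moves are (x, y) board coordinates


class Node:
    """Hypothetical, stripped-down stand-in for MCTreeNode."""

    def __init__(self, parent: Optional["Node"], prior_prob: float) -> None:
        self.parent = parent
        self.prior_prob = prior_prob          # P(s, a) supplied by the network
        self.children: Dict[Action, "Node"] = {}
        self.visit_count = 0                  # N(s, a), maintained by the caller
        self.value: Optional[float] = None    # assumption: expand() keeps the value

    def expand(self, policy: List[Tuple[Action, float]], value: float) -> None:
        # Record the network's evaluation of this leaf.
        self.value = value
        # Add one child per (action, prob) pair, seeded with its prior.
        for action, prob in policy:
            if action not in self.children:
                self.children[action] = Node(self, prob)
        # Deliberately no visit-count change: the caller updates it afterwards.


leaf = Node(None, 1.0)
leaf.expand([((0, 0), 0.6), ((1, 2), 0.4)], value=0.1)
print(sorted(leaf.children))  # [(0, 0), (1, 2)]
print(leaf.visit_count)       # 0
```

Keeping the visit count out of expand lets the search loop apply one consistent backup step to every node on the path, whether or not the node was just expanded.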
-
select()
Select the best child of this node.
Returns: a tuple (action, next_node) for the child with the highest Q(s,a) + U(s,a)
Return type: tuple
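Selection can be sketched as an argmax over children of Q(s,a) + U(s,a). This sketch rests on several assumptions:

- It uses the PUCT form from the AlphaGo Zero paper.
- `C_PUCT` is a hypothetical constant.
- The parent's visit count approximates the sum of sibling visits.
- Values are already expressed from the perspective of the player choosing at this node.

The class is illustrative only.

```python
import math
from typing import Dict, Optional, Tuple

Action = Tuple[int, int]
C_PUCT = 1.0  # hypothetical exploration constant; the real value is not documented here


class Node:
    """Hypothetical stand-in for MCTreeNode, reduced to what select() needs."""

    def __init__(self, parent: Optional["Node"], prior_prob: float) -> None:
        self.parent = parent
        self.prior_prob = prior_prob       # P(s, a)
        self.visit_count = 0               # N(s, a)
        self.total_value = 0.0             # W(s, a)
        self.children: Dict[Action, "Node"] = {}

    def mean_value(self) -> float:
        # Q(s, a) = W(s, a) / N(s, a), taken as 0 for unvisited nodes.
        return self.total_value / self.visit_count if self.visit_count else 0.0

    def selection_value(self) -> float:
        # PUCT score Q + U; the parent's visit count stands in for sum_b N(s, b).
        parent_visits = self.parent.visit_count if self.parent else 0
        u = C_PUCT * self.prior_prob * math.sqrt(parent_visits) / (1 + self.visit_count)
        return self.mean_value() + u

    def select(self) -> Tuple[Action, "Node"]:
        # Pick the (action, child) pair with the highest Q(s,a) + U(s,a).
        return max(self.children.items(), key=lambda item: item[1].selection_value())


root = Node(None, 1.0)
root.visit_count = 10
a = Node(root, 0.7)
a.visit_count, a.total_value = 6, 1.2
b = Node(root, 0.3)
b.visit_count, b.total_value = 1, 0.5
root.children = {(0, 0): a, (3, 4): b}
action, child = root.select()
print(action)  # (3, 4)
```

Child `a` has the larger prior, but it has already been visited six times. The rarely visited, higher-Q child `b` therefore scores about 0.97 against about 0.52 and is selected.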
-
update(v)
Update the node's three statistics using the value v.
Parameters: v – value
Returns: None
-
get_selection_value()
Implements the PUCT algorithm's formula for the current node.
Returns: None
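For reference, the PUCT selection rule as stated in the AlphaGo Zero paper is shown below. Whether get_selection_value computes exactly this expression, and which value it uses for the exploration constant c_puct, is an assumption not confirmed by this page.

```latex
a_t = \arg\max_a \bigl( Q(s,a) + U(s,a) \bigr),
\qquad
U(s,a) = c_{\mathrm{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}
```

Here P(s,a) is the prior from the network and N(s,a) is the visit count. The U term favours moves with a high prior early on and decays as a move accumulates visits.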
-
get_mean_action_value()
Calculates Q(s,a).
Returns: mean action value
Return type: float
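This page does not say which three values update() maintains. In the AlphaGo Zero paper, each edge stores a visit count N(s,a), a total action value W(s,a) and a mean action value Q(s,a) = W(s,a) / N(s,a).

The sketch below makes these assumptions:

- The three values are N, W and Q. In the real class, the visit count may instead be handled by visit().
- The game is two-player and zero-sum, so the backed-up value flips sign at each ply.

The `Node` class and the `backup` helper are hypothetical.

```python
from typing import Optional


class Node:
    """Hypothetical node; assumes the 'three values' are N, W and Q."""

    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.n = 0      # N(s, a): visit count
        self.w = 0.0    # W(s, a): total action value
        self.q = 0.0    # Q(s, a): mean action value

    def update(self, v: float) -> None:
        # Fold one backed-up value into the running statistics.
        self.n += 1
        self.w += v
        self.q = self.w / self.n

    def get_mean_action_value(self) -> float:
        return self.q


def backup(leaf: Node, v: float) -> None:
    # Walk from the leaf to the root; assumption: values alternate sign each ply.
    node: Optional[Node] = leaf
    while node is not None:
        node.update(v)
        v = -v
        node = node.parent


node = Node()
for v in (1.0, -1.0, 1.0, 1.0):
    node.update(v)
print(node.n, node.w, node.get_mean_action_value())  # 4 2.0 0.5
```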
-
visit()
Increment the visit count.
Returns: None
-
is_leaf()
Checks whether this is a leaf node (i.e. no nodes below it have been expanded).
Returns: True if the current node is a leaf.
Return type: bool
-
is_root()
Checks whether this is the root node.
Returns: True if the current node is the root.
Return type: bool
-
class AlphaZero.search.mcts.MCTSearch(evaluator, game_config, max_playout=1600)
Create a Monte Carlo search tree.
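The constructor's evaluator and max_playout arguments suggest the usual select / evaluate-and-expand / backup loop. Below is a minimal, self-contained sketch of that loop under several assumptions:

- `evaluator(state)` returns `(policy, value)`, where `value` is from the perspective of the player to move.
- A terminal state yields an empty policy.
- `apply_move` is a hypothetical helper that plays a move.
- Values flip sign each ply.

`run_playouts` and `Node` are illustrative names, not this module's API.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Action = int
Policy = List[Tuple[Action, float]]


class Node:
    def __init__(self, parent: Optional["Node"], prior: float) -> None:
        self.parent = parent
        self.prior = prior                      # P(s, a)
        self.children: Dict[Action, "Node"] = {}
        self.n = 0                              # N(s, a)
        self.w = 0.0                            # W(s, a), from the parent's point of view

    def q(self) -> float:
        return self.w / self.n if self.n else 0.0

    def select(self, c_puct: float) -> Tuple[Action, "Node"]:
        sqrt_n = math.sqrt(self.n)
        return max(self.children.items(),
                   key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].n))


def run_playouts(root_state, evaluator: Callable, apply_move: Callable,
                 max_playout: int = 1600, c_puct: float = 1.0) -> Node:
    root = Node(None, 1.0)
    for _ in range(max_playout):
        node, state = root, root_state
        # 1. Selection: follow Q + U down to a leaf.
        while node.children:
            action, node = node.select(c_puct)
            state = apply_move(state, action)
        # 2. Evaluation and expansion: one evaluator call per playout.
        policy, value = evaluator(state)
        for action, prob in policy:
            node.children[action] = Node(node, prob)
        # 3. Backup: `value` is for the player to move at `state`, so the edge into
        #    this leaf (chosen by the opponent) gets -value, alternating upward.
        v = -value
        while node is not None:
            node.n += 1
            node.w += v
            v = -v
            node = node.parent
    return root


def toy_evaluator(state) -> Tuple[Policy, float]:
    # Hypothetical evaluator: fixed priors, neutral value, game ends after 3 moves.
    if len(state) >= 3:
        return [], 0.0
    return [(0, 0.1), (1, 0.8), (2, 0.1)], 0.0


root = run_playouts((), toy_evaluator, lambda s, a: s + (a,), max_playout=100)
print({a: c.n for a, c in root.children.items()})
```

Because the toy evaluator returns a neutral value everywhere, visits end up roughly in proportion to the priors. Most of the 99 child visits go to action 1.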
-
calc_move(state, dirichlet=False, prop_exp=True)
Calculates the best move.
Parameters: - state – current state
- dirichlet – enable the Dirichlet noise described in the “Self-play” section
- prop_exp – select the final move with probability proportional to its exponentiated visit count
Returns: the calculated move (x, y)
Return type: tuple
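The “Self-play” section of the AlphaGo Zero paper mixes Dirichlet noise into the root priors as P(s,a) = (1 − ε)p_a + εη_a, with η ~ Dir(0.03) and ε = 0.25. The sketch below uses those constants as defaults. Whether this module uses the same values is an assumption, and `add_dirichlet_noise` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

import numpy as np

Action = Tuple[int, int]


def add_dirichlet_noise(policy: List[Tuple[Action, float]],
                        epsilon: float = 0.25,
                        alpha: float = 0.03,
                        rng: Optional[np.random.Generator] = None) -> List[Tuple[Action, float]]:
    """Mix noise into root priors: P = (1 - eps) * p + eps * eta, eta ~ Dir(alpha)."""
    rng = rng if rng is not None else np.random.default_rng()
    actions = [action for action, _ in policy]
    priors = np.array([prob for _, prob in policy], dtype=float)
    # One symmetric Dirichlet sample over all legal root moves.
    eta = rng.dirichlet([alpha] * len(priors))
    mixed = (1.0 - epsilon) * priors + epsilon * eta
    return list(zip(actions, mixed.tolist()))


root_policy = [((0, 0), 0.5), ((0, 1), 0.3), ((1, 1), 0.2)]
noisy = add_dirichlet_noise(root_policy, rng=np.random.default_rng(0))
print(noisy)
```

Both inputs to the mixture sum to 1, so the noisy priors still form a distribution. Each move also keeps at least (1 − ε) of its original prior. The noise is applied only at the root, so exploration does not corrupt deeper evaluations.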
-
calc_move_with_probs(state, dirichlet=False)
Calculates the best move and returns the search probabilities. This function should only be used for self-play.
Parameters: - state – current state
- dirichlet – enable the Dirichlet noise described in the “Self-play” section
Returns: the chosen move (x, y) and a list of (action, prob) tuples
Return type: tuple
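In the AlphaGo Zero paper, search probabilities are derived from root visit counts as π(a) ∝ N(s,a)^(1/τ), and the final move is sampled from π. The sketch below shows that step together with a greedy alternative. It assumes that visit counts are available as a dict and that this module's prop_exp behaves similarly. `search_probabilities` and `choose_move` are hypothetical helpers.

```python
from typing import Dict, Hashable, List, Optional, Tuple

import numpy as np


def search_probabilities(visit_counts: Dict[Hashable, int],
                         temperature: float = 1.0) -> List[Tuple[Hashable, float]]:
    actions = list(visit_counts)
    counts = np.array([visit_counts[a] for a in actions], dtype=float)
    if temperature == 0:
        # Zero temperature: all mass on the most visited move.
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0
    else:
        # N^(1/tau) computed in log space to avoid overflow for small tau.
        logits = np.log(counts + 1e-10) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
    return list(zip(actions, probs.tolist()))


def choose_move(action_probs: List[Tuple[Hashable, float]],
                proportional: bool = True,
                rng: Optional[np.random.Generator] = None) -> Hashable:
    rng = rng if rng is not None else np.random.default_rng()
    actions = [a for a, _ in action_probs]
    probs = np.array([p for _, p in action_probs])
    # Sample in proportion to pi, or take the most probable move.
    index = rng.choice(len(actions), p=probs) if proportional else int(np.argmax(probs))
    return actions[index]


probs = search_probabilities({(0, 0): 30, (1, 1): 10}, temperature=1.0)
print(probs)                                  # roughly [((0, 0), 0.75), ((1, 1), 0.25)]
print(choose_move(probs, proportional=False))  # (0, 0)
```

Returning the (action, prob) list alongside the move matches the documented return value. For self-play, π is the policy training target.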
-
update_with_move(last_move)
Step forward in the tree, keeping everything already known about the subtree; assumes calc_move() has already been called. Siblings of the new root will be garbage-collected.
Returns: None
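Tree reuse can be sketched as promoting the chosen child to root and cutting its parent link. Once that link is gone, the old root and its other children become unreachable, and Python's garbage collector can reclaim them, including reference cycles. `Tree`, `Node` and the fallback to a fresh tree for an unexpanded move are assumptions for illustration.

```python
from typing import Dict, Hashable, Optional


class Node:
    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.children: Dict[Hashable, "Node"] = {}


class Tree:
    def __init__(self) -> None:
        self.root = Node()

    def update_with_move(self, last_move: Hashable) -> None:
        if last_move in self.root.children:
            # Reuse the searched subtree: the chosen child becomes the new root.
            self.root = self.root.children[last_move]
            # Drop the back-reference so the old root and siblings become unreachable.
            self.root.parent = None
        else:
            # Move was never expanded: assumption, start again from an empty tree.
            self.root = Node()


tree = Tree()
kept = Node(tree.root)
tree.root.children = {"a": Node(tree.root), "b": kept}
kept.children = {"c": Node(kept)}
tree.update_with_move("b")
print(tree.root is kept, list(tree.root.children))  # True ['c']
```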
-
AlphaZero.search.mcts.randint(low, high=None, size=None, dtype='l')
Return random integers from low (inclusive) to high (exclusive).
Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).
Parameters: - low (int) – Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).
- high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
- dtype (dtype, optional) – Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc., so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’.
New in version 1.11.0.
Returns: out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size not provided.
Return type: int or ndarray of ints
See also
random.random_integers()
- similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.
Examples
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])