Search Algorithm

class AlphaZero.search.mcts.MCTreeNode(parent, prior_prob)

Tree Node in MCTS.

expand(policy, value)

Expand a leaf node according to the network evaluation. This function does not update the visit count; the caller must update it separately.

Parameters:	policy – a list of (action, prob) tuples returned by the network
	value – the value of this node returned by the network
Returns:	None
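
The expansion step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `Node` and its attribute names (`children`, `visit_count`) are hypothetical, and only the constructor arguments `(parent, prior_prob)` and the `policy` format come from the documentation. The sketch does not model what the library does with `value`.

```python
class Node:
    """Hypothetical stand-in for MCTreeNode; attribute names are assumptions."""

    def __init__(self, parent, prior_prob):
        self.parent = parent
        self.prior_prob = prior_prob
        self.children = {}      # maps action -> Node
        self.visit_count = 0

    def is_leaf(self):
        return not self.children

    def is_root(self):
        return self.parent is None

    def expand(self, policy, value):
        # Create one child per (action, prob) pair from the network output.
        # `value` is accepted to mirror the documented signature only.
        # The visit count is deliberately left untouched here.
        for action, prob in policy:
            if action not in self.children:
                self.children[action] = Node(self, prob)


root = Node(None, 1.0)
root.expand([((0, 0), 0.7), ((0, 1), 0.3)], value=0.1)
```

After the call, `root` is no longer a leaf, and its visit count is still zero because the update is left to the caller.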

select()

Select the best child of this node.

Returns:	A tuple of (action, next_node) with highest Q(s,a)+U(s,a)
Return type:	tuple
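
A minimal sketch of selecting by Q(s,a)+U(s,a), assuming the commonly used PUCT form U(s,a) = c_puct · P(s,a) · sqrt(N(s)) / (1 + N(s,a)). The class `Node`, its field names, and the constant `C_PUCT` are hypothetical; the documentation does not state the exploration constant or the exact formula the library uses.

```python
import math
from dataclasses import dataclass, field

C_PUCT = 5.0  # hypothetical exploration constant, not taken from the library


@dataclass
class Node:
    """Hypothetical stand-in for MCTreeNode; field names are assumptions."""
    prior_prob: float
    visit_count: int = 0
    total_value: float = 0.0
    children: dict = field(default_factory=dict)

    def mean_action_value(self):
        # Q(s,a): average backed-up value, 0 for an unvisited node
        return self.total_value / self.visit_count if self.visit_count else 0.0


def selection_value(child, parent_visits):
    # U(s,a) grows with the prior and shrinks as the child is visited more often
    u = C_PUCT * child.prior_prob * math.sqrt(parent_visits) / (1 + child.visit_count)
    return child.mean_action_value() + u


def select(node):
    # Return the (action, child) pair that maximises Q + U
    return max(node.children.items(),
               key=lambda item: selection_value(item[1], node.visit_count))


root = Node(prior_prob=1.0, visit_count=10)
root.children = {
    "a": Node(prior_prob=0.6, visit_count=8, total_value=2.0),
    "b": Node(prior_prob=0.4, visit_count=2, total_value=1.0),
}
action, child = select(root)
```

In this example the less-visited child "b" is selected: its exploration term outweighs the higher prior of "a".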

update(v)

Update the three values

Parameters:	v – value
Returns:	None

get_selection_value()

Implements the PUCT algorithm’s formula for the current node.

Returns:	None

get_mean_action_value()

Calculates Q(s,a)

Returns:	mean action value
Return type:	real

visit()

Increment the visit count.

Returns:	None

is_leaf()

Checks if it is a leaf node (i.e. no nodes below this have been expanded).

Returns:	True if the current node is a leaf.
Return type:	bool

is_root()

Checks if it is a root node.

Returns:	True if the current node is the root.
Return type:	bool

class AlphaZero.search.mcts.MCTSearch(evaluator, game_config, max_playout=1600)

Create a Monte Carlo search tree.

calc_move(state, dirichlet=False, prop_exp=True)

Calculates the best move.

Parameters:	state – current state
	dirichlet – enable Dirichlet noise described in “Self-play” section
	prop_exp – select the final move with probability proportional to its exponentiated visit count
Returns:	the calculated result (x, y)
Return type:	tuple
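
The `prop_exp` option can be illustrated with a short sketch. It assumes the move is sampled with probability proportional to N(s,a)^(1/τ), where N(s,a) is a child's visit count and τ is a temperature; the function `sample_move`, its `temperature` parameter, and the dictionary input are hypothetical and not part of the documented API.

```python
import numpy as np


def sample_move(visit_counts, temperature=1.0, rng=None):
    """Sample an action with probability proportional to N(s,a)^(1/temperature).

    visit_counts: dict mapping action -> visit count (hypothetical input format).
    Returns the sampled action and the probability assigned to each action.
    """
    if rng is None:
        rng = np.random.default_rng()
    actions = list(visit_counts)
    counts = np.array([visit_counts[a] for a in actions], dtype=np.float64)
    weights = counts ** (1.0 / temperature)
    probs = weights / weights.sum()
    idx = rng.choice(len(actions), p=probs)
    return actions[idx], dict(zip(actions, probs))


counts = {(3, 3): 30, (4, 4): 10}
move, probs = sample_move(counts, rng=np.random.default_rng(0))
_, sharp_probs = sample_move(counts, temperature=0.5, rng=np.random.default_rng(0))
```

Lowering the temperature concentrates probability on the most-visited move; with `prop_exp` disabled, a deterministic choice such as the most-visited child would be the natural alternative, though the documentation does not say which rule the library applies.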

calc_move_with_probs(state, dirichlet=False)

Calculates the best move and returns the search probabilities. This function should only be used for self-play.

Parameters:	state – current state
	dirichlet – enable Dirichlet noise described in “Self-play” section
Returns:	the result (x, y) and a list of (action, probs)
Return type:	tuple
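
As a rough illustration of what the `dirichlet` flag refers to, the sketch below mixes Dirichlet noise into the root priors as P'(a) = (1 − ε)·P(a) + ε·η(a), with η drawn from a Dirichlet distribution. The function name `add_dirichlet_noise` and the values of `epsilon` and `alpha` are illustrative placeholders; the documentation does not state which values the library uses.

```python
import numpy as np


def add_dirichlet_noise(policy, epsilon=0.25, alpha=0.3, rng=None):
    """Mix Dirichlet noise into a list of (action, prob) tuples.

    epsilon and alpha are placeholder values chosen for illustration only.
    """
    if rng is None:
        rng = np.random.default_rng()
    actions = [action for action, _ in policy]
    priors = np.array([prob for _, prob in policy], dtype=np.float64)
    # One noise component per action; the components sum to 1
    noise = rng.dirichlet([alpha] * len(priors))
    mixed = (1.0 - epsilon) * priors + epsilon * noise
    return list(zip(actions, mixed))


policy = [((0, 0), 0.5), ((0, 1), 0.3), ((1, 1), 0.2)]
noisy = add_dirichlet_noise(policy, rng=np.random.default_rng(0))
```

Because both the priors and the noise sum to one, the mixed priors remain a valid probability distribution, and every action keeps at least (1 − ε) of its original prior.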

update_with_move(last_move)

Step forward in the tree, keeping everything we already know about the subtree, assuming that calc_move() has been called already. Siblings of the new root will be garbage-collected.

Returns:	None
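
The subtree reuse that update_with_move describes can be sketched as below. The class `Node` and the free function `advance_root` are hypothetical; unlike the documented method, which returns None and presumably updates the search object's own root, this sketch returns the new root so that it stays self-contained.

```python
class Node:
    """Hypothetical tree node; attribute names are assumptions."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # maps action -> Node


def advance_root(root, last_move):
    # Promote the child reached by last_move to be the new root.
    child = root.children.get(last_move)
    if child is None:
        # The move was never expanded, so nothing can be reused.
        return Node()
    # Dropping the parent link leaves the old root and the sibling
    # subtrees unreferenced, so Python can garbage-collect them.
    child.parent = None
    return child


old_root = Node()
kept = Node(old_root)
dropped = Node(old_root)
old_root.children = {(1, 2): kept, (3, 4): dropped}
kept.children[(5, 6)] = Node(kept)

new_root = advance_root(old_root, (1, 2))
```

The statistics already gathered below the played move are preserved, which avoids repeating playouts on the next call to the search.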

AlphaZero.search.mcts.randint(low, high=None, size=None, dtype='l')

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:	low (int) – Lowest (signed) integer to be drawn from the distribution (unless `high=None`, in which case this parameter is one above the highest such integer).
	high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if `high=None`).
	size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. Default is None, in which case a single value is returned.
	dtype (dtype, optional) – Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc, so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’. New in version 1.11.0.
Returns:	out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size not provided.
Return type:	int or ndarray of ints
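
A short usage sketch of the half-open interval behaviour described above. It assumes that this entry is NumPy's `numpy.random.randint` as imported into the module, which the docstring suggests but this page does not confirm, so the example calls NumPy directly.

```python
import numpy as np

# Only `low` given: draws a single integer from [0, 5)
single = np.random.randint(5)

# Both bounds and a shape given: a 2x3 array of integers from [2, 7)
grid = np.random.randint(2, 7, size=(2, 3))
```

The upper bound is never returned in either call.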
