Search Algorithm

class AlphaZero.search.mcts.MCTreeNode(parent, prior_prob)

Tree Node in MCTS.

expand(policy, value)

Expand a leaf node according to the network evaluation. This function does not update the visit count; the caller must update it externally.

Parameters:
  • policy – a list of (action, prob) tuples returned by the network
  • value – the value of this node returned by the network
Returns:

None
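
The contract above can be sketched as follows. This is a minimal illustration, not the module's actual code: SketchNode and its attribute names (children, visit_count, value) are hypothetical stand-ins for MCTreeNode internals.

```python
class SketchNode:
    """Hypothetical stand-in for MCTreeNode; attribute names are assumptions."""

    def __init__(self, parent, prior_prob):
        self.parent = parent
        self.prior_prob = prior_prob
        self.children = {}     # maps action -> SketchNode
        self.visit_count = 0
        self.value = None

    def expand(self, policy, value):
        # Create one child per (action, prob) tuple from the network.
        for action, prob in policy:
            if action not in self.children:
                self.children[action] = SketchNode(self, prob)
        # Store the network value; the visit count is deliberately untouched.
        self.value = value


root = SketchNode(None, 1.0)
root.expand([((0, 0), 0.7), ((0, 1), 0.3)], value=0.1)
```

After the call, root has two children carrying the network priors, and root.visit_count is still 0.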

select()

Select the best child of this node.

Returns: A tuple of (action, next_node) with the highest Q(s,a)+U(s,a)
Return type: tuple

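
The selection step amounts to an argmax over the children. The sketch below assumes children are kept in a dict keyed by action and that each child exposes a score; both are assumptions made for illustration, and select_child is a hypothetical helper rather than part of this module.

```python
def select_child(children, score):
    """Return the (action, node) pair whose node maximizes score(node)."""
    return max(children.items(), key=lambda item: score(item[1]))


# Toy data: each "node" is a plain dict holding a precomputed Q + U score.
children = {
    (0, 0): {"q_plus_u": 0.42},
    (0, 1): {"q_plus_u": 0.57},
    (1, 1): {"q_plus_u": 0.13},
}
action, node = select_child(children, lambda n: n["q_plus_u"])
```

Here the child at (0, 1) is returned because it has the largest score.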
update(v)

Update the three values

Parameters: v – value
Returns: None

get_selection_value()

Implements the PUCT formula for the current node.

Returns: None

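
For reference, the PUCT score in the form given in the AlphaGo Zero paper is Q(s,a) + c_puct * P(s,a) * sqrt(N_parent) / (1 + N(s,a)). The sketch below computes it as a standalone function. Whether this module uses exactly this form is an assumption, and the function name, argument names, and the c_puct default are all hypothetical.

```python
import math


def puct_value(total_value, visits, prior, parent_visits, c_puct=5.0):
    """Q + U for one child, following the AlphaGo Zero form of PUCT."""
    # Q: mean action value; defined as 0 for an unvisited child (assumption).
    q = total_value / visits if visits > 0 else 0.0
    # U: exploration bonus, large for high-prior, rarely visited children.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u


unvisited = puct_value(0.0, 0, prior=0.5, parent_visits=4, c_puct=1.0)
visited = puct_value(3.0, 3, prior=0.5, parent_visits=4, c_puct=1.0)
```

The bonus term U shrinks as a child accumulates visits, so the search gradually shifts from following the priors to following the measured values.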
get_mean_action_value()

Calculates Q(s,a), the mean action value.

Returns: mean action value
Return type: real

visit()

Increment the visit count.

Returns: None

is_leaf()

Checks if it is a leaf node (i.e. no nodes below this have been expanded).

Returns: True if the current node is a leaf.
Return type: bool

is_root()

Checks if it is a root node.

Returns: True if the current node is the root.
Return type: bool

class AlphaZero.search.mcts.MCTSearch(evaluator, game_config, max_playout=1600)

Create a Monte Carlo search tree.

calc_move(state, dirichlet=False, prop_exp=True)

Calculates the best move.

Parameters:
  • state – current state
  • dirichlet – enable Dirichlet noise described in “Self-play” section
  • prop_exp – select the final move with probability proportional to its exponentiated visit count
Returns:

the calculated result (x, y)

Return type:

tuple
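
One common reading of the prop_exp option is the AlphaGo Zero rule, where a move is drawn with probability proportional to N^(1/temperature) and N is the child's visit count. The sketch below shows that rule using only the standard library. That calc_move works this way is an assumption; pick_move, its temperature parameter, and the dict of visit counts are hypothetical.

```python
import random


def pick_move(visit_counts, temperature=1.0, rng=None):
    """Sample a move with probability proportional to N ** (1 / temperature)."""
    rng = random.Random() if rng is None else rng
    actions = list(visit_counts)
    weights = [float(visit_counts[a]) ** (1.0 / temperature) for a in actions]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one move must have been visited")
    probs = {a: w / total for a, w in zip(actions, weights)}
    move = rng.choices(actions, weights=weights, k=1)[0]
    return move, probs


visit_counts = {(0, 0): 30, (0, 1): 10}
move, probs = pick_move(visit_counts, rng=random.Random(0))
_, sharp_probs = pick_move(visit_counts, temperature=0.1, rng=random.Random(0))
```

With temperature 1.0 the probabilities are simply the normalized visit counts (0.75 and 0.25 here). Lowering the temperature concentrates the distribution on the most visited move.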

calc_move_with_probs(state, dirichlet=False)

Calculates the best move and returns the search probabilities. This function should only be used for self-play.

Parameters:
  • state – current state
  • dirichlet – enable Dirichlet noise described in “Self-play” section
Returns:

the selected move (x, y) and a list of (action, prob) tuples

Return type:

tuple
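
The Dirichlet noise option refers to mixing the root priors with a random sample, (1 - epsilon) * p + epsilon * noise, as described in the AlphaGo Zero paper. The sketch below draws the Dirichlet sample from normalized gamma variates using only the standard library. The function name and the alpha and epsilon values are illustrative assumptions, not this module's settings.

```python
import random


def add_dirichlet_noise(policy, alpha=0.3, epsilon=0.25, rng=None):
    """Mix each prior in a list of (action, prob) tuples with Dirichlet noise."""
    rng = random.Random() if rng is None else rng
    # Normalized independent Gamma(alpha, 1) draws form a Dirichlet(alpha) sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in policy]
    total = sum(draws)
    noise = [d / total for d in draws]
    return [
        (action, (1 - epsilon) * prob + epsilon * n)
        for (action, prob), n in zip(policy, noise)
    ]


policy = [((0, 0), 0.5), ((0, 1), 0.3), ((1, 1), 0.2)]
noisy = add_dirichlet_noise(policy, rng=random.Random(0))
```

The mixed priors still sum to 1, and every move keeps at least (1 - epsilon) of its original prior, so the noise encourages exploration without discarding the network's preference.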

update_with_move(last_move)

Step forward in the tree, keeping everything already known about the subtree. Assumes that calc_move() has already been called. Siblings of the new root will be garbage-collected.

Returns: None
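
Tree reuse of this kind can be sketched as re-rooting the tree at the child that matches the move played. ReuseNode and ReuseTree below are hypothetical stand-ins, and the fallback to a fresh root when the move was never explored is an assumption about reasonable behavior, not documented behavior of this module.

```python
class ReuseNode:
    """Hypothetical minimal node: a parent link and a dict of children."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}


class ReuseTree:
    def __init__(self):
        self.root = ReuseNode()

    def update_with_move(self, last_move):
        child = self.root.children.get(last_move)
        if child is None:
            # Assumption: an unexplored move starts a fresh tree.
            child = ReuseNode()
        # Detach from the old root so it and the siblings can be collected.
        child.parent = None
        self.root = child


tree = ReuseTree()
kept = ReuseNode(tree.root)
dropped = ReuseNode(tree.root)
tree.root.children = {(3, 3): kept, (4, 4): dropped}
grandchild = ReuseNode(kept)
kept.children[(5, 5)] = grandchild
tree.update_with_move((3, 3))
```

After the call, the subtree under (3, 3) is the whole tree, and nothing references the old root or its other children any longer.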

AlphaZero.search.mcts.randint(low, high=None, size=None, dtype='l')

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:
  • low (int) – Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).
  • high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
  • dtype (dtype, optional) –

    Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc, so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’.

    New in version 1.11.0.

Returns:

out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size is not provided.

Return type:

int or ndarray of ints

See also

random.random_integers()
similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.

Examples

>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])