Reinforcement Learning

class AlphaZero.train.parallel.evaluator.Evaluator(nn_eval_chal, nn_eval_best, r_conn, s_conn, game_config, ext_config)

This class compares the up-to-date model (the challenger) with the best model so far by playing games between the two.

Parameters:

nn_eval_chal – NNEvaluator instance storing the up-to-date model
nn_eval_best – NNEvaluator instance storing the best model so far
r_conn – Pipe to receive messages from the optimizer
s_conn – Pipe to send model update messages to the self-play module
game_config – A dictionary of game environment configuration
ext_config – A dictionary of system configuration

eval_wrapper(color_of_new)

Wrapper for a single evaluation game.

Parameters:	color_of_new – The color of the new model (challenger)

run(): The main evaluation process. It launches games asynchronously and computes the challenger's winning rate against the best model.
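The evaluation loop can be pictured with the minimal sketch below. It is an illustration, not the module's code: `play_game` is a hypothetical callable that plays one full game with the challenger on `color_of_new` and returns the winning color, the `BLACK`/`WHITE` encoding is assumed, a thread pool stands in for the asynchronous game launcher, and the 0.55 promotion threshold is borrowed from the AlphaGo Zero gating rule rather than read from `ext_config`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

BLACK, WHITE = 1, -1  # assumed color encoding


def eval_wrapper(play_game: Callable[[int], int], color_of_new: int) -> bool:
    """Play one game and report whether the challenger won.

    play_game is hypothetical: it plays a full game with the challenger
    on color_of_new and returns the winning color.
    """
    return play_game(color_of_new) == color_of_new


def run(play_game: Callable[[int], int], num_games: int = 20,
        threshold: float = 0.55, workers: int = 4) -> Tuple[float, bool]:
    """Launch games concurrently and decide whether to promote the challenger."""
    if num_games <= 0:
        raise ValueError("num_games must be positive")
    # Alternate colors so the challenger plays each side as evenly as possible.
    colors = [BLACK if i % 2 == 0 else WHITE for i in range(num_games)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        wins = list(pool.map(lambda c: eval_wrapper(play_game, c), colors))
    win_rate = sum(wins) / num_games
    # Assumed gating rule: promote only if the win rate exceeds the threshold.
    return win_rate, win_rate > threshold
```

For example, with a toy `play_game` in which Black always wins, `run(lambda c: BLACK, num_games=10)` returns `(0.5, False)`: the challenger wins only the five games it plays as Black, which is why alternating colors matters.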

class AlphaZero.train.parallel.selfplay.Selfplay(nn_eval, r_conn, data_queue, game_config, ext_config)

This class generates training data from self-play games.

Run this module on its own to start a remote self-play session.

Example

$ python -m AlphaZero.train.parallel.selfplay <master addr>

Parameters:

nn_eval – NNEvaluator instance storing the best model so far
r_conn – Pipe to receive model update messages
data_queue – Queue into which the generated data is put
game_config – A dictionary of game environment configuration
ext_config – A dictionary of system configuration

selfplay_wrapper(): Wrapper for a single self-play game.

run(): The main data generation process. It keeps launching self-play games.
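The generation loop can be sketched as follows, under assumptions that are not taken from the module: `play_one_game` is a hypothetical callable returning a move history plus the winner, the `(state, search_probabilities, outcome)` record format follows the published AlphaZero recipe of labeling each position with the final result from the perspective of the player to move, and a `queue.Queue` stands in for `data_queue`.

```python
import queue
from typing import Callable, List, Tuple

# Assumed record format: (state, search_probabilities, outcome_for_player_to_move)
Record = Tuple[object, List[float], int]
# Assumed game result: ([(state, player_to_move, search_probs), ...], winner)
GameResult = Tuple[List[Tuple[object, int, List[float]]], int]


def selfplay_wrapper(play_one_game: Callable[[], GameResult]) -> List[Record]:
    """Play one game and label every position with the final outcome."""
    history, winner = play_one_game()
    records = []
    for state, player, probs in history:
        # +1 if the player to move here went on to win, -1 if it lost;
        # by assumption, winner == 0 encodes a draw and labels every position 0.
        outcome = 0 if winner == 0 else (1 if player == winner else -1)
        records.append((state, probs, outcome))
    return records


def run(play_one_game: Callable[[], GameResult],
        data_queue: "queue.Queue[List[Record]]", num_games: int) -> None:
    """Generate games and push their records onto the shared queue.

    The documented run() keeps launching games indefinitely; a finite
    count keeps this sketch testable.
    """
    for _ in range(num_games):
        data_queue.put(selfplay_wrapper(play_one_game))
```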

model_update_handler(): The handler for model updates. It tries to load the new network parameters and, if this is the master session, also notifies the remote sessions to update.

rcv_remote_data_handler(): The handler for receiving data from remote sessions. Only the master session uses this handler.

remote_update_handler(): The handler for receiving update notifications from the master session. Only the remote sessions use this handler.
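The update path between the evaluator, the master session, and the remote sessions can be sketched with `multiprocessing` connections. This is a hedged illustration: real remote sessions run on other machines (they are started with a master address), whereas here local `Pipe` pairs stand in for every link; the message payload is assumed to be a checkpoint identifier; and `load_params` is a hypothetical callback for loading new parameters.

```python
from multiprocessing import Pipe
from multiprocessing.connection import Connection
from typing import Callable, List


def model_update_handler(r_conn: Connection, load_params: Callable[[str], None],
                         remote_conns: List[Connection], is_master: bool) -> int:
    """Apply every pending update message; the master also forwards it."""
    handled = 0
    while r_conn.poll():          # poll() without a timeout returns immediately
        ckpt = r_conn.recv()      # assumed payload: a checkpoint identifier
        load_params(ckpt)         # hypothetical loader for the new parameters
        if is_master:
            for conn in remote_conns:
                conn.send(ckpt)   # notify each remote session
        handled += 1
    return handled


def remote_update_handler(master_conn: Connection,
                          load_params: Callable[[str], None]) -> int:
    """On a remote session, apply updates announced by the master."""
    handled = 0
    while master_conn.poll():
        load_params(master_conn.recv())
        handled += 1
    return handled


def demo() -> List[str]:
    """Evaluator -> master -> remote, all over local pipes (illustration only)."""
    evaluator_end, master_in = Pipe()
    master_out, remote_in = Pipe()
    loaded_on_remote: List[str] = []
    evaluator_end.send("ckpt-1")
    model_update_handler(master_in, lambda c: None, [master_out], is_master=True)
    remote_update_handler(remote_in, loaded_on_remote.append)
    return loaded_on_remote
```

Draining with `poll()` instead of blocking on `recv()` lets a session check for updates between games without stalling data generation.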

class AlphaZero.train.parallel.datapool.DataPool(ext_config)

This class stores the training data and handles data sending and receiving.

Parameters:	ext_config – A dictionary of system configuration

serve(): The listening process. It first loads the saved data and then runs a loop that handles get and put requests.

merge_data(data)

Puts the new data into the array. Since the array is pre-allocated, this function overwrites old entries with the new data and records the ending index.

Parameters:	data – New data from self play games
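Overwriting a pre-allocated array and recording the ending index is the ring-buffer pattern. A minimal sketch, assuming a plain Python list as the pre-allocated storage and a hypothetical `RingBufferPool` class; the real class's field names and storage layout are not documented here.

```python
from typing import Any, List


class RingBufferPool:
    """Hypothetical fixed-capacity store that overwrites old entries once full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf: List[Any] = [None] * capacity  # pre-allocated storage
        self.end = 0   # index of the next slot to write (one past the newest item)
        self.size = 0  # number of valid entries, capped at capacity

    def merge_data(self, data: List[Any]) -> None:
        for item in data:
            self.buf[self.end] = item                  # overwrite the oldest slot
            self.end = (self.end + 1) % self.capacity  # wrap around at the end
        self.size = min(self.size + len(data), self.capacity)
```

With capacity 4, merging `[1, 2, 3]` and then `[4, 5, 6]` leaves the buffer as `[5, 6, 3, 4]` with ending index 2: the two oldest items were overwritten in place, and no memory was reallocated.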

put(data)

Sends a put request. This function is called by self-play games.

Parameters:	data – New data

get(batch_size)

Sends a get request. This function is called by the optimizer.

Parameters:	batch_size – The size of the minibatch
Returns:	Minibatch of training data
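The request/response pattern behind `serve()`, `put()`, and `get()` can be sketched with a single server loop that owns the data. In this hypothetical `DataPoolSketch`, a thread and `queue.Queue` objects stand in for the separate process and pipes the real class presumably uses, loading saved data is omitted, and batches are drawn uniformly with replacement, which is an assumption about the sampling strategy.

```python
import queue
import random
import threading
from collections import deque
from typing import Any, List


class DataPoolSketch:
    """Hypothetical single-server request loop; a thread stands in for a process."""

    def __init__(self, capacity: int, seed: int = 0):
        self.requests: "queue.Queue[tuple]" = queue.Queue()
        self.data: deque = deque(maxlen=capacity)  # oldest items drop off when full
        self.rng = random.Random(seed)

    def serve(self) -> None:
        """Handle put/get requests in arrival order until asked to stop."""
        # The documented serve() first loads saved data; that step is omitted here.
        while True:
            req = self.requests.get()
            if req[0] == "put":
                self.data.extend(req[1])
            elif req[0] == "get":
                _, batch_size, reply = req
                # Uniform sampling with replacement; an empty pool yields [].
                batch = self.rng.choices(list(self.data), k=batch_size) if self.data else []
                reply.put(batch)
            elif req[0] == "stop":
                return

    def start(self) -> threading.Thread:
        t = threading.Thread(target=self.serve, daemon=True)
        t.start()
        return t

    def put(self, data: List[Any]) -> None:
        self.requests.put(("put", data))  # fire-and-forget, like a self-play worker

    def get(self, batch_size: int) -> List[Any]:
        reply: "queue.Queue[List[Any]]" = queue.Queue(maxsize=1)
        self.requests.put(("get", batch_size, reply))
        return reply.get()  # block until the server answers

    def stop(self) -> None:
        self.requests.put(("stop",))
```

Because one loop owns the storage and processes requests in arrival order, `put` and `get` need no locks, and a put issued before a get is always visible to that get.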
