Reinforcement Learning
-
class AlphaZero.train.parallel.evaluator.Evaluator(nn_eval_chal, nn_eval_best, r_conn, s_conn, game_config, ext_config)
This class compares the performance of the up-to-date model and the best model so far by holding games between these two models.
Parameters:
- nn_eval_chal – NNEvaluator instance storing the up-to-date model
- nn_eval_best – NNEvaluator instance storing the best model so far
- r_conn – Pipe to receive the message from the optimizer
- s_conn – Pipe to send the model updating message to the self play module
- game_config – A dictionary of game environment configuration
- ext_config – A dictionary of system configuration
-
eval_wrapper(color_of_new)
Wrapper for a single evaluation game.
Parameters: color_of_new – The color of the new model (challenger)
-
run()
The main evaluation process. It launches games asynchronously and examines the resulting win rate.
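The evaluation loop described for `run` can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `evaluate`, `play_game`, `num_games`, the color encoding, and the 0.55 promotion threshold are all hypothetical, and a thread pool stands in for whatever asynchronous mechanism the class actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

BLACK, WHITE = 1, -1  # hypothetical color encoding


def evaluate(play_game, num_games=20, threshold=0.55, workers=4):
    """Play games concurrently and report the challenger's win rate.

    play_game(color_of_new) is a hypothetical callable that plays one
    game with the challenger on the given color and returns the color
    of the winner.
    """
    # Alternate the challenger's color so neither side is favored.
    colors = [BLACK if i % 2 == 0 else WHITE for i in range(num_games)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        winners = list(pool.map(play_game, colors))
    # The challenger wins a game when the winner's color matches its own.
    wins = sum(1 for color, winner in zip(colors, winners) if color == winner)
    win_rate = wins / num_games
    return win_rate, win_rate >= threshold
```

Alternating colors matters in games where one side has a first-move advantage: a challenger that only ever won as one color would end up with a win rate of one half here.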
-
class AlphaZero.train.parallel.selfplay.Selfplay(nn_eval, r_conn, data_queue, game_config, ext_config)
This class generates training data from self play games.
Run this module on its own to start a remote self play session.
Example
$ python -m AlphaZero.train.parallel.selfplay <master addr>
Parameters:
- nn_eval – NNEvaluator instance storing the best model so far
- r_conn – Pipe to receive the model updating message
- data_queue – Queue to put the data
- game_config – A dictionary of game environment configuration
- ext_config – A dictionary of system configuration
-
selfplay_wrapper()
Wrapper for a single self play game.
-
run()
The main data generation process. It keeps launching self play games.
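As a rough sketch of the data generation loop, assuming each game yields a list of training samples that is placed on the data queue: `selfplay_loop`, `play_one_game`, and `max_games` are hypothetical names, and the loop is bounded here only so that the example terminates.

```python
import queue


def selfplay_loop(play_one_game, data_queue, max_games):
    """Launch self play games one after another and queue their data.

    play_one_game() is a hypothetical callable returning the training
    samples produced by one finished game.
    """
    for _ in range(max_games):
        samples = play_one_game()
        data_queue.put(samples)  # hand the game's data to the consumer


# Usage with a stand-in game that yields a single dummy sample.
q = queue.Queue()
selfplay_loop(lambda: [("state", [1.0], 1)], q, max_games=3)
```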
-
model_update_handler()
The handler for model updates. It tries to load the new network parameters. In the master session, it also notifies the remote sessions to update.
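One way to picture the update handling is a non-blocking check on the receiving end of a pipe. The sketch below assumes each message identifies a new model (for example a checkpoint name); the message format, `drain_updates`, and `load_model` are hypothetical and not taken from the project.

```python
from multiprocessing import Pipe


def drain_updates(r_conn, load_model):
    """Load the most recent pending model update, if there is one.

    Returns True when an update was applied. Older pending messages
    are skipped because only the latest parameters matter.
    """
    latest = None
    while r_conn.poll():  # returns immediately when nothing is waiting
        latest = r_conn.recv()
    if latest is None:
        return False
    load_model(latest)  # hypothetical: reload the network parameters
    return True
```

Polling rather than blocking on `recv()` lets the caller keep playing games when no update has arrived.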
-
rcv_remote_data_handler()
The handler for receiving data from remote sessions. Only the master session uses this handler.
-
remote_update_handler()
The handler for receiving the update notification from the master session. Only the remote sessions use this handler.
-
class AlphaZero.train.parallel.datapool.DataPool(ext_config)
This class stores the training data and handles data sending and receiving.
Parameters: ext_config – A dictionary of system configuration
serve()
The listening process. It first loads the saved data and then runs a loop to handle data getting and putting requests.
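A request loop of this kind might look like the sketch below. The `(kind, payload)` message format, the `"stop"` request, and the function names are assumptions made for illustration. The initial contents of `pool` stand in for data loaded from disk.

```python
import random
import threading
from multiprocessing import Pipe


def serve(conn, pool, rng):
    """Answer put/get requests until a 'stop' request arrives."""
    while True:
        kind, payload = conn.recv()
        if kind == "put":  # payload: list of new samples
            pool.extend(payload)
        elif kind == "get":  # payload: requested batch size
            k = min(payload, len(pool))
            conn.send(rng.sample(pool, k))
        elif kind == "stop":
            break


def demo():
    """Run the server in a thread and issue one put and one get."""
    client, server = Pipe()
    worker = threading.Thread(target=serve, args=(server, [], random.Random(0)))
    worker.start()
    client.send(("put", [1, 2, 3, 4]))
    client.send(("get", 2))
    batch = client.recv()
    client.send(("stop", None))
    worker.join()
    return batch
```

Routing all access through one listening loop means the stored data has a single owner, so callers do not need their own locking.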
-
merge_data(data)
Put the new data into the array. Since the array is pre-allocated, this function overwrites the old data with the new data and records the ending index.
Parameters: data – New data from self play games
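The overwrite behavior described for `merge_data` resembles a ring buffer. The sketch below uses a plain Python list in place of the pre-allocated array; the `RingBuffer` class and its attribute names are hypothetical.

```python
class RingBuffer:
    """Fixed-capacity store that overwrites the oldest entries when full."""

    def __init__(self, capacity):
        self.data = [None] * capacity  # pre-allocated storage
        self.capacity = capacity
        self.end = 0   # index where the next item will be written
        self.size = 0  # number of valid entries

    def merge(self, items):
        """Write items at the ending index, wrapping around at capacity."""
        for item in items:
            self.data[self.end] = item
            self.end = (self.end + 1) % self.capacity
        self.size = min(self.capacity, self.size + len(items))
```

With a capacity of 4, merging `[1, 2, 3]` and then `[4, 5]` leaves the storage as `[5, 2, 3, 4]` with the ending index at 1: the oldest sample has been overwritten.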
-
put(data)
Send a put request. This function is called by self play games.
Parameters: data – New data
-
get(batch_size)
Send a get request. This function is called by the optimizer.
Parameters: batch_size – The size of the minibatch
Returns: Minibatch of training data
-