Reinforcement Learning

class AlphaZero.train.parallel.evaluator.Evaluator(nn_eval_chal, nn_eval_best, r_conn, s_conn, game_config, ext_config)

This class compares the up-to-date model (the challenger) against the best model so far by playing games between the two.

Parameters:
  • nn_eval_chal – NNEvaluator instance storing the up-to-date model
  • nn_eval_best – NNEvaluator instance storing the best model so far
  • r_conn – Pipe for receiving messages from the optimizer
  • s_conn – Pipe for sending the model update message to the self play module
  • game_config – A dictionary of game environment configuration
  • ext_config – A dictionary of system configuration

eval_wrapper(color_of_new)

Wrapper for a single game.

Parameters: color_of_new – The color of the new model (challenger)

run()

The main evaluation process. It will launch games asynchronously and examine the winning rate.
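
As a rough illustration of the process described for run(), the sketch below launches a batch of games concurrently and then compares the challenger's win rate with a threshold. It is not the project's implementation: play_game, the colour constants, the use of a thread pool, the alternation of colours, and the 0.55 threshold are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

BLACK, WHITE = 1, -1  # hypothetical colour encoding


def evaluate(play_game, num_games=10, win_threshold=0.55, max_workers=4):
    """Run evaluation games concurrently and check the challenger's win rate.

    play_game(color_of_new) is a hypothetical callable that plays one game
    and returns True when the challenger wins.
    """
    # Alternate the challenger's colour so neither model always moves first
    # (an assumption; the source does not say how colours are assigned).
    colors = [BLACK if i % 2 == 0 else WHITE for i in range(num_games)]

    # Launch all games without waiting for each one to finish in turn.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(play_game, colors))

    win_rate = sum(results) / num_games
    return win_rate, win_rate >= win_threshold
```

With a play_game that only wins as black, evaluate returns a win rate of 0.5 over ten games and the challenger does not pass the assumed threshold.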

class AlphaZero.train.parallel.selfplay.Selfplay(nn_eval, r_conn, data_queue, game_config, ext_config)

This class generates training data from self play games.

Run this module on its own to start a remote self play session.

Example

$ python -m AlphaZero.train.parallel.selfplay <master addr>

Parameters:
  • nn_eval – NNEvaluator instance storing the best model so far
  • r_conn – Pipe to receive the model updating message
  • data_queue – Queue to put the data
  • game_config – A dictionary of game environment configuration
  • ext_config – A dictionary of system configuration

selfplay_wrapper()

Wrapper for a single self play game.

run()

The main data generation process. It will keep launching self play games.

model_update_handler()

The handler for model updating. It will try to load new network parameters. If it is the master session, it will also notify the remote sessions to update.
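
One way a self play loop might pick up model updates arriving on r_conn is to drain the pipe between games. The sketch below assumes that pattern; selfplay_loop, play_one_game, and load_model are hypothetical names, and the actual class may dispatch its handlers differently (for example from a separate thread).

```python
from multiprocessing import Pipe


def selfplay_loop(r_conn, play_one_game, load_model, max_games):
    """Play games one after another, applying pending model updates first.

    r_conn        -- receiving end of a multiprocessing Pipe
    play_one_game -- hypothetical callable returning one game's data
    load_model    -- hypothetical callable that loads new network parameters
    """
    games = []
    for _ in range(max_games):
        # Drain every queued update message before starting the next game.
        while r_conn.poll():
            load_model(r_conn.recv())
        games.append(play_one_game())
    return games
```

A quick check can create the pipe with Pipe(duplex=False), send a message on the sending end, and confirm that load_model is called before the first game starts.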

rcv_remote_data_handler()

The handler for receiving data from remote sessions. Only the master session uses this handler.

remote_update_handler()

The handler for receiving the update notification from the master session. Only the remote sessions use this handler.

class AlphaZero.train.parallel.datapool.DataPool(ext_config)

This class stores the training data and handles data sending and receiving.

Parameters: ext_config – A dictionary of system configuration

serve()

The listening process. It first loads the saved data and then runs a loop that handles get and put requests.

merge_data(data)

Put the new data into the array. Since the array is pre-allocated, this function overwrites the oldest data with the new data and records the ending index.

Parameters: data – New data from self play games
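
The overwrite behaviour described for merge_data can be pictured as a ring buffer. The sketch below is a minimal stand-in, assuming a plain Python list as the pre-allocated storage and a wrap-around ending index; RingStore and its attribute names are hypothetical, and the real class may store data in a different array type.

```python
class RingStore:
    """Fixed-capacity store that overwrites its oldest entries (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # pre-allocated storage
        self.end = 0                    # index where the next write goes
        self.size = 0                   # number of filled slots

    def merge_data(self, data):
        """Write each new item at the ending index, wrapping around when full."""
        for item in data:
            self.slots[self.end] = item
            self.end = (self.end + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)
```

With a capacity of 4, merging [1, 2, 3] and then [4, 5] leaves the slots as [5, 2, 3, 4]: the fifth item wraps around and replaces the oldest entry, and the ending index moves to 1.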

put(data)

Send a put request. This function is called by the self play games.

Parameters: data – New data

get(batch_size)

Send a get request. This function is called by the optimizer.

Parameters: batch_size – The size of the minibatch
Returns: Minibatch of training data
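
The relationship between serve(), put() and get() can be sketched as a request loop: one listener owns the data, and callers send it requests and wait for a reply. The example below is only an assumption about that pattern, using in-process queues and a thread in place of whatever transport the project uses. TinyDataPool, the request tuples, the stop() helper, and uniform random sampling for the minibatch are all hypothetical.

```python
import queue
import random
import threading


class TinyDataPool:
    """Illustrative stand-in: a single serve() loop answers put/get requests."""

    def __init__(self, seed=0):
        self._requests = queue.Queue()
        self._data = []
        self._rng = random.Random(seed)

    def serve(self):
        """Handle requests one at a time until a stop request arrives."""
        while True:
            kind, payload, reply = self._requests.get()
            if kind == "put":
                self._data.extend(payload)
                reply.put(len(self._data))
            elif kind == "get":
                # Sample without replacement, capped at the data available.
                k = min(payload, len(self._data))
                reply.put(self._rng.sample(self._data, k))
            else:  # "stop"
                reply.put(None)
                break

    def _request(self, kind, payload=None):
        # Each request carries its own reply queue, so callers block
        # only on their own answer.
        reply = queue.Queue(maxsize=1)
        self._requests.put((kind, payload, reply))
        return reply.get()

    def put(self, data):
        return self._request("put", list(data))

    def get(self, batch_size):
        return self._request("get", batch_size)

    def stop(self):
        self._request("stop")
```

To try it, start serve() in a threading.Thread, call put(range(10)) as a self play producer would, and then call get(4) as an optimizer would; the reply is a list of four distinct items drawn from the stored data.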