TD3

genrl.agents.deep.td3.td3 module
class genrl.agents.deep.td3.td3.TD3(*args, policy_frequency: int = 2, noise: genrl.core.noise.ActionNoise = None, noise_std: float = 0.2, **kwargs)

Bases: genrl.agents.deep.base.offpolicy.OffPolicyAgentAC
Twin Delayed DDPG (TD3) Algorithm

Paper: https://arxiv.org/abs/1802.09477
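A minimal training sketch, assuming GenRL's top-level exports (TD3 under genrl.agents, VectorEnv under genrl.environments, OffPolicyTrainer under genrl.trainers, following the package's README conventions) and a continuous-action Gym environment; argument names follow the attribute list below:

    from genrl.agents import TD3
    from genrl.environments import VectorEnv
    from genrl.trainers import OffPolicyTrainer

    # TD3 is an off-policy actor-critic method for continuous action spaces.
    env = VectorEnv("Pendulum-v0")

    # "mlp" selects the multi-layer perceptron network type; the policy is
    # updated once for every two critic updates (policy_frequency=2).
    agent = TD3("mlp", env, policy_frequency=2, noise_std=0.2)

    # The off-policy trainer manages the replay buffer and the update loop.
    trainer = OffPolicyTrainer(agent, env, log_mode=["stdout"])
    trainer.train()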
- network
  The network type of the Q-value function. Supported types: [“cnn”, “mlp”]
  Type: str
- env
  The environment that the agent is supposed to act on
  Type: Environment
- create_model
  Whether the algorithm’s model should be created on initialisation
  Type: bool
- batch_size
  Mini-batch size for loading experiences
  Type: int
- gamma
  The discount factor for rewards
  Type: float
- policy_layers
  Neural network layer dimensions for the policy
  Type: tuple of int
- value_layers
  Neural network layer dimensions for the critics
  Type: tuple of int
- shared_layers
  Sizes of the shared layers in the Actor-Critic, if used
  Type: tuple of int
- lr_policy
  Learning rate for the policy/actor
  Type: float
- lr_value
  Learning rate for the critic
  Type: float
- replay_size
  Capacity of the Replay Buffer
  Type: int
- buffer_type
  Choose the type of Buffer: [“push”, “prioritized”]
  Type: str
- polyak
  Target model update parameter (1 for hard update)
  Type: float
- policy_frequency
  Frequency of policy updates relative to critic updates (the policy and target networks are updated once every policy_frequency critic updates)
  Type: int
- noise
  Action noise added to the policy’s actions to aid exploration (see the sketch after this attribute list)
  Type: ActionNoise
- noise_std
  Standard deviation of the action noise distribution
  Type: float
- seed
  Seed for randomness
  Type: int
- render
  Should the env be rendered during training?
  Type: bool
- device
  Hardware being used for training. Options: [“cuda” -> GPU, “cpu” -> CPU]
  Type: str
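A hedged example of configuring exploration noise, as referenced in the noise attribute above. NormalActionNoise is assumed to live in genrl.core.noise alongside ActionNoise; whether TD3 expects the noise class or a constructed instance is not obvious from the signature alone, so passing the class (with noise_std supplied separately) is an assumption here:

    from genrl.agents import TD3
    from genrl.core.noise import NormalActionNoise  # assumed export; OrnsteinUhlenbeckActionNoise is a common alternative
    from genrl.environments import VectorEnv

    env = VectorEnv("Pendulum-v0")

    # Gaussian noise with standard deviation noise_std is added to the
    # deterministic policy's actions to aid exploration during training.
    agent = TD3("mlp", env, noise=NormalActionNoise, noise_std=0.1)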
- get_hyperparams() → Dict[str, Any]
  Get relevant hyperparameters to save
  Returns: Hyperparameters to be saved; weights (torch.Tensor): Neural network weights
  Return type: hyperparams (dict)
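A sketch of using the returned dictionary for checkpointing, continuing from the agent constructed in the sketches above; the "weights" key and the direct torch.save call are assumptions based on the return description, and GenRL's trainers may wrap this step themselves:

    import torch

    # Collect hyperparameters and network weights in one dict.
    checkpoint = agent.get_hyperparams()

    # Persist and later restore the checkpoint; the "weights" entry is
    # assumed to hold the network state described above.
    torch.save(checkpoint, "td3_pendulum.pt")
    restored = torch.load("td3_pendulum.pt")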