Training options for Adam optimizer

Maximum number of epochs (full passes of the data) to use for training, specified as a positive integer.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
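Assuming this page documents the trainingOptions function from MATLAB's Deep Learning Toolbox (the option names in the sketches below, such as MaxEpochs, are that function's name-value arguments), a minimal sketch of capping the number of epochs:

    % Sketch: train with Adam for at most 30 full passes over the data.
    options = trainingOptions('adam', 'MaxEpochs', 30);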

Size of the mini-batch to use for each training iteration, specified as a positive integer. A mini-batch is a subset of the training set that is used to evaluate the gradient of the loss function and update the weights.

If the mini-batch size does not evenly divide the number of training samples, then the software discards the training data that does not fit into the final complete mini-batch of each epoch. If the mini-batch size is larger than the number of training samples, then the software does not discard any data.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
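As a worked example of the discard rule, with hypothetical sizes:

    % With 1000 training samples and a mini-batch size of 128, each epoch
    % runs floor(1000/128) = 7 complete mini-batches and discards the rest.
    numSamples    = 1000;
    miniBatchSize = 128;
    numDiscarded  = mod(numSamples, miniBatchSize)   % 104 samples per epoch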

Option for data shuffling, specified as one of these values:

  • 'once' — Shuffle the training and validation data once before training.

  • 'never' — Do not shuffle the data.

  • 'every-epoch' — Shuffle the training data before each training epoch, and shuffle the validation data before each neural network validation. If the mini-batch size does not evenly divide the number of training samples, then the software discards the training data that does not fit into the final complete mini-batch of each epoch. To avoid discarding the same data every epoch, set the Shuffle training option to 'every-epoch'.
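A sketch combining the two options discussed so far, so that a different remainder is discarded each epoch:

    % Sketch: reshuffle every epoch so the discarded remainder varies.
    options = trainingOptions('adam', ...
        'MiniBatchSize', 128, ...
        'Shuffle', 'every-epoch');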

Initial learning rate used for training, specified as a positive scalar.

If the learning rate is too low, then training can take a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
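For example, a sketch that starts from a rate smaller than MATLAB's Adam default of 0.001 (the default value is my assumption) when training diverges:

    % Sketch: reduce the initial learning rate if training diverges.
    options = trainingOptions('adam', 'InitialLearnRate', 1e-4);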

This property is read-only.

Settings for the learning rate schedule, specified as a structure. LearnRateScheduleSettings has the field Method, which specifies the type of method for adjusting the learning rate. The possible methods are:

  • 'none' — The learning rate is constant throughout training.

  • 'piecewise' — The learning rate drops periodically during training.

If Method is 'piecewise', then LearnRateScheduleSettings contains two more fields:

  • DropRateFactor — The multiplicative factor by which the learning rate drops during training

  • DropPeriod — The number of epochs that pass between adjustments to the learning rate during training

Specify the settings for the learning rate schedule using trainingOptions.

Data Types: struct
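Because this property is read-only, the structure is populated indirectly; a sketch using the LearnRateSchedule, LearnRateDropFactor, and LearnRateDropPeriod arguments of trainingOptions (the usual route, to the best of my knowledge):

    % Sketch: drop the learning rate by a factor of 0.1 every 10 epochs.
    options = trainingOptions('adam', ...
        'InitialLearnRate',    0.001, ...
        'LearnRateSchedule',   'piecewise', ...
        'LearnRateDropFactor', 0.1, ...
        'LearnRateDropPeriod', 10);
    % options.LearnRateScheduleSettings then holds:
    %   Method: 'piecewise', DropRateFactor: 0.1, DropPeriod: 10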

Decay rate of gradient moving average for the Adam solver, specified as a nonnegative scalar less than 1. The gradient decay rate is denoted by β₁ in the Adaptive Moment Estimation section.

The default value of 0.9 works well for most tasks.

For more information, see Adaptive Moment Estimation.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
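To illustrate what the decay rate controls, here is a toy sketch of the first-moment update from the Adam formulas (scalar values; grad and m are illustrative names, not API identifiers):

    % Toy sketch of the gradient moving average: m <- beta1*m + (1-beta1)*grad.
    beta1 = 0.9;            % GradientDecayFactor
    m     = 0;              % moving average of the gradient, initially zero
    grad  = 0.5;            % hypothetical gradient for one parameter
    m     = beta1 * m + (1 - beta1) * grad   % larger beta1 -> smoother average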

Decay rate of squared gradient moving average for the Adam solver, specified as a nonnegative scalar less than 1. The squared gradient decay rate is denoted by β₂ in the Adaptive Moment Estimation section.

Typical values of the decay rate are 0.9, 0.99, and 0.999, corresponding to averaging lengths of 10, 100, and 1000 parameter updates, respectively.

For more information, see Adaptive Moment Estimation.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
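The averaging lengths quoted above follow from the geometric weighting of the moving average: a decay rate β₂ averages over roughly 1/(1 − β₂) parameter updates, as this check shows:

    % Averaging length implied by each typical squared-gradient decay rate.
    beta2  = [0.9 0.99 0.999];
    avgLen = 1 ./ (1 - beta2)   % returns 10, 100, 1000 parameter updates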

Denominator offset for Adam solver, specified as a positive scalar.

The solver adds the offset to the denominator in the neural network parameter updates to avoid division by zero. The default value of 1e-8 works well for most tasks.

For more information, see Adaptive Moment Estimation.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
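Putting the pieces together, a toy sketch of one Adam update for a single scalar parameter, showing where the offset enters the denominator (all variable names are illustrative; the update follows the form θ ← θ − α·m/(√v + ε) that this page's decay-rate and offset descriptions refer to):

    % Toy sketch of one Adam parameter update.
    beta1 = 0.9;  beta2 = 0.999;  epsilon = 1e-8;  alpha = 0.001;
    theta = 1.0;  m = 0;  v = 0;                 % parameter and moment estimates
    grad  = 0.5;                                 % hypothetical loss gradient
    m = beta1 * m + (1 - beta1) * grad;          % gradient moving average
    v = beta2 * v + (1 - beta2) * grad^2;        % squared-gradient moving average
    theta = theta - alpha * m / (sqrt(v) + epsilon)   % epsilon avoids division by zero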


 
