playground:playground [2021/08/22] – Hideaki IIDUKA
^ Convergence rates of stochastic optimization algorithms for convex and nonconvex optimization ^^^^
^ Algorithms ^ Convex Optimization ^^ Nonconvex Optimization ^^
| | Constant learning rate | Diminishing learning rate | Constant learning rate | Diminishing learning rate |
| SGD \cite{sca2020} | $\displaystyle{\mathcal{O}\left( \frac{1}{T} \right) + C}$ | $\displaystyle{\mathcal{O}\left( \frac{1}{\sqrt{T}} \right)}$ | $\displaystyle{\mathcal{O}\left( \frac{1}{n} \right) + C}$ | $\displaystyle{\mathcal{O}\left( \frac{1}{\sqrt{n}} \right)}$ |
| SGD with SPS \cite{loizou2021} | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{T} \right) + C}$ | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{n} \right) + C}$ |
| Minibatch SGD \cite{chen2020} | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{T} \right) + C}$ | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{n} \right) + C}$ |
| Adam \cite{adam} | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{\sqrt{T}} \right)}^{(*)}$ | --------- | --------- |
| AMSGrad \cite{reddi2018} | --------- | $\displaystyle{\mathcal{O}\left( \sqrt{\frac{1 + \ln T}{T}} \right)}$ | --------- | --------- |
| GWDC \cite{liang2020} | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{\sqrt{T}} \right)}$ | --------- | --------- |
| AMSGWDC \cite{liang2020} | --------- | $\displaystyle{\mathcal{O}\left( \frac{1}{\sqrt{T}} \right)}$ | --------- | --------- |
| AMSGrad \cite{chen2019} | --------- | $\displaystyle{\mathcal{O}\left( \frac{\ln T}{\sqrt{T}} \right)}$ | --------- | $\displaystyle{\mathcal{O}\left( \frac{\ln n}{\sqrt{n}} \right)}$ |
| AdaBelief \cite{adab} | ---------
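The qualitative split the table shows for SGD — a constant learning rate reaching only an $\mathcal{O}(1/T) + C$ neighborhood of the optimum, versus a diminishing learning rate driving the gap to zero — can be sketched numerically. The 1-D quadratic objective, step counts, and step-size constants below are illustrative assumptions, not taken from this page or its references.

```python
# Minimal sketch (illustrative, not from the cited works): SGD on the
# convex quadratic f(x) = x^2 / 2 with additive Gaussian gradient noise.
# A constant learning rate settles at a noise floor (the "+ C" term in
# the table); a diminishing rate alpha / sqrt(t+1) keeps shrinking the
# averaged optimality gap.
import math
import random

def sgd_gap(constant_lr, steps=20000, alpha=0.1, noise=1.0, seed=0):
    """Run SGD and return the running mean of f(x_t) - f(x*), with x* = 0."""
    rng = random.Random(seed)
    x = 5.0          # arbitrary starting point
    avg_gap = 0.0
    for t in range(steps):
        g = x + rng.gauss(0.0, noise)  # stochastic gradient of x^2 / 2
        lr = alpha if constant_lr else alpha / math.sqrt(t + 1)
        x -= lr * g
        avg_gap += (x * x / 2) / steps
    return avg_gap

gap_const = sgd_gap(constant_lr=True)
gap_dimin = sgd_gap(constant_lr=False)
print(f"constant lr:    averaged gap = {gap_const:.4f}")
print(f"diminishing lr: averaged gap = {gap_dimin:.4f}")
```

With these (assumed) settings, the diminishing-rate run should report a noticeably smaller averaged gap, while the constant-rate run plateaus at a positive floor proportional to the step size and gradient-noise variance.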