Optimization in Machine Learning: Chapter 2, Gradient Descent (1)
2022/4/25 6:42:36
1. Step of Gradient descent
\[\begin{equation} x_{t+1} = x_t-\gamma \nabla f(x_t) \end{equation} \]
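As a quick illustration (not part of the original notes), here is a minimal Python sketch of this update rule; `grad_f` is a hypothetical callable returning \(\nabla f(x)\), and the stepsize `gamma` is assumed constant.

```python
import numpy as np

def gradient_descent(grad_f, x0, gamma, T):
    """Run T steps of the update x_{t+1} = x_t - gamma * grad_f(x_t)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T):
        xs.append(xs[-1] - gamma * grad_f(xs[-1]))
    return xs  # all iterates x_0, ..., x_T
```

For instance, with `grad_f = lambda x: 2 * x` (the gradient of \(f(x)=||x||^2\)) the iterates shrink toward the minimizer \(x^*=0\) whenever \(0<\gamma<1\).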
2. Vanilla Analysis
\(\text{Let }{\bf g_t} = \nabla f(x_t)\text{. From the update rule, we get:}\)
\[\begin{equation} g_t = (x_t-x_{t+1})/\gamma \end{equation} \]
\(\text{Hence we get:}\)
\[\begin{equation} g_t^T(x_t-x^*) = \frac{1}{\gamma}(x_t-x_{t+1})^T(x_t-x^*) \end{equation} \]
\(\text{Recall the basic vector identity } 2v^Tw = ||v||^2+||w||^2-||v-w||^2.\)
\(\text{Hence we can obtain:}\)
\[\begin{align} g_t^T(x_t-x^*)&= \frac{1}{\gamma}(x_t-x_{t+1})^T(x_t-x^*)\\ &=\frac{1}{2\gamma}[||x_t-x_{t+1}||^2+||x_t-x^*||^2-||x_{t+1}-x^*||^2]\\ &=\frac{1}{2\gamma}[\gamma^2||g_t||^2+||x_t-x^*||^2-||x_{t+1}-x^*||^2]\\ &=\frac{\gamma}{2}||g_t||^2+\frac{1}{2\gamma}[||x_t-x^*||^2-||x_{t+1}-x^*||^2] \end{align} \]
\(\text{Summing over }t=0,\dots,T-1\text{ (the last two terms telescope):}\)
\[\begin{align} \sum_{t=0}^{T-1}g_t^T(x_t-x^*)&= \frac{\gamma}{2}\sum_{t=0}^{T-1}||g_t||^2+\frac{1}{2\gamma}[||x_0-x^*||^2-||x_{T}-x^*||^2]\\ &\leq \frac{\gamma}{2}\sum_{t=0}^{T-1}||g_t||^2+\frac{1}{2\gamma}||x_0-x^*||^2 \end{align} \]
\(\text{Now we bring in convexity: }f(y)\geq f(x)+\nabla f(x)^T(y-x)\text{ for all }x,y.\text{ With }y=x^*\text{ and }x=x_t\text{ this gives:}\)
\[\begin{align} f(x_t)-f(x^*)\leq g_t^T(x_t-x^*) \end{align} \]
\(\text{Combining this with inequality (9):}\)
\[\begin{align} \large \sum_{t=0}^{T-1}\big(f(x_t)-f(x^*)\big) &\leq \frac{\gamma}{2}\sum_{t=0}^{T-1}||g_t||^2+\frac{1}{2\gamma}||x_0-x^*||^2 \end{align} \]
\(\large \text{Dividing by }T\text{, this gives an upper bound on the average error.}\)
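The bound above can be sanity-checked numerically. The sketch below (a toy example of my own, not from the notes) runs gradient descent on \(f(x)=||x||^2\), where \(x^*=0\), and verifies that inequality (11) holds along the trajectory.

```python
import numpy as np

# Numerical check of the bound
#   sum_t (f(x_t) - f(x*)) <= (gamma/2) * sum_t ||g_t||^2 + ||x_0 - x*||^2 / (2*gamma)
# for the toy convex function f(x) = ||x||^2 (so x* = 0 and grad f(x) = 2x).
f = lambda x: float(x @ x)
grad = lambda x: 2.0 * x
x_star = np.zeros(3)

gamma, T = 0.1, 50
x = np.array([1.0, -2.0, 0.5])
x0 = x.copy()
lhs = 0.0          # running sum of suboptimality gaps f(x_t) - f(x*)
sum_sq_grad = 0.0  # running sum of squared gradient norms
for _ in range(T):
    g = grad(x)
    lhs += f(x) - f(x_star)
    sum_sq_grad += float(g @ g)
    x = x - gamma * g

rhs = gamma / 2 * sum_sq_grad + float((x0 - x_star) @ (x0 - x_star)) / (2 * gamma)
print(lhs <= rhs + 1e-9)  # expected: True
```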
3. Lipschitz Convex functions: \(O(1/\epsilon^2)\) steps
\(\bf \large \text{Theorem }2.1\):
\(\text{Let }f:\mathbb{R^d}\rightarrow \mathbb{R} \text{ be convex and differentiable with a global minimum }x^*; \text{ suppose that }||x_0-x^*||\leq R \text{ and }||\nabla f(x)||\leq B \text{ for all }x.\text{ Choosing the stepsize } \gamma = \frac{R}{B\sqrt{T}},\text{ gradient descent yields}\)
\[\begin{align} \frac{1}{T}\sum_{t=0}^{T-1}\big(f(x_t)-f(x^*)\big) \leq \frac{RB}{\sqrt{T}} \end{align} \]
\(\large\bf Proof:\)
\(\text{Plug the assumptions into inequality (11): with }||g_t||\leq B\text{ and }||x_0-x^*||\leq R\text{ we get }\sum_{t=0}^{T-1}\big(f(x_t)-f(x^*)\big)\leq \frac{\gamma}{2}TB^2+\frac{R^2}{2\gamma}.\text{ Substituting }\gamma=\frac{R}{B\sqrt{T}}\text{ makes both terms equal to }\frac{RB\sqrt{T}}{2}\text{; dividing by }T\text{ gives the claimed bound.}\)
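To see Theorem 2.1 in action, here is a small sketch (again a toy example, not from the notes) that runs gradient descent with \(\gamma=\frac{R}{B\sqrt{T}}\) on \(f(x)=\sqrt{1+x^2}\), which is convex with \(|f'(x)|\leq 1\) (so \(B=1\)) and minimized at \(x^*=0\), and compares the average error with \(\frac{RB}{\sqrt{T}}\).

```python
import numpy as np

# Gradient descent with the stepsize gamma = R / (B * sqrt(T)) from Theorem 2.1,
# on the toy Lipschitz convex function f(x) = sqrt(1 + x^2):
# |f'(x)| = |x| / sqrt(1 + x^2) <= 1, so B = 1, and x* = 0.
f = lambda x: np.sqrt(1.0 + x * x)
grad = lambda x: x / np.sqrt(1.0 + x * x)

x0, x_star = 4.0, 0.0
R, B, T = abs(x0 - x_star), 1.0, 1000
gamma = R / (B * np.sqrt(T))

x, gaps = x0, []
for _ in range(T):
    gaps.append(f(x) - f(x_star))
    x = x - gamma * grad(x)

avg_gap = sum(gaps) / T
print(avg_gap, R * B / np.sqrt(T))  # average error vs. the RB/sqrt(T) bound
```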
4. Smooth Convex functions: \(O(1/\epsilon)\) steps
\(\text{Definition }2.2:\text{ Smooth with parameter }L:\)
\[\begin{align} f(y) \leq f(x)+\nabla f(x)^T(y-x)+\frac{L}{2}||x-y||^2 \quad \text{for all }x,y \end{align} \]
\(\large \text{For example, all quadratic functions of the form }f(x) = x^TQx+b^Tx+c \text{ are }{\bf smooth}\text{ (e.g. with }L=||Q+Q^T||_2\text{, the spectral norm of }Q+Q^T\text{).}\)
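To make the quadratic example concrete, here is a small numerical check (a sketch under my own choice of \(L=||Q+Q^T||_2\)) that the smoothness inequality holds at randomly sampled pairs \(x,y\).

```python
import numpy as np

# Numerically check the smoothness inequality
#   f(y) <= f(x) + grad_f(x)^T (y - x) + L/2 * ||x - y||^2
# for a random quadratic f(x) = x^T Q x + b^T x + c, using L = ||Q + Q^T||_2.
rng = np.random.default_rng(0)
d = 5
Q = rng.standard_normal((d, d))
b = rng.standard_normal(d)
c = 1.0

f = lambda x: float(x @ Q @ x + b @ x + c)
grad = lambda x: (Q + Q.T) @ x + b
L = np.linalg.norm(Q + Q.T, 2)  # spectral norm of Q + Q^T

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    lhs = f(y)
    rhs = f(x) + grad(x) @ (y - x) + L / 2 * float((x - y) @ (x - y))
    ok &= lhs <= rhs + 1e-9
print(ok)  # expected: True
```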
\(\bf\large \text{Lemma } 2.4:\)
\(\text{Let }f:\mathbb{R^d}\rightarrow \mathbb{R} \text{ be convex and differentiable. The following statements are equivalent:}\)
\(\bf \large \text{Lemma }2.6:\)
\(\text{Let }f:\mathbb{R^d}\rightarrow\mathbb{R}\text{ be differentiable and }{\bf smooth\ with\ parameter\ }L. \text{ Choosing }\gamma= \frac{1}{L},\text{ gradient descent yields}\)
\[\begin{align} f(x_{t+1}) \leq f(x_t)-\frac{1}{2L}||\nabla f(x_t)||^2, \quad t\geq 0 \end{align} \]
\(\bf \large Proof:\)
\(\text{Since }x_{t+1}=x_t-\frac{1}{L}\nabla f(x_t)\text{, we immediately get }\nabla f(x_t)=L(x_t-x_{t+1}).\)
\(\text{By the definition of smoothness:}\)
\[\begin{align} f(x_{t+1}) &\leq f(x_t)+\nabla f(x_t)^T(x_{t+1}-x_t)+\frac{L}{2}||x_t-x_{t+1}||^2\\ &= f(x_t)+L(x_t-x_{t+1})^T(x_{t+1}-x_t)+\frac{L}{2}||x_t-x_{t+1}||^2\\ &= f(x_t)-\frac{L}{2}\frac{1}{L^2}||\nabla f(x_t)||^2\\ &=f(x_t)-\frac{1}{2L}||\nabla f(x_t)||^2 \end{align} \]
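Finally, a short sketch (toy example, not from the notes) confirming the sufficient-decrease inequality of Lemma 2.6 along the gradient descent trajectory with \(\gamma=\frac{1}{L}\), for a convex quadratic \(f(x)=\frac{1}{2}x^TAx\) whose smoothness parameter is \(L=\lambda_{\max}(A)\).

```python
import numpy as np

# Check the sufficient-decrease inequality
#   f(x_{t+1}) <= f(x_t) - 1/(2L) * ||grad_f(x_t)||^2
# along gradient descent with gamma = 1/L, for f(x) = 1/2 x^T A x with A
# symmetric positive definite (so f is smooth with L = lambda_max(A)).
rng = np.random.default_rng(1)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)          # symmetric positive definite

f = lambda x: 0.5 * float(x @ A @ x)
grad = lambda x: A @ x
L = np.linalg.eigvalsh(A).max()  # smoothness parameter

x = rng.standard_normal(d)
for _ in range(100):
    g = grad(x)
    x_next = x - g / L           # gamma = 1/L
    assert f(x_next) <= f(x) - float(g @ g) / (2 * L) + 1e-9
    x = x_next
print("sufficient decrease holds at every step")
```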