From 6ffaa720d37a6645a4f80b75ece7274a3888fccd Mon Sep 17 00:00:00 2001
From: act65
Date: Wed, 1 Jan 2025 22:00:07 +1300
Subject: [PATCH] .

---
 .../inbetween-posts/2024-12-10-confidence.md  | 53 +++++++++++
 .../inbetween-posts/2024-12-12-contracts.md   | 23 +++++
 .../technical-posts/2024-08-01-init-opt.md    | 88 +++++++++++++++++--
 .../2025-01-01-requests-for-research.md       | 29 ++++++
 .../2020-04-10-requests-for-research.md       |  1 +
 5 files changed, 187 insertions(+), 7 deletions(-)
 create mode 100644 _drafts/inbetween-posts/2024-12-10-confidence.md
 create mode 100644 _drafts/inbetween-posts/2024-12-12-contracts.md
 create mode 100644 _drafts/technical-posts/2025-01-01-requests-for-research.md

diff --git a/_drafts/inbetween-posts/2024-12-10-confidence.md b/_drafts/inbetween-posts/2024-12-10-confidence.md
new file mode 100644
index 0000000000000..d7bbfe9a80d33
--- /dev/null
+++ b/_drafts/inbetween-posts/2024-12-10-confidence.md
@@ -0,0 +1,53 @@
+---
+title: Confidence versus rigor
+subtitle:
+layout: post
+permalink: /confidence/
+categories:
+  - tutorial
+---
+
+Here is a tutorial designed to help you calibrate your (over)confidence.
+I believe we easily confuse >80% confidence with certainty.
+
+
+## Binary search example
+
+This is a simple problem.
+Implement a binary search algorithm in your favorite programming language.
+
+You are given a sorted list of integers and a target value.
+Return the index of the target value in the list, or -1 if it is not present.
+
+***
+
+Here's my (first) solution in Python:
+
+```python
+def binary_search(arr, target, count=0):
+    if len(arr) == 0:
+        return -1
+    elif len(arr) == 1:
+        return 0 if arr[0] == target else -1
+    else:
+        mid = len(arr) // 2
+        if arr[mid] == target:
+            return count + mid
+        elif arr[mid] < target:
+            return binary_search(arr[mid+1:], target, count + mid + 1)
+        else:
+            return binary_search(arr[:mid], target, count)
+```
+
+
+## The Borwein integrals
+
+
+
+
+
+##
\ No newline at end of file
diff --git a/_drafts/inbetween-posts/2024-12-12-contracts.md b/_drafts/inbetween-posts/2024-12-12-contracts.md
new file mode 100644
index 0000000000000..2d5eb59701a24
--- /dev/null
+++ b/_drafts/inbetween-posts/2024-12-12-contracts.md
@@ -0,0 +1,23 @@
+---
+title: The power of the pen
+subtitle: How contracts shape our world
+layout: post
+permalink: /contracts/
+categories:
+  - economics
+---
+
\ No newline at end of file
diff --git a/_drafts/technical-posts/2024-08-01-init-opt.md b/_drafts/technical-posts/2024-08-01-init-opt.md
index d3ee4a8cb2918..1beb54b982323 100644
--- a/_drafts/technical-posts/2024-08-01-init-opt.md
+++ b/_drafts/technical-posts/2024-08-01-init-opt.md
@@ -1,9 +1,14 @@
 ---
-title: "Note on ??"
+title: Penalty term optimisation vs. finite-step optimisation
+subtitle: The connection between steps and penalties
 layout: post
 permalink: /opt-init-const/
+categories:
+  - research
 ---
 
+
+
 Consider a constrained optimisation problem.
 We want to minimise an objective, while also staying close to a specific point $y$ (the constraint).
 
@@ -12,13 +17,14 @@ Is there a rigorous connection between;
 
 1. an optimisation with a penalty term (the typical approach)
 2. a finite-step optimisation started from a specific initialisation
 
+## Optimisation with a penalty term
+
 $$
 x^* = \arg\min_x f(x) + \lambda D(x, y) \\
 D(x, y) = \frac{1}{2} \| x - y \|^2 \\
-x_0 \sim \mathcal{N}(0, \sigma^2) \\
 $$
 
-***
+## Finite-step optimisation
 
 We have a fixed budget of $T$ steps to optimise a function $f(x)$, and we start from a specific initialisation $x_0$.
@@ -30,8 +36,76 @@ $$
 
 ***
 
-What's the point of this? / Questions
-- 2. will vary dramatically depending on the optimiser used, the learning rate,
-- 2. would allow easier worst case distance bounds?
-- relatonship to trust region methods?
\ No newline at end of file
+**Theorem:**
+For a smooth, convex function $f(x)$, optimised by gradient descent with learning rate $\eta$, there is a relationship between:
+- a $\lambda$-penalised optimisation with penalty term $D(x, y) = \frac{1}{2} \| x - y \|^2$
+- a $T$-step optimisation started from $y$
+
+such that their solutions are equivalent up to $O(\eta^2)$.
+
+**Proof:**
+
+1) First, let's establish some conditions:
+- Let $f(x)$ be $L$-smooth: $\| \nabla f(x) - \nabla f(y) \| \le L \| x - y \|$
+- Let $f(x)$ be $\mu$-strongly convex: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \| y - x \|^2$
+
+2) For the penalised approach, write the objective as $\mathcal{L}(x) = f(x) + \lambda D(x, y)$ (to avoid clashing with the smoothness constant $L$):
+
+$$
+\begin{align*}
+\nabla \mathcal{L}(x) &= \nabla f(x) + \lambda (x - y) \\
+\nabla \mathcal{L}(x^*) &= 0 \quad \text{(at optimality)} \\
+\nabla f(x^*) + \lambda (x^* - y) &= 0
+\end{align*}
+$$
+
+3) For the $T$-step approach with gradient descent:
+
+$$
+\begin{align*}
+x_{t+1} &= x_t - \eta \nabla f(x_t) \\
+x_0 &= y
+\end{align*}
+$$
+
+4) Key insight: after $T$ steps, the maximum distance from $y$ is bounded, since $x_T - y = -\eta \sum_{t=0}^{T-1} \nabla f(x_t)$:
+
+$$
+\| x_T - y \| \le \eta \sum_{t=0}^{T-1} \| \nabla f(x_t) \|
+$$
+
+5) Using $L$-smoothness and the triangle inequality:
+
+$$
+\| \nabla f(x_t) \| \le \| \nabla f(y) \| + L \| x_t - y \|
+$$
+
+6) This gives us a recursive bound:
+
+$$
+\| x_T - y \| \le \eta T \| \nabla f(y) \| + \eta T L \cdot \max_{t
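
The theorem sketched above can be sanity-checked numerically. Below is a minimal sketch (an editor's illustration, not part of the patch) for a quadratic objective, comparing the $\lambda$-penalised minimiser with $T$ steps of gradient descent started from $y$. The identification $\lambda \approx 1/(\eta T)$ is an assumption chosen for illustration, borrowing the classical "early stopping ≈ $\ell_2$ regularisation" correspondence for quadratics; the draft does not derive it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic objective f(x) = 0.5 * x^T A x - b^T x, with A positive definite,
# so both problems are easy to solve or simulate exactly.
d = 5
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)
b = rng.normal(size=d)
y = rng.normal(size=d)        # the point we want to stay close to

eta = 1e-2                    # learning rate
T = 200                       # step budget
lam = 1.0 / (eta * T)         # heuristic penalty strength (assumption)

# Penalised problem: argmin_x f(x) + lam/2 * ||x - y||^2
# First-order condition: (A + lam I) x = b + lam y
x_pen = np.linalg.solve(A + lam * np.eye(d), b + lam * y)

# Finite-step problem: T steps of gradient descent from x_0 = y
x = y.copy()
for _ in range(T):
    x = x - eta * (A @ x - b)  # gradient of f at x is A x - b

print("distance of penalised solution from y:", np.linalg.norm(x_pen - y))
print("distance of T-step solution from y:   ", np.linalg.norm(x - y))
print("gap between the two solutions:        ", np.linalg.norm(x_pen - x))
```

How closely the two solutions agree depends on the spectrum of $A$ and on the $\lambda \leftrightarrow \eta T$ mapping used, which is exactly the gap the $O(\eta^2)$ claim would need to quantify.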
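
On the binary-search draft earlier in the patch: a cheap way to turn confidence into evidence is a randomised check against a trusted oracle. Here is a small sketch (again an editor's illustration, assuming the draft's `binary_search` is in scope) that uses `bisect_left` from the standard library as the reference.

```python
import random
from bisect import bisect_left

def reference_search(arr, target):
    """Oracle: index of target via the standard library, or -1 if absent."""
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

def check(binary_search, trials=10_000, seed=0):
    """Compare binary_search against the oracle on random sorted lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        # distinct values keep "the index of the target" unambiguous
        arr = sorted(rng.sample(range(100), rng.randint(0, 10)))
        target = rng.randrange(100)
        got, want = binary_search(arr, target), reference_search(arr, target)
        if got != want:
            return f"counterexample: arr={arr}, target={target}, got={got}, want={want}"
    return "no counterexamples found"

# usage: print(check(binary_search))
```

Whether the first solution survives a few thousand random cases is left as exactly the calibration exercise the draft is asking for.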