Commit

act65 committed Jan 1, 2025
1 parent 00eac7d commit 6ffaa72
Showing 5 changed files with 187 additions and 7 deletions.
53 changes: 53 additions & 0 deletions _drafts/inbetween-posts/2024-12-10-confidence.md
@@ -0,0 +1,53 @@
---
title: Confidence versus rigor
subtitle:
layout: post
permalink: /confidence/
categories:
- tutorial
---

Here is a tutorial designed to help you calibrate your (over)confidence.
I believe we easily confuse confidence of >80% with certainty.

<!--
this is a manifesto for rigor. via example.
-->

## Binary search example

This is a simple problem.
Implement a binary search algorithm in your favorite programming language.

You are given a sorted list of integers and a target value.
Return the index of the target value in the list or -1 if it is not present.

***

Here's my (first) solution in Python:

```python
def binary_search(arr, target, count=0):
    # `count` tracks the offset of `arr` within the original list.
    if len(arr) == 0:
        return -1
    elif len(arr) == 1:
        # return the offset, not 0: the single element sits at index `count`
        return count if arr[0] == target else -1
    else:
        mid = len(arr) // 2
        if arr[mid] == target:
            return count + mid
        elif arr[mid] < target:
            return binary_search(arr[mid+1:], target, count + mid + 1)
        else:
            return binary_search(arr[:mid], target, count)
```
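As a quick check (a small test harness of my own, not part of the exercise), compare it against a plain linear scan on random sorted lists:

```python
import random

def linear_search(arr, target):
    # Reference implementation: scan the list directly.
    for i, x in enumerate(arr):
        if x == target:
            return i
    return -1

# Random sorted lists of distinct integers, so any match has a unique index.
for _ in range(1000):
    arr = sorted(random.sample(range(100), random.randint(0, 20)))
    target = random.randint(0, 100)
    assert binary_search(arr, target) == linear_search(arr, target)
```

How confident were you, before running it, that every assert would pass?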



## The Borwein integrals
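For reference, the standard statement of the pattern (with $\mathrm{sinc}(x) = \sin(x)/x$):

$$
\int_0^{\infty} \prod_{k=0}^{n} \mathrm{sinc}\!\left(\frac{x}{2k+1}\right) dx = \frac{\pi}{2} \quad \text{for } n = 0, 1, \dots, 6,
$$

yet the value dips just below $\frac{\pi}{2}$ once the factor $\mathrm{sinc}(x/15)$ is included, because $\frac{1}{3} + \frac{1}{5} + \dots + \frac{1}{15} > 1$: a pattern that survives six checks and then quietly fails.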





##
23 changes: 23 additions & 0 deletions _drafts/inbetween-posts/2024-12-12-contracts.md
@@ -0,0 +1,23 @@
---
title: The power of the pen
subtitle: How contracts shape our world
layout: post
permalink: /contracts/
categories:
- economics
---

<!--
i want to write about contracts / legal agreements
would be good to collect a few different kinds
contracts for different purposes
enron's contracts and the
how does the introduction of a new contract change the world?
Advance market commitment
https://worksinprogress.co/issue/how-to-start-an-advance-market-commitment/
-->
88 changes: 81 additions & 7 deletions _drafts/technical-posts/2024-08-01-init-opt.md
@@ -1,9 +1,14 @@
---
title: "Note on ??"
title: Penalty term optimisation vs. finite-step optimisation
subtitle: The connection between steps and penalties
layout: post
permalink: /opt-init-const/
categories:
- research
---

<!-- https://colab.research.google.com/drive/1lNCdi9PYQKBwwV7jIzm7Cw_YriNDi6SB#scrollTo=yF6zwcI2DX3i -->

Consider a constrained optimisation problem.
We want to minimise an objective, while also staying close to a specific point $y$ (the constraint).

@@ -12,13 +17,14 @@

Is there a rigorous connection between:
1. an optimisation with a penalty term (the typical approach)
2. a finite-step optimisation started from a specific initialisation

## Optimisation with a penalty term

$$
x^* = \arg\min_x f(x) + \lambda D(x, y) \\
D(x, y) = \frac{1}{2} \| x - y \|^2 \\
x_0 \sim \mathcal{N}(0, \sigma^2) \\
$$
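As a concrete sketch (my own toy example: a quadratic $f$, gradient descent on the penalised objective; none of these choices come from the problem statement):

```python
import numpy as np

a = np.array([3.0, -1.0])   # minimiser of the toy objective f(x) = 0.5 * ||x - a||^2
y = np.zeros(2)             # the point we want to stay close to
lam = 0.5                   # penalty weight lambda

def penalised_grad(x):
    # gradient of f(x) + lam * 0.5 * ||x - y||^2
    return (x - a) + lam * (x - y)

x = np.random.normal(0.0, 1.0, size=2)   # x_0 ~ N(0, sigma^2)
for _ in range(10_000):
    x -= 0.01 * penalised_grad(x)

# For this quadratic the minimiser is available in closed form:
# x* = (a + lam * y) / (1 + lam)
print(x, (a + lam * y) / (1 + lam))
```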

***
## Finite-step optimisation

We have a fixed budget of $T$ steps to optimise a function $f(x)$, and we start from a specific initialisation $x_0$.

@@ -30,8 +36,76 @@

***

What's the point of this? / Questions

- 2. will vary dramatically depending on the optimiser used, the learning rate,
- 2. would allow easier worst case distance bounds?
- relationship to trust region methods?
**Theorem:**
For an $L$-smooth, $\mu$-strongly convex function $f(x)$, optimised by gradient descent with learning rate $\eta$, there is a correspondence between:
- a $\lambda$-penalised optimisation with penalty term $D(x, y) = \frac{1}{2}\|x - y\|^2$
- a $T$-step optimisation started from $y$

such that their solutions are equivalent up to $O(\eta^2)$ terms.

**Proof:**

1) First, let's establish some conditions:
- Let $f(x)$ be $L$-smooth: $\|\nabla f(x) - \nabla f(y)\| \leq L\|x - y\|$
- Let $f(x)$ be $\mu$-strongly convex: $f(y) \geq f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2$

2) For the penalised approach, with penalised objective $\mathcal{L}(x) = f(x) + \lambda D(x, y)$:

$$
\begin{align*}
\nabla \mathcal{L}(x) &= \nabla f(x) + \lambda (x - y) \\
\nabla \mathcal{L}(x^*) &= 0 \tag{at optimality} \\
\nabla f(x^*) + \lambda (x^* - y) &= 0 \\
\end{align*}
$$

3) For the $T$-step approach with gradient descent:

$$
\begin{align*}
x_{t+1} &= x_t - \eta \nabla f(x_t) \\
x_0 &= y
\end{align*}
$$

4) Key insight: after $T$ steps, the distance from $y$ is bounded:

$$
\|x_T - y\| \leq \eta \sum_{t=0}^{T-1} \|\nabla f(x_t)\|
$$

5) Using smoothness (and the triangle inequality):

$$
\|\nabla f(x_t)\| \leq \|\nabla f(y)\| + L\|x_t - y\|
$$

6) This gives us a recursive bound:

$$
\|x_T - y\| \leq \eta T \|\nabla f(y)\| + \eta T L \max_{t < T} \|x_t - y\|
$$

7) Writing $M = \max_{t \leq T} \|x_t - y\|$, the same argument gives $M \leq \eta T \|\nabla f(y)\| + \eta T L \, M$; for $\eta T L < 1$ this rearranges to:

$$
\|x_T - y\| \leq M \leq \frac{\eta T \|\nabla f(y)\|}{1 - \eta T L}
$$

8) Comparing with the penalised approach, from step 2: $\|x^* - y\| = \frac{1}{\lambda} \|\nabla f(x^*)\|$.

9) For these to be equivalent:

$$
\frac{1}{\lambda} \approx \eta T
$$

Therefore, choosing $\lambda = \frac{1}{\eta T}$ makes the penalised approach approximately equivalent to the $T$-step approach, up to $O(\eta^2)$ terms.

**Corollary:**
The solutions will be closer when:
- $\eta$ is small
- $f(x)$ is well-conditioned ($L/\mu$ is small)
- $T$ is moderate (not too large or small)
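
A rough numerical check of the $\lambda = \frac{1}{\eta T}$ correspondence, on a toy quadratic (my own choice of $f$, $\eta$ and $T$, not taken from the derivation above):

```python
import numpy as np

a = np.array([3.0, -1.0])   # f(x) = 0.5 * ||x - a||^2, so grad f(x) = x - a
y = np.zeros(2)
eta, T = 0.01, 50
lam = 1.0 / (eta * T)

# 1) T-step gradient descent started from y.
x = y.copy()
for _ in range(T):
    x -= eta * (x - a)

# 2) Penalised solution, closed form for this quadratic:
#    argmin_x f(x) + (lam / 2) * ||x - y||^2 = (a + lam * y) / (1 + lam)
x_pen = (a + lam * y) / (1 + lam)

print(x, x_pen)  # roughly comparable; the agreement tightens as eta * T shrinks
```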
29 changes: 29 additions & 0 deletions _drafts/technical-posts/2025-01-01-requests-for-research.md
@@ -0,0 +1,29 @@
---
layout: post
title: Requests for research
subtitle: Some ideas from my masters.
permalink: /requests-for-research/02
categories:
- "research"
---

To stay sane I needed to write down some of the _actionable_ ideas that occur to me.
Otherwise I have the tendency to hoard them.
So, these are the questions I am not going to answer (argh it hurts!).
They appear to be perfectly good research directions, but "you need to focus" (says pretty much everyone I meet).

# Requests for research

_(the number of stars reflects how open the problem is: 1 star means little room for interpretation, 3 stars mean that there are some complex choices to be made)_

&#9734; __The effect of the positional embeddings__
Positional embeddings must be added to the inputs of a transformer, as transformers are otherwise permutation invariant. How much of the transformer's performance is due to the positional embeddings?
What if we build a convolutional network and add positional embeddings to the inputs?
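A rough sketch of the experiment I have in mind (PyTorch, with arbitrary sizes; every architectural choice here is mine, not from any paper):

```python
import torch
import torch.nn as nn

class ConvWithPosEmb(nn.Module):
    """A toy CNN whose input has a learned positional embedding added to it."""

    def __init__(self, channels=3, height=32, width=32, n_classes=10):
        super().__init__()
        # One learned vector per spatial location, broadcast over the batch.
        self.pos_emb = nn.Parameter(torch.zeros(1, channels, height, width))
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x + self.pos_emb)

model = ConvWithPosEmb()
print(model(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 10])
```

The comparison would be against the same model with `pos_emb` frozen at zero, to isolate what the positional information buys.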

&#9734; __Does machine unlearning increase plasticity?__
[Plasticity](https://arxiv.org/abs/2303.01486) is a measure of the ability of a neural network to change its predictions in response to new information. Does [machine unlearning](https://arxiv.org/abs/1912.03817) increase the plasticity of a model?
Broader question: By removing knowledge from a model, can we increase its ability to learn new things?

&#9734; __In-context learning via fine-tuning__
In-context learning is a remarkable ability that some large models have. However, as the context grows, the model's ability to learn new things decreases (is this true?) and its ability to remember and stay coherent decreases.
Rather
1 change: 1 addition & 0 deletions _posts/technical-posts/2020-04-10-requests-for-research.md
@@ -2,6 +2,7 @@
layout: post
title: Requests for research
subtitle: Some ideas from my masters.
permalink: /requests-for-research/01
categories:
- "research"
---

0 comments on commit 6ffaa72
