Commit

act65 committed Jan 1, 2025
1 parent 00eac7d commit 6ffaa72
Showing 5 changed files with 187 additions and 7 deletions.
53 changes: 53 additions & 0 deletions _drafts/inbetween-posts/2024-12-10-confidence.md
@@ -0,0 +1,53 @@
---
title: Confidence versus rigor
subtitle:
layout: post
permalink: /confidence/
categories:
- tutorial
---

Here is a tutorial designed to help you calibrate your (over)confidence.
I believe we easily confuse confidence of >80% with certainty.

<!--
this is a manifesto for rigor. via example.
-->

## Binary search example

This is a simple problem.
Implement a binary search algorithm in your favorite programming language.

You are given a sorted list of integers and a target value.
Return the index of the target value in the list or -1 if it is not present.

***

Here's my (first) solution in Python:

```python
def binary_search(arr, target, count=0):
    # `count` tracks the offset of `arr` within the original list.
    if len(arr) == 0:
        return -1
    elif len(arr) == 1:
        # return the offset, not 0: the single element sits at index `count`
        return count if arr[0] == target else -1
    else:
        mid = len(arr) // 2
        if arr[mid] == target:
            return count + mid
        elif arr[mid] < target:
            return binary_search(arr[mid+1:], target, count + mid + 1)
        else:
            return binary_search(arr[:mid], target, count)
```
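As a quick check (a small test harness of my own, not part of the exercise), compare it against a plain linear scan on random sorted lists:

```python
import random

def linear_search(arr, target):
    # Reference implementation: scan the list directly.
    for i, x in enumerate(arr):
        if x == target:
            return i
    return -1

# Random sorted lists of distinct integers, so any match has a unique index.
for _ in range(1000):
    arr = sorted(random.sample(range(100), random.randint(0, 20)))
    target = random.randint(0, 100)
    assert binary_search(arr, target) == linear_search(arr, target)
```

How confident were you, before running it, that every assert would pass?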



## The Borwein integrals
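For reference, the standard statement of the pattern (with $\mathrm{sinc}(x) = \sin(x)/x$):

$$
\int_0^{\infty} \prod_{k=0}^{n} \mathrm{sinc}\!\left(\frac{x}{2k+1}\right) dx = \frac{\pi}{2} \quad \text{for } n = 0, 1, \dots, 6,
$$

yet the value dips just below $\frac{\pi}{2}$ once the factor $\mathrm{sinc}(x/15)$ is included, because $\frac{1}{3} + \frac{1}{5} + \dots + \frac{1}{15} > 1$: a pattern that survives six checks and then quietly fails.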





##
23 changes: 23 additions & 0 deletions _drafts/inbetween-posts/2024-12-12-contracts.md
@@ -0,0 +1,23 @@
---
title: The power of the pen
subtitle: How contracts shape our world
layout: post
permalink: /contracts/
categories:
- economics
---

<!--
i want to write about contracts / legal agreements
would be good to collect a few different kinds
contracts for different purposes
enron's contracts and the
how does the introduction of a new contract change the world?
Advance market commitment
https://worksinprogress.co/issue/how-to-start-an-advance-market-commitment/
-->
88 changes: 81 additions & 7 deletions _drafts/technical-posts/2024-08-01-init-opt.md
@@ -1,9 +1,14 @@
---
title: "Note on ??"
title: Penalty term optimisation vs. finite-step optimisation
subtitle: The connection between steps and penalties
layout: post
permalink: /opt-init-const/
categories:
- research
---

<!-- https://colab.research.google.com/drive/1lNCdi9PYQKBwwV7jIzm7Cw_YriNDi6SB#scrollTo=yF6zwcI2DX3i -->

Consider a constrained optimisation problem.
We want to minimise an objective, while also staying close to a specific point $y$ (the constraint).

@@ -12,13 +17,14 @@

Is there a rigorous connection between:
1. an optimisation with a penalty term (the typical approach)
2. a finite-step optimisation started from a specific initialisation

## Optimisation with a penalty term

$$
x^* = \arg\min_x f(x) + \lambda D(x, y) \\
D(x, y) = \frac{1}{2} \| x - y \|^2 \\
x_0 \sim \mathcal{N}(0, \sigma^2) \\
$$
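As a concrete sketch (my own toy example: a quadratic $f$, gradient descent on the penalised objective; none of these choices come from the problem statement):

```python
import numpy as np

a = np.array([3.0, -1.0])   # minimiser of the toy objective f(x) = 0.5 * ||x - a||^2
y = np.zeros(2)             # the point we want to stay close to
lam = 0.5                   # penalty weight lambda

def penalised_grad(x):
    # gradient of f(x) + lam * 0.5 * ||x - y||^2
    return (x - a) + lam * (x - y)

x = np.random.normal(0.0, 1.0, size=2)   # x_0 ~ N(0, sigma^2)
for _ in range(10_000):
    x -= 0.01 * penalised_grad(x)

# For this quadratic the minimiser is available in closed form:
# x* = (a + lam * y) / (1 + lam)
print(x, (a + lam * y) / (1 + lam))
```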

***
## Finite-step optimisation

We have a fixed budget of $T$ steps to optimise a function $f(x)$, and we start from a specific initialisation $x_0$.

@@ -30,8 +36,76 @@

***

What's the point of this? / Questions

- 2. will vary dramatically depending on the optimiser used, the learning rate,
- 2. would allow easier worst case distance bounds?
- relationship to trust region methods?
**Theorem:**
For an $L$-smooth, $\mu$-strongly convex function $f(x)$, optimised by gradient descent with learning rate $\eta$, there is a correspondence between:
- a $\lambda$-penalised optimisation with penalty term $D(x, y) = \frac{1}{2}\|x - y\|^2$
- a $T$-step optimisation started from $y$

such that their solutions are equivalent up to $O(\eta^2)$ terms.

**Proof:**

1) First, let's establish some conditions:
- Let $f(x)$ be $L$-smooth: $\|\nabla f(x) - \nabla f(y)\| \leq L\|x - y\|$
- Let $f(x)$ be $\mu$-strongly convex: $f(y) \geq f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2$

2) For the penalised approach, with penalised objective $\mathcal{L}(x) = f(x) + \lambda D(x, y)$:

$$
\begin{align*}
\nabla \mathcal{L}(x) &= \nabla f(x) + \lambda (x - y) \\
\nabla \mathcal{L}(x^*) &= 0 \tag{at optimality} \\
\nabla f(x^*) + \lambda (x^* - y) &= 0 \\
\end{align*}
$$

3) For the $T$-step approach with gradient descent:

$$
\begin{align*}
x_{t+1} &= x_t - \eta \nabla f(x_t) \\
x_0 &= y
\end{align*}
$$

4) Key insight: after $T$ steps, the distance from $y$ is bounded:

$$
\|x_T - y\| \leq \eta \sum_{t=0}^{T-1} \|\nabla f(x_t)\|
$$

5) Using smoothness (and the triangle inequality):

$$
\|\nabla f(x_t)\| \leq \|\nabla f(y)\| + L\|x_t - y\|
$$

6) This gives us a recursive bound:

$$
\|x_T - y\| \leq \eta T \|\nabla f(y)\| + \eta T L \max_{t < T} \|x_t - y\|
$$

7) Writing $M = \max_{t \leq T} \|x_t - y\|$, the same argument gives $M \leq \eta T \|\nabla f(y)\| + \eta T L \, M$; for $\eta T L < 1$ this rearranges to:

$$
\|x_T - y\| \leq M \leq \frac{\eta T \|\nabla f(y)\|}{1 - \eta T L}
$$

8) Comparing with the penalised approach, from step 2: $\|x^* - y\| = \frac{1}{\lambda} \|\nabla f(x^*)\|$.

9) For these to be equivalent:

$$
\frac{1}{\lambda} \approx \eta T
$$

Therefore, choosing $\lambda = \frac{1}{\eta T}$ makes the penalised approach approximately equivalent to the $T$-step approach, up to $O(\eta^2)$ terms.

**Corollary:**
The solutions will be closer when:
- $\eta$ is small
- $f(x)$ is well-conditioned ($L/\mu$ is small)
- $T$ is moderate (not too large or small)
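
A rough numerical check of the $\lambda = \frac{1}{\eta T}$ correspondence, on a toy quadratic (my own choice of $f$, $\eta$ and $T$, not taken from the derivation above):

```python
import numpy as np

a = np.array([3.0, -1.0])   # f(x) = 0.5 * ||x - a||^2, so grad f(x) = x - a
y = np.zeros(2)
eta, T = 0.01, 50
lam = 1.0 / (eta * T)

# 1) T-step gradient descent started from y.
x = y.copy()
for _ in range(T):
    x -= eta * (x - a)

# 2) Penalised solution, closed form for this quadratic:
#    argmin_x f(x) + (lam / 2) * ||x - y||^2 = (a + lam * y) / (1 + lam)
x_pen = (a + lam * y) / (1 + lam)

print(x, x_pen)  # roughly comparable; the agreement tightens as eta * T shrinks
```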
29 changes: 29 additions & 0 deletions _drafts/technical-posts/2025-01-01-requests-for-research.md
@@ -0,0 +1,29 @@
---
layout: post
title: Requests for research
subtitle: Some ideas from my masters.
permalink: /requests-for-research/02
categories:
- "research"
---

To stay sane I needed to write down some of the _actionable_ ideas that occur to me.
Otherwise I have the tendency to hoard them.
So, these are the questions I am not going to answer (argh it hurts!).
They appear to be perfectly good research directions, but "you need to focus" (says pretty much everyone I meet).

# Requests for research

_(the number of stars reflects how open the problem is: 1 star means little room for interpretation, 3 stars mean that there are some complex choices to be made)_

&#9734; __The effect of the positional embeddings__
Positional embeddings must be added to the inputs of a transformer, as transformers are otherwise permutation invariant. How much of the transformer's performance is due to the positional embeddings?
What if we build a convolutional network and add positional embeddings to the inputs?
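A rough sketch of the experiment I have in mind (PyTorch, with arbitrary sizes; every architectural choice here is mine, not from any paper):

```python
import torch
import torch.nn as nn

class ConvWithPosEmb(nn.Module):
    """A toy CNN whose input has a learned positional embedding added to it."""

    def __init__(self, channels=3, height=32, width=32, n_classes=10):
        super().__init__()
        # One learned vector per spatial location, broadcast over the batch.
        self.pos_emb = nn.Parameter(torch.zeros(1, channels, height, width))
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x + self.pos_emb)

model = ConvWithPosEmb()
print(model(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 10])
```

The comparison would be against the same model with `pos_emb` frozen at zero, to isolate what the positional information buys.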

&#9734; __Does machine unlearning increase plasticity?__
[Plasticity](https://arxiv.org/abs/2303.01486) is a measure of the ability of a neural network to change its predictions in response to new information. Does [machine unlearning](https://arxiv.org/abs/1912.03817) increase the plasticity of a model?
Broader question: By removing knowledge from a model, can we increase its ability to learn new things?

&#9734; __In-context learning via fine-tuning__
In-context learning is a remarkable ability that some large models have. However, as the context grows, the model's ability to learn new things decreases (is this true?) and its ability to remember and stay coherent decreases.
Rather
1 change: 1 addition & 0 deletions _posts/technical-posts/2020-04-10-requests-for-research.md
@@ -2,6 +2,7 @@
layout: post
title: Requests for research
subtitle: Some ideas from my masters.
permalink: /requests-for-research/01
categories:
- "research"
---

0 comments on commit 6ffaa72
