
Conversation

@ndharasz
Contributor

@ndharasz ndharasz commented Sep 30, 2025

  • add new data module with balanced_rank_transform and quantile_bin functions for building crypto data
  • add unit tests for new data functions
  • refactor existing functions into more appropriately named modules and avoid circular imports

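The new data.py functions themselves aren't shown in this thread; as a rough sketch of what a balanced rank transform and quantile binning typically do (hypothetical stand-ins, not the library's actual implementations):

```python
import pandas as pd


def balanced_rank_transform_sketch(s: pd.Series) -> pd.Series:
    """Rank values, then map ranks into (0, 1) symmetrically around 0.5
    (hypothetical stand-in, not the real numerai_tools function)."""
    ranks = s.rank(method="average")
    return (ranks - 0.5) / len(s)


def quantile_bin_sketch(s: pd.Series, bins: int = 5) -> pd.Series:
    """Assign each value to an equal-population quantile bucket."""
    return pd.qcut(s, q=bins, labels=False, duplicates="drop")


vals = pd.Series([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0])
r = balanced_rank_transform_sketch(vals)   # values strictly inside (0, 1), mean 0.5
b = quantile_bin_sketch(vals, bins=4)      # integer bucket labels 0..3
```

The balanced variant keeps the transformed values symmetric around 0.5, which is a common convention for signals that are later centered or neutralized.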
Copilot AI review requested due to automatic review settings September 30, 2025 23:53

Copilot AI left a comment


Pull Request Overview

Refactors existing functions into more appropriately named modules to avoid circular imports and better organize the codebase, while adding new crypto-specific data transformation functions.

  • Moves mathematical functions from scoring.py to new math.py module
  • Creates new indexing.py module for index manipulation functions
  • Adds new data.py module with balanced_rank_transform and quantile_bin functions for crypto data processing

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_scoring.py | Removes tests for moved functions while keeping scoring-specific tests |
| tests/test_math.py | New test file for mathematical functions moved from the scoring module |
| tests/test_indexing.py | New test file for index manipulation functions |
| tests/test_data.py | New test file for data transformation functions |
| pyproject.toml | Updates version to 0.6.0.dev0 |
| numerai_tools/typing.py | New module defining type variables for DataFrame/Series unions |
| numerai_tools/signals.py | Updates imports to use the new module structure |
| numerai_tools/scoring.py | Refactored to import from the new modules and focus on scoring functions |
| numerai_tools/math.py | New module containing mathematical transformation functions |
| numerai_tools/indexing.py | New module for index filtering and sorting functions |
| numerai_tools/data.py | New module with crypto data transformation functions |


Copilot AI review requested due to automatic review settings October 1, 2025 09:48

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.



@ndharasz ndharasz changed the title v0.6.0 - better modules v0.6.0 - better modules & L2 norm MPC Nov 21, 2025

@andresnumer andresnumer left a comment


Because correlation and dot products (subject to an L2 constraint) are so similar, the L2-based MPC is just like MMC but with weights rather than raw predictions (and a slightly different scaling factor).

Here's an implementation of the correct version

def meta_portfolio_contribution(
    predictions: pd.DataFrame,
    stakes: pd.Series,
    neutralizers: pd.DataFrame,
    sample_weights: pd.Series,
    targets: pd.Series,
) -> pd.Series:
    """Calculates the "meta portfolio" score:
        - rank, normalize, and power each signal
        - convert each signal into neutralized weights
        - generate the stake-weighted portfolio
        - calculate the gradient of the portfolio w.r.t. the stakes
        - multiply the weights by the targets
    Arguments:
        predictions: pd.DataFrame - the predictions to evaluate
        stakes: pd.Series - the stakes to use as weights
        neutralizers: pd.DataFrame - the neutralization columns
        sample_weights: pd.Series - the universe sampling weights
        targets: pd.Series - the live targets to evaluate against
    """
    targets = center(targets)
    predictions, targets = filter_sort_index(predictions, targets)
    stake_weights = weight_normalize(stakes.fillna(0))
    assert np.isclose(stake_weights.sum(), 1), "Stakes must sum to 1"
    weights = generate_neutralized_weights(predictions, neutralizers, sample_weights)
    w = cast(np.ndarray, weights[stakes.index].values)
    s = cast(np.ndarray, stake_weights.values)
    t = cast(np.ndarray, targets.values)
    swp = w @ s
    swp = swp - swp.mean()
    l2_norm = np.sqrt(np.sum(swp**2))
    residualized_weights = orthogonalize(w, swp)
    mpc = (residualized_weights.T @ t).squeeze() / l2_norm
    return pd.Series(mpc, index=stakes.index)

@ndharasz ndharasz changed the title v0.6.0 - better modules & L2 norm MPC v0.6.0 - better modules Nov 21, 2025
@andresnumer

andresnumer commented Nov 21, 2025

The previous code had a small issue: it did not make each user's weights zero-mean, which matters for the correctness of the MPC calculation. This version also centers the targets post-filtering (center earlier if that's an issue).

Implementation:

def meta_portfolio_contribution(
    predictions: pd.DataFrame,
    stakes: pd.Series,
    neutralizers: pd.DataFrame,
    sample_weights: pd.Series,
    targets: pd.Series,
) -> pd.Series:
    """Calculates the "meta portfolio" gradient w.r.t. stakes:
        - rank, normalize, and power each signal
        - convert each signal into neutralized weights
        - center weights across samples (explicit W_c = C W)
        - generate the stake-weighted portfolio
        - calculate the gradient of the portfolio w.r.t. the stakes
        - multiply by the (centered) targets

    Arguments:
        predictions: pd.DataFrame - the predictions to evaluate
        stakes: pd.Series - the stakes to use as weights
        neutralizers: pd.DataFrame - the neutralization columns
        sample_weights: pd.Series - the universe sampling weights
        targets: pd.Series - the live targets to evaluate against
    """
    # Align predictions and targets on the same index / universe
    predictions, targets = filter_sort_index(predictions, targets)

    # Center targets in sample space: t_c = C t
    targets = center(targets)

    # Normalize stakes to sum to 1
    stake_weights = weight_normalize(stakes.fillna(0))
    assert np.isclose(stake_weights.sum(), 1), "Stakes must sum to 1"

    # Generate neutralized weights W(predictions, neutralizers, sample_weights)
    weights = generate_neutralized_weights(predictions, neutralizers, sample_weights)

    # Extract aligned matrices/vectors
    w = cast(np.ndarray, weights.loc[stakes.index].values)     # W ∈ R^{N×K}
    s = cast(np.ndarray, stake_weights.values)                 # s ∈ R^K
    t = cast(np.ndarray, targets.values)                       # t_c ∈ R^N (already centered)

    # Explicit centering of weights across samples:
    # W_c = C W = W - 1 μ^T, where μ is the column-wise mean of W
    w_centered = w - w.mean(axis=0, keepdims=True)             # W_c

    # Centered prediction vector v = W_c s
    v = w_centered @ s                                         # v ∈ R^N, already mean ~ 0
    # Optionally re-center to remove numerical drift
    v = v - v.mean()

    # Its L2 norm r = ||v||
    l2_norm = np.sqrt(np.sum(v**2))

    # Residualize W_c against v:
    # residualized_w ≈ R_v W_c = (I - v v^T / ||v||^2) W_c
    residualized_w = orthogonalize(w_centered, v)

    # Gradient: ∇_s α = (1 / ||v||) (R_v W_c)^T t_c
    mpc = (residualized_w.T @ t).squeeze() / l2_norm

    return pd.Series(mpc, index=stakes.index)

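To see why the centering and residualization steps behave as intended, here is a self-contained numpy sketch. The `orthogonalize` helper below is a stand-in for the library's version (assumed to remove each column's projection onto v); the data is random and only for illustration:

```python
import numpy as np


def orthogonalize(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    # (I - v v^T / ||v||^2) W: remove each column's projection onto v
    v = v.reshape(-1, 1)
    return W - v @ (v.T @ W) / (v.T @ v)


rng = np.random.default_rng(0)
N, K = 50, 3
W = rng.normal(size=(N, K))          # per-user weights, one column per user
s = np.array([0.5, 0.3, 0.2])        # normalized stakes, sum to 1
t = rng.normal(size=N)
t -= t.mean()                        # centered targets t_c

W_c = W - W.mean(axis=0, keepdims=True)  # zero-mean each user's weights
v = W_c @ s                              # stake-weighted portfolio
l2_norm = np.sqrt(np.sum(v**2))
mpc = (orthogonalize(W_c, v).T @ t) / l2_norm  # one score per user
```

By construction the residualized columns are orthogonal to v, so each user's score measures only what their weights contribute beyond the stake-weighted portfolio itself.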
