
Conversation

@ndharasz
Contributor

@ndharasz ndharasz commented Sep 30, 2025

  • add new data module with balanced_rank_transform and quantile_bin functions for building crypto data
  • add unit tests for new data functions
  • refactor existing functions into more appropriately named modules and avoid circular imports

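The new data.py functions themselves aren't shown in this thread; as a rough sketch of what a balanced rank transform and quantile binning typically do (hypothetical stand-ins, not the library's actual implementations):

```python
import pandas as pd


def balanced_rank_transform_sketch(s: pd.Series) -> pd.Series:
    """Rank values, then map ranks into (0, 1) symmetrically around 0.5
    (hypothetical stand-in, not the real numerai_tools function)."""
    ranks = s.rank(method="average")
    return (ranks - 0.5) / len(s)


def quantile_bin_sketch(s: pd.Series, bins: int = 5) -> pd.Series:
    """Assign each value to an equal-population quantile bucket."""
    return pd.qcut(s, q=bins, labels=False, duplicates="drop")


vals = pd.Series([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0])
r = balanced_rank_transform_sketch(vals)   # values strictly inside (0, 1), mean 0.5
b = quantile_bin_sketch(vals, bins=4)      # integer bucket labels 0..3
```

The balanced variant keeps the transformed values symmetric around 0.5, which is a common convention for signals that are later centered or neutralized.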
Copilot AI review requested due to automatic review settings September 30, 2025 23:53

Copilot AI left a comment


Pull Request Overview

Refactors existing functions into more appropriately named modules to avoid circular imports and better organize the codebase, while adding new crypto-specific data transformation functions.

  • Moves mathematical functions from scoring.py to new math.py module
  • Creates new indexing.py module for index manipulation functions
  • Adds new data.py module with balanced_rank_transform and quantile_bin functions for crypto data processing

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_scoring.py | Removes tests for moved functions while keeping scoring-specific tests |
| tests/test_math.py | New test file for mathematical functions moved from the scoring module |
| tests/test_indexing.py | New test file for index manipulation functions |
| tests/test_data.py | New test file for data transformation functions |
| pyproject.toml | Updates version to 0.6.0.dev0 |
| numerai_tools/typing.py | New module defining type variables for DataFrame/Series unions |
| numerai_tools/signals.py | Updates imports to use the new module structure |
| numerai_tools/scoring.py | Refactored to import from the new modules and focus on scoring functions |
| numerai_tools/math.py | New module containing mathematical transformation functions |
| numerai_tools/indexing.py | New module for index filtering and sorting functions |
| numerai_tools/data.py | New module with crypto data transformation functions |


Copilot AI review requested due to automatic review settings October 1, 2025 09:48

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.



@ndharasz ndharasz changed the title v0.6.0 - better modules v0.6.0 - better modules & L2 norm MPC Nov 21, 2025

@andresnumer andresnumer left a comment


Because correlation and dot products (subject to an L2 constraint) are so similar, the L2-based MPC is just like MMC but with weights rather than raw predictions (and a slightly different scaling factor).

Here's an implementation of the correct version

def meta_portfolio_contribution(
    predictions: pd.DataFrame,
    stakes: pd.Series,
    neutralizers: pd.DataFrame,
    sample_weights: pd.Series,
    targets: pd.Series,
) -> pd.Series:
    """Calculates the "meta portfolio" score:
        - rank, normalize, and power each signal
        - convert each signal into neutralized weights
        - generate the stake-weighted portfolio
        - calculate the gradient of the portfolio w.r.t. the stakes
        - multiply the weights by the targets
    Arguments:
        predictions: pd.DataFrame - the predictions to evaluate
        stakes: pd.Series - the stakes to use as weights
        neutralizers: pd.DataFrame - the neutralization columns
        sample_weights: pd.Series - the universe sampling weights
        targets: pd.Series - the live targets to evaluate against
    """
    targets = center(targets)
    predictions, targets = filter_sort_index(predictions, targets)
    stake_weights = weight_normalize(stakes.fillna(0))
    assert np.isclose(stake_weights.sum(), 1), "Stakes must sum to 1"
    weights = generate_neutralized_weights(predictions, neutralizers, sample_weights)
    w = cast(np.ndarray, weights[stakes.index].values)
    s = cast(np.ndarray, stake_weights.values)
    t = cast(np.ndarray, targets.values)
    swp = w @ s
    swp = swp - swp.mean()
    l2_norm = np.sqrt(np.sum(swp**2))
    residualized_weights = orthogonalize(w, swp)
    mpc = (residualized_weights.T @ t).squeeze() / l2_norm
    return pd.Series(mpc, index=stakes.index)

@ndharasz ndharasz changed the title v0.6.0 - better modules & L2 norm MPC v0.6.0 - better modules Nov 21, 2025
@andresnumer

andresnumer commented Nov 21, 2025

The previous code had a small issue: it did not make each user's weights zero-mean, which matters for the correctness of the MPC calculation. This version also centers the targets post-filtering (center earlier if that's an issue).

Implementation:

def meta_portfolio_contribution(
    predictions: pd.DataFrame,
    stakes: pd.Series,
    neutralizers: pd.DataFrame,
    sample_weights: pd.Series,
    targets: pd.Series,
) -> pd.Series:
    """Calculates the "meta portfolio" gradient w.r.t. stakes:
        - rank, normalize, and power each signal
        - convert each signal into neutralized weights
        - center weights across samples (explicit W_c = C W)
        - generate the stake-weighted portfolio
        - calculate the gradient of the portfolio w.r.t. the stakes
        - multiply by the (centered) targets

    Arguments:
        predictions: pd.DataFrame - the predictions to evaluate
        stakes: pd.Series - the stakes to use as weights
        neutralizers: pd.DataFrame - the neutralization columns
        sample_weights: pd.Series - the universe sampling weights
        targets: pd.Series - the live targets to evaluate against
    """
    # Align predictions and targets on the same index / universe
    predictions, targets = filter_sort_index(predictions, targets)

    # Center targets in sample space: t_c = C t
    targets = center(targets)

    # Normalize stakes to sum to 1
    stake_weights = weight_normalize(stakes.fillna(0))
    assert np.isclose(stake_weights.sum(), 1), "Stakes must sum to 1"

    # Generate neutralized weights W(predictions, neutralizers, sample_weights)
    weights = generate_neutralized_weights(predictions, neutralizers, sample_weights)

    # Extract aligned matrices/vectors
    w = cast(np.ndarray, weights.loc[stakes.index].values)     # W ∈ R^{N×K}
    s = cast(np.ndarray, stake_weights.values)                 # s ∈ R^K
    t = cast(np.ndarray, targets.values)                       # t_c ∈ R^N (already centered)

    # Explicit centering of weights across samples:
    # W_c = C W = W - 1 μ^T, where μ is the column-wise mean of W
    w_centered = w - w.mean(axis=0, keepdims=True)             # W_c

    # Centered prediction vector v = W_c s
    v = w_centered @ s                                         # v ∈ R^N, already mean ~ 0
    # Optionally re-center to remove numerical drift
    v = v - v.mean()

    # Its L2 norm r = ||v||
    l2_norm = np.sqrt(np.sum(v**2))

    # Residualize W_c against v:
    # residualized_w ≈ R_v W_c = (I - v v^T / ||v||^2) W_c
    residualized_w = orthogonalize(w_centered, v)

    # Gradient: ∇_s α = (1 / ||v||) (R_v W_c)^T t_c
    mpc = (residualized_w.T @ t).squeeze() / l2_norm

    return pd.Series(mpc, index=stakes.index)

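To see why the centering and residualization steps behave as intended, here is a self-contained numpy sketch. The `orthogonalize` helper below is a stand-in for the library's version (assumed to remove each column's projection onto v); the data is random and only for illustration:

```python
import numpy as np


def orthogonalize(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    # (I - v v^T / ||v||^2) W: remove each column's projection onto v
    v = v.reshape(-1, 1)
    return W - v @ (v.T @ W) / (v.T @ v)


rng = np.random.default_rng(0)
N, K = 50, 3
W = rng.normal(size=(N, K))          # per-user weights, one column per user
s = np.array([0.5, 0.3, 0.2])        # normalized stakes, sum to 1
t = rng.normal(size=N)
t -= t.mean()                        # centered targets t_c

W_c = W - W.mean(axis=0, keepdims=True)  # zero-mean each user's weights
v = W_c @ s                              # stake-weighted portfolio
l2_norm = np.sqrt(np.sum(v**2))
mpc = (orthogonalize(W_c, v).T @ t) / l2_norm  # one score per user
```

By construction the residualized columns are orthogonal to v, so each user's score measures only what their weights contribute beyond the stake-weighted portfolio itself.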
