
FeatureRequest: Assigning what is metadata versus feature #510

Open
2 tasks done
jenna-tomkinson opened this issue Feb 7, 2025 · 0 comments
Labels
enhancement New feature or request


jenna-tomkinson (Member) commented Feb 7, 2025

Feature type

  • Add new functionality

  • Change existing functionality

General description of the proposed functionality

When columns are output from CellProfiler and processed in CytoTable, some of them can be considered metadata (e.g., plate and well) and others features. This split is currently based on the column-name prefix (e.g., features are prefixed with their compartment: Nuclei, Cells, or Cytoplasm).

CellProfiler takes many measurements, and I have noticed that some of them do not reflect cell morphology but are still included as features because of the naming convention pycytominer uses to distinguish features from metadata. This is reflected in the function `infer_cp_features`, which treats a column as a feature if its name starts with a compartment name.
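
To illustrate the problem, here is a simplified, hypothetical sketch of prefix-based inference in the spirit of `infer_cp_features`. It is not pycytominer's actual code; the compartment list and the `split_metadata_and_features` name are assumptions. Any column that starts with a compartment prefix lands in the feature list, including location and parent columns:

```python
# Simplified sketch of prefix-based classification, in the spirit of
# pycytominer's infer_cp_features (NOT its actual implementation).
COMPARTMENTS = ("Cells", "Nuclei", "Cytoplasm")


def split_metadata_and_features(columns, compartments=COMPARTMENTS):
    """Split column names into (metadata, features) using the name prefix alone."""
    prefixes = tuple(f"{c}_" for c in compartments)
    metadata, features = [], []
    for col in columns:
        if col.startswith("Metadata_"):
            metadata.append(col)
        # Any compartment-prefixed column counts as a feature,
        # regardless of what it actually measures.
        elif col.startswith(prefixes):
            features.append(col)
    return metadata, features


columns = [
    "Metadata_Plate",
    "Metadata_Well",
    "Nuclei_AreaShape_Area",
    "Nuclei_Location_Center_X",
    "Cytoplasm_Parent_Cells",
]
metadata, features = split_metadata_and_features(columns)
print(features)
# ['Nuclei_AreaShape_Area', 'Nuclei_Location_Center_X', 'Cytoplasm_Parent_Cells']
```

Columns that match neither prefix are simply left out in this sketch.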

This has become a noticeable problem: I am finding center X/Y coordinates and parent/child indices included in my feature space. I have been manually adding a "Metadata_" prefix to these columns to avoid this, but I am not catching every instance of these "non-morphological features".
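
A hypothetical audit helper could help catch the remaining instances before any renaming happens. This sketch assumes CellProfiler-style names of the form `Compartment_Category_...`, where the second token is the measurement category (CellProfiler writes child relationships as `Children`). The helper name and category set are assumptions, and it only reports matches so the list can be reviewed by hand:

```python
# Hypothetical audit helper: report compartment-prefixed columns whose
# measurement category (second "_"-separated token, e.g. "Location" in
# "Nuclei_Location_Center_X") is assumed to be non-morphological.
from collections import defaultdict

NON_MORPHOLOGY_CATEGORIES = {"Location", "Parent", "Children"}
COMPARTMENTS = {"Cells", "Nuclei", "Cytoplasm"}


def find_non_morphology_columns(columns):
    """Group not-yet-prefixed non-morphological columns by category."""
    found = defaultdict(list)
    for col in columns:
        parts = col.split("_")
        # Columns already starting with "Metadata_" fail the compartment check.
        if (
            len(parts) >= 2
            and parts[0] in COMPARTMENTS
            and parts[1] in NON_MORPHOLOGY_CATEGORIES
        ):
            found[parts[1]].append(col)
    return dict(found)


columns = [
    "Metadata_Nuclei_Location_Center_X",  # already prefixed by hand
    "Nuclei_Location_Center_Y",
    "Cells_Parent_Nuclei",
    "Nuclei_Children_Cytoplasm_Count",
    "Cells_AreaShape_Area",
]
result = find_non_morphology_columns(columns)
print(result)
```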

There is a discussion to be had about whether these features should be considered metadata. Personally, I think we should add a simple `if` statement to `annotate`, controlled by a new parameter: when it is `True`, any column with `Location`, `Parent`, or `Child` in its name is reassigned as metadata (i.e., given a `Metadata_` prefix).

Feature example

  from pycytominer import annotate

  annotated_df = annotate(
      profiles=single_cell_df,
      platemap=platemap_df,
      join_on=["Metadata_well_position", "Image_Metadata_Well"],
      reassign_cp_columns=True,  # proposed parameter; not in the current annotate() API
  )

I am not a fan of the parameter name here, but it gives the general idea.

In the annotate code, there could be something like:

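  # Inside annotate(): `df` holds the profiles; `reassign_cp_columns` is the proposed flag
  if reassign_cp_columns:
      df.columns = [
          # skip columns that are already metadata to avoid "Metadata_Metadata_" names
          f"Metadata_{col}"
          if not col.startswith("Metadata_")
          and any(x in col for x in ["Location", "Parent", "Child"])
          else col
          for col in df.columns
      ]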

Alternative Solutions

This issue is complicated and may require a more in-depth, more robust solution.
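
One possibly more robust direction, sketched here purely as an illustration, is to invert the rule: instead of listing the categories to move out of the feature space, keep only categories known to describe morphology and reassign every other compartment column. The category set, the compartment set, and the `reassign_by_allowlist` name are all assumptions open for discussion:

```python
# Hypothetical allowlist approach: compartment columns whose measurement
# category is not in MORPHOLOGY_CATEGORIES get a "Metadata_" prefix.
# The category set below is an assumption and would need review.
MORPHOLOGY_CATEGORIES = {
    "AreaShape",
    "Intensity",
    "Texture",
    "Granularity",
    "RadialDistribution",
    "Correlation",
}
COMPARTMENTS = {"Cells", "Nuclei", "Cytoplasm"}


def reassign_by_allowlist(columns):
    """Return (new_columns, renamed), where renamed maps old -> new names."""
    new_columns, renamed = [], {}
    for col in columns:
        parts = col.split("_")
        is_compartment = parts[0] in COMPARTMENTS
        is_morphology = len(parts) > 1 and parts[1] in MORPHOLOGY_CATEGORIES
        if is_compartment and not is_morphology:
            new_name = f"Metadata_{col}"
            renamed[col] = new_name
            new_columns.append(new_name)
        else:
            # Existing metadata and allowlisted features are left unchanged.
            new_columns.append(col)
    return new_columns, renamed


cols = [
    "Metadata_Well",
    "Nuclei_AreaShape_Area",
    "Nuclei_Location_Center_X",
    "Cells_Parent_Nuclei",
]
new_cols, renamed = reassign_by_allowlist(cols)
print(renamed)
```

The trade-off is that an unreviewed category defaults to metadata, so new non-morphological measurements cannot leak into the feature space, but a genuine feature category left off the list (e.g., `Neighbors`, if it is considered relevant) stays hidden until it is added. With pandas, the result could be applied as `df.columns = new_cols`, and `renamed` shows exactly which columns moved.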

This issue is open for everyone to discuss, so that we can arrive at a solution that works for all.

CC: @MikeLippincott, @MattsonCam, @axiomcura, @d33bs, @gwaybio
