Outline of work needed for sweeps analysis #104

Open
nspope opened this issue May 17, 2023 · 0 comments

nspope commented May 17, 2023

Met with @mufernando and @andrewkern to outline what needs doing for the sweeps analysis.

What we want to produce:

  • Multi-panel plot showing FPR and TPR as a function of coordinate, with rugs for exon density and recombination rate (like the ProbGen poster), split by population (e.g. small vs large) and sweep calling method (sweepfinder, diploshic, D)
  • One null model (BGS) in main figure; in supp figure could show comparison between BGS and completely neutral null model
  • Scatterplots showing FPR/TPR as a function of recombination rate/exon density, with boxplot/quantile lines to show how the distribution of the test statistic changes with recombination rate
  • Scatterplots showing joint distribution of FPR/TPR (e.g. pair plots) for supplement
  • Use global critical value for simplicity of explanation
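
A rough sketch of how FPR/TPR per coordinate could be computed with a single global critical value. The array layout (replicates by coordinate), the function names, and the `alpha` level are all assumptions for illustration, not something settled in the meeting.

```python
import numpy as np

def global_critical_value(null_stats, alpha=0.05):
    """One cutoff for all coordinates: the (1 - alpha) quantile of the
    test statistic pooled over every null (e.g. BGS) replicate and coordinate.
    alpha=0.05 is an assumed default."""
    pooled = np.asarray(null_stats, dtype=float).ravel()
    return float(np.quantile(pooled, 1.0 - alpha))

def rejection_rate(stats, crit):
    """Fraction of replicates exceeding the cutoff at each coordinate.
    stats has shape (n_replicates, n_coordinates) (assumed layout).
    Applied to null simulations this gives FPR, to sweep simulations TPR."""
    stats = np.asarray(stats, dtype=float)
    return (stats > crit).mean(axis=0)

# Toy numbers only, to show the shapes involved
null_stats = np.array([[0.0, 1.0], [2.0, 3.0]])
sweep_stats = np.array([[2.0, 1.0], [4.0, 0.0]])
crit = global_critical_value(null_stats, alpha=0.5)
fpr = rejection_rate(null_stats, crit)
tpr = rejection_rate(sweep_stats, crit)
```

With a global cutoff the FPR is free to vary along the genome, which is the point of plotting it against exon density and recombination rate.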

What has to be done:

  1. diploshic training PR is reviewed, needs some minor cleanup to be merged [Andy] -- done
  2. diploshic prediction workflow needs to be put together
    a. dump VCF per simulated window (the 5 Mb focal region, without simulated buffer) [Murillo] -- done
    b. apply diploshic, sliding across focal regions -- this'll output a score per window for soft-linked/hard-linked/neutral/soft/hard classification [Andy]
    c. pool soft+hard scores to get a binary "sweep vs not" score [Murillo/Nate]
    d. take max score across entire focal window to get test statistic for the window [Murillo/Nate]
    e. get critical value by calculating score for neutral/BGS simulations (as for CLR) [Murillo/Nate]
    f. keep training and prediction in separate workflows (e.g. the prediction step should go in the same workflow where CLR is calculated) [Murillo/Nate]
  3. write rule to generate figures based off Murillo's probgen draft [Murillo]
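
Steps 2c to 2e could look roughly like the sketch below. The column order of the class scores, the function names, and `alpha` are assumptions; the actual diploshic output format would need to be checked before relying on any of this.

```python
import numpy as np

# Assumed column order for the per-subwindow class scores; hypothetical,
# not taken from diploshic's output format.
CLASSES = ("linkedSoft", "linkedHard", "neutral", "soft", "hard")

def sweep_statistic(class_scores):
    """class_scores: (n_subwindows, 5) scores from sliding across one focal region.
    Pools soft + hard into a binary "sweep vs not" score (2c), then takes the
    max over the focal region as the test statistic (2d)."""
    scores = np.asarray(class_scores, dtype=float)
    pooled = scores[:, CLASSES.index("soft")] + scores[:, CLASSES.index("hard")]
    return float(pooled.max())

def critical_value(null_statistics, alpha=0.05):
    """Cutoff from neutral/BGS simulations (2e): the (1 - alpha) quantile of
    the statistic over null replicates. alpha=0.05 is an assumed level."""
    return float(np.quantile(np.asarray(null_statistics, dtype=float), 1.0 - alpha))

# Toy example: two subwindows in one focal region
example = [[0.1, 0.1, 0.7, 0.05, 0.05],
           [0.1, 0.1, 0.1, 0.30, 0.40]]
stat = sweep_statistic(example)
crit = critical_value(np.arange(101) / 100.0, alpha=0.05)
```

A focal region would be called as a sweep when `stat > crit`, mirroring how the CLR cutoff is applied.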