Simplified, more flexible annotation #166
base: feat-2-0
latest changes feat-2-0
…est transcript files.
…ript selection criteria.
…esolve (like intron,CDS)
RESOLVE_UNANNOTATED process: added validation and logging
One small note: the only place apart from |
…fined; remove unused profiles
added collect to deal with queue ref channels
This fix was implemented in goodwrigth/clipseq 9dfeafe to tweak the channel creation for the MERGE_SUMMARY process. In its previous iteration, the premapped cDNA crosslink files could be randomly paired with other iCount summary files, producing erroneous "premapadjusted" iCount summary files.
fix merging of iCount summaries with premapped crosslinks
Simplified, more flexible annotation:
What this PR means for usage:
- … (`--skip_filter_gtf false`) so they can prioritise a single representative transcript per gene when regions are assigned across the genome
- … (`--representative_transcript`) to be prioritised for regions assignment and for transcriptome analyses
- … `00_genome` folder (only the used GTFs and segmentation files are saved)

replaced `FIND_LONGEST_TRANSCRIPT` and `CLIPSEQ_FILTER_GTF` modules with → `FILTER_GTF_BY_TRANSCRIPT` module.

`FILTER_GTF_BY_TRANSCRIPT` does the following:
- … `--gtf` to only keep features of representative transcripts.
- … `--representative_transcript` (which now replaces `--longest_transcript`), and if not provided auto-selects longest transcript per gene (instead of the former `basic` tag and TSL-based filtering)
- … `--representative_transcript`: all genes in the GTF must have exactly one representative transcript
- … (`--skip_transcriptome false`)
- … `--skip_filter_gtf` is `false`
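
To make the auto-selection behaviour concrete, here is a minimal Python sketch of picking the longest transcript per gene and filtering a GTF down to those transcripts. It is not the module's actual code: the function names are hypothetical, "longest" is assumed to mean the largest summed exon length, and ties are broken by transcript ID purely so the result is deterministic.

```python
import re
from collections import defaultdict


def parse_attr(attributes, key):
    """Return the value of one key from a GTF attribute column, or None."""
    match = re.search(rf'\b{key} "([^"]+)"', attributes)
    return match.group(1) if match else None


def longest_transcript_per_gene(gtf_lines):
    """Map gene_id -> transcript_id with the largest summed exon length.

    Assumption: 'longest' means total exon length; the real module may
    use a different definition.
    """
    exon_len = defaultdict(int)  # transcript_id -> summed exon length
    gene_of = {}                 # transcript_id -> gene_id
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tx = parse_attr(fields[8], "transcript_id")
        gene = parse_attr(fields[8], "gene_id")
        if tx is None or gene is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exon_len[tx] += int(fields[4]) - int(fields[3]) + 1
        gene_of[tx] = gene

    best = {}
    for tx, length in exon_len.items():
        gene = gene_of[tx]
        # Ties are broken by transcript ID so the choice is deterministic.
        if gene not in best or (length, tx) > (exon_len[best[gene]], best[gene]):
            best[gene] = tx
    return best


def filter_gtf(gtf_lines, representative):
    """Keep comments, gene-level records, and features of the chosen transcripts."""
    keep = set(representative.values())
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        tx = parse_attr(fields[8], "transcript_id")
        # Records without a transcript_id (gene rows) are kept as-is.
        if tx is None or tx in keep:
            kept.append(line)
    return kept
```

A user-supplied `--representative_transcript` list would replace the output of `longest_transcript_per_gene` in this sketch, after checking that every gene has exactly one entry.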
- updated `CLIPSEQ_RESOLVE_UNANNOTATED` template script with important checks, incl. error messages if: unannotated regions remain even with full GTF; fai, regions, or annotation are mismatched; or regions don’t span the genome properly etc.
- both `FILTER_GTF_BY_TRANSCRIPT` and `CLIPSEQ_RESOLVE_UNANNOTATED` are now emitting detailed logs
- curated saved outputs: in `modules.config` added conditional `publishDir` directives to only save the GTF files used downstream in the pipeline outside of `PREPARE_GENOME`. Only final GTF and regions files (those actually used) are published in `00_genome` results.

`PREPARE_GENOME` subworkflow changes:
- … `CLIPSEQ_RESOLVE_UNANNOTATED` on transcript segmentation, it only needs to be done on regions file
- … `--skip_filter_gtf` and `--skip_transcriptome` so `FILTER_GTF_BY_TRANSCRIPT` is only executed depending on these parameters
- … `FIND_LONGEST_TRANSCRIPT` execution.
- `--skip_filter_gtf` (`false` by default)
- `--representative_transcript` → replaces `--longest_transcript`
- `--representative_transcript_fai` → replaces `--longest_transcript_fai`
- `--representative_transcript_gtf` → replaces `--longest_transcript_gtf`
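
The `CLIPSEQ_RESOLVE_UNANNOTATED` checks mentioned above (mismatched fai and regions, regions that don't span the genome) could look roughly like the following Python sketch. It is an illustration only, not the template script from this PR: the function names are hypothetical, and regions are assumed to be `(chrom, start, end)` tuples with 1-based inclusive coordinates on a single strand.

```python
def read_fai(fai_lines):
    """Map chromosome name -> length from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def check_regions_span_genome(fai_lines, regions):
    """Return a list of problems; an empty list means every chromosome is tiled.

    Assumption: `regions` holds (chrom, start, end) tuples, 1-based inclusive,
    for one strand. The real regions file format may differ.
    """
    lengths = read_fai(fai_lines)
    problems = []
    by_chrom = {}
    for chrom, start, end in regions:
        if chrom not in lengths:
            problems.append(f"{chrom}: present in regions but not in fai")
            continue
        by_chrom.setdefault(chrom, []).append((start, end))

    for chrom, length in lengths.items():
        intervals = sorted(by_chrom.get(chrom, []))
        if not intervals:
            problems.append(f"{chrom}: no regions")
            continue
        expected = 1  # next position that still needs to be covered
        for start, end in intervals:
            if start > expected:
                problems.append(f"{chrom}: gap {expected}-{start - 1}")
            expected = max(expected, end + 1)
        if expected <= length:
            problems.append(f"{chrom}: gap {expected}-{length}")
        elif expected > length + 1:
            problems.append(f"{chrom}: regions extend past chromosome end {length}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to write all findings to a log before failing the process.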
`CLIPSEQ` workflow changes:
- … (`ch_regions_used` and `ch_gtf_used`)
- `test_full.config`:
  - … `nextflow.config` and enabled UMI extraction (so now UMI collapse doesn't fail)
  - `test_full` runs further, but still fails during consensus peak analysis steps.
- `test.config`: …

PR checklist
- … `scrape_software_versions.py`
- … (`nf-core lint .`).
- … (`nextflow run . -profile test,docker`).
- … `docs/usage.md` is updated.
- … `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).