Home
Welcome to the CoRAL wiki!
The breakpoint graph file (*_graph.txt) output for each amplicon lists sequence, concordant, and discordant edges, path constraints, and amplified intervals.
- Sequence Edges: Segments of the reference genome.
- Concordant Edges: Edges connecting consecutive segments from the reference genome.
- Discordant Edges: Edges connecting nonconsecutive segments from the reference genome.
- Path Constraints: Paths of edges with high read support in the provided long-read data. Long reads can span multiple breakpoints, and cycle extraction takes advantage of this by prioritizing solutions that contain these paths as subpaths. In graph output files, we use the following short forms for edges:
- e: sequence
- c: concordant
- d: discordant
- Amplified Intervals: Segments of high copy number from the provided long-read sequencing data. These are refined from initial seed intervals generated using the CNVkit copy number calling pipeline.
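The short-form edge notation used in path constraints (for example `e2+:1,c2-:1,e3+:1`) can be decoded with a few lines of Python. This is a minimal sketch, not part of CoRAL: `PathStep` and `parse_path` are hypothetical names, and the integer after the colon is kept as an opaque count because its exact meaning is not stated here (it is presumably a multiplicity).

```python
import re
from typing import List, NamedTuple

# Short-form prefixes as listed above.
EDGE_TYPES = {"e": "sequence", "c": "concordant", "d": "discordant"}


class PathStep(NamedTuple):
    edge_type: str    # "sequence", "concordant" or "discordant"
    index: int        # edge number within its edge type
    orientation: str  # "+" or "-"
    count: int        # integer after the colon (assumed to be a multiplicity)


_TOKEN = re.compile(r"^([ecd])(\d+)([+-]):(\d+)$")


def parse_path(path: str) -> List[PathStep]:
    """Split a comma-separated path constraint into typed steps."""
    steps = []
    for token in path.split(","):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized path token: {token!r}")
        kind, index, sign, count = match.groups()
        steps.append(PathStep(EDGE_TYPES[kind], int(index), sign, int(count)))
    return steps
```

Calling `parse_path("e2+:1,c2-:1,e3+:1")` yields three steps that alternate between sequence and concordant edges, which makes it easy to check that a constraint is a well-formed walk.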
GBM39 example
SequenceEdge: StartPosition, EndPosition, PredictedCN, AverageCoverage, Size, NumberOfLongReads
sequence chr7:54659673- chr7:54763281+ 4.150534 45.907363 103609 576
sequence chr7:54763282- chr7:55127266+ 89.340352 1052.714362 363985 40637
sequence chr7:55127267- chr7:55155020+ 2.843655 32.729552 27754 172
sequence chr7:55155021- chr7:55609190+ 89.340352 1013.182857 454170 49675
sequence chr7:55609191- chr7:55610094+ 2.868261 31.027655 904 915
sequence chr7:55610095- chr7:56049369+ 89.340352 1023.280633 439275 49106
sequence chr7:56049370- chr7:56149664+ 4.150534 49.623899 100295 562
BreakpointEdge: StartPosition->EndPosition, PredictedCN, NumberOfLongReads
concordant chr7:54763281+->chr7:54763282- 4.150534 26
concordant chr7:55127266+->chr7:55127267- 2.843655 36
concordant chr7:55155020+->chr7:55155021- 2.843655 32
concordant chr7:55609190+->chr7:55609191- 2.697741 38
concordant chr7:55610094+->chr7:55610095- 2.697741 41
concordant chr7:56049369+->chr7:56049370- 4.150534 45
discordant chr7:55610095-->chr7:55609190+ 86.642611 869
discordant chr7:56049369+->chr7:54763282- 85.189818 981
discordant chr7:55155021-->chr7:55127266+ 86.496697 978
...
PathConstraint: Path, Support
path_constraint e2+:1,c2-:1,e3+:1,c3-:1,e4+:1 6
path_constraint e4+:1,c4-:1,e5+:1,c5-:1,e6+:1 34
AmpliconIntervals: chr, start, end
interval chr7 54659673 56149664
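The edge records in the example above can be read into simple Python objects. The following is a sketch under stated assumptions rather than a reference parser: the class and function names are hypothetical, fields are assumed to be whitespace-separated in the column order given by the header lines, and `path_constraint` and `interval` records are skipped for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int, str]  # (chromosome, position, orientation)


@dataclass
class SequenceEdge:
    chrom: str
    start: int
    end: int
    predicted_cn: float
    average_coverage: float
    size: int
    num_long_reads: int


@dataclass
class BreakpointEdge:
    kind: str  # "concordant" or "discordant"
    start: Endpoint
    end: Endpoint
    predicted_cn: float
    num_long_reads: int


def parse_endpoint(text: str) -> Endpoint:
    """Turn 'chr7:54659673-' into ('chr7', 54659673, '-')."""
    chrom, rest = text.rsplit(":", 1)
    return chrom, int(rest[:-1]), rest[-1]


def parse_graph_lines(lines: Iterable[str]):
    """Collect sequence and breakpoint edges; other lines are ignored."""
    sequence_edges: List[SequenceEdge] = []
    breakpoint_edges: List[BreakpointEdge] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "sequence":
            chrom, start, _ = parse_endpoint(fields[1])
            _, end, _ = parse_endpoint(fields[2])
            sequence_edges.append(SequenceEdge(
                chrom, start, end, float(fields[3]), float(fields[4]),
                int(fields[5]), int(fields[6])))
        elif fields[0] in ("concordant", "discordant"):
            # The first "->" separates the two endpoints, even in "-->".
            left, right = fields[1].split("->", 1)
            breakpoint_edges.append(BreakpointEdge(
                fields[0], parse_endpoint(left), parse_endpoint(right),
                float(fields[2]), int(fields[3])))
    return sequence_edges, breakpoint_edges
```

In the GBM39 example the Size column equals EndPosition - StartPosition + 1 (for the first sequence edge, 54763281 - 54659673 + 1 = 103609), which gives a quick sanity check on parsed records.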
The *_cycles.txt file lists the "heaviest" (highest copy number) graph walks that the cycle extraction process was able to find. When no cycle can be found, the heaviest path is output instead. These walks are listed in the Extracted cycles section, and each entry begins with Cycle or Path depending on what the extraction process generated.
CHP212 example
Interval 1 chr2 11450087 12860095
Interval 2 chr2 14821054 16175121
List of cycle segments
Segment 1 chr2 11450087 11551950
Segment 2 chr2 11551951 11569465
Segment 3 chr2 11569466 11785247
Segment 4 chr2 11785248 11786083
Segment 5 chr2 11786084 11789939
Segment 6 chr2 11789940 11999240
Segment 7 chr2 11999241 12298448
Segment 8 chr2 12298449 12298516
Segment 9 chr2 12298517 12459661
Segment 10 chr2 12459662 12462513
Segment 11 chr2 12462514 12622497
Segment 12 chr2 12622498 12622527
Segment 13 chr2 12622528 12622734
Segment 14 chr2 12622735 12757957
Segment 15 chr2 12757958 12757960
Segment 16 chr2 12757961 12860095
Segment 17 chr2 14821054 14921053
Segment 18 chr2 14921054 14921210
Segment 19 chr2 14921211 15582774
Segment 20 chr2 15582775 15855187
Segment 21 chr2 15855188 15855221
Segment 22 chr2 15855222 16072476
Segment 23 chr2 16072477 16175121
List of longest subpath constraints
Path constraint 1 e14+:1,d1+:1,e18-:1,d4-:1,e20+:1 Support=181 Satisfied
Path constraint 2 e14+:1,c14-:1,e15+:1,d3-:1,e20+:1 Support=11 Satisfied
Path constraint 3 e6-:1,c5+:1,e5-:1,d5+:1,e12-:1,c11+:1,e11-:1 Support=123 Satisfied
Path constraint 5 e3-:1,c2+:1,e2-:1,d6+:1,e9-:1 Support=49 Satisfied
Path constraint 6 e6+:1,d11+:1,e2-:1,d6+:1,e9-:1 Support=1 Satisfied
Path constraint 7 e22-:2,c20+:1,e21-:1,d9-:1,e22+:2 Support=1 Satisfied
Path constraint 8 e7+:2,c7-:1,e8+:1,d10+:1,e7-:2 Support=3 Satisfied
Path constraint 9 e3+:1,c3-:1,e4+:1,c4-:1,e5+:1,c5-:1,e6+:1 Support=1 Satisfied
Path constraint 10 e7+:1,c7-:1,e8+:1,c8-:1,e9+:1 Support=160 Satisfied
Path constraint 11 e9+:1,c9-:1,e10+:1,c10-:1,e11+:1 Support=5 Satisfied
Path constraint 12 e11+:1,c11-:1,e12+:1,c12-:1,e13+:1,c13-:1,e14+:1 Support=2 Satisfied
Path constraint 13 e14+:1,c14-:1,e15+:1,c15-:1,e16+:1 Support=4 Satisfied
Path constraint 14 e17+:1,c16-:1,e18+:1,c17-:1,e19+:1 Support=5 Satisfied
Path constraint 15 e20+:1,c19-:1,e21+:1,c20-:1,e22+:1 Support=157 Satisfied
Extracted cycles
Cycle=1;Copy_count=101.72274978772717;Segments=2+,3+,14+,18-,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+;Path_constraints_satisfied=1,3,4,9,14
Cycle=2;Copy_count=14.198554835085135;Segments=2+,3+,14+,15+,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+;Path_constraints_satisfied=2,3,4,9,14
Path=3;Copy_count=0.4146276131624981;Segments=0+,1+,2+,3+,4+,5+,6+,7+,8+,7-,6-,5-,12-,11-,22-,21-,20-,15-,14-,13-,12-,11-,10-,9-,8-,7-,6-,5-,4-,3-,2-,1-,0-;Path_constraints_satisfied=2,3,7,8,10,11,14
Path=4;Copy_count=0.012213478032914176;Segments=0+,17+,18+,19+,20+,21+,22+,11+,6+,7+,8+,9+,10+,11+,12+,5+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,20+,21+,22+,23+,0-;Path_constraints_satisfied=2,3,9,10,11,13,14
Path=5;Copy_count=0.00022551776473648353;Segments=0+,1+,2+,6-,5-,12-,11-,10-,9-,8-,7-,6-,5-,4-,3-,2-,9-,8-,7-,6-,5-,4-,3-,2-,1-,0-;Path_constraints_satisfied=4,5,8,9
Path=6;Copy_count=0.00014201800616832843;Segments=0+,16-,15-,14-,13-,12-,11-,10-,9-,8-,7-,6-,5-,12-,11-,22-,21+,22+,23+,0-;Path_constraints_satisfied=3,6,9,11,12
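Each entry in the Extracted cycles section is a semicolon-separated list of `key=value` fields, so it can be split into a dictionary in a few lines. The sketch below is illustrative only: `parse_walk` is a hypothetical helper, and it assumes every entry carries `Copy_count` and `Segments` fields as in the CHP212 example.

```python
from typing import Dict, List, Tuple


def parse_walk(line: str) -> Dict[str, object]:
    """Parse one 'Cycle=...' or 'Path=...' entry from a *_cycles.txt file."""
    record = dict(field.split("=", 1) for field in line.strip().split(";"))
    kind = "Cycle" if "Cycle" in record else "Path"
    # Each segment token is a segment number followed by its orientation.
    segments: List[Tuple[int, str]] = [
        (int(token[:-1]), token[-1]) for token in record["Segments"].split(",")
    ]
    # The constraint list may be absent (e.g. in the summary file).
    satisfied = [
        int(value)
        for value in record.get("Path_constraints_satisfied", "").split(",")
        if value
    ]
    return {
        "kind": kind,
        "id": int(record[kind]),
        "copy_count": float(record["Copy_count"]),
        "segments": segments,
        "path_constraints_satisfied": satisfied,
    }
```

Segment numbers refer to the List of cycle segments. Segment 0, which appears at the ends of the Path entries, is not in that list, so the sketch leaves it uninterpreted.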
For each CoRAL run, we generate an amplicon_summary.txt file indicating how many amplicons we were able to solve (that is, complete cycle extraction for) within the provided time limits. For each amplicon, we report the genomic intervals involved, edge counts for its breakpoint graph, and the heaviest cycle or path we were able to find (if applicable).
Example summary
1/1 amplicons solved.
------------------------------------------------------------
AmpliconID = 1
#Intervals = 2
AmpliconIntervals:
Amplicon-1>chr2:11,450,087-12,860,095
Amplicon-1>chr2:14,821,054-16,175,121
Total Amplicon Size: 2764077
# Chromosomes: 1
# Sequence Edges: 23
# Concordant Edges: 21
# Discordant Edges: 11
# Non-Source Edges: 55
# Source Edges: 0
Heaviest graph walk solved was a cycle.
Cycle=1;Copy_count=101.72274978772717;Segments=2+,3+,14+,18-,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+
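In this example, Total Amplicon Size is the sum of the interval lengths with both endpoints counted: (12,860,095 - 11,450,087 + 1) + (16,175,121 - 14,821,054 + 1) = 1,410,009 + 1,354,068 = 2,764,077. A small sketch of that check follows. The helper names are hypothetical, and the line pattern is inferred from the example summary rather than from a format specification.

```python
import re
from typing import Iterable, Tuple

# Matches lines such as "Amplicon-1>chr2:11,450,087-12,860,095".
_INTERVAL = re.compile(r"^Amplicon-(\d+)>([^:]+):([\d,]+)-([\d,]+)$")


def parse_summary_interval(line: str) -> Tuple[int, str, int, int]:
    """Return (amplicon_id, chromosome, start, end) for one interval line."""
    match = _INTERVAL.match(line.strip())
    if match is None:
        raise ValueError(f"not an amplicon interval line: {line!r}")
    amplicon_id, chrom, start, end = match.groups()
    # Coordinates use thousands separators in the summary file.
    return (int(amplicon_id), chrom,
            int(start.replace(",", "")), int(end.replace(",", "")))


def total_amplicon_size(lines: Iterable[str]) -> int:
    """Sum interval lengths, counting both endpoints of each interval."""
    total = 0
    for line in lines:
        _, _, start, end = parse_summary_interval(line)
        total += end - start + 1
    return total
```

Applied to the two interval lines above, `total_amplicon_size` returns 2764077, matching the reported Total Amplicon Size.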