Suhas Rao edited this page Feb 6, 2025 · 4 revisions

Welcome to the CoRAL wiki!

Interpreting Outputs

Breakpoint Graphs

The breakpoint graph file (*_graph.txt) output for each amplicon lists its sequence, concordant, and discordant edges, followed by path constraints and amplified intervals.

  • Sequence Edges: Segments of the reference genome.
  • Concordant Edges: Edges connecting consecutive segments from the reference genome.
  • Discordant Edges: Edges connecting nonconsecutive segments from the reference genome.
  • Path Constraints: Paths of edges with high read support from the provided long-read data. Long reads can potentially span multiple breakpoints, and we use this to our advantage during cycle extraction by prioritizing solutions that contain these sequences as subpaths. In graph output files, we use the following short-form for edges:
    • e: sequence
    • c: concordant
    • d: discordant
  • Amplified Intervals: Segments of high copy number from the provided long-read sequencing data. These are refined from initial seed intervals generated using the CNVkit copy number calling pipeline.
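
The short-form edge notation above can be decoded with a few lines of Python. This is a minimal sketch, not part of CoRAL itself: the names `PathStep` and `parse_path` are hypothetical, and the reading of the trailing `+`/`-` as a sign and of the number after the colon as a count is an assumption based on the example files on this page.

```python
import re
from typing import List, NamedTuple

# Short-form prefixes documented above.
EDGE_KINDS = {"e": "sequence", "c": "concordant", "d": "discordant"}

# One token looks like "e2+:1": prefix, edge index, sign, then an integer.
TOKEN_RE = re.compile(r"^([ecd])(\d+)([+-]):(\d+)$")


class PathStep(NamedTuple):
    kind: str   # "sequence", "concordant", or "discordant"
    index: int  # edge index within its edge type
    sign: str   # "+" or "-" (assumed to be an orientation)
    count: int  # integer after the colon (assumed to be a count)


def parse_path(path: str) -> List[PathStep]:
    """Split a comma-separated path string into typed steps."""
    steps = []
    for token in path.split(","):
        match = TOKEN_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized path token: {token!r}")
        prefix, index, sign, count = match.groups()
        steps.append(PathStep(EDGE_KINDS[prefix], int(index), sign, int(count)))
    return steps


steps = parse_path("e2+:1,c2-:1,e3+:1,c3-:1,e4+:1")
for step in steps:
    print(step.kind, step.index, step.sign, step.count)
```

Raising on an unrecognized token, rather than skipping it, keeps a malformed or differently formatted file from being silently misread.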
GBM39 example
SequenceEdge: StartPosition, EndPosition, PredictedCN, AverageCoverage, Size, NumberOfLongReads
sequence	chr7:54659673-	chr7:54763281+	4.150534	45.907363	103609	576
sequence	chr7:54763282-	chr7:55127266+	89.340352	1052.714362	363985	40637
sequence	chr7:55127267-	chr7:55155020+	2.843655	32.729552	27754	172
sequence	chr7:55155021-	chr7:55609190+	89.340352	1013.182857	454170	49675
sequence	chr7:55609191-	chr7:55610094+	2.868261	31.027655	904	915
sequence	chr7:55610095-	chr7:56049369+	89.340352	1023.280633	439275	49106
sequence	chr7:56049370-	chr7:56149664+	4.150534	49.623899	100295	562
BreakpointEdge: StartPosition->EndPosition, PredictedCN, NumberOfLongReads
concordant	chr7:54763281+->chr7:54763282-	4.150534	26
concordant	chr7:55127266+->chr7:55127267-	2.843655	36
concordant	chr7:55155020+->chr7:55155021-	2.843655	32
concordant	chr7:55609190+->chr7:55609191-	2.697741	38
concordant	chr7:55610094+->chr7:55610095-	2.697741	41
concordant	chr7:56049369+->chr7:56049370-	4.150534	45
discordant	chr7:55610095-->chr7:55609190+	86.642611	869
discordant	chr7:56049369+->chr7:54763282-	85.189818	981
discordant	chr7:55155021-->chr7:55127266+	86.496697	978
...
PathConstraint: Path, Support
path_constraint e2+:1,c2-:1,e3+:1,c3-:1,e4+:1   6
path_constraint e4+:1,c4-:1,e5+:1,c5-:1,e6+:1   34
AmpliconIntervals: chr, start, end
interval        chr7    54659673        56149664
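
To work with these records programmatically, a sequence edge line can be split into typed fields. The sketch below assumes whitespace-separated columns in the order given by the `SequenceEdge:` header line; `SequenceEdge` and `parse_sequence_edge` are illustrative names, not CoRAL functions. It also checks that the reported size equals `end - start + 1`, which holds for every sequence edge in the GBM39 example.

```python
import re
from typing import NamedTuple


class SequenceEdge(NamedTuple):
    chrom: str
    start: int
    end: int
    predicted_cn: float
    average_coverage: float
    size: int
    num_long_reads: int


# An endpoint looks like "chr7:54763282-": chromosome, position, sign.
NODE_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<pos>\d+)[+-]$")


def parse_sequence_edge(line: str) -> SequenceEdge:
    """Parse one 'sequence' line from a *_graph.txt file."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "sequence":
        raise ValueError(f"not a sequence edge line: {line!r}")
    start = NODE_RE.match(fields[1])
    end = NODE_RE.match(fields[2])
    if start is None or end is None:
        raise ValueError(f"malformed endpoint in: {line!r}")
    return SequenceEdge(
        chrom=start["chrom"],
        start=int(start["pos"]),
        end=int(end["pos"]),
        predicted_cn=float(fields[3]),
        average_coverage=float(fields[4]),
        size=int(fields[5]),
        num_long_reads=int(fields[6]),
    )


edge = parse_sequence_edge(
    "sequence\tchr7:54763282-\tchr7:55127266+\t89.340352\t1052.714362\t363985\t40637"
)
# Coordinates are inclusive in the example, so size == end - start + 1.
assert edge.size == edge.end - edge.start + 1
print(edge.chrom, edge.start, edge.end, edge.predicted_cn)
```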

Cycles

The *_cycles.txt file contains a list of the "heaviest" (highest copy number) graph walks our cycle extraction process is able to find. In cases where we are unable to find a cycle, we output the heaviest path instead. These walks are listed in the Extracted cycles section, each prefixed with Cycle= or Path= depending on what the extraction process generates.

CHP212 example
Interval        1       chr2    11450087        12860095
Interval        2       chr2    14821054        16175121
List of cycle segments
Segment 1       chr2    11450087        11551950
Segment 2       chr2    11551951        11569465
Segment 3       chr2    11569466        11785247
Segment 4       chr2    11785248        11786083
Segment 5       chr2    11786084        11789939
Segment 6       chr2    11789940        11999240
Segment 7       chr2    11999241        12298448
Segment 8       chr2    12298449        12298516
Segment 9       chr2    12298517        12459661
Segment 10      chr2    12459662        12462513
Segment 11      chr2    12462514        12622497
Segment 12      chr2    12622498        12622527
Segment 13      chr2    12622528        12622734
Segment 14      chr2    12622735        12757957
Segment 15      chr2    12757958        12757960
Segment 16      chr2    12757961        12860095
Segment 17      chr2    14821054        14921053
Segment 18      chr2    14921054        14921210
Segment 19      chr2    14921211        15582774
Segment 20      chr2    15582775        15855187
Segment 21      chr2    15855188        15855221
Segment 22      chr2    15855222        16072476
Segment 23      chr2    16072477        16175121
List of longest subpath constraints
Path constraint 1       e14+:1,d1+:1,e18-:1,d4-:1,e20+:1        Support=181     Satisfied
Path constraint 2       e14+:1,c14-:1,e15+:1,d3-:1,e20+:1       Support=11      Satisfied
Path constraint 3       e6-:1,c5+:1,e5-:1,d5+:1,e12-:1,c11+:1,e11-:1    Support=123     Satisfied
Path constraint 5       e3-:1,c2+:1,e2-:1,d6+:1,e9-:1   Support=49      Satisfied
Path constraint 6       e6+:1,d11+:1,e2-:1,d6+:1,e9-:1  Support=1       Satisfied
Path constraint 7       e22-:2,c20+:1,e21-:1,d9-:1,e22+:2       Support=1       Satisfied
Path constraint 8       e7+:2,c7-:1,e8+:1,d10+:1,e7-:2  Support=3       Satisfied
Path constraint 9       e3+:1,c3-:1,e4+:1,c4-:1,e5+:1,c5-:1,e6+:1       Support=1       Satisfied
Path constraint 10      e7+:1,c7-:1,e8+:1,c8-:1,e9+:1   Support=160     Satisfied
Path constraint 11      e9+:1,c9-:1,e10+:1,c10-:1,e11+:1        Support=5       Satisfied
Path constraint 12      e11+:1,c11-:1,e12+:1,c12-:1,e13+:1,c13-:1,e14+:1        Support=2       Satisfied
Path constraint 13      e14+:1,c14-:1,e15+:1,c15-:1,e16+:1      Support=4       Satisfied
Path constraint 14      e17+:1,c16-:1,e18+:1,c17-:1,e19+:1      Support=5       Satisfied
Path constraint 15      e20+:1,c19-:1,e21+:1,c20-:1,e22+:1      Support=157     Satisfied
Extracted cycles
Cycle=1;Copy_count=101.72274978772717;Segments=2+,3+,14+,18-,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+;Path_constraints_satisfied=1,3,4,9,14
Cycle=2;Copy_count=14.198554835085135;Segments=2+,3+,14+,15+,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+;Path_constraints_satisfied=2,3,4,9,14
Path=3;Copy_count=0.4146276131624981;Segments=0+,1+,2+,3+,4+,5+,6+,7+,8+,7-,6-,5-,12-,11-,22-,21-,20-,15-,14-,13-,12-,11-,10-,9-,8-,7-,6-,5-,4-,3-,2-,1-,0-;Path_constraints_satisfied=2,3,7,8,10,11,14
Path=4;Copy_count=0.012213478032914176;Segments=0+,17+,18+,19+,20+,21+,22+,11+,6+,7+,8+,9+,10+,11+,12+,5+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,20+,21+,22+,23+,0-;Path_constraints_satisfied=2,3,9,10,11,13,14
Path=5;Copy_count=0.00022551776473648353;Segments=0+,1+,2+,6-,5-,12-,11-,10-,9-,8-,7-,6-,5-,4-,3-,2-,9-,8-,7-,6-,5-,4-,3-,2-,1-,0-;Path_constraints_satisfied=4,5,8,9
Path=6;Copy_count=0.00014201800616832843;Segments=0+,16-,15-,14-,13-,12-,11-,10-,9-,8-,7-,6-,5-,12-,11-,22-,21+,22+,23+,0-;Path_constraints_satisfied=3,6,9,11,12
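
Each line in the Extracted cycles section is a semicolon-separated list of `key=value` pairs, so it can be read into a small record. This is a sketch under that assumption, inferred from the CHP212 example above; `Walk` and `parse_walk` are hypothetical names, not CoRAL functions.

```python
from typing import List, NamedTuple, Tuple


class Walk(NamedTuple):
    kind: str                        # "Cycle" or "Path"
    walk_id: int
    copy_count: float
    segments: List[Tuple[int, str]]  # (segment number, "+" or "-")
    constraints: List[int]           # satisfied path constraint numbers


def parse_walk(line: str) -> Walk:
    """Parse one 'Cycle=...' or 'Path=...' line from a *_cycles.txt file."""
    fields = dict(item.split("=", 1) for item in line.strip().split(";"))
    if "Cycle" in fields:
        kind = "Cycle"
    elif "Path" in fields:
        kind = "Path"
    else:
        raise ValueError(f"not a cycle or path line: {line!r}")
    # Each segment token is a number followed by a sign, e.g. "18-".
    segments = [(int(tok[:-1]), tok[-1]) for tok in fields["Segments"].split(",")]
    # This field may be missing (as in the summary file) or empty.
    raw = fields.get("Path_constraints_satisfied", "")
    constraints = [int(tok) for tok in raw.split(",") if tok]
    return Walk(kind, int(fields[kind]), float(fields["Copy_count"]), segments, constraints)


walk = parse_walk(
    "Cycle=2;Copy_count=14.198554835085135;"
    "Segments=2+,3+,14+,15+,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+;"
    "Path_constraints_satisfied=2,3,4,9,14"
)
print(walk.kind, walk.walk_id, walk.copy_count, len(walk.segments))
```

The segment numbers refer to the entries under List of cycle segments, so a parsed walk can be mapped back to genomic coordinates with a lookup on that list.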

Summary

For each CoRAL run, we generate an amplicon_summary.txt file indicating how many amplicons we were able to solve (that is, complete cycle extraction for) within the provided time limits. For each amplicon, we list the genomic intervals involved, edge counts for its breakpoint graph, and the heaviest cycle/path we were able to find (if applicable).

Example summary
1/1 amplicons solved.
------------------------------------------------------------
AmpliconID = 1
#Intervals = 2
AmpliconIntervals:
        Amplicon-1>chr2:11,450,087-12,860,095
        Amplicon-1>chr2:14,821,054-16,175,121
Total Amplicon Size: 2764077
# Chromosomes: 1
# Sequence Edges: 23
# Concordant Edges: 21
# Discordant Edges: 11
# Non-Source Edges: 55
# Source Edges: 0
Heaviest graph walk solved was a cycle.
Cycle=1;Copy_count=101.72274978772717;Segments=2+,3+,14+,18-,20+,21+,22+,11+,12+,5+,6+,7+,8+,9+
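
As a quick consistency check on a summary file, the reported Total Amplicon Size can be recomputed from the listed intervals. The sketch below assumes inclusive coordinates and the `Amplicon-N>chr:start-end` line format shown in the example; `parse_interval` is an illustrative helper, not a CoRAL function.

```python
import re

# Matches e.g. "Amplicon-1>chr2:11,450,087-12,860,095" (commas in numbers).
INTERVAL_RE = re.compile(
    r"^\S+>(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$"
)


def parse_interval(line):
    """Return (chrom, start, end) from one AmpliconIntervals line."""
    match = INTERVAL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an interval line: {line!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    return match["chrom"], start, end


lines = [
    "        Amplicon-1>chr2:11,450,087-12,860,095",
    "        Amplicon-1>chr2:14,821,054-16,175,121",
]
intervals = [parse_interval(line) for line in lines]

# Inclusive coordinates: each interval spans end - start + 1 bases.
total_size = sum(end - start + 1 for _, start, end in intervals)
print(total_size)  # 2764077, matching "Total Amplicon Size" in the example
```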
