If you haven't yet read the "analysis steps" and "output folders" sections of the CCseqBasic main page, this is a good time to do so :
- if you don't know what the F2 / F3 / F5 / F6 folders are, and which analysis steps are performed in each of them, interpreting the report counters is going to be error-prone !
Here are more details and illustrations about assigning fragments to "captures" and "reporters" ( i.e. marking the mapped fragments to reflect the true results of the NG-CaptureC experiment ).
Before reading the text below, check the link above : browse through the illustrations to get a clear picture of what the text below refers to !
    1a Actual reported fragments : 182
    1b Actual reported CIS fragments : 88
    1c Actual reported TRANS fragments : 94
    Hba-1 1 Capture fragments (final count): 150
    Hba-1 3a Reporter fragments (final count) : 110
    Hba-1 3b Reporter fragments CIS (final count) : 46
    Hba-1 3c Reporter fragments TRANS (final count) : 64
    Hba-2 1 Capture fragments (final count): 102
    Hba-2 3a Reporter fragments (final count) : 72
    Hba-2 3b Reporter fragments CIS (final count) : 42
    Hba-2 3c Reporter fragments TRANS (final count) : 30

Explaining the above :
Total capture counts :

    1a Actual reported fragments : All reported fragments (CIS+TRANS), all capture sites.
    1b Actual reported CIS fragments : All reported fragments (CIS), all capture sites.
    1c Actual reported TRANS fragments : All reported fragments (TRANS), all capture sites.

Capture-site-wise counters (here two capture sites, Hba-1 and Hba-2) :

    Hba-1 1 Capture fragments (final count): All capture fragments, Hba-1 capture site. This many fragments mapped to the CAPTURE site of Hba-1. The 3 counters below are then the REPORTERS which were found in the reads containing one or more of these CAPTURE fragments.
    Hba-1 3a Reporter fragments (final count) : All reported fragments (CIS+TRANS), interacting with Hba-1 capture site.
    Hba-1 3b Reporter fragments CIS (final count) : All reported fragments (CIS), interacting with Hba-1 capture site.
    Hba-1 3c Reporter fragments TRANS (final count) : All reported fragments (TRANS), interacting with Hba-1 capture site.
    Hba-2 1 Capture fragments (final count): Read fragments mapping to the Hba-2 capture-site RE fragment (one or more per read).
    Hba-2 3a Reporter fragments (final count) : All reported fragments (CIS+TRANS), interacting with Hba-2 capture site.
    Hba-2 3b Reporter fragments CIS (final count) : All reported fragments (CIS), interacting with Hba-2 capture site.
    Hba-2 3c Reporter fragments TRANS (final count) : All reported fragments (TRANS), interacting with Hba-2 capture site.
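As a quick sanity check on the example report, the total counters (1a, 1b, 1c) can be compared with the sums of the capture-site-wise reporter counters (3a, 3b, 3c). The sketch below does this in Python with the numbers from the example; the dictionary layout and the names `totals` and `per_site` are only illustrative assumptions, and the equality is shown for this example only, not claimed as a guarantee for every run.

```python
# Numbers copied from the example report (two capture sites, Hba-1 and Hba-2).
# The data layout is hypothetical; CCseqBasic itself writes plain text lines.
totals = {"all": 182, "cis": 88, "trans": 94}  # counters 1a, 1b, 1c

per_site = {
    "Hba-1": {"capture": 150, "all": 110, "cis": 46, "trans": 64},  # 1, 3a, 3b, 3c
    "Hba-2": {"capture": 102, "all": 72, "cis": 42, "trans": 30},
}


def summed(kind):
    """Sum one reporter counter (all / cis / trans) over all capture sites."""
    return sum(site[kind] for site in per_site.values())


for kind in ("all", "cis", "trans"):
    status = "OK" if summed(kind) == totals[kind] else "MISMATCH"
    print(f"{kind:5s} total={totals[kind]:4d} sum over sites={summed(kind):4d} {status}")

# Within each site, CIS + TRANS reporters add up to the 3a counter.
for name, site in per_site.items():
    print(name, "cis+trans =", site["cis"] + site["trans"], "3a =", site["all"])
```

If the sums disagree in your own report, re-check first that you are comparing counters from the same folder (F2 / F3 / F5 / F6).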
The above counts are not "normal" in their distribution ( they originate from a TESTER file, which is not realistic )
Explaining the above :

Total reads (input fastq)
    Read count of all reads - as 100%

Flashed / nonflashed
    Flash-combined reads in grey (from this point downwards)
    Non-combined reads in purple (from this point downwards)
    If you sequenced 150+150 bases PE reads, and sonicated to ~300 bases fragments, you should have most reads "flashed".
    If you sequenced 75+75 bases PE reads, and sonicated to ~300 bases fragments, you have fewer reads "flashed", but still many.
    If you sequenced 40+40 bases PE reads, and sonicated to ~300 bases fragments, you have almost all reads "nonflashed".

Do/don't have RE site
    green : the desired RE-site can be seen at least once within the R1+R2 sequence of the read
    red : the desired RE-site cannot be seen anywhere within the R1+R2 sequence of the read
    Note : the non-flashed reads may have the RE site "invisible" - i.e. between the two halves of the read.

Continue to mapping
    All non-flashed reads (purple) continue to mapping (stay green).
    Of the flashed reads (grey) we filter out the ones where we didn't see the desired RE-cut sequence ( we already know these cannot have a ligation product ).

Contains capture
    has capture (green) : the reads where some part(s) mapped to the desired captured RE fragments
    no capture (red) : the reads where no part(s) mapped to the desired captured RE fragments

Contains capture and/or reporter
    has "capture+reporter" (green) : the reads where also "another fragment" was seen.
    no "capture+reporter" (red) : the reads where 1) no other fragment, or 2) only a too-close-to-capture-site fragment ("exclusion fragment") was seen

The above counters will help you to differentiate between these situations :
1) Wrongly given RE-coordinates in the CCseqBasic run (all reads mysteriously vanish when asked whether they "contain capture site")
2) Capture-C experiment had a weak capture step ( signal coming from genomic background - not actually enriched for captured sites ). To confirm this, it may be necessary to visualise the bam file of the F1 folder as well - if the not-analysed reads show just genomic background ( no higher-than-background signal in the capture sites ), that confirms this diagnosis.
3) Capture-C experiment had a poor ligation step (only single fragments going to sequencing)
4) Capture-C experiment had a poor digestion step (the RE-site containing reads only map to capture sites, and don't report anything)
5) Repeat-rich capture oligo design (filtered heavily in the blat-filtering step)
6) Library not sequenced to exhaustion (only very few duplicates seen)
7) Capture sites within a single interaction domain (interacting with each other as well) - a lot of "multicapture" reads (which cannot be interpreted). It is generally not recommended to have interacting regions in the same design (f.ex. enhancers and promoters in the same design).

The same counts as numbers instead of percentages :
    all=86320                           All read pairs
    allflashed=3001                     All flashed read pairs
    allnonflashed=83319                 All non-flashed read pairs
    REflashed=2893                      RE-cut site containing flashed read pairs
    REnonflashed=157924                 RE-cut site containing non-flashed read pairs
    continuesToMappingFlashed=2893      Flashed read pairs continuing to mapping (only the ones having RE cut site)
    continuesToMappingNonflashed=83313  Non-flashed read pairs continuing to mapping (all reads)
    containsCaptureFlashed=1434         After mapping, read is seen to contain a capture site
    containsCaptureNonflashed=78383
    containsCapAndRepFlashed=104        After mapping, read is seen to contain also an interaction fragment
    containsCapAndRepNonflashed=73588
    singleCapFlashed=87                 Only one capture site seen within the read pair
    singleCapNonflashed=50027           (no f.ex enhancer-promoter-pairs)
    multiCapFlashed=17                  Multiple capture sites seen within the read pair
    multiCapNonflashed=23561            (design rich in f.ex enhancer-promoter-pairs)
    nonduplicateFlashed=86              Unique reads - flashed (not duplicates)
    nonduplicateNonflashed=47480        Unique reads - nonflashed (not duplicates)
    blatploidyFlashed=0                 Fragments filtered as blacklisted or homology regions - flashed
    blatploidyNonflashed=47480          Fragments filtered as blacklisted or homology regions - nonflashed
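To move between the numeric counters and the percentages, the `name=value` lines can be parsed and each value expressed relative to `all` (which the text above defines as 100%). The following is a minimal Python sketch under the assumption that each counter sits on its own line in the `name=value` form listed above; the helper names `parse_counters` and `percent_of_all` are hypothetical and not part of CCseqBasic.

```python
# A subset of the counters listed above, one "name=value" per line.
REPORT = """\
all=86320
allflashed=3001
allnonflashed=83319
containsCapAndRepFlashed=104
containsCapAndRepNonflashed=73588
singleCapFlashed=87
singleCapNonflashed=50027
multiCapFlashed=17
multiCapNonflashed=23561
"""


def parse_counters(text):
    """Parse 'name=value' lines into a dict; trailing descriptions are ignored."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        counters[name.strip()] = int(value.split()[0])
    return counters


def percent_of_all(counters, name):
    """Express one counter as a percentage of all read pairs."""
    return 100.0 * counters[name] / counters["all"]


counters = parse_counters(REPORT)
for name in ("allflashed", "allnonflashed", "containsCapAndRepNonflashed"):
    print(f"{name}: {percent_of_all(counters, name):.1f} % of all read pairs")

# In this example, single-capture plus multi-capture read pairs add up to
# the read pairs that contain both a capture and a reporter.
for suffix in ("Flashed", "Nonflashed"):
    total = counters["singleCap" + suffix] + counters["multiCap" + suffix]
    print(suffix, total, counters["containsCapAndRep" + suffix])
```

A high share of `multiCap...` relative to `singleCap...` points towards situation 7) in the list above.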
Interaction counters in F2 / F3 / F5 / F6 folders

In general these counters are very straightforward : the code more or less reports "every step of the way". So all the numbers correspond to CHRONOLOGICAL steps during the analysis. The only exception is number 6 (06c Duplicate reads), which comes way too early. All the other numbers correspond to the filtering order in the script.

####################################################
We have divided each READ into RE-cut FRAGMENTS - and we count them in various ways
i.e. counters (1-10)
####################################################

So, first we have all fragments (1), and count them in various ways (1-10)

    01 Number of capture sites loaded: 16
    02 Restriction enzyme fragments loaded: 6199203
    03 Lines in sam file header: 23
    04 Data lines in sam file: 59265143
    06 Unmapped fragments in SAM file: 2851157
    06c Duplicate reads: 9801446
    07 Mapped fragments: 56413986
    09 Proximity exclusion fragments (Pre PCR duplicate removal): 2101978
    10 Reporter fragments (Pre PCR duplicate removal): 10374399

####################################################
We reconstruct the READS from the RE-cut FRAGMENTS - and we count them in various ways
i.e. counters (11-15)
####################################################

After mapping in bowtie, the reads are ordered in the file so that all fragments of one "read" follow one another. Once we know we have all the fragments of a read, we look at them and, if the whole read was not lost in the filtering steps above (counters 1-10), we filter the read further.

Reads entering further analysis stages have to have
1) a capture fragment
2) a reporter fragment
but MAY STILL contain multiple different captures.

We also count the FRAGMENTS within the reads which had at least one mapped fragment (counter 13).

####################################################
Detailed counts of fragments and their composition (before duplicate-filtering)
i.e. counters (11e,11ee)
####################################################

Before duplicate filtering your "preliminary counts" of reporters are in 11e and 11ee.

How to read these lines :

    11ee Total number of reads having captures in composition Hom9:1 (1) , having 1 reporters and 0 exclusion fragments : 918274
    11ee Total number of reads having captures in composition Hom9:1 Pam6:1 (2) , having 1 reporters and 0 exclusion fragments : 3
    11ee Total number of reads having captures in composition Hom9:2 (3) , having 1 reporters and 0 exclusion fragments : 255796

The first line (1) means : Hom9 capture. Capture fragments seen : 1. Reporter fragments seen : 1. Exclusion fragments seen : 0. Total count of these : 918274

Second line (2) : Hom9 capture, having also Pam6 capture. Capture fragments seen : 1 (in Hom9), capture fragments seen : 1 (in Pam6). Reporter fragments seen : 1. Exclusion fragments seen : 0. Total count of these : 3 (so very rare)

Third line (3) : Hom9 capture. Capture fragments seen : 2 (so, Hom9 capture was seen in 2 fragments of the read). Reporter fragments seen : 1. Exclusion fragments seen : 0. Total count of these : 255796

####################################################
We duplicate filter our reads
i.e. counters (12-16)
####################################################

Then the code applies the duplicate filter (counter 16). So we get rid of reads which most probably are each other's PCR duplicates. The "composition" is reported the same way as in 11e and the others above.

After the duplicate filtering we still continue reporting "global statistics", all the way up to number 25.
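The composition lines (11e / 11ee before duplicate filtering, reported the same way after it) can also be read programmatically. The Python sketch below parses one such line into its capture composition, reporter count, exclusion count and read count. It assumes the line layout is exactly as in the three example lines above, and that the markers (1), (2), (3) are annotations of this document rather than part of the report, so they are skipped; `parse_composition_line` is a hypothetical helper, not a CCseqBasic function.

```python
import re

# Pattern built from the example lines quoted above (an assumption about the
# report format, not a specification of it).
LINE_RE = re.compile(
    r"^(?P<counter>\S+) Total number of reads having captures in composition "
    r"(?P<composition>.+?)\s*,\s*having (?P<reporters>\d+) reporters and "
    r"(?P<exclusions>\d+) exclusion fragments\s*:\s*(?P<count>\d+)\s*$"
)


def parse_composition_line(line):
    """Return a dict describing one composition line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    captures = {}
    for token in match.group("composition").split():
        name, sep, number = token.rpartition(":")
        if not sep or not number.isdigit():
            continue  # skips annotation tokens such as "(2)"
        captures[name] = int(number)  # capture name -> capture fragments seen
    return {
        "counter": match.group("counter"),
        "captures": captures,
        "reporters": int(match.group("reporters")),
        "exclusions": int(match.group("exclusions")),
        "reads": int(match.group("count")),
    }


example = ("11ee Total number of reads having captures in composition "
           "Hom9:1 Pam6:1 (2) , having 1 reporters and 0 exclusion fragments : 3")
parsed = parse_composition_line(example)
print(parsed["captures"], parsed["reporters"], parsed["exclusions"], parsed["reads"])

# More than one capture name in the composition marks a "multicapture" read.
print("multicapture:", len(parsed["captures"]) > 1)
```

Summing `reads` over all lines with more than one capture name gives a quick view of how much of the data is lost to multicapture reads.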
    12 Total number of reads entering duplicate-filtering - should be same count as 11f : 9886207
    13 Count of fragments in Reads having at least one informative fragment : 21410249
    14a Reads having 2 fragments: 8291768
    14a Reads having 3 fragments: 1551218
    14a Reads having 4 fragments: 43048
    14a Reads having 5 fragments: 171
    14a Reads having 6 fragments: 2
    14b Reads having 2 informative fragments: 8291768
    14b Reads having 3 informative fragments: 1551218
    14b Reads having 4 informative fragments: 43048
    14b Reads having 5 informative fragments: 171
    14b Reads having 6 informative fragments: 2
    16 Non-duplicated reads: 84761

####################################################
We count our duplicate filtered reads in various ways
i.e. counters (16-23)
####################################################

16b and 16bb are counters after the duplicate filter. Interpret these like the counters 11e and 11ee above.

    16c Proximity exclusion fragments (After PCR duplicate removal): 1191
    16d Reporter fragments (After PCR duplicate removal): 92067
    16f Total fragment count (after PCR duplicate removal): 84761
    16g Reads having 2 informative fragments (after PCR duplicate whole-read removal): 52924
    16g Reads having 3 informative fragments (after PCR duplicate whole-read removal): 29677
    16g Reads having 4 informative fragments (after PCR duplicate whole-read removal): 2094
    16g Reads having 5 informative fragments (after PCR duplicate whole-read removal): 64
    16g Reads having 6 informative fragments (after PCR duplicate whole-read removal): 2
    23 Reporters before final filtering steps 92067

####################################################
We do final filtering to the reads
i.e. counters (23-25)
####################################################

The "last filtering steps" the report mentions are :
- if WITHIN the same read one reports the SAME reporter fragment twice, it is counted only once.
- if the reporter fragment is mapped so that one end of it is in one DpnII fragment, and the other end is in another DpnII fragment ( so that the mapped reporter fragment OVERLAPS a DpnII cut site ), it is filtered out, as it is believed to be a mismapped read.

    23 Reporters before final filtering steps 92067
    24 Duplicate reporters (duplicate-excluded if stringent was on) 40776
    25 Reporter fragments reporting the same RE fragment within a single read (duplicate-excluded) 3708
    25e Error in Reporter fragment assignment to in silico digested genome (see 24ee for details) 440
    25ee Binary search error - fragment overlapping multiple restriction sites: 440
    26 Actual reported fragments : 87919
    Counters with reporters 110329

####################################################
The FINAL COUNTS (the "most important counters" )
####################################################

Now, as we reach the FINAL counts for everything, we divide the statistics by CAPTURE SITE. So these are basically the same counts as the ones above, but reported per CAPTURE SITE.

These report lines START with the capture site name, followed by numbers 12-17. So, these 12-17 have nothing to do with the 12-17 described above. These are the FINAL counts for each capture site.

So basically, after filtering duplicates, we count the situation JUST BEFORE the duplicate filtering, and just after.
These are the most important counters :

    Olig_aGlobin 12 Reporters before final filtering steps 752513
    Olig_aGlobin 13 Duplicate reporters (duplicate-excluded if stringent was on) 720951
    Olig_aGlobin 14 Reporter fragments reporting the same RE fragment within a single read (duplicate-excluded) 783
    Olig_aGlobin 14e Error in Reporter fragment assignment to in silico digested genome (see 25ee for details) 1244
    Olig_aGlobin 15 Capture fragments (final count): 728013
    Olig_aGlobin 16 Proximity exclusions (final count): 2538
    Olig_aGlobin 17 Reporter fragments (final count) : 750486

The one you are really interested in is number 17 : the actual count of the interactions with other fragments.

Usually you can only interpret the data if you have more than 30 000 fragments for each capture in counter (17) here.
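In the example numbers on this page, the final reporter count equals the "Reporters before final filtering steps" minus the two final-filter counters (same-RE-fragment reporters within a read, and assignment errors), both globally (23, 25, 25e, 26) and per capture site (12, 14, 14e, 17). The "Duplicate reporters" counter is not subtracted in these examples. The Python sketch below reproduces that arithmetic and applies the 30 000 rule of thumb; it is an observation on these example reports rather than a documented formula, and the function names are hypothetical.

```python
MIN_REPORTERS = 30_000  # rule of thumb quoted in the text above


def remaining_reporters(before, same_fragment_within_read, assignment_errors):
    """Reporters left after the two final filtering steps (example arithmetic only)."""
    return before - same_fragment_within_read - assignment_errors


def is_interpretable(final_reporters, minimum=MIN_REPORTERS):
    """True if the capture site has more reporters than the rule-of-thumb minimum."""
    return final_reporters > minimum


# Global counters 23, 25 and 25e from the example report; compare with counter 26.
global_final = remaining_reporters(92067, 3708, 440)
print("global :", global_final)

# Olig_aGlobin counters 12, 14 and 14e; compare with counter 17.
site_final = remaining_reporters(752513, 783, 1244)
print("Olig_aGlobin :", site_final, "interpretable:", is_interpretable(site_final))
```

If the result does not match counter 17 in your own report, check whether the run used the stringent duplicate setting mentioned in the "Duplicate reporters" line, as that would presumably change which counters are subtracted.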