Interaction counters in F2 / F3 / F5 / F6 folders

Counters that are available :

Final interaction counters for each capture site

Summary counters for analysis success, and its bar graph.

Detailed read counters for all analysis steps - very thorough, good for troubleshooting and understanding the analysis method

Details below !

Pre-requisites to interpret the counters

If you haven't yet read the "analysis steps" and "output folders" sections
of the CCseqBasic main page ,
this is a good time to do so :

- if you don't know what the F2 / F3 / F5 / F6 folders are,
and which analysis steps are performed in each of them,
interpreting the report counters is going to be error-prone !

Here are more details and illustrations
about assigning fragments to "captures" and "reporters" :

( i.e. marking the mapped fragments to reflect the true results of the NG-CaptureC experiment )

Before reading the text below , check the link above : browse through the illustrations -
to get a clear picture of what the text below refers to !

(1) Final interaction counters for each capture site

(Final REPORTER counts) and (Final counts) currently contain the same counts
- in the future there will be fewer counts in (Final REPORTER counts) than in (Final counts)

counters

1a Actual reported fragments :	182
1b Actual reported CIS fragments :	88
1c Actual reported TRANS fragments :	94
Hba-1 1 Capture fragments (final count):	150
Hba-1 3a Reporter fragments (final count) :	110
Hba-1 3b Reporter fragments CIS (final count) :	46
Hba-1 3c Reporter fragments TRANS (final count) :	64
Hba-2 1 Capture fragments (final count):	102
Hba-2 3a Reporter fragments (final count) :	72
Hba-2 3b Reporter fragments CIS (final count) :	42
Hba-2 3c Reporter fragments TRANS (final count) :	30

explaining the above :

Total capture counts :

1a Actual reported fragments :	            All reported fragments (CIS+TRANS), all capture sites. 
1b Actual reported CIS fragments :	    All reported fragments (CIS),       all capture sites.
1c Actual reported TRANS fragments :	    All reported fragments (TRANS),     all capture sites.

Capture-site-wise counters (here two capture sites Hba-1 and Hba-2)

Hba-1 1 Capture fragments (final count):	All capture fragments, Hba-1 capture site.
                                                This many fragments mapped to the CAPTURE site of Hba-1.
                                                The 3 counters below are the REPORTERS which were found in
                                                the reads containing one or more of these CAPTURE fragments.
                                                    
Hba-1 3a Reporter fragments (final count) :	    All reported fragments (CIS+TRANS),
                                                                interacting with Hba-1 capture site.
Hba-1 3b Reporter fragments CIS (final count) :	    All reported fragments (CIS),       
                                                                interacting with Hba-1 capture site.
Hba-1 3c Reporter fragments TRANS (final count) :   All reported fragments (TRANS),     	
                                                                interacting with Hba-1 capture site.

Hba-2 1 Capture fragments (final count):	    Hba-2 capture-site RE-fragment mapping read-fragments
                                                                (one or more per read)
Hba-2 3a Reporter fragments (final count) :	    All reported fragments (CIS+TRANS), 
                                                                interacting with Hba-2 capture site.
Hba-2 3b Reporter fragments CIS (final count) :	    All reported fragments (CIS),       
                                                                interacting with Hba-2 capture site.
Hba-2 3c Reporter fragments TRANS (final count) :   All reported fragments (TRANS),     
                                                                interacting with Hba-2 capture site.
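The relation between the total counters (1a, 1b, 1c) and the capture-site-wise counters (3a, 3b, 3c) can be checked with a short script. This is only a sketch in Python : the counter lines are copied from the example above, and the parsing rule (count after the last colon, counter code as first or second word) is an assumption based on that example, not a specification of the report format.

```python
import re
from collections import defaultdict

# Counter lines copied from the example report above.
report = """
1a Actual reported fragments : 182
1b Actual reported CIS fragments : 88
1c Actual reported TRANS fragments : 94
Hba-1 3a Reporter fragments (final count) : 110
Hba-1 3b Reporter fragments CIS (final count) : 46
Hba-1 3c Reporter fragments TRANS (final count) : 64
Hba-2 3a Reporter fragments (final count) : 72
Hba-2 3b Reporter fragments CIS (final count) : 42
Hba-2 3c Reporter fragments TRANS (final count) : 30
"""

totals = {}                   # counters over all capture sites, e.g. "1a"
per_site = defaultdict(int)   # capture-site-wise counters summed over sites, e.g. "3a"

for line in report.strip().splitlines():
    value = int(line.rsplit(":", 1)[1])      # the count follows the last colon
    first, second = line.split()[:2]
    if re.fullmatch(r"1[abc]", first):
        totals[first] = value                # total line : starts with 1a / 1b / 1c
    else:
        per_site[second] += value            # site line : site name, then 3a / 3b / 3c

# In this example each total equals the sum over the capture sites.
for total_key, site_key in (("1a", "3a"), ("1b", "3b"), ("1c", "3c")):
    print(total_key, totals[total_key], "sum of", site_key, per_site[site_key])
```

In the example the CIS and TRANS counts also add up to the CIS+TRANS count ( 88 + 94 = 182 ).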

(2) Summary counters for analysis success, and its bar graph.

counters

The above counts are not "normal" in their distribution ( they originate from a TESTER file, which is not realistic )

explaining the above :

Total reads (input fastq)       Read count of all reads - as 100%

Flashed / nonflashed            Flash-combined reads in grey   (from this point downwards)
                                Non-combined   reads in purple (from this point downwards)
If you sequenced 150+150 bases PE reads, and sonicated to ~300 bases fragments,
    you should have most reads "flashed".
If you sequenced 75+75 bases PE reads, and sonicated to ~300 bases fragments,
    you have fewer reads "flashed", but still many.
If you sequenced 40+40 bases PE reads, and sonicated to ~300 bases fragments,
    you have almost all reads "nonflashed".
    
Do/don't have RE site

green : the desired RE-site can     be seen at least once within the R1+R2 sequence of the read
red :   the desired RE-site can NOT be seen anywhere within the R1+R2 sequence of the read
Note : the non-flashed reads may have the RE site "invisible" - i.e. in the unsequenced part between the two halves of the read.

Continue to mapping 

All non-flashed reads (purple) continue to mapping (stay green).
Of the flashed reads (grey) we filter out the ones where we didn't see the desired RE-cut sequence
( we already know these cannot contain a ligation product )

Contains capture

has capture (green) : the reads where some part(s) mapped to the desired captured RE fragments
no  capture (red)   : the reads where no   part(s) mapped to the desired captured RE fragments

Contains capture and/or reporter

has "capture+reporter" (green) : the reads where, in addition to the capture, also "another fragment" was seen.

no  "capture+reporter" (red) : the reads where 1) no other fragment, or
   2) only a too-close-to-capture-site fragment ("exclusion fragment") was seen

The above counters will help you to differentiate between these situations :

1) Wrongly given RE-coordinates in CCseqBasic run
    (all reads mysteriously vanish when asked do they "contain capture site")

2) Capture-C experiment had a weak capture step
    (signal coming from genomic background - not actually enriched for captured sites)
   To confirm this, it may be necessary to visualise the bam file of F1 folder as well
    - if the not-analysed reads show just genomic background
   ( no higher-than-background signal in the capture sites ), that confirms this diagnosis.
   
3) Capture-C experiment had a poor ligation step (only single fragments going to sequencing)

4) Capture-C experiment had a poor digestion step
    (the RE-site containing reads only map to capture sites, and don't report anything)

5) Repeat-rich capture oligo design (filtered heavily in blat-filtering step)

6) Library not sequenced to exhaustion (only very few duplicates seen)

7) Capture sites within a single interaction domain (interacting with each other as well)
   - a lot of "multicapture" reads (which cannot be interpreted)
   It is generally not recommended to have interacting regions in the same design
   (f.ex. enhancers and promoters in the same design)

The same counts as numbers instead of percentages :


    
all=86320                               All read pairs
allflashed=3001                         All flashed read pairs
allnonflashed=83319                     All non-flashed read pairs

REflashed=2893                          RE-cut site containing flashed read pairs
REnonflashed=157924                     RE-cut site containing non-flashed read pairs

continuesToMappingFlashed=2893          Flashed read pairs continuing to mapping (only the ones having RE cut site)
continuesToMappingNonflashed=83313      Non-flashed read pairs continuing to mapping (all reads)

containsCaptureFlashed=1434             After mapping, read is seen to contain a capture site 
containsCaptureNonflashed=78383

containsCapAndRepFlashed=104            After mapping, read is seen to contain also an interaction fragment 
containsCapAndRepNonflashed=73588

singleCapFlashed=87                     Only one capture site seen within the read pair 
singleCapNonflashed=50027                   (no f.ex enhancer-promoter-pairs) 

multiCapFlashed=17                      Multiple capture sites seen within the read pair 
multiCapNonflashed=23561                    (design rich in f.ex enhancer-promoter-pairs)

nonduplicateFlashed=86                  Unique reads - flashed (not duplicates)
nonduplicateNonflashed=47480            Unique reads - nonflashed (not duplicates)

blatploidyFlashed=0                     Fragments filtered as blacklisted or homology regions - flashed 
blatploidyNonflashed=47480              Fragments filtered as blacklisted or homology regions - nonflashed
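To get from these numbers back to the percentages of the bar graph, each count can be divided by the total read count. Below is a minimal Python sketch, assuming that every counter line starts with a key=value token ( as in the listing above ) and that "all" is the 100% reference. The function names are made up for this illustration.

```python
# A few lines copied from the listing above ; the text after the
# key=value token is treated as a free-text description and ignored.
summary_text = """
all=86320                               All read pairs
allflashed=3001                         All flashed read pairs
allnonflashed=83319                     All non-flashed read pairs
containsCaptureFlashed=1434
containsCaptureNonflashed=78383
"""

def parse_summary(text):
    """Return {counter name: count} from key=value lines."""
    counts = {}
    for line in text.splitlines():
        tokens = line.split()
        if tokens and "=" in tokens[0]:
            key, value = tokens[0].split("=", 1)
            counts[key] = int(value)
    return counts

def as_percent(counts, key, total_key="all"):
    """Count of 'key' as a percentage of the total read count."""
    return 100.0 * counts[key] / counts[total_key]

counts = parse_summary(summary_text)
for key in counts:
    print(f"{key:28s} {counts[key]:8d} {as_percent(counts, key):6.2f} %")
```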

(3) Detailed read counters for all analysis steps - very thorough, good for troubleshooting and understanding the analysis method



In general these counters are very straightforward :

The code more or less reports "every step of the way".


So - all the numbers correspond to CHRONOLOGICAL steps during the analysis.
The only exception is counter 06c (duplicate reads) - which is reported much earlier than the duplicate filtering itself.

All the other numbers correspond to the filtering order in the script.


####################################################

We have divided each READ into RE-cut FRAGMENTS - and we count them in various ways

i.e. counters (1-10)

###################################################


So, first we have all fragments (1), and count them in various ways (1-10)

01 Number of capture sites loaded:      16
02 Restriction enzyme fragments loaded: 6199203
03 Lines in sam file header:    23
04 Data lines in sam file:      59265143
06 Unmapped fragments in SAM file:      2851157
06c Duplicate reads:    9801446
07 Mapped fragments:    56413986
09 Proximity exclusion fragments (Pre PCR duplicate removal):   2101978
10 Reporter fragments (Pre PCR duplicate removal):      10374399
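The counter lines above are easy to read into a script for sanity checks. The Python sketch below assumes the line layout seen in the listing ( counter code, description, colon, count ) ; the regular expression and the function name are illustrative only. In this example the unmapped (06) and mapped (07) fragments add up to the data lines of the sam file (04).

```python
import re

# Counter code (digits plus optional letters), description, colon, count.
COUNTER_LINE = re.compile(r"^(\d+[a-z]*)\s+.*?:\s*(\d+)\s*$")

def parse_counters(text):
    """Return {counter code: count} for lines like '04 Data lines in sam file: 59265143'."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

report = """
01 Number of capture sites loaded:      16
02 Restriction enzyme fragments loaded: 6199203
03 Lines in sam file header:    23
04 Data lines in sam file:      59265143
06 Unmapped fragments in SAM file:      2851157
06c Duplicate reads:    9801446
07 Mapped fragments:    56413986
09 Proximity exclusion fragments (Pre PCR duplicate removal):   2101978
10 Reporter fragments (Pre PCR duplicate removal):      10374399
"""

counters = parse_counters(report)
# Every data line of the sam file is either an unmapped or a mapped fragment.
print(counters["06"] + counters["07"] == counters["04"])
```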


####################################################

We reconstruct the READS from the RE-cut FRAGMENTS - and we count them in various ways

i.e. counters (11-15)

###################################################


After mapping in bowtie, the file is ordered so that all fragments of one "read" follow one another.

Then, once we know we have all the fragments of a read, we look at them together,
and if the read survived the filtering steps above (counters 1-10) ,

we further filter the reads.

Reads entering further analysis stages have to have

1) a capture fragment
2) a reporter fragment

but MAY STILL contain multiple different captures


We also count the FRAGMENTS within the reads, which had at least one mapped fragment (counter 13)


####################################################

Detailed counts of fragments and their composition (before duplicate-filtering)

i.e. counters (11e,11ee)

###################################################


Before duplicate filtering your "preliminary counts" of reporters are in 11e and 11ee

How to read these lines :

11ee Total number of reads having captures in composition Hom9:1                        (1)
                        , having 1 reporters and 0 exclusion fragments :     918274
11ee Total number of reads having captures in composition Hom9:1 Pam6:1                 (2)
                        , having 1 reporters and 0 exclusion fragments :      3
11ee Total number of reads having captures in composition Hom9:2                        (3)
                        , having 1 reporters and 0 exclusion fragments :     255796 

The first line (1) means :

Hom9 capture. Capture fragments seen : 1. 
Reporter fragments seen 1. Exclusion fragments seen 0. 
Total count of these : 918274

Second line (2) :

Hom9 capture, having also Pam6 capture.
Capture fragments seen : 1 (in Hom9), Capture fragments seen : 1 (in Pam6). 
Reporter fragments seen 1. Exclusion fragments seen 0. 
Total count of these : 3 (so very rare)

Third line (3) :

Hom9 capture. Capture fragments seen : 2. (so, Hom9 capture was seen in 2 fragments of the read)
Reporter fragments seen 1. Exclusion fragments seen 0. 
Total count of these : 255796
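The same reading rule can be written as a small parser. This is a Python sketch only : it assumes that in the report file each 11ee entry is on a single line ( above they are wrapped for readability ), and the pattern is derived from the three example lines, not from the CCseqBasic source code.

```python
import re

PATTERN = re.compile(
    r"composition (?P<comp>.+?)\s*, having (?P<rep>\d+) reporters "
    r"and (?P<excl>\d+) exclusion fragments\s*:\s*(?P<count>\d+)"
)

def parse_composition_line(line):
    """Return (captures, reporters, exclusions, read_count), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    captures = {}
    for item in match.group("comp").split():
        name, seen = item.rsplit(":", 1)     # e.g. "Hom9:2" -> Hom9 seen in 2 fragments
        captures[name] = int(seen)
    return (captures, int(match.group("rep")),
            int(match.group("excl")), int(match.group("count")))

line = ("11ee Total number of reads having captures in composition Hom9:1 Pam6:1"
        " , having 1 reporters and 0 exclusion fragments :      3")
print(parse_composition_line(line))
```

A read is a "multicapture" read when the returned captures dictionary has more than one key.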


####################################################

We duplicate filter our reads

i.e. counters (12-16)

###################################################


Then the code applies the duplicate filter (counter 16).
So - we get rid of reads which most probably are each other's PCR duplicates.
The "composition" is reported the same way as in 11e and 11ee above.

Now we still continue reporting "global statistics" after the duplicate filtering,
and continue all the way up to number 26.

12 Total number of reads entering duplicate-filtering - should be same count as 11f :   9886207
13 Count of fragments in Reads having at least one informative fragment :       21410249
14a Reads having 2 fragments:   8291768
14a Reads having 3 fragments:   1551218
14a Reads having 4 fragments:   43048
14a Reads having 5 fragments:   171
14a Reads having 6 fragments:   2
14b Reads having 2 informative fragments:       8291768
14b Reads having 3 informative fragments:       1551218
14b Reads having 4 informative fragments:       43048
14b Reads having 5 informative fragments:       171
14b Reads having 6 informative fragments:       2
16 Non-duplicated reads:        84761
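The counters above can be cross-checked against each other. The Python sketch below uses the numbers of this example report ( including counter 06c from the first counter block ) ; the relations it checks hold in this example and are assumed, not guaranteed, to hold in other reports.

```python
# Numbers copied from the example report.
entering_dup_filter = 9886207            # counter 12
duplicate_reads = 9801446                # counter 06c
nonduplicated_reads = 84761              # counter 16
reads_by_fragment_count = {              # counters 14a
    2: 8291768,
    3: 1551218,
    4: 43048,
    5: 171,
    6: 2,
}

# The 14a lines split the reads of counter 12 by their fragment count.
print(sum(reads_by_fragment_count.values()) == entering_dup_filter)

# Reads entering the duplicate filter minus duplicate reads = non-duplicated reads.
print(entering_dup_filter - duplicate_reads == nonduplicated_reads)

# Fraction of reads surviving the duplicate filter.
unique_fraction = nonduplicated_reads / entering_dup_filter
print(f"{100 * unique_fraction:.2f} % of reads are non-duplicates")
```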


####################################################

We count our duplicate filtered reads in various ways

i.e. counters (16-23)

###################################################

16b and 16bb are counters after duplicate filter.
Interpret these like the above counters 11e and 11ee

16c Proximity exclusion fragments (After PCR duplicate removal):        1191
16d Reporter fragments (After PCR duplicate removal):   92067
16f Total fragment count (after PCR duplicate removal): 84761
16g Reads having 2 informative fragments (after PCR duplicate whole-read removal):      52924
16g Reads having 3 informative fragments (after PCR duplicate whole-read removal):      29677
16g Reads having 4 informative fragments (after PCR duplicate whole-read removal):      2094
16g Reads having 5 informative fragments (after PCR duplicate whole-read removal):      64
16g Reads having 6 informative fragments (after PCR duplicate whole-read removal):      2
23 Reporters before final filtering steps       92067


####################################################

We do final filtering to the reads

i.e. counters (23-25)

###################################################


The "final filtering steps" the report mentions are :
- if WITHIN the same read the SAME reporter fragment is reported twice, it is counted only once.
- if the reporter fragment is mapped so that one end of it is in one DpnII fragment
    and the other end is in another DpnII fragment
    ( so that the mapped reporter fragment OVERLAPS a DpnII cut site ), it is filtered out,
    as it is believed to be a mismapped read.
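The two filtering rules above can be sketched as follows. This is an illustration in Python, not the CCseqBasic implementation : the cut-site coordinates are hypothetical, only one chromosome is considered, and the lookup uses the standard bisect module as a stand-in for the binary search the report mentions.

```python
from bisect import bisect_right

# Hypothetical, sorted DpnII cut-site coordinates on one chromosome.
cut_sites = [100, 450, 900, 1500]

def re_fragment_index(position):
    """Index of the in silico RE fragment that contains the position."""
    return bisect_right(cut_sites, position)

def filter_reporters(reporters):
    """reporters : (start, end) mapping coordinates of the reporter fragments of ONE read.
    Returns the RE fragment indices that would be reported for this read."""
    reported = []
    seen = set()
    for start, end in reporters:
        first = re_fragment_index(start)
        last = re_fragment_index(end)
        if first != last:
            continue          # ends in different RE fragments : overlaps a cut site, filtered
        if first in seen:
            continue          # same RE fragment already reported within this read
        seen.add(first)
        reported.append(first)
    return reported

# (300, 400) hits the same RE fragment as (120, 200) ; (440, 470) spans the cut site at 450.
print(filter_reporters([(120, 200), (300, 400), (440, 470), (950, 1000)]))
```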


23 Reporters before final filtering steps       92067
24 Duplicate reporters (duplicate-excluded if stringent was on) 40776
25 Reporter fragments reporting the same RE fragment within a single read (duplicate-excluded)  3708
25e Error in Reporter fragment assignment to in silico digested genome (see 24ee for details)   440
25ee Binary search error - fragment overlapping multiple restriction sites:     440
26 Actual reported fragments :  87919
Counters with reporters 110329
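In this example the final count (26) is what remains of counter 23 after subtracting counters 25 and 25e. Counter 24 is not subtracted here, presumably because the stringent duplicate exclusion was off in this run ; that reading is an assumption based on the numbers, sketched in Python below.

```python
# Numbers copied from the example report above.
reporters_before_final_filtering = 92067   # counter 23
duplicate_reporters = 40776                # counter 24 (not subtracted in this example)
same_fragment_within_read = 3708           # counter 25
assignment_errors = 440                    # counter 25e
actual_reported = 87919                    # counter 26

expected = (reporters_before_final_filtering
            - same_fragment_within_read
            - assignment_errors)
print(expected == actual_reported)
```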

####################################################

The FINAL COUNTS (the "most important counters" )

###################################################


Now as we reach the FINAL counts for everything,
we divide them into CAPTURE-SITE specific statistics.

So - basically the same counts as the above ones, but divided by CAPTURE SITE.
These report lines START with the capture site name, followed by numbers 12-17.
These 12-17 have nothing to do with the 12-17 described above.
These are the FINAL counts for each capture site.

So basically after filtering duplicates we count the situation JUST BEFORE the duplicate filtering,
and just after. 



These are the most important counters :

Olig_aGlobin 12 Reporters before final filtering steps	752513
Olig_aGlobin 13 Duplicate reporters (duplicate-excluded if stringent was on)	720951
Olig_aGlobin 14 Reporter fragments reporting the same RE fragment within a single read
                                                                    (duplicate-excluded)	783
Olig_aGlobin 14e Error in Reporter fragment assignment to in silico digested genome
                                                                    (see 25ee for details)	1244
Olig_aGlobin 15 Capture fragments (final count):	728013
Olig_aGlobin 16 Proximity exclusions (final count):	2538
Olig_aGlobin 17 Reporter fragments (final count) :	750486

The one you are really interested in is number 17 : the actual count of the interactions to other fragments.
Usually you can only interpret the data if you have more than 30 000 fragments for each capture
in counter (17) here.
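A quick way to spot capture sites below this limit is to scan the report for the counter 17 lines. The Python sketch below assumes the line layout shown above ; the second capture site ( Hypothetical_site ) and its count are invented for the illustration, and the function name is not part of CCseqBasic.

```python
import re

FINAL_LINE = re.compile(
    r"^(\S+)\s+17 Reporter fragments \(final count\)\s*:\s*(\d+)\s*$"
)

def low_count_sites(report_lines, minimum=30000):
    """Names of capture sites whose counter 17 is not above the minimum."""
    low = []
    for line in report_lines:
        match = FINAL_LINE.match(line.strip())
        if match and int(match.group(2)) <= minimum:
            low.append(match.group(1))
    return low

lines = [
    "Olig_aGlobin 17 Reporter fragments (final count) :\t750486",
    "Hypothetical_site 17 Reporter fragments (final count) :\t1200",   # invented example line
]
print(low_count_sites(lines))

# In the Olig_aGlobin example, counter 17 equals counter 12 minus counters 14 and 14e.
print(752513 - 783 - 1244 == 750486)
```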