#########################################################
FAQ : BLAT-filtering, what is this ?
#########################################################

BLAT-filtering (removing false positive hits due to homologous regions) :

This is done to avoid embarrassment when long-range cis and trans interactions are seen just because the capture site closely resembles ANOTHER location in the genome (mismapped reads then show up as a "true" long-range interaction).

This filtering is done like so :

Each capture site (the whole region between the neighboring restriction enzyme sites within which the capture site resides) is in turn subjected to a BLAT run, in which the site is mapped against the whole genome.

If hits are found (i.e. if the site has homologous sites elsewhere in the genome), +/- 20 000 bases around each hit are eliminated as reporter fragments. Hits within +/- 200 000 bases of the capture (target) site itself are the exception : they are not eliminated.

The sam files of each oligo are then searched in turn for these "to-be-eliminated" regions, and any reporter fragments with reads mapping there are deleted.

Used blat parameters are :

minMatch=2 tileSize=11 maxIntron=4000 stepSize=5 minScore=10

(these can be confirmed from the qsub.out file of the pipeline run)

This means that any two fully matching 11-base-wide regions, separated by at most 4000 bases from each other, trigger a homologous region, which is then removed in the BLAT-filtering step.

Step size determines how often this "search for homology" is restarted. Here it is restarted in 5-base steps along the whole genome.

Min score sets "how similar" the sequences need to be to trigger a homologous region. The value 10 (used here) flags regions as homologous relatively easily.

Page updated by Jelena 27Sep2017
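
The exclusion rule described above (+/- 20 000 bases around each BLAT hit, except hits within +/- 200 000 bases of the capture site) can be sketched roughly as follows. This is only an illustration and not the pipeline's own code : the function names, the (chromosome, start, end) tuples and the way "within 200 000 bases" is measured (any overlap with the capture region extended by 200 000 bases on the same chromosome) are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical region representation for this sketch: (chromosome, start, end)
Region = Tuple[str, int, int]

FLANK = 20_000     # bases excluded on each side of a BLAT hit
PROTECT = 200_000  # hits this close to the capture site are NOT excluded


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions are on the same chromosome and share any base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def exclusion_windows(hits: List[Region], capture: Region,
                      flank: int = FLANK,
                      protect: int = PROTECT) -> List[Region]:
    """Turn BLAT hits into 'to-be-eliminated' windows.

    Assumption: a hit counts as 'near the capture site' if it overlaps
    the capture region extended by `protect` bases on both sides.
    """
    chrom, start, end = capture
    protected = (chrom, max(0, start - protect), end + protect)
    windows = []
    for hit in hits:
        if overlaps(hit, protected):
            continue  # too close to the capture site: keep these fragments
        windows.append((hit[0], max(0, hit[1] - flank), hit[2] + flank))
    return windows


def is_filtered(fragment: Region, windows: List[Region]) -> bool:
    """True if a reporter fragment falls in any exclusion window."""
    return any(overlaps(fragment, w) for w in windows)
```

For example, with a made-up capture site at chr11:32000000-32000800 and BLAT hits on the site itself, 150 000 bases downstream, and on chr3:5000000-5000400, only the chr3 hit produces an exclusion window; a reporter fragment at chr3:5010000-5010500 would then be deleted, while one at chr3:6000000-6000500 would be kept.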