CaptureC analysis pipeline website

CaptureC analysis codes (CCanalyser, blat filtering, and statistics) originally coded by James Davies.
Developed into a more mature, pipelined format by Jelena Telenius.
Statistical analysis modified by Marieke Oudelaar and Damien Downes.
Colors for Rainbow visualisation by Helena Francis.

Alpha testers (read "the heroes") : Joke Van Bemmel, Duantida Songdej, Matthew Gosden, Mira Kassouf, Lars Hanssen, Ross Thorne
Beta testers : Jason Torres, Jessica Davies, Anna Sanniti, Nigel Roberts, Nicolas Servant

***************************************************************************************

General documentation :

Analysis workflow :

Generate oligo coordinate file
Run pipeline
Interpret your results
Normalise your tracks (compare between active and inactive tissue)
Pool your samples (check that all your replicates look the same)
Statistical analysis (get p-values for your comparison between active and inactive tissue)

***************************************************************************************

Pipeline run instructions

updated by Jelena - 16:50 08/Oct/2018

CITING the pipeline in your publications
If you use the pipeline in your publication - add Jelena Telenius to the acknowledgements.
If you have received a lot of help and development work for your analysis - you should consider authorship :)

See below for :

Current release
Example run script
Example run command
Pipeline manuals

CM5 - CCseqBasic5 ("First multithreaded CC5") Parallel runs. Released 08Oct2018. Current development version.

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh -h

CS5 - CCseqBasic5 ("Stable portable CC5") Fully portable. Released 17Nov2017. Only receiving minor updates and bug fixes.

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh -h

Bug fixes and updates (after release)

Current issues - known bugs and development targets !


To compare old samples to newly run samples - run in a backwards-compatible manner :
  
To run CC3 "as before" - run CS5 --CCversion CS3 --strandSpecificDuplicates
To run CC4 "as before" - run CS5 --CCversion CS4 --strandSpecificDuplicates

This re-introduces the old bugs into the new pipeline, so you can readily compare.
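
A minimal sketch of what a backwards-compatible run file could look like. It assumes the serial example command of section 3.1 is used as the starting point, with its --CCversion value swapped for CS3 and --strandSpecificDuplicates added, as listed above. All paths and the sample name are the placeholder values from that example, and the temporary folder stands in for your own empty run folder.

```shell
#!/bin/sh
# Sketch: write a run.sh for a CC3-compatible rerun with the CS5 pipe.
# Assumption: paths and sample name are placeholders copied from the
# serial example command - replace them with your own.
run_dir="$(mktemp -d)"      # stand-in for your own empty run folder

# Quoted 'EOF' keeps the backslashes and paths literal in the written file.
cat > "$run_dir/run.sh" <<'EOF'
/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --CCversion CS3 \
    --strandSpecificDuplicates

EOF

chmod u=rwx "$run_dir/run.sh"
cat "$run_dir/run.sh"
```

For a CC4-compatible rerun, the same sketch would use --CCversion CS4 instead.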

LIST OF ALL RELEASES

Recommended :
run CS5 with default settings.
  
CURRENT

CM5        : PARALLEL RUNS, RAINBOW VISUALISATIONS : current development version (fancy, potentially unstable)
CB5        : current development version (fancy, potentially unstable)
CS5        : current stable version

BUG-FIXED

CF5        : major bug fixes release for CB3a and CB4a pipes (released Nov2017)

OUTDATED

CB3a CB4a  : the same as CC3 and CC4, but still getting bug fixes.
CC3 CC4    : outdated, not updated any more. Contain more bugs than CB3a and CB4a.

Which version do I want to run ?

Guide with fewer technical details : version guide

Guide with more technical details : versionsdetails

Check if the new development versions have features you would need :

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh --help

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help
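
One way to check for a single feature is to search the --help text for the option name. The sketch below assumes that --help prints option names literally (this is not confirmed here). The helper name help_mentions is hypothetical, and a small stand-in script replaces the real pipe so that the sketch runs outside the cluster.

```shell
#!/bin/sh
# Sketch: test whether a pipe script's --help text mentions a given option.
# Assumption: --help lists option names literally, e.g. "--wobblyEndBinWidth".
help_mentions() {
    # $1 = path to the pipe script, $2 = option name to look for
    "$1" --help 2>&1 | grep -q -F -e "$2"
}

# Stand-in script so the sketch runs anywhere.
# On the cluster, point help_mentions at the real pipe.sh or pipeRainbow.sh.
fake_pipe="$(mktemp)"
cat > "$fake_pipe" <<'EOF'
#!/bin/sh
echo "--wobblyEndBinWidth   (stand-in help text, not real pipeline output)"
EOF
chmod u+x "$fake_pipe"

if help_mentions "$fake_pipe" "--wobblyEndBinWidth"; then
    echo "option found"
else
    echo "option not found"
fi
```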

1) Current fixes and issues

Bug fixes and updates (all releases)

Current issues - known bugs and development targets !

Bug fixes and updates (old site)

2) Example run script

Example run script for setting up your run (serial runs - CF5, CS5, CB5 pipelines) : run.sh
Example run script for setting up your run (parallel runs - CM5 pipeline) : run.sh

Start your run in an empty folder (if you run several pipeline runs in the same folder, you risk overwriting files).

3) Example run command

Example run command - use this if you prefer it to the above run file :

3.1) SERIAL RUNS ( CF5, CS5, CB5 pipelines )
3.2) PARALLEL RUNS ( CM5-rainbow pipeline )

3.1) SERIAL RUNS ( CF5, CS5, CB5 pipelines )

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --CCversion CF5

Make an empty folder, and go into that folder.

Save the above command as run.sh in that empty folder.

Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end)

Change user permissions for the file : chmod u=rwx run.sh

Then submit run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh

You can see if the job is running by typing : qstat | grep yourUsername

The output data will be stored in this very folder. To find the data hub address, type : tail qsub.out

( See the --help of the pipeline command (above) , and manuals (below) to see what the parameters -o , -s , --pf etc. mean ! )
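
The file preparation steps above can be sketched as a small script. It assumes that "one empty line at the end" means the file must end with at least one newline character. The helper name ensure_trailing_newline is hypothetical, and the run.sh content here is a dummy line rather than a real pipeline command.

```shell
#!/bin/sh
# Sketch: make sure run.sh ends with a newline, then set its permissions.
ensure_trailing_newline() {
    # $1 = file to check.
    # tail -c 1 prints the last byte; command substitution strips a trailing
    # newline, so a non-empty result means the file does not end with one.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

run_dir="$(mktemp -d)"      # stand-in for your own empty run folder

# Dummy run.sh written WITHOUT a trailing newline, to show the fix.
printf 'echo "pipeline command goes here"' > "$run_dir/run.sh"

ensure_trailing_newline "$run_dir/run.sh"
chmod u=rwx "$run_dir/run.sh"

# On the cluster, submit from inside the run folder:
#   qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh
```

Calling the helper on a file that already ends with a newline leaves the file unchanged.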

3.2) PARALLEL RUNS ( CM5-rainbow pipeline )

/t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    -p 4 \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --BLATforREUSEfolderPath /t1-data/wherever/you/have/this/F4_blatPloidyFilteringLog_whatever/BlatPloidyFilterRun/BLAT_PLOIDY_FILTERED_OUTPUT \
    --genome mm9 \
    --chunkmb 1012 \
    --CCversion CM5 \
    --wobblyEndBinWidth 20

Make an empty folder, and go into that folder.

Save the above command as run.sh in that empty folder.

Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end)

Change user permissions for the file : chmod u=rwx run.sh

Parallel runs also need PIPE_fastqPaths.txt (in the same folder as the run.sh is in) to set the fastq locations :

PIPE_fastqPaths.txt

/t1-data/user/whatever/file_read1.fastq.gz   /t1-data/user/whatever/file_read2.fastq.gz
/t1-data/user/whatever/file2_read1.fastq.gz  /t1-data/user/whatever/file2_read2.fastq.gz

or

file_read1.fastq.gz   file_read2.fastq.gz   /t1-data/user/whatever
file2_read1.fastq.gz  file2_read2.fastq.gz  /t1-data/user/whatever
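
If you have many fastq pairs, the first of the two layouts above could be generated instead of typed by hand. This is only a sketch: it assumes the files are named *_read1.fastq.gz and *_read2.fastq.gz as in the example, and that the columns are separated by a tab (the exact separator the pipeline expects is not stated here). The function name write_fastq_paths is hypothetical, and empty dummy files are used so that the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: write PIPE_fastqPaths.txt with one "read1 <TAB> read2" line per pair.
# Assumptions: *_read1.fastq.gz / *_read2.fastq.gz naming, tab-separated columns.
write_fastq_paths() {
    # $1 = folder holding the fastq files, $2 = output file
    : > "$2"                                  # start with an empty output file
    for r1 in "$1"/*_read1.fastq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2="${r1%_read1.fastq.gz}_read2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing read 2 file for $r1" >&2
            return 1
        fi
        printf '%s\t%s\n' "$r1" "$r2" >> "$2"
    done
}

# Demo with empty dummy files in a temporary folder.
demo_dir="$(mktemp -d)"
touch "$demo_dir/file_read1.fastq.gz"  "$demo_dir/file_read2.fastq.gz" \
      "$demo_dir/file2_read1.fastq.gz" "$demo_dir/file2_read2.fastq.gz"

write_fastq_paths "$demo_dir" "$demo_dir/PIPE_fastqPaths.txt"
cat "$demo_dir/PIPE_fastqPaths.txt"
```

In a real run you would pass your own fastq folder as the first argument and write PIPE_fastqPaths.txt into the same folder as run.sh.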

More details of running CM5-rainbow in parallel

Then submit run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh

You can see if the job is running by typing : qstat | grep yourUsername

The output data will be stored in this very folder. To find the data hub address, read the file : cat E_hubAddresses.txt

( See the --help of the pipeline command (above) , and manuals (below) to see what the parameters -o , -s , --pf etc. mean ! )

4) Pipeline manuals

There is no complete manual yet (as of 10/Jan/2017) - Jelena is writing one, and it should be out around March 2017 !
Some already-existing documentation is listed below, however !

How to navigate and interpret your CM5 parallel runs (incomplete manual)

These are general instructions (all pipe versions) to interpret output, and better understand what the pipeline run does :
Interpret and troubleshoot your results
- these instructions are a little outdated for CB4 and CC4 pipelines (red and orange graphs are not explained in the above).

These are instructions to interpret the output report counters (both serial and parallel pipes):
Interpret your report file

( for parallel pipe see also the other report files - (incomplete manual) )

These are instructions to run the steps of the pipeline one-by-one. They give a clear outline of which tools form the "backbone" of the pipeline ( for the parallel pipe these steps are done within parallel folders B and D - but the order and contents are the same ) :
User manual for CaptureC analysis without pipeline
- these instructions are a little outdated for visualisation graphs (red and orange graphs and rainbow graphs are not explained in the above) !

More documentation on this site :
Interpret your results
And for parallel pipe in :
Interpret your CM5 results