CaptureC analysis pipeline website

CaptureC analysis codes - CCanalyser, blat filtering, and statistics originally coded by James Davies.
Developed into a more mature, pipelined format by Jelena Telenius.
Statistical analysis modified by Marieke Oudelaar and Damien Downes.
Colors for Rainbow visualisation by Helena Francis.

Alpha testers (read "the heroes") : Joke Van Bemmel, Duantida Songdej, Matthew Gosden, Mira Kassouf, Lars Hanssen, Ross Thorne
Beta testers : Jason Torres, Jessica Davies, Anna Sanniti, Nigel Roberts, Nicolas Servant


***************************************************************************************

General documentation :

  1. What does the pipeline do ? - from input to output
  2. FAQ : Which pipeline version do I want to run ? - and why are there so many ?
  3. FAQ : What is the difference between VS05 and VS04 and VS03 (CCseqBasic5/4/3) ?
  4. FAQ : Flashed and nonflashed reads - what is this ?
  5. FAQ : Blat - filtering, what is this ?
  6. FAQ : Ploidy - filtering, what is this ?
  7. FAQ : Duplicate - filtering, how does it differ from "normal duplicate filtering" ?
  8. FAQ : This is all very nice, but I would like to have more P-values here ?


Analysis workflow :

  1. Generate oligo coordinate file
  2. Run pipeline
  3. Interpret your results
  4. Normalise your tracks (compare between active and inactive tissue)
  5. Pool your samples (check that all your replicates look the same)
  6. Statistical analysis (get p-values for your comparison between active and inactive tissue)


***************************************************************************************

Pipeline run instructions

updated by Jelena - 16:50 08/Oct/2018
CITING the pipeline in your publications
If you use the pipeline in your publication - add Jelena Telenius to the acknowledgements.
If you have received a lot of help and development in your analysis - you should consider authorship :)


See below for :
  1. Current release
  2. Example run script
  3. Example run command
  4. Pipeline manuals


CM5 - CCseqBasic5 ("First multithreaded CC5") Parallel runs. Released 08Oct2018. Current development version.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh -h

CS5 - CCseqBasic5 ("Stable portable CC5") Fully portable. Released 17Nov2017. Only receiving minor updates and bug fixes.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh -h

  • Bug fixes and updates (after release)

    Current issues - known bugs and development targets !


    
    To compare old samples to newly run samples - run in a backwards-compatible manner :
      
    To run CC3 "as before" - run CS5 --CCversion CS3 --strandSpecificDuplicates
    To run CC4 "as before" - run CS5 --CCversion CS4 --strandSpecificDuplicates
    
    This re-introduces the old bugs into the new pipeline, so you can readily compare.
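A minimal sketch of such a backwards-compatible run, assuming the serial example run command from this page as the starting point. All paths and the sample name are the placeholders from that example, and the variable names `old_version` and `cmd` are only illustrative. The sketch prints the command for review instead of starting the pipeline.

```shell
#!/bin/bash
# Sketch only: the serial CS5 example command, switched to reproduce CC3 behaviour.
# Paths and sample name are placeholders copied from the example run command.
old_version="CS3"    # use CS4 instead to reproduce CC4 behaviour

cmd=(
    /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh
    -o /t1-data/user/telenius/oligoDpnFragments.txt
    -s C57captureTest
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4
    --genome mm9
    --chunkmb 1012
    --R1 /t1-data/user/telenius/R1_001.fastq
    --R2 /t1-data/user/telenius/R2_001.fastq
    --CCversion "$old_version"
    --strandSpecificDuplicates
)

# Print the assembled command so it can be checked before use.
printf '%s ' "${cmd[@]}"
echo
```

Once the paths point to your own files, the same arguments can be placed in run.sh in place of the default `--CCversion` setting.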
    


    LIST OF ALL RELEASES

    Recommended :
    run CS5 with default settings.
      
    CURRENT
    
    CM5        : PARALLEL RUNS, RAINBOW VISUALISATIONS : current development version (fancy, potentially unstable)
    CB5        : current development version (fancy, potentially unstable)
    CS5        : current stable version
    
    BUG-FIXED
    
    CF5        : major bug fixes release for CB3a and CB4a pipes (released Nov2017)
    
    OUTDATED
    
    CB3a CB4a  : same as CC3 and CC4, but still receiving bug fixes.
    CC3 CC4    : outdated, no longer updated. Contain more bugs than CB3a and CB4a.
    
    

    Which version do I want to run ?

    Guide with fewer technical details : version guide

    Guide with more technical details : versionsdetails

    Check if the new development versions have features you would need :

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help

    1) Current fixes and issues

    Bug fixes and updates (all releases)

    Current issues - known bugs and development targets !

    Bug fixes and updates (old site)


    2) Example run script

    Example run script for setting up your run (serial runs - CF5, CS5, CB5 pipelines) : run.sh
    Example run script for setting up your run (parallel runs - CM5 pipeline) : run.sh

    Start your run in an empty folder (if you run several pipeline runs in same folder, you risk overwriting files).
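One way to guard against overwriting files is to check that the folder is empty before setting up a run. The helper below is a hypothetical sketch (the name `folder_is_empty` is not part of the pipeline); it treats hidden files as content too.

```shell
#!/bin/bash
# Hypothetical helper, not part of the pipeline.
# Returns 0 if the given folder (default: current folder) exists and has no
# entries at all, hidden files included; returns 1 otherwise.
folder_is_empty() {
    local dir="${1:-.}"
    [ -d "$dir" ] && [ -z "$(ls -A "$dir")" ]
}

if folder_is_empty .; then
    echo "Folder is empty : safe to start a new pipeline run here."
else
    echo "Folder is not empty : make a new folder for this run." >&2
fi
```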

    3) Example run command

    Example run command - if you prefer a plain command over the run file above :

    3.1) SERIAL RUNS ( CF5, CS5, CB5 pipelines )
    3.2) PARALLEL RUNS ( CM5-rainbow pipeline )



    3.1) SERIAL RUNS ( CF5, CS5, CB5 pipelines )
    
    /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
        -o /t1-data/user/telenius/oligoDpnFragments.txt \
        -s C57captureTest \
        --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
        --genome mm9 \
        --chunkmb 1012 \
        --R1 /t1-data/user/telenius/R1_001.fastq \
        --R2 /t1-data/user/telenius/R2_001.fastq \
        --CCversion CF5
    
    
  • Make an empty folder, and go into that folder.
  • Save the above command as run.sh in that empty folder.
  • Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end)
  • Change user permissions for the file : chmod u=rwx run.sh
  • Then submit run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh
  • You can see if the job is running by typing : qstat | grep yourUsername
  • The output data will be stored in this same folder. To find the data hub address, type : tail qsub.out
  • ( See the --help of the pipeline command (above) , and manuals (below) to see what the parameters -o , -s , --pf etc. mean ! )
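The setup steps above can be sketched as one short script. This is an illustration under assumptions, not part of the pipeline: the folder name `CC_run_example` is hypothetical, and the paths inside run.sh are the placeholders from the example command. The script prepares the folder and run.sh, then prints the submission command instead of submitting.

```shell
#!/bin/bash
# Sketch of the setup steps for a serial run. Replace the placeholder paths
# and sample name inside run.sh with your own before submitting.
set -eu

run_dir="CC_run_example"   # hypothetical folder name
mkdir "$run_dir"           # fails if the folder already exists, so the run starts in a new, empty folder

# Save the example command as run.sh. The quoted 'EOF' keeps the text literal,
# and the blank line before EOF gives the file its empty last line.
cat > "$run_dir/run.sh" <<'EOF'
/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
    -o /t1-data/user/telenius/oligoDpnFragments.txt \
    -s C57captureTest \
    --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
    --genome mm9 \
    --chunkmb 1012 \
    --R1 /t1-data/user/telenius/R1_001.fastq \
    --R2 /t1-data/user/telenius/R2_001.fastq \
    --CCversion CF5

EOF

# Give the file owner read, write and execute permissions.
chmod u=rwx "$run_dir/run.sh"

# Print the submission and monitoring commands for the user to run.
echo "cd $run_dir && qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh"
echo "qstat | grep yourUsername"
```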


    3.2) PARALLEL RUNS ( CM5-rainbow pipeline )
    
    /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh \
        -o /t1-data/user/telenius/oligoDpnFragments.txt \
        -s C57captureTest \
        -p 4 \
        --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
        --BLATforREUSEfolderPath /t1-data/wherever/you/have/this/F4_blatPloidyFilteringLog_whatever/BlatPloidyFilterRun/BLAT_PLOIDY_FILTERED_OUTPUT \
        --genome mm9 \
        --chunkmb 1012 \
        --CCversion CM5 \
        --wobblyEndBinWidth 20
    
    
  • Make an empty folder, and go into that folder.
  • Save the above command as run.sh in that empty folder.
  • Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end)
  • Change user permissions for the file : chmod u=rwx run.sh
  • Parallel runs also need PIPE_fastqPaths.txt (in the same folder as run.sh) to set the fastq locations :
    
    PIPE_fastqPaths.txt
    
    /t1-data/user/whatever/file_read1.fastq.gz   /t1-data/user/whatever/file_read2.fastq.gz
    /t1-data/user/whatever/file2_read1.fastq.gz  /t1-data/user/whatever/file2_read2.fastq.gz
    
    or
    
    file_read1.fastq.gz   file_read2.fastq.gz   /t1-data/user/whatever
    file2_read1.fastq.gz  file2_read2.fastq.gz  /t1-data/user/whatever
    
    More details of running CM5-rainbow in parallel

  • Then submit run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh
  • You can see if the job is running by typing : qstat | grep yourUsername
  • The output data will be stored in this same folder. To find the data hub address, read the file : cat E_hubAddresses.txt
  • ( See the --help of the pipeline command (above) , and manuals (below) to see what the parameters -o , -s , --pf etc. mean ! )
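For runs with many fastq pairs, PIPE_fastqPaths.txt can be generated rather than typed by hand. The sketch below writes the first layout shown above (full path to read 1, then full path to read 2). It rests on assumptions that are not confirmed by this page: the files follow the `_read1.fastq.gz` / `_read2.fastq.gz` naming of the example, and the columns are separated by a tab. The function name `write_fastq_paths` is hypothetical; check the CM5 documentation for the separator the pipeline expects.

```shell
#!/bin/bash
# Hypothetical helper, not part of the pipeline.
# Writes one line per fastq pair : <path to read 1> TAB <path to read 2>.
# Assumes files are named *_read1.fastq.gz and *_read2.fastq.gz.
set -eu

write_fastq_paths() {
    local fastq_dir="$1"
    local out_file="${2:-PIPE_fastqPaths.txt}"
    local r1 r2

    : > "$out_file"                        # start from an empty output file
    for r1 in "$fastq_dir"/*_read1.fastq.gz; do
        [ -e "$r1" ] || continue           # the pattern matched no files
        r2="${r1%_read1.fastq.gz}_read2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing read 2 file for $r1" >&2
            return 1
        fi
        printf '%s\t%s\n' "$r1" "$r2" >> "$out_file"
    done
}
```

Called as `write_fastq_paths /t1-data/user/whatever` from the run folder, this writes PIPE_fastqPaths.txt next to run.sh. It stops with an error when a read 1 file has no matching read 2 file, so an incomplete pair is not silently skipped.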


    4) Pipeline manuals

    There is no real manual at this time (10/Jan/2017) - but Jelena is writing one, and the manual should be out around March 2017 !
    Some existing documentation is listed below, however !

    How to navigate and interpret your CM5 parallel runs (incomplete manual)

    These are general instructions (all pipe versions) to interpret output, and better understand what the pipeline run does :
    Interpret and troubleshoot your results
    - these instructions are a little outdated for CB4 and CC4 pipelines (red and orange graphs are not explained in the above).

    These are instructions to interpret the output report counters (both serial and parallel pipes):
    Interpret your report file

    ( for parallel pipe see also the other report files - (incomplete manual) )

    These are instructions to run the steps of the pipeline one-by-one. They give a clear outline of which tools form the "backbone" of the pipeline. ( For the parallel pipe these steps are done within parallel folders B and D - but the order and contents are the same. )
    User manual for CaptureC analysis without pipeline
    - these instructions are a little outdated for visualisation graphs (red and orange graphs and rainbow graphs are not explained in the above) !

    More documentation in this site :
    Interpret your results
    And for parallel pipe in :
    Interpret your CM5 results