CaptureC analysis pipeline website

CaptureC analysis codes - CCanalyser, blat filtering, and statistics originally coded by James Davies.
Developed into a more mature, pipelined format by Jelena Telenius.
Statistical analysis modified by Marieke Oudelaar and Damien Downes.

Alpha testers (read "the heroes") : Joke Van Bemmel, Duantida Songdej, Matthew Gosden, Mira Kassouf, Lars Hanssen, Ross Thorne
Beta testers : Jessica Davies, Anna Sanniti, Nigel Roberts, Nicolas Servant


***************************************************************************************

General documentation :

  1. What does the pipeline do ? - from input to output
  2. FAQ : Which pipeline version do I want to run ? - and why are there so many ?
  3. FAQ : What is the difference between VS05 and VS04 and VS03 (CCseqBasic5/4/3) ?
  4. FAQ : Flashed and nonflashed reads - what is this ?
  5. FAQ : Blat - filtering, what is this ?
  6. FAQ : Ploidy - filtering, what is this ?
  7. FAQ : Duplicate - filtering, how does it differ from "normal duplicate filtering" ?
  8. FAQ : This is all very nice, but I would like to have more P-values here ?


Analysis workflow :

  1. Generate oligo coordinate file
  2. Run pipeline
  3. Interpret your results
  4. Normalise your tracks (compare between active and inactive tissue)
  5. Pool your samples (check that all your replicates look the same)
  6. Statistical analysis (get p-values for your comparison between active and inactive tissue)


***************************************************************************************

Pipeline run instructions

updated by Jelena - 16:50 08/Oct/2018

CITING the pipeline in your publications

If you use the pipeline in your publication - add Jelena Telenius to the acknowledgements.
If you have received a lot of help and development work for your analysis - you should consider authorship :)


See below for :
  1. Current release
  2. Example run script
  3. Example run command
  4. Pipeline manuals


CM5 - CCseqBasic5 ("First multithreaded CC5") Parallel runs. Released 08Oct2018. Current development version.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh -h

CS5 - CCseqBasic5 ("Stable portable CC5") Fully portable. Released 17Nov2017. Only receiving minor updates and bug fixes.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh -h

  • Bug fixes and updates (after release)

    Current issues - known bugs and development targets !


    
    To compare old samples to newly run samples - run in a backwards-compatible manner :
      
    To run CC3 "as before" - run CS5 --CCversion CS3 --strandSpecificDuplicates
    To run CC4 "as before" - run CS5 --CCversion CS4 --strandSpecificDuplicates
    
    This re-introduces the old bugs into the new pipeline, so you can readily compare.
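
As a hedged sketch, the two backwards-compatible invocations above could be wrapped in a small shell helper. The function name legacy_flags is hypothetical (it is not part of the pipeline); the pipe.sh path and the two flags are copied from this page. The snippet only prints a command prefix and does not start a run.

```shell
#!/bin/sh
# Sketch only: legacy_flags is a hypothetical helper, not a pipeline feature.
# The path and flags below are the ones quoted in the text above.
PIPE=/t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh

# Print the extra flags that make CS5 behave like an older version.
legacy_flags() {
    case "$1" in
        CC3) echo "--CCversion CS3 --strandSpecificDuplicates" ;;
        CC4) echo "--CCversion CS4 --strandSpecificDuplicates" ;;
        *)   echo "unknown legacy version: $1" >&2; return 1 ;;
    esac
}

# Dry run: show the command prefix for a CC3-style run.
# Your usual -o, -s, --pf, --genome, --R1 and --R2 options still need to follow.
echo "$PIPE $(legacy_flags CC3)"
```

Printing the prefix first makes it easy to check the flags before pasting them into a run script.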
    


    LIST OF ALL RELEASES

    Recommended :
    run CS5 with default settings.
      
    CURRENT
    
    CM5        : PARALLEL RUNS, RAINBOW VISUALISATIONS : current development version (fancy, potentially unstable)
    CB5        : current development version (fancy, potentially unstable)
    CS5        : current stable version
    
    BUG-FIXED
    
    CF5        : major bug-fix release for CB3a and CB4a pipes (released Nov2017)
    
    OUTDATED
    
    CB3a CB4a  : same as CC3 and CC4, but still receiving bug fixes.
    CC3 CC4    : outdated, no longer updated. Contain more bugs than CB3a and CB4a.
    
    

    Which version do I want to run ?

    Guide with fewer technical details : version guide

    Guide with more technical details : versionsdetails

    Check if the new development version has features you would need :

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh -h

    1) Current fixes and issues

    Bug fixes and updates (all releases)

    Current issues - known bugs and development targets !

    Bug fixes and updates (old site)


    2) Example run script

    Example run script for setting up your run (serial runs - CF5, CS5, CB5 pipelines) : run.sh
    Example run script for setting up your run (parallel runs - CM5 pipeline) : run.sh

    Start your run in an empty folder (if you run several pipeline runs in the same folder, you risk overwriting files).
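
One way to guard against starting a run in a non-empty folder is a quick check before launching. This is a minimal sketch, not part of the pipeline: the function name dir_is_empty is hypothetical, and the check counts hidden files as content.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory has no entries at all
# (hidden files included). Defaults to the current directory.
dir_is_empty() {
    [ -d "${1:-.}" ] && [ -z "$(ls -A "${1:-.}")" ]
}

# Report on the current folder before a run is started here.
if dir_is_empty .; then
    echo "Folder is empty - safe to start a run here."
else
    echo "Folder is not empty - make a fresh folder for this run." >&2
fi
```

Note that once run.sh has been saved into the folder, the folder is no longer empty, so the check is meant to be done right after creating the folder.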

    3) Example run command

    Example run command - if you prefer a single command over the run script above :
    SERIAL RUNS ( CF5, CS5, CB5 pipelines )
    
    /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh \
        -o /t1-data/user/telenius/oligoDpnFragments.txt \
        -s C57captureTest \
        --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
        --genome mm9 \
        --chunkmb 1012 \
        --R1 /t1-data/user/telenius/R1_001.fastq \
        --R2 /t1-data/user/telenius/R2_001.fastq \
        --CCversion CF5
    
    
    PARALLEL RUNS ( CM5-rainbow pipeline )
    
    /t1-data/data/hugheslab/jelenatools/CCseqBasic/CM5/pipeRainbow.sh \
        -o /t1-data/user/telenius/oligoDpnFragments.txt \
        -s C57captureTest \
        -p 4 \
        --pf /public/telenius/CAPTUREC_DATA/C57captureTest_run4 \
        --BLATforREUSEfolderPath /t1-data/wherever/you/have/this/F4_blatPloidyFilteringLog_whatever/BlatPloidyFilterRun/BLAT_PLOIDY_FILTERED_OUTPUT \
        --genome mm9 \
        --chunkmb 1012 \
        --CCversion CM5 \
        --wobblyEndBinWidth 20
    
    Parallel runs also need PIPE_fastqPaths.txt to set the fastq locations :
    
    PIPE_fastqPaths.txt
    
    /t1-data/user/whatever/file_read1.fastq.gz   /t1-data/user/whatever/file_read2.fastq.gz
    /t1-data/user/whatever/file2_read1.fastq.gz  /t1-data/user/whatever/file2_read2.fastq.gz
    
    or
    
    file_read1.fastq.gz   file_read2.fastq.gz   /t1-data/user/whatever
    file2_read1.fastq.gz  file2_read2.fastq.gz  /t1-data/user/whatever
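
Before submitting, it can help to check that every line of PIPE_fastqPaths.txt follows one of the two layouts above. The following is a rough sketch under stated assumptions: the function name resolve_fastq_paths is hypothetical, columns are assumed to be separated by spaces or tabs, and paths are assumed to contain no whitespace. It is not how the pipeline itself parses the file.

```shell
#!/bin/sh
# Hypothetical helper: print "R1path<TAB>R2path" for each line of a fastq
# paths file, accepting both layouts shown above. Returns non-zero if any
# line has an unexpected number of columns.
resolve_fastq_paths() {
    awk '
        NF == 0 { next }                                  # skip blank lines
        NF == 2 { print $1 "\t" $2; next }                # layout 1: two full paths
        NF == 3 { print $3 "/" $1 "\t" $3 "/" $2; next }  # layout 2: two file names + folder
        {
            printf("line %d: expected 2 or 3 columns, got %d\n", NR, NF) > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage: only runs if the file is present in the current folder.
if [ -f PIPE_fastqPaths.txt ]; then
    resolve_fastq_paths PIPE_fastqPaths.txt
fi
```

The resolved paths could then be checked one by one with `[ -f ... ]` to catch typos in file names before the job reaches the queue.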
    
    
    

    More details of running CM5-rainbow in parallel

  • Make an empty folder, and go into that folder.
  • Save the above command as run.sh in that empty folder.
  • Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end).
  • Change user permissions for the file : chmod u=rwx run.sh
  • Then submit the run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh
  • You can see if the job is running by typing : qstat | grep yourUsername
  • The output data will be stored in this same folder; to find the data hub address, type : tail qsub.out
  • ( See the --help of the pipeline command (above), and the manuals (below), to see what the parameters -o, -s, --pf etc. mean ! )
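
The preparation steps above could be scripted roughly as follows. This is a sketch, not an official helper: the function names prepare_run_script and qsub_command are hypothetical, and "one empty line at the end" is interpreted here as "the file ends with a newline". The qsub line is the one quoted above and is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: make sure the run script ends with a newline
# and is readable, writable and executable by its owner.
prepare_run_script() {
    script="$1"
    [ -f "$script" ] || { echo "missing $script" >&2; return 1; }
    # tail -c 1 gives the last byte; command substitution strips a trailing
    # newline, so a non-empty result means the file does not end with one.
    if [ -n "$(tail -c 1 "$script")" ]; then
        printf '\n' >> "$script"
    fi
    chmod u=rwx "$script"
}

# Hypothetical helper: print (do not execute) the submission command
# from the steps above for a given job name.
qsub_command() {
    echo "qsub -cwd -o qsub.out -e qsub.err -N $1 < ./run.sh"
}

# Usage: prepare run.sh if it exists here, then show the command to type.
if [ -f run.sh ]; then
    prepare_run_script run.sh
fi
qsub_command runName
```

Printing the qsub command instead of running it keeps the sketch safe to try on a machine without a queue system.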


    4) Pipeline manuals

    There is no real manual at this time (10/Jan/2017) - but Jelena is writing one, and it should be out around March 2017 !
    Some existing documentation is listed below, however !

    How to run CM5 in parallel (incomplete manual)

    These are instructions to interpret output, and better understand what the pipeline run does :
    Interpret and troubleshoot your results
    - these instructions are a little outdated for CB4 and CC4 pipelines (red and orange graphs are not explained in the above).

    These are instructions to interpret the output report counters :
    Interpret your report file

    These are instructions to run the steps of the pipeline one-by-one : they give a clear outline of which tools form the "backbone" of the pipeline
    User manual for CaptureC analysis without pipeline
    - these instructions are a little outdated for CB4 and CC4 pipelines (red and orange graphs are not explained in the above) !

    More documentation on this site :
    Interpret your results