CaptureC analysis pipeline versions

updated by Jelena - 16:40 08/Oct/2018


Which CC version should I choose? The technical details!


Version guide with less technical detail : version guide

RELEASES IN DETAIL

CM5 - CCseqBasic5-MultiThread ("Currently developed parallel CC5")
CB5 - CCseqBasic5 ("Currently developed CC5")
CS5 - CCseqBasic5 ("Stable portable CC5")
CF5 - CCseqBasic5 ("Frozen portable CC5") : major bug-fix release of the alpha releases below

CB4 alpha - CCseqBasic4 ("First portable CC4") : portability release of CC4 below
CB3 alpha - CCseqBasic3 ("First portable CC3") : portability release of CC3 below

CC4 ("Second stable")
CC3 ("First stable")
CC2 ("First beta")


Bug fixes and updates (all versions)

Current issues - known bugs and development targets !

Bug fixes and updates (old site)


CM5 - CCseqBasic5-MultiThread ("Currently developed parallel CC5") Not yet portable (intra-WIMM only). Released 08Oct2018. Current development version.

CM5 is like the CB5 rainbow pipe below, but can run the input fastqs in parallel.
It features the new "rainbow" visualisations : the whole experiment is described by 7 tracks in total, instead of 7 tracks per capture site.
The full user manual is still under construction.

New features :

  • Fully parallel (uses queue threads).
  • Uses the PIPE_fastqPaths.txt parameter file to read in MULTIPLE fastq files (no need to concatenate lanes before running the pipeline).
  • New visualisations (better suited to big designs with dozens or hundreds of capture sites).
  • Better quality control (raw mapped fragments visualisation, more illustrative description.html page).
  • Normalised tracks (to total reporter counts AND to cis-only reporter counts) for all capture sites.
  • Blat-filter generation is separated into its own run type (--onlyBlat flag), and the blats must be pre-generated before the actual production run.
  • Rainbow parallel pipe, featuring proper PIPE_fastqPaths.txt support and division of the oligo file into bunches of 100 oligos.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipeRainbow.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipeRainbow.sh -h
  • How to run Rainbow parallel pipe :

    The pipeline auto-detects whether the files listed in PIPE_fastqPaths.txt are .gz-compressed or not. The file format is like in the NGseqBasic pipe, but without the first (name) column, which is not needed. For example :

    read1.fastq.gz  read2.fastq.gz  /t1-data/user/hugheslab/telenius/developmentAndTesting/cs5test_091117/run39/FASTQ
    

    The default is 4 processors; change this with the -p flag (e.g. -p 4). If the queue looks fairly empty, you can freely request more than 4.
    Submit the run to the queue as normal; the pipeline takes care of the parallelisation itself.
    If you have a lot of fastqs and/or hundreds of capture sites, you may want to run in "wholenode mode" and submit to the wholenode queue instead.
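    A minimal Python sketch of reading PIPE_fastqPaths.txt in the format described above, assuming whitespace-separated columns (read1 file, read2 file, folder) and that .gz compression can be recognised from the file-name suffix. The names `FastqPair`, `parse_fastq_paths` and `open_fastq` are hypothetical; the pipeline's own parser, and the way it actually detects compression, may differ.

```python
import gzip
import os
from typing import IO, List, NamedTuple


class FastqPair(NamedTuple):
    """One line of PIPE_fastqPaths.txt (hypothetical representation)."""
    read1: str      # full path to the read 1 fastq
    read2: str      # full path to the read 2 fastq
    gzipped: bool   # True if both files end in .gz


def parse_fastq_paths(text: str) -> List[FastqPair]:
    """Parse PIPE_fastqPaths.txt contents: read1 read2 folder, one pair per line."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: any run of spaces/tabs separates columns
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, got {len(fields)}")
        read1, read2, folder = fields
        gz1, gz2 = read1.endswith(".gz"), read2.endswith(".gz")
        if gz1 != gz2:
            raise ValueError(f"line {line_number}: read1 and read2 must both be .gz or both plain")
        pairs.append(FastqPair(os.path.join(folder, read1), os.path.join(folder, read2), gz1))
    return pairs


def open_fastq(path: str, gzipped: bool) -> IO[str]:
    """Open a fastq for text reading, decompressing on the fly if needed."""
    return gzip.open(path, "rt") if gzipped else open(path)
```

    Detecting compression from the suffix keeps the sketch simple; checking the gzip magic bytes of each file would be a more robust alternative.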

    Output folder structure and rerun options (a draft to become the user manual)

    Example run script

    Wholenode style run instructions (a draft of the user instructions for this special run style)

    In the future, when the rainbow pipe code is fully finished, these runs will finish up automatically and also generate "overlay rainbow graphs" in the hub,
    plotting 100 oligos in one hub track (for easier visualisation).
    In the further future (2-3 months from the time of writing, 13Apr2018) this will be properly parallelised, with the fastqs and bams serving as parallelisation units.


    CB5 - CCseqBasic5 ("Currently developed CC5") Fully portable. Released 20Nov2017. Current development version.

    CB5 is like CS5 below, but it receives all the new features.

    New features :

  • Bowtie2 support
  • Normal serial pipe :

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipe.sh -h
  • Rainbow serial pipe, featuring proper PIPE_fastqPaths.txt support and division of the oligo file into bunches of 100 oligos.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipeRainbow.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB5/pipeRainbow.sh -h
  • How to run Rainbow serial pipe :

    The pipeline auto-detects whether the files listed in PIPE_fastqPaths.txt are .gz-compressed or not. The file format is like in the NGseqBasic pipe, but without the first (name) column, which is not needed. For example :

    read1.fastq.gz  read2.fastq.gz  /t1-data/user/hugheslab/telenius/developmentAndTesting/cs5test_091117/run39/FASTQ
    

    Currently the Rainbow pipe stops right after the bam division.
    The output folders should then be ready for normal CB5 --onlyCCanalyser runs (start one run per 100-oligo bunch bam).
    The oligo files matching these bam files should already be provided in the outputs.

    In the future, when the rainbow pipe code is fully finished, these runs will finish up automatically and also generate "overlay rainbow graphs" in the hub,
    plotting 100 oligos in one hub track (for easier visualisation).
    In the further future (2-3 months from the time of writing, 13Apr2018) this will be properly parallelised, with the fastqs and bams serving as parallelisation units.
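    The division of the oligo file into bunches of 100 oligos can be sketched in Python as follows. This assumes one capture site per non-blank line of the oligo file, with the capture-site name in the first tab-separated column; `split_into_bunches` and `bunch_index_by_name` are hypothetical helpers, not the pipeline's own code.

```python
from typing import Dict, List


def split_into_bunches(oligo_lines: List[str], bunch_size: int = 100) -> List[List[str]]:
    """Split oligo file lines into consecutive bunches of at most bunch_size oligos."""
    if bunch_size < 1:
        raise ValueError("bunch_size must be at least 1")
    # Blank lines are dropped so they do not count towards a bunch.
    oligos = [line for line in oligo_lines if line.strip()]
    return [oligos[start:start + bunch_size] for start in range(0, len(oligos), bunch_size)]


def bunch_index_by_name(bunches: List[List[str]]) -> Dict[str, int]:
    """Map each capture-site name (assumed first tab-separated column) to its bunch number."""
    return {
        line.split("\t")[0]: index
        for index, bunch in enumerate(bunches)
        for line in bunch
    }
```

    With this layout, 250 capture sites give three bunches (100 + 100 + 50), i.e. three bunch bams and three --onlyCCanalyser runs.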


    CS5 - CCseqBasic5 ("Stable portable CC5") Fully portable. Released 17Nov2017. Only receiving minor updates and bug fixes.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CS5/pipe.sh -h
  • CS5 is like CF5 below.

    Introducing a "third way to filter non-flashed reads" :
    CB3a is too lenient with duplicates (of non-flashable reads) and CB4a is too strict, so CB5 is "in between" (the Goldilocks option).

    In CS5 all the pipes are merged into a single run command :
    the user selects the duplicate filtering mode with --CCversion CS3, --CCversion CS4 or --CCversion CS5.

    Duplicate filtering bug fix (the bug apparently has not caused any real damage, so this fix is smaller than it seems) :

  • In previous versions (CC3/4, CB3/4a) duplicate filtering was strand-specific. It now reverts to the CC2 style (non-strand-specific).
  • The CC3/4, CB3/4a "wrong" duplicate filtering can still be requested with a flag (backwards compatibility).
  • Because of this, when comparing CS5 and CB4a/CB3a plots, we expect the CS5 signal to be lower and less noisy throughout.
  • To reproduce the analysis exactly as it would be in CB4a/CB3a (to compare to earlier samples), use the backwards-compatibility flags :
    To reproduce CB3a run : --strandSpecificDuplicates --CCversion CS3
    To reproduce CB4a run : --strandSpecificDuplicates --CCversion CS4
  • New features :

  • UMI support with the --UMI flag (the default is a non-UMI run). For details on setting up a UMI run, contact Damien Downes.
  • Wobbly-end duplicate filter with the --wobblyEndBinWidth flag : duplicate filtering that tolerates non-exact read ends.
  • To turn wobbly ends off, set this to 1 (--wobblyEndBinWidth 1); this is the default.
    If using --UMI, --wobblyEndBinWidth 20 is recommended.

    For example, --wobblyEndBinWidth 20 means a 20-base bin for the duplicate filter :
    reads are duplicates if all their fragment coordinates are the same +/- 10 bases (and, if --UMI is used, their UMIs are the same).
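    The duplicate filtering options above (strand-specific or not, UMI, wobbly ends) can be sketched as building one duplicate key per read and keeping only the first read seen for each key. This is a minimal Python sketch under assumptions: a read is reduced to a chromosome, a tuple of fragment coordinates, a strand and an optional UMI (the `Read` type and the function names are hypothetical), and wobbly ends are modelled as fixed-width bins (coordinate integer-divided by the bin width). Fixed bins only approximate the "+/- 10 bases" rule, since two coordinates 1 base apart can fall on either side of a bin edge; the pipeline's own implementation may bin differently. Here wobbly_end_bin_width stands in for --wobblyEndBinWidth, use_umi for --UMI and strand_specific for --strandSpecificDuplicates.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    """Hypothetical, simplified view of a mapped read for duplicate filtering."""
    chrom: str
    coords: Tuple[int, ...]   # all fragment start/end coordinates of the read
    strand: str               # "+" or "-"
    umi: Optional[str] = None


def duplicate_key(read: Read, wobbly_end_bin_width: int = 1,
                  strand_specific: bool = False, use_umi: bool = False) -> tuple:
    """Reads with equal keys are treated as duplicates of each other."""
    if wobbly_end_bin_width < 1:
        raise ValueError("wobbly_end_bin_width must be at least 1")
    # A bin width of 1 leaves coordinates unchanged, i.e. exact-end matching.
    binned = tuple(coord // wobbly_end_bin_width for coord in read.coords)
    key: tuple = (read.chrom, binned)
    if strand_specific:   # old CC3/4, CB3/4a behaviour
        key += (read.strand,)
    if use_umi:
        key += (read.umi,)
    return key


def filter_duplicates(reads: Iterable[Read], **options) -> List[Read]:
    """Keep the first read for each duplicate key, in input order."""
    seen = set()
    kept = []
    for read in reads:
        key = duplicate_key(read, **options)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

    Note how adding the strand to the key (strand_specific=True) makes fewer reads collapse as duplicates, which matches the expectation above that the non-strand-specific CS5/CF5 filtering gives a lower signal.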


    CF5 - CCseqBasic5 ("Frozen portable CC5") Fully portable. Frozen in development to 16Nov2017. Only receiving bug fixes.

    Duplicate filtering bug fix (the bug apparently has not caused any real damage, so this fix is smaller than it seems) :

  • In previous versions (CC3/4, CB3/4a) duplicate filtering was strand-specific. It now reverts to the CC2 style (non-strand-specific).
  • The CC3/4, CB3/4a "wrong" duplicate filtering can still be requested with a flag (backwards compatibility).
  • Because of this, when comparing CF5 and CB4a/CB3a plots, we expect the CF5 signal to be lower and less noisy throughout.
  • Introducing a "third way to filter non-flashed reads" :
    CB3a is too lenient with duplicates (of non-flashable reads) and CB4a is too strict, so CB5 is "in between" (the Goldilocks option).

    In CF5 all the pipes are merged into a single run command :
    the user selects the duplicate filtering mode with --CCversion CF3, --CCversion CF4 or --CCversion CF5.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CF5/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CF5/pipe.sh -h

  • "Frozen in time" releases - stable development end points receiving only bug fixes :


  • Like the CC3 and CC4 pipes (below), but the main runner code is now called analyseMappedReads.pl instead of CCanalyser[2/3/4].pl.
  • Contains mostly portability fixes : this is the first version detached from the WIMM environment, and thus the first that external users can set up.
  • Bug fixes and updates (after release)

    Current issues - known bugs and development targets !

    Bug fixes and updates (old site)

    CB4 alpha - CCseqBasic4 ("First portable CC4") Fully portable. Frozen in development to 16Nov2017. Only receiving bug fixes.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB4a/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB4a/pipe.sh -h

    CB3 alpha - CCseqBasic3 ("First portable CC3") Fully portable. Frozen in development to 16Nov2017. Only receiving bug fixes.

  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB3a/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CCseqBasic/CB3a/pipe.sh -h

    Previous pipeline releases :


    Pipe2 ("Second pipe") release :

  • This pipeline runs the CaptureC analysis from fastqs to a data hub, and filters ploidy regions and homology regions (blat filter).
  • /t1-data/data/hugheslab/jelenatools/CC/CC4/Pipe2/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC4/Pipe2/pipe.sh -h
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe2/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe2/pipe.sh -h

    Pipe1 ("First pipe") release :

  • This pipeline runs the CaptureC analysis from fastqs to a data hub. It does not filter ploidy regions or homology regions (blat filter).
  • If you ran your data with this release but want to add the ploidy and blat filters, ask Jelena whether she has figured out how to do this without rerunning all the way from the fastqs with Pipe2 (above); it should be reasonably easy.
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe1/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe1/pipe.sh -h
  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe1/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe1/pipe.sh -h
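    The difference between Pipe1 and Pipe2 is whether reads in ploidy and blat (homology) regions are filtered out. A minimal Python sketch of such a region filter, assuming BED-style half-open (chrom, start, end) intervals and that a read is removed if any part of it overlaps an excluded region; `RegionFilter` and `remove_reads_in_regions` are hypothetical, and the pipeline's exact rule (per read or per fragment, overlap or containment) may differ. Overlapping regions are merged first so that each lookup is a single binary search.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


class RegionFilter:
    """Excluded regions (e.g. ploidy or blat regions), merged and sorted per chromosome."""

    def __init__(self, regions: Iterable[Interval]):
        spans_by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in regions:
            spans_by_chrom[chrom].append((start, end))
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, spans in spans_by_chrom.items():
            merged: List[List[int]] = []
            for start, end in sorted(spans):
                if merged and start <= merged[-1][1]:
                    # Overlapping or touching the previous region: extend it.
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            self._starts[chrom] = [start for start, _ in merged]
            self._ends[chrom] = [end for _, end in merged]

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        """True if [start, end) overlaps any excluded region on chrom."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Last merged region that starts before `end`; regions are disjoint and
        # sorted, so it also has the largest end among the candidates.
        i = bisect.bisect_left(starts, end) - 1
        return i >= 0 and self._ends[chrom][i] > start


def remove_reads_in_regions(reads: Iterable[Interval], region_filter: RegionFilter) -> List[Interval]:
    """Keep only reads that do not overlap any excluded region."""
    return [read for read in reads if not region_filter.overlaps(*read)]
```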

    Previous CCanalyser.pl "core code" releases :


    CC4 ("Second stable")

  • The Red-Orange-Green graph is improved to show : duplicates (red), blat and ploidy regions (orange), filtered (green).
  • Duplicate filtering bug fixed (in the CC3 release some duplicate reads still leaked into the reported reads) : "short reads" and unflashable reads should no longer cause problems.
  • The blat filter uses more reasonable default values (and so filters better). Custom flags are also given to the user to fine-tune the blat filtering (use them with --onlyCCanalyser while fine-tuning, if you want a shorter run time).
  • The Pipe2 ("Second pipe") wrapper provides automatic ploidy and blat filtering for the data! (Ploidy filtering can be turned OFF with the --noPloidyFilter flag.)
  • /t1-data/data/hugheslab/jelenatools/CC/CC4/Pipe2/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC4/Pipe2/pipe.sh -h
  • Bug fixes and updates (after release)

    Current issues - known bugs and development targets !

    CC3 ("First stable")

  • User-friendly version of the CC2 beta release.
  • Does the same things as CC2 beta, but crashes less and writes better output log files.
  • The Pipe2 ("Second pipe") wrapper provides automatic ploidy and blat filtering for the data! (Ploidy filtering can be turned OFF with the --noPloidyFilter flag.)
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe2/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC3/Pipe2/pipe.sh -h
  • You can run the CC3 version with the same run instructions as the current CC4 release (above).

    Bug fixes and updates (after release)

    Current issues - known bugs and development targets !

    CC2 ("First beta") release :

  • Release to accompany the publication of the method as a scientific paper.
  • A collection of the code : a not-so-user-friendly but already runnable version, distributed with the supplements of the paper.

  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe2/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe2/pipe.sh -h
  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe1/pipe.sh --help
  • /t1-data/data/hugheslab/jelenatools/CC/CC2/Pipe1/pipe.sh -h
  • You can run the CC2 version with the same run instructions as the current CC4 release (above).

    CC2 issues - known bugs and development targets !

    Bug fixes and improvements in the CC2 release