Statistics (compare inactive and active tissue) for CB4/CC4/CF*/CS*/Cb* pipeline output

updated by Jelena - 15:10 17/Nov/2017

PREPARE THE FOLDER STRUCTURE FOR THE RUN

  1. Put all your CB4/CC4/CF*/CS*/Cb* pipeline runs into the same folder (if they aren't already)
  2. Check that your folder names differentiate between 'condition' and 'control'
  3. Check that you have CB4/CC4/CF*/CS*/Cb* output files for all your OLIGOS of interest
COLLECT YOUR RUN PARAMETERS
  1. Set up your --folders, --oligos and --path parameters
  2. Set up the rest of your parameters
  3. Collect the parameters into a run command
RUN THE COMMAND
  1. Run the command you generated
  2. Check the results
TIPS AND HELP TO MAKE THE RUN COMMAND


Bug fixes and updates (after release)

Bug reports (not yet fixed)




1) Put all your CB4/CC4/CF*/CS*/Cb* samples into the same place - under a common TOP FOLDER

To run the statistics analysis you need 6 CB4/CC4/CF*/CS*/Cb* pipeline runs (each containing folders F1 to F7) : 3 replicates of your 'condition' and 3 replicates of your 'control'.
Place all 6 of these folders under the same TOP FOLDER. You can move them there with the mv command.

How to move folders ???
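
A minimal sketch of moving six run folders under one TOP FOLDER with mv. The starting locations (old_place_A, old_place_B) are made up for illustration, and everything happens inside a temporary sandbox folder so it is safe to try. In real use you would run only the mkdir and mv lines, with your own paths.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox for the demo only; in real use, start from where your runs are.
work=$(mktemp -d)
cd "$work"

# Hypothetical starting point: six run folders scattered in two places.
mkdir -p old_place_A/Analysis_INSERT_EB5 old_place_A/Analysis_INSERT_EB6 old_place_A/Analysis_INSERT_EB7
mkdir -p old_place_B/Analysis_WT_Sp5 old_place_B/Analysis_WT_Sp6 old_place_B/Analysis_WT_Sp15

# Create the common TOP FOLDER (no error if it already exists).
mkdir -p Promoter_capture

# Move every run folder under the TOP FOLDER.
mv old_place_A/Analysis_INSERT_* Promoter_capture/
mv old_place_B/Analysis_WT_* Promoter_capture/

# All six run folders should now sit side by side.
ls -1 Promoter_capture
```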


EXAMPLE FOLDER - any output data from six runs (six samples = 3 'condition' + 3 'control' ) with CB4/CC4/CF*/CS*/Cb* pipeline


All the output data is in the same folder,
located in /t1-data/usr/....../Promoter_capture

Promoter_capture
   |
   | CONDITION-samples (3 replicates) - this condition is called INSERT
   |
   |-- Analysis_INSERT_EB5
   |   `-- F6_greenGraphs_combined_..._CC4
   |        |-INSERT_EB5_Mitof.gff
   |        |-INSERT_EB5_SOX2.gff
   |        `-INSERT_EB5_nanog.gff
   |
   |-- Analysis_INSERT_EB6
   |   `-- F6_greenGraphs_combined_..._CC4
   |-- Analysis_INSERT_EB7
   |   `-- F6_greenGraphs_combined_..._CC4
   |
   | CONTROL-samples (3 replicates) - this control is called WT
   |
   |-- Analysis_WT_Sp5
   |   `-- F6_greenGraphs_combined_..._CC4
   |-- Analysis_WT_Sp6
   |   `-- F6_greenGraphs_combined_..._CC4
   `-- Analysis_WT_Sp15
       `-- F6_greenGraphs_combined_..._CC4




2) Check that your folder names differentiate between 'condition' and 'control'

The condition samples (above) have the word INSERT in the folder name,
and the control samples have the word WT in the folder name.

You can use any names you want for your condition and control folders, as long as 3 folders carry the condition name and 3 folders carry the control name.

If your folder structure does not look like that, change the folder names :

How to change folder names ???
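
Renaming a folder is also done with mv (mv OLD_NAME NEW_NAME). The sketch below assumes three folders called EB5, EB6 and EB7 that should carry the condition word INSERT; those starting names are hypothetical, and the demo runs inside a temporary sandbox folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox for the demo only.
work=$(mktemp -d)
cd "$work"

# Hypothetical run folders whose names do not show the condition.
mkdir -p EB5 EB6 EB7

# Rename each folder so that its name contains the condition word INSERT.
for old in EB5 EB6 EB7; do
    mv "$old" "Analysis_INSERT_${old}"
done

# Shows Analysis_INSERT_EB5, Analysis_INSERT_EB6, Analysis_INSERT_EB7
ls -1
```

Repeat the same loop with the control word (WT in the example above) for the control folders.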




3) Check that you have CB4/CC4/CF*/CS*/Cb* output files for all your OLIGOS of interest

Go to your top folder ( folder /t1-data/usr/....../Promoter_capture in the above example ),
and run the tester command :

/t1-data/data/hugheslab/jelenatools/CC/statistics/oligolister.sh

You should see each of your OLIGOs once for each of your 6 samples !
Also check the file sizes ! - you may have an empty file (size 0)



Example output :

Analysis_INSERT_EB5

File size       OLIGOfile name
539K    COMBINED_CC4_nprl3.gff
346K    COMBINED_CC4_mpg.gff

Analysis_INSERT_EB6

File size       OLIGOfile name
846K    COMBINED_CC4_nprl3.gff
607K    COMBINED_CC4_mpg.gff

... et cetera ...

Analysis_WT_Sp15

File size       OLIGOfile name
482K    COMBINED_CC4_nprl3.gff
334K    COMBINED_CC4_mpg.gff


If you have VERY MANY folders in there (more than the 6 samples for this statistics run),
you can add to the command which "folder series" you want to list :

/t1-data/data/hugheslab/jelenatools/CC/statistics/oligolister.sh INSERT
/t1-data/data/hugheslab/jelenatools/CC/statistics/oligolister.sh WT
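
The empty-file check mentioned above can also be done with plain find, independently of oligolister.sh. This is only a sketch: it assumes the oligo files end in .gff (as in the example output) and builds two tiny made-up sample folders in a temporary sandbox, one of them holding an empty file. In real use, run just the find line in your top folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox with invented demo data; F6_demo is a stand-in folder name.
work=$(mktemp -d)
cd "$work"
mkdir -p Analysis_INSERT_EB5/F6_demo Analysis_WT_Sp5/F6_demo
printf 'demo line\n' > Analysis_INSERT_EB5/F6_demo/COMBINED_CC4_nprl3.gff
: > Analysis_WT_Sp5/F6_demo/COMBINED_CC4_nprl3.gff   # empty file (size 0)

# Find every .gff file of size 0 below the current folder.
empty_files=$(find . -type f -name '*.gff' -size 0)

if [ -n "$empty_files" ]; then
    echo "Empty oligo files found :"
    echo "$empty_files"
else
    echo "No empty .gff files."
fi
```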


4) Build the run command

Example run command :
    
/t1-data/data/hugheslab/jelenatools/CC/statistics/statisticsRunner.sh 
    --ccversion CC4 ( or CB4/CC4/CF*/CS*/Cb* )
    --genome mm9
    --name outputFolder_statisticsAnalysis
    --pf /public/username/WT_INSERT_analysis
    --condition INSERT 
    --control WT
    --path /t1-data/usr/....../Promoter_capture
    --folders Analysis_INSERT_EB5,Analysis_INSERT_EB6,Analysis_INSERT_EB7,Analysis_WT_Sp5,Analysis_WT_Sp6,Analysis_WT_Sp15
    --oligos nprl3,mpg

RUN SCRIPT : Here is all of the above as a run script, if you prefer !

Tips and helpers for building your run command are below (after the run instructions) !

Run instructions :

  • Save the command (all written on a single line) as run.sh.
  • Make sure that there is at least ONE EMPTY LINE at the end of this file (one or more "enters" at the end)
  • Change user permissions for the file : chmod u=rwx run.sh
  • Then submit the run by typing : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh
  • You can see if the job is running by typing : qstat | grep yourUsername
  • The data you get will be stored in this very folder, and you can find the location of HubAddresses.txt by typing : tail qsub.out
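
The first three run instructions above can be sketched as a small helper that writes run.sh for you. It is an illustration, not part of the pipeline: it keeps the example values from the run command (including the elided --path, which you must replace with your own), assumes that no parameter value contains spaces, and writes into a temporary demo folder. The qsub line is only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo run folder; in real use, cd to your own run folder instead.
work=$(mktemp -d)
cd "$work"

# The example run command, one argument per array element.
# The --path value is the elided example path : replace it with your own.
cmd=(
    /t1-data/data/hugheslab/jelenatools/CC/statistics/statisticsRunner.sh
    --ccversion CC4
    --genome mm9
    --name outputFolder_statisticsAnalysis
    --pf /public/username/WT_INSERT_analysis
    --condition INSERT
    --control WT
    --path /t1-data/usr/....../Promoter_capture
    --folders Analysis_INSERT_EB5,Analysis_INSERT_EB6,Analysis_INSERT_EB7,Analysis_WT_Sp5,Analysis_WT_Sp6,Analysis_WT_Sp15
    --oligos nprl3,mpg
)

# Join everything onto a single line, followed by one empty line.
printf '%s\n\n' "${cmd[*]}" > run.sh

# Give yourself read, write and execute permission.
chmod u=rwx run.sh

# Show how to submit; nothing is submitted by this sketch.
echo "Submit with : qsub -cwd -o qsub.out -e qsub.err -N runName < ./run.sh"
```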


5) Tips and helpers to build your run command

    Some useful tips :

    You don't need to list all your oligos. If you have many, list only the ones you actually WANT to analyse.

    The folders don't need to be in any particular order - just check that you listed all 6 of them.

    The public folder does not need to exist (it is generated during the run).

    The output folder given with --name is generated in the run folder.

    Below are some helper commands to make the --folders, --oligos and --path parameters.


    All genes in the correct format :

    Go to your top folder ( folder /t1-data/usr/....../Promoter_capture in the above example ),
    and run the oligolist generator command :
    /t1-data/data/hugheslab/jelenatools/CC/statistics/oligolistGenerator.sh
    
    Example output :
    --oligos mpg,nprl3
    

    You can add to the command which "folder series" you want to list :

    /t1-data/data/hugheslab/jelenatools/CC/statistics/oligolistGenerator.sh INSERT
    
    NOTE !!
    If you combined your globins, the combined names will not show in the above list !
    
    To use combined globins :
    
    --oligos mpg,nprl3,HbaCombined,HbbCombined
    

    All folders in the correct format :

    Go to your top folder ( folder /t1-data/usr/....../Promoter_capture in the above example ),
    and run the folderlist generator command :
    /t1-data/data/hugheslab/jelenatools/CC/statistics/folderlistGenerator.sh
    
    Example output :
    --folders Analysis_INSERT_EB5,Analysis_INSERT_EB6,Analysis_INSERT_EB7,Analysis_WT_Sp5,Analysis_WT_Sp6,Analysis_WT_Sp15
    

    You can add to the command which 2 "folder series" you want to list :

    /t1-data/data/hugheslab/jelenatools/CC/statistics/folderlistGenerator.sh INSERT WT
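
Before running, it can be worth double-checking that a --folders value really lists 6 folders, 3 per group. The sketch below is a stand-alone check, not part of the pipeline. It assumes that a folder belongs to a group when its name contains the --condition or --control word; how statisticsRunner.sh itself matches the names is not described here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values taken from the example run command.
folders="Analysis_INSERT_EB5,Analysis_INSERT_EB6,Analysis_INSERT_EB7,Analysis_WT_Sp5,Analysis_WT_Sp6,Analysis_WT_Sp15"
condition="INSERT"
control="WT"

# Split the comma-separated list into one folder name per line.
list=$(printf '%s\n' "$folders" | tr ',' '\n')

# Count all folders, and those whose name contains each group word.
total=$(printf '%s\n' "$list" | grep -c . || true)
n_condition=$(printf '%s\n' "$list" | grep -c -F -- "$condition" || true)
n_control=$(printf '%s\n' "$list" | grep -c -F -- "$control" || true)

echo "total=$total condition=$n_condition control=$n_control"

if [ "$total" -eq 6 ] && [ "$n_condition" -eq 3 ] && [ "$n_control" -eq 3 ]; then
    echo "Folder list looks complete."
else
    echo "Check your --folders list." >&2
fi
```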
    

    How to print out the folder path ?

    Go to your top folder ( folder /t1-data/usr/....../Promoter_capture in the above example ),
    and run the 'full path of current directory' command :
    pwd
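
As a small convenience, the pwd output can be printed directly in the form of the --path parameter. This is just a sketch; the variable name path_param is made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the --path parameter from the current folder.
path_param="--path $(pwd)"

# Print it, ready to paste into the run command.
printf '%s\n' "$path_param"
```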
    


