Updated by Jelena 15:00 01/Jan/2017

Send all questions and reports of weird behavior to jelena__telenius__at__gmail__com
( and if needed, she will forward them to James )

-------------------------------------------------------

A) CCanalyser script
B) Shell wrapper for the whole pipeline

-------------------------------------------------------

A) CCanalyser3.pl script

BUGS / NON-STANDARD BEHAVIOR :

- Correct read counts are only in _capture_reads_CC3.sam .
  The reads are marked with their type like this (in column 15 of the sam file - a comment field) :

  CO:Z:Hba-1_CAP
  CO:Z:Hba-1_EXC
  CO:Z:Hba-1_REP
  CO:Z:Hba-1_REPDUP

- Other read counts (capture-wise separated sam files and gff files) still contain
  within-same-read reporter duplicates (or other "leaking reads").
  It remains to be investigated which reads leak, and how to fix this.
  The number of leaking reads is small (it does not affect analysis results).

INSTABILITY ISSUES :

- Still sometimes produces an unexpected "empty output" without error messages
  when CCanalyser is given incorrect input.

DEVELOPMENT TARGETS :

1) Oligo-file auto-generator
2) Dividing the output logs into 4 files : runtime log, runtime errors, run results statistics (compact), run results statistics (very detailed)
3) Providing a test data set with its output files - for new users to test that they get the same results
4) Adding facets from log files to the MIG output (counter values which are now only exported to logs/debugging files)
5) Ensuring SNP runs behave the same way as non-SNP runs (consistency in the duplicate capture reporting)
6) Better user manual - adding the interpretation of the data hub tracks and data to the manual.

-------------------------------------------------------

B) Shell wrapper for the whole pipeline

BUGS / NON-STANDARD BEHAVIOR :

- The --qmin parameter does not work (the q-score is always set to 20).

INSTABILITY ISSUES :

- Does not kill the script properly when any of the underlying scripts fail.
  It only checks for output files, and not even that for the last script (CCanalyser).

DEVELOPMENT TARGETS (only Pipe2 is developed) :

1) Better crashing behavior - crash when any of the perl scripts crashes.
2) Input format integrity testing - not allowing any of the files to go on to the next steps if they are not in the proper format
3) Deleting files when they are no longer needed (currently all files from the FLASHing step onwards are saved)
4) Better red-green graph (non-duplicate-filtered reads as red graph)
5) Organising output folders for easy input to FourCSeq, PeakC, r3Cseq (possibly generating pre-input files for these in the output)
6) Keeping track of who is running the pipeline and whether the run crashed
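Since the correct read counts currently live only in _capture_reads_CC3.sam (section A), a minimal Python sketch for tallying the read-type tags in that file may be handy. It assumes every tag has the form CO:Z:<capture>_<TYPE> shown in section A, and it searches all optional SAM fields instead of hard-coding column 15. The function name count_read_types is hypothetical and not part of the pipeline. Note that it counts SAM records (lines), not unique read names.

```python
from collections import Counter
from typing import Iterable

# Read types as listed in section A; any other suffix is reported as "OTHER".
KNOWN_TYPES = {"CAP", "EXC", "REP", "REPDUP"}


def count_read_types(sam_lines: Iterable[str]) -> Counter:
    """Tally SAM records by (capture name, read type) using the CO:Z: comment tag.

    Assumes tags like CO:Z:Hba-1_REP, i.e. CO:Z:<capture>_<TYPE>.
    Counts alignment records (lines), not unique read names.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\r\n").split("\t")
        # Optional fields start at column 12 (index 11); scan them all
        # rather than relying on the tag always being in column 15.
        for field in fields[11:]:
            if field.startswith("CO:Z:"):
                # Split on the LAST underscore, so the type is the final token.
                capture, _, read_type = field[len("CO:Z:"):].rpartition("_")
                if read_type not in KNOWN_TYPES:
                    read_type = "OTHER"
                counts[(capture, read_type)] += 1
                break  # only the first CO:Z: tag of a record is counted
    return counts
```

For example, `with open("sample_capture_reads_CC3.sam") as fh: print(count_read_types(fh))`, where the file name is only a placeholder. Splitting on the last underscore keeps capture names that contain hyphens (such as Hba-1) intact.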