Tuesday, 14 July 2009

Chimp Repeats

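Uploaded the Chimp repeat data to my $WORK directory. Now I have to alter the experimental report files for easier parsing: whitespace inside fields becomes underscores and parentheses are stripped, while the tab delimiters are kept: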


cd /lustre/work1/sanger/io1/2009-07-08_chimp_repeats
perl -i -plne 's/\t/,/g; s/\s+/_/g; s/[\(\)]//g; s/,/\t/g' OID2130?/experimental_report.txt


Now set up the jobs to parse the data:


perl -E 'say "ng42m_parser.pl -o sample_$_.txt -- $_" for @ARGV' OID2130? > parse_data_commands.txt
bsub -o load_repeats.%J.out -J 'load_repeats[1-3]%3' -q basement -R "select[mem>=3000] rusage[mem=3000]" -M3000000 'submit_job_array parse_data_commands.txt'
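
For reference, a rough sketch of how one array element could pick its command from the file. This is NOT the real submit_job_array, which is a local script not shown here; `run_indexed_command` is a hypothetical stand-in that assumes the file holds one command per line and that LSF has set LSB_JOBINDEX for the element.

```shell
# Hypothetical stand-in for submit_job_array: run the line of a command file
# whose line number matches this array element's index.
run_indexed_command() {
  commands_file=$1
  # LSF sets LSB_JOBINDEX for each array element; allow an explicit override.
  index=${2:-${LSB_JOBINDEX:-}}
  [ -n "$index" ] || { echo "no job index given" >&2; return 1; }
  command_line=$(sed -n "${index}p" "$commands_file")
  [ -n "$command_line" ] || { echo "no command on line $index" >&2; return 1; }
  sh -c "$command_line"
}

# Try it on a throwaway command file (echo instead of the real parser).
demo_file=$(mktemp)
printf 'echo parse A\necho parse B\necho parse C\n' > "$demo_file"
second=$(run_indexed_command "$demo_file" 2)
rm -f "$demo_file"
printf '%s\n' "$second"
```

With three lines in the command file, the `[1-3]` range in the bsub call gives one array element per line.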


For future reference we might want to know the order ID to sample name mapping:


cut -f 1,12 OID2130?/experimental_report.txt | sort | uniq | grep -v 'ORDER_ID' > order_id_sample_mapping.txt



The Processed_data_files for orders 2130[78] were not gzipped, so they had to be compressed before proceeding:


bsub -Ip 'gzip OID2130[78]/Processed_data_files/*.txt'
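
A quick way to spot this sort of thing up front might be to list any plain .txt files still sitting under a Processed_data_files directory. A minimal sketch, run here against a throwaway directory tree; `list_ungzipped` and the OID_A / OID_B names are made up for the illustration.

```shell
# List plain .txt files under any Processed_data_files directory below $1.
list_ungzipped() {
  find "$1" -type f -path '*/Processed_data_files/*' -name '*.txt' | sort
}

# Throwaway tree: one order already gzipped, one not.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/OID_A/Processed_data_files" "$demo_root/OID_B/Processed_data_files"
: > "$demo_root/OID_A/Processed_data_files/a.txt.gz"
: > "$demo_root/OID_B/Processed_data_files/b.txt"

found=$(list_ungzipped "$demo_root")
printf '%s\n' "$found"
rm -rf "$demo_root"
```

In the real working directory the call would be `list_ungzipped .`, and an empty result would mean everything is already compressed.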

Thursday, 2 July 2009

Flip a matrix on its side

This is surprisingly (well not really) simple:


m <- matrix(1:6, nrow = 2)  # example values: a 2 x 3 matrix
m <- t(m)                   # transposed: now 3 x 2


"t" for transpose

Sam suggests plotmath -- useful for mathematical expressions.

Wednesday, 1 July 2009

mouse CGH

I will try to do the mouse CGH analysis the same way we did the 42 million analysis, so I will be using BigDB to store the data. Then I will attempt to:

* quantile normalize
* median normalize
* GC normalize and
* wave normalize