Tuesday, 14 July 2009

Chimp Repeats

Uploaded the Chimp repeat data to my $WORK directory. Now I have to alter the experiment report files for easier parsing:


cd /lustre/work1/sanger/io1/2009-07-08_chimp_repeats
perl -i -plne 's/\t/,/g; s/\s+/_/g; s/[\(\)]//g; s/,/\t/g' OID2130?/experimental_report.txt


Now set up the jobs to parse the data:


perl -E 'say "ng42m_parser.pl -o sample_$_.txt -- $_" for @ARGV' OID2130? > parse_data_commands.txt
bsub -o load_repeats.%J.out -J 'load_repeats[1-3]%3' -q basement -R "select[mem>=3000] rusage[mem=3000]" -M3000000 'submit_job_array parse_data_commands.txt'
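
I am assuming here that submit_job_array simply runs the line of the commands file whose number matches the LSF job array index (LSF exports this as LSB_JOBINDEX). That is a guess about an in-house script, not a description of it. Under that assumption, a minimal stand-in could look like this, with nth_command as a hypothetical helper and the two echo commands invented for the demo:

```shell
# Hypothetical helper: print line N (second argument) of a commands file.
nth_command() {
    sed -n "${2}p" "$1"
}

# Invented commands file for the demo.
cmds=$(mktemp)
printf 'echo first\necho second\n' > "$cmds"

# Inside a job array element LSF sets LSB_JOBINDEX to 1, 2, 3, ...
# Fall back to 2 here so the sketch also runs outside LSF.
nth_command "$cmds" "${LSB_JOBINDEX:-2}"

rm -f "$cmds"
```

If the wrapper does work this way, the upper bound of the array range given to -J should match the output of wc -l parse_data_commands.txt, otherwise some commands never run.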


For future reference we might want to know the order ID to sample name mapping:


cut -f 1,12 OID2130?/experimental_report.txt | sort | uniq | grep -v 'ORDER_ID' > order_id_sample_mapping.txt
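
A quick way to convince myself the mapping command does what I want, using invented rows. The only thing taken from the real reports is that column 1 holds the order ID and column 12 the sample name. The file contents and the helper name order_sample_mapping are made up, and sort -u stands in for sort | uniq.

```shell
# Hypothetical helper: unique (order ID, sample name) pairs from report files.
order_sample_mapping() {
    cut -f 1,12 "$@" | sort -u | grep -v 'ORDER_ID'
}

# Invented 12-column report: a header plus a duplicated data row.
report=$(mktemp)
printf 'ORDER_ID\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tSAMPLE_NAME\n' > "$report"
printf '21301\t.\t.\t.\t.\t.\t.\t.\t.\t.\t.\tchimp_A\n' >> "$report"
printf '21301\t.\t.\t.\t.\t.\t.\t.\t.\t.\t.\tchimp_A\n' >> "$report"

# The header is filtered out and the duplicate collapses to a single line.
order_sample_mapping "$report"
rm -f "$report"
```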



The Processed_data_files for orders 2130[78] were not gzipped, so this had to be fixed before proceeding:


bsub -Ip 'gzip OID2130[78]/Processed_data_files/*.txt'

Thursday, 2 July 2009

Flip a matrix on its side

This is surprisingly (well not really) simple:


m <- matrix(...)
m <- t(m)


"t" for transpose

Sam suggests plotmath -- useful for mathematical expressions.

Wednesday, 1 July 2009

mouse CGH

Will be trying to do the mouse CGH analysis the same way we did the 42 million analysis. So I will be using BigDB to store the data. Then will attempt to:

* quantile normalize
* median normalize
* GC normalize and
* wave normalize

Thursday, 21 May 2009

The CNVRs for LD between intensities & SNPs

The files used were:

% ll /nfs/team29/io1/data/ldapp/???_CNVs_with_freq_gt_2.txt

 2345 /nfs/team29/io1/data/ldapp/CEU_CNVs_with_freq_gt_2.txt
 2958 /nfs/team29/io1/data/ldapp/YRI_CNVs_with_freq_gt_2.txt




LD Analysis Update

Plugged the refactored Rsquare Diff code (should name this R2D) into the LD Analysis pipeline. Am getting results but they look a bit strange. A brief look shows that, for a sample CNV, I no longer get the same results I had before. Need to verify this against Matt's example.

Tuesday, 19 May 2009

Trying again ...

Now trying out the corrections again using HighestLD v1.091390 as follows:


perl -IHighestLD-1.091390/lib highest_ld_v0.0.3.pl --configfile config.yml --input_file Affy6_22.txt.linear.txt --output_file Affy6_22.R2D --error_file Affy6_22_ERR.yml

Found the bug!

Turns out I was just being stupid (as usual)! In my attempts to make my code saner I switched the default from TSV to CSV files. However, I forgot to set the format of my CNV resource file to CSV as well. Hence no values were found for my CNVs, which produced the undefined value bug.
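
A cheap guard against this kind of mismatch would be to look at the header line of each input before running anything. The sketch below is not part of HighestLD: guess_delimiter is a hypothetical helper that just counts tabs and commas in the first line, and the sample file is invented.

```shell
# Hypothetical helper: guess whether a file is tab- or comma-separated
# by counting both characters in its first line.
guess_delimiter() {
    header=$(head -n 1 "$1")
    tabs=$(( $(printf '%s' "$header" | tr -cd '\t' | wc -c) ))
    commas=$(( $(printf '%s' "$header" | tr -cd ',' | wc -c) ))
    if [ "$tabs" -gt "$commas" ]; then
        echo tsv
    elif [ "$commas" -gt "$tabs" ]; then
        echo csv
    else
        echo unknown
    fi
}

# Invented sample file with a comma-separated header.
sample=$(mktemp)
printf 'cnv_id,chr,start,end\n' > "$sample"
guess_delimiter "$sample"
rm -f "$sample"
```

Comparing this guess with the format named in the config for every input file would have flagged the resource file straight away. It is only a heuristic: a TSV file with many commas inside its header fields would fool it.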

Monday, 18 May 2009

Still need to find bug in LD analysis

I have yet to figure out why my refactored HighestLD.pm does not work as it should. As I am feeling a bit switched off now, I will attack this when I get home and first thing tomorrow.

Results of the highest LD Analysis

The results of the LD Analysis on the intensity data are currently here:


/lustre/scratch1/sanger/io1/2009-05-06_LD_agillent-CNVs_HapMap-PhaseII-SNPs/CEU-intensity/results.csv
/lustre/scratch1/sanger/io1/2009-05-06_LD_agillent-CNVs_HapMap-PhaseII-SNPs/YRI-intensity/results.csv


The results for the genotype data are here:


/lustre/scratch1/sanger/io1/LD_Analysis/2009-05-07_rerun/ceu/genotype/results.csv
/lustre/scratch1/sanger/io1/LD_Analysis/2009-05-07_rerun/yri/genotype/results.csv

Wednesday, 18 February 2009

Installing Pretty Emacs

I found the source files here, chose the most recent update and built it as follows:


./configure --prefix=/software/cnpoly/emacs/snapshot --with-gif=no --with-tiff=no
make bootstrap


Decided not to install it just yet because the fonts look the same as in the standard Emacs distribution.