haotu : an open lab notebook

2012/03/28

R arrows function like lines function

Filed under: Uncategorized — S @ 01:40

I want an arrows function that takes the same input as lines so that I can connect sequential points on a plot.

# Draw an arrow from each row of foo1 (columns: x, y) to the next row.
n <- nrow(foo1)
for (i in seq_len(n - 1)) {
  arrows(foo1[i, 1], foo1[i, 2], foo1[i + 1, 1], foo1[i + 1, 2])
}

2012/03/23

How to use Endnote in Google Docs

Filed under: Google Docs, Reference Software — S @ 01:50

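1. Have your document open in Google Docs.
2. Have EndNote open.
3. Drag and drop the citation to the place in the document where you want it.
4. When the paper is done, export it as .doc and format the bibliography with EndNote in Word.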

See the post here: https://groups.google.com/a/googleproductforums.com/forum/#!category-topic/docs/docs-community/HqCcNsvHorg

Comment from post here:

Hi everyone,
Looking for a way to use Endnote and Google Docs I find out that is possible just drag references from endnote to google docs. It will appear in google doc unformatted {Adler, 2006 #578} I just did it with this reference. Once I get my paper done I export to Open Office or Word and use Endnote to format bibliography. I and my collaborators keep using the same Endnote library synchronized trough dropbox (www.dropbox.com). Obviously will be better if we have a plugin, So I hope this workaround helps.

2012/03/21

Large BLASTX XML import to BLAST2GO

Filed under: Uncategorized — S @ 05:48

I have a large blastx output file in xml that I want to import into blast2go.

1. Load your fasta file of sequences into blast2go.

2. Split up the large blastx file into multiple smaller files with the awk script split_xml_blast, listed below.

3. Usage: awk -f split_xml_blast 3000 myblastoutput.xml

4. The 3000 is the number of sequences to place in each XML file and can be changed.

Here is the code:

#!/usr/bin/awk -f

# split big blast output in xml format into several files
# type split_xml_blast without parameters to see usage.

BEGIN{

{
if (ARGC==3 && ARGV[1] !~ "^[a-zA-Z]+$")
{
# max is number of sequences per output file
max = ARGV[1]+0
ARGV[1]=""
} else
{
assert_exit = 1
usage()
}
}
cpt=nb=1
suffix=".xml"
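# Hard-coded footer and header written around each chunk. The XML tags were
# missing from these strings; they are restored here on the assumption that
# they follow the standard NCBI BlastOutput layout, with empty placeholder elements.
end="</BlastOutput_iterations>\n</BlastOutput>"
begin="<?xml version=\"1.0\"?>\n<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" \"NCBI_BlastOutput.dtd\">\n<BlastOutput>"
begin=begin "\n<BlastOutput_program>blastx</BlastOutput_program>\n<BlastOutput_version>blastx 2.2.18 [Mar-02-2008]</BlastOutput_version>\n"
begin=begin "<BlastOutput_reference></BlastOutput_reference>\n<BlastOutput_db>/home/data/blastdb/nr</BlastOutput_db>\n<BlastOutput_query-ID>lcl|1_0</BlastOutput_query-ID>\n"
begin=begin "<BlastOutput_query-def></BlastOutput_query-def>\n<BlastOutput_query-len></BlastOutput_query-len>\n<BlastOutput_param>\n<Parameters>\n<Parameters_matrix>BLOSUM62</Parameters_matrix>\n"
begin=begin "<Parameters_expect>0.1</Parameters_expect>\n<Parameters_gap-open>11</Parameters_gap-open>\n<Parameters_gap-extend>1</Parameters_gap-extend>\n<Parameters_filter>F</Parameters_filter>\n"
begin=begin "</Parameters>\n</BlastOutput_param>\n<BlastOutput_iterations>"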
}

function usage()
{

print "###################################################################################"
print "# split_xml_blast -- split big blast output in xml format into several files. #"
print "# Performed in Awk v3.1 A.V. Aho, P.J. Weinberger, and B.W. Kernighan #"
print "# OS supported: *nix, Windows9x/NT #"
print "###################################################################################"
print "# Author: Laurent Manchon #"
print "# If you have comments or questions, send to the author at: #"
print "# lmanchon@univ-montp2.fr #"
print "###################################################################################"
print "# #"
print "# This program takes a file containing blast result in XML format and splits #"
print "# it into several small files, as: split_xml_blast <number> <file.xml> #"
print "# with <number>: Number of sequences per output file #"
print "# #"
print "###################################################################################"

exit 1
}

/<Iteration>/{
split(FILENAME,prefix,".")
file=prefix[1] "_"
output_file=file nb suffix
i=1
if(cpt==1){print begin >> output_file}
print $0 >> output_file
next
}

i==1{print $0 >> output_file}

/<\/Iteration>/{
cpt++
if(cpt==max+1){
print end >> output_file
close (output_file)
nb++
cpt=1
i=0
next
}

}

END {
if (assert_exit) exit 1
print "\nYour input file",FILENAME,"has just been split into",nb,"files with",max,"sequences per file:\n"
cmd="ls -1 "file"*.xml"
system(cmd)
close(cmd)
}
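
The awk script above hard-codes the XML header it writes. As a rough alternative, here is a minimal Python sketch of the same splitting idea that copies the header from the input file instead. It is not the author's script: the function names split_blast_xml and write_chunks are made up for illustration, and the sketch assumes the <Iteration> and </Iteration> tags each sit on their own line and that every line ends with a newline.

```python
from typing import Iterable, Iterator, List

# Closing tags appended to every chunk (standard end of a BLAST XML report).
FOOTER = "</BlastOutput_iterations>\n</BlastOutput>\n"


def split_blast_xml(lines: Iterable[str], per_file: int) -> Iterator[str]:
    """Yield XML documents holding at most `per_file` iterations each."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    header: List[str] = []  # everything before the first <Iteration>
    chunk: List[str] = []   # iteration lines collected for the current file
    count = 0               # completed iterations in the current chunk
    inside = False          # True between <Iteration> and </Iteration>
    seen_first = False      # True once the first <Iteration> was read
    for line in lines:
        if "<Iteration>" in line:
            inside = seen_first = True
        if not seen_first:
            header.append(line)
        elif inside:
            chunk.append(line)
        if "</Iteration>" in line:
            inside = False
            count += 1
            if count == per_file:
                yield "".join(header) + "".join(chunk) + FOOTER
                chunk, count = [], 0
    if chunk:  # leftover iterations that did not fill a whole file
        yield "".join(header) + "".join(chunk) + FOOTER


def write_chunks(path: str, per_file: int) -> List[str]:
    """Write name_1.xml, name_2.xml, ... next to `path`; return the names."""
    stem = path.rsplit(".", 1)[0]
    names = []
    with open(path) as handle:
        for number, doc in enumerate(split_blast_xml(handle, per_file), start=1):
            name = f"{stem}_{number}.xml"
            with open(name, "w") as out:
                out.write(doc)
            names.append(name)
    return names
```

Calling write_chunks("myblastoutput.xml", 3000) would mirror the usage in step 3. Reusing the input's own header means the program, version and database fields stay correct for whatever BLAST run produced the file.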

2012/03/20

current date R code

Filed under: R — S @ 12:25

Sys.time()  # current date and time, from {base}
Sys.Date()  # current date only

Tabular BLAST output

Filed under: Uncategorized — S @ 12:04

Use -m 8 with blastall to get tabular output.
The column headings are:

queryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore
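
A minimal Python sketch for reading -m 8 output with these headings follows. The names COLUMNS, parse_value and read_tabular are made up for illustration, and the choice of which columns are integers and which are floats is an assumption based on the heading names.

```python
import csv
from typing import Dict, Iterable, Iterator, Union

# Column names in the order listed above.
COLUMNS = [
    "queryId", "subjectId", "percIdentity", "alnLength", "mismatchCount",
    "gapOpenCount", "queryStart", "queryEnd", "subjectStart", "subjectEnd",
    "eVal", "bitScore",
]
# Assumed numeric types for each column.
INT_COLUMNS = {
    "alnLength", "mismatchCount", "gapOpenCount",
    "queryStart", "queryEnd", "subjectStart", "subjectEnd",
}
FLOAT_COLUMNS = {"percIdentity", "eVal", "bitScore"}

Value = Union[str, int, float]


def parse_value(name: str, text: str) -> Value:
    """Convert one field to int or float according to its column."""
    if name in INT_COLUMNS:
        return int(text)
    if name in FLOAT_COLUMNS:
        return float(text)
    return text


def read_tabular(lines: Iterable[str]) -> Iterator[Dict[str, Value]]:
    """Yield one dict per hit, skipping blank lines and '#' comment lines."""
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data, delimiter="\t", quoting=csv.QUOTE_NONE):
        yield {name: parse_value(name, text) for name, text in zip(COLUMNS, row)}
```

Skipping lines that start with # lets the same reader handle -m 9 output, which is tabular with comment lines.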

2012/03/14

RDA Redundancy Analysis

Filed under: Uncategorized — S @ 07:54

Redundancy analysis (RDA) is similar to canonical correlation analysis, but it lets the user derive a specified number of synthetic variables from one set of (independent) variables that explain as much variance as possible in another (dependent) set. It is a multivariate analogue of regression.

http://midag.cs.unc.edu/pubs/papers/Pschometrika81_Muller.pdf

The redundancy statistic can be characterized as the mean squared loadings of one set on a canonical variate of the other set.

Redundancy analysis should be treated as evaluating adequacy of regression (prediction) and not association.
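
The "mean squared loadings" characterization can be written out as a formula. This is a sketch in notation of my own choosing, not copied from the linked paper: y_1, ..., y_q are assumed to be the standardized variables of one set, u_j the j-th canonical variate of the other set, and r(., .) the correlation.

```latex
\mathrm{Rd}(Y \mid u_j) \;=\; \frac{1}{q} \sum_{k=1}^{q} r(y_k, u_j)^2
```

Each term is the share of variance in one y_k predicted by u_j, so the mean is the average proportion of variance in the Y set that the variate accounts for.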

2012/03/09

Install Chrome apps on Ubuntu does not work

Filed under: Uncategorized — S @ 11:34

When I try to install Chrome apps from the app store on Ubuntu, the install does not always work, and I have no idea why. Does anyone have any answers?

Make a minimal sized BLAST output file xml

Filed under: BLAST — S @ 07:07

The BLAST output txt/xml file can be huge. To make it smaller, especially when dealing with many contigs, use the -b and -v options:

-b

 Number of database sequence to show alignments for (B) [Integer] default = 250

-v

 Number of database sequences to show one-line descriptions for (V) [Integer] default = 500

blastall -p blastx -d xtropProtein12_03_08.fasta -i 2.fasta -e 2e-20 -b 0 -v 1 -o out2.xml

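This produced BLAST output with at most one hit per query passing the 2e-20 E-value cutoff, and no alignments. Add -m 7 if the output should actually be written as XML.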
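
When running this over many input files it can help to build the command in a script. Below is a minimal Python sketch that only assembles the argument list for the blastall call shown above; the function name minimal_blastall_cmd and its parameter names are made up for illustration, and the flags are exactly the ones used in this post.

```python
from typing import List


def minimal_blastall_cmd(
    database: str,
    query: str,
    output: str,
    evalue: str = "2e-20",
    descriptions: int = 1,  # -v: one-line descriptions to show
    alignments: int = 0,    # -b: alignments to show
) -> List[str]:
    """Return the blastall argument list for a small blastx report."""
    return [
        "blastall", "-p", "blastx",
        "-d", database,
        "-i", query,
        "-e", str(evalue),
        "-b", str(alignments),
        "-v", str(descriptions),
        "-o", output,
    ]
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where blastall is installed.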

BLASTALL parameters Table list

Filed under: BLAST, Genomics — S @ 06:32

The complete arguments for BLASTALL 2.2.4
http://www.plexdb.org/modules/documentation/NCBIblastall.htm
-p  Program Name [String]
-d  Database [String] default = nr
-i  Query File [File In] default = stdin
-e  Expectation value (E) [Real] default = 10.0
-m  Alignment view options: 0 = pairwise, 1 = query-anchored showing identities, 2 = query-anchored no identities, 3 = flat query-anchored, show identities, 4 = flat query-anchored, no identities, 5 = query-anchored no identities and blunt ends, 6 = flat query-anchored, no identities and blunt ends, 7 = XML Blast output, 8 = tabular, 9 = tabular with comment lines [Integer] default = 0
-o  BLAST report Output File [File Out] Optional default = stdout
-F  Filter query sequence (DUST with blastn, SEG with others) [String] default = T
-G  Cost to open a gap (zero invokes default behavior) [Integer] default = 0
-E  Cost to extend a gap (zero invokes default behavior) [Integer] default = 0
-X  X dropoff value for gapped alignment (in bits) (zero invokes default behavior) blastn 30, megablast 20, tblastx 0, all others 15 [Integer] default = 0
-I  Show GI’s in deflines [T/F] default = F
-q  Penalty for a nucleotide mismatch (blastn only) [Integer] default = -3
-r  Reward for a nucleotide match (blastn only) [Integer] default = 1
-v  Number of database sequences to show one-line descriptions for (V) [Integer] default = 500
-b  Number of database sequences to show alignments for (B) [Integer] default = 250
-f  Threshold for extending hits, default if zero: blastp 11, blastn 0, blastx 12, tblastn 13, tblastx 13, megablast 0 [Integer] default = 0
-g  Perform gapped alignment (not available with tblastx) [T/F] default = T
-Q  Query Genetic code to use [Integer] default = 1
-D  DB Genetic code (for tblast[nx] only) [Integer] default = 1
-a  Number of processors to use [Integer] default = 1
-O  SeqAlign file [File Out] Optional
-J  Believe the query defline [T/F] default = F
-M  Matrix [String] default = BLOSUM62
-W  Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer] default = 0
-z  Effective length of the database (use zero for the real size) [Real] default = 0
-K  Number of best hits from a region to keep (off by default, if used a value of 100 is recommended) [Integer] default = 0
-Y  Effective length of the search space (use zero for the real size) [Real] default = 0
-S  Query strands to search against database (for blast[nx], and tblastx): 3 is both, 1 is top, 2 is bottom [Integer] default = 3
-T  Produce HTML output [T/F] default = F
-l  Restrict search of database to list of GI’s [String] Optional
-U  Use lower case filtering of FASTA sequence [T/F] Optional default = F
-y  X dropoff value for ungapped extensions in bits (0.0 invokes default behavior) blastn 20, megablast 10, all others 7 [Real] default = 0.0
-Z  X dropoff value for final gapped alignment in bits (0.0 invokes default behavior) blastn/megablast 50, tblastx 0, all others 25 [Integer] default = 0
-R  PSI-TBLASTN checkpoint file [File In] Optional
-n  MegaBlast search [T/F] default = F
-L  Location on query sequence [String] Optional
-A  Multiple Hits window size, default if zero (blastn/megablast 0, all others 40) [Integer] default = 0
-w  Frame shift penalty (OOF algorithm for blastx) [Integer] default = 0
-t  Length of the largest intron allowed in tblastn for linking HSPs (0 disables linking) [Integer] default = 0

Search text file Ubuntu, split text file based on marker in text file

Filed under: Lynux, Ubuntu — S @ 05:06

http://allgeeks.info/77/how-to-search-inside-text-files-in-ubuntu/

grep -Rs some-title-or-text /some/directory/*
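
For use inside a script, here is a minimal Python sketch of a similar recursive search. It is not a replacement for grep: the function name search_tree is made up for illustration, and it matches a plain substring rather than a regular expression.

```python
import os
from typing import Iterator, Tuple


def search_tree(root: str, text: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for every line under root containing text."""
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            try:
                # errors="replace" keeps binary-looking files from raising.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if text in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Like grep -s: skip unreadable files without a message.
                continue
```

Something like `for path, number, line in search_tree("/some/directory", "some-title-or-text"): print(path, number, line)` would print the matches.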

http://stackoverflow.com/questions/3644238/split-text-file-in-two-using-bash-script

awk -v RS="MARKER" '{print $0 > (NR ".txt")}' file
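
The same split can be sketched in Python for cases where a multi-character record separator is not supported by the installed awk. The function name split_file_on_marker and the out_dir parameter are made up for illustration. Unlike awk, this version also writes an empty final file when the text ends with the marker.

```python
import os
from typing import List


def split_file_on_marker(path: str, marker: str, out_dir: str = ".") -> List[str]:
    """Write the pieces of `path` between markers to 1.txt, 2.txt, ... in out_dir."""
    with open(path, encoding="utf-8") as handle:
        parts = handle.read().split(marker)
    names = []
    for number, part in enumerate(parts, start=1):
        name = os.path.join(out_dir, f"{number}.txt")
        with open(name, "w", encoding="utf-8") as out:
            out.write(part + "\n")  # awk's print also appends a newline
        names.append(name)
    return names
```

The whole file is read into memory at once, which is fine for ordinary text files but not for very large ones.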
