SOFTWARE
We developed several packages for the analysis of nucleotide variability in single-locus or multilocus data from one or several populations:
ngasp: Computational Solution for performing next generation analysis of sequence polymorphisms using NGS data:
Jené J, Navarro J, Ferretti L, Perez-Enciso M, Rozas J, Hernández-Budé P, Vera G, Ramos-Onsins SE
ngasp is an ongoing project and, although it is still incomplete, a preliminary version is available. It is designed to calculate statistics related to genome variability from NGS input data such as genomes or exomes of individuals, or even pooled data from population subsets. It will provide a series of analyses of importance to animal geneticists, such as tests to detect evidence of selection, differentiation, etc. In the future, it is expected to accommodate phenotype data as well, as soon as new analyses are developed and incorporated into ngasp.
mlcoalsim v1. Multilocus Coalescent Simulations: Download ZIP
S.E. Ramos-Onsins and T. Mitchell-Olds (EBO, 2007)
The application program mlcoalsim (multilocus coalescent simulations) is designed to generate samples and calculate neutrality tests and other statistics under a stationary model, several demographic models or strong positive selection, using coalescent theory. It performs multilocus analyses for both linked and unlinked loci. Multilocus statistics for unlinked loci are the average and the variance of each statistic. It also allows recurrent mutations (multiple hits). Moreover, it includes heterogeneity in mutation rate and in recombination rate across the length of the sequence: hotspots or a constant value for all positions are possible for both mutation and recombination. This program is based on a previous version of Hudson’s coalescent program ms (Hudson, 2002), modified for the above purposes. The function to calculate minimum recombinant values is a modification of Wall’s code (Wall, 2000). The gamma function was partially obtained from the code of Grassly, Adachi and Rambaut (Grassly et al., 1997). This program is distributed under the GNU GPL License. KNOWN BUGS: The logistic change of Ne is not working properly.
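As a rough illustration of what a coalescent simulation under the stationary model involves, the following Python sketch draws the total branch length of a genealogy. It is not mlcoalsim code: the function names are hypothetical, and time is measured in units in which each pair of lineages coalesces at rate 1.

```python
import random


def simulate_tree_length(n_samples, rng):
    """Total branch length of one simulated genealogy (stationary model).

    Time units are an assumption of this sketch: each pair of lineages
    coalesces at rate 1, so k lineages coalesce at rate k*(k-1)/2.
    """
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0       # coalescence rate with k lineages
        t_k = rng.expovariate(rate)    # waiting time while k lineages remain
        total += k * t_k               # each of the k branches grows by t_k
    return total


def expected_tree_length(n_samples):
    """Analytical expectation of the total length: 2 * sum_{i=1}^{n-1} 1/i."""
    return 2.0 * sum(1.0 / i for i in range(1, n_samples))


if __name__ == "__main__":
    rng = random.Random(1)
    lengths = [simulate_tree_length(10, rng) for _ in range(20000)]
    print("simulated mean:", sum(lengths) / len(lengths))
    print("expected value:", expected_tree_length(10))
```

The simulated mean should approach the analytical expectation as the number of replicates grows. Demography, selection, recombination and rate heterogeneity are deliberately left out of this sketch.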
mlcoalsim v2. Multilocus Coalescent Simulations using parallel computing for ABC analysis: Download ZIP beta version 1.9916b (20170515)
S.E. Ramos-Onsins (unpublished, but used in Heidel et al., Mol. Ecol. 2010)
This code has been developed thanks to the Grant CGL2009-09346 (MICINN, Spain).
The multilocus coalescent simulation program performs coalescent simulations under several demographic models and also a selective model. In this version, the parameters can be provided in separate prior files. It can also run simulations on more than one processor, as defined by the user, using MPI. Furthermore, the input file has changed significantly relative to the first version.
The graphical interface is under development. The user manual is still under construction.
MANVa: Multilocus Analysis of Nucleotide Variation: Download ZIP (beta version)
S.E. Ramos-Onsins and T. Mitchell-Olds (Used in Ramos-Onsins et al. Mol. Ecol. 2008 and in Ojeda et al. Heredity 2011)
Analysis of nucleotide variation from a population genetics point of view. The application calculates a large number of summary statistics and neutrality tests for up to 32K independent loci, for a population with or without an outgroup species. The input files (one per locus) must be in a folder and in fasta or nbrf format. The annotation files must be in a folder, in GFF 2 format, and must have the same name as the fasta files (except for the extension, which must be .gff). This software also performs coalescent simulations and calculates probabilities for the fit of the observed data to simulated data.
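The file-naming rule above can be checked before running an analysis. The Python sketch below is not part of MANVa: the function name and the list of accepted fasta extensions are assumptions made for illustration. It pairs each fasta file with the GFF file of the same base name and reports loci that lack an annotation file.

```python
from pathlib import Path


def pair_loci(fasta_dir, gff_dir, fasta_exts=(".fasta", ".fa", ".fas")):
    """Pair each fasta file with the .gff file that shares its base name.

    Returns (paired, missing): a dict {locus: (fasta_path, gff_path)} and
    a list of locus names with no matching GFF file. The accepted fasta
    extensions are an assumption of this sketch.
    """
    paired, missing = {}, []
    for fasta in sorted(Path(fasta_dir).iterdir()):
        if fasta.suffix.lower() not in fasta_exts:
            continue                                 # skip non-fasta files
        gff = Path(gff_dir) / (fasta.stem + ".gff")  # same name, .gff extension
        if gff.is_file():
            paired[fasta.stem] = (fasta, gff)
        else:
            missing.append(fasta.stem)
    return paired, missing
```

A call such as pair_loci("seqs", "annot") (hypothetical folder names) returns the loci that are ready for analysis and the ones whose annotation file still has to be added or renamed.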
mstatspop: Statistical Analysis using Multiple Populations for Genomic Data: Download ZIP beta version
S. E. Ramos-Onsins, L. Ferretti, E. Raineri, J. Jené, Giacomo Marmorini, W. Burgos and G. Vera (unpublished)
This code has been developed thanks to the Grant CGL2009-09346 (MICINN, Spain) and AGL2013-41834-R.
This application calculates statistics of variability using multiple populations from tfasta (transposed fasta), fasta or ms-format files, as text or zip files (and optionally GTFv2 files). The program has multiple options: missing values are allowed, and IUPAC codes for diploid individuals can also be processed. Fst comparisons and permutation tests can be performed among all populations.
Optimal tests of neutrality are calculated, but the GSL libraries are required when compiling the code. The application can be pipelined with ms (or another simulator with the same output) and calculates the statistics for each replicate. Multiple output options are available.
Sliding-window genomic analysis can be performed using the application fastagff2ms to cut the genome into windows and pipe them to mstatspop with the ms format option.
Variability estimates and neutrality tests based on the frequency spectrum are now available for data that include positions with missing values.
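To make the idea of cutting a genome into windows concrete, here is a minimal Python sketch that generates window coordinates. It does not reproduce fastagff2ms; the function name, the 1-based inclusive coordinates and the handling of the last (possibly shorter) window are all assumptions of this example.

```python
def sliding_windows(seq_length, window, step):
    """Yield 1-based inclusive (start, end) window coordinates.

    Windows of size `window` advance by `step` positions; the last
    window is truncated at the end of the sequence.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 1
    while start <= seq_length:
        end = min(start + window - 1, seq_length)
        yield start, end
        if end == seq_length:   # the sequence end has been reached
            break
        start += step


if __name__ == "__main__":
    for start, end in sliding_windows(10, 4, 2):
        print(start, end)
```

Setting step equal to window gives non-overlapping windows; a smaller step gives overlapping ones.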
fastaconvtr: Converter of fasta/tfasta alignments (plus GTF) to tfasta/fasta/ms format: Download ZIP beta version
S.E. Ramos-Onsins and G. Vera (unpublished)
This code has been developed thanks to the Grant CGL2009-09346 (MICINN, Spain) and AGL2013-41834-R.
fastaconvtr is a command-line application that converts tfasta (transposed fasta)/fasta alignment files (zipped or not) into fasta/tfasta/ms format (zipped or not).
The application also reads GTFv2 annotation files and can filter the regions or positions of interest (e.g. coding, synonymous, nonsynonymous and others). It can also output a weight file, which gives the weight of each position for subsequent analyses with the mstatspop program.
Diploid sequences in fasta format can be encoded using the IUPAC code. Positions where both alleles are called and identical are coded in uppercase (e.g. A means AA); lowercase is used for positions where only one allele is called (e.g. a means AN).
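The encoding described above can be illustrated with a small Python decoder. This is a sketch, not fastaconvtr code: the function name is hypothetical, heterozygotes are assumed to use the standard two-base IUPAC ambiguity codes, and N is assumed to mean that both alleles are missing.

```python
# Standard IUPAC two-base ambiguity codes, read here as heterozygous genotypes.
HET_CODES = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def decode_genotype(symbol):
    """Return the two alleles encoded by one character of a diploid sequence."""
    if len(symbol) != 1:
        raise ValueError("expected a single character")
    if symbol in "ACGT":
        return symbol * 2              # uppercase: both alleles, e.g. A -> AA
    if symbol in "acgt":
        return symbol.upper() + "N"    # lowercase: one allele only, e.g. a -> AN
    if symbol in HET_CODES:
        return HET_CODES[symbol]       # heterozygote, e.g. R -> AG
    if symbol in "Nn":
        return "NN"                    # assumption: both alleles missing
    raise ValueError("symbol not handled by this sketch: %r" % symbol)


if __name__ == "__main__":
    print([decode_genotype(c) for c in "AaRN"])
```

Symbols that the description above does not cover (gaps, lowercase ambiguity codes) raise an error rather than being guessed.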
GHcaller: Genotype/Haplotype SNP caller (version 0.0.1) (02122013)
B. Nevado and S. E. Ramos-Onsins (used in B. Nevado, S.E. Ramos-Onsins and M. Perez-Enciso Mol.Ecol. 2014)
This code has been partially developed thanks to the Grant CGL2009-09346 (MICINN, Spain).
GHcaller is a C++ program that calls SNPs from an mpileup file. It outputs either genotypes or haplotypes, depending on read depth and genotype likelihoods. The data are written in fasta format.
The algorithm is based on Lynch (2009) Genetics 182:295-301; Roesti et al. (2012) Molecular Ecology 21: 2852-2862.
Within the distribution, there is a README and an examples folder.
HKAdirect: Multilocus HKA test : Download ZIP (beta version 0.70b)
S.E. Ramos-Onsins, Emanuelle Raineri, Luca Ferretti (Used in Esteve-Codina et al. BMC Genomics 2013)
This code has been developed thanks to the Grant CGL2009-09346 (MICINN, Spain).
This program computes the HKA test from a dataset table of a population and a single outgroup individual.
The program computes the expected polymorphism and divergence as well as the theta values per nucleotide, the time to the ancestor, the partial HKA for each locus (window), the Chi-square and the P-value. When missing values are included, the variance of S is calculated by simulation, which takes some time.
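The relation between the partial HKA values and the overall Chi-square can be sketched as follows. This Python example uses the classic form of the HKA statistic (squared deviations of polymorphism S and divergence D from their expectations, each divided by its variance). It is not HKAdirect code: the function name and dictionary keys are hypothetical, and the expectations and variances are taken as given inputs rather than estimated.

```python
def hka_chi_square(loci):
    """Return (partials, total) for a list of loci.

    Each locus is a dict with observed S and D, their expectations
    (exp_S, exp_D) and variances (var_S, var_D). How HKAdirect obtains
    the expectations and variances is outside the scope of this sketch.
    """
    partials = []
    for locus in loci:
        part = ((locus["S"] - locus["exp_S"]) ** 2 / locus["var_S"]
                + (locus["D"] - locus["exp_D"]) ** 2 / locus["var_D"])
        partials.append(part)          # contribution of this locus (window)
    return partials, sum(partials)


if __name__ == "__main__":
    example = [
        {"S": 10, "exp_S": 8, "var_S": 4, "D": 20, "exp_D": 22, "var_D": 8},
        {"S": 5, "exp_S": 5, "var_S": 2, "D": 12, "exp_D": 10, "var_D": 4},
    ]
    print(hka_chi_square(example))
```

Loci with large partial values are the ones that depart most from the neutral expectation under these inputs.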
mspar: parallelized ms coalescent simulator:
Montemuiño C, Espinosa A, Moure JC, Vera Rodriguez G, Ramos-Onsins SE, Hernández Budé P.
A parallel version of Hudson's popular coalescent simulator ms.
PopGenome: An efficient swiss army knife for population genomic analyses in R:
Bastian Pfeifer, Ulrich Wittelsbürger, S.E. Ramos-Onsins and Martin J. Lercher
Mol Biol Evol. 2014 Jul;31(7):1929-36. doi: 10.1093/molbev/msu136. Epub 2014 Apr 16.
We collaborated with Martin J. Lercher's group on the construction of this very useful R library, which performs population genetics calculations. It can efficiently process genome-scale data as well as large sets of individual loci.
npstats: Population genetics tests and estimators for pooled NGS data
This code implements some population genetics tests and estimators that can be applied to pooled sequences from Next Generation Sequencing experiments. The statistics are described in the paper "Population genomics from pool sequencing" by L. Ferretti, S.E. Ramos-Onsins and M. Perez-Enciso, Molecular Ecology (2013), DOI: 10.1111/mec.12522.
DIGUP: Detection of Incompatible Genealogies with Unphased data Populations
This code, developed by M. Vidal, automatically detects fragments of the genome that have incompatible histories and gives a list of all existing variant combinations among populations. The program is written in Python v2.7. This work is part of M. Vidal's Master's thesis in Bioinformatics.
OTHER SCRIPTS:
A number of scripts that may be useful are also available:
output_selector6
output_selector6 is a perl script designed to calculate composite probabilities given a vector of observed data and a matrix of simulated statistics. This perl function is based on Voight et al. (PNAS 2005 102: 18508-18513). This script was used in Ramos-Onsins et al. Mol. Ecol. (2008).
Usage: perl output-selector6.pl -in [inputfile (each header statistic ending with brackets)] -obs [obs_freq_file] -cols [selected_cols_from_inputfile(ej 1:5:9)] -w [file with weights for each statistic, default 1] -out [outputfile] -emp [calculate empirical P: (y/n)] -tail [left/right/both]
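The empirical P-value and tail options can be illustrated for a single statistic with the Python sketch below. It is not a port of the Perl script and does not compute the composite probability. The function name is hypothetical, and the two-tailed value uses one common convention (twice the smaller tail, capped at 1), which is an assumption here.

```python
def empirical_p(observed, simulated, tail="both"):
    """Empirical P-value of an observed statistic against simulated values."""
    n = len(simulated)
    if n == 0:
        raise ValueError("no simulated values")
    left = sum(1 for x in simulated if x <= observed) / n    # P(sim <= obs)
    right = sum(1 for x in simulated if x >= observed) / n   # P(sim >= obs)
    if tail == "left":
        return left
    if tail == "right":
        return right
    if tail == "both":
        return min(1.0, 2.0 * min(left, right))  # assumed two-tailed convention
    raise ValueError("tail must be 'left', 'right' or 'both'")


if __name__ == "__main__":
    sims = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_p(2, sims, "left"), empirical_p(2, sims, "both"))
```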
seq2tab.pl
This script constructs a table of polymorphism and divergence from alignment data in fasta or nbrf format:
Usage: perl seq2tab.pl -in [inputfile] -format [fasta : nbrf] -outg [name or fragment of the outgroup name] -div [0:no print div; 1:include div] -mis [0:no missing; 1:include missing] -gap [0:no gaps; 1:include gaps] -cols [output length of cols]
Do.upgma.pops.bootstrap
This script calculates the UPGMA tree using the frequencies of SNPs in a number of populations.
The R libraries "ape" and "phangorn" must be installed to visualize the UPGMA tree.
USAGE:
R --no-save --args [input_file] [bootstrap_iterations] [output_file] < ./script.upgma.bootstrap.R
LDL
This program analyzes linkage disequilibrium using the sign test described in Lewontin 1995, Genetics 140:377-88.