Data analysis area

From CRAGBioWiki

The data analysis area within the bioinformatics unit at CRAG offers the following:

Consulting

Providing advice on any systems biology or bioinformatics data analysis question that a CRAG researcher may have is a priority for us. Just drop us a line and we will set up a meeting to discuss your question. If your question requires extended analysis that you would like us to perform, we can offer further help by establishing an analysis pipeline (see Support and analysis pipelines) or a collaboration (see Research).

Our expertise lies in the following areas:

  • Genetics
  • Transcriptomics
  • Proteomics
  • Phenomics
  • Signal transduction pathways & network architecture
  • Sequence analysis
  • Experimental design

For the data types above, we provide advice on the following topics:

  • Deriving quantitative metrics to assess a biological process
  • Hypothesis testing and statistics
  • Multivariate data-driven analyses
  • Integration of heterogeneous datasets, e.g. genomics and phenomics
  • Network analysis
  • Classification strategies
  • Logic and mechanistic modeling
  • Hypothesis-based prediction
  • Data visualization and figure preparation

We can also advise on the data analysis requirements of grant applications, and support you from the grant writing and experimental design stages through to the analysis of your results.

Training

The Molecular Data Analysis area at CRAG created a systems biology and bioinformatics training program to give experimental researchers the tools and support they need to become computational biologists, able to address molecular biology questions using the large datasets generated by the cutting-edge research undertaken at CRAG.

The complexity of the mechanisms studied and engineered in plant metabolism and genomics, as well as of those that plants use to regulate signal transduction, development and stress responses, calls for large experiments that capture the behavior of the multiple interacting components regulating these processes. While the contributions of individual components can be understood using classic molecular biology analyses, an integrative understanding of the interactions and crosstalk among multiple components remains a fundamental challenge in classical biology, one that systems biology, biostatistics and bioinformatics approaches can address.

To date, the program rests on two cornerstones. First, the following courses are offered every year; they are continually updated to follow CRAG's research and tailored to the feedback of attendees of previous courses:

  • Genevestigator plant and biomedical webinar
  • Data visualization, storytelling and scientific principles of design
  • Network analysis and modeling: an introduction to Systems Biology
  • Biostatistics and introductory R programming for molecular biology analyses
  • Introduction to Python programming for molecular biology
  • Introduction to R programming for molecular biology

[Figure: Training overview]

In addition, the training program includes a biweekly meeting in which a volunteer scientist poses a specific research question on which they need feedback, followed by an open discussion. The other attendees propose solutions and, in doing so, everyone learns from one another, in-house knowledge is transferred, and internal collaboration and synergy are strengthened. This happens in an informal, question-driven atmosphere that we call The I love data club. The club currently has 50 CRAG researchers as members and is moderated by the head of the Molecular Data Analysis area, so please feel free to send us an email if you wish to join the club or just need feedback on a specific question. For a detailed description and to register your interest, see The I love data club section below.

Analysis software maintained by the Molecular Data Analysis area

We maintain institutional licenses for the following analysis software:

  • KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database of metabolic and non-metabolic pathways for a large number of organisms. It also offers a small set of analysis tools. If you wish to run more advanced analyses than those offered by the web portal built on top of the database, you can use CRAG's license, which allows you to download the database with all pathways (metabolic and/or non-metabolic) for your organism of choice and perform your own analysis. If you need help doing so, please drop us a line.
  • Genevestigator: Genevestigator is a search engine for gene expression data. In plant biology, for instance, it has over 3,000 citations in peer-reviewed journals. It integrates thousands of manually curated, well-described public microarray and RNAseq experiments. If you wish to search this database, please write to marti.bernardo@cragenomica.es and we will help you set up a user account so that you can explore the database online.
  • GraphPad Prism: Prism is a GUI-based analysis and graphing suite (i.e. you click instead of programming). CRAG is currently considering whether to buy institutional licenses.
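
As an illustration of the kind of custom analysis a local copy of KEGG makes possible, the Python sketch below runs a one-sided hypergeometric (over-representation) test of a list of genes of interest against each pathway. It assumes you have already parsed the downloaded KEGG data into a dictionary mapping pathway IDs to sets of gene IDs; that parsing step is not shown, and the function names and toy pathway IDs are hypothetical, not part of KEGG itself.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(genes_of_interest, background, pathways):
    """One-sided over-representation test for each pathway.

    genes_of_interest -- gene IDs to test (e.g. differentially expressed genes)
    background        -- all gene IDs that could have been selected
    pathways          -- dict: pathway ID -> set of member gene IDs
                         (hypothetical input, e.g. parsed from a KEGG download)
    Returns a dict: pathway ID -> (overlap size, p-value).
    """
    background = set(background)
    selected = set(genes_of_interest) & background
    N, n = len(background), len(selected)
    results = {}
    for pathway_id, members in pathways.items():
        members = set(members) & background  # ignore genes outside the background
        K = len(members)
        k = len(members & selected)
        results[pathway_id] = (k, hypergeom_sf(k, N, K, n))
    return results


if __name__ == "__main__":
    # Toy data: 10 background genes, two hypothetical pathways.
    background = {f"g{i}" for i in range(10)}
    pathways = {"pathway_A": {"g0", "g1", "g2", "g3"}, "pathway_B": {"g7", "g8"}}
    print(pathway_enrichment({"g0", "g1", "g2"}, background, pathways))
```

When many pathways are tested at once, the resulting p-values should also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure).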


The I love data club

The club is a biweekly meeting for people who love data. Indeed, much like the Force, data comes from all living things, it surrounds us and binds us together. In our meetings, one scientist presents a problem for which (s)he needs feedback. The rest try to help, and in doing so we all learn from one another. Join us! For a detailed description, click on the image below or here.

Yes, data rocks

Support and analysis pipelines

We have recently established a number of new analysis approaches that are now available to all CRAG scientists. The full list will be available here soon.

If you have already acquired a biological dataset (or multiple datasets), we can establish a new analysis pipeline tailored to your specific question and data, or adapt a published pipeline. If your analysis is computationally demanding, we join forces with the scientific IT area within the bioinformatics unit to use CRAG's computing cluster and resources. Examples of ongoing pipelines are:

Quantification of RNAseq data and differential expression analysis

Raw data cleaning

  • Removal of overrepresented sequences
  • Additional adapter cleaning and quality trimming
  • Filtering of rRNA
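
A minimal Python sketch of two of these cleaning steps, 3' adapter removal and 3' quality trimming of a single FASTQ read, is shown below. It assumes Phred+33 quality encoding and exact adapter matching; the function names and default thresholds are illustrative assumptions, not the settings of the CRAG pipeline. Dedicated tools (e.g. cutadapt or Trimmomatic) additionally tolerate mismatches, handle paired-end reads and stream large files.

```python
def phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]


def adapter_cut_position(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Return how many 5' bases to keep after removing a 3' adapter.

    Exact matching only: first look for the whole adapter inside the read,
    then for a partial adapter (its first bases) hanging off the 3' end.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Try the longest partial overlap first, down to min_overlap bases.
    for olen in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:olen]):
            return len(seq) - olen
    return len(seq)


def quality_cut_position(quals: list[int], threshold: int = 20) -> int:
    """Return how many bases to keep after dropping trailing bases with quality < threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return end


def clean_read(seq: str, qual: str, adapter: str,
               q_threshold: int = 20, min_length: int = 20):
    """Adapter-trim, then quality-trim one read; return None if it ends up too short."""
    keep = adapter_cut_position(seq, adapter)
    seq, qual = seq[:keep], qual[:keep]
    keep = quality_cut_position(phred33(qual), q_threshold)
    seq, qual = seq[:keep], qual[:keep]
    if len(seq) < min_length:
        return None
    return seq, qual
```

Applied to a whole FASTQ file, clean_read would be called on each record, and reads that return None would be discarded.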

Alignment to the Genome

  • Genome indexing
  • Alignment
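
To illustrate what genome indexing buys at the alignment step, here is a deliberately simplified Python sketch: a hash-based k-mer index of the genome, followed by seed-and-verify exact alignment of a read. This is a toy stand-in, not the method of any particular aligner; spliced RNAseq aligners such as STAR or HISAT2 use suffix-array or FM-index structures and also handle mismatches and splice junctions. All names here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer in the genome to the (chromosome, 0-based position) pairs where it occurs."""
    index = defaultdict(list)
    for chrom, seq in genome.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((chrom, pos))
    return dict(index)


def align_exact(read: str, genome: dict[str, str],
                index: dict[str, list[tuple[str, int]]], k: int) -> list[tuple[str, int]]:
    """Seed with the read's first k-mer, then verify each candidate by exact comparison.

    Reads shorter than k, or with any mismatch, return no hits.
    """
    if len(read) < k:
        return []
    hits = []
    for chrom, pos in index.get(read[:k], []):
        if genome[chrom][pos:pos + len(read)] == read:
            hits.append((chrom, pos))
    return hits
```

The index is built once per genome and reused for every read, which is why genome indexing appears as a separate step before alignment.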

Transcript Quantification

  • Raw counts
  • Estimation of number of undetected genes
  • Alignment and count data quality control; GC bias correction
  • Transcript quantification: Normalization
  • Transcript quantification: Batch effect correction
  • Transcript quantification: Removal of low-count genes
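
A sketch of three of these quantification steps on a raw count table is given below: counting undetected genes, filtering low-count genes by counts per million (CPM), and computing per-sample normalization factors in the style of DESeq2's median-of-ratios method. The table layout (gene ID mapped to one count per sample), the thresholds and the function names are assumptions for illustration; GC bias and batch effect correction are not covered.

```python
import math
from statistics import median


def undetected_genes(counts: dict[str, list[int]]) -> list[str]:
    """Genes with zero reads in every sample."""
    return [g for g, row in counts.items() if all(c == 0 for c in row)]


def cpm(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """Counts per million: scale each sample by its library size (total reads)."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g, row in counts.items()}


def filter_low_counts(counts: dict[str, list[int]], min_cpm: float = 1.0,
                      min_samples: int = 2) -> dict[str, list[int]]:
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    norm = cpm(counts)
    return {g: row for g, row in counts.items()
            if sum(v >= min_cpm for v in norm[g]) >= min_samples}


def median_of_ratios_size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Per-sample size factors, median-of-ratios style.

    For each gene, the reference is its geometric mean across samples; a
    sample's factor is the median ratio of its counts to those references.
    Only genes with non-zero counts in every sample contribute.
    """
    n_samples = len(next(iter(counts.values())))
    log_geo_means = {g: sum(math.log(c) for c in row) / n_samples
                     for g, row in counts.items() if all(c > 0 for c in row)}
    return [math.exp(median([math.log(counts[g][j]) - lg
                             for g, lg in log_geo_means.items()]))
            for j in range(n_samples)]
```

Dividing each sample's counts by its size factor gives normalized counts. Unlike plain CPM, the median-of-ratios approach is less sensitive to a handful of very highly expressed genes dominating the library size.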

Differential Expression Analysis

  • Non-parametric approach
  • Bayesian linear modelling approach
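
This page does not say which non-parametric test the pipeline uses. As one possible example, the Python sketch below applies a two-sided Mann-Whitney U (Wilcoxon rank-sum) test to each gene, using the normal approximation with a tie-corrected variance, and then controls the false discovery rate with the Benjamini-Hochberg procedure. The choice of test and the function names are assumptions; the input is assumed to be normalized expression values per gene and sample.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected variance)."""
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    N = n1 + n2
    u1 = sum(average_ranks(values)[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    var = n1 * n2 / 12 * ((N + 1) - tie_term / (N * (N - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for step, i in enumerate(sorted(range(m), key=pvals.__getitem__, reverse=True)):
        rank = m - step  # ascending rank of pvals[i]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(expr, group_a, group_b):
    """expr: gene -> list of normalized values; group_a/group_b: sample column indices.

    Returns gene -> (p-value, BH-adjusted p-value).
    """
    genes = list(expr)
    pvals = [mann_whitney_p([expr[g][j] for j in group_a],
                            [expr[g][j] for j in group_b]) for g in genes]
    return dict(zip(genes, zip(pvals, benjamini_hochberg(pvals))))
```

One design caveat: with three replicates per group, an exact two-sided Mann-Whitney test cannot reach p < 0.1, and the normal approximation used here is only rough for such small groups, so non-parametric approaches generally need more replicates than parametric models.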

RNAseq assessment of alternative splicing and differential expression analysis of isoforms

Ongoing


To ask for support, drop us a line at support.bioinformatics@cragenomica.es

Research

We work in close collaboration with scientists at CRAG to gain new insights into the molecular mechanisms that plants use to regulate cellular behavior, using systems biology and bioinformatics approaches. In these collaborations, we are happy to help from the onset of the project with the design of experiments that will lead to subsequent analysis. See here for past examples of systems biology approaches applied to characterize signal transduction mechanisms.