Bioinformatics tools for biological interpretation and data visualization

Our group has developed tools such as the Food-Biomarker Ontology (FOBI) (Castellano-Escuder P, et al., 2020), the first ontology designed to integrate metabolomics and nutrition data, and POMAShiny (Castellano-Escuder P, et al., 2021), which offers univariate and multivariate statistical methods, dimensionality reduction, feature selection, regularized regression, machine learning-based classification, predictive modeling, and interactive visualizations.

Following FAIR principles, both the source code and the data files are available in public GitHub repositories.

The Food-Biomarker Ontology (FOBI) is the first ontology developed to integrate metabolomics and nutrition data (Castellano-Escuder P, et al., 2020). This ontology aims to link different types of foods with their associated metabolites or dietary intake biomarkers.

FOBI comprises 1,197 terms, 4 different properties, 13 top-level food classes, 11 top-level biomarker classes, and over 4,500 relationships. Additionally, FOBI is part of the OBO Foundry project, and FOBI identifiers have been indexed in the HMDB and FooDB databases to facilitate interoperability and data exchange.

Go to Fobitools

Food-Biomarker Ontology (FOBI)

Graphical visualization of FOBI

FOBI architecture using the apple as an example.

Parsing of FOBI information from OBO format into a human-readable table
Compound ID conversion (between metabolite names and FOBI, ChemSpider, KEGG, PubChem CID, InChIKey, InChI code, and HMDB identifiers)
Analysis of biological significance using over-representation analysis (ORA) and metabolite set enrichment analysis (MSEA):
  • Chemical class enrichment analysis: ORA and MSEA using FOBI chemical classes as metabolite sets.

  • Food enrichment analysis: ORA and MSEA using FOBI food groups as metabolite sets.

Text mining algorithm for the annotation of dietary data in free text
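
The ORA step listed above can be illustrated with a small sketch. ORA is commonly implemented as an upper-tail hypergeometric test: given a background of measured metabolites, a metabolite set (for example, a chemical class or a food group), and a list of metabolites of interest, it asks how likely an overlap at least as large as the observed one would be by chance. The code below is a minimal, language-agnostic illustration written in Python; it is not the Fobitools API, and the metabolite names, set names, and function names are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of measured metabolites (background)
    set_size:      number of background metabolites in the metabolite set
    hits_size:     number of metabolites of interest
    overlap:       metabolites of interest that fall in the set
    """
    total = comb(universe_size, hits_size)
    upper = min(set_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def run_ora(universe, metabolite_sets, significant):
    """Return one ORA p-value per metabolite set."""
    results = {}
    for name, members in metabolite_sets.items():
        members = members & universe  # ignore members that were not measured
        overlap = len(members & significant)
        results[name] = ora_p_value(
            len(universe), len(members), len(significant), overlap
        )
    return results


# Hypothetical toy data: names and set labels are illustrative only.
universe = {f"m{i}" for i in range(1, 21)}  # 20 measured metabolites
metabolite_sets = {
    "set_A": {"m1", "m2", "m3", "m4", "m5"},
    "set_B": {"m6", "m7", "m8", "m9", "m10", "m11"},
}
significant = {"m1", "m2", "m3", "m6"}

ora_results = run_ora(universe, metabolite_sets, significant)
for name, p in ora_results.items():
    print(f"{name}: p = {p:.4f}")
```

In practice the resulting p-values would also be adjusted for multiple testing across sets; that step is omitted here for brevity.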

POMAShiny

POMAShiny is a web-based tool that offers a structured, flexible, and user-friendly workflow for processing, exploring, and statistically analyzing metabolomics data. It is built on the POMA package from R/Bioconductor, so the same analyses can be reproduced and extended outside the web environment. The POMAShiny workflow is organized into four sequential panels:

  1. Data upload

  2. Preprocessing

  3. Exploratory Data Analysis (EDA)

  4. Statistical analysis

Go to POMAShiny


Data upload

POMAShiny requires two input files in CSV format: a metadata file (target) and a features file. The metadata file should include sample names in the first column, group labels (e.g., control and case) in the second, and, optionally, relevant covariates from the third column onward. The features file contains the quantified features from the experiment, with one feature per column. The row order must be the same in both files. Once uploaded, POMAShiny converts the files into an MSnSet object, the data structure defined by the MSnbase package from R/Bioconductor.
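
The layout of the two input files can be made concrete with a short sketch. The Python code below only illustrates the described format (sample name, group, then covariates in the target file; one feature per column in the features file; rows aligned by position). It is not how POMAShiny reads its inputs, and the column names, values, and the `load_inputs` helper are hypothetical.

```python
import csv
import io

# Hypothetical target (metadata) file: sample, group, then covariates.
target_csv = """sample,group,age
s1,control,34
s2,control,41
s3,case,38
"""

# Hypothetical features file: one feature per column, rows in the same order.
features_csv = """glucose,lactate
5.1,1.2
4.8,1.0
6.3,2.1
"""


def load_inputs(target_text, features_text):
    """Pair each metadata row with the feature row at the same position."""
    target = list(csv.DictReader(io.StringIO(target_text)))
    features = list(csv.DictReader(io.StringIO(features_text)))

    # The files are matched by row order, so the row counts must agree.
    if len(target) != len(features):
        raise ValueError(
            f"target has {len(target)} rows but features has {len(features)}"
        )

    samples = []
    for meta, row in zip(target, features):
        columns = list(meta.items())
        samples.append(
            {
                "sample": columns[0][1],         # first column: sample name
                "group": columns[1][1],          # second column: group label
                "covariates": dict(columns[2:]),  # third column onward
                "features": {name: float(v) for name, v in row.items()},
            }
        )
    return samples


samples = load_inputs(target_csv, features_csv)
for s in samples:
    print(s["sample"], s["group"], s["features"])
```

Because the features file carries no sample names in this layout, a row-count check is the only mismatch that can be detected automatically; a reordered file would pass silently, which is why the row order requirement matters.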

Users can select specific samples from the metadata file to create data subsets for analysis. Additionally, POMAShiny offers an optional function to combine features that belong to the same entity (such as peptides from a protein or ions from a compound). To use this feature, a “group” CSV file is required, indicating which features should be combined. It also allows users to download a table with the coefficient of variation for the combined features.

Preprocessing

Missing Value Imputation, Normalization, and Outlier Detection

Missing Value Imputation
In metabolomics and proteomics, some values are often not detectable or quantifiable due to biological or technical reasons (e.g., imprecise detection or values below the limit of quantification). To address this, POMAShiny offers a dedicated missing value imputation panel with three sequential steps:

  1. Distinguish between zeros and missing values.
  2. Remove features with a high percentage of missing values (default: 20%).
  3. Impute the remaining missing values by replacing them with zero, the feature mean, median, or minimum, or by using the k-nearest neighbors (KNN) algorithm.
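
The three steps above can be sketched as follows. This is a minimal Python illustration of the logic, not the POMA implementation: the function name and parameters are hypothetical, a feature is assumed to be removed when strictly more than the cutoff percentage of its values is missing, and KNN imputation is left out to keep the example short.

```python
from statistics import mean, median


def impute_missing(rows, zeros_are_missing=True, cutoff=20.0, method="median"):
    """rows: list of dicts mapping feature name -> float or None (missing)."""
    features = list(rows[0])
    n = len(rows)

    # Step 1: decide whether zeros count as missing values.
    def is_missing(value):
        return value is None or (zeros_are_missing and value == 0)

    cleaned = [
        {f: (None if is_missing(r[f]) else r[f]) for f in features} for r in rows
    ]

    # Step 2: drop features whose percentage of missing values exceeds the
    # cutoff (assumption: exactly at the cutoff is kept). Features with no
    # observed value at all are always dropped.
    kept = []
    for f in features:
        n_missing = sum(r[f] is None for r in cleaned)
        if n_missing < n and 100.0 * n_missing / n <= cutoff:
            kept.append(f)

    # Step 3: fill remaining gaps with a per-feature summary value.
    fillers = {"zero": lambda obs: 0.0, "mean": mean, "median": median, "min": min}
    fill_value = {
        f: fillers[method]([r[f] for r in cleaned if r[f] is not None])
        for f in kept
    }
    return [
        {f: (fill_value[f] if r[f] is None else r[f]) for f in kept}
        for r in cleaned
    ]


# Hypothetical toy data: "a" has one zero, "b" has 40% missing values.
rows = [
    {"a": 1.0, "b": None, "c": 2.0},
    {"a": 2.0, "b": None, "c": 4.0},
    {"a": 0.0, "b": 3.0, "c": 6.0},
    {"a": 4.0, "b": 1.0, "c": 8.0},
    {"a": 5.0, "b": 2.0, "c": 10.0},
]
imputed = impute_missing(rows)
print(imputed[2])
```

With the defaults, feature "b" is removed (40% missing) and the zero in feature "a" is treated as missing and replaced by the median of the observed values.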

Normalization
Variability in data can affect statistical results, making normalization essential. POMAShiny provides six one-step normalization methods to transform and scale the data:

  • Autoscaling

  • Level scaling

  • Log scaling

  • Log transformation

  • Vast scaling

  • Log Pareto scaling

These approaches help correct for differences in magnitude, technical variability, or heteroscedasticity.
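
For reference, the listed methods can be written as short per-feature formulas. The sketch below uses the common textbook definitions of autoscaling, level scaling, Pareto scaling, and vast scaling. The log-based variants are assumptions: the text does not specify the log base or how zeros are handled, so the example uses log10(x + 1), with "log scaling" taken as log transformation followed by autoscaling and "log Pareto" as log transformation followed by Pareto scaling. Function names are hypothetical and do not correspond to the POMA API.

```python
from math import log10, sqrt
from statistics import mean, stdev


def autoscale(values):
    """Center on the mean and divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def level_scale(values):
    """Center on the mean and divide by the mean."""
    m = mean(values)
    return [(v - m) / m for v in values]


def pareto_scale(values):
    """Center on the mean and divide by the square root of the SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / sqrt(s) for v in values]


def vast_scale(values):
    """Autoscale, then weight by mean / SD (inverse coefficient of variation)."""
    m, s = mean(values), stdev(values)
    return [((v - m) / s) * (m / s) for v in values]


def log_transform(values):
    """Assumption: log10 with a +1 offset so zeros stay defined."""
    return [log10(v + 1) for v in values]


def log_scale(values):
    """Assumption: log transformation followed by autoscaling."""
    return autoscale(log_transform(values))


def log_pareto(values):
    """Assumption: log transformation followed by Pareto scaling."""
    return pareto_scale(log_transform(values))


# One hypothetical feature measured in three samples.
feature = [1.0, 2.0, 3.0]
print(autoscale(feature))
print(level_scale(feature))
print(vast_scale(feature))
```

Each function operates on a single feature (one column of the features file), so normalizing a dataset means applying the chosen function to every feature independently.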

Outlier Detection
Outliers can be biological (natural variation) or analytical (errors during processing). Either kind can distort statistical results and predictive models. POMAShiny facilitates outlier detection through interactive plots and tables, with customizable options to remove outliers before statistical analysis.

Exploratory Data Analysis (EDA)

EDA helps identify uncontrolled factors and potential outliers, so it should be performed before statistical analysis. Moreover, in the absence of significant biases, EDA can provide an initial overview of the most relevant features of the study.

POMAShiny offers interactive and customizable visualizations for EDA, including:

  • Volcano plots (for two-group comparisons)

  • Boxplots

  • Density plots

  • Clustered heatmaps

It also includes options for Principal Component Analysis (PCA) and cluster analysis.

Statistical Analysis

This panel includes a variety of statistical methods, ranging from the approaches most commonly used in metabolomics and proteomics data analysis to methods that are less common in these fields. Every method is designed to be easy to use and produces both downloadable tables and interactive plots. The available analyses include:

  • Univariate analysis

  • Limma (Linear Models for Microarray Data)

  • Multivariate analysis

  • Cluster analysis

  • Correlation analysis

  • Regularized regression

  • Random forests

  • Odds ratio

  • Rank products
