With cutting-edge capabilities at our disposal, we are charting a new course for scientific discovery.
PIPA technological capabilities
PIPA conducts research and development in bioinformatics, machine learning and artificial intelligence. It also provides IT integration, software design and development services.
Our enterprise, cloud platforms and services leverage a gamut of cutting-edge capabilities in AI/ML, bioinformatics and software engineering. These technological assets empower our partners to optimize processes, vertically integrate siloed workflows and ultimately make data-informed decisions that advance scientific discovery and propel product innovation across the fields of nutrition, food, ingredients, and health.
Machine Learning & AI
Our Data Science & Machine Learning teams use best-in-class public and in-house frameworks to integrate and augment scientific knowledge. Integrative Knowledge Graphs, powerful search modalities, and actionable dashboards and visualizations bring efficiency and precision to how research and discovery are conducted, significantly reducing costs, resources and risk in Nutrition/Food and Life Science R&D.
At PIPA, business and R&D problems are handled through the analysis of complex, often heterogeneous data. To achieve that, we leverage a multitude of machine learning algorithms, ranging from traditional statistical models to state-of-the-art deep neural networks. Our toolset includes gradient boosting and ensemble classifiers; language models, used within a powerful NLP engine capable of automated information extraction; and representation learning approaches with deep embedding models applied to words, sentences, graph nodes, and graph edges. We understand that our analysis drives high-stakes decision-making, and we invest significantly in the interpretability and explainability of insights using meta-learning techniques.
Our vision is a multidimensional understanding of the data that combines algorithmic approaches and mathematical models with knowledge from domain experts, in order to extract valuable, interpretable insights that our clients trust.
In recent years, research on automatically extracting information through knowledge graphs (KGs) has grown rapidly, a trend partially driven by the successful use of proprietary graphs by companies like Google and Facebook and by the recent availability of high-quality, manually curated knowledge bases (KBs). At PIPA, we realized that the automated mining of biomedical-domain KGs has the potential to significantly accelerate research and discovery across nutrition, biology, and drug discovery. PIPA develops state-of-the-art graph learning algorithms capable of reasoning on KGs and extracting relevant features from them.
Our goal is to supply our clients with accurate, actionable information and novel insights that manifest themselves when using AI to query the integrated knowledge bases.
PIPA handles and processes big, complex, heterogeneous data efficiently across the five V’s of big data: volume, velocity, variety, veracity and value. Our natural language processing pipelines analyze massive numbers of documents in a short period of time. Efficiently expanding and integrating novel data is a challenge that cannot be addressed simply by horizontal or vertical infrastructure scaling.
We invest significant effort in identifying scalable solutions that combine the appropriate algorithms and resources to deliver both efficiency and accuracy. Probabilistic approaches and hashing schemes are integrated in our engines to significantly reduce the computational complexity of our solutions. Candidate objects of interest are quickly identified and then analyzed in depth by more complex pipelines. Furthermore, we have developed algorithmic mechanisms to identify and deal with noisy and potentially erroneous data. Our fault-tolerant, distributed infrastructure design allows us to continuously and accurately analyze vast volumes of data in an ever-changing, messy, and exciting real-world environment.
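The two-stage idea described above (a cheap hashing-based pass to pick candidates, followed by a more expensive exact analysis) can be illustrated with a minimal sketch. It uses MinHash signatures over word shingles as the probabilistic stage and exact Jaccard similarity as the in-depth stage. This is an assumed illustration of the general technique, not PIPA's actual engine; all function names and thresholds are hypothetical.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lower-cased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """Fixed-size sketch: for each seeded hash function, keep the minimum hash value."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big"
            )
            for item in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, which approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def find_similar(query, corpus, threshold=0.5, slack=0.2):
    """Stage 1 keeps documents whose sketch looks close enough (hypothetical slack
    margin to tolerate estimation error); stage 2 confirms with the exact measure."""
    query_shingles = shingles(query)
    query_sig = minhash_signature(query_shingles)
    confirmed = []
    for doc in corpus:
        doc_shingles = shingles(doc)
        # Stage 1: cheap, approximate candidate selection.
        if estimated_similarity(query_sig, minhash_signature(doc_shingles)) < threshold - slack:
            continue
        # Stage 2: exact, more expensive verification.
        if jaccard(query_shingles, doc_shingles) >= threshold:
            confirmed.append(doc)
    return confirmed
```

In a real system the signatures would typically be computed once and indexed, so the first stage never touches the full documents; the sketch recomputes them inline only to stay self-contained.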
PIPA harnesses Generative AI to enrich our integrative knowledge graphs and to elevate the user experience on our platforms through journal-grade summaries and nuanced, accurate conversational answers. The main modules are GPT and Knowledge Graph in-context learning, text summarization at the article and multi-article levels, and an LLM-based training data augmentation module.
GPT and Knowledge Graph In-context Learning: This module combines graph mining results with LLM output. Given a knowledge graph that relates biomedical entities such as diseases, genes, and drugs, an LLM uses the graph as context and combines it with its own context to generate relevant insights and responses.
Text Summarization: Distills essential information to give readers a high-level understanding of a collection of articles. To summarize a collection of documents, we first use a fine-tuned LLM for within-article summarization. This is essential for digesting long-form content, assisting in information extraction, and enabling quicker understanding of lengthy documents. A higher-level summarization LLM then hierarchically summarizes the individual article summaries while capturing the most significant points of each.
LLM-based Training Data Augmentation: Domain-specific training datasets for language models are scarce and costly. To tackle this shortage, we use a pre-trained language model within a generation module that takes a list of examples and a pre-processed corpus and, after filtering, produces a training dataset of real sentences containing the examples in the given list.
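The data flow of the three modules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not PIPA's implementation: the model is passed in as a plain callable (`summarize`), the prompt layout is invented for the example, and because the text does not specify how the pre-trained language model takes part in the augmentation step, it is represented here only by a pluggable `keep` filter. All names are hypothetical.

```python
import re


def build_kg_prompt(question, triples):
    """In-context learning: serialize (subject, relation, object) triples
    from the graph and place them in front of the user's question."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    return f"Known relations:\n{facts}\n\nQuestion: {question}\nAnswer:"


def summarize_collection(articles, summarize):
    """Hierarchical summarization: summarize each article on its own,
    then summarize the concatenation of those summaries."""
    per_article = [summarize(article) for article in articles]
    return summarize(" ".join(per_article))


def select_training_sentences(corpus_sentences, examples, keep=lambda sentence: True):
    """Data augmentation: collect real corpus sentences that mention an
    example term, then apply a filter (assumed hook for a model-based check)."""
    selected = []
    for sentence in corpus_sentences:
        for term in examples:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, sentence, re.IGNORECASE) and keep(sentence):
                selected.append((sentence, term))
                break  # one label per sentence is enough for this sketch
    return selected
```

Passing the model as a callable keeps the sketch independent of any particular LLM provider; in practice `summarize` would wrap a call to a fine-tuned model.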
Some of our favorite tools and technologies
Bioinformatics & Chemoinformatics
We integrate public and proprietary data via in-house bioinformatics and chemoinformatics pipelines to generate insights on cohort identification, taxonomic and metabolic pathway abundances, gene expression, differentially abundant features, and structure- and sequence-based bioactivity prediction. Best practices in data integration/harmonization ensure the best possible representation of biomedical, biochemical and nutritional entities across our offerings.
Our state-of-the-art bioinformatics pipelines (16S Taxa, 16S Pathways, RNA-seq, Deep Shotgun metagenomics, Differential analysis) provide processing of raw data from multiple high-throughput experiment types, from both public and proprietary sources.
These pipelines pass strict scientific validation and provide targets for downstream analysis. The algorithms and reference data banks are frequently updated for optimal performance.
Chemoinformatics pipelines help us predict bioactivities of natural or synthetic compounds and identify candidate species as bioactive sources/ingredients in foods, beverages, supplements and functional foods.
The pipelines make high-throughput screening for bioactives discovery more efficient and effective.
Bioinformatics analysis is impossible without high-quality data. We created the Omics Engine Service (OES), a highly automated application for the collection, quality control and curation of high-throughput data and accompanying metadata.
OES uses proprietary machine learning technologies and human-in-the-loop review to provide more than a 10-fold increase in the speed of curation without compromising data quality.
Our products and innovation services harness the PIPA Data & Analytics Platform (PDAP), enabling a multi-tenant performant solution that offers high scalability while also guaranteeing high levels of security.
The LEAP™ ecosystem is powered by a microservices-based architecture built on top of Azure. It leverages the scalability of Azure’s Kubernetes service to support multi-layered multi-tenancy while guaranteeing data isolation and security.
The application layer is integrated with the PIPA Data and Analytics Platform (PDAP), which serves as the backbone of the data infrastructure for the entire ecosystem. PDAP leverages Azure’s Databricks and Batch services along with LEAP™ core’s Prefect-powered Execution Engine. Together, they enable highly performant and scalable data and bioinformatics pipelines which, coupled with the platform’s inherent multitenant nature, can generate unique insights by combining the platform’s data assets with user-provided ones.
The LEAP™ core platform offers a wide range of powerful and intuitive data visualizations that enable our end-users to gain a nuanced, contextualized understanding of the insights they are viewing. The pinnacle of our visualization capabilities is the interactive Network Graph. Users explore insights through our rich, traversable Graph to get an interactive view of insights and evidence, and zoom in and out on various depths of information.
To explore data more efficiently, users can narrow the viewable dataset with easy-to-apply filters.
Our advanced Data and Analytics Platform (PDAP) enables the automated generation and evaluation of multiple data artifacts and Machine Learning models to provide a scalable and repeatable solution that guarantees data lineage and governance while also supporting data isolation through its inherent multitenant nature.
Scalable data-Bio pipelines as a Service: Our Power Notebooks feature serves as the front-end interface for a wide portfolio of scalable data-bio pipelines, offering users a dual advantage: the familiarity of Jupyter notebooks and the computational prowess of PDAP. They provide an interactive, secure, and highly scalable environment that caters to diverse bioinformatics requirements, from basic data analysis to complex computational tasks, to entire end-to-end bioinformatics pipelines.
Data Lineage: Ensuring Transparency, Traceability, and Trust. Data lineage is a fundamental component of PDAP. It empowers us to trace the flow and transformations of data throughout its lifecycle, offering multiple benefits that enhance the value proposition of our AI and Bioinformatics SaaS offerings. PDAP’s data lineage capabilities are more than just a feature; they are an integral part of our strategy to maintain high levels of transparency, governance, and operational efficiency. This focus enhances our compliance standing, facilitates better collaboration among teams, and enables quicker debugging, further solidifying our reputation for delivering reliable, high-quality services.
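To make the idea of tracing data through its transformations concrete, here is a minimal, hypothetical sketch of a lineage log: each artifact records the transformation that produced it and the inputs it was derived from, and an upstream query walks those links. It illustrates the general concept only and is not PDAP's data model; the class, method, and artifact names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    artifact: str
    transformation: str
    inputs: list = field(default_factory=list)


class LineageLog:
    """Keeps one record per artifact and answers 'where did this come from?'."""

    def __init__(self):
        self._records = {}

    def record(self, artifact, transformation, inputs=()):
        self._records[artifact] = LineageRecord(artifact, transformation, list(inputs))

    def upstream(self, artifact):
        """Return every ancestor of an artifact, nearest first (breadth-first)."""
        seen, order, queue = set(), [], deque([artifact])
        while queue:
            current = queue.popleft()
            record = self._records.get(current)
            # Artifacts without a record (e.g. external sources) have no known inputs.
            for parent in (record.inputs if record else []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    queue.append(parent)
        return order


log = LineageLog()
log.record("raw_reads", "ingest")
log.record("qc_reads", "quality_filter", ["raw_reads"])
log.record("taxa_table", "taxonomy_assignment", ["qc_reads", "reference_db"])
ancestors = log.upstream("taxa_table")
```

A query like this is what supports debugging and governance: when a result looks wrong, the list of ancestors shows which inputs and steps to inspect.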
Let’s schedule a demo so you can see what PIPA can do for you.
Let’s advance scientific discovery together
Our mission is to unlock a cost-efficient, faster path to innovation for our partners by leveraging our team’s top-tier scientific expertise and our proprietary Nutrition AI, LEAP™.