Technology
We are charting a new course for scientific discovery and innovation.
PIPA technological capabilities
PIPA conducts research and development in the fields of bioinformatics, machine learning, and artificial intelligence, and provides IT integration, software design, and development services. Our enterprise and cloud applications and services draw on a gamut of cutting-edge capabilities in AI/ML, bioinformatics, and software engineering. These technological assets empower our partners to optimize processes, vertically integrate siloed workflows, and ultimately make data-informed decisions that advance scientific discovery and propel product innovation across the fields of nutrition, food, ingredients, and health. A team of world-class AI/ML experts, scientists, and software engineers relentlessly hones the models that power our AI applications: LEAP™, Ingredient Profiler, and OES.
Machine Learning & AI
Our Data Science & Machine Learning teams use best-in-class public and in-house frameworks to integrate and augment scientific knowledge. Integrative knowledge graphs, powerful search modalities, and actionable dashboards and visualizations bring unmatched efficiency and precision to how research and discovery are conducted, significantly reducing cost, resources, and risk in nutrition/food and life science R&D.
At PIPA, business and R&D problems are handled through the analysis of complex, often heterogeneous data. To achieve this, we leverage a multitude of machine learning algorithms, ranging from traditional statistical models to state-of-the-art deep neural networks. Our toolset includes gradient boosting and ensemble classifiers; language models used within a powerful NLP engine capable of automated information extraction; and representation learning approaches with deep embedding models applied to words, sentences, graph nodes, and graph edges. We understand that our analyses drive high-stakes decision-making, so we invest significantly in the interpretability and explainability of insights using meta-learning techniques.
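As a toy illustration of the ensemble side of this toolset, the sketch below trains a gradient boosting classifier with scikit-learn on synthetic data; the dataset, features, and hyperparameters are placeholders, not a depiction of our production pipelines.

```python
# Minimal sketch of gradient boosting on synthetic tabular data.
# All values here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for tabular R&D features (e.g., assay readouts).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient boosting: an additive ensemble of shallow decision trees,
# each tree fit to the residual errors of the ensemble so far.
model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```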
Our vision is a multidimensional understanding of data that combines algorithmic approaches and mathematical models with knowledge from domain experts to extract valuable, interpretable insights that our clients trust.
Recent years have seen exponential growth in research dedicated to automatically extracting information through knowledge graphs (KGs), a trend driven in part by the successful use of proprietary graphs by companies like Google and Facebook and by the recent availability of high-quality, manually curated knowledge bases. At PIPA, we recognized that automated mining of biomedical knowledge graphs has the potential to significantly accelerate research and discovery across nutrition, biology, and drug discovery. PIPA develops state-of-the-art graph learning algorithms capable of reasoning over KGs and extracting relevant features from them.
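To make the idea concrete, here is a minimal sketch of one common graph-learning family, TransE-style embedding scoring, where plausible (head, relation, tail) triples score highly. The entities, relations, and random embeddings are illustrative assumptions, not our production algorithms.

```python
# Illustrative TransE-style knowledge-graph scoring sketch.
# In practice the embeddings are *learned* by minimizing this same
# distance over known triples; here they are random for brevity.
import numpy as np

rng = np.random.default_rng(0)
entities = {"curcumin": 0, "inflammation": 1, "TNF": 2}   # made-up examples
relations = {"reduces": 0, "regulates": 1}

dim = 16
E = rng.normal(size=(len(entities), dim))    # entity embeddings
R = rng.normal(size=(len(relations), dim))   # relation embeddings

def score(head: str, rel: str, tail: str) -> float:
    """TransE score: plausible triples satisfy head + rel ≈ tail."""
    h, r, t = E[entities[head]], R[relations[rel]], E[entities[tail]]
    return -float(np.linalg.norm(h + r - t))

# Rank candidate tails for the query (curcumin, reduces, ?).
candidates = sorted(entities, key=lambda t: -score("curcumin", "reduces", t))
print(candidates)
```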
Our goal is to supply our clients with accurate, actionable information and novel insights that emerge when AI is used to query the integrated knowledge bases.
PIPA handles and processes big, complex, heterogeneous data efficiently across the five V’s of big data: volume, velocity, variety, veracity, and value. Our natural language processing pipelines analyze massive document collections in a short period of time. Efficiently expanding and integrating novel data is a challenge that cannot be addressed simply by horizontal or vertical infrastructure scaling.
We invest significant effort in identifying scalable solutions based on the appropriate combination of algorithms and resources, balancing efficiency with accuracy. Probabilistic approaches and hashing schemes are seamlessly integrated into our engines to significantly reduce the complexity of our solutions: candidate objects of interest are quickly identified and then analyzed in depth by more complex pipelines. We have also developed algorithmic mechanisms to identify and handle noisy and potentially erroneous data. Our fault-tolerant, distributed infrastructure allows us to continuously and accurately analyze vast volumes of data in an ever-changing, messy, and exciting real-world environment.
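The sketch below illustrates the general "cheap filter, expensive verify" pattern with MinHash signatures, one well-known probabilistic hashing scheme; whether our engines use MinHash specifically is an assumption made for illustration.

```python
# MinHash sketch: similar token sets tend to share minimum hash values,
# so signature agreement estimates Jaccard similarity cheaply.
import hashlib

def minhash(tokens: set, num_hashes: int = 64) -> list:
    """Approximate a set by its minimum hash under several seeded hashes."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for seed in range(num_hashes)
    ]

def est_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = set("gradient boosting reduces training error".split())
doc_b = set("gradient boosting reduces validation error".split())
s = est_jaccard(minhash(doc_a), minhash(doc_b))
# Cheap screen: only candidates above a threshold go to the heavier pipeline.
print(f"estimated similarity: {s:.2f}")
```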
PIPA harnesses generative AI to enrich our integrative knowledge graphs and to elevate the user experience on our platforms through journal-grade summarization and nuanced, accurate conversational answers. The main modules include GPT and knowledge graph in-context learning, text summarization at the single-article and multi-article levels, and an LLM-based training data augmentation module.
GPT and Knowledge Graph In-context Learning: This module combines graph-mining results with LLM outputs. Given a knowledge graph that relates biomedical entities such as diseases, genes, and drugs, the LLM uses the graph as context, combining it with its own knowledge to generate grounded insights and responses.
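A minimal sketch of this pattern, assuming a generic chat-style LLM client (the `call_llm` stub and the triples are hypothetical): retrieved triples are serialized into the prompt so the model answers with the graph as grounding.

```python
# Knowledge-graph in-context learning sketch: graph facts in, answer out.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API client."""
    raise NotImplementedError("plug in your LLM client here")

# Triples returned by an upstream graph-mining step (illustrative values).
triples = [
    ("vitamin D", "regulates", "calcium absorption"),
    ("calcium absorption", "affects", "bone density"),
]

facts = "\n".join(f"- {h} {r} {t}." for h, r, t in triples)
prompt = (
    "Use only the facts below to answer.\n"
    f"Facts:\n{facts}\n\n"
    "Question: How might vitamin D relate to bone density?"
)
# answer = call_llm(prompt)
```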
Text Summarization: Distills the essential information that gives readers a high-level understanding of a collection of articles. To summarize a collection of documents, we first use a fine-tuned LLM for within-article summarization, which is essential for digesting long-form content, assisting information extraction, and enabling quicker understanding of lengthy documents. A higher-level summarization step then hierarchically condenses the individual article summaries while preserving the most significant points of each.
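The hierarchical flow can be sketched as a simple map-reduce, with a trivial extractive placeholder standing in for the fine-tuned LLM summarizer (which is not specified here):

```python
# Hierarchical (map-reduce) summarization sketch.
def summarize(text: str, max_sentences: int = 2) -> str:
    """Trivial extractive placeholder for the fine-tuned LLM summarizer."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def summarize_collection(articles: list) -> str:
    # Map step: one summary per article (within-article summarization).
    per_article = [summarize(a) for a in articles]
    # Reduce step: condense the summaries into a collection-level overview.
    return summarize("\n\n".join(per_article), max_sentences=3)

articles = [
    "Curcumin reduced markers of inflammation. The effect was dose dependent.",
    "Vitamin D intake correlated with bone density. Larger trials are needed.",
]
print(summarize_collection(articles))
```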
LLM-based Training Data Augmentation: Domain-specific training datasets for language models are scarce and costly. To tackle this shortage, we use a pre-trained language model within a generation module that takes a list of examples and a pre-processed corpus and, after filtering, generates a training dataset of real sentences containing the listed examples.
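A simplified sketch of the selection step, with the language-model quality filter abstracted into a hypothetical `keep` predicate; the seed terms and corpus sentences are made up for illustration.

```python
# Augmentation sketch: keep real corpus sentences that mention a seed term,
# then pass them through a quality filter before they become training data.
import re

def keep(sentence: str) -> bool:
    """Stand-in for LM-based filtering (e.g., fluency or relevance scoring)."""
    return 5 <= len(sentence.split()) <= 60

def augment(seed_terms: list, corpus_sentences: list) -> list:
    pattern = re.compile("|".join(re.escape(t) for t in seed_terms), re.I)
    return [s for s in corpus_sentences if pattern.search(s) and keep(s)]

corpus = [
    "Curcumin supplementation reduced inflammatory markers in the cohort.",
    "The weather was nice.",
]
print(augment(["curcumin"], corpus))
```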
Some of our favorite tools and technologies
Bioinformatics & Chemoinformatics
We integrate public and proprietary data via in-house bioinformatics and chemoinformatics pipelines to generate insights on cohort identification, taxonomic and metabolic pathway abundances, gene expression, differentially abundant features, and structure- and sequence-based bioactivity prediction. Best practices in data integration/harmonization ensure the best possible representation of biomedical, biochemical, and nutritional entities across our offerings.
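As one hedged illustration of structure-based bioactivity prediction, the sketch below featurizes molecules with RDKit Morgan fingerprints and fits a random forest; the SMILES strings, labels, and model choice are toy assumptions, not a description of our in-house pipelines.

```python
# Structure-based bioactivity sketch: fingerprint molecules, fit a classifier.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

smiles = ["CCO", "CC(=O)O", "c1ccccc1O", "CCN"]   # toy molecules
labels = [0, 1, 1, 0]                              # toy activity labels

def fingerprint(smi: str) -> np.ndarray:
    """Morgan (circular) fingerprint as a fixed-length bit vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    return np.array(list(fp))

X = np.stack([fingerprint(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict_proba([fingerprint("CCO")]))
```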
Some of our favorite tools and technologies
Software Engineering
Our products and innovation services harness the PIPA Data & Analytics Platform (PDAP), a performant, multi-tenant solution that offers high scalability while guaranteeing strong security for client and partner data.
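One standard way to enforce tenant isolation in such a platform is to scope every data access by a server-resolved tenant id; the sketch below illustrates that general pattern only and is not a description of PDAP's internal design.

```python
# Multi-tenant isolation sketch: every query is forced through a tenant
# filter derived from the authenticated session, never from client input.
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantContext:
    tenant_id: str  # resolved server-side from the authenticated session

def fetch_documents(ctx: TenantContext, search: str):
    """Build a tenant-scoped, parameterized query for the shared DB layer."""
    sql = ("SELECT id, title FROM documents "
           "WHERE tenant_id = %s AND title ILIKE %s")
    params = (ctx.tenant_id, f"%{search}%")
    return sql, params  # executed by a pooled, shared database connection

print(fetch_documents(TenantContext("acme"), "fermentation"))
```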
Some of our favorite tools and technologies
Let’s schedule a demo so you can see what PIPA can do for you.
Let’s advance scientific discovery together
Our mission is to unlock a cost-efficient, faster path to innovation for our partners. Get in touch today.