Technology
Designed for scientific discovery. Trusted by R&D leaders.


At PIPA, we address business and R&D problems by analyzing complex, often heterogeneous data. To do so, we apply a range of machine learning algorithms, from traditional statistical models to state-of-the-art deep neural networks. Our toolset includes gradient boosting and ensemble classifiers; language models, used within an NLP engine that performs automated information extraction; and representation learning approaches with deep embedding models applied to words, sentences, graph nodes, and graph edges. We understand that our analysis drives high-stakes decision-making, so we invest significantly in the interpretability and explainability of insights using meta-learning techniques.
Our vision is a multidimensional understanding of the data that combines algorithmic approaches and mathematical models with knowledge from domain experts to extract valuable, interpretable insights that our clients trust.
In recent years, research on automatically extracting information through knowledge graphs (KGs) has grown rapidly, a trend partially driven by the successful use of proprietary graphs by companies like Google and Facebook and by the recent availability of high-quality, manually curated knowledge bases (KBs). At PIPA, we realized that the automated mining of biomedical-domain KGs has the potential to significantly accelerate research and discovery across nutrition, biology, and drug discovery. PIPA develops state-of-the-art graph learning algorithms capable of reasoning on KGs and extracting relevant features from them.
Our goal is to supply our clients with accurate, actionable information and novel insights that emerge when AI is used to query the integrated knowledge bases.
PIPA processes big, complex, heterogeneous data efficiently across the five V’s of big data: volume, velocity, variety, veracity, and value. Our natural language processing pipelines analyze massive numbers of documents in a short period of time. Efficiently expanding the data and integrating novel sources is a challenge that cannot be addressed simply by scaling infrastructure horizontally or vertically.
We invest significant effort in identifying the combination of algorithms and resources that scales efficiently without sacrificing accuracy. Probabilistic approaches and hashing schemes are integrated into our engines to significantly reduce the complexity of our solutions. Candidate objects of interest are quickly identified and then analyzed in depth by more complex pipelines. Furthermore, we have developed algorithmic mechanisms to identify and deal with noisy and potentially erroneous data. Our fault-tolerant, distributed infrastructure design allows us to continuously and accurately analyze vast volumes of data in an ever-changing, messy, and exciting real-world environment.
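As an illustration only, the two-stage idea described above (a cheap hashing pass that surfaces candidates, followed by a more expensive exact analysis) can be sketched with MinHash signatures and locality-sensitive hashing. This is a generic textbook technique chosen for the example, not a description of PIPA's engines; every function name and parameter below (shingles, minhash_signature, candidate_pairs, the 64 hashes and 16 bands) is a hypothetical choice.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """Keep the minimum of each seeded hash; similar sets share many minima."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def candidate_pairs(signatures, bands=16):
    """LSH banding: documents that agree on any whole band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact set similarity, standing in for the 'in-depth' second stage."""
    return len(a & b) / len(a | b)
```

In this sketch only the pairs returned by candidate_pairs would be passed to jaccard, so the exact comparison runs on a small fraction of all possible pairs. The number of bands trades recall against the number of candidates and would need tuning for real data.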

Our state-of-the-art bioinformatics pipelines process raw data from multiple high-throughput experiment types, from both public and proprietary sources. All of our pipelines pass strict scientific validation. The algorithms and reference data banks are frequently updated for optimal performance.
Bioinformatics analysis is impossible without high-quality data. We created the Omics Engine Service (OES), a highly automated application for the collection, quality control, and curation of high-throughput data and accompanying metadata. OES uses proprietary machine learning technologies and human-in-the-loop review to provide a more than 10-fold increase in the speed of curation without compromising quality.

The LEAP™ ecosystem is powered by a microservices-based architecture built on Azure. It uses the scalability of Azure’s native Kubernetes service to support multi-layered multi-tenancy and to guarantee data isolation and security.
The application layer is integrated with the PIPA Data and Analytics Platform (PDAP), which serves as the backbone of the ecosystem’s data infrastructure. PDAP leverages Azure’s Databricks and Batch services along with LEAP™ core’s Prefect-powered Execution Engine. Together, they enable highly performant and scalable data and bioinformatics pipelines. Coupled with PDAP’s inherent multitenant nature, these pipelines can generate unique insights by combining the platform’s data assets with user-provided ones.
The LEAP™ core platform offers a wide range of powerful and intuitive data visualizations that give our end-users a nuanced, contextualized understanding of the insights they are viewing. The pinnacle of our visualization capabilities is the interactive Network Graph. Users traverse the Graph to get an interactive view of insights and evidence, zooming in and out across different depths of information. To explore data more efficiently, users can narrow the viewable dataset with easily applied filters.
Our advanced Data and Analytics Platform (PDAP) enables the automated generation and evaluation of multiple data artifacts and Machine Learning models to provide a scalable and repeatable solution that guarantees data lineage and governance while also supporting data isolation through its inherent multitenant nature.
Let’s advance scientific discovery together
Our mission is to offer a cost-efficient, faster path to innovation for our partners by leveraging our team’s top-tier scientific expertise and our proprietary AI technologies.