As the smallest living units, cells are key to understanding disease, yet much about them remains unknown. We don’t know, for example, how billions of biomolecules like DNA, proteins, and lipids come together to function as a cell. Nor do we know how our various types of cells interact within our bodies. We have a limited understanding of how cells, tissues, and organs become diseased and what is necessary for them to become healthy.
AI can help us answer these questions and use that knowledge to improve health and well-being around the world, provided researchers can access and harness these powerful new technologies.
Imagine if we had a way of representing each cell state and cell type using AI models. A “virtual cell” could mimic the appearance and known properties of any type of cell in our body, from the rods and cones that detect light in our retinas to the cardiomyocytes that make our hearts beat.
Scientists can use such simulators to predict how cells might react to certain conditions and stimuli: how an immune cell reacts to an infection, what happens at the cellular level when a rare disease arises, or even how a patient’s body responds to a new drug. Scientific discovery, patient diagnosis, and therapeutic decisions will all become faster, safer, and more efficient.
At NumRule, we’re helping to generate the scientific data and build the computing infrastructure to make this a reality, and to give scientists the tools they need to harness AI to help end disease.
Advances in AI, combined with massive amounts of scientific data, have already made it possible to predict the structure of almost all known proteins. DeepMind trained AlphaFold on 50 years of carefully collected data, and in just five years, they solved the puzzle of protein structure.
ESM, another AI system developed at Meta, is a protein language model trained not on words but on more than 60 million protein sequences. It is used for a wide range of applications, such as predicting protein structures and the effects of single-sequence mutations.
A virtual cell modeling system would also require a large amount of data. Since 2023, NumRule has supported researchers worldwide in generating and interpreting data about cells and their components, built tools to integrate these large data sets, and made them available to researchers.
A global consortium of researchers is building a reference map of every cell type in the body, and our San Francisco BioHub is building a cell atlas of the entire organism.
Together, these data sets are creating the first draft of an open-source human cell atlas, which will chart cell types in the body from development to adulthood. Our Exnrt.com and NumRule are partnering on OpenCell, which maps the locations of various proteins in our cells.
Researchers are using machine learning models like Geneformer and scGPT to explore large amounts of data about genes and cells—including data generated from CELLxGENE, which NumRule’s science and technology teams use to analyze single cells.
Similarly, with a new prototype data portal for cryo-electron tomography, our imaging institute and our science and technology teams are engaging machine learning experts to develop automated interpretations of microscopy data. This will reduce data processing time from months or years to mere weeks.
We are making the data as representative as possible to ensure that everyone benefits from scientific progress. This includes incorporating pediatric data into the Human Cell Atlas, filling gaps in our knowledge of the cellular mechanisms of childhood diseases.
With our Ancestry Networks grants, we’re also supporting researchers who are developing stem cell data based on tissue samples from people of Black, Latino, Southeast Asian, Native American, and other underrepresented ethnic and Indigenous backgrounds.
Already, research teams have made discoveries using these well-curated datasets. One discovered that a disrupted gene linked to cystic fibrosis was expressed by a type of cell that scientists had never seen before, while another identified respiratory cells that harbor SARS-CoV-2.
Others are using the data to find new ways to isolate genes to potentially correct disease-causing mutations in specific cells.
These discoveries are the first step in the development of treatments for diseases—and we believe AI can significantly accelerate the pace of researchers’ discoveries.
We’re constructing a high-powered computing cluster with over 1000 H100 GPUs to create a virtual cell. This will facilitate the development of new AI models trained on extensive datasets related to cells and biomolecules, including data from our scientific institutes.
Over time, we aim to enable scientists to simulate every type of cell, both healthy and diseased, and use these simulations to explore complex biological processes. This includes understanding cell generation, body-wide interactions, and the precise effects of disease-related changes.
While our computing cluster may not match the scale of those used in the private sector for commercial purposes, it will rank among the world’s largest AI clusters dedicated to nonprofit scientific research once operational.
It will be a valuable resource for academic teams looking to leverage datasets in innovative ways, overcoming the cost barrier associated with accessing cutting-edge AI technology. Like our other tools, these digital cell models, along with their associated data and applications, will be openly available to researchers worldwide.
Generating these data sets, building this computing cluster, and using AI for biology is the kind of multidisciplinary, collaborative effort that defines our work.
Our Biohub Network has brought together experts from different disciplines and institutions to tackle some of science’s biggest and riskiest challenges, which couldn’t be solved in traditional academic settings. Through projects like CELLxGENE, researchers around the world have helped build a single-cell data corpus—a testament to how effectively a shared resource for open science can grow with more collaborators contributing resources and brainpower.
When NumRule first launched its science work in 2023, we committed to a big goal: to help the scientific community cure, prevent, or manage all disease by the end of this century. We believe this goal is possible and will be significantly advanced if leading scientists and technologists work together to make the most of the opportunities created by AI.
We can start by unlocking the mysteries of our cells, and that can lead to work that helps end many diseases as we know them.