DNA

Generative enzyme modeling for residual material degradation.

This PhD project focuses on designing machine learning models to accelerate the degradation of agricultural waste and microplastics using custom enzymes.

Project Outline

Agricultural waste is a residual material generated by a wide range of agricultural activities. Examples include coffee pulp, husks from the cereal industry, and peels from the starch-based industry. It is produced in large quantities and requires processing and disposal. Microplastics and plastic packaging from the food industry also need to be degraded quickly. Enzymes can act as catalysts that accelerate the degradation of such residual material. Recent years have seen significant progress in machine learning models for protein, enzyme, and molecule design. These models open up an exciting opportunity to design enzymes with desired properties that accelerate the degradation of residual material. However, the space of viable proteins and enzymes is combinatorially large, so it cannot be searched exhaustively.

This PhD project seeks to design new machine learning models and tools. Starting with an ML representation of protein motifs with known functional properties (taken from existing datasets), we will create a generative model [2] that renders these motifs into a large family of protein backbones. The resulting digital proteins will be evaluated on how well they preserve the folding properties of the motifs (for example, AlphaFold3 or a lighter-weight model in the loop could validate folding agreement). A statistical model will represent correctly folding proteins with respect to several key properties, such as kinetic energy, functional temperatures, and undesired properties, and will provide a re-sampling step for active learning that guides the learning process. A small number of candidates predicted to be the most fit will be synthesized and tested for fitness, and the results will be used to refine the model and accelerate enzyme design.

We expect publications in top-tier ML venues (NeurIPS, ICML, ICLR, AAAI), and possibly domain-specific publications (Nature etc.) should the model lead to practical high-level discoveries.
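To make the generate, validate, re-sample, and test loop from the outline concrete, here is a minimal toy sketch in plain Python. Everything in it is an illustrative assumption and not the project's actual method: `generate_candidates` stands in for the generative model, `passes_fold_check` for the folding validator, `assay` for the wet-lab fitness test, and the k-nearest-neighbour surrogate with an optimistic (mean plus uncertainty) selection rule is just one simple choice of statistical model and re-sampling strategy. The motif `HDS` and all function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MOTIF = "HDS"  # hypothetical functional motif that every candidate must keep


def generate_candidates(n, length, rng):
    """Stand-in for the generative model: random sequences with the motif embedded."""
    candidates = []
    for _ in range(n):
        seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
        pos = rng.randrange(length - len(MOTIF) + 1)
        seq[pos:pos + len(MOTIF)] = MOTIF
        candidates.append("".join(seq))
    return candidates


def passes_fold_check(seq):
    """Stand-in for the folding validator: only checks that the motif is present."""
    return MOTIF in seq


def assay(seq):
    """Stand-in for the lab measurement (toy fitness: hydrophobic fraction)."""
    return sum(seq.count(a) for a in "AILMFVW") / len(seq)


def features(seq):
    """Amino-acid composition vector used by the surrogate model."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def predict(seq, labelled, k=3):
    """k-NN surrogate: mean fitness of the k nearest labelled sequences,
    with their mean distance used as a crude uncertainty estimate."""
    fx = features(seq)
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(fx, features(s))) ** 0.5, y)
        for s, y in labelled.items()
    )[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    uncertainty = sum(d for d, _ in nearest) / len(nearest)
    return mean, uncertainty


def active_learning(rounds=3, pool_size=200, batch=5, beta=1.0, seed=0):
    rng = random.Random(seed)
    # Generate, drop duplicates, and keep only candidates that pass the fold check.
    pool = [s for s in dict.fromkeys(generate_candidates(pool_size, 30, rng))
            if passes_fold_check(s)]
    labelled = {s: assay(s) for s in pool[:batch]}  # initial measurements

    def score(s):
        mean, uncertainty = predict(s, labelled)
        return mean + beta * uncertainty  # favour promising or unexplored candidates

    for _ in range(rounds):
        unlabelled = [s for s in pool if s not in labelled]
        # Only a small batch per round is sent to the (simulated) lab.
        for s in sorted(unlabelled, key=score, reverse=True)[:batch]:
            labelled[s] = assay(s)
    return labelled


if __name__ == "__main__":
    results = active_learning()
    best = max(results, key=results.get)
    print(f"measured {len(results)} candidates; best so far: {best}")
```

The point of the sketch is the budget: only `batch` candidates per round are measured, and the surrogate decides which ones, which mirrors the outline's idea of synthesizing a small number of candidates and feeding the results back into the model.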

The student will:
- get familiar with SOTA machine learning representations for proteins
- learn the caveats of generative models when applied to protein synthesis
- get familiar with domain knowledge of enzyme design
- get familiar with statistical ML models for forecasting properties
- become an expert on digital enzyme modeling
- learn PyTorch and other key deep learning packages.

To register an expression of interest, click here. You will need to outline why you have selected the research project and how your skills, experience and/or knowledge meet the project requirements.