Terrestrial Planet Formation
Mentor: Dr. Seth Jacobson
During terrestrial planet formation, the inner part of our solar system transformed from a disk of gas and dust into the four terrestrial planets, their moons, and the leftover asteroids that we observe today. In this project, the student will simulate the growth of planetesimals and planetary embryos in this disk under different proposed planet formation scenarios. In particular, the student will explore the role of giant impacts and whether these putative violent collisions produced features observable in the main asteroid belt today.
By the end of this project, students will be able to work with state-of-the-art N-body accretion models to test hypotheses regarding the latest terrestrial planet formation scenarios. Students will develop skills in working with a GPU-accelerated C program built on the NVIDIA CUDA toolkit, as well as in Python (or an equivalent language of their choice) for data analysis.
Students applying to this project should have basic programming skills and be familiar with the command line. They should also possess an interest in astronomy and planetary sciences.
Keywords: Terrestrial planet formation, Astrophysical modeling, Solar System history
Assortative matching: a machine-learning approach
Mentor: Dr. Hanzhe Zhang
Assortative matching is the tendency for individuals to pair with people of similar types. For example, marriages are found to be more likely between individuals of similar socioeconomic and education levels, and workers and their managers are also found to have similar education levels. I seek to document these tendencies globally with existing big datasets.
At the end of this project, students will be able to use statistical analysis software such as Stata and R, and present research findings in LaTeX.
It is a bonus to have worked with Stata, R, or Python. Some experience with any programming language is required.
Keywords: assortative matching, economics, game theory, machine learning
Intelligent tools for image understanding
Mentor: Dr. Dirk Colbry
Our team develops tools (written in Python) designed to help researchers obtain measurements from images. These tools have three major components: the first is building and testing a user interface that is easy for all researchers to use; the second is designing a machine learning algorithm that searches for automated solutions; and the third is scaling the search across large-scale advanced computing systems. Students in our group will work with the PI to identify specific problems based on the student's interests and the needs of the project.
At the end of the project, students will have significant experience in software design using tools such as git. Students will also have experience in high-performance computing and machine learning.
Our team can support applicants with a wide range of backgrounds. Although no prior work experience is required, experience in one or more of the following is desired: Python programming; version control (git); graphical user interface design; script writing (bash, zsh, perl, etc.); hacking or tinkering.
Keywords: Machine Learning, Scientific Image Understanding, Genetic Algorithms, High Throughput Computing, User Interface Design
Physics-informed modeling of complex multiscale dynamic systems from data
Mentor: Dr. Huan Lei
Despite the establishment of modern physics and its far-reaching impact on our understanding of complex physical systems, many systems still pose fundamental challenges, e.g., predicting nanoscale heat transfer relevant to CPU design, the dynamics of protein folding, or even the behavior of a droplet of hand soap; conventional physical models show limitations for such systems. This summer project is designed to introduce some recent developments in machine-learning-based methods that learn the governing equations directly from data. One essential problem is how to ensure that such data-driven models strictly preserve all the physical symmetries and constraints.
By the end of this project, students will be familiar with modeling systems with ordinary differential equations, data-driven reduced modeling, and the construction and training of symmetry-preserving neural networks.
Students applying to this project should have knowledge of differential equations and should be able to program in Python or Matlab.
Keywords: Physics-informed modeling; machine-learning; multiscale modeling and simulation
Statistical analysis of drug targeting in living subjects (e.g., animals)
Mentor: Dr. Bryan Smith
When nanomaterials are injected into the body as drugs or imaging agents to treat and diagnose cancer, they do not always arrive at the correct place. To improve delivery, they must be targeted, yet it is not well understood how targeting improves drug/nanomaterial uptake at the tumor site.
To understand this process, we performed highly detailed imaging of nanomaterial targeting to tumors on the subcellular scale in living animals over weeks, accumulating hundreds to thousands of images. These images have been processed and are ready to be analyzed statistically. The goal of this project is to use these data to determine the impact of targeting ligands on drug/nanomaterial uptake in tumors, particularly their binding to tumor cells, their propensity to be taken up into tumor cells, and their retention at these sites over time.
By the end of this project, students will be able to perform statistical programming in R confidently, with an understanding of how to link medical imaging data to robust statistical analysis.
Students applying to this project should be able to program, preferably with experience in R and statistics.
Keywords: Cancer, microscopy, statistical analysis of imaging data, biomedical engineering/biophysics
Artificial Photosynthesis for alternative energy
Mentor: Dr. Jose Mendoza
One method of producing alternative energy that researchers are investigating has already been shown to be feasible by nature: photosynthesis. The goal of this direction of research is to artificially reproduce the rudimentary process of photosynthesis, converting water and carbon dioxide (CO2) into a fuel using sunlight. Following the paradigm set by nature, the approach of our research group is to use theoretical methods to computationally model materials for splitting water into H2 and O2 and for reducing CO2.
The study of bond distances and transition states has been the primary focus of our theory group. Using quantum mechanical codes, we have been able to calculate the binding energies of many CO2 complexes. At the end of this project, students will be able to calculate these properties using our codes, create a database of the many existing compounds, and try to predict new compounds based on this knowledge.
The student should read a little bit about atomistic molecular simulation software. Experience with Linux/Unix and any computer coding should suffice (e.g. Python, Matlab, C++). However, none of the above is a strict requirement. We are basically seeking someone very motivated, capable and willing to learn.
Keywords: Computational modelling, quantum calculations
Study of biocompatible chemicals and materials
Mentor: Dr. Jose Mendoza
We will focus on the interaction of materials and chemicals with living systems for applications in biomedicine. We currently have the tools to investigate many classes of materials and chemicals. We will study their compatibility with biological systems (i.e., their biocompatibility). Determining the principles of biocompatibility is the basis for using certain chemicals or materials in living tissue.
At the end of this project, students will be able to simulate and calculate the interactions of different compounds and materials with different proteins at the molecular level.
Experience with Linux and any computer coding should suffice (e.g. Python, Matlab, C++). The student will need to use molecular simulation software such as LAMMPS (http://lammps.sandia.gov/). However, none of the above is a strict requirement; it is only encouraged. We are mainly seeking someone very motivated, capable and willing to learn, with a strong interest in the topic.
Keywords: Molecular dynamics, computer simulations
Learning on graphs for biological applications
Mentor: Dr. Selin Aviyente
This project will explore the concepts of graph filtering for attributed graphs. The goal will be to identify outlier or anomalous nodes in a given network using graph filters. The designed algorithms will be applied to biological networks, e.g. brain networks constructed from neuroimaging data.
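As a rough illustration of the idea above, one simple high-pass graph filter subtracts from each node's attribute the average attribute of its neighbors; nodes whose filtered value is large in magnitude disagree with their neighborhood and are candidate outliers. The sketch below is an assumption made for illustration, not the project's algorithm: the toy ring graph, the attribute values, and the function names are all hypothetical.

```python
def high_pass_filter(adjacency, signal):
    """Apply the random-walk Laplacian (I - D^-1 A) to a node signal.

    adjacency: dict mapping each node to a list of its neighbors.
    signal:    dict mapping each node to a scalar attribute.
    """
    filtered = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            # An isolated node has no neighborhood to compare against.
            filtered[node] = 0.0
            continue
        neighbor_mean = sum(signal[n] for n in neighbors) / len(neighbors)
        filtered[node] = signal[node] - neighbor_mean
    return filtered


def rank_anomalies(adjacency, signal):
    """Return nodes sorted from most to least anomalous."""
    scores = high_pass_filter(adjacency, signal)
    return sorted(scores, key=lambda node: abs(scores[node]), reverse=True)


# Hypothetical toy network: a ring of six nodes with one unusual attribute.
adjacency = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
signal = {0: 1.0, 1: 1.1, 2: 0.9, 3: 5.0, 4: 1.0, 5: 1.05}

scores = high_pass_filter(adjacency, signal)
ranking = rank_anomalies(adjacency, signal)
print(ranking[0], round(scores[ranking[0]], 3))
```

Note that the neighbors of an outlier also receive elevated scores, because the outlier pulls their neighborhood average away from their own value; a practical method would need to account for this.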
At the end of this project, students will be able to use basic Python packages for graph structured data and implement basic graph filtering operations to analyze different types of network data.
Background in Python programming is strongly desired. Students in this project will use intermediate-level Python, including NumPy, SciPy, Matplotlib, and TensorFlow.
Keywords: Machine learning on graphs, biological networks
Matching molecules in C++
Mentor: Dr. Alex Dickson
The student will write code in C++ allowing for efficient search through a database of molecular structures using methods such as k-d trees. This will be used to enable a new method that designs new drugs during a molecular simulation.
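To give a sense of the kind of search involved, here is a minimal k-d tree nearest-neighbor sketch. It is written in Python for brevity, whereas the project itself will use C++. It assumes each molecule has already been encoded as a fixed-length tuple of numerical features; the feature values and function names are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree, splitting on one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    if best is None or (
        squared_distance(query, node["point"]) < squared_distance(query, best)
    ):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the
    # best match found so far; otherwise that whole subtree is skipped.
    if diff ** 2 < squared_distance(query, best):
        best = nearest(far, query, best)
    return best


# Hypothetical 3-feature encodings of five molecules.
features = [
    (0.1, 0.2, 0.3),
    (0.9, 0.8, 0.7),
    (0.4, 0.4, 0.5),
    (0.2, 0.9, 0.1),
    (0.7, 0.1, 0.6),
]
tree = build_kdtree(features)
match = nearest(tree, (0.45, 0.35, 0.5))
print(match)
```

The pruning step is what makes the search efficient: on well-distributed data, most of the database is never visited for a given query.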
By the end of the project, students will be familiar with encoding molecules using a set of numerical features, quantitatively assessing the performance of an algorithm, and the concepts of translation, rotation, and index invariance.
Students in this project will use C++. Experience with chemistry is a bonus.
Simplifying Complex Models Using Sparse Grids
Mentor: Dr. Jason Bazil
In physiology, mathematical models are powerful tools that we use to understand complex phenomena. However, the resultant high-dimensional models are difficult and time-consuming to analyze. To circumvent these issues, an approximation of the model output behavior can be generated using sparse grids (i.e., interpolation). Thus, the main goal of this project is to simplify an existing ordinary differential equation-based model of mitochondrial bioenergetics, i.e., the mitochondrial biochemical reaction network. The end product will be a simplified model that runs significantly faster without compromising accuracy.
At the end of this project, students will be able to reduce complex, multidimensional models that are difficult to analyze into dimensionally reduced polynomial regression models using sparse grids.
Linear algebra and MATLAB will be used in this project.
Keywords: Computational physiology, metabolism, model reduction, sparse grids
Identifying Worldwide Astrophysicists from Scientific Literature
Mentor: Dr. Vicente Amado Olivo
The advent of the internet has allowed for worldwide accessibility and faster publication of academic research. A significant challenge for researchers is monitoring the expanding literature and the growing number of researchers in their field. For example, in astronomy, roughly 17,000 publications are published yearly and this number doubles every 14 years, only exacerbating the existing problems. The NASA Astrophysics Data System is a repository of over 16 million publications and the associated metadata (author names, keywords, etc). In this project, the selected student will investigate the dataset of over 16 million publications to understand the number of astrophysicists in the field through data science techniques, such as visualizations, statistics, or machine learning.
By the end of this project, the student will answer the following questions: How have the sizes of collaborations in astrophysics and across its subfields changed over the years? How often do astrophysicists publish together across subfields? The work will proceed in three steps: (1) analyzing the dataset of over 16 million publications, (2) visualizing the number of astrophysicists, and (3) creating a co-author network from NASA/ADS.
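The co-author network and the collaboration-size question can be sketched in a few lines of Python. This is only an illustrative assumption about how such an analysis might start: the record layout (a `year` and a list of `authors` per publication) and the placeholder names are hypothetical and do not reflect the actual NASA/ADS metadata schema. Real data would also require author-name disambiguation, which this sketch ignores.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coauthor_edges(records):
    """Count how many papers each pair of authors has written together."""
    edges = Counter()
    for record in records:
        # Sort so that each pair has one canonical ordering.
        authors = sorted(set(record["authors"]))
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges


def mean_team_size_by_year(records):
    """Average number of distinct authors per paper, grouped by year."""
    sizes = defaultdict(list)
    for record in records:
        sizes[record["year"]].append(len(set(record["authors"])))
    return {year: sum(s) / len(s) for year, s in sorted(sizes.items())}


# Hypothetical publication records with placeholder author names.
records = [
    {"year": 2001, "authors": ["A. Author", "B. Author"]},
    {"year": 2001, "authors": ["A. Author", "B. Author", "C. Author"]},
    {"year": 2002, "authors": ["C. Author", "D. Author", "A. Author", "E. Author"]},
]

edges = coauthor_edges(records)
team_sizes = mean_team_size_by_year(records)
print(edges[("A. Author", "B. Author")])  # 2 shared papers
print(team_sizes)  # {2001: 2.5, 2002: 4.0}
```

The edge counts can be passed to a graph library or plotted directly; the per-year averages give a first look at how collaboration sizes change over time.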
Prior experience in Python programming will be viewed favorably, and experience with the command line is helpful.
Keywords: Computational meta-research, natural language processing
Evaluating Bio-based BPA Alternatives for Binding to the Estrogen Receptor
Mentor: Dr. Josh Vermaas
Plasticizers, such as bis-phenol A, can mimic the effect of hormones such as estrogen, causing unwanted health impacts. Bio-derived alternative plasticizers have been proposed, but not rigorously evaluated for their ability to bind to the estrogen receptor. Using molecular dynamics simulations, we will be evaluating how well (or poorly) these alternatives bind.
By the end of the project, the student will be able to rigorously quantify the binding between a small molecule and a protein.
Some experience with Python would be ideal, but any programming language can be used. Students are only required to know what a protein is.
Keywords: Biophysics, Drug discovery, Molecular simulation, Computational chemistry
Machine learning to improve project team networks
Mentor: Dr. Sinem Mollaoglu
In my field of study, we focus on improving the way project teams work. We study student project teams for skill development and industry teams for just-in-time feedback on how project networks can be improved for information flow. We use real-life field data, video and voice recordings of teams in action, and meeting notes to code and analyze priority issues and communication behaviors. We then use machine learning to automate the processing, interpretation, and use of the data to predict and inform behaviors in project teams.
At the end of this project, students will be able to work with real-life data and use programming techniques and machine learning to inform how project teams can better collaborate.
An engineering background would be great but is not required; basic statistics knowledge (regression, ANOVA, tabulation of data) would be useful. Ideal applicants will have some experience with a programming language (ideally Python and/or R) and be comfortable using Excel.
Keywords: Machine learning, project teams
Informing data science curriculum design using modern computational tools
Mentor: Dr. Devin Silvia
The original design of the curriculum for the Bachelor of Science in Data Science program at MSU was motivated by a survey of the data science landscape in the years leading up to the rollout of the program (approximately 2015-2018). While this original design was thoughtfully crafted and significant effort has been made to align the new program courses with this design and the associated learning outcomes, it is important to ensure that our program continues to best serve the needs of our students as they prepare to enter the ever-changing Data Science workforce. In support of this effort, this project will involve curating a database of recent data science job materials (e.g., advertisements, job descriptions, and resumes) through the development of a web-scraping algorithm, so that the database can be regularly updated as the landscape evolves. By performing text-based analysis aided by modern machine learning (ML) and natural language processing (NLP) techniques on this database, this project will seek to determine what employers are currently looking for in future employees and compare this to a summary of the learning objectives and skills covered in our current Data Science major courses. There are opportunities to work on parts or all of the proposed project components.
By the end of this project, students will be more familiar with web-scraping software as well as modern machine learning and natural language processing techniques. Through the use of these tools, students will help to inform future curriculum changes within CMSE undergraduate programs to ensure that the next generation of graduates will be well-equipped to join and positively contribute to the ever more data-driven workforce.
This work will leverage pre-existing open-source software available within the Python programming language ecosystem and thus familiarity and competency with Python will be a valuable asset.
Keywords: Web-scraping, machine learning, and natural language processing
Optimization of Block Sparse Eigensolvers on GPUs
Mentor: Dr. H. Metin Aktulga
Our group has already developed efficient block sparse solvers for a nuclear structure computation code called MFDn running on traditional CPU architectures. However, Graphics Processing Units (GPUs) have become the main workhorse for large-scale scientific computing applications. As such, it is desirable to port these solvers to GPU architectures. GPUs are more difficult to program and optimize for than CPU-based systems. These challenges are further exacerbated by the emergence of GPU architectures from different vendors such as Nvidia, Intel and AMD. In this project, we aim to develop a portable and efficient version of our existing block sparse eigensolvers using high-level programming models such as OpenMP and SYCL.
At the end of this project, students will be able to compile, execute and profile scientific codes on GPU architectures. They will also be able to develop simple OpenMP- and SYCL-based kernels for GPUs.
Experience with C/C++ and Unix-based systems is desirable.
Keywords: High Performance Computing, GPU Programming, Performance Profiling and Optimization, Computational Nuclear Physics
Predicting gene interactions across species using machine learning
Mentor: Dr. Shin-Han Shiu
Transfer learning is a machine learning technique in which an algorithm applies knowledge learned from previous tasks (e.g., a source species) to make predictions about novel, unseen data (e.g., in a target species). To assess the extent to which gene-gene interaction information from data-rich species (source) can be transferred to predict gene-gene interactions in data-poor species (target), the student will leverage several transfer learning approaches. Since comprehensive genome-wide gene-gene interaction networks are not available for most species, the student will be working on transferring the Saccharomyces cerevisiae (baker’s yeast) network and other multi-omics data (e.g. protein-protein interactions and gene co-expression networks) to make predictions of gene-gene interactions in Arabidopsis thaliana, a model plant species.
At the end of this project, the student will be familiar with cleaning biological data using Python and/or R and will be able to analyze the data using descriptive statistics and machine learning models, generate quality figures, and interpret modeling results. The student will learn to frame research questions as machine learning problems and will learn key genetics concepts. The student will have the opportunity to work in a highly collaborative environment to gain experience contributing to a research project as part of a team.
Students interested in this project should have some experience in Python (preferred; using numpy, pandas, and matplotlib) or R. In addition, basic experience working in a command-line environment is desirable, but not required. Courses in statistics, programming and genetics are strongly desired.
Keywords: Machine learning and genomics
Propagation of errors in meteorological measurements into water resources modeling
Mentor: Dr. Yadu Pokhrel
The project will involve employing the Community Land Model (CLM) from the National Center for Atmospheric Research (NCAR) in a high-performance computing (HPC) environment, i.e., on a supercomputer, to conduct numerical simulations of water resources in selected regions of the American southwest. The student will perform model simulations using a newly developed probabilistic meteorological dataset to reconstruct and quantify past changes in water availability.
Students in this project will learn to use the model in an HPC environment, perform long-term simulations with the model using the probabilistic meteorological dataset, make sense of the model output, quantify historical variability and changes in hydrological states and fluxes, give scientific presentations, and participate in manuscript writing.
Intermediate programming experience with Python or R is necessary and a background in statistical analysis is desired.
Keywords: Environmental modeling, water resources, hydrology
Simulating polymer breakdown at the molecular level
Mentor: Dr. John Dorgan
In the chemical recycling of plastics, the large molecular weights of polymers are reduced through chemical reactions. These reactions involve a small molecule reacting with the backbone of a polymer chain causing the chain to cleave. Such processes involve intrinsic chemical reaction rates but can be heavily influenced by limitations associated with the penetration of the small molecules into the polymer through a process known as diffusion. In this project, the selected student will help write a computer program that can simulate these important processes at the molecular level.
By the end of this project, students will be able to do the following: 1. Describe a Markov chain using words and appropriate mathematical equations. 2. Sketch cubic, body-centered, and face-centered lattices. 3. Describe the meaning of self-diffusion using words, a graph, and equations. 4. Provide a definition of chemical potential and describe mutual diffusion using words, graphs, and equations. 5. Sketch elementary moves on a lattice associated with the cooperative motion algorithm. 6. Write elementary routines in the FORTRAN programming language. 7. Demonstrate an understanding of how Markov chains can be used to model chemical reactions by writing a subroutine to calculate the change in molecular weight for polymer chains being degraded through chemical reactions.
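A toy version of the last item can be sketched as follows. It is written in Python for brevity, although the project itself uses FORTRAN, and it rests on simplifying assumptions that are not taken from the project: every backbone bond is equally likely to be cleaved, diffusion limitations and the lattice are ignored, the mass of the attacking small molecule is neglected, and chain length (in monomers) stands in for molecular weight. All function names are hypothetical.

```python
import random


def cleave_random_bond(chains, rng):
    """One Markov step: cut one backbone bond chosen uniformly at random.

    `chains` is a list of chain lengths in monomers. The next state depends
    only on the current list, which is what makes this a Markov chain.
    """
    bonds = [length - 1 for length in chains]
    total_bonds = sum(bonds)
    if total_bonds == 0:
        return chains  # only monomers remain; nothing left to cleave
    pick = rng.randrange(total_bonds)
    for i, n_bonds in enumerate(bonds):
        if pick < n_bonds:
            left = pick + 1           # monomers on one side of the cut
            right = chains[i] - left  # monomers on the other side
            return chains[:i] + [left, right] + chains[i + 1:]
        pick -= n_bonds


def number_average_length(chains):
    """Number-average chain length: total monomers / number of chains."""
    return sum(chains) / len(chains)


def degrade(chains, n_scissions, seed=0):
    """Apply repeated scission steps and record the average chain length."""
    rng = random.Random(seed)
    history = [number_average_length(chains)]
    for _ in range(n_scissions):
        chains = cleave_random_bond(chains, rng)
        history.append(number_average_length(chains))
    return chains, history


# Ten chains of 100 monomers each, degraded by 40 scission events.
final_chains, history = degrade([100] * 10, 40)
print(history[0], history[-1])  # 100.0 20.0
```

Each scission adds exactly one chain while conserving monomers, so the number-average length after k cuts is fixed by the chain count; what the random process determines is the distribution of chain lengths around that average.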
It is strongly desired that the student have some background in molecular sciences, for example one or more semesters of chemistry. It would be a bonus if the student is majoring in physics, chemistry, materials science, or chemical engineering and has some experience in any programming language.
Keywords: Sustainability, Recycling, Molecular simulation, Advanced materials