Graduate Seminar: ML in the Natural Sciences
Machine Learning Methods in Natural Science Modeling (BIOL/PHYS 5566)
Bhuvnesh Jain, Junhyong Kim
This is a course for PhD students in the natural sciences who are interested in applying new machine learning and AI approaches to their problem domains. The course will consist of directed readings and tutorials with weekly discussions. The goal is to motivate mutual self-learning through guided discussions. Weekly participation and completion of readings or other assigned materials are essential. The following topics will be covered in Spring 2023.
- Fundamentals of statistical learning. Quick introduction to ‘bread and butter ML’: kNN, logistic regression, decision trees, support vector machines and neural nets.
- Introduction and in-depth exploration of selected key topics in ML: reinforcement learning, generative adversarial networks, and transformers. This will occupy the bulk of the semester and students will be expected to actively participate by reading and presenting sections of papers. Tutorials and light coding will also be part of the exploration and will include applications to Biology, Physics and other natural sciences.
- Finally, we will discuss foundation models, built on the transformer architecture underlying ChatGPT and related products, and how they are changing and will change research in the natural sciences, with topics selected from Biology, Physics, Linguistics, and Environmental Science.
Readings
For the first two weeks, students are asked to read:
- Artificial Intelligence: A Guide for Thinking Humans, by Melanie Mitchell. Along with the introductory lectures, this book will get everyone on the same page before we study more technical topics.
- Introduction to Statistical Learning (ISLR), Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani (Springer, 2013)
This is now a standard introductory machine learning textbook for advanced undergraduates. Please read Chapters 2, 4, and 8, plus any others you need to catch up on basic topics you are not familiar with. The book is free from the authors' website (http://www-bcf.usc.edu/~gareth/ISL/).
- Hands-on ML with Scikit-Learn, Keras and TensorFlow, Aurélien Géron.
Weekly Schedule
- Jan 23: Intro to ML: K Nearest Neighbors vs Linear regression. Classification.
- Jan 30: Validation/resampling methods; Logistic regression; Decision trees/random forest.
- Feb 6 and 13: Perceptron and multi-layer perceptron; SVM and kernel methods; Techniques for analysis of learning methods.
- Assignment 1: code a neural network. Due on Feb 6.
- Assignment 2: Read Chapter 11 from “Hands on ML…”. Do the Jupyter notebook exercises on different activation functions, optimizers, learning rates, batch normalization, etc.
- Feb 20: Introduction to language models and transformers.
- Feb 27: Student presentations on the attention mechanism.
- March: Evolutionary-scale prediction of atomic level protein structure with a language model
- March: Improving language recognition with deep learning
- March: Foundation models: part 1
- March: Foundation models: part 2
- April: Climate modeling with foundation models
- April: Guest lecture by Konrad Kording