Graduate Seminar: ML in the Natural Sciences

Machine Learning Methods in Natural Science Modeling (BIOL/PHYS 5566)

Bhuvnesh Jain, Junhyong Kim

This is a course for PhD students in the natural sciences who are interested in applying new machine learning and AI approaches to their problem domains. The course will consist of directed readings and tutorials with weekly discussions. The goal is to motivate mutual self-learning through guided discussions. Weekly participation and completion of readings or other assigned materials are essential. The following topics will be covered in Spring 2023.

  • Fundamentals of statistical learning. Quick introduction to ‘bread and butter ML’: kNN, logistic regression, decision trees, support vector machines and neural nets.
  • Introduction to, and in-depth exploration of, selected key topics in ML: reinforcement learning, generative adversarial networks, and transformers. This will occupy the bulk of the semester, and students will be expected to participate actively by reading and presenting sections of papers. Tutorials and light coding will also be part of the exploration and will include applications to Biology, Physics and other natural sciences.
  • Finally, we will discuss foundation models, built on the transformer architecture underlying ChatGPT and related products, and how they are changing, and will continue to change, research in the natural sciences, with topics selected from Biology, Physics, Linguistics, and Environmental Science.

Readings

For the first two weeks, students are asked to read:

  1. Artificial Intelligence, by Melanie Mitchell. Along with the introductory lectures, this book will get everyone on the same page before we study more technical topics.
  2. An Introduction to Statistical Learning (ISLR), Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani (Springer, 2013).
    This is now a standard introductory machine learning textbook for advanced undergraduates. Please read Chapters 2, 4, and 8, plus any others you need to catch up on basic topics you are not familiar with. The book is available free from the authors' website (http://www-bcf.usc.edu/~gareth/ISL/).
  3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Aurélien Géron.

Weekly Schedule