UMSL Computer Science Colloquium
Fall 2021 - Spring 2022

Date  | Time   | Speaker        | Affiliation  | Title                                                                 | Zoom
10/19 | 4:00pm | Kevin Scannell | SLU          | Machine learning and language technology in minority language contexts | Link
11/12 | TBA    | Jason Martin   | DISA         | TBA                                                                   | Link
11/30 | 4:00pm | Matthew Lane   | UT-Knoxville | TBA                                                                   | Link
4/6   | 3:00pm | Jim Miller     | U Kansas     | TBA                                                                   | Link

Title: Machine learning and language technology in minority language contexts

Abstract: Techniques based on machine learning and neural networks have led to huge advances in technologies such as machine translation and speech recognition. Generally speaking, very large text and speech corpora or annotated datasets are required to employ these techniques, and smaller language communities face a number of challenges in trying to produce suitable datasets for machine learning. I will discuss a number of approaches that we have used in the three Gaelic language communities to overcome these challenges, including crowdsourcing, transfer learning from better-resourced languages, and mining of historical archives for data.

Bio: Kevin Scannell earned a BS in Pure Mathematics at MIT in 1991 and a Ph.D. in Mathematics at UCLA in 1996 under the supervision of Geoffrey Mess. His mathematical research focused on hyperbolic 3-manifolds, low-dimensional topology, and mathematical physics. After spending two years at Rice University as a G. C. Evans postdoctoral instructor, he joined the faculty at Saint Louis University in 1998. His current research uses machine learning to develop computational resources that support speakers of indigenous and minority languages around the world, particularly Irish and the other Celtic languages.

Title: How regression can be effectively used in place of classification in the context of protein folding

Abstract: Protein contact prediction, a binary classification problem in bioinformatics, lies at the heart of the six-decade-old problem of protein folding. Dozens of methods, based on almost all kinds of machine learning and deep learning algorithms, have been published over the last two decades for predicting contacts. Recently, many groups, including Google DeepMind, have demonstrated that reformulating the problem as a multi-class classification problem is a more promising direction to pursue. As yet another alternative, we recently proposed to formulate and attack the problem as a regression problem, which is how the information exists in nature. Nuances of protein three-dimensional structures make this formulation a unique and tempting regression problem. In this work, we discuss novel ways of output label engineering (as distinct from feature engineering) through the use of a variety of data transformation functions, and demonstrate, for the first time, that deep learning methods for real-valued protein distance prediction can deliver distances as precise as the binary classification methods. We also demonstrate how the more granular information contained in our real-valued distances can be used to build significantly more accurate three-dimensional protein models. We believe that our work will stand as a milestone marking the dawn of real-valued distance prediction.
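
To make the contrast between the two formulations concrete, here is a minimal sketch, not the speaker's method. It assumes the conventional 8 angstrom cutoff for calling a residue pair a "contact", and it uses a hypothetical inverse-square function as one possible output label transformation; the function names, the scale parameter, and the toy distances are all illustrative assumptions.

```python
# Illustrative sketch only: the transform, names, and data below are
# assumptions for demonstration, not taken from the talk.

CONTACT_THRESHOLD = 8.0  # angstroms; conventional cutoff for a "contact"


def to_contacts(pair_distances, threshold=CONTACT_THRESHOLD):
    """Classification view: label each residue pair 1 (contact) or 0."""
    return {pair: 1 if d < threshold else 0 for pair, d in pair_distances.items()}


def transform_label(d, scale=100.0):
    """Hypothetical label transform for regression: scale / d**2.

    Short distances map to large labels, so they weigh more in a
    squared-error loss than long, less informative distances.
    """
    return scale / (d * d)


def inverse_transform(y, scale=100.0):
    """Map a (predicted) transformed label back to a distance."""
    return (scale / y) ** 0.5


# Toy distances (in angstroms) between three residues, keyed by index pair.
pair_distances = {(0, 1): 3.8, (0, 2): 6.5, (1, 2): 12.0}

# Regression view: the training targets are transformed real values.
labels = {pair: transform_label(d) for pair, d in pair_distances.items()}

# Inverting the transform recovers distances, which can still be
# thresholded to obtain contacts when a binary answer is needed.
recovered = {pair: inverse_transform(y) for pair, y in labels.items()}
contacts_from_regression = to_contacts(recovered)
```

The point of the sketch is that real-valued distances carry strictly more information than contacts: thresholding a distance gives the binary label, but the binary label cannot be turned back into a distance.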

Bio: Jacob Barger is a Master's student under the supervision of Dr. Badri Adhikari. This is his thesis defense.