STEM2SHTEM 2024 Cohort

STEM2SHTEM 2024

In 2024, from June to August, 63 high school students attended the STEM to SHTEM (Science, Humanities, Technology, Engineering and Mathematics) summer program hosted by Prof. Tsachy Weissman and the Stanford Compression Forum. During this summer program, the high schoolers pursued fun research projects in various domains under the supervision of 16 mentors. A YouTube playlist and the entire collection of the high schoolers’ reports can be found below.

STEM2SHTEM 2024 Playlist

Analyzing Radio Data From The Green Bank Telescope (Using Manual And Machine Learning Techniques) To Discover New Pulsars (PDF, 8.11 MB)

  • By: Ama Okyerewa Okraku Mantey, Anh Vu Ha Nguyen, Ellie Han, Khushi Upadhyay, Nesreen M. Majd
  • Mentor: Josephine Wong

Abstract

Background and Objective: A pulsar is a highly magnetized rotating neutron star that emits beams of electromagnetic radiation out of its magnetic poles. Pulsars aid us in understanding ultra-dense matter, Einstein’s general theory of relativity, and quantum mechanics. We aim to find new pulsars by analyzing data from the Green Bank Telescope (GBT).

Method: After gaining access to the Pulsar Search Collaboratory (PSC) database, we analyzed radio data from the GBT to determine whether the data should be classified as a pulsar candidate or as noise/radio-frequency interference (RFI). We used the Australia Telescope National Facility (ATNF) Pulsar Catalog to explore pulsar properties and verify potential pulsar candidates. We also used machine learning to automate the process and explored different types of algorithms, comparing their performance.

Results: After looking at over 2000 plots, we found two potential pulsar candidates. We sent one of them to the PSC, which determined that it was not a pulsar. However, the PSC is closely inspecting data marked “Maybe”, so there may be future potential pulsar candidates from our data. The machine learning model was ultimately able to reach an accuracy of 100%. Accuracy increased overall, with the final accuracy reaching about 1, the maximum, which signifies that the model got more of its predictions right. The loss was also low, very close to 0, indicating fewer errors and higher model performance.

Conclusions: The project uses manual inspection and machine learning algorithms to analyze radio data obtained through the GBT for the detection of new pulsars. For this analysis, we manually reviewed the plots sent in by the Pulsar Search Collaboratory and created a simple machine learning model to distinguish between pulsars and non-pulsars. We identified several candidates that appeared to align with the characteristics of known pulsars, and the model was able to correctly identify pulsar plots with high accuracy.
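The accuracy and loss figures reported above can be made concrete with a small sketch. The abstract does not say which loss function the model used, so binary cross-entropy is an assumption here, and the labels and probabilities are made-up illustrative values rather than project data.

```python
import math

def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that match the labels (1 = pulsar, 0 = noise/RFI)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy (an assumed loss, not confirmed by the source)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical labels and predicted pulsar probabilities (illustrative only).
labels = [1, 0, 0, 1]
probs = [0.98, 0.03, 0.10, 0.95]

acc = accuracy(labels, probs)
loss = binary_cross_entropy(labels, probs)
print(f"accuracy={acc:.2f} loss={loss:.4f}")
```

An accuracy of 1 with a loss near 0 means every plot falls on the correct side of the threshold and the predicted probabilities are confident. On a small or imbalanced set of plots, a held-out test set is the usual way to check that such a score is not an artifact of overfitting.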

Doctor Who?: The Influence of AI on Affective Responses to Vaccine Calls (PDF, 113.56 KB)

  • By: Cathy Sheng, Deanna Vuong, Jason Zhang, Noha Yousif, Zhiyuan Ma
  • Mentor: Danielle Amir-Lobel

Abstract

AI voice generators are becoming increasingly prevalent in personal and professional settings. Existing research postulates a negative human bias towards AI-generated text; however, there is a gap in the literature surrounding perceptions of artificial audio. Using a novel survey, this paper examines how the gender (male or female) and perceived identity (AI or human) of a caller affect receptiveness towards the message: in this context, a promotion for a fictitious vaccine. In contrast to previous literature, the results do not support a bias against AI-generated audio. These findings, in tandem with further research, could inform more effective implementation of artificial audio.

Investigating the Failure Modes of Multimodal Models (PDF, 2.51 MB)

  • By: Youssef Baldreldin, Raudel Garcia, Sumaia Jewena, Chinazam Madu, Jaden Sanders
  • Mentors: Benjamin Martinez & Navaneeth Dontuboyina

Abstract

With the massive integration of AI technology into the lives of many people, multimodal systems have served as tools to complete tasks often done by humans, but with higher accuracy and better precision. Drawing on datasets and other large collections of sources, multimodal systems are able to perform virtually every task, and with the rise of GPT systems like ChatGPT, the efficiency and reliability of these systems are likely to improve in the coming years. However, these systems experience complexities and subsequent failures in processing visual data, which often lead to wide limitations in many GPT systems. In our project, we investigated the failures of multimodal systems and engineered a ViLT-script model to optimize data visualization in GPT models.

Reinforcement Learning: Distributional Soft Actor Critic (PDF, 1.37 MB)

  • By: Atharva Babu, Elaine Liu, Samuel Sosa, Varun Sriram, Sowmya Venkatesh
  • Mentors: Prof. Zhengyuan Zhou and Junyao Chen

Abstract 

This paper evaluates a recently proposed reinforcement learning (RL) algorithm called Distributional Soft Actor-Critic (DSAC), which combines elements of two successful RL approaches: Distributional Reinforcement Learning (DRL), which uses the distribution of returns in order to adapt to different scenarios and have an increased capacity to learn different behaviors, and Soft Actor-Critic (SAC), which aims to maximize entropy, thereby encouraging exploration. In the OpenAI Gymnasium environment, variations of DSAC that pair the SAC algorithm with different distributional RL methods, such as Implicit Quantile Network (IQN), Quantile Regression Deep Q-Network (QR-DQN), and Fully Parameterized Quantile Function (FQF), are tested, and the optimal combination of algorithms for DSAC is presented.
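The quantile-based methods named above (IQN, QR-DQN, FQF) all describe the return distribution through quantiles fitted with a quantile regression loss. The following is a minimal sketch of the plain (unsmoothed) pinball loss with fixed midpoint quantile levels. QR-DQN itself uses a Huber-smoothed variant, and the function names and sample values are illustrative assumptions, not taken from the paper.

```python
def pinball_loss(target, estimate, tau):
    """Quantile regression loss for a single quantile level tau in (0, 1)."""
    u = target - estimate
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_midpoints(n):
    """Midpoint quantile levels (2i + 1) / (2n) for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]

def distribution_loss(target_samples, quantile_estimates):
    """Average pinball loss of every quantile estimate against every target sample."""
    taus = quantile_midpoints(len(quantile_estimates))
    total = 0.0
    for tau, q in zip(taus, quantile_estimates):
        for z in target_samples:
            total += pinball_loss(z, q, tau)
    return total / (len(quantile_estimates) * len(target_samples))

# Hypothetical return samples and two candidate sets of quantile estimates.
returns = [0.0, 1.0, 2.0, 3.0]
spread_out = distribution_loss(returns, [0.5, 1.5, 2.5, 3.0])
collapsed = distribution_loss(returns, [3.0, 3.0, 3.0, 3.0])
print(spread_out, collapsed)
```

Estimates that spread across the range of returns score a lower loss than estimates collapsed onto a single value, which is what pushes a distributional critic to represent the whole return distribution rather than only its mean.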

Towards Sustainable Funding and Synergistic Management of Neglected Tropical Diseases (PDF, 482.81 KB)

  • By: Alim A. Oraz, Deeksha Ravi, Hind Essalhi, Md. Shafin Jubayer, Puttipong (Nai) Kong-In
  • Mentor: Rocky An

Abstract

Neglected tropical diseases (NTDs) such as schistosomiasis impact over one billion people worldwide, primarily in low- and middle-income countries, as highlighted by the World Health Organization. Despite their widespread prevalence and severe health consequences, these diseases remain poorly understood and significantly underfunded. This research seeks to address the gap in funding and treatment by developing sustainable solutions that integrate environmental and ecological considerations, especially in regions with high co-infection rates. We examine two particular strategies: mass drug administration (MDA) and sustainable vector control through vegetation removal. While MDA campaigns have proven cost-effective in preventing NTDs, their true costs and benefits extend beyond immediate drug expenses. Through cost-benefit analyses and life cycle sustainability assessments, this study evaluates the long-term effectiveness of these campaigns. Additionally, we examine the scalability and environmental sustainability of a targeted vector control strategy, vegetation removal, to reduce the prevalence of vector-borne NTDs like schistosomiasis. The outcomes of this research include an optimized strategy for reducing disease prevalence, improving long-term health outcomes, and promoting socio-economic development in affected regions, ultimately advancing global health equity.

Computational Drug Optimization: Using Machine Learning to Inhibit EZH2 (PDF, 2.89 MB)

  • By: Adarsh Khullar, April Surac, Ivy Wang, Merab Miller
  • Mentors: David Candes, Juan Almanza

Abstract 

Enhancer of zeste homolog 2 (EZH2), a histone-lysine N-methyltransferase enzyme encoded by the EZH2 gene, has recently become a key target in drug discovery due to its carcinogenic properties. EZH2 primarily functions as a gene silencer through its role in transcriptional repression. A part of the polycomb repressive complex 2 (PRC2), it also plays a key role in stem cell pluripotency and cell differentiation. When mutated or overexpressed, however, EZH2 has been linked to the excessive inhibition of tumor suppressor genes, resulting in the growth of various cancers. Additionally, dysregulation of EZH2 is tied to accelerated cell proliferation as well as prolonged cell survival, both of which are telltale biomarkers of cancer development.

In this bioinformatics project, we develop and evaluate machine learning models to predict the activity of potential EZH2 inhibitors. We aim to create a framework for computational drug discovery and novel cancer therapies. Using a curated bioactivity dataset from the ChEMBL database, we preprocess our data and perform exploratory analysis based on Lipinski’s Rule of Five. We then remove low-variance features and split the data into training and testing sets to train our models. Ultimately, we employ various regression algorithms to predict pIC50 values, an indicator of inhibitory potency, and assess model performance through metrics such as R-squared and RMSE, visualized with scatter plots and bar charts.
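The quantities named in this abstract can be illustrated with a short sketch: converting an IC50 value to pIC50, checking Lipinski's Rule of Five on precomputed descriptors, and scoring predictions with RMSE and R-squared. The function names and example values are hypothetical, and the descriptors are assumed to be supplied as plain numbers rather than computed from molecular structures.

```python
import math

def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in molar) = 9 - log10(IC50 in nanomolar)."""
    return 9.0 - math.log10(ic50_nm)

def passes_lipinski(mol_weight, logp, h_donors, h_acceptors):
    """Rule of Five: MW <= 500 Da, logP <= 5, H-bond donors <= 5, acceptors <= 10."""
    return mol_weight <= 500 and logp <= 5 and h_donors <= 5 and h_acceptors <= 10

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted pIC50 values (illustrative only).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]

print(pic50_from_nm(1000.0))               # an IC50 of 1 micromolar
print(passes_lipinski(450.0, 3.2, 2, 7))
print(rmse(measured, predicted), r_squared(measured, predicted))
```

Because pIC50 is a negative logarithm, a higher value means a more potent inhibitor: each unit increase corresponds to a tenfold lower IC50.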

Optimizing Large Language Models: Learning from Mistakes in Gameplay (PDF, 834.8 KB)

  • By: Federica D’Alvano Kirakidis, Lily Gao, Aaron George, Alex Huang, Niv Levy
  • Mentors: Prof. Benjamin Van Roy, Yifan Zhu, Henry Widjaja

Abstract

In recent years, there has been a surge in research and public interest in Large Language Models (LLMs), which have demonstrated remarkable potential across diverse applications and domains. This paper provides a comprehensive survey of the applications of LLMs, particularly focusing on their roles and capabilities within multi-agent systems (MAS). We utilized Gemini 1.5 Flash by Google to introduce a benchmark for evaluating LLM learning based on mistakes in previous data. Our findings reveal significant variations in LLM performance across different prompt engineering strategies, enhancing our understanding of their strategic thinking in relation to learning through game data. Additionally, we explore the complexities of extending LLM-based self-supervised learning to MAS, emphasizing coordination and communication among agents. By identifying underexplored areas and promising research directions, this survey lays the groundwork for innovative research at the intersection of LLMs, game logic, and MAS, advancing toward Artificial General Intelligence (AGI).

Simulating Evolutionary Processes Using Genetic Algorithms and Variable Constraints (PDF, 1.05 MB)

  • By: Ho Lok Cheung, Annum Hashmi, Devaki Rawal, Lundi Moyo
  • Mentor: Samuel Do

Abstract

Our project uses the evolution of artificial life in a virtual environment to simulate natural evolution. By observing naturally evolved organisms, we investigated the impacts of environmental conditions and evolutionary processes under changing constraints.
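As an illustration of the general technique named in the project title, here is a minimal genetic algorithm sketch with selection, one-point crossover, and mutation. It is not the project's simulation: the genome encoding, the fitness function, and all parameter values are hypothetical, with the `limit` argument standing in for an environmental constraint that can be varied.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Evolve a population of real-valued genomes; return the best genome and fitness history."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        survivors = pop[: pop_size // 2]           # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)     # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally nudge a gene with Gaussian noise.
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = survivors + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

def constrained_fitness(genome, limit=0.5):
    """Hypothetical environment: organisms do best when every gene sits at `limit`."""
    return -sum((g - limit) ** 2 for g in genome)

best, history = evolve(constrained_fitness)
print(history[0], history[-1])
```

Changing the constraint, for example `evolve(lambda g: constrained_fitness(g, limit=0.8))`, shifts the optimum the population converges toward. Because the fitter half survives unchanged each generation, the best fitness never decreases.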

Using Image Segmentation to Identify Firearms in CCTV Cameras (PDF, 597.32 KB)

  • By: Anir Suren, Daniyal Ahmad Ansari, Blake Misquitta, Mário Rosa, Saaria Zaheer, Tae Esparanza Cooper
  • Mentor: David Jose Florez Rodriguez

Abstract

Segmenting firearms from complex backgrounds in images is a crucial task with applications in law enforcement and security, especially in a context where around 150,000 armed crimes were committed in 2022 alone (Statista Research Department, 2024). This study assesses the efficacy of machine learning for firearm segmentation and the possibility of a synthetic dataset supporting this task. CNNs excel at accurately segmenting and analyzing complex images but may face challenges related to data dependency and generalization. This study utilizes a dataset augmented from the Kaggle firearm segmentation dataset and a synthetic dataset created by superimposing firearm images onto diverse scene backgrounds. The results demonstrate significant potential to utilize image segmentation to identify firearms, with CNNs showing consistent performance. Future research on this topic should focus on expanding training datasets, exploring different image segmentation techniques and architectures, and optimizing models for real-time applications in diverse scenarios.

Reimagining the Power Grid (PDF, 2.35 MB)

  • By: Andres Tejada, Antonia Tarfulea, Areeba Tayyeb, Manya Singla, Wesley Chen
  • Mentors: Srabanti Chowdhury, Nish Sinha

Abstract

The modern world runs on electric power, from heating systems to electric vehicles. Yet, despite the prominence of electricity, inefficient conversions between AC and DC power by silicon-based transistors dissipate around 60% of the power that is generated. These traditional silicon (Si) transistors have high switching losses (losses from turning on and off) and result in complex maintenance, high costs, and energy losses. Wide-bandgap (WBG) transistors have properties such as higher charge carrier mobilities, critical electric fields, and thermal conductivities that can minimize losses in power grid applications such as DC-to-DC converters, rectifiers, and inverters. The employment of WBG devices can help solve a significant problem in the existing power grid, improving efficiency and reliability. This paper aims to explore the history and losses of the current power transmission system while identifying areas that can benefit from WBG device implementation. The paper also uses LTSpice to model circuits and explore how characteristics of WBG devices, such as those made of silicon carbide (SiC), can reduce losses compared to traditional Si devices. By extracting data from existing sets and LTSpice, this study analyzes the power factor along with other variables and uses machine learning to statistically demonstrate that a power grid using WBG devices is more efficient than existing ones.
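The power factor analyzed in this study is the ratio of real power to apparent power. A minimal sketch of how it can be computed from sampled voltage and current waveforms follows. The waveform values (170 V peak, 10 A peak, current lagging by 30 degrees) are hypothetical and are not taken from the paper's LTSpice data.

```python
import math

def power_factor(voltage, current):
    """Real power divided by apparent power for equally spaced samples over whole periods."""
    n = len(voltage)
    real_power = sum(v * i for v, i in zip(voltage, current)) / n
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    return real_power / (v_rms * i_rms)

def sine_samples(amplitude, phase_rad, n=1000):
    """One full period of a sine wave, delayed by phase_rad."""
    return [amplitude * math.sin(2 * math.pi * k / n - phase_rad) for k in range(n)]

# Hypothetical waveforms: current lags voltage by 30 degrees.
phase = math.radians(30)
v = sine_samples(170.0, 0.0)
i = sine_samples(10.0, phase)

pf = power_factor(v, i)
print(pf, math.cos(phase))
```

For pure sinusoids, the result matches the cosine of the phase angle between voltage and current. A power factor below 1 means that part of the current delivers no useful work while still contributing to conduction losses.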

Electroencephalography (EEG): Analyzing Object-identification with Electrodes and Classifiers (PDF, 1.46 MB)

  • By: Wiktoria Blazik, Anna Gagne, Ines Villaseca Gonzalez, Sydney Korzyniowski, Reese Kugel, Khushi Mehrotra, Aaron Nguyen, Jeenaev Shah, Ata Sugun
  • Mentors: Noah Huffman & Joanna Sands

Abstract

Object recognition is essential for human survival, and the human visual cortex has evolved to perform this task efficiently. Through our research examining this phenomenon, we aim to enhance electroencephalography (EEG) methods to investigate how the human visual cortex recognizes and categorizes different types of objects. Participants’ brain responses were recorded using EEG as they viewed a series of 72 object photographs. These images were broadly categorized as Animate (Human Face, Human Body, Animal Face, and Animal Body) and Inanimate (Natural or Man-made). This study aims to discern how the brain distinguishes between these different categories of objects. To achieve this, prior work has used single-trial classification to perform a Representational Similarity Analysis (RSA) (Kaneshiro et al., 2015). This work builds on prior results in two key respects. First, while prior work utilized all 128 EEG channels, our approach explores using specific clusters of channels for image classification in order to find the minimum number of electrodes needed to still obtain accurate object identification. Second, we investigate the usefulness of additional image classifiers in terms of both speed and prediction accuracy.

Exploring AI and Statistical Physics with Pokémon (PDF, 1.98 MB)

  • By: Hyunsuh Gu, Delta Blendea, Jessica Liu, Avi Iyer, Atreyi Saha
  • Mentor: Sofia Helpert

Abstract

While artificial intelligence (AI) can be incredibly powerful when outcomes are consistent, it will often fail to perform as consistently when randomness and luck play a large role in its environment. Examining the Brock fight in Pokémon Red, we compared a model that chose moves randomly with a reinforcement learning (RL) model to map the battle landscape and better understand how such models adapt to random situations. The RL agent was ultimately able to win about 95% of games, compared to the roughly 45% win rate of the random agent. We found that, on average, both models win in 32 turns; however, under certain conditions the RL model can win in as few as 28 turns. Like the random simulation, the RL model found that early growl usage was imperative to winning the battle. Unlike the random simulation, however, the RL model was able to fine-tune the number of growls it used in order to win battles faster.