
STEM2SHTEM 2023 Cohort

In 2023, from June to August, 63 high school students attended the STEM to SHTEM (Science, Humanities, Technology, Engineering and Mathematics) summer program hosted by Prof. Tsachy Weissman and the Stanford Compression Forum. During the program, the high schoolers pursued research projects in various domains under the supervision of 34 mentors. A YouTube playlist and the full collection of the high schoolers' reports can be found below.

SaiFETY: An Integration of Audio Protection and Ethical Data Collection Comparisons Within Txt2Vid (PDF, 511.9 KB)

  • By: Lucas Caldentey, Avrick Altmann, Yan Li Xiao, Fenet Geleta
  • Mentors: Arjun Barrett, Laura Gomezjurado, Pulkit Tandon
Abstract

As a result of globalization and rapid technological advancement, multimedia communication increasingly runs over internet traffic, a reliance that only deepened during the recent Covid-19 pandemic. From daily news channels and social media live streams to peer-to-peer online meetings, the world's primary form of transmitting information is now digital. With the decline of in-person interaction, it is critical to have not only a stable and reliable medium for conversation but also an effective way to ensure the safety and ethical treatment of all users. We introduce an extended version of Txt2Vid with clearer, more developed stances toward user safety. Specifically, we compare the data collection practices of Txt2Vid with those of other video communication platforms. Additionally, we developed an audio key authentication system using text-dependent voice verification (Novikova) that prevents users from falsely using the voices and information of others. With these implementations, we hope to smooth the public's transition as AI becomes more and more prevalent in our lives, and to show people all over the world that deepfakes can be used safely and positively to bring us closer together.
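A toy illustration of the idea behind text-dependent voice verification: compare a coarse spectral profile of an enrolled passphrase recording against a new attempt. This is a minimal sketch with synthetic signals; real systems use far richer features (e.g. MFCCs), and the threshold, bin count, and signals here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def spectral_profile(signal, n_bins=64):
    """Coarse magnitude-spectrum profile of a fixed passphrase recording.
    A toy stand-in for the richer features a real verifier would use."""
    spectrum = np.abs(np.fft.rfft(signal))
    chunks = np.array_split(spectrum, n_bins)
    profile = np.array([c.mean() for c in chunks])
    return profile / np.linalg.norm(profile)

def verify(enrolled, attempt, threshold=0.95):
    """Text-dependent check: the same speaker saying the same phrase
    should produce a highly similar spectral profile."""
    return float(np.dot(spectral_profile(enrolled), spectral_profile(attempt))) >= threshold

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
enrolled = np.sin(2*np.pi*180*t) + 0.4*np.sin(2*np.pi*360*t)    # stand-in "voice"
same = enrolled + 0.01*np.random.default_rng(1).normal(size=sr)  # same speaker, noisy retry
other = np.sin(2*np.pi*300*t) + 0.4*np.sin(2*np.pi*600*t)        # different "voice"
print(verify(enrolled, same), verify(enrolled, other))
```

The key property is that verification is tied to both the speaker and the fixed phrase, so a cloned voice saying something else, or another speaker saying the phrase, fails the check.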

Self-Learning AI Model on Limited Biomedical Imaging Data and Labels (PDF, 403.49 KB)

  • By: Niraj Gupta, Saniya Khalil, Jolie Li, Iris Ochoa, Elisa Torres
  • Mentor: David J. Florez Rodriguez
Abstract

This research explores self-learning AI using Google Colab by pre-training a general TensorFlow-coded model on recognizing patterns in limited, unlabeled biomedical image data. This allows the model to understand the basic underlying patterns and structures in the images. After self-learning, we train the model with labeled data. We hypothesize that self-learning will decrease how much we depend on extensively labeled data for the development of accurate AI models.

Understanding Patient Preferences for Kidney Transplants (PDF, 337.52 KB)

  • By: Omry Bejerano, Yash Chanchani, Eugene Kwek, Anvika Renuprasad
  • Mentor: Itai Ashlagi
Abstract

Kidney transplantation is the most effective treatment for end-stage kidney disease. In the United States, organ procurement organizations (OPOs) are responsible for recovering these organs from deceased donors and offering them to patients who need transplants. However, the current kidney transplant process suffers from many frictions and inefficiencies. In the US, most patients wait three to five years for a kidney transplant, while an average of 3,500 kidneys are discarded each year and, unfortunately, around 5,000 patients per year die while waiting for a transplant. One cause of these frictions is the lack of patient involvement in the organ allocation process. Surgeons typically accept organs, not only kidneys, for their patients without consulting them. To get patients involved, it is important that they understand the different factors that go into organ allocation.

We analyzed data from the Organ Procurement and Transplantation Network to gain insight into exactly which factors affect a patient's waiting time, specifically for kidneys. By exploiting variability in transplant centers' decisions, the allocation process, and historical data, we can provide patients with accurate waiting-time predictions that would help them make informed decisions about their transplant.

Exploring the Intersection of Creativity and AI (PDF, 198.52 KB)

  • By: Jane Shin, Ashir Rao, Helen Zheng, Taylor Melford Feinberg, Khup Tuang, Isabella Kennedy, Claire Wong
Abstract

We are a group of high school seniors and college freshmen who took part in Stanford University’s STEM-to-SHTEM summer research internship, with a focus on generative AI. Our research culminated in the creation of a zine, which we hope will inspire others to see the possibilities of creative collaboration between humans and AI.

Misconceptions about AI are rampant in the modern world, heightened by sensationalized media reports and speculation over the future of the workforce. While valid concerns have been voiced, and there is a real need for protections to ensure that our work with AI is both ethical and sustainable, our goal is to combat these fallacies by illuminating the limitless potential of AI's applications in the creative world.

Going into this internship, our research team held various opinions about AI: avid supporters optimistic about the new capabilities AI might gain to alleviate burdens in our lives, and others more apprehensive about what the implementation of AI would mean for the future of the workforce and for creatives in general. Over the past two months, we have bridged gaps in our knowledge of AI and come together as a collaborative force. Working as a team, we have learned to question the bias each of us brings to our work, to listen to one another, and to honor our differences, finding that each of our unique perspectives strengthens our work as a group.

Ultimately, we created a zine that both expresses our collective humanness and is enhanced by substantial collaboration with AI, in order to illuminate the intersections between AI and the arts.

Unveiling the Orchestra: A Novel System for Audio Separation and Instrument Identification in Musical Recordings (PDF, 447.9 KB)

  • By: Manato Ogawa, Juan Almanza, Brad Ma, Rakshithaa V. Jaiganesh, Danielle Wang
  • Mentors: Quincy Delp, Alan Wong
Abstract

Audio source separation is a widely applicable field that aims to utilize signal processing techniques to extract separate sources from a piece of audio. While the human brain can easily discern between audio sources, computers require source separation systems to achieve this task. It is difficult for a computer to identify which instrument is playing directly from audio – but we can create a unique frequency “signature” for each instrument based on qualities such as timbre to accomplish this task. The Fast Fourier Transform (FFT) lets us translate audio signals from the time domain to the frequency domain. This creates a unique visualization from which we can extract a musical signature to discern which instruments are playing at a given time. Conventional approaches run slowly and often fail to accurately separate sound. Using the FFT, we are developing an efficient and effective model to detect, categorize, and separate multiple audio sources based on relative magnitudes of harmonic frequency peaks. We hope that our results will demonstrate the ability of a model to differentiate between instruments using features we extract from audio. The instrument feature characterization could be applied to various domains, including music production, automated transcription, and music recommendation systems.
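The signature idea can be sketched as follows: take the FFT of a note, read off the magnitudes at the harmonics of its fundamental, and normalize them into a timbre profile. This is an illustrative sketch with synthetic tones; the bin-picking and normalization choices are assumptions, not the authors' model.

```python
import numpy as np

def harmonic_signature(signal, sample_rate, fundamental, n_harmonics=5):
    """Relative magnitudes of the first n harmonics of `fundamental`,
    read from the FFT of `signal` -- a toy frequency 'signature'."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mags = []
    for k in range(1, n_harmonics + 1):
        idx = np.argmin(np.abs(freqs - k * fundamental))  # nearest FFT bin
        mags.append(spectrum[idx])
    mags = np.array(mags)
    return mags / mags.max()  # normalize so signatures are level-independent

# Two synthetic "instruments" playing A4 (440 Hz) with different timbres.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
bright = np.sin(2*np.pi*440*t) + 0.8*np.sin(2*np.pi*880*t)  # strong 2nd harmonic
mellow = np.sin(2*np.pi*440*t) + 0.1*np.sin(2*np.pi*880*t)  # weak 2nd harmonic

sig_a = harmonic_signature(bright, sr, 440)
sig_b = harmonic_signature(mellow, sr, 440)
print(sig_a.round(2), sig_b.round(2))
```

The two notes share a pitch but yield clearly different signatures, which is exactly the cue an instrument classifier can exploit.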

Investigating the Viability of Semantic Compression Techniques Relying on Image-to-Text Transformations (PDF, 1017.17 KB)

  • By: Adit Chintamaneni, Rini Khandelwal, Kayla Le, Sitara Mitragotri, Jessica Kang
  • Mentors: Lara Arikan, Tsachy Weissman
Abstract

Data compression is a crucial technique for reducing the storage and transmission costs of data. As the amount of data that is consumed and produced continues to expand, it is essential to explore more efficient compression methodologies. The concept of semantics offers an interesting new approach to compression, enabled by recently developed technology. Concisely, we sought to discover whether the most important features of an image could be compressed into text, and if this text could be reconstructed by a decompressor into a new image with a high level of semantic closeness to the original image. The dataset of images that were compressed is composed of five common image categories: single person, group of people, single object, group of objects, and landscape. Each image was compressed through the following pipeline: image-to-text conversion, text compression and file size determination, file decompression and text recovery, and text-to-image conversion. This pipeline enables any image to be compressed into a few dozen bytes. When examining image-to-text compressors, we experimented with both human and artificial intelligence (AI) powered procedures. We selected the text-to-image model DALL-E 2 as our decompressor. We released multiple surveys to assess structural fidelity and semantic closeness between original images and reconstructed images. We also included compressed JPEGs and WebPs to benchmark performance. Human and AI reconstructions received lower structural fidelity scores than WebP and JPEG images. Individually, images reconstructed from human captions were perceived to have higher structural fidelity and semantic closeness to the original images than AI captions did. Participants' textual descriptions, of both human and AI reconstructions, had high semantic fidelity scores to their descriptions of the original images. This demonstrates that the proposed pipeline is a viable semantic compression mechanism.
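The middle stages of the pipeline, text compression and file recovery, can be sketched with a general-purpose compressor. The image-to-text and text-to-image ends (human or AI captioning, and DALL-E 2) are outside this sketch, and zlib is an illustrative stand-in rather than the authors' specific text compressor.

```python
import zlib

def compress_caption(caption: str) -> bytes:
    """Middle stage of the pipeline: the textual description is the only
    payload transmitted, so its compressed size bounds the bitrate."""
    return zlib.compress(caption.encode("utf-8"), 9)

def decompress_caption(payload: bytes) -> str:
    return zlib.decompress(payload).decode("utf-8")

caption = "A group of five people standing on a beach at sunset"
payload = compress_caption(caption)
print(len(payload), "bytes")  # on the order of a few dozen bytes
assert decompress_caption(payload) == caption  # text stage is lossless
```

All of the loss in this pipeline happens at the semantic ends (captioning and regeneration); the text channel itself round-trips exactly, which is why semantic closeness rather than pixel fidelity is the right evaluation metric.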

Designing the Most Efficient Recombination Process by Classical and Quantum Algorithms (PDF, 814.6 KB)

  • By: Aden Lee, Allan Jiang, Kim-Nga Shenoy, Vihaan Kodeboyina
  • Mentors: Junjie Luo, Kepler Boyce
Abstract

With the development of synthetic biology, achieving highly specific and accurate control of living organisms, or constructing complex metabolic pathways, often requires genetic circuits with multiple genetic elements. Traditional approaches involve docking these elements on different chromosomes, or integrating them at loci far apart on the same chromosome, and then recombining them. Because traditional genetic approaches are constrained by the fundamental laws of genetics, the turnover time increases linearly with the number of genetic elements in the circuit, and the cost of maintaining all the elements in the circuit grows dramatically as their number increases.

The Schnitzer lab developed a recombination tool that can recombine two transgenes at the same docking site. This approach greatly accelerates the construction of intricate gene circuits and allows for the synthesis of biological strains with numerous genetic elements, enabling complex functionalities to be attained efficiently.

Based on the newest version of the Super Recombination system, SuRe 3.0, which uses three orthogonal adaptor pairs to recombine any number of genes, we created a computational program that finds the quickest process for recombining multiple genetic elements. The turnover time for the recombination is proportional to the logarithm of the number of transgenes to be recombined. Our application first assesses whether the genes can be recombined; if so, it determines the shortest, quickest recombination tree by finding the shortest path. The application lets biology researchers automatically design an optimized recombination process on a computer.
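The logarithmic turnover follows from pairwise merging: each round recombines transgenes two at a time, roughly halving the number of remaining elements. A minimal sketch of such a schedule follows; it is illustrative only and ignores the SuRe 3.0 adaptor-pair constraints the actual program must respect.

```python
def recombination_rounds(elements):
    """Pairwise-merge schedule: n elements need ceil(log2(n)) rounds,
    rather than the n-1 serial steps of one-at-a-time integration."""
    rounds = []
    current = [str(e) for e in elements]
    while len(current) > 1:
        nxt = []
        for i in range(0, len(current) - 1, 2):
            nxt.append(current[i] + "+" + current[i + 1])  # one recombination event
        if len(current) % 2:
            nxt.append(current[-1])  # odd element carries over to the next round
        rounds.append(nxt)
        current = nxt
    return rounds

schedule = recombination_rounds(["g1", "g2", "g3", "g4", "g5"])
print(len(schedule))  # 3 rounds for 5 elements, i.e. ceil(log2(5))
```

Five elements finish in three rounds instead of four serial crosses; the gap widens quickly as circuits grow.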

Behavior Cloning (BC) of Human Policy via Logged Data (PDF, 63.6 KB)

  • By: Aashna Kumar, Evelyn Jin, Hooriya Faisal, Samuel Sosa, Tyler Paik
  • Mentors: Zhengyuan Zhou, Junyao Chen, Dailin Ji, Ni Yan, Ethan Cao
Abstract

Human decision policy can be learned by machine learning (ML) models from logged data. Our research aims to train a convolutional neural network (CNN) that can predict a user's next action given the current game state in the snake game; predicting the user's next action is called behavior cloning. We collected the logged data both manually and via heuristics replicating high-scoring rounds. The collected data serves as our dataset, consisting of input-output pairs representing the game state and the corresponding actions taken by the human players. After training, our CNN reached an accuracy of 93% on the testing dataset.
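A logged (state, action) pair might be encoded for a CNN like this. The three-channel grid layout and one-hot action labels are illustrative assumptions, not the paper's exact encoding.

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]

def encode_state(grid_size, snake, food):
    """Three-channel grid a CNN could consume: body, head, and food planes."""
    state = np.zeros((3, grid_size, grid_size), dtype=np.float32)
    for (r, c) in snake:
        state[0, r, c] = 1.0          # channel 0: snake body cells
    head_r, head_c = snake[0]
    state[1, head_r, head_c] = 1.0    # channel 1: snake head
    state[2, food[0], food[1]] = 1.0  # channel 2: food location
    return state

def encode_action(action):
    """One-hot label for supervised behavior cloning."""
    label = np.zeros(len(ACTIONS), dtype=np.float32)
    label[ACTIONS.index(action)] = 1.0
    return label

# One logged (state, action) pair from a played round.
x = encode_state(8, snake=[(4, 4), (4, 3)], food=(2, 6))
y = encode_action("up")
print(x.shape, y)  # (3, 8, 8) [1. 0. 0. 0.]
```

Training then reduces to ordinary supervised classification over many such pairs: minimize cross-entropy between the CNN's predicted action distribution and the logged human action.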

Effectiveness of Virtual Reality in Surgeries, Surgeon Training, and Medical Education (PDF, 832.08 KB)

  • By: Alys Jimenez Peñarrieta, Davyn Paringkoan, Nyali Latz-Torres, Yasmeen Galal, Karen Zhang
  • Mentor: Suyeon Choi
Abstract

Augmented Reality (AR) and Virtual Reality (VR) have emerged as transformative tools for medical surgeries. These technologies have the potential to enhance surgical precision, drastically improve patient outcomes, and revolutionize medical training. Furthermore, they can alter the way medical education is approached. However, AR/VR-assisted surgeries raise critical policy, accessibility, and privacy concerns, given the information about patients and their surroundings these systems require and the potential inequities of access to VR. This research paper provides a comprehensive review of the existing literature. The results demonstrate how the mechanisms behind VR improve healthcare.

In addition to our literature review, our team programmed an educational brain anatomy simulation for elementary and middle school students. Educational VR programs could be an effective teaching medium: they are more engaging than traditional mediums and help students visualize concepts, which is likely to improve learning retention. We used Unity to construct a 3D model of the brain. The different sections of the brain were labeled and color-coded; clicking on a label took the student to a screen with more information about what that section of the brain does. In addition, we made a PDF document with the same information, but with 2D visuals.

We distributed our VR product to a test group, and our PDF document to a separate group. Both groups consisted of 20 elementary school children entering the same grade and attending the same summer camp. After each group was given an hour and a half to read the PDF or explore the program, they were given a short test on the information presented. The results demonstrate that VR programs can be an effective tool for teaching anatomy and medical concepts.

On the Detection and Prediction of Seizures using EEG (PDF, 328.65 KB)

  • By: Fatima Ansari and Aren Wang
  • Mentor: Joanna Sands
Abstract

Seizures are abrupt, rapid bursts of electrical activity within the brain. Those with epilepsy, a central nervous system disorder, suffer repeated seizures that appear to occur randomly and without warning. Frequent seizures may cause physical injury or even death. A device that can quickly detect and respond to the onset of a seizure may lessen these risks. The most commonly used instrument to detect such an event is an electroencephalogram (EEG), which is noninvasive and records multiple channels of the brain's electrical activity. EEG can be used to distinguish different seizure and epilepsy types (focal or generalized, idiopathic or symptomatic, or part of a larger epilepsy syndrome), and thus to inform the choice of antiepileptic treatment and the prediction of prognosis.

Bridging the Gap in Generative A.I. for Audio Generation (PDF, 1.81 MB)

  • By: Pranav Battini, Kaley Chung, Navaneeth Dontuboyina, Sude Ozkaya, Kedaar Rentachintala
  • Mentors: Mert Pilanci, Rajarshi Saha, Zachary Shah, Indu Subramanian, Fangzhao Zhang
Abstract

Generative A.I. has come a long way in recent years, popularized by OpenAI's ChatGPT and DALL-E, with advancements such as diffusion models paving the way to the future. However, while much progress has been made in natural language processing and image generation, there is still much work to be done in A.I.-generated audio and video. Advancing generative A.I. in audio generation in particular would enable a wealth of innovation and development, from music generation to improving training data for autonomous cars.

Our project seeks to fill the gap in A.I.'s ability to generate audio by refining an easily modifiable model to produce unique media that can ultimately be used for additional model training and possible commercial use, through the implementation of the Stable Diffusion and Riffusion libraries. We experimented and built our solution in Jupyter Notebooks, either on Google Colab or locally with Anaconda, installing dependencies as needed. Our first milestone was running inferences with Zachary Shah's riff-pix2pix model to gain an understanding of the current progress in the field: via riff-pix2pix, we generated short audio clips based on our prompts. Our second milestone was to use Jupyter Notebooks such as generate_splices.ipynb and splice_together.ipynb to generate longer audio than was previously possible in riff-pix2pix, the goal being a 1-minute audio sample produced by seamlessly splicing together the up-to-5-second samples that Shah's riff-cnet model could generate. Our third and final milestone was to curate an expanded dataset, using technologies such as Spleeter and Librosa, with which to retrain our model, aiming for a substantial improvement in its audio generation capability.
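Splicing short generated clips into a longer track can be sketched with a linear crossfade at each seam. This is an assumption about how such splicing notebooks join samples; the fade length and clip contents here are illustrative.

```python
import numpy as np

def splice(clips, sample_rate, fade_s=0.05):
    """Join short clips into one track, crossfading at each seam
    so the joins don't produce audible clicks."""
    fade = int(fade_s * sample_rate)
    ramp = np.linspace(0.0, 1.0, fade)
    out = clips[0].astype(np.float32)
    for clip in clips[1:]:
        clip = clip.astype(np.float32)
        # Overlap-add: fade the old tail out while the new head fades in.
        out[-fade:] = out[-fade:] * (1.0 - ramp) + clip[:fade] * ramp
        out = np.concatenate([out, clip[fade:]])
    return out

sr = 22050
clips = [np.random.uniform(-1, 1, 5 * sr) for _ in range(12)]  # twelve 5 s clips
track = splice(clips, sr)
print(len(track) / sr)  # just under 60 s: one minute minus the overlapped seams
```

Twelve 5-second clips with eleven 50 ms overlaps yield a track of roughly 59.45 seconds, i.e. the target one-minute sample.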

Evaluating Location-Dependent Variation in Political Google Search Results: A Case Study in Brazilian Politics (PDF, 2.57 MB)

  • By: Nguyen Hoang Minh Ngoc, Vania Tucto
  • Mentor: Amy Dunphy
Abstract

With the rise of the internet and social media, misinformation has driven elections all over the world to become increasingly contentious and polarized. Web search results, while generally understudied as a vector for misinformation and political bias, have been found to have dramatic effects on the behavior of undecided voters. 

In this project, we sought to study the effect of search location on the ranking and political slant of search results. In a small initial dataset focused on the 2022 Brazilian election, we observed differences in the results returned for different queries based on the location from which a user was searching. This effect could have a widespread impact on elections worldwide. In future work, we seek to quantify the differences between results for different locations, in order to shed light on the specific ways that a location’s characteristics could impact the search experience of voters in that region.

Segmented Image Compression in Healthcare (PDF, 282.3 KB)

  • By: Alex Nava, Cristina Bonilla Bernal, Jayden Tang, Logan Graves
  • Mentors: Ayushman Chakraborty, Qingxi Meng
Abstract

The crossroads at which medical imaging and data compression intersect has yielded a fascinating, novel area of research, particularly pertaining to the Segment Anything Model (SAM), an AI-based image segmentation model. We researched a range of standard medical imaging techniques, including Computed Tomography (CT scans), Positron Emission Tomography (PET scans), Ultrasound, and Magnetic Resonance Imaging (MRI). Additionally, we analyzed more specific areas of medical imaging such as digital pathology, mammography, and photoacoustic imaging. To supplement our knowledge of the different imaging modalities, we looked specifically at how MRI scans are processed in terms of segmentation, standard storage, and compression practices. Furthermore, we studied recurrent difficulties that medical professionals face when segmenting certain areas of the body, paying particular attention to issues within the brain and the spinal cord. By speaking to a member of the Radiology Interest Group at Stanford, we also identified frequent issues surrounding storage and clinical workflow, thus narrowing our research into how the Segment Anything Model can be applied in a robust, efficient, and clinically useful manner.

Integrating our knowledge about various kinds of medical imaging technology, we present a proof-of-concept for a novel image compression technique based on SAM, one which is especially suited to medical imaging technology. By automatically distinguishing between unimportant image aspects (such as the blank black background of an MRI) and important aspects (such as the anatomical details of the scan), we can apply lossy compression to nonessential aspects and lossless compression to essential ones, allowing much greater amounts of compression without losing details relevant to the scans. We compare this technique with existing compression methods and suggest its potential applications, as well as areas for future research.
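The hybrid scheme can be sketched as: compress the masked, diagnostically relevant region losslessly, and quantize the rest before entropy coding. This is a minimal sketch: the mask here is hand-drawn rather than produced by SAM, the image is synthetic, and zlib stands in for the actual codecs.

```python
import numpy as np
import zlib

def hybrid_compress(image, mask, quant_step=32):
    """Keep the masked region exact; coarsely quantize everything else,
    then entropy-code the two parts separately."""
    essential = np.where(mask, image, 0).astype(np.uint8)    # lossless part
    background = np.where(mask, 0, image).astype(np.uint8)
    background = (background // quant_step) * quant_step     # lossy: drop detail
    return zlib.compress(essential.tobytes()), zlib.compress(background.tobytes())

rng = np.random.default_rng(0)
scan = rng.integers(0, 256, (128, 128), dtype=np.uint8)  # stand-in for an MRI slice
mask = np.zeros((128, 128), dtype=bool)
mask[32:96, 32:96] = True                                # "anatomy" kept lossless
ess, bg = hybrid_compress(scan, mask)
print(len(ess) + len(bg), "<", scan.nbytes)
```

The masked region decompresses bit-exactly, while the background trades fidelity for a much smaller entropy-coded payload; with a real SAM mask, the lossless region would track the anatomy rather than a fixed rectangle.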