
Stanford Compression Workshop 2021


The Stanford Compression Workshop 2021 was held virtually on February 25th and 26th, 2021. The Workshop is a gathering of people from academia and industry interested in new and improved ways to model, represent, store, compress, query, process, communicate and protect the data the world is amassing. It encompasses diverse areas including genomic data compression, quantum information, video compression with an emphasis on perceptual quality, DNA-based data storage, neuroscience, databases, computation on compressed and distributed data, virtual theater technologies, compression of neural network models, and the use of neural networks for data compression. The workshop consists of talks and panels featuring students, academics, and industry participants, as well as a poster session. The schedule and other information are available below. Enjoy!

Talk and panel recordings are now available as a YouTube playlist!

The poster session will also be virtual. We are excited to announce a best poster prize (Netflix vouchers)!

Best Poster Prize Winners

  • Arjun Barrett, The Harker School: "FFlate: A universal, high-performance compression library optimized for web browsers" [poster]
  • Shashank Vatedka, Indian Institute of Technology Hyderabad: "Locally Decodable and Update Efficient Data Compression" [poster]

The prize includes a one-year Netflix subscription, courtesy of our sponsors.

Please use the Zoom Q&A feature to ask questions during the talks and panels, or raise your hand towards the end of a talk if you want to ask the speaker directly. Questions will be taken at the end of the talks and throughout the panels.

The Gather town link will be provided here 30 minutes before the poster session on Thursday and before the social hangout on Friday. We have created a README document with instructions on navigating the interface. Please use the Chrome browser to avoid technical issues.

Gather town link:

Join the Slack workspace for informal networking and technical discussions. Registered participants will receive an invite; let the organizers know if you don't. Attendees from Stanford can request an invite directly, while others should contact us.

Contact us

Previous workshop: Stanford Compression Workshop 2019


All times are in the Pacific Time Zone (PT).

Thursday, February 25, 2021

8:30 am - 8:45 am Tsachy Weissman, Stanford University
Opening Remarks [video]
8:45 am - 9:15 am José Ignacio Latorre, Centre for Quantum Technologies, Singapore; Technology Innovation Institute, Abu Dhabi
Quantum compression of classical and quantum information [video] [slides]

Abstract:  A quantum register of n qubits can accommodate 2·2^n − 2 real numbers. We show how an image can be cast onto a real, entangled state, and then processed for compression. We also show a quantum compression scheme for a quantum state, based on variational quantum autoencoders.
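
The parameter count quoted in the abstract can be checked with a short calculation. This is the standard counting argument for a general pure state of n qubits, not something taken from the talk itself; the symbols n and c_x are notation introduced here.

```latex
% A general pure state of n qubits has one complex amplitude per basis string x:
%   |psi> = sum_{x in {0,1}^n} c_x |x>,   c_x in C
\[
\underbrace{2 \cdot 2^{n}}_{\text{real and imaginary parts of } c_x}
\;-\; \underbrace{1}_{\text{normalization } \sum_x |c_x|^2 = 1}
\;-\; \underbrace{1}_{\text{unobservable global phase}}
\;=\; 2^{\,n+1} - 2 .
\]
% Example: n = 1 gives 2 real parameters (the two angles on the Bloch sphere).
```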

Bio:  José Ignacio Latorre is the Director of the Centre for Quantum Technologies in Singapore, and the Chief Scientist at the Quantum Research Centre of the Technology Innovation Institute in Abu Dhabi. He has worked in quantum field theory, renormalization group, quantum information and computation.

9:15 am - 9:45 am Sebastian Deorowicz, Silesian University of Technology
Compression of genome collections [video] [slides]

Abstract:  Accessibility of cheap sequencing technologies allowed comparative genomics to extend its field of interests from viruses in the 1980s to humans in the 2010s. Nowadays, the largest sequencing projects cover tens or even hundreds of thousands of individuals. It seems obvious that in the near future we will see collections of millions of human genomes. Genome collections are usually stored in Variant Call Format (VCF) files. Such files are usually huge and require compression. In the talk, I will discuss modern attempts to reduce the space necessary for VCF files. Moreover, I will show how many queries can be answered without decompressing the VCF files.

Bio:  Sebastian Deorowicz is a Professor and Head of the Algorithmics and Software Department at Silesian University of Technology, Poland. He completed his PhD in 2003 in Computer Science. In 2010 he moved to bioinformatics. His lab focuses on compression and analysis of genomic data. This includes compression of sequencing data (in FASTQ format) as well as genome collections (in VCF or FASTA formats) for efficient storage and transfer. The work also includes the development of compressed data structures allowing fast queries of various types to compressed genome collections. The group also develops tools for analysis of genomic data, e.g., for k-mer counting, Illumina read mapping, and metagenomic studies. Other work focuses on multiple sequence alignments of huge protein families. The main goal of the group is to offer high-performance implementations able to work on both server and workstation platforms.

9:45 am - 10:15 am Idoia Ochoa, University of Illinois Urbana-Champaign
GPress: a framework for querying general feature format (GFF) files and expression files in a compressed form [video] [slides]

Abstract:  Sequencing data are often summarized at different annotation levels for further analysis, generally using the general feature format (GFF) or its descendants, gene transfer format (GTF) and GFF3. Existing utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. We propose GPress, a framework for querying GFF files in a compressed form. GPress can also incorporate and compress expression files from both bulk and single-cell RNA-Seq experiments, supporting simultaneous queries on both the GFF and expression files. In brief, GPress applies transformations to the data which are then compressed with the general lossless compressor BSC. To support queries, GPress compresses the data in blocks and creates several index tables for fast retrieval.
We tested GPress on several GFF files of different organisms and showed that it achieves on average a 61% reduction in size with respect to gzip (the current de facto compressor for GFF files) while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds (when run on a common laptop). In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce its size by more than 68% when compared to gzip (for both bulk and single-cell RNA-Seq experiments), while still retrieving the information within seconds. Finally, applying BSC to the data streams generated by GPress instead of to the original file shows a size reduction of more than 44% on average.
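
The general idea of compressing in blocks and keeping an index for retrieval can be illustrated with a minimal sketch. This is not GPress: it uses Python's `zlib` as a stand-in for BSC, skips the data transformations the abstract mentions, and the function names, the block size, the toy records, and the choice of the last column as the identifier are all assumptions made for illustration.

```python
import zlib

# Toy GFF-like records (hypothetical data): nine tab-separated columns,
# with the last column treated as the identifier for this sketch.
records = [
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene1",
    "chr1\tsrc\tgene\t900\t1500\t.\t-\t.\tgene2",
    "chr1\tsrc\texon\t900\t1000\t.\t-\t.\tgene2",
    "chr1\tsrc\texon\t300\t500\t.\t+\t.\tgene1",
]


def build_blocks(lines, block_size=2):
    """Compress lines in fixed-size blocks and index identifiers by block."""
    blocks = []  # compressed payloads, one per block
    index = {}   # identifier -> set of block numbers that contain it
    for start in range(0, len(lines), block_size):
        chunk = lines[start:start + block_size]
        block_no = len(blocks)
        blocks.append(zlib.compress("\n".join(chunk).encode("utf-8")))
        for line in chunk:
            ident = line.split("\t")[8]
            index.setdefault(ident, set()).add(block_no)
    return blocks, index


def query(blocks, index, ident):
    """Decompress only the blocks listed in the index for this identifier."""
    hits = []
    for block_no in sorted(index.get(ident, ())):
        text = zlib.decompress(blocks[block_no]).decode("utf-8")
        hits.extend(l for l in text.split("\n") if l.split("\t")[8] == ident)
    return hits


blocks, index = build_blocks(records)
gene1_rows = query(blocks, index, "gene1")
```

Here `query` touches two of the three blocks for `gene1` and leaves the third compressed, which is the point of the index: smaller blocks mean less decompression per query, at some cost in compression ratio.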

Bio:  Dr. Ochoa graduated with B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Navarra, Spain, in 2009. She then went to Stanford, where she obtained an MS and a PhD in the Electrical Engineering Department, in 2012 and 2016, respectively. During her time at Stanford Dr. Ochoa performed internships as a software engineer at Google, CA and at Genapsys, CA. She also served as a technical consultant for the HBO show “Silicon Valley”. After obtaining the PhD, Dr. Ochoa joined the faculty at the Electrical and Computer Engineering department at the University of Illinois at Urbana-Champaign (UIUC), as an assistant professor, in January 2017. After three years at UIUC, Dr. Ochoa joined the faculty at Tecnun as a collaborator professor in January 2020. She still holds an adjunct faculty position at UIUC and continues advising her PhD students.
Her research interests include computational biology, data compression, bioinformatics, information theory and coding, machine learning, communications, and signal processing. In particular, her research focuses on the development of computational methods tailored to omics data, to aid the storage, handling, and analysis of these data. She has developed several compression algorithms for genomic, methylation and mass spectrometry data that are currently the state-of-the-art, as well as novel computational tools to improve the genomic analysis pipeline, such as a novel variant filtering tool based on ensemble methods.
Dr. Ochoa is a recipient of the Stanford Graduate Fellowship, La Caixa Graduate Fellowship, and an award for excellence from the Basque Government. She has also been recently awarded the MIT Innovators under 35 award, the Gipuzkoa Fellows, and a Ramon y Cajal grant.

10:15 am - 10:30 am Break
10:30 am - 11:00 am Debbie Leung, University of Waterloo
Quantum data compression [video] [slides]

Abstract:  We define the problem, highlight some differences between the quantum and the classical settings, summarize major results, and discuss a few recent developments.

Bio:  Debbie Leung has been a faculty member at the Institute for Quantum Computing and the Department of Combinatorics and Optimization at the University of Waterloo since 2005. She works on quantum information theory, including the study of quantum channel capacities, quantum correlations, and quantum error correction. She obtained her PhD in Physics from Stanford University, and worked as a postdoctoral fellow at Caltech and IBM TJ Watson Research Center.

11:00 am - 11:30 am Yiannis Andreopoulos, iSIZE Technologies
Deep Perceptual Optimization for Video Encoding [video] [slides]

Abstract:  In recent work, we have extended the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based prediction) and multiple loss functions representing elements of rate, distortion and perceptual quality achieved by the virtual encoder. In this talk, we shall summarize the key principles of our approach and present results on both natural-scene content and animation/gaming content. We shall also explore some interesting complexity vs. Bjontegaard delta-rate (BD-rate) trade-offs enabled by our proposal and make some visual comparisons showing the visual quality difference corresponding to the reported BD-rate gains. Some suggestions for further extensions will be outlined.

Bio:  Yiannis Andreopoulos is Technical Director at iSIZE, a London-based AI company optimizing video delivery. He is also Professor of Data and Signal Processing Systems at University College London (UCL). His research interests are in video signal processing, machine learning and high-performance computing. He has published extensively in these areas and his work has been recognised by best paper awards and a Senior Research Fellowship from the Royal Academy of Engineering and the Leverhulme Trust.

11:30 am - 12:00 pm Kedar Tatwawadi, WaveOne, Inc.
Challenges and Opportunities in Learned Video Coding [video]

Abstract:  In the age of exponential growth of video conferencing and video on demand services, video compression has become more and more important. In the past few years, there has been a significant amount of progress in designing video compression techniques using machine learning to augment or replace the traditional video codecs. In this talk, I will discuss some of the key ideas shaping the next generation of learned video codecs, and how they improve upon some of the shortcomings of traditional codecs. In spite of the impressive strides, significant challenges remain in making the learned video codecs a reality. I will discuss some of the key challenges, and how we at WaveOne are working towards overcoming them.

Bio:  Kedar is a research scientist at WaveOne Inc. He received his Ph.D. from Stanford University in 2020, where he specialized in the field of data compression and information theory. He holds a B.Tech in Electrical Engineering from the Indian Institute of Technology, Bombay, and an M.S. from Stanford University. Kedar is the recipient of the Numerical Technologies Founders Prize at Stanford, and the Qualcomm Innovation Fellowship.

12:00 pm - 12:15 pm Break
12:15 pm - 12:45 pm Christos Bampis & Lukáš Krasula, Netflix
A lot of bang for the bit [video]

Abstract:  At Netflix, we strive to serve our members the best viewing experience possible. This means that we need to spend bits in a way that maximizes the perceptual quality of each video, on each device, in each streaming session. Naturally, such optimization requires a reliable way to quantify the observed quality. This talk will cover our efforts towards developing more accurate methods of measuring the subjectively perceived video quality under different viewing conditions, as well as the design of algorithms capable of estimating the quality automatically.

Bio:  Christos G. Bampis is currently working as an engineer within the Encoding Technologies team at Netflix, focusing on the research and productization of video quality algorithms at scale. Before that, he received a Ph.D. degree in Electrical and Computer Engineering from the University of Texas at Austin in the US and a Diploma degree from the National Technical University of Athens in Greece. In his free time, he likes reading and writing poetry, practicing martial arts, and traveling.

Lukáš Krasula is a research scientist in Netflix’s Encoding Technologies Team. He spends most of his days trying to figure out how to improve objective video quality metrics. He holds a double Ph.D. degree in Computer Science and Radioelectronics from the University of Nantes, France, and Czech Technical University in Prague, Czech Republic, respectively. When he’s not working, you’d probably find him playing music, doing sports, reading, or watching movies.

12:45 pm - 1:00 pm Roshan Prabhakar, Fremont High School
Reducing latency and bandwidth for video streaming using keypoint extraction and digital puppetry [video] [slides]

Abstract:  COVID-19 has made video communication one of the most important modes of information exchange. While extensive research has been conducted on the optimization of the video streaming pipeline, in particular the development of novel video codecs, further improvement in the video quality and latency is required, especially under poor network conditions. This paper proposes an alternative to the conventional codec through the implementation of a keypoint-centric encoder relying on the transmission of keypoint information from within a video feed. The decoder uses the streamed keypoints to generate a reconstruction preserving the semantic features in the input feed. Focusing on video calling applications, we detect and transmit the body pose and face mesh information through the network, which are displayed at the receiver in the form of animated puppets. Using efficient pose and face mesh detection in conjunction with skeleton-based animation, we demonstrate a prototype requiring less than 35 kbps of bandwidth, an order of magnitude reduction over typical video calling systems. The added computational latency due to the mesh extraction and animation is below 120 ms on a standard laptop, showcasing the potential of this framework for real-time applications.

Bio:  Roshan Prabhakar, currently a senior at Fremont High School, has worked in information theory as it relates to data coding and compression. Through Stanford’s Information Systems Laboratory, Roshan has worked on numerous projects at Stanford University, most notably on video compression and encoding through a mentorship in the STEM to SHTEM 2020 summer program (research listed for publication at the 2021 Data Compression Conference), and on novel digital puppetry as a means to facilitate live theatrical performances in the age of COVID (Digital Puppetry at Stanford). He has worked under the close mentorship of Shubham Chandak and Kedar Tatwawadi (Stanford PhD fellows) as well as Professor Tsachy Weissman. Roshan is scheduled to join Stanford’s undergraduate class of 2025.

1:00 pm - 1:15 pm Theater Online: Interns from the STEM to SHTEM program discuss their interdisciplinary project developing a multiplatform virtual theater experience [video] [slides]
1:15 pm - 2:15 pm Poster session [List of Posters]
2:15 pm - 2:30 pm Break
2:30 pm - 3:30 pm Panel: Compression and streaming technologies for live theater [video]
Keith Winstein, Stanford University

Bio:  Keith Winstein is an assistant professor of computer science and, by courtesy, of electrical engineering at Stanford University. His research group creates new kinds of networked systems by rethinking abstractions around communication, compression, and computing. Winstein previously served as a staff reporter at The Wall Street Journal. He did his undergraduate and graduate work at MIT.

Michael Rau, Stanford University

Bio:  Michael Rau is a live performance director specializing in new plays, opera, and digital media projects. He has worked internationally in Germany, Brazil, the UK, Ireland, Canada, and the Czech Republic. He has created work in New York City at Lincoln Center, The Public Theater, PS122, HERE Arts Center, Ars Nova, The Bushwick Starr, The Brick, 59E59, 3LD, and Dixon Place. Regionally, his work has been seen at the Ingenuity Festival in Cleveland, OH, and the American Repertory Theatre in Cambridge, MA. He has developed new plays at the Eugene O’Neill National Playwrights Conference, the Lark and the Kennedy Center. Michael Rau is a recipient of fellowships from the Likhachev Foundation, the Kennedy Center, and the National New Play Network. He has been a resident artist at the Orchard Project, E|MERGE, and the Tribeca Performing Arts Center. He has been an associate director for Anne Bogart, Les Waters, Robert Woodruff, and Ivo Van Hove. He is a New York Theater Workshop Usual Suspect and a professor of directing and devising at Stanford University.

Devon Baur, University of California, Los Angeles

Bio:  Devon's career has been dedicated to the interplay of art and technology. She is currently working on a PhD in Theater and Performance Studies at UCLA, studying multisensory technologies in live performance. She has done ongoing work as a researcher and artist-in-residence in the Stanford Compression Forum exploring immersive storytelling in virtual spaces. Prior to this, she worked in the VR/AR industry for half a decade as both a curator and producer. Most notably, she produced the award-winning Tree VR, which toured to over 90 festivals including Tribeca Film Festival, Cannes Film Festival, and twice to the World Economic Forum in Davos.

Marieke Gaboury, Palo Alto Children's Theatre

Bio:  As a Director of Operations for the Palo Alto Children’s Theatre, Marieke is delighted to be a part of the team that supports performing arts programs for youth in the Bay Area. A California native who has happily returned home, Marieke spent some years in New Orleans, where she was the Manager of Institutional Development for the New Orleans Ballet Association, as well as Managing Director of Southern Rep, one of the only professional theatre companies in Louisiana. Her move to New Orleans followed 13 years in New York City, where she was Producing Director of LAByrinth Theater Company, the ensemble-driven Off Broadway collective which developed and produced new work by both emerging and distinguished, award-winning theatre artists. In the Summer of 2020, Marieke also co-founded The Breath Project, an archive of original theatrical works created by BIPOC artists, in response to the murder of George Floyd, and the deaths of so many people of color at the hands of law enforcement in this country.

Michaela Murray, Stanford University

Bio:  Michaela Murray is currently a junior at Stanford University studying Computer Science with a concentration in systems. Previously, she has done research in radar systems used in Radio Glaciology, and in large scale data analysis of TV news videos. She is currently focused on secure distributed operating systems for low-power devices, and emerging streaming technologies for live performances in the Zoom Era. Outside of CS, Michaela has a passion for violin performance, and she takes lessons and performs in ensembles at Stanford's Braun Music Center.

Tsachy Weissman, Stanford University
3:30 pm - 4:30 pm Break
4:30 pm - 5:00 pm Live StageCast theater performance

Friday, February 26, 2021

8:30 am - 9:00 am Sara Hooker, Google
The Myth of the Perfect Model: Characterizing the Generalization Trade-offs Incurred By Compression [video] [slides]

Abstract:  To date, discussion of the relative merits of different compression methods has centered on the trade-off between level of compression and top-line metrics such as top-1 and top-5 accuracy. Along this dimension, compression techniques such as pruning and quantization are remarkably successful. It is possible to prune or heavily quantize with negligible decreases to test-set accuracy. However, top-line metrics obscure critical differences in generalization between compressed and non-compressed networks. In this talk, we will go beyond test-set accuracy and discuss some of my recent work measuring the trade-offs between compression, robustness and algorithmic bias. Characterizing these trade-offs provides insight into how capacity is used in deep neural networks -- the majority of parameters are used to represent a small fraction of the training set. Formal auditing tools like Compression Identified Exemplars (CIE) also catalyze progress in training models that mitigate some of the trade-offs incurred by compression.

Bio:  Sara Hooker is a researcher at Google Brain working on training models that fulfill multiple desiderata. Her main research interests gravitate towards interpretability, model compression and security. In 2014, she founded Delta Analytics, a non-profit dedicated to bringing technical capacity to help non-profits across the world use machine learning for good.

9:00 am - 9:15 am Jonathan Frankle, Massachusetts Institute of Technology
The Lottery Ticket Hypothesis: On Sparse, Trainable Neural Networks [video]

Abstract:  I recently proposed the lottery ticket hypothesis: that the dense neural networks we typically train have much smaller subnetworks capable of reaching full accuracy from early in training. This hypothesis raises (1) scientific questions about the nature of overparameterization in neural network optimization and (2) practical questions about our ability to accelerate training. In this talk, I will discuss established results and the latest developments in my line of work on the lottery ticket hypothesis, including the empirical evidence for these claims on small vision tasks, changes necessary to scale these ideas to practical settings, and the relationship between these subnetworks and their "stability" to the noise of stochastic gradient descent. I will also describe my vision for the future of research on this topic.

Bio:  Jonathan Frankle is on the job market! He is a fifth year PhD student at MIT, where he empirically studies deep learning with Prof. Michael Carbin. His current research focus is on the properties of sparse networks that allow them to train effectively as embodied by his "Lottery Ticket Hypothesis" (ICLR 2019 best paper award). Jonathan also has an interest in technology policy: he has worked closely with lawyers, journalists, and policymakers on topics in AI policy and has taught at the Georgetown University Law Center. He earned his BSE and MSE in computer science at Princeton and has previously spent time at Google, Facebook, and Microsoft.

9:15 am - 9:30 am Berivan Isik, Stanford University
An Information-Theoretic Approach to Neural Network Compression [video] [slides]

Abstract:  As a simple and easy-to-implement method, pruning is one of the most established neural network (NN) compression techniques. Although it is a mature method with more than 30 years of history, there is still a lack of a good understanding and systematic analysis of why pruning works well even with aggressive compression ratios. In this talk, I will explain how we answer this question by studying NN compression from an information-theoretic perspective and show that rate-distortion theory suggests pruning as a way to achieve the theoretical limits of NN compression. Our derivation also provides an end-to-end compression pipeline involving a novel pruning strategy. That is, in addition to pruning the model, we also find a minimum-length binary representation of it via entropy coding. Our method consistently outperforms the existing pruning strategies and reduces the pruned model's size by 2.5 times.
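
The two stages named in the abstract, pruning followed by entropy coding, can be illustrated with a minimal sketch. This is not the speaker's pruning strategy: it uses plain magnitude pruning, uniform quantization, and the empirical entropy of the resulting symbols as a rough proxy for the size an ideal entropy coder could reach. The function names, toy weights, keep fraction, and quantization step are all assumptions made for illustration.

```python
import math
from collections import Counter


def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights.

    Ties at the threshold are all kept, so slightly more than
    keep_fraction of the weights can survive.
    """
    keep = max(1, round(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, step):
    """Map each weight to the index of its nearest multiple of step."""
    return [round(w / step) for w in weights]


def entropy_bits_per_symbol(symbols):
    """Empirical Shannon entropy of the symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Toy weight vector (hypothetical values, not from any real model).
weights = [0.9, -0.05, 0.02, -0.7, 0.01, 0.03, 0.6, -0.02]

pruned = magnitude_prune(weights, keep_fraction=0.375)  # keep 3 of 8 weights
symbols = quantize(pruned, step=0.1)
bits = entropy_bits_per_symbol(symbols)
```

With most symbols equal to zero after pruning, the entropy per symbol falls well below the 32 bits of a raw float, which is the effect an entropy coder exploits when it assigns short codewords to frequent symbols.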

Bio:  Berivan Isik is a PhD student in the Department of Electrical Engineering at Stanford University. She received her BS degree from the Electrical-Electronics Engineering Department at Middle East Technical University in 2019. Her research interests include machine learning, data compression, information theory and coding theory. Her current focus is on neural network compression, federated learning and learned data compression. She is the recipient of the Stanford Graduate Fellowship.

9:30 am - 10:00 am Sanjeev Arora, Princeton University
Compression in Theory of Deep Learning: Two Applications [video] [slides]

Abstract:  We describe two uses of compression arguments in understanding generalization in machine learning. The first is a simpler alternative to traditional PAC-Bayes bounds (also an information-theoretic method) that yields new insights into the overparametrization mystery of deep learning. The second is "Rip van Winkle's Razor", a new approach to Adaptive Data Analysis (Dwork et al. '15), a field devoted to understanding the phenomenon of fitting to the test set (e.g., millions of deep models being trained using the same publicly available training/test set, as happens for popular datasets in machine learning as well as some scientific fields).
Paper 1 is joint work with Rong Ge, Behnam Neyshabur and Yi Zhang. (ICML'18)
Paper 2 is joint work with Yi Zhang. (Manuscript '21)

10:00 am - 10:15 am Break
10:15 am - 10:45 am Emily Leproust, Twist Bioscience
DNA Storage for Digital Preservation [video]

Abstract:  Learn why DNA-based storage is no longer sci-fi and why it might be the holy grail of cold archival storage. We will go over the technology, its advantages over traditional storage media, and the potential for a sustainable solution for the explosive growth of data we are generating.

Bio:  As an early pioneer in the high-throughput synthesis and sequencing of DNA, Dr. Leproust is disrupting markets to enable the exponential growth of DNA-based applications including chemicals/materials, diagnostics, therapeutics, food and digital data storage. In 2020, BIO presented her with the Rosalind Franklin Award for Leadership. Foreign Policy named her one of their 100 Leading Global Thinkers and Fast Company named her one of the Most Creative People in Business. Prior to Twist Bioscience, she held escalating positions at Agilent Technologies where she architected the successful SureSelect product line that lowered the cost of sequencing and elucidated mechanisms responsible for dozens of Mendelian diseases. She also developed the Oligo Library Synthesis technology, where she initiated and led product and business development activities for the team. Dr. Leproust designed and developed multiple commercial synthesis platforms to streamline microarray manufacturing and fabrication. She serves on the Board of Directors of CM Life Sciences and is a co-founder of Petri, an accelerator for start-ups at the forefront of engineering and biology. Dr. Leproust has published over 30 peer-reviewed papers – many on applications of synthetic DNA, and is the author of numerous patents. She earned her Ph.D. in Organic Chemistry from University of Houston and her M.Sc. in Industrial Chemistry from the Lyon School of Industrial Chemistry.

10:45 am - 11:45 am Panel: DNA-based data storage [video]
Olgica Milenkovic, University of Illinois Urbana-Champaign

Bio:  Olgica Milenkovic is a professor of Electrical and Computer Engineering at the University of Illinois, Urbana-Champaign (UIUC), and Research Professor at the Coordinated Science Laboratory. She obtained her Master's degree in Mathematics in 2001 and PhD in Electrical Engineering in 2002, both from the University of Michigan, Ann Arbor. Prof. Milenkovic heads a group focused on addressing unique interdisciplinary research challenges spanning the areas of algorithm design and computing, bioinformatics, coding theory, machine learning and signal processing. Her scholarly contributions have been recognized by multiple awards, including the NSF Faculty Early Career Development (CAREER) Award, the DARPA Young Faculty Award, the Dean's Excellence in Research Award, and several best paper awards. In 2013, she was elected a UIUC Center for Advanced Study Associate and Willett Scholar, while in 2015 she was elected a Distinguished Lecturer of the Information Theory Society. In 2018 she became an IEEE Fellow. She has served as Associate Editor of the IEEE Transactions on Communications, the IEEE Transactions on Signal Processing, the IEEE Transactions on Information Theory and the IEEE Transactions on Molecular, Biological and Multi-Scale Communications. In 2009, she was the Guest Editor in Chief of a special issue of the IEEE Transactions on Information Theory on Molecular Biology and Neuroscience.

Henry Lee, Kern Systems

Bio:  Henry is Co-founder and CEO of Kern Systems, Inc., an early stage start-up from George Church's lab at Harvard Medical School to store information using biology. He is broadly interested in developing technologies that harness biology’s unique properties of self-replication, massive parallelization, and programmable atomic-level precision.

Daniel Chadash, Twist Bioscience

Bio:  Daniel co-founded Genome Compiler to democratize synthetic biology and enable scientists to focus on real science by giving them innovative software tools. Daniel was the VP Product of Genome Compiler before it was acquired by Twist Bioscience in 2016. At Twist, Daniel led the digital products and business systems and created a true end-to-end digital DNA e-commerce system to allow Twist to scale its business. Since late 2019, Daniel has been leading the commercial side of Twist's DNA Data Storage product, including the product aspects and business development. Daniel is one of the founders of the DNA Data Storage Alliance, which was launched in 2020 to accelerate the adoption and awareness of DNA as a storage medium.

Sergey Yekhanin, Microsoft

Bio:  Sergey Yekhanin received his Specialist Diploma from Moscow State University in 2002, and his Ph.D. from MIT in 2007. In 2007-2008 he was a Member of the School of Mathematics at the Institute for Advanced Study at Princeton. In 2008 Dr. Yekhanin joined Microsoft Research where he is currently a senior principal researcher in the MSR Redmond Algorithms group. Dr. Yekhanin's current research interests are in differential privacy, coding for DNA data storage, and coding for distributed data storage. Dr. Yekhanin is a recipient of the ACM Doctoral Dissertation Award (2007) and the IEEE Communications Society and Information Theory Society Joint Paper Award (2014). He has been an invited speaker at the 2014 International Congress of Mathematicians.

Shubham Chandak, Stanford University
11:45 am - 11:50 am Janani Balasubramanian, Stanford University
Artistic inquiry and DNA storage [video]

Abstract:  Artist Janani Balasubramanian has been in residence with the Stanford Compression Forum throughout 2021. Here, they will offer early thoughts on their ongoing artistic work on DNA storage, particularly as this emerging technology relates to questions of time, compression, and marvel.

Bio:  Janani Balasubramanian is an artist and researcher working at the intersections of contemporary art and science. Janani's work is rooted in years-long, invited collaborations with scientists, through which they discover how artistic inquiry can meet, expand, and provoke new thought in relation to a given scientific discipline.

Janani's work has been presented and/or commissioned by over 160 venues across North America and Europe, including The Public Theater, MOMA, Andy Warhol Museum, Red Bull Arts, Ace Hotel, Brooklyn Museum, High Line, and the Metropolitan Museum of Art. They have been an Innovator-in-Residence at Colorado College; Brooklyn College/Tow Foundation artist resident; artist-in-residence at the University of Colorado; Sundance Institute Fellow; MAP Fund grantee; a Pioneer Works Narrative Arts Fellow; a Jerome Hill Artist Fellow; a Critical Media Mellon Fund Grantee at Harvard University; and Van Lier Fellow at the Public Theater.

Janani is currently a Hemispheric Institute fellow at NYU; artist-in-residence in the brown dwarf astrophysics group at the American Museum of Natural History; 2021 visiting artist at Stanford University’s Institute for Diversity in the Arts; a resident artist with the Stanford Compression Forum; Pew Foundation grantee through the Academy of Natural Sciences; inaugural Collider fellow at Lincoln Center for the Performing Arts; and member of the Guild of Future Architects.

11:50 am - 12:00 pm Break
12:00 pm - 12:30 pm Markus Meister, California Institute of Technology
Neural processing in the retina: Efficient coding or selective computation? [video] [slides]

Abstract:  The task of the visual system is to extract from about 1 Gbit/s of raw image information the 20 bits/s that actually matter for cognition and behavior. The role of the retina in this process has traditionally been interpreted as data compression: an encoder that faithfully transmits the image to the brain through a narrow data channel of 1 million optic nerve fibers. Indeed this "efficient coding theory" accounts for a remarkable range of phenomena. But more recent insights don’t fit into this picture. For example, there seem to be ~30 types of retinal ganglion cell, each of which completely tiles the visual field. What does each of these channels encode? How is that accomplished by the neural circuitry between photoreceptors and ganglion cells? And what is the functional benefit of such an early split of the visual pathway? I will sketch sample answers to these questions.

Bio:  Dr. Meister studied physics at the Technische Universität in München, Germany, then at Caltech, where he received a Ph.D. for research on bacterial motion with Howard Berg. He was introduced to the beauty and mysteries of the retina during post-doctoral research with Denis Baylor at Stanford University. In 1991, Dr. Meister took a professorship at Harvard University, where he worked until his return to Caltech in 2012. Dr. Meister studies the function of large neuronal circuits, with a focus on the visual and olfactory sensory systems. Early in his career he pioneered the use of multi-electrode arrays for parallel recording from many of the retina’s output neurons. Together with new approaches to visual stimulation, this helped reveal how much visual processing is accomplished in the retina. His work extended to both smaller and larger scales of organization: on the one hand the circuit mechanisms of visual computations, on the other the role of retinal computation for visually guided behavior. In recent years, Meister’s group has been exploring population coding in the mammalian superior colliculus to understand the next stage of visual processing. Meister also serves on advisory boards of research organizations and foundations including the Allen Brain Institute, the Howard Hughes Medical Institute, the Max Planck Institute for Neurobiology, Cold Spring Harbor Laboratory, the Pew Scholars Program, the Helen Hay Whitney Foundation, and the McKnight Endowment Fund for Neuroscience.

12:30 pm - 12:45 pm Lisa Yamada, Stanford University
Seizure detection using human intracranial electrophysiology via compression

Abstract:  Of the 1% of the world population with epilepsy, one-third have refractory epilepsy, in which their only option to manage seizures is a high-risk surgery to remove seizure onset zones (SOZs, brain regions that are most likely to cause seizures). The current state of epilepsy treatment heavily relies on manual evaluation of EEGs by epileptologists and lacks interventions that leverage their rich information. To combat this limitation, we introduce an information theoretic estimate of joint entropy called the inverse compression ratio (ICR) as a potential quantitative EEG (qEEG) method. With our data repository of continuous, 10kHz intracranial EEGs acquired from clinical neuromonitoring studies of adult and pediatric participants, we study the relationship between ICR and seizure activity. When comparing ICR across time, we observed a sharp peak at seizure onset, followed by a dip before returning to baseline. Furthermore, when analyzing characteristics of ICR peaks that occurred at seizure onsets (e.g. peak amplitude) across intracranial channels, we observed prominent changes that distinguished channels located in SOZs. When using ICR to perform seizure detection, we found an average sensitivity/specificity (SE/SP) of 81%/98% across 5 participants with a total of 30 seizures. In comparison, the average SE/SP of sample entropy, approximate entropy, and variance were 29%/97%, 36%/97%, and 45%/96%, respectively. Our results demonstrate that our information theoretic measure of ICR performs comparably to - if not better than - other qEEG measures, suggesting their potential in seizure detection and localization. Previous studies on the entropy of epileptic EEGs may not have detected the brief spike in information content at seizure onset due to signal quality or limitations in sampling rate. 
Implementing robust qEEG techniques in clinical practice may offload labor-intensive tasks from clinicians and uncover EEG features that cannot be detected by eye, broadening our understanding of epilepsy and improving therapy.
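
To make the idea of an inverse compression ratio concrete, here is a minimal single-channel sketch. It is not the speaker's method: the abstract does not say which compressor, quantization, or window length was used, and it describes ICR as an estimate of joint entropy, which this simplified version does not attempt across channels. The choice of zlib, 8-bit quantization, and the names `inverse_compression_ratio` and `sliding_icr` are all assumptions made for illustration.

```python
import math
import random
import zlib


def inverse_compression_ratio(samples, lo, hi):
    """Compressed size divided by raw size for one window of samples.

    Samples are quantized to 8 bits over the fixed range [lo, hi]
    (assumed hi > lo) so that every window shares the same scale.
    """
    span = hi - lo
    raw = bytes(
        min(255, max(0, int((s - lo) / span * 255))) for s in samples
    )
    return len(zlib.compress(raw, 9)) / len(raw)


def sliding_icr(signal, window, step, lo, hi):
    """ICR for each window of the signal, moving forward by `step` samples."""
    return [
        inverse_compression_ratio(signal[i:i + window], lo, hi)
        for i in range(0, len(signal) - window + 1, step)
    ]


# Synthetic stand-in for a recording: a regular rhythm, an irregular
# burst, then the regular rhythm again. This is not EEG data.
rng = random.Random(0)
calm = [math.sin(2 * math.pi * (i % 50) / 50) for i in range(2000)]
burst = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
trace = sliding_icr(calm + burst + calm, window=2000, step=2000, lo=-1.0, hi=1.0)
```

On this synthetic input the middle window, which is hard to compress, has a much higher ICR than the two periodic windows on either side. This mirrors only the general shape of "a peak in ICR where the signal becomes less predictable", not the clinical results reported in the abstract.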

Bio:  Lisa Yamada is a PhD student working with Dr. Paul Nuyujukian in the Brain Interfacing Laboratory at Stanford University. She is interested in applying engineering tools to medical applications, and her research focuses on the analysis of human electrophysiology data (i.e. intracortical EEGs of participants with refractory epilepsy) for seizure detection, localization, and prediction. Yamada earned her BS in Electrical Engineering and Mathematics from Trinity College (Hartford, CT) and her MS in Electrical Engineering from Stanford University.

12:45 pm - 1:15 pm Rashmi Vinayak, Carnegie Mellon University
Learning-Based Coded-Computation [video] [slides]

Abstract:  Recent advances have shown the potential for coded computation to impart resilience against slowdowns and failures that occur in distributed computing systems. However, existing coded computation approaches are either unable to support non-linear computations, or can only support a limited subset of non-linear computations while requiring high resource overhead. In this work, we propose a learning-based coded computation framework to overcome the challenges of performing coded computation for general non-linear functions. We show that careful use of machine learning within the coded computation framework can extend the reach of coded computation to impart resilience to more general non-linear computations. We showcase the applicability of learning-based coded computation to neural network inference, a major workload in production services. Our evaluation results show that learning-based coded computation enables accurate reconstruction of unavailable results from widely deployed neural networks for a variety of inference tasks such as image classification, speech recognition, and object localization. We implement our proposed approach atop an open-source prediction serving system and show its promise in alleviating slowdowns that occur in neural network inference. These results indicate the potential for learning-based approaches to open new doors for the use of coded computation for broader, non-linear computations.
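
The limitation the abstract refers to can be seen in a small sketch of classical, hand-designed coded computation. It does not implement the learning-based framework from the talk. It assumes a simple addition-based parity over two inputs, and the function names and toy values are hypothetical. For a linear function the output on the parity input equals the sum of the individual outputs, so a missing result can be recovered by subtraction. For a non-linear function the same recipe gives the wrong answer.

```python
def linear_model(x, weights):
    """A linear function: one dot product per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def nonlinear_model(x):
    """A simple non-linear function: element-wise square."""
    return [v * v for v in x]


def encode(x1, x2):
    """Addition-based parity input, sent to an extra (redundant) worker."""
    return [a + b for a, b in zip(x1, x2)]


def recover_missing(parity_out, available_out):
    """Subtraction decoder: exact only when the computation is linear."""
    return [p - a for p, a in zip(parity_out, available_out)]


weights = [[1.0, 2.0], [0.5, -1.0]]
x1, x2 = [1.0, 2.0], [3.0, 4.0]
parity = encode(x1, x2)

# Linear case: pretend the worker computing linear_model(x2) is slow or failed.
recovered_linear = recover_missing(
    linear_model(parity, weights), linear_model(x1, weights)
)
true_linear = linear_model(x2, weights)

# Non-linear case: the same encoder and decoder no longer reconstruct the result.
recovered_nonlinear = recover_missing(nonlinear_model(parity), nonlinear_model(x1))
true_nonlinear = nonlinear_model(x2)
```

Here `recovered_linear` matches `true_linear`, while `recovered_nonlinear` differs from `true_nonlinear`. That gap is what the abstract's learning-based approach is described as addressing.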

Bio:  Rashmi K. Vinayak is an assistant professor in the Computer Science department at Carnegie Mellon University. Rashmi is a recipient of the NSF CAREER Award 2020-25, the Tata Institute of Fundamental Research Memorial Lecture Award 2020, the Facebook Distributed Systems Research Award 2019, the Google Faculty Research Award 2018, the Facebook Communications and Networking Research Award 2017, the Eli Jury Award 2016 from UC Berkeley EECS (for outstanding achievement in the area of systems, communications, control, or signal processing), and the IEEE Data Storage Best Paper and Best Student Paper Awards for 2011/2012. Her research interests broadly lie in computer/networked systems and information/coding theory, and the wide spectrum of intersection between the two areas. Her current focus is on addressing reliability, availability, scalability, and performance challenges in large-scale distributed systems. The key application thrusts include storage and caching systems, systems for machine learning, and live streaming communication.

Rashmi received her Ph.D. from UC Berkeley in 2016 where she worked on resource-efficient fault tolerance for big-data systems, and was a postdoctoral scholar at UC Berkeley's AMPLab/RISELab from 2016-17. During her Ph.D. studies, Rashmi was a recipient of Facebook Fellowship 2012-13, the Microsoft Research PhD Fellowship 2013-15, and the Google Anita Borg Memorial Scholarship 2015-16.

1:15 pm - 1:45 pm Gonzalo Navarro, University of Chile
Repetitiveness and Indexability [video]

Abstract:  Compressed indexes for highly repetitive text collections can reduce the data size by orders of magnitude while still supporting efficient searches. Compression of this kind of data requires dictionary-based methods, because statistical compression fails to capture repetitiveness. Unlike statistical compression, where the state of the art is mature and indexes reaching entropy size are already several years old, there is not even a clear concept of entropy for highly repetitive collections. There is a wealth of measures, some more ad-hoc and some more principled. Some relations are known between them, other relations are unknown. It is known that no compressor can reach some measures, it is known how to reach others, and for some it is unknown whether this is possible. From the reachable ones, some allow random access to the compressed text, for others it is unknown how to do it. Finally, some admit indexed searches, for others we do not know if this is possible. In this talk I will survey this zoo of measures, show their properties and known relations, show what is known and unknown about them, and point out several open questions that relate repetitiveness with indexability.
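
The contrast between statistical and dictionary-based views of repetitive text can be illustrated with a small sketch. The abstract does not name specific measures. The two used here, order-0 empirical entropy and the number of phrases in a greedy LZ77-style parse, are chosen only for illustration, and the quadratic-time parser is a naive implementation meant for short strings.

```python
import math
from collections import Counter


def order0_entropy(text):
    """Empirical order-0 entropy in bits per symbol."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def lz77_phrase_count(text):
    """Number of phrases in a greedy LZ77-style parse.

    Each phrase is the longest prefix of the remaining text that also
    occurs starting at an earlier position (overlap allowed), followed
    by one explicit character.
    """
    i, phrases, n = 0, 0, len(text)
    while i < n:
        length = 0
        # The end bound i + length forces the earlier occurrence to start before i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += length + 1
        phrases += 1
    return phrases


unit = "abracadabra"
repeated = unit * 50

entropy_unit, entropy_repeated = order0_entropy(unit), order0_entropy(repeated)
phrases_unit, phrases_repeated = lz77_phrase_count(unit), lz77_phrase_count(repeated)
```

Repeating the string fifty times leaves the per-symbol order-0 entropy unchanged, so a purely statistical estimate of the compressed size grows fifty-fold. The phrase count, by contrast, grows by at most one or two phrases, because almost the whole repeated text is covered by a single long copy.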

Bio:  Gonzalo Navarro completed his PhD in Computer Science in 1998 at the University of Chile, where he is currently full professor. His areas of interest include algorithms and data structures, compression, and text searching.
He has directed the Millennium Nucleus Center for Web Research, RIBIDI (an Ibero American project funded by CYTED), and a project funded by Yahoo! Research, apart from smaller projects. He has participated in various research projects, such as the Millennium Institute for Cell Dynamics and Biotechnology, an ECOS/CONICYT project (Chile-France cooperation), AMYRI (a CYTED project), and a Fondef project. He currently participates in the Center for Biotechnology and Bioengineering (CeBiB) and the Millennium Institute for Foundational Research on Data (IMFD).
He has been PC (co-)chair of several conferences: SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR 2005 Posters, IFIP TCS 2006, a track of ENC 2007, SISAP 2008, SISAP 2012, LATIN 2016, SPIRE 2018, and CPM 2018. He co-created SISAP in 2008, and was a Steering Committee member of SPIRE, LATIN, and SISAP. He is the Editor in Chief of the ACM Journal of Experimental Algorithmics and a member of the Editorial Board of Information Retrieval and Information Systems. He has been guest editor of special issues in ACM SIGSPATIAL, Journal of Discrete Algorithms, Information Systems, and Algorithmica. He has given around 50 invited talks at several universities and international conferences, including 12 plenary talks and 5 tutorials in international conferences. In 2005 he created the Workshop on Compression, Text, and Algorithms, which has become a permanent satellite of SPIRE. He is an ACM Distinguished Member.
He has coauthored two books published by Cambridge University Press, about 25 book chapters, 10 proceedings of international conferences (editor), more than 160 papers in international journals, and over 240 in international conferences. He is one of the most prolific and highly cited authors in Latin America.

1:45 pm - 2:00 pm Break
2:00 pm - 2:30 pm Ioannis Kontoyiannis, University of Cambridge
Compression with Different Types of Side Information [video] [slides]

Abstract:  We consider the problem of data compression when common side information is available to both the compressor and decompressor.
Two different versions of this problem are considered:
1. Reference-based compression, when a single side information string is used repeatedly to compress different source messages (as, e.g., in genomic compression); and
2. Pair-based compression, where a different side information string is used for each source message (as, e.g., in file synchronisation). We find, perhaps somewhat surprisingly, that in the practical, non-asymptotic regime, the best achievable compression performance in these two settings is fundamentally different.
This is joint work with Lampros Gavalakis.
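
As a practical illustration of compression with side information shared by both ends, the sketch below uses the preset-dictionary feature of Python's standard `zlib` module. It only illustrates the setting and says nothing about the non-asymptotic results of the talk. The DNA-like toy data and the helper names `compress_with_reference` and `decompress_with_reference` are assumptions made for this example.

```python
import random
import zlib


def compress_with_reference(message: bytes, reference: bytes) -> bytes:
    """Compress `message` using `reference` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(message) + comp.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Decompress a blob; the decoder must hold the same reference."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


def mutate(reference: bytes, positions) -> bytes:
    """Toy 'source message': the reference with a few symbols replaced."""
    msg = bytearray(reference)
    for p in positions:
        msg[p] = ord("N")
    return bytes(msg)


rng = random.Random(0)
reference = bytes(rng.choice(b"ACGT") for _ in range(2000))

# Reference-based setting: one side-information string, several messages.
messages = [mutate(reference, [100, 900]), mutate(reference, [5, 1500, 1999])]
with_side_info = [compress_with_reference(m, reference) for m in messages]
without_side_info = [zlib.compress(m, 9) for m in messages]
roundtrip = [decompress_with_reference(b, reference) for b in with_side_info]
```

In the pair-based setting the same two helpers would be called with a different `reference` for each message. In both cases the compressed output is only decodable by a receiver that holds the matching side information.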

Bio:  Ioannis Kontoyiannis was born in Athens, Greece, in 1972. He received the B.Sc. degree in mathematics in 1992 from Imperial College (University of London), and in 1993 he obtained a distinction in Part III of the Cambridge University Pure Mathematics Tripos. In 1997 he received the M.S. degree in statistics, and in 1998 the Ph.D. degree in electrical engineering, both from Stanford University. In 1995 he worked at IBM Research, on a NASA-IBM satellite image processing and compression project. From 1998 to 2001 he was with the Department of Statistics at Purdue University (and also, by courtesy, with the Department of Mathematics, and the School of Electrical and Computer Engineering). Between 2000 and 2005 he was with the Division of Applied Mathematics and with the Department of Computer Science at Brown University. Between 2005 and 2021 he was with the Department of Informatics of the Athens University of Economics and Business. Between 2018 and 2020 he was with the Department of Engineering of the University of Cambridge, where he held the Chair of Information and Communications, and he was Head of the Signal Processing and Communications Laboratory. In 2020 he joined the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge, where he is the Churchill Professor of Mathematics. He is a Fellow of Darwin College.

In 2002 he was awarded the Manning endowed assistant professorship; in 2004 he was awarded the Sloan Foundation Research Fellowship; in 2005 he was awarded an honorary Master of Arts Degree Ad Eundem by Brown University; in 2009 he was awarded a two-year Marie Curie Fellowship; in 2011 he was elevated to the grade of IEEE Fellow. He has published over 150 articles in leading international journals and conferences. He also holds two U.S. patents. He has served on the editorial board of the American Mathematical Society's Quarterly of Applied Mathematics journal, the IEEE Transactions on Information Theory, Springer-Verlag's Acta Applicandae Mathematicae, the book series Lecture Notes in Mathematics by Springer-Verlag, and the online journal Entropy. He has served as a chair or member of the program committee of numerous IEEE conferences, and he also served a short term as Editor-in-Chief of the IEEE Transactions on Information Theory.

His research interests include applied probability, information theory, statistics, data compression, and mathematical biology.

2:30 pm - 3:00 pm Yann Collet, Facebook
Data Compression usages in Data Centers [video]

Abstract:  Data compression is a technology with many fields of application. In data centers, it can be used both to save costs and to improve performance. But to do what exactly?
Let's wander for a while through the world of large Cloud infrastructures, and discover the many applications that churn non-stop on server racks, their usage of Data Compression, and their expectations with regard to algorithm characteristics and efficiency impact.

Bio:  Yann Collet is leading the Data Compression team at Facebook. He is known for the development and support of 'LZ4' and 'Zstandard' compression algorithms, widely deployed in data centers and beyond.

3:00 pm - 3:30 pm Jonathan Dotan, HBO; Stanford University
Restoring Trust in our Digital Age with Compression [video]

Abstract:  For 78 days, culminating in the 2021 Presidential Inauguration, teams at Stanford and USC’s Starling Lab worked together with Reuters to document the presidential transition with an array of new image authentication technologies and decentralized web protocols to head off the challenges of fake news and altered digital photos that mark our era of distrust in digital media. The prototype archive we created leveraged the new Adobe-led Content Authenticity Initiative (CAI) to store information about a photo's provenance directly in the photo itself using the new JPEG universal metadata box format. The CAI standard builds upon long-standing photo metadata projects, like EXIF, IPTC, and XMP, and marks a significant opportunity to reintroduce this information back into photos on social media and web platforms that have long stripped out metadata for security reasons or left them inaccessible to end-users. Various challenges abound as the CAI photos are now burdened with being repositories of image pixels, metadata, and also secure chains of custody. Compression will play a key role in making this standard viable and effective. In doing so, compression will yield a complex image of trust. The technology carries the promise to establish and restore trust but also creates new risks of surveillance and new vulnerabilities for journalists. Yet, the call to understand how it could contribute to improving the project of news gathering is compelling, and vital.

Bio:  Jonathan Dotan is a Fellow at the Stanford Compression Forum and Stanford’s Center for Blockchain Research. He researches and lectures on applied strategy and policy for the decentralized web. For the last two years, he has led the creation of the Starling Framework for Data Integrity, a comprehensive set of tools and principles that empowers organizations to securely capture, store and verify human history. He brings to Stanford over 20 years of experience navigating the intersections of media, tech, and policy. Jonathan recently wrapped six seasons writing and producing HBO’s Emmy Award-winning series, SILICON VALLEY. He received a BA in Information Policy from UCLA, and an MPhil in International Relations from Oxford University, St. Antony’s College.

3:30 pm - 3:45 pm Andrey Norkin, Netflix
Current video compression challenges [video]

Abstract:  Digital video compression has been deployed on a large scale for more than two decades. A number of video coding standards have been developed, each improving compression over the previous generation. Compression is not becoming less important, though, since the share of video in internet traffic is steadily increasing. The talk will cover the challenges of compressing video today and highlight some of the ongoing efforts in the video compression field. It will also mention some problems that are faced when encoding video content at Netflix.

Bio:  Andrey Norkin is a Senior Research Scientist at Netflix, USA, working on new video compression algorithms, encoding techniques for OTT video streaming, and High Dynamic Range (HDR) video. Previously, he was with Ericsson, Sweden and UK, conducting research on video compression and 3D video. He participated in ITU-T and MPEG efforts on developing video compression standards, including HEVC, its extensions, and VVC. He has been actively contributing to the Alliance for Open Media (AOM) development of the AV1 video codec and is currently a co-chair of the AOM Codec Working Group.
He received the M.Sc. degree in computer engineering from Ural State Technical University in 2001 and the Ph.D. in signal processing from Tampere University of Technology in 2007.

3:45 pm - 4:00 pm Tsachy Weissman, Stanford University
Closing Remarks [video]
4:00 pm - 5:00 pm Social hangout


2021 workshop organizers