Our Hack :)

The Hack’s Repo

The Dev Post

It’s been a while! I’m back to update you on a few things that have happened. This year’s been crazy with COVID and the transition to college. Georgia Tech is going almost as well as I imagined: it took a few months this fall semester to adjust, but now I’m feeling better than ever.

About a month ago, I entered HackGT, the annual hackathon hosted by Georgia Tech. I was joined by Morris Wan, Ethan Chee, and Kurt Hu. Our team name was “Two Over One,” which derives from the many, many games of Bridge we played almost every day at the lunch and dinner table during our first few months. Morris and I have been partners-in-crime since high school, but we met Ethan and Kurt living in the same building, and we’ve become close.

HackGT began on October 9th, a Friday afternoon.

This project was a wild ride. Coming into the hackathon with little sense of what we as a group wanted to create, it felt like an impossible task to get any meaningful work done within the roughly 36 hours. The first three hours were spent hard brainstorming, splashing our mini-whiteboard with as many ideas as we could think of: a simulation of living without glasses, a colorblindness simulator, a calorie-tracking fitness app, a YouTube addiction-curber, a Bridge AI. There were so many problems we wanted to solve, but nothing stood out for all four of us. Right before dinner, I thought about the incredibly cool animations the old Windows Media Player had. You know, the swirly, trippy, colorful images that played in sync with your song? I thought those would be amazing to recreate. Kurt, though, had an even better idea: representing an audio file as a single image. Not everyone was on board at first; we half treated it as a joke concept. Like, how hard could it be to convert bits to bits? It’d be an easy weekend.

Challenges started to arise deep into Friday night, though. The more we thought about it, the clearer the conflict became: did we want the image to be a one-to-one conversion or an artistic representation? On one hand, it would be extremely cool and useful to have an image completely represent a sound; it would be great for encoding information in the physical world, acting as a “QR code” for sound. On the other hand, the idea of looking at a computer-generated picture and immediately understanding a song’s tone and vibe was extremely satisfying too. We went to sleep, nervous but excited, with an idea that we knew had a lot of potential.

Saturday morning, we drew up the semblance of a plan: convert the song’s raw bit data into the pixel fields of an image, then multiply that newly created image by invertible transformation matrices generated from analyses of the audio’s unique characteristics (frequency, amplitude, key, length, etc.). We divided the work by section: Ethan primarily worked on the audio-to-image conversion algorithm, Morris worked on developing our dynamic website, I worked on the transformations, and Kurt pitched in heavily on all three. In all three areas, we hit major obstacles we hadn’t foreseen.
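
To make that plan concrete, here’s a minimal sketch of what the transformation step could have looked like, assuming the audio analysis boils down to numbers like average frequency and amplitude that can seed a matrix. The seeding scheme and matrix choice here are purely illustrative, not what we actually shipped (as you’ll see below, this step got cut):

```python
import numpy as np

def transform_image(pixels, freq, amp):
    """pixels: (H, W) float array of pixel data.
    Returns the transformed image plus the inverse matrix
    needed to recover the original."""
    h = pixels.shape[0]
    # Seed a generator from the audio's traits so the same song
    # always produces the same transformation.
    rng = np.random.default_rng(seed=int(freq * 1000 + amp))
    # QR decomposition of a random matrix yields an orthogonal matrix:
    # guaranteed invertible, and its inverse is just its transpose.
    m, _ = np.linalg.qr(rng.standard_normal((h, h)))
    return m @ pixels, m.T  # m.T @ (m @ pixels) recovers pixels
```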

Ethan developed two algorithms: one that converts a .wav file to an image and one that converts an image back into a .wav file. For the .wav-to-image conversion, we assumed the .wav would be a 16-bit file, as that is the most common format. We needed to map each 16-bit sample onto a 32-bit RGBA pixel, so we did the following. We broke up bits 2–16 (the fifteen magnitude bits) and assigned them to bits in the pixel’s RGB values, distributing them so that the most significant bits of the input became the most significant bits of the colors, then padded the rest of each RGB value out to 8 bits. The first bit (the sign bit) determined the pixel’s alpha value: 255 if the sample was positive, 128 if negative. The exact reverse process turns an image back into a .wav file, so running a file through one algorithm and feeding the output to the other reproduces the original file.
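
In code, that packing looks roughly like this. It’s a sketch of the scheme just described rather than our exact source (that’s in the repo linked above), and the even 5/5/5 split of the fifteen magnitude bits across the channels is my assumption here:

```python
def sample_to_pixel(sample):
    """Map one signed 16-bit sample to one RGBA pixel."""
    sign = sample < 0                 # bit 1: the sign bit
    mag = abs(int(sample)) & 0x7FFF   # bits 2-16: 15 magnitude bits
    r = (mag >> 10) & 0x1F            # top 5 bits
    g = (mag >> 5) & 0x1F             # middle 5 bits
    b = mag & 0x1F                    # bottom 5 bits
    # Shift each 5-bit chunk into the high end of its 8-bit channel,
    # padding the low bits with zeros, so the most significant input
    # bits stay the most significant bits of each color.
    a = 128 if sign else 255          # sign bit drives the alpha
    return (r << 3, g << 3, b << 3, a)

def pixel_to_sample(pixel):
    """The exact reverse: RGBA pixel back to a signed 16-bit sample."""
    r, g, b, a = pixel
    mag = ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)
    return -mag if a == 128 else mag

# Round trip reproduces the original sample.
assert pixel_to_sample(sample_to_pixel(12345)) == 12345
assert pixel_to_sample(sample_to_pixel(-321)) == -321
```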

I looked at the characteristics of music and how to extract that information from an audio file. I was able to find the average frequency rather easily, as well as the average amplitude. The one obstacle I faced was finding the key. I innocently thought it would be easy to convert the song to a MIDI file and have an algorithm analyze the key. It was not. After 5 hours of searching, I came to the sinking conclusion that the conversion could only run on Python 2.7, while the rest of the project was being written for Python 3.8. Juggling versions with virtualenv gave me hope, but we simply didn’t have enough time to implement it. Saddened but not discouraged, I still wanted to overlay two sine waves matching the sound file’s average frequency and amplitude, but because so much group time went into the website and the conversion algorithm, those features ultimately had to be cut. Alas, a project made in less than two days won’t always go according to plan.
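
For the curious, the frequency and amplitude extraction that did make it in was along these lines. This is a rough sketch assuming a 16-bit mono .wav; the RMS-amplitude and spectral-centroid choices are one reasonable reading of “average amplitude” and “average frequency,” not necessarily line-for-line what I wrote that weekend:

```python
import wave
import numpy as np

def analyze(path):
    """Return (average amplitude, average frequency) for a 16-bit mono .wav."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    # RMS as a stand-in for "average amplitude."
    amplitude = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    # Spectral centroid: the magnitude-weighted mean frequency.
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    avg_freq = np.sum(freqs * spectrum) / np.sum(spectrum)
    return amplitude, avg_freq
```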

Morris and Kurt spent Saturday afternoon figuring out exactly what kind of web stack we wanted to use. As a group, we had little experience with web development, let alone dynamic websites. They toyed with the idea of Firebase on Google Cloud, but the ultimate solution was Flask. With a Frankenstein of Google searches and snippets of a previous Flask project Ethan had worked on, at midnight we achieved perfection: a website that could legitimately take a file upload and not die. We were so excited that I think Morris’s scream at 1 AM woke up the whole dorm.
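
If you’re wondering what “take a file upload and not die” amounts to, here’s a bare-bones Flask sketch in the same spirit; the route, upload folder, and inline form are placeholders, not our actual site:

```python
import os
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "uploads"

@app.route("/", methods=["GET", "POST"])
def upload():
    if request.method == "POST":
        file = request.files.get("file")
        if file and file.filename.endswith(".wav"):
            os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
            file.save(os.path.join(app.config["UPLOAD_FOLDER"],
                                   secure_filename(file.filename)))
            return "Got it! Hand the file to the conversion pipeline here."
    # Bare-bones upload form so the page does something on GET.
    return ('<form method="post" enctype="multipart/form-data">'
            '<input type="file" name="file">'
            '<input type="submit" value="Upload"></form>')
```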
