My Rubik's Cube is a Game Controller: The Story of Snail Mail #
All good projects start with a bad idea. My first one was a DJ deck controlled by a Rubik's Cube. That was a non-starter, though: reading a cube in real time is tricky ... and I'm a terrible musician (no matter how much I played around with Strudel). So, after that glorious failure, I pivoted.
The result is "Snail Mail," my entry for the Google Cloud Run Hackathon. It's a game where you play a postal snail (called Steve) saving up for a questionable vacation to Salt Lake City. The fun mechanic is that the world is your puzzle, and your controller is a real-life Rubik's Cube.
The Tech Stack: An Excuse to Try New Toys #
For the frontend, I grabbed Phaser.js. I’d been wanting to try a real game engine for years, and a hackathon is the perfect excuse to do something slightly ill-advised. It handles all the sprites, physics, and gameplay logic.
The backend is a Flask API. Why? Because it's Python, the language of AI, and frankly, I knew I could get a server running before my coffee got cold. This API would be the brains of the operation, tasked with the grand challenge of understanding my cube.
The Absurdity of Computer Vision #
The core problem was simple: how do I get a computer to reliably read the colors on a 3x3 grid in my home office? My workspace lighting is aggressively yellow, meaning my webcam's white balance has a mind of its own.
My solution is a two-stage system running on a Google Cloud Run service:
- The Fast Intern (OpenCV): When the player takes a picture of their cube, it first goes to an OpenCV script. It’s great for a first pass and gets it right most of the time.
- The Senior Engineer (Gemma 12B): If OpenCV throws its hands up in confusion (which, at night, it often does), the image gets escalated to a much beefier, self-hosted Gemma 12B vision model. It’s slower, but it can tell the difference between yellow and "sad beige-in-bad-lighting."
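Here's a rough sketch of that escalation logic, not the actual Snail Mail code. It assumes the nine tiles have already been sampled down to one average RGB value each, and it uses plain Python as a stand-in for the OpenCV stage. The reference colors, the `min_margin` threshold, and the `fallback` callable (standing in for the call to the Gemma vision service) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical reference colors for the six sticker colors.
REFERENCE: Dict[str, RGB] = {
    "white": (255, 255, 255),
    "yellow": (255, 213, 0),
    "red": (183, 18, 52),
    "orange": (255, 88, 0),
    "green": (0, 155, 72),
    "blue": (0, 70, 173),
}


def _distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def classify_cell(rgb: RGB) -> Tuple[str, float]:
    """Return (label, margin) for one tile.

    The margin is the gap between the best and second-best match;
    a small margin means the tile sits between two colors.
    """
    ranked = sorted((_distance(rgb, ref), name) for name, ref in REFERENCE.items())
    (best_d, best_name), (second_d, _) = ranked[0], ranked[1]
    return best_name, second_d - best_d


def read_face(
    cells: Sequence[RGB],
    fallback: Callable[[Sequence[RGB]], List[str]],
    min_margin: float = 40.0,
) -> Tuple[List[str], str]:
    """Use the fast classifier when every tile is clear-cut, else escalate."""
    results = [classify_cell(cell) for cell in cells]
    if all(margin >= min_margin for _, margin in results):
        return [label for label, _ in results], "fast"
    # At least one tile was ambiguous: hand the whole face to the slower model.
    return fallback(cells), "fallback"
```

The point of the margin check is that the fast path only answers when it is confident about every tile. A single "sad beige" tile is enough to send the whole face to the bigger model.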
Before I landed on this, I tried doing cube detection directly in the browser with TensorFlow.js. It was chaos. My code spent too many CPU cycles trying to figure out the color of the "tiles" on my coffee mug / frisbee / keyboard. I ended up keeping TensorFlow.js, but only for object detection, and leaving the trickier stuff for the server to figure out.
Giving Enemies an Attitude Problem #
What’s a game without enemies that can throw a little shade? Another, much smaller, Cloud Run service hosts a tiny Gemma 1B model. Its sole purpose in life is to generate sassy trash talk for the enemies (ducks/toads/snakes) you encounter. The combat is quick, so the taunts needed to be delivered rapidly.
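As a sketch of how a taunt request could stay snappy, the snippet below builds a short prompt, caps the reply length, and falls back to a canned line if the model call fails. Everything here is an assumption for illustration: the prompt wording, the canned taunts, and the `generate` callable (which stands in for the HTTP call to the Gemma 1B service) are hypothetical, and the fallback is one possible design rather than something the game is known to do.

```python
import random
from typing import Callable, Dict, List

# Hypothetical canned lines used when the model is unavailable.
CANNED_TAUNTS: Dict[str, List[str]] = {
    "duck": ["Quack off, slowpoke."],
    "toad": ["You call that a shell?"],
    "snake": ["Sssspecial delivery? Not today."],
}
DEFAULT_TAUNTS: List[str] = ["Return to sender, snail."]


def build_taunt_prompt(enemy: str, max_words: int = 12) -> str:
    """Build a short, constrained prompt so the small model answers quickly."""
    return (
        f"You are a sassy {enemy} facing a postal snail called Steve. "
        f"Reply with one taunt of at most {max_words} words."
    )


def get_taunt(enemy: str, generate: Callable[[str], str], max_words: int = 12) -> str:
    """Ask the model for a taunt; fall back to a canned one on any failure."""
    try:
        text = generate(build_taunt_prompt(enemy, max_words)).strip()
    except Exception:
        text = ""
    words = text.split()
    if not words:
        return random.choice(CANNED_TAUNTS.get(enemy, DEFAULT_TAUNTS))
    # Trim overlong replies so the taunt still fits a quick combat exchange.
    return " ".join(words[:max_words])
```

In practice `generate` would wrap the request to the model service with a short timeout, so a slow reply turns into a canned taunt instead of a stalled fight.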
Life on Cloud Run #
Thankfully, deploying all this on Cloud Run was the easy bit. The auto-scaling is magic: it just works... though I've only got one occasional player at the moment (me), so that's not a problem (yet!!). Google Secret Manager was also a lifesaver, letting me keep my API keys out of a public git repo without any drama.
The one part that frustrated me was getting the services to talk to each other securely. I didn't want my LLM endpoints open to the entire internet. After a lot of searching, the google-auth library came to the rescue, allowing me to create an authenticated handshake between the main API and the model services. I can't quite describe how satisfying it was to get that working.
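For anyone fighting the same battle, here is a minimal sketch of one common way to do that handshake with google-auth: fetch an identity token whose audience is the receiving service's URL, then send it as a bearer token. It is not necessarily the exact code in Snail Mail. The service URL, the `/generate` path, and the payload shape are hypothetical, and the sketch assumes the receiving Cloud Run service requires authentication and that the caller's service account is allowed to invoke it.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_identity_token(audience: str) -> str:
    """Fetch an ID token for the given audience (the target service's URL).

    Imported lazily so the rest of the module loads without google-auth.
    On Cloud Run the token comes from the metadata server.
    """
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_request, audience)


def build_request(
    service_url: str, path: str, payload: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Build an authenticated JSON POST for a private service."""
    return urllib.request.Request(
        service_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_private_service(
    service_url: str, path: str, payload: Dict[str, Any], timeout: float = 30.0
) -> Any:
    """Call another service with an identity token and return its JSON reply."""
    token = fetch_identity_token(service_url)
    request = build_request(service_url, path, payload, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The main API would then call something like `call_private_service("https://taunts.example.run.app", "/generate", {"enemy": "duck"})`, where that URL is a placeholder for the model service's real address.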
So, Does it Actually Work? #
Of course! Well, at least as well as it can for a hackathon project! The client-side object detection is still a bit sketchy, and I'm eyeing up moondream as an alternative to Gemma (shh, don't tell Google!) for the vision task. But I think the game completely changes how you interact with a Rubik's Cube. You stop thinking "how do I solve this?" and start thinking, "Okay, I need a 3x3 pattern of fire, water, and regret to get past this duck. How do I twist this plastic nightmare to make that happen?"
It’s wonderfully non-trivial and, I hope, a fun new way to look at a classic puzzle. Now, if you'll excuse me, I have some design crimes to undo and a mobile version to dream about. Also... did someone say multiplayer??