Every few months, we set aside a day to get playful with technology. Our Hack Days are about experimenting, exploring, and sometimes outright breaking things to see what happens. This time, the spotlight was on Agent-to-Agent (A2A) protocols, framed around a deceptively simple challenge: a “20 Questions” game.
The idea was straightforward. Each of us built an agent that could connect to a shared game server and try to guess the secret word in 20 yes-or-no questions. But the fun wasn’t just in playing, it was in seeing how far we could push the system. What tricks could we use to cheat? How quickly could we close the loopholes once they were discovered?
Some of us went for direct tool abuse, discovering ways to exploit exposed tools, instantly forcing a win or revealing the secret word. Others leaned into social engineering, tricking the agent into giving up answers by pretending to be testers. A few went hunting in the code itself, pulling word lists straight from the source.
And then there were the “unintended features”: agents that didn’t crash when things broke but instead hallucinated their way forward, making guesses as though nothing was wrong.
It became clear that the system, fun as it was, had plenty of weak spots.
Every time we found a weakness, we patched it. When tools exposed too much, we hid sensitive functions behind another agent. When social engineering worked, we split roles so that one agent handled the game while another acted as a hidden checker. When memory leaked between clients, we experimented with isolated sessions to prevent crosstalk. Lastly, when hallucinations crept in, we tightened server-side validation so the rules couldn’t be bent.
By the end of the day, the server was far sturdier than the version we started with. This was proof that we weren’t just poking holes for fun but actively learning how to plug them.
As much fun as it was to poke holes in a trivia game, these experiments revealed truths that apply directly to the work we do with clients.
Here are the key lessons we learnt:
Together, these changes add up to more secure, more reliable agent-powered systems, exactly the kind of resilience our clients need.
Not every experiment was about breaking the server. Some participants also explored how to make their agents play better, using clever strategies like binary search and grouped questioning to cut down the number of guesses. While these techniques wouldn’t fly in a production system (they’re essentially “cheats”), they showed how design and logic can dramatically improve performance.
It was a playful reminder that efficiency isn’t just about infrastructure, sometimes it comes down to how you frame the problem.
This Hack Day showed us both sides of working with agents: how fragile they can be when left unguarded, and how much stronger they become once you fix the cracks. The real value wasn’t just in breaking the system, it was in building it back better.
That’s what makes Hack Days so powerful. They give us a safe space to innovate, pushing limits, exposing weaknesses, and engaging in an iterative break-and-build process that strengthens our client systems. Proving that when you know how something breaks, you know how to make it stronger.
Curious how lessons from experimenting with AI agents can strengthen your own agent workflows? Explore our Agentic Ops Factory to see how we turn these insights into resilient, optimized AI operations with built-in guardrails. Reach out to our team to discuss how we can help you create safe and effective AI agents.