Hack Day: From Building AI to Building With AI

Jared Brook

Over the past twelve months, our Hack Days have had a recurring theme: AI. We've built predictive models, experimented with agent-to-agent protocols, and wired up enterprise search pipelines with Amazon Kendra and Bedrock. In each case, AI was the thing we were building. This time, we flipped the perspective. Instead of building AI solutions, we wanted to explore what it looks like to build with AI - using the latest generation of agentic tools to accelerate the kind of work we do every day.

The timing felt right. The tooling landscape has shifted significantly in a short window. Google shipped Antigravity, their agent-first Integrated Development Environment (IDE) that can plan, write, and test code across editor, terminal, and browser in a single loop. Cursor has matured alongside increasingly capable models like Claude Sonnet 4.6. Claude Code has emerged as a compelling terminal-native workflow. These tools aren't just autocomplete anymore - they're opinionated about how software gets built. We wanted to push these capabilities to their limits and see how far the tools can take us today.

Putting Agentic IDEs to Work

Several teams gravitated toward Google Antigravity, which had just entered public preview. The draw was its browser automation - Antigravity doesn't just write code, it launches the result in a browser, interacts with the page, and validates its own output visually.

One team used this to automate form completion, pointing Antigravity at a Google Forms quiz and watching it read the questions, reason about the answers, and fill in the form. It worked, and it was a surprisingly effective way to explore how well the agent understood context from a rendered page versus raw markup.

Another team took on frontend development, building a UI component and using Antigravity's browser testing loop to iterate - write the component, render it, screenshot the result, adjust.

A third team went further and attempted a full-stack web application from a natural language prompt. The scaffolding came together fast, though the team found that precision dropped as the codebase grew and the agent had more context to juggle.

On the Cursor side, a team explored automated documentation generation. The experiment paired Cursor running Claude Sonnet 4.6 with Scalar, an API documentation tool, to generate OpenAPI specs and human-readable docs directly from an existing codebase. The results were solid - the model was able to trace through route handlers, extract parameter types, and produce spec-compliant output with minimal correction. The manual part was reviewing what it generated, not writing it.

Another team dug into Claude Code tips and tricks, drawing from a recent walkthrough by Anthropic. The focus was on understanding how to structure prompts, manage context windows, and chain tasks effectively in a terminal-first workflow. The practical takeaway was that small adjustments to how you frame a task - being explicit about file scope, providing examples of desired output - have an outsized impact on the quality of what you get back.

Automating Internal Workflows

Not every experiment was about coding faster. A couple of teams turned their attention to internal tooling.

One built Cardify, an extension to base2bot - our internal Slack bot - that uses Amazon Bedrock to automatically generate Trello cards from Slack threads. The idea is straightforward: a support conversation happens in Slack, someone reacts with a trigger emoji, and Cardify reads the thread, extracts the relevant details, and creates a structured Trello card with a title, description, and labels. It removes the context-switching of copying information between tools and reduces the chance of details getting lost in translation.
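The core of a flow like this is turning a Slack thread into a prompt and parsing the model's reply into a Trello-ready payload. Here's a minimal sketch of that step - the function names, prompt wording, and card fields are illustrative assumptions, not Cardify's actual implementation; the Bedrock call itself is shown only as a commented outline:

```python
import json

# Illustrative prompt asking the model to reply with structured JSON.
CARD_PROMPT = (
    "Summarise this Slack support thread as JSON with keys "
    "'title', 'description', and 'labels' (a list of strings).\n\n{thread}"
)

def thread_to_prompt(messages):
    """Flatten a Slack thread (a list of {'user', 'text'} dicts) into the prompt."""
    thread = "\n".join(f"{m['user']}: {m['text']}" for m in messages)
    return CARD_PROMPT.format(thread=thread)

def parse_card(model_output):
    """Parse the model's JSON reply into a Trello card payload."""
    card = json.loads(model_output)
    return {
        "name": card["title"],
        "desc": card["description"],
        "labels": card.get("labels", []),
    }

# In the real flow the prompt would go to Amazon Bedrock, roughly:
#   import boto3
#   bedrock = boto3.client("bedrock-runtime")
#   resp = bedrock.converse(
#       modelId="...",  # whichever Claude model is enabled in your account
#       messages=[{"role": "user", "content": [{"text": prompt}]}],
#   )
# and the resulting payload posted to the Trello cards API.
```

Keeping the prompt construction and response parsing as pure functions makes the interesting parts easy to unit-test without mocking Bedrock or Trello.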

Another team tackled alert routing. We currently push health dashboard events through Amazon EventBridge into OpsGenie, but the filtering options at that boundary are broad - you can match on event type, but not much else. The team built an intermediate Lambda that sits between EventBridge and OpsGenie, adding fine-grained filtering logic. This means we can route alerts based on specific resource tags, severity combinations, or time-of-day rules without modifying the upstream event structure. It's a small architectural change that cuts through the noise and lets our support team focus on the alerts that actually matter.
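The shape of that intermediate Lambda is simple: evaluate each incoming event against a set of routing rules, and only forward the ones that match. A minimal sketch is below - the rule format, dotted-key matching, and field names are assumptions for illustration, and the actual OpsGenie call is elided:

```python
# Illustrative routing rules: forward an event if ANY rule matches,
# where a rule matches when ALL of its "match" clauses hold.
ROUTING_RULES = [
    {"match": {"severity": "critical"}},
    {"match": {"severity": "warning", "tags.team": "payments"}},
]

def _lookup(event, dotted_key):
    """Resolve a dotted key like 'tags.team' against a nested dict."""
    value = event
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def should_forward(event, rules=ROUTING_RULES):
    """True if any rule's match clauses all hold for this event."""
    return any(
        all(_lookup(event, key) == expected for key, expected in rule["match"].items())
        for rule in rules
    )

def handler(event, context):
    """Lambda entry point: EventBridge puts the payload under 'detail'."""
    detail = event.get("detail", {})
    if should_forward(detail):
        # In the real Lambda this is where the event would be posted to the
        # OpsGenie Alert API; here we just report the routing decision.
        return {"forwarded": True}
    return {"forwarded": False}
```

Because the filtering logic lives in the Lambda rather than in the EventBridge rule, adding a new condition - a resource tag, a time-of-day window - is a code change to the rule list, not a change to the upstream event structure.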

What We Took Away

The consistent thread across every experiment was that AI-assisted development is no longer a novelty - it's becoming a practical default for certain categories of work. Documentation, boilerplate, form-based automation, and Slack-to-tool integrations are all areas where the current generation of tools produces usable output with relatively little oversight.

That said, the day also reinforced where the limits are. Agentic IDEs work best on bounded, well-defined tasks. The teams that had the most success were the ones that broke their problem into small, clear steps and gave the agent one thing to focus on at a time. When the scope expanded - a full-stack app, a complex multi-service integration - the agent's output required more intervention, and the productivity gain narrowed.

The biggest shift from our previous Hack Days was cultural. A year ago, the conversation was "What can we build with AI for our customers?" Now it's "How should we be using AI in how we build?" The tools have crossed a threshold where ignoring them means leaving real productivity on the table, but using them well still requires deliberate practice and an understanding of their constraints.

Curious about how AI-assisted development could fit into your team's workflow? Get in touch - we're happy to share what we've learned.
