
The Real Pain of Building Software with AI in the Cloud

Aaron Walker


There’s a growing narrative that AI has “solved” software development. Faster coding, instant scaffolding, fewer engineers needed. But if you look closely at teams actually pushing AI into real-world cloud environments, especially on AWS, a very different picture emerges.

The bottleneck isn’t generating code. It’s everything around it.

The Illusion: AI Makes Cloud Development Easy

On the surface, AI tools feel magical. They generate Lambda functions in seconds, scaffold infrastructure-as-code, and explain unfamiliar AWS services instantly. This creates a compelling illusion: that building in the cloud is now primarily a code generation problem.

It isn’t.

Because real cloud engineering has never just been about writing code. It is about systems, safety, and operating complex distributed environments at scale.

The Reality: AI Breaks Down at the Edges of Real AWS Systems

When AI meets real AWS environments, friction appears quickly.

Lack of Context

AI tools typically operate without access to the live state of your AWS environment. They do not know which feature flags are enabled, what is happening inside your VPC, how your IAM policies are structured, or what your production traffic actually looks like.

As a result, even technically correct outputs can be operationally useless or, worse, misleading. A generated Terraform module or CDK construct might look valid while completely ignoring the constraints of your organisation’s networking, security boundaries, or deployment patterns.
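One mitigation is to feed a snapshot of live environment state into every generation request, so constraints are explicit rather than assumed. A minimal sketch in Python (the field names are hypothetical; in practice the snapshot would be assembled from AWS describe APIs rather than hard-coded):

```python
def build_grounded_prompt(task: str, env: dict) -> str:
    """Prepend known environment constraints to a code-generation request,
    so the model cannot silently ignore them."""
    constraints = [
        f"- VPC CIDR: {env['vpc_cidr']}",
        f"- Allowed regions: {', '.join(env['allowed_regions'])}",
        f"- Enabled feature flags: {', '.join(env['feature_flags']) or 'none'}",
    ]
    return (
        "You must respect these environment constraints:\n"
        + "\n".join(constraints)
        + f"\n\nTask: {task}"
    )

# Example values: these would normally be fetched live, not hard-coded.
snapshot = {
    "vpc_cidr": "10.20.0.0/16",
    "allowed_regions": ["eu-west-1"],
    "feature_flags": ["new-billing"],
}
prompt = build_grounded_prompt(
    "Generate a Terraform module for an internal ALB", snapshot
)
```

The point is not the prompt format, which will vary by tool, but that environment facts travel with the request instead of living only in an engineer's head.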

Trust and Safety in Production Environments

In AWS, everything is an API call away from impact. Giving an AI agent access to your environment raises immediate concerns. What happens if it modifies the wrong resource? What if it accesses sensitive data in S3 or queries the wrong DynamoDB table? Who is accountable for the action?

Without strong guardrails, AI is either too dangerous to use or so restricted that it becomes ineffective. The challenge is not enabling capability, but constraining it safely.
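Guardrails can start very simply: a broker that checks every action an agent proposes against an explicit read-only allowlist before anything is executed. A sketch of that idea (the allowlist contents are illustrative, not a recommended policy):

```python
# Every API call an agent proposes is checked against an explicit allowlist
# before execution. Anything mutating or unlisted is rejected rather than
# silently attempted.
READ_ONLY_ALLOWLIST = {
    "s3:GetObject",
    "s3:ListBucket",
    "dynamodb:GetItem",
    "cloudwatch:GetMetricData",
}

def check_action(action: str) -> bool:
    """Return True only if the proposed action is explicitly allowed."""
    return action in READ_ONLY_ALLOWLIST
```

The same constraint should also exist at the IAM level on the agent's role; an in-process check is a usability layer, not the security boundary itself.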

Misalignment with Cloud Workflows

Cloud engineering is deeply tied to workflows such as CI/CD pipelines, infrastructure provisioning, change management, and incident response. Most AI tools are designed to complete isolated tasks, not to operate within these workflows. They can generate a CloudFormation template, but they do not enforce your pull request standards, validate deployment intent, or integrate cleanly into your release process.

This creates a gap between AI-generated output and production-ready changes.
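Closing that gap usually means treating AI output like any other contribution: it has to pass the same automated gates as human-written changes. A hedged sketch of one such gate, a required-tags check a CI step might run over every resource in a generated template (the tag names are a hypothetical org standard):

```python
REQUIRED_TAGS = {"Owner", "CostCentre", "Environment"}  # hypothetical standard

def missing_tags(resource: dict) -> set:
    """Return the required tags absent from a resource definition.
    A CI step would run this over every resource in an AI-generated
    template and fail the pull request if any set is non-empty."""
    tags = {t["Key"] for t in resource.get("Properties", {}).get("Tags", [])}
    return REQUIRED_TAGS - tags

# An AI-generated bucket definition, missing two mandatory tags.
bucket = {
    "Type": "AWS::S3::Bucket",
    "Properties": {"Tags": [{"Key": "Owner", "Value": "payments"}]},
}
```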

No Observability into AI Usage

In AWS, we obsess over observability. We instrument systems with CloudWatch, X-Ray, and OpenTelemetry. Yet when it comes to AI usage, many organisations have no equivalent visibility. They cannot see where AI is helping, where it is failing, or how engineers and non-engineers are actually using it.

Without this, there is no feedback loop. AI adoption becomes anecdotal rather than engineered.
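The fix is to instrument AI interactions like any other workload. A minimal sketch of the idea, counting interaction outcomes in-process (in production these counters would be emitted as CloudWatch metrics or OpenTelemetry instruments rather than held in memory):

```python
from collections import Counter

class AIUsageMetrics:
    """Record the outcome of every AI interaction, the same way you would
    count requests to any other service."""

    def __init__(self):
        self.outcomes = Counter()

    def record(self, outcome: str) -> None:
        self.outcomes[outcome] += 1

    def acceptance_rate(self) -> float:
        total = sum(self.outcomes.values())
        return self.outcomes["accepted"] / total if total else 0.0

metrics = AIUsageMetrics()
for outcome in ["accepted", "accepted", "rejected", "failed"]:
    metrics.record(outcome)
```

Even a metric this crude answers questions most organisations currently cannot: where AI suggestions are being accepted, and where they are being thrown away.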

Repeated Failures Without Learning

The same issues appear again and again. AI suggests incorrect IAM policies, misunderstands networking configurations, or generates patterns that do not align with internal standards.

But most teams lack a mechanism to capture these failures, classify them, and feed them back into the system. Without a structured loop of learning, the platform does not improve.
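Even a crude classifier is enough to start that loop: bucket each failure into a category so recurring problems become visible and can drive new guardrails or prompt rules. A sketch, with an entirely hypothetical taxonomy:

```python
# Hypothetical failure taxonomy: classify recurring AI mistakes so they
# can be counted, reviewed, and turned into new rules.
FAILURE_PATTERNS = {
    "iam": ["AccessDenied", "not authorized"],
    "networking": ["timed out", "security group", "ENI"],
    "standards": ["naming convention", "missing tag"],
}

def classify_failure(message: str) -> str:
    """Map a failure message to the first matching category."""
    for category, needles in FAILURE_PATTERNS.items():
        if any(n.lower() in message.lower() for n in needles):
            return category
    return "unclassified"
```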

Shallow Understanding of Distributed Systems

AI is strong at generating snippets, but cloud systems are not built from snippets. They are built from interactions between services. Understanding how API Gateway, Lambda, DynamoDB, and EventBridge interact in a production system requires reasoning about latency, failure modes, retries, and consistency.

Without structured guidance, AI tends to provide answers that are locally correct but globally fragile.
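Retries are a concrete example of "locally correct but globally fragile": an unbounded retry loop looks reasonable for one caller, but under a downstream outage thousands of callers retrying in lockstep become a retry storm. A sketch of the safer shape, with capped attempts and exponential backoff plus full jitter:

```python
import random

def backoff_delays(max_attempts: int = 4, base: float = 0.1, cap: float = 2.0):
    """Yield one delay per retry attempt: exponential growth, capped, with
    full jitter. Bounding attempts and randomising delays prevents
    synchronized retry storms against an already-struggling service."""
    for attempt in range(max_attempts):
        yield random.uniform(0.0, min(cap, base * (2 ** attempt)))

delays = list(backoff_delays())
```

A generated snippet that retries forever with a fixed delay passes review in isolation; the fragility only shows up at the system level, which is exactly the reasoning AI skips without guidance.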

Environment and Tooling Discrepancies

Even at a basic level, AI struggles with the variability of real developer environments. Differences between local setups, CI runners, and AWS accounts introduce friction. A command that works in one environment fails in another. A deployment assumption breaks because of account-level configuration.

These small inconsistencies compound quickly and erode trust.

Permission Fatigue and Security Boundaries

AWS environments are built around fine-grained permissions. When AI agents are introduced, this often translates into constant approval prompts or overly broad access.

Too many prompts interrupt flow. Too much access introduces risk. Static permission models do not adapt to real usage patterns, leaving teams stuck between friction and fear.
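A middle ground is a tiered model: read-only calls proceed automatically, while anything mutating requires a human decision. A simplified sketch (classifying by API-call prefix is a heuristic for illustration, not a complete policy):

```python
def approval_required(action: str) -> bool:
    """Auto-approve read-only AWS API calls; escalate everything else
    to a human. Trades a few prompts for safety without defaulting to
    either constant interruption or broad standing access."""
    read_prefixes = ("Get", "List", "Describe")
    _, _, api_call = action.partition(":")
    return not api_call.startswith(read_prefixes)
```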

Knowledge Is Not Packaged

Every AWS organisation develops its own best practices for networking, security, observability, and deployment. Yet when using AI, this knowledge is rarely encoded.

Instead, engineers repeatedly guide the model through the same constraints. The system never accumulates organisational expertise in a reusable way.
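Encoding that knowledge once, as executable rules that every AI-proposed resource is checked against, is one way to stop repeating it. A sketch (the rules and resource shape are illustrative, not a real policy engine):

```python
# Organisational standards written once as executable checks, instead of
# being re-explained to the model in every chat session.
RULES = {
    "encryption_required": lambda r: r.get("encrypted", False),
    "no_public_access": lambda r: not r.get("public", False),
    "approved_region": lambda r: r.get("region") in {"eu-west-1", "eu-west-2"},
}

def violations(resource: dict) -> list:
    """Return the names of every rule the proposed resource breaks."""
    return [name for name, check in RULES.items() if not check(resource)]

# An AI-proposed resource that violates two organisational rules.
proposed = {"encrypted": True, "public": True, "region": "us-east-1"}
```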

AI as a Tool, Not a Platform

Most organisations deploy AI in the cloud as a personal productivity tool. An engineer opens a chat window, asks a question, and gets an answer. This approach does not scale. It leads to inconsistent practices, fragmented capabilities, and limited organisational impact.

In contrast, AWS itself is a platform. It standardises, governs, and scales capabilities across teams. AI needs to be treated in the same way.

The Shift: From AI Assistants to Cloud-Native AI Platforms

The teams getting real value from AI on AWS are not just adopting tools. They are building platforms.

  • They connect AI to real system context, allowing it to reason about production safely.
  • They enforce guardrails through IAM, policy checks, and controlled interfaces.
  • They instrument AI interactions just like any other workload, using telemetry to understand behaviour and drive improvements.
  • They create feedback loops where real usage informs new capabilities.
  • They package expertise into reusable skills that reflect how their organisation builds and operates systems on AWS.

Most importantly, they embed AI directly into the software delivery lifecycle rather than treating it as an external assistant.

The Key Insight

The hard part of AI in cloud engineering is not intelligence. It is integration with real systems, real constraints, and real organisational practices. And more importantly, it is operationalisation.

AI is not replacing cloud engineers. It is exposing what cloud engineering actually is. It is not just writing infrastructure code or application logic. It is understanding systems, managing risk, encoding workflows, and continuously improving how software is delivered on platforms like AWS.

The teams that succeed will not be the ones with the best prompts. They will be the ones that build the best systems around their AI.

Final Thought

If your AI strategy in AWS starts and ends with choosing a model or adding a coding assistant, you are solving the wrong problem. The real question is this:

How do we turn AI into a reliable, observable, and continuously improving part of how we build and operate systems in the cloud?

Because that is where the real leverage is!

Before you invest further in AI tooling, ask yourself: Is your organisation set up to learn from AI usage, or just consume it?

If that question is uncomfortable, it’s probably worth a deeper conversation.


