Table of Contents
- Why Scrum and Kanban Fail in AI Projects
- The Three Fundamental Differences Between Software and AI Development
- AI-first Ways of Working: What Comes After Scrum
- Concrete Ways of Working for AI-Driven Teams in Practice
- Tool Stack and Processes: What Really Works
- Common Mistakes When Transitioning to AI-Driven Workflows
- Frequently Asked Questions
Last week, I almost lost my mind.
My client insisted on squeezing their AI project into 2-week sprints.
“That’s how we did it when developing our app,” he said.
I had to explain: AI is not software development.
After three failed sprints and a frustrated development team, we changed course completely.
The result: In six weeks, we accomplished more than in the three preceding months.
Today, I’ll show you why classic agile methods fall short for AI projects—and which approaches actually work.
Why Scrum and Kanban Fail in AI Projects
Scrum was invented for software development.
AI follows different laws.
Ignoring that is the most common mistake I see with my clients.
The Sprint Problem: AI Doesn’t Progress Linearly
A sprint lasts two weeks.
A machine learning model might need three weeks before yielding any useful results.
What do you present in weeks one and two? “We’re still training” as the sprint outcome?
I’ve seen this happen.
A dev team went through six sprints with almost nothing to show for it—even though the work itself was excellent.
The issue: AI development operates on different time cycles.
Sometimes, data preparation takes four weeks, then the first model works right away.
Sometimes you train 50 different approaches before one finally works.
This doesn’t fit into two-week rhythms.
Backlog Chaos: When Algorithms Reprioritize for You
Product backlogs work for features.
With AI, priorities shift based on what you discover.
Here’s a real-life example:
We wanted to build a sentiment analysis tool for customer feedback.
Five features were planned:
- Positive/Negative Classification
- Emotion Detection
- Topic Extraction
- Trend Analysis
- Dashboard
After the initial data exploration, we discovered 60% of our data was unusable.
Suddenly, data cleaning became priority number one.
That wasn’t on the backlog anywhere.
In AI projects, the data often dictates the next step—not the product owner.
Daily Standups Become Weekly Standups (and That’s Okay)
“What did you do yesterday?”
“Tuned hyperparameters.”
“What will you do today?”
“Tune hyperparameters.”
“Are there any blockers?”
“Training is running for another 12 hours.”
This is what daily standups look like in AI teams.
Pointless.
AI development has longer feedback loops.
Training a model can take days.
Evaluating an A/B test requires statistical significance—which can mean weeks.
Daily syncs make no sense when nothing substantial changes each day.
The Three Fundamental Differences Between Software and AI Development
Software development: You know what you’re building.
AI development: You know what you’re trying out.
That’s the core difference.
Unpredictability Instead of Planability
In software, you write code—it works (or it doesn’t).
In AI, you train a model, it performs at 73%—and you have no idea why.
Not because the code is bad.
But because the data or the model behaves in ways you didn’t expect.
You can’t plan when a model will hit your desired accuracy.
You can only experiment and iterate.
That makes classic project planning impossible.
Experimental vs. Linear Development Process
Software development follows a linear process:
Requirements → Design → Implementation → Testing → Deployment
AI development is an experimentation loop:
Hypothesis → Experiment → Evaluation → Learning → New Hypothesis
I spend 80% of my time on AI projects running experiments that fail.
This is normal.
This is even good.
Every failed experiment brings you closer to the solution.
For software, 80% failed code would be unacceptable.
For AI, it’s the way to success.
Data Quality as a Critical Success Factor
Software works fine with synthetic test data.
AI lives and dies by real, high-quality data.
I’ve seen projects where a team wrote perfect code for six months.
The model was still useless.
Because the data was poor.
In AI projects, you spend 60–80% of your time on data tasks:
- Data collection
- Data cleaning
- Data labeling
- Data validation
- Building data pipelines
You won’t find this in any software sprint plan.
But without it, AI doesn’t work.
AI-first Ways of Working: What Comes After Scrum
I’ve spent the past three years trying different approaches for AI teams.
Here’s what actually works.
Hypothesis-driven Development Instead of User Stories
Forget user stories.
“As a user, I want to…” doesn’t work for AI.
Instead: Hypothesis-driven development.
Each development phase starts with a measurable hypothesis:
“If we add Feature X, model accuracy improves by at least 5%.”
Or:
“If we use Algorithm Y, training time is reduced by 50%.”
Every hypothesis has:
- A measurable metric
- A target value
- A time estimate for the experiment
- Success/failure criteria
That turns “We’ll optimize the model” into a concrete experiment with a clear outcome.
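To make this tangible, here is a minimal sketch of how such a hypothesis could be captured as a structured record. The field names and example values are illustrative, not a fixed schema.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable hypothesis with explicit success criteria (illustrative schema)."""
    statement: str          # e.g. "Adding feature X improves accuracy by at least 5%"
    metric: str             # the metric we measure, e.g. "accuracy"
    baseline: float         # current value of the metric
    target: float           # value that counts as success
    time_budget_days: int   # how long the experiment may run

    def is_successful(self, measured: float) -> bool:
        # Success/failure criterion: did the experiment reach the target?
        return measured >= self.target

# Example usage (all values are made up for illustration)
h = Hypothesis(
    statement="If we add Feature X, model accuracy improves by at least 5%",
    metric="accuracy",
    baseline=0.73,
    target=0.78,
    time_budget_days=5,
)
print(h.is_successful(0.79))  # True
```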
Continuous Experimentation as a Core Process
In software, you develop features.
In AI, you run experiments.
The crucial process isn’t sprint planning—it’s experiment design.
Our standard experimental workflow:
- Hypothesis Definition (1 day)
- Experiment Setup (2–3 days)
- Execution (variable: 1 day to 2 weeks)
- Evaluation (1–2 days)
- Decision (continue/discard/pivot)
Important: Every experiment is documented.
Even the failures.
Especially the failures.
We keep an Experiment Log with:
- Hypothesis
- Approach
- Result
- Learnings
- Next steps
This becomes the team’s most valuable asset.
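To keep entries consistent, it helps to give the log a fixed structure. The sketch below shows one possible shape as an append-only JSON-lines file; the file name and example values are made up.

```python
import json
from datetime import date
from pathlib import Path

LOG_FILE = Path("experiment_log.jsonl")  # hypothetical location of the shared log

def log_experiment(hypothesis: str, approach: str, result: str,
                   learnings: str, next_steps: str) -> None:
    """Append one experiment entry to a shared JSON-lines log."""
    entry = {
        "date": date.today().isoformat(),
        "hypothesis": hypothesis,
        "approach": approach,
        "result": result,
        "learnings": learnings,
        "next_steps": next_steps,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Failed experiments get logged exactly like successful ones.
log_experiment(
    hypothesis="Algorithm Y halves training time",
    approach="Swapped gradient boosting for a simpler linear baseline",
    result="Training time -60%, but accuracy dropped 8 points",
    learnings="The speed-up is not worth the accuracy loss at current data volume",
    next_steps="Revisit Algorithm Y after the next feature engineering round",
)
```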
Data-centric Workflows for AI Teams
Software teams organize themselves around features.
AI teams must organize around data.
Our workflow doesn’t run through a Kanban board, but through a data pipeline:
| Phase | Responsible | Output | Quality Criteria |
|---|---|---|---|
| Data Collection | Data Engineer | Raw Dataset | Completeness, Recency |
| Data Cleaning | Data Scientist | Clean Dataset | <5% Missing Values |
| Feature Engineering | ML Engineer | Feature Set | Correlation with Target |
| Model Training | Data Scientist | Trained Model | Target accuracy achieved |
| Model Deployment | ML Engineer | Production Model | Latency < 200ms |
Each phase has clear handover criteria.
Nothing is done without measurable quality.
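As an illustration, a handover criterion such as “<5% missing values” can be enforced as a small quality gate before a dataset moves to the next phase. The threshold and the toy data below are placeholders.

```python
import pandas as pd

def passes_handover(df: pd.DataFrame, max_missing_ratio: float = 0.05) -> bool:
    """Quality gate: the clean dataset may move to feature engineering
    only if the overall share of missing values stays below the threshold."""
    missing_ratio = df.isna().mean().mean()  # average share of missing values across columns
    return missing_ratio < max_missing_ratio

# Example with a tiny placeholder dataset
df = pd.DataFrame({"feedback": ["great", None, "bad", "ok"], "score": [5, 4, None, 3]})
print(passes_handover(df))  # False here: 2 of 8 cells are missing (25%)
```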
Concrete Ways of Working for AI-Driven Teams in Practice
Enough theory.
Here are the workflows I use every day.
The 3-Phase Method for AI Projects
I break every AI project into three phases:
Phase 1: Discovery (25% of the time)
Goal: Understand what’s possible.
Activities:
- Data exploration
- Proof of concept
- Feasibility assessment
- Initial baseline models
Success metric: Is the problem solvable with AI?
Phase 2: Development (60% of the time)
Goal: Build the best possible model.
Activities:
- Iterative model improvement
- Feature engineering
- Hyperparameter optimization
- Cross-validation
Success metric: Target accuracy achieved.
Phase 3: Deployment (15% of the time)
Goal: Get the model into production.
Activities:
- Model packaging
- API development
- Monitoring setup
- A/B testing
Success metric: Model runs stably in production.
Important: These phases are not linear.
You’ll switch between discovery and development repeatedly.
That’s normal.
Agile Data Science: Rethinking Sprints
We still use sprints—but differently.
Our AI sprints last three to four weeks (not two).
Each sprint has an experiment goal, not a feature goal.
Sprint planning works like this:
- Experiment Review: What did we learn from the last sprints?
- Hypothesis Prioritization: Which experiments are the most promising?
- Resource Allocation: Who works on which experiment?
- Success Criteria: How will we measure success?
Sprint reviews don’t showcase features, but insights:
- Which hypotheses were confirmed/refuted?
- What new findings did we gain?
- How did the model’s performance change?
- What are the next logical experiments?
Properly Organizing Cross-functional AI Teams
An AI team needs different roles than a software team.
Our typical setup for an AI project:
| Role | Main Task | Skills | % of Time |
|---|---|---|---|
| Data Scientist | Model development | ML, Statistics, Python | 40% |
| Data Engineer | Data pipeline | ETL, Databases, Cloud | 30% |
| ML Engineer | Model deployment | DevOps, APIs, Scalability | 20% |
| Product Manager | Business Alignment | Domain Knowledge, Strategy | 10% |
Important: The Product Manager is NOT the Scrum Master.
They define business goals, not sprint goals.
Experiment prioritization is a team effort.
Tool Stack and Processes: What Really Works
AI teams need different tools than software teams.
Here’s our proven stack.
Project Management Tools for AI Teams
Jira is fine for software.
For AI, we use a combination:
Experiment Tracking: MLflow
- All experiments are automatically logged
- Parameters, metrics, and artifacts in one place
- Compare multiple model versions
Task Management: Notion
- Hypothesis backlog
- Experiment documentation
- Team learnings
- Data quality dashboards
Communication: Slack + Daily Data Reports
- Automated reports on model performance
- Alerts for data quality issues
- Channel for every ongoing experiment
But the most important tool is a shared experiment log.
We document EVERY experiment—whether successful or not.
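For orientation, logging a single run with MLflow looks roughly like this. The experiment name, parameters, and metric values are placeholders.

```python
import mlflow

mlflow.set_experiment("sentiment-analysis")  # experiment name is a placeholder

with mlflow.start_run(run_name="baseline-logreg"):
    # Parameters of this run
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("max_features", 20000)

    # ... train and evaluate the model here ...

    # Metrics make runs comparable across the whole team
    mlflow.log_metric("accuracy", 0.81)
    mlflow.log_metric("f1_score", 0.78)
```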
Versioning Models and Data
You version code with Git.
But what about models and data?
This is our approach:
Data Versioning: DVC (Data Version Control)
- Every dataset has a version number
- Reproducible data pipelines
- Automatic data lineage tracking
Model Versioning: MLflow Model Registry
- Every model is versioned automatically
- Staging/Production environments
- Rollback in case of performance drop
Code Versioning: Git + Pre-commit Hooks
- Automatic code quality checks
- Experiment metadata is committed automatically
- Jupyter notebooks are cleaned before commit
Without versioning, AI development isn’t reproducible.
And if it’s not reproducible, it’s not debuggable.
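To show how that pays off day to day, DVC’s Python API can pull exactly the dataset version an experiment was trained on. The file path and the tag below are hypothetical, and the snippet assumes it runs inside a DVC-tracked repository.

```python
import pandas as pd
import dvc.api

# Load the exact dataset version a given experiment used.
# "data/clean.csv" and the tag "v2.1" are hypothetical placeholders.
with dvc.api.open("data/clean.csv", rev="v2.1") as f:
    df = pd.read_csv(f)

print(df.shape)  # same shape every time, so the experiment stays reproducible
```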
Testing and Deployment in AI Environments
Unit tests for AI code are different than for regular software.
You’re not just testing functions, but also data quality and model performance.
Our testing framework:
Data Quality Tests
- Schema validation (are all columns present?)
- Data freshness (is the data up-to-date?)
- Statistical tests (has the data distribution changed?)
- Completeness checks (how many missing values?)
Model Performance Tests
- Accuracy threshold tests
- Latency tests
- Memory usage tests
- Bias detection tests
Integration Tests
- End-to-end pipeline tests
- API response time tests
- Load tests
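Here is a minimal sketch of what a few of these checks might look like with pytest and pandas. The expected columns, thresholds, and placeholder data are assumptions, not a fixed framework.

```python
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"customer_id", "feedback_text", "created_at"}  # assumed schema
ACCURACY_THRESHOLD = 0.80  # assumed target from the definition of done

@pytest.fixture
def latest_batch() -> pd.DataFrame:
    # In the real pipeline this fixture would load the newest data batch;
    # a tiny in-memory frame keeps the sketch self-contained.
    return pd.DataFrame({
        "customer_id": [1, 2, 3],
        "feedback_text": ["great", "okay", "bad"],
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    })

def test_schema_is_complete(latest_batch):
    # Schema validation: are all expected columns present?
    assert EXPECTED_COLUMNS.issubset(latest_batch.columns)

def test_missing_values_within_limit(latest_batch):
    # Completeness check: overall share of missing values stays below 5%
    assert latest_batch.isna().mean().mean() < 0.05

def test_model_meets_accuracy_threshold():
    # Performance test: the current model must clear the agreed threshold.
    # The measured value is a placeholder for your own evaluation result.
    measured_accuracy = 0.84
    assert measured_accuracy >= ACCURACY_THRESHOLD
```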
We release models via blue-green deployment with automatic rollback.
If model performance drops by more than 5%, the previous version is automatically restored.
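The 5% guardrail can be wired up as a simple check against the model registry. This sketch builds on MLflow’s model registry and treats the drop as relative; the model name, versions, and threshold are placeholders, and the exact call may vary with your MLflow version.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "sentiment-analysis"  # placeholder registry name

def rollback_if_degraded(new_accuracy: float, baseline_accuracy: float,
                         previous_version: str, max_drop: float = 0.05) -> bool:
    """Restore the previous Production version if accuracy drops by more than 5% (relative)."""
    if new_accuracy < baseline_accuracy * (1 - max_drop):
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=previous_version,
            stage="Production",
            archive_existing_versions=True,  # demote the degraded version
        )
        return True
    return False

# Example: baseline 0.85, new model at 0.79 -> more than a 5% relative drop, so roll back.
rollback_if_degraded(new_accuracy=0.79, baseline_accuracy=0.85, previous_version="12")
```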
Common Mistakes When Transitioning to AI-Driven Workflows
I’ve seen the same mistakes at almost every client.
Here are the most common ones—and how to avoid them.
The Scrum Master Becomes the AI Product Owner
Classic mistake:
A company wants to stay agile and makes the Scrum Master the AI Product Owner.
The problem: A Scrum Master understands processes—but not data science.
They can’t prioritize experiments because they don’t know what’s realistic.
The fix: The AI Product Owner must have a technical background.
They need to understand:
- How machine learning works
- What data quality means
- How long model training takes
- Which metrics matter
For us, the AI Product Owner is always a data scientist or ML engineer with business acumen.
Never just a project manager.
Why Classic Definition of Done Doesn’t Work
Software: Feature works as specified = done.
AI: Model reaches 85% accuracy = done.
But what if the model only gets to 84%?
Is it not done, then?
Classic definitions of done lead to endless optimization loops in AI.
Our approach: Probabilistic definition of done.
Instead of “Model must reach 85% accuracy,” we say:
“Model achieves at least 80% accuracy and outperforms the current baseline.”
Plus a time limit:
“If there’s no significant improvement after four weeks of optimization, the current model is production-ready.”
This prevents perfectionism and allows for iterative improvement in production.
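Put as an explicit decision rule, the probabilistic definition of done might look like this. The thresholds come from the example above; the date handling is illustrative.

```python
from datetime import date, timedelta

def is_production_ready(accuracy: float, baseline_accuracy: float,
                        optimization_start: date,
                        min_accuracy: float = 0.80,
                        time_budget: timedelta = timedelta(weeks=4)) -> bool:
    """Probabilistic definition of done: good enough and better than the baseline,
    or the optimization time budget is used up."""
    meets_bar = accuracy >= min_accuracy and accuracy > baseline_accuracy
    time_is_up = date.today() - optimization_start >= time_budget
    return meets_bar or time_is_up

# Example: 82% accuracy against a 79% baseline -> production-ready, regardless of the date.
print(is_production_ready(0.82, 0.79, optimization_start=date(2024, 5, 1)))
```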
Change Management for Traditional Teams
The hardest part isn’t the technology.
It’s the change management.
Software engineers are used to building deterministic systems.
AI is probabilistic.
That’s a mindset shift.
Here’s what I do with every transition:
1. Expectation Management
- Communicate honestly: 80% of experiments fail
- That’s normal and valuable
- Success is measured differently
2. Pair Programming for AI
- Experienced data scientists work side-by-side with software engineers
- Knowledge transfer via code reviews
- Joint planning of experiments
3. Continuous Learning
- Weekly ML Learning Sessions
- Case studies of successful experiments
- Post-mortems for failed approaches
The transition takes three to six months.
Plan accordingly.
And celebrate small successes—even the failed experiments that deliver valuable insights.
Frequently Asked Questions
How long does it take to transition from Scrum to AI-driven ways of working?
From my experience, it takes three to six months. The team has to learn new mindsets and establish new processes. The key is to manage the change gradually, not overhaul every process at once.
Can’t you combine Scrum and AI development at all?
Actually, yes—but you’ll need to seriously adapt Scrum. Longer sprints (three or four weeks), experiment-based goals instead of feature goals, and flexible timelines. A pure, by-the-book Scrum implementation doesn’t work for AI projects.
What roles does an AI team absolutely need?
At minimum: data scientist, data engineer, and ML engineer. In smaller teams, one person may fill multiple roles, but all bases need to be covered. A product manager with AI understanding is important, too.
How do you measure the success of AI experiments?
By using predefined, measurable metrics such as accuracy, precision, recall, or business KPIs. Importantly: failed experiments are also successes if they yield learnings. We document all experiments systematically.
What are the biggest challenges when transitioning?
Change management is harder than the tech. Teams need to learn to deal with uncertainty and think probabilistically instead of deterministically. You’ll also need new tools and data/model versioning strategies.
Which tools are essential for AI teams?
MLflow for experiment tracking, DVC for data versioning, a cloud provider for computing power, and a solid documentation tool like Notion. Git alone isn’t enough—you need tools purpose-built for data science.
How do you deal with the unpredictability of AI projects?
With timeboxed experiments and clear go/no-go criteria. Set time limits on optimization cycles and define minimum performance thresholds. Plan for buffers and communicate uncertainties transparently to stakeholders.
Do classic project management tools work for AI teams?
To a limited extent. Jira can be used for task management, but you also need extra tools for experiment tracking and data lineage. We use a combination of several specialized tools.
How do you organize code reviews for machine learning code?
ML code reviews are different from software code reviews. You check not just code quality, but also experiment design, data quality, and model validation. Pair programming between experienced data scientists and software engineers helps with knowledge transfer.
What if an AI project fails completely?
That happens in 15–20% of cases—and it’s normal. The key is to recognize quickly when something isn’t working and pivot fast. Document all learnings—they’re valuable for future projects. A failed project can help others avoid the same mistakes.