Agile Working Methods Meet AI: Why Traditional Methods Fail at Automation

Last week I almost lost my mind.

My client insisted on cramming their AI project into 2-week sprints.

"That's how we did app development," he said.

I had to explain: AI is not software development.

After three failed sprints and a frustrated dev team, we completely changed our approach.

The result: More progress in 6 weeks than in the previous 3 months.

Today, I'll show you why classic agile methods fail for AI projects – and which approaches truly work.

Why Scrum and Kanban Fail in AI Projects

Scrum was invented for software development.

AI obeys different laws.

Ignoring that is the most common mistake I see with my clients.

The Sprint Problem: AI Doesn’t Develop Linearly

A sprint lasts 2 weeks.

A machine learning model can take 3 weeks before producing any useful results.

What do you do in weeks 1 and 2? Present "we're still training" as a sprint result?

I’ve watched it happen.

One dev team delivered practically nothing showable in 6 sprints—despite doing excellent work.

The problem: AI development works on different timelines.

Sometimes data prep takes 4 weeks, and then the first model instantly works perfectly.

Sometimes you train 50 different approaches before one works.

That doesn't fit into 2-week cycles.

Backlog Chaos: When Algorithms Change the Priorities

Product backlog works for features.

In AI, priorities shift due to new findings.

Example from my experience:

We wanted to build a sentiment analysis tool for customer feedback.

5 features were planned:

  1. Positive/Negative Classification
  2. Emotion Detection
  3. Topic Extraction
  4. Trend Analysis
  5. Dashboard

After the initial data exploration, we realized 60% of our data was unusable.

Suddenly, data cleaning was priority number one.

That was nowhere on the backlog.

In AI projects, it’s often the data that determines the next step—not the product owner.

Daily Standups Become Weekly Standups (and That’s Okay)

"What did you do yesterday?"

"Optimized hyperparameters."

"What are you doing today?"

"Optimizing hyperparameters."

"Any blockers?"

"Training is still running for another 12 hours."

That’s what daily standups look like for AI teams.

Pointless.

AI development has longer feedback loops.

Training a model can take days.

Evaluating an A/B test needs statistical significance—that often means weeks.

Daily syncs don’t make sense when nothing substantive changes each day.

The Three Fundamental Differences Between Software and AI Development

Software development: You know what you’re building.

AI development: You know what you’re trying out.

That’s the core difference.

Unpredictability Instead of Predictability

With software you write code, it works (or doesn’t).

With AI you train a model, it hits 73% accuracy—and you don’t know why.

Not because of bad code.

But because of unexpected data problems or model behavior.

You can’t plan when a model will hit a target accuracy.

You can only experiment and iterate.

That makes classic project planning impossible.

Experimental vs. Linear Development Process

Software development follows a linear process:

Requirements → Design → Implementation → Testing → Deployment

AI development is a cycle of experimentation:

Hypothesis → Experiment → Evaluation → Learning → New Hypothesis

I spend 80% of my time in AI projects running experiments that don’t work.

That’s normal.

In fact, it’s good.

Every failed experiment gets you closer to the solution.

For software, 80% failed code would be unacceptable.

For AI, it’s the route to success.

Data Quality as a Critical Success Factor

Software works with synthetic test data.

AI lives and dies by real, high-quality data.

I’ve seen projects where the team wrote perfect code for 6 months.

The model was still useless.

Because the data was bad.

In AI projects, you spend 60–80% of your time on data work:

  • Data collection
  • Data cleaning
  • Data labeling
  • Data validation
  • Building data pipelines

You won’t see that in any software sprint plan.

But without it, AI doesn’t work.

AI-first Workflows: What Comes After Scrum

I’ve spent the last 3 years working with different approaches for AI teams.

Here’s what really works.

Hypothesis-driven Development Instead of User Stories

Forget user stories.

"As a user, I want…" doesn't work for AI.

Replace it with: hypothesis-driven development.

Every development phase starts with a measurable hypothesis:

If we add feature X, model accuracy will improve by at least 5%.

Or:

If we use algorithm Y, training time will be cut by 50%.

Each hypothesis has:

  • A measurable metric
  • A target value
  • A time estimate for the experiment
  • Criteria for success/failure

This turns "we're optimizing the model" into a concrete experiment with a clear result.
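Here is a minimal sketch of how such a hypothesis could be captured as a structured record. The dataclass and its field names are illustrative, not a fixed template:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable hypothesis with the four elements listed above."""
    statement: str          # e.g. "Feature X improves accuracy by at least 5 points"
    metric: str             # the measurable metric
    target: float           # target value that counts as success
    time_budget_days: int   # time estimate for the experiment

    def is_validated(self, measured: float) -> bool:
        # Success/failure criterion: the measured metric reaches the target.
        return measured >= self.target

# Example: the accuracy hypothesis from above, starting from a 78% baseline.
h = Hypothesis(
    statement="Adding feature X lifts accuracy by at least 5 percentage points",
    metric="accuracy",
    target=0.83,
    time_budget_days=5,
)
print(h.is_validated(0.84))  # True: hypothesis validated
```

The class itself isn't the point; the point is that every hypothesis is forced into a checkable yes/no question.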

Continuous Experimentation as Core Process

In software, you build features.

In AI, you conduct experiments.

The most important process isn’t sprint planning but experiment design.

Our standard experiment workflow:

  1. Hypothesis definition (1 day)
  2. Experiment setup (2–3 days)
  3. Execution (variable: 1 day to 2 weeks)
  4. Evaluation (1–2 days)
  5. Decision (Continue/Drop/Pivot)

Important: Every experiment is documented.

Even failed ones.

Especially the failed ones.

We keep an experiment log with:

  • Hypothesis
  • Approach
  • Result
  • Learnings
  • Next steps

This becomes the most valuable resource for the team.
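As a sketch of how lightweight that log can be, here is an append-only JSON Lines version with exactly those five fields. The file name and function are assumptions, not any specific tool:

```python
import json
from datetime import date
from pathlib import Path

LOG_FILE = Path("experiment_log.jsonl")  # assumed location in the project repo

def log_experiment(hypothesis: str, approach: str, result: str,
                   learnings: str, next_steps: str) -> None:
    """Append one experiment entry; failed experiments are logged the same way."""
    entry = {
        "date": date.today().isoformat(),
        "hypothesis": hypothesis,
        "approach": approach,
        "result": result,
        "learnings": learnings,
        "next_steps": next_steps,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_experiment(
    hypothesis="Algorithm Y halves training time",
    approach="Replaced gradient boosting with a linear baseline",
    result="Training time down 60%, accuracy down 4 points",
    learnings="Speed gain is real, but the accuracy loss is too high",
    next_steps="Retry Y with additional engineered features",
)
```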

Data-centric Workflows for AI Teams

Software teams organize around features.

AI teams must organize around data.

Our workflow isn’t managed via a Kanban board, but through a data pipeline:

Phase | Responsible | Output | Quality Criterion
Data Collection | Data Engineer | Raw Dataset | Completeness, Freshness
Data Cleaning | Data Scientist | Clean Dataset | < 5% Missing Values
Feature Engineering | ML Engineer | Feature Set | Correlation with Target
Model Training | Data Scientist | Trained Model | Target Accuracy Met
Model Deployment | ML Engineer | Production Model | Latency < 200 ms

Each stage has clear handover criteria.

Nothing counts as done without meeting a measurable quality criterion.
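To make one of those handover criteria concrete, here is a rough sketch of the "< 5% missing values" gate after data cleaning, assuming the dataset is a pandas DataFrame. The file name and function are illustrative; the threshold comes from the table above:

```python
import pandas as pd

def passes_cleaning_gate(df: pd.DataFrame, max_missing_ratio: float = 0.05) -> bool:
    """Handover criterion for the Data Cleaning stage: less than 5% missing values."""
    missing_ratio = df.isna().sum().sum() / df.size
    return missing_ratio < max_missing_ratio

df = pd.read_csv("clean_dataset.csv")  # assumed output of the cleaning stage
if not passes_cleaning_gate(df):
    raise ValueError("Clean dataset fails the handover criterion: >= 5% missing values")
```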

Practical Working Methods for AI-driven Teams

Enough theory.

Here are the workflows I use every day.

The 3-Phase Method for AI Projects

I split each AI project into 3 phases:

Phase 1: Discovery (25% of the time)

Goal: Understand what’s possible.

Activities:

  • Data exploration
  • Proof of concept
  • Feasibility assessment
  • Initial baseline models

Success metric: Is the problem solvable with AI?

Phase 2: Development (60% of the time)

Goal: Build the best possible model.

Activities:

  • Iterative model improvement
  • Feature engineering
  • Hyperparameter optimization
  • Cross-validation

Success metric: Target accuracy reached.

Phase 3: Deployment (15% of the time)

Goal: Bring the model into production.

Activities:

  • Model packaging
  • API development
  • Monitoring setup
  • A/B testing

Success metric: Model runs stably in production.

Important: The phases aren’t linear.

You’ll bounce between discovery and development.

That’s normal.

Agile Data Science: Rethinking Sprints

We still use sprints—but differently.

Our AI sprints last 3–4 weeks (not 2).

Each sprint has an experiment goal, not a feature goal.

Sprint planning works like this:

  1. Experiment Review: What did we learn last sprint?
  2. Hypothesis Prioritization: Which experiments are most promising?
  3. Resource Allocation: Who’s working on which experiment?
  4. Success Criteria: How will we measure success?

Sprint review presents insights, not features:

  • Which hypotheses were validated/refuted?
  • What new findings did we gain?
  • How did model performance evolve?
  • What are the next logical experiments?

How to Organize Cross-functional AI Teams

An AI team needs different roles compared to a software team.

Our standard setup for an AI project:

Role | Main Responsibility | Skills | % of Time
Data Scientist | Model Development | ML, Statistics, Python | 40%
Data Engineer | Data Pipeline | ETL, Databases, Cloud | 30%
ML Engineer | Model Deployment | DevOps, APIs, Scalability | 20%
Product Manager | Business Alignment | Domain Knowledge, Strategy | 10%

Important: The Product Manager is NOT the Scrum Master.

They define business goals, not sprint goals.

Experiment prioritization is a team effort.

Tool Stack and Processes: What Really Works

The tooling for AI teams differs from that of traditional software teams.

Here’s our proven stack.

Project Management Tools for AI Teams

Jira is fine for software.

For AI, we use a combination:

Experiment Tracking: MLflow

  • All experiments are automatically logged
  • Parameters, metrics, artifacts in one view
  • Comparison of different model versions
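A minimal sketch of what that looks like per run, using MLflow's standard tracking API. The experiment name, parameters, and metrics here are just placeholders:

```python
import mlflow

mlflow.set_experiment("sentiment-analysis")  # illustrative experiment name

with mlflow.start_run(run_name="feature-x-hypothesis"):
    # Parameters, metrics, and artifacts land in one comparable view.
    mlflow.log_param("model_type", "gradient_boosting")
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_metric("accuracy", 0.84)
    mlflow.log_metric("training_time_min", 42)
    mlflow.log_artifact("experiment_log.jsonl")  # attach the experiment notes
```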

Task Management: Notion

  • Hypothesis backlog
  • Experiment documentation
  • Team learnings
  • Data quality dashboards

Communication: Slack + Daily Data Reports

  • Automated reports on model performance
  • Alerts on data quality issues
  • Channel for every running experiment

The most important tool, however, is a shared experiment log.

We document EVERY experiment—whether successful or not.

Versioning Models and Data

You version code with Git.

But what about models and data?

Our approach:

Data Versioning: DVC (Data Version Control)

  • Every dataset gets a version number
  • Reproducible data pipelines
  • Automatic data lineage tracking

Model Versioning: MLflow Model Registry

  • Each model is automatically versioned
  • Staging/Production environments
  • Rollback option in case of performance drop
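Roughly, registering and promoting a version looks like this with the MLflow client. The run ID and model name are placeholders; rolling back means promoting the previous version again:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact from a finished run (run ID is a placeholder).
version = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="sentiment-model",  # illustrative registry name
)

# Promote the new version to Production once it beats the current one.
client = MlflowClient()
client.transition_model_version_stage(
    name="sentiment-model",
    version=version.version,
    stage="Production",
)
```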

Code Versioning: Git + Pre-commit Hooks

  • Automatic code quality checks
  • Experiment metadata is auto-committed
  • Jupyter notebooks are cleaned before commit

Without versioning, AI development isn’t reproducible.

And if it’s not reproducible, it’s not debuggable.

Testing and Deployment in AI Environments

Unit testing AI code is different from testing normal software.

You test not only functions, but also data quality and model performance.

Our testing framework:

Data Quality Tests

  • Schema validation (are all columns present?)
  • Data freshness (are the data up to date?)
  • Statistical tests (has the data distribution changed?)
  • Completeness checks (how many missing values?)
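Sketched as plain pytest checks on a pandas DataFrame, it can look like this. Column names, the file path, and the thresholds are assumptions for illustration:

```python
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"customer_id", "feedback_text", "created_at"}  # illustrative schema

@pytest.fixture
def feedback_df() -> pd.DataFrame:
    return pd.read_csv("clean_dataset.csv")  # assumed pipeline output

def test_schema(feedback_df):
    # Schema validation: are all expected columns present?
    assert EXPECTED_COLUMNS.issubset(set(feedback_df.columns))

def test_completeness(feedback_df):
    # Completeness check: no column may exceed 5% missing values.
    assert feedback_df.isna().mean().max() <= 0.05

def test_freshness(feedback_df):
    # Data freshness: the newest record must be under 7 days old (illustrative window).
    newest = pd.to_datetime(feedback_df["created_at"]).max()
    assert (pd.Timestamp.now() - newest).days < 7
```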

Model Performance Tests

  • Accuracy threshold tests
  • Latency tests
  • Memory usage tests
  • Bias detection tests
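The model-side checks follow the same pattern. Here is a rough sketch assuming a scikit-learn-style model with score() and predict(); the thresholds mirror numbers used elsewhere in this article:

```python
import time

def check_model_performance(model, X_test, y_test,
                            min_accuracy: float = 0.80,
                            max_latency_s: float = 0.2) -> bool:
    """Accuracy threshold and latency check before a model may be deployed."""
    accuracy_ok = model.score(X_test, y_test) >= min_accuracy

    start = time.perf_counter()
    model.predict(X_test[:1])  # single-record latency
    latency_ok = (time.perf_counter() - start) <= max_latency_s

    return accuracy_ok and latency_ok
```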

Integration Tests

  • End-to-end pipeline tests
  • API response time tests
  • Load tests

Deployment is done via blue-green deployment with automatic rollback.

If model performance drops by more than 5%, it automatically rolls back to the previous version.
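The rollback rule itself is a one-liner; this sketch assumes a relative drop measured on a fixed evaluation set:

```python
def should_roll_back(new_accuracy: float, previous_accuracy: float,
                     max_relative_drop: float = 0.05) -> bool:
    """Blue-green rollback rule: more than a 5% relative drop switches
    traffic back to the previous model version."""
    return new_accuracy < previous_accuracy * (1 - max_relative_drop)

# Example: previous model 0.85, new model 0.79 -> roughly a 7% drop -> roll back.
print(should_roll_back(0.79, 0.85))  # True
```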

Common Mistakes When Transitioning to AI-driven Workflows

I’ve seen the same mistakes with almost every client.

Here are the most common ones—and how you can avoid them.

The Scrum Master Becomes the AI Product Owner

Classic mistake:

The company wants to stay agile and turns the Scrum Master into the AI Product Owner.

The problem: A Scrum Master understands process, but not data science.

They can’t prioritize experiments because they can’t judge what’s realistic.

The solution: The AI Product Owner needs a technical background.

They must understand:

  • How machine learning works
  • What data quality means
  • How long model training takes
  • Which metrics matter

For us, the AI Product Owner is always a Data Scientist or ML Engineer with business understanding.

Never a pure project manager.

Why the Classic Definition of Done Doesn't Work

Software: Feature works as specified = Done.

AI: Model achieves 85% accuracy = Done.

But what if the model only achieves 84%?

Is it not done?

A classic Definition of Done leads to endless optimization cycles in AI.

Our approach: Probabilistic Definition of Done.

Instead of "Model must achieve 85% accuracy," we define:

Model achieves at least 80% accuracy and outperforms the current baseline approach.

Plus a time limit:

If after 4 weeks of optimization there’s no significant improvement, the current model is production-ready.

This prevents perfectionism and enables iterative improvement in production.
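Written out as a rule, a literal reading of that definition looks like this. The numbers are the examples from above, not fixed values:

```python
def is_production_ready(accuracy: float, baseline_accuracy: float,
                        weeks_without_improvement: int,
                        min_accuracy: float = 0.80,
                        time_limit_weeks: int = 4) -> bool:
    """Probabilistic definition of done: good enough and better than the
    baseline, or the optimization time budget is used up."""
    good_enough = accuracy >= min_accuracy and accuracy > baseline_accuracy
    time_budget_spent = weeks_without_improvement >= time_limit_weeks
    return good_enough or time_budget_spent
```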

Change Management for Traditional Teams

The hardest part isn’t the tech.

It’s the change management.

Software developers are used to building deterministic systems.

AI is probabilistic.

It’s a mindset shift.

Here’s what I do on every transition:

1. Expectation Management

  • Communicate honestly: 80% of experiments fail
  • That’s normal and valuable
  • Success is measured differently

2. Pair Programming for AI

  • Experienced data scientists work alongside software engineers
  • Knowledge transfer through code reviews
  • Joint experiment planning

3. Continuous Learning

  • Weekly ML Learning Sessions
  • Case studies of successful experiments
  • Post-mortems for failed approaches

The transition takes 3–6 months.

Plan for that.

And celebrate small wins—even the failed experiments that yield valuable insights.

Frequently Asked Questions

How long does the transition from Scrum to AI-driven workflows take?

In my experience, the switchover takes 3–6 months. The team must learn new mindsets and establish new processes. The key is to approach the transition step by step rather than trying to change everything at once.

Can Scrum and AI development really not be combined?

You can—but you must adapt Scrum significantly. Longer sprints (3–4 weeks), experiment-based rather than feature-based goals, and more flexible timelines. Pure Scrum implementation doesn’t work in AI projects.

What roles are essential for an AI team?

At minimum: Data Scientist, Data Engineer, and ML Engineer. In smaller teams, one person may cover several roles, but all domains must be represented. A Product Manager with AI understanding is also important.

How do you measure the success of AI experiments?

With predefined, measurable metrics such as accuracy, precision, recall, or business KPIs. Important: Even failed experiments are successful if they provide learnings. We document all experiments systematically.

What are the biggest challenges in the transition?

Change management is tougher than the tech. Teams must learn to handle uncertainty and think probabilistically, not deterministically. They also need new tools and versioning strategies for data and models.

Which tools are essential for AI teams?

MLflow for experiment tracking, DVC for data versioning, a cloud provider for computing power, and a good documentation tool like Notion. Git alone isn’t enough—you need tools purpose-built for data science.

How do you handle the unpredictability of AI projects?

Through timeboxed experiments with clear go/no-go criteria. Set time limits for optimization loops and define minimum performance thresholds. Plan with buffers and communicate uncertainty transparently to stakeholders.

Do classic project management tools work for AI teams?

To some extent. Jira can be used for task management, but you need extra tools for experiment tracking and data lineage. We use a combination of specialized tools.

How should code reviews be organized for machine learning code?

ML code reviews differ from software code reviews. You review not only code quality, but also experiment design, data quality, and model validation. Pair programming between experienced data scientists and software developers boosts knowledge transfer.

What if an AI project fails completely?

That happens in 15–20% of cases and is normal. What’s key is to recognize quickly if an approach isn’t working and pivot fast. Document all learnings—they’re valuable for future projects. A failed project can prevent others from making the same mistakes.
