Agile Working Methods Meet AI: Why Traditional Methods Fail at Automation

Last week I almost lost my mind.

My client insisted on cramming their AI project into 2-week sprints.

"That's how we did app development," he said.

I had to explain: AI is not software development.

After three failed sprints and a frustrated dev team, we completely changed our approach.

The result: More progress in 6 weeks than in the previous 3 months.

Today, I'll show you why classic agile methods fail for AI projects – and which approaches truly work.

Why Scrum and Kanban Fail in AI Projects

Scrum was invented for software development.

AI obeys different laws.

Ignoring that is the most common mistake I see with my clients.

The Sprint Problem: AI Doesn’t Develop Linearly

A sprint lasts 2 weeks.

A machine learning model can take 3 weeks before producing any useful results.

What do you do in weeks 1 and 2? Present "we're still training" as a sprint result?

I’ve watched it happen.

One dev team delivered practically nothing showable in 6 sprints—despite doing excellent work.

The problem: AI development works on different timelines.

Sometimes data prep takes 4 weeks, and then the first model instantly works perfectly.

Sometimes you train 50 different approaches before one works.

That doesn't fit into 2-week cycles.

Backlog Chaos: When Algorithms Change the Priorities

Product backlog works for features.

In AI, priorities shift due to new findings.

Example from my experience:

We wanted to build a sentiment analysis tool for customer feedback.

5 features were planned:

  1. Positive/Negative Classification
  2. Emotion Detection
  3. Topic Extraction
  4. Trend Analysis
  5. Dashboard

After the initial data exploration, we realized 60% of our data was unusable.

Suddenly, data cleaning was priority number one.

That was nowhere on the backlog.

In AI projects, it’s often the data that determines the next step—not the product owner.

Daily Standups Become Weekly Standups (and That’s Okay)

"What did you do yesterday?"

"Optimized hyperparameters."

"What are you doing today?"

"Optimizing hyperparameters."

"Any blockers?"

"Training is still running for another 12 hours."

That’s what daily standups look like for AI teams.

Pointless.

AI development has longer feedback loops.

Training a model can take days.

Evaluating an A/B test needs statistical significance—that often means weeks.

Daily syncs don’t make sense when nothing substantive changes each day.

The Three Fundamental Differences Between Software and AI Development

Software development: You know what you’re building.

AI development: You know what you’re trying out.

That’s the core difference.

Unpredictability Instead of Predictability

With software you write code, it works (or doesn’t).

With AI you train a model, it hits 73% accuracy—and you don’t know why.

Not because of bad code.

But because of unexpected data problems or model behavior.

You can’t plan when a model will hit a target accuracy.

You can only experiment and iterate.

That makes classic project planning impossible.

Experimental vs. Linear Development Process

Software development follows a linear process:

Requirements → Design → Implementation → Testing → Deployment

AI development is a cycle of experimentation:

Hypothesis → Experiment → Evaluation → Learning → New Hypothesis

I spend 80% of my time in AI projects running experiments that don’t work.

That’s normal.

In fact, it’s good.

Every failed experiment gets you closer to the solution.

For software, 80% failed code would be unacceptable.

For AI, it’s the route to success.

Data Quality as a Critical Success Factor

Software works with synthetic test data.

AI lives and dies by real, high-quality data.

I’ve seen projects where the team wrote perfect code for 6 months.

The model was still useless.

Because the data was bad.

In AI projects, you spend 60–80% of your time on data work:

  • Data collection
  • Data cleaning
  • Data labeling
  • Data validation
  • Building data pipelines

You won’t see that in any software sprint plan.

But without it, AI doesn’t work.

AI-first Workflows: What Comes After Scrum

I’ve spent the last 3 years working with different approaches for AI teams.

Here’s what really works.

Hypothesis-driven Development Instead of User Stories

Forget user stories.

"As a user, I want…" doesn't work for AI.

Replace it with: hypothesis-driven development.

Every development phase starts with a measurable hypothesis:

If we add feature X, model accuracy will improve by at least 5%.

Or:

If we use algorithm Y, training time will be cut by 50%.

Each hypothesis has:

  • A measurable metric
  • A target value
  • A time estimate for the experiment
  • Criteria for success/failure

This turns "we're optimizing the model" into a concrete experiment with a clear result.
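Here is a minimal sketch of how such a hypothesis could be captured as a structured record. The dataclass and its field names are illustrative, not a fixed template:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable hypothesis with the four elements listed above."""
    statement: str          # e.g. "Feature X improves accuracy by at least 5 points"
    metric: str             # the measurable metric
    target: float           # target value that counts as success
    time_budget_days: int   # time estimate for the experiment

    def is_validated(self, measured: float) -> bool:
        # Success/failure criterion: the measured metric reaches the target.
        return measured >= self.target

# Example: the accuracy hypothesis from above, starting from a 78% baseline.
h = Hypothesis(
    statement="Adding feature X lifts accuracy by at least 5 percentage points",
    metric="accuracy",
    target=0.83,
    time_budget_days=5,
)
print(h.is_validated(0.84))  # True: hypothesis validated
```

The class itself isn't the point; the point is that every hypothesis is forced into a checkable yes/no question.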

Continuous Experimentation as Core Process

In software, you build features.

In AI, you conduct experiments.

The most important process isn’t sprint planning but experiment design.

Our standard experiment workflow:

  1. Hypothesis definition (1 day)
  2. Experiment setup (2–3 days)
  3. Execution (variable: 1 day to 2 weeks)
  4. Evaluation (1–2 days)
  5. Decision (Continue/Drop/Pivot)

Important: Every experiment is documented.

Even failed ones.

Especially the failed ones.

We keep an experiment log with:

  • Hypothesis
  • Approach
  • Result
  • Learnings
  • Next steps

This becomes the most valuable resource for the team.
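As a sketch of how lightweight that log can be, here is an append-only JSON Lines version with exactly those five fields. The file name and function are assumptions, not any specific tool:

```python
import json
from datetime import date
from pathlib import Path

LOG_FILE = Path("experiment_log.jsonl")  # assumed location in the project repo

def log_experiment(hypothesis: str, approach: str, result: str,
                   learnings: str, next_steps: str) -> None:
    """Append one experiment entry; failed experiments are logged the same way."""
    entry = {
        "date": date.today().isoformat(),
        "hypothesis": hypothesis,
        "approach": approach,
        "result": result,
        "learnings": learnings,
        "next_steps": next_steps,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_experiment(
    hypothesis="Algorithm Y halves training time",
    approach="Replaced gradient boosting with a linear baseline",
    result="Training time down 60%, accuracy down 4 points",
    learnings="Speed gain is real, but the accuracy loss is too high",
    next_steps="Retry Y with additional engineered features",
)
```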

Data-centric Workflows for AI Teams

Software teams organize around features.

AI teams must organize around data.

Our workflow isn’t managed via a Kanban board, but through a data pipeline:

Phase | Responsible | Output | Quality Criterion
Data Collection | Data Engineer | Raw Dataset | Completeness, Freshness
Data Cleaning | Data Scientist | Clean Dataset | < 5% Missing Values
Feature Engineering | ML Engineer | Feature Set | Correlation with Target
Model Training | Data Scientist | Trained Model | Target Accuracy Met
Model Deployment | ML Engineer | Production Model | Latency < 200 ms

Each stage has clear handover criteria.

Nothing counts as done without meeting a measurable quality criterion.
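To make one of those handover criteria concrete, here is a rough sketch of the "< 5% missing values" gate after data cleaning, assuming the dataset is a pandas DataFrame. The file name and function are illustrative; the threshold comes from the table above:

```python
import pandas as pd

def passes_cleaning_gate(df: pd.DataFrame, max_missing_ratio: float = 0.05) -> bool:
    """Handover criterion for the Data Cleaning stage: less than 5% missing values."""
    missing_ratio = df.isna().sum().sum() / df.size
    return missing_ratio < max_missing_ratio

df = pd.read_csv("clean_dataset.csv")  # assumed output of the cleaning stage
if not passes_cleaning_gate(df):
    raise ValueError("Clean dataset fails the handover criterion: >= 5% missing values")
```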

Practical Working Methods for AI-driven Teams

Enough theory.

Here are the workflows I use every day.

The 3-Phase Method for AI Projects

I split each AI project into 3 phases:

Phase 1: Discovery (25% of the time)

Goal: Understand what’s possible.

Activities:

  • Data exploration
  • Proof of concept
  • Feasibility assessment
  • Initial baseline models

Success metric: Is the problem solvable with AI?

Phase 2: Development (60% of the time)

Goal: Build the best possible model.

Activities:

  • Iterative model improvement
  • Feature engineering
  • Hyperparameter optimization
  • Cross-validation

Success metric: Target accuracy reached.

Phase 3: Deployment (15% of the time)

Goal: Bring the model into production.

Activities:

  • Model packaging
  • API development
  • Monitoring setup
  • A/B testing

Success metric: Model runs stably in production.

Important: The phases aren’t linear.

You’ll bounce between discovery and development.

That’s normal.

Agile Data Science: Rethinking Sprints

We still use sprints—but differently.

Our AI sprints last 3–4 weeks (not 2).

Each sprint has an experiment goal, not a feature goal.

Sprint planning works like this:

  1. Experiment Review: What did we learn last sprint?
  2. Hypothesis Prioritization: Which experiments are most promising?
  3. Resource Allocation: Who’s working on which experiment?
  4. Success Criteria: How will we measure success?

Sprint review presents insights, not features:

  • Which hypotheses were validated/refuted?
  • What new findings did we gain?
  • How did model performance evolve?
  • What are the next logical experiments?

How to Organize Cross-functional AI Teams

An AI team needs different roles compared to a software team.

Our standard setup for an AI project:

Role | Main Responsibility | Skills | % of Time
Data Scientist | Model Development | ML, Statistics, Python | 40%
Data Engineer | Data Pipeline | ETL, Databases, Cloud | 30%
ML Engineer | Model Deployment | DevOps, APIs, Scalability | 20%
Product Manager | Business Alignment | Domain Knowledge, Strategy | 10%

Important: The Product Manager is NOT the Scrum Master.

They define business goals, not sprint goals.

Experiment prioritization is a team effort.

Tool Stack and Processes: What Really Works

The tooling for AI teams differs from that of traditional software teams.

Here’s our proven stack.

Project Management Tools for AI Teams

Jira is fine for software.

For AI, we use a combination:

Experiment Tracking: MLflow

  • All experiments are automatically logged
  • Parameters, metrics, artifacts in one view
  • Comparison of different model versions
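A minimal sketch of what that looks like per run, using MLflow's standard tracking API. The experiment name, parameters, and metrics here are just placeholders:

```python
import mlflow

mlflow.set_experiment("sentiment-analysis")  # illustrative experiment name

with mlflow.start_run(run_name="feature-x-hypothesis"):
    # Parameters, metrics, and artifacts land in one comparable view.
    mlflow.log_param("model_type", "gradient_boosting")
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_metric("accuracy", 0.84)
    mlflow.log_metric("training_time_min", 42)
    mlflow.log_artifact("experiment_log.jsonl")  # attach the experiment notes
```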

Task Management: Notion

  • Hypothesis backlog
  • Experiment documentation
  • Team learnings
  • Data quality dashboards

Communication: Slack + Daily Data Reports

  • Automated reports on model performance
  • Alerts on data quality issues
  • Channel for every running experiment

The most important tool, however, is a shared experiment log.

We document EVERY experiment—whether successful or not.

Versioning Models and Data

You version code with Git.

But what about models and data?

Our approach:

Data Versioning: DVC (Data Version Control)

  • Every dataset gets a version number
  • Reproducible data pipelines
  • Automatic data lineage tracking

Model Versioning: MLflow Model Registry

  • Each model is automatically versioned
  • Staging/Production environments
  • Rollback option in case of performance drop
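Roughly, registering and promoting a version looks like this with the MLflow client. The run ID and model name are placeholders; rolling back means promoting the previous version again:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact from a finished run (run ID is a placeholder).
version = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="sentiment-model",  # illustrative registry name
)

# Promote the new version to Production once it beats the current one.
client = MlflowClient()
client.transition_model_version_stage(
    name="sentiment-model",
    version=version.version,
    stage="Production",
)
```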

Code Versioning: Git + Pre-commit Hooks

  • Automatic code quality checks
  • Experiment metadata is auto-committed
  • Jupyter notebooks are cleaned before commit

Without versioning, AI development isn’t reproducible.

And if it’s not reproducible, it’s not debuggable.

Testing and Deployment in AI Environments

Unit testing AI code is different from testing normal software.

You test not only functions, but also data quality and model performance.

Our testing framework:

Data Quality Tests

  • Schema validation (are all columns present?)
  • Data freshness (are the data up to date?)
  • Statistical tests (has the data distribution changed?)
  • Completeness checks (how many missing values?)
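Sketched as plain pytest checks on a pandas DataFrame, it can look like this. Column names, the file path, and the thresholds are assumptions for illustration:

```python
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"customer_id", "feedback_text", "created_at"}  # illustrative schema

@pytest.fixture
def feedback_df() -> pd.DataFrame:
    return pd.read_csv("clean_dataset.csv")  # assumed pipeline output

def test_schema(feedback_df):
    # Schema validation: are all expected columns present?
    assert EXPECTED_COLUMNS.issubset(set(feedback_df.columns))

def test_completeness(feedback_df):
    # Completeness check: no column may exceed 5% missing values.
    assert feedback_df.isna().mean().max() <= 0.05

def test_freshness(feedback_df):
    # Data freshness: the newest record must be under 7 days old (illustrative window).
    newest = pd.to_datetime(feedback_df["created_at"]).max()
    assert (pd.Timestamp.now() - newest).days < 7
```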

Model Performance Tests

  • Accuracy threshold tests
  • Latency tests
  • Memory usage tests
  • Bias detection tests
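The model-side checks follow the same pattern. Here is a rough sketch assuming a scikit-learn-style model with score() and predict(); the thresholds mirror numbers used elsewhere in this article:

```python
import time

def check_model_performance(model, X_test, y_test,
                            min_accuracy: float = 0.80,
                            max_latency_s: float = 0.2) -> bool:
    """Accuracy threshold and latency check before a model may be deployed."""
    accuracy_ok = model.score(X_test, y_test) >= min_accuracy

    start = time.perf_counter()
    model.predict(X_test[:1])  # single-record latency
    latency_ok = (time.perf_counter() - start) <= max_latency_s

    return accuracy_ok and latency_ok
```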

Integration Tests

  • End-to-end pipeline tests
  • API response time tests
  • Load tests

Deployment is done via blue-green deployment with automatic rollback.

If model performance drops by more than 5%, it automatically rolls back to the previous version.
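The rollback rule itself is a one-liner; this sketch assumes a relative drop measured on a fixed evaluation set:

```python
def should_roll_back(new_accuracy: float, previous_accuracy: float,
                     max_relative_drop: float = 0.05) -> bool:
    """Blue-green rollback rule: more than a 5% relative drop switches
    traffic back to the previous model version."""
    return new_accuracy < previous_accuracy * (1 - max_relative_drop)

# Example: previous model 0.85, new model 0.79 -> roughly a 7% drop -> roll back.
print(should_roll_back(0.79, 0.85))  # True
```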

Common Mistakes When Transitioning to AI-driven Workflows

I’ve seen the same mistakes with almost every client.

Here are the most common ones—and how you can avoid them.

The Scrum Master Becomes the AI Product Owner

Classic mistake:

The company wants to stay agile and turns the Scrum Master into the AI Product Owner.

The problem: A Scrum Master understands process, but not data science.

They can’t prioritize experiments because they can’t judge what’s realistic.

The solution: The AI Product Owner needs a technical background.

They must understand:

  • How machine learning works
  • What data quality means
  • How long model training takes
  • Which metrics matter

For us, the AI Product Owner is always a Data Scientist or ML Engineer with business understanding.

Never a pure project manager.

Why the Classic Definition of Done Doesn't Work

Software: Feature works as specified = Done.

AI: Model achieves 85% accuracy = Done.

But what if the model only achieves 84%?

Is it not done?

A classic Definition of Done leads to endless optimization cycles in AI.

Our approach: Probabilistic Definition of Done.

Instead of "Model must achieve 85% accuracy," we define:

Model achieves at least 80% accuracy and outperforms the current baseline approach.

Plus a time limit:

If after 4 weeks of optimization there’s no significant improvement, the current model is production-ready.

This prevents perfectionism and enables iterative improvement in production.
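Written out as a rule, a literal reading of that definition looks like this. The numbers are the examples from above, not fixed values:

```python
def is_production_ready(accuracy: float, baseline_accuracy: float,
                        weeks_without_improvement: int,
                        min_accuracy: float = 0.80,
                        time_limit_weeks: int = 4) -> bool:
    """Probabilistic definition of done: good enough and better than the
    baseline, or the optimization time budget is used up."""
    good_enough = accuracy >= min_accuracy and accuracy > baseline_accuracy
    time_budget_spent = weeks_without_improvement >= time_limit_weeks
    return good_enough or time_budget_spent
```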

Change Management for Traditional Teams

The hardest part isn’t the tech.

It’s the change management.

Software developers are used to building deterministic systems.

AI is probabilistic.

It’s a mindset shift.

Here’s what I do on every transition:

1. Expectation Management

  • Communicate honestly: 80% of experiments fail
  • That’s normal and valuable
  • Success is measured differently

2. Pair Programming for AI

  • Experienced data scientists work alongside software engineers
  • Knowledge transfer through code reviews
  • Joint experiment planning

3. Continuous Learning

  • Weekly ML Learning Sessions
  • Case studies of successful experiments
  • Post-mortems for failed approaches

The transition takes 3–6 months.

Plan for that.

And celebrate small wins—even the failed experiments that yield valuable insights.

Frequently Asked Questions

How long does the transition from Scrum to AI-driven workflows take?

In my experience, the switchover takes 3–6 months. The team must learn new mindsets and establish new processes. The key is to approach the transition step by step rather than trying to change everything at once.

Can Scrum and AI development really not be combined?

You can—but you must adapt Scrum significantly. Longer sprints (3–4 weeks), experiment-based rather than feature-based goals, and more flexible timelines. Pure Scrum implementation doesn’t work in AI projects.

What roles are essential for an AI team?

At minimum: Data Scientist, Data Engineer, and ML Engineer. In smaller teams, one person may cover several roles, but all domains must be represented. A Product Manager with AI understanding is also important.

How do you measure the success of AI experiments?

With predefined, measurable metrics such as accuracy, precision, recall, or business KPIs. Important: Even failed experiments are successful if they provide learnings. We document all experiments systematically.

What are the biggest challenges in the transition?

Change management is tougher than the tech. Teams must learn to handle uncertainty and think probabilistically, not deterministically. They also need new tools and versioning strategies for data and models.

Which tools are essential for AI teams?

MLflow for experiment tracking, DVC for data versioning, a cloud provider for computing power, and a good documentation tool like Notion. Git alone isn’t enough—you need tools purpose-built for data science.

How do you handle the unpredictability of AI projects?

Through timeboxed experiments with clear go/no-go criteria. Set time limits for optimization loops and define minimum performance thresholds. Plan with buffers and communicate uncertainty transparently to stakeholders.

Do classic project management tools work for AI teams?

To some extent. Jira can be used for task management, but you need extra tools for experiment tracking and data lineage. We use a combination of specialized tools.

How should code reviews be organized for machine learning code?

ML code reviews differ from software code reviews. You review not only code quality, but also experiment design, data quality, and model validation. Pair programming between experienced data scientists and software developers boosts knowledge transfer.

What if an AI project fails completely?

That happens in 15–20% of cases and is normal. What’s key is to recognize quickly if an approach isn’t working and pivot fast. Document all learnings—they’re valuable for future projects. A failed project can prevent others from making the same mistakes.
