Table of Contents
- Why Scrum and Kanban Fail in AI Projects
- The Three Fundamental Differences Between Software and AI Development
- AI-first Ways of Working: What Comes After Scrum
- Concrete Ways of Working for AI-Driven Teams in Practice
- Tool Stack and Processes: What Really Works
- Common Mistakes When Transitioning to AI-Driven Workflows
- Frequently Asked Questions
Last week, I almost lost my mind.
My client insisted on squeezing their AI project into 2-week sprints.
“That’s how we did it when developing our app,” he said.
I had to explain: AI is not software development.
After three failed sprints and a frustrated development team, we changed course completely.
The result: In six weeks, we accomplished more than in the three preceding months.
Today, I’ll show you why classic agile methods fall short for AI projects—and which approaches actually work.
Why Scrum and Kanban Fail in AI Projects
Scrum was invented for software development.
AI follows different laws.
Ignoring that is the most common mistake I see with my clients.
The Sprint Problem: AI Doesn’t Progress Linearly
A sprint lasts two weeks.
A machine learning model might need three weeks before yielding any useful results.
What do you present in weeks one and two? “We’re still training” as the sprint outcome?
I’ve seen this happen.
A dev team went through six sprints with almost nothing to show for it—even though the work itself was excellent.
The issue: AI development operates on different time cycles.
Sometimes, data preparation takes four weeks, then the first model works right away.
Sometimes you train 50 different approaches before one finally works.
This doesn’t fit into two-week rhythms.
Backlog Chaos: When Algorithms Reprioritize for You
Product backlogs work for features.
With AI, priorities shift based on what you discover.
Here’s a real-life example:
We wanted to build a sentiment analysis tool for customer feedback.
Five features were planned:
- Positive/Negative Classification
- Emotion Detection
- Topic Extraction
- Trend Analysis
- Dashboard
After the initial data exploration, we discovered 60% of our data was unusable.
Suddenly, data cleaning became priority number one.
That wasn’t on the backlog anywhere.
In AI projects, the data often dictates the next step—not the product owner.
Daily Standups Become Weekly Standups (and That’s Okay)
“What did you do yesterday?”
“Tuned hyperparameters.”
“What will you do today?”
“Tune hyperparameters.”
“Are there any blockers?”
“Training is running for another 12 hours.”
This is what daily standups look like in AI teams.
Pointless.
AI development has longer feedback loops.
Training a model can take days.
Evaluating an A/B test requires statistical significance—which can mean weeks.
Daily syncs make no sense when nothing substantial changes each day.
The Three Fundamental Differences Between Software and AI Development
Software development: You know what you’re building.
AI development: You know what you’re trying out.
That’s the core difference.
Unpredictability Instead of Planability
In software, you write code—it works (or it doesn’t).
In AI, you train a model, it performs at 73%—and you have no idea why.
Not because the code is bad.
But because the data or the model behaves in ways you didn’t expect.
You can’t plan when a model will hit your desired accuracy.
You can only experiment and iterate.
That makes classic project planning impossible.
Experimental vs. Linear Development Process
Software development follows a linear process:
Requirements → Design → Implementation → Testing → Deployment
AI development is an experimentation loop:
Hypothesis → Experiment → Evaluation → Learning → New Hypothesis
I spend 80% of my time on AI projects running experiments that fail.
This is normal.
This is even good.
Every failed experiment brings you closer to the solution.
For software, 80% failed code would be unacceptable.
For AI, it’s the way to success.
Data Quality as a Critical Success Factor
Software works fine with synthetic test data.
AI lives and dies by real, high-quality data.
I’ve seen projects where a team wrote perfect code for six months.
The model was still useless.
Because the data was poor.
In AI projects, you spend 60–80% of your time on data tasks:
- Data collection
- Data cleaning
- Data labeling
- Data validation
- Building data pipelines
You won’t find this in any software sprint plan.
But without it, AI doesn’t work.
AI-first Ways of Working: What Comes After Scrum
I’ve spent the past three years trying different approaches for AI teams.
Here’s what actually works.
Hypothesis-driven Development Instead of User Stories
Forget user stories.
“As a user, I want to…” doesn’t work for AI.
Instead: Hypothesis-driven development.
Each development phase starts with a measurable hypothesis:
“If we add Feature X, model accuracy improves by at least 5%.”
Or:
“If we use Algorithm Y, training time is reduced by 50%.”
Every hypothesis has:
- A measurable metric
- A target value
- A time estimate for the experiment
- Success/failure criteria
That turns “We’ll optimize the model” into a concrete experiment with a clear outcome.
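To make this tangible, here is a minimal sketch of how such a hypothesis could be captured as a structured record. The field names and example values are illustrative, not a fixed schema.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable hypothesis with explicit success criteria (illustrative schema)."""
    statement: str          # e.g. "Adding feature X improves accuracy by at least 5%"
    metric: str             # the metric we measure, e.g. "accuracy"
    baseline: float         # current value of the metric
    target: float           # value that counts as success
    time_budget_days: int   # how long the experiment may run

    def is_successful(self, measured: float) -> bool:
        # Success/failure criterion: did the experiment reach the target?
        return measured >= self.target

# Example usage (all values are made up for illustration)
h = Hypothesis(
    statement="If we add Feature X, model accuracy improves by at least 5%",
    metric="accuracy",
    baseline=0.73,
    target=0.78,
    time_budget_days=5,
)
print(h.is_successful(0.79))  # True
```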
Continuous Experimentation as a Core Process
In software, you develop features.
In AI, you run experiments.
The crucial process isn’t sprint planning—it’s experiment design.
Our standard experimental workflow:
- Hypothesis Definition (1 day)
- Experiment Setup (2–3 days)
- Execution (variable: 1 day to 2 weeks)
- Evaluation (1–2 days)
- Decision (continue/discard/pivot)
Important: Every experiment is documented.
Even the failures.
Especially the failures.
We keep an Experiment Log with:
- Hypothesis
- Approach
- Result
- Learnings
- Next steps
This becomes the team’s most valuable asset.
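To keep entries consistent, it helps to give the log a fixed structure. The sketch below shows one possible shape as an append-only JSON-lines file; the file name and example values are made up.

```python
import json
from datetime import date
from pathlib import Path

LOG_FILE = Path("experiment_log.jsonl")  # hypothetical location of the shared log

def log_experiment(hypothesis: str, approach: str, result: str,
                   learnings: str, next_steps: str) -> None:
    """Append one experiment entry to a shared JSON-lines log."""
    entry = {
        "date": date.today().isoformat(),
        "hypothesis": hypothesis,
        "approach": approach,
        "result": result,
        "learnings": learnings,
        "next_steps": next_steps,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Failed experiments get logged exactly like successful ones.
log_experiment(
    hypothesis="Algorithm Y halves training time",
    approach="Swapped gradient boosting for a simpler linear baseline",
    result="Training time -60%, but accuracy dropped 8 points",
    learnings="The speed-up is not worth the accuracy loss at current data volume",
    next_steps="Revisit Algorithm Y after the next feature engineering round",
)
```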
Data-centric Workflows for AI Teams
Software teams organize themselves around features.
AI teams must organize around data.
Our workflow doesn’t run through a Kanban board, but through a data pipeline:
| Phase | Responsible | Output | Quality Criteria |
|---|---|---|---|
| Data Collection | Data Engineer | Raw Dataset | Completeness, Recency |
| Data Cleaning | Data Scientist | Clean Dataset | <5% Missing Values |
| Feature Engineering | ML Engineer | Feature Set | Correlation with Target |
| Model Training | Data Scientist | Trained Model | Target accuracy achieved |
| Model Deployment | ML Engineer | Production Model | Latency < 200ms |
Each phase has clear handover criteria.
Nothing is done without measurable quality.
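As an illustration, a handover criterion such as “<5% missing values” can be enforced as a small quality gate before a dataset moves to the next phase. The threshold and the toy data below are placeholders.

```python
import pandas as pd

def passes_handover(df: pd.DataFrame, max_missing_ratio: float = 0.05) -> bool:
    """Quality gate: the clean dataset may move to feature engineering
    only if the overall share of missing values stays below the threshold."""
    missing_ratio = df.isna().mean().mean()  # average share of missing values across columns
    return missing_ratio < max_missing_ratio

# Example with a tiny placeholder dataset
df = pd.DataFrame({"feedback": ["great", None, "bad", "ok"], "score": [5, 4, None, 3]})
print(passes_handover(df))  # False here: 2 of 8 cells are missing (25%)
```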
Concrete Ways of Working for AI-Driven Teams in Practice
Enough theory.
Here are the workflows I use every day.
The 3-Phase Method for AI Projects
I break every AI project into three phases:
Phase 1: Discovery (25% of the time)
Goal: Understand what’s possible.
Activities:
- Data exploration
- Proof of concept
- Feasibility assessment
- Initial baseline models
Success metric: Is the problem solvable with AI?
Phase 2: Development (60% of the time)
Goal: Build the best possible model.
Activities:
- Iterative model improvement
- Feature engineering
- Hyperparameter optimization
- Cross-validation
Success metric: Target accuracy achieved.
Phase 3: Deployment (15% of the time)
Goal: Get the model into production.
Activities:
- Model packaging
- API development
- Monitoring setup
- A/B testing
Success metric: Model runs stably in production.
Important: These phases are not linear.
You’ll switch between discovery and development repeatedly.
That’s normal.
Agile Data Science: Rethinking Sprints
We still use sprints—but differently.
Our AI sprints last three to four weeks (not two).
Each sprint has an experiment goal, not a feature goal.
Sprint planning works like this:
- Experiment Review: What did we learn from the last sprints?
- Hypothesis Prioritization: Which experiments are the most promising?
- Resource Allocation: Who works on which experiment?
- Success Criteria: How will we measure success?
Sprint reviews don’t showcase features, but insights:
- Which hypotheses were confirmed/refuted?
- What new findings did we gain?
- How did the model’s performance change?
- What are the next logical experiments?
Properly Organizing Cross-functional AI Teams
An AI team needs different roles than a software team.
Our typical setup for an AI project:
| Role | Main Task | Skills | % of Time |
|---|---|---|---|
| Data Scientist | Model development | ML, Statistics, Python | 40% |
| Data Engineer | Data pipeline | ETL, Databases, Cloud | 30% |
| ML Engineer | Model deployment | DevOps, APIs, Scalability | 20% |
| Product Manager | Business Alignment | Domain Knowledge, Strategy | 10% |
Important: The Product Manager is NOT the Scrum Master.
They define business goals, not sprint goals.
Experiment prioritization is a team effort.
Tool Stack and Processes: What Really Works
AI teams need different tools than software teams.
Here’s our proven stack.
Project Management Tools for AI Teams
Jira is fine for software.
For AI, we use a combination:
Experiment Tracking: MLflow
- All experiments are automatically logged
- Parameters, metrics, and artifacts in one place
- Compare multiple model versions
Task Management: Notion
- Hypothesis backlog
- Experiment documentation
- Team learnings
- Data quality dashboards
Communication: Slack + Daily Data Reports
- Automated reports on model performance
- Alerts for data quality issues
- Channel for every ongoing experiment
But the most important tool is a shared experiment log.
We document EVERY experiment—whether successful or not.
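For orientation, logging a single run with MLflow looks roughly like this. The experiment name, parameters, and metric values are placeholders.

```python
import mlflow

mlflow.set_experiment("sentiment-analysis")  # experiment name is a placeholder

with mlflow.start_run(run_name="baseline-logreg"):
    # Parameters of this run
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("max_features", 20000)

    # ... train and evaluate the model here ...

    # Metrics make runs comparable across the whole team
    mlflow.log_metric("accuracy", 0.81)
    mlflow.log_metric("f1_score", 0.78)
```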
Versioning Models and Data
You version code with Git.
But what about models and data?
This is our approach:
Data Versioning: DVC (Data Version Control)
- Every dataset has a version number
- Reproducible data pipelines
- Automatic data lineage tracking
Model Versioning: MLflow Model Registry
- Every model is versioned automatically
- Staging/Production environments
- Rollback in case of performance drop
Code Versioning: Git + Pre-commit Hooks
- Automatic code quality checks
- Experiment metadata is committed automatically
- Jupyter notebooks are cleaned before commit
Without versioning, AI development isn’t reproducible.
And if it’s not reproducible, it’s not debuggable.
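To show how that pays off day to day, DVC’s Python API can pull exactly the dataset version an experiment was trained on. The file path and the tag below are hypothetical, and the snippet assumes it runs inside a DVC-tracked repository.

```python
import pandas as pd
import dvc.api

# Load the exact dataset version a given experiment used.
# "data/clean.csv" and the tag "v2.1" are hypothetical placeholders.
with dvc.api.open("data/clean.csv", rev="v2.1") as f:
    df = pd.read_csv(f)

print(df.shape)  # same shape every time, so the experiment stays reproducible
```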
Testing and Deployment in AI Environments
Unit tests for AI code are different than for regular software.
You’re not just testing functions, but also data quality and model performance.
Our testing framework:
Data Quality Tests
- Schema validation (are all columns present?)
- Data freshness (is the data up-to-date?)
- Statistical tests (has the data distribution changed?)
- Completeness checks (how many missing values?)
Model Performance Tests
- Accuracy threshold tests
- Latency tests
- Memory usage tests
- Bias detection tests
Integration Tests
- End-to-end pipeline tests
- API response time tests
- Load tests
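Here is a minimal sketch of what a few of these checks might look like with pytest and pandas. The expected columns, thresholds, and placeholder data are assumptions, not a fixed framework.

```python
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"customer_id", "feedback_text", "created_at"}  # assumed schema
ACCURACY_THRESHOLD = 0.80  # assumed target from the definition of done

@pytest.fixture
def latest_batch() -> pd.DataFrame:
    # In the real pipeline this fixture would load the newest data batch;
    # a tiny in-memory frame keeps the sketch self-contained.
    return pd.DataFrame({
        "customer_id": [1, 2, 3],
        "feedback_text": ["great", "okay", "bad"],
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    })

def test_schema_is_complete(latest_batch):
    # Schema validation: are all expected columns present?
    assert EXPECTED_COLUMNS.issubset(latest_batch.columns)

def test_missing_values_within_limit(latest_batch):
    # Completeness check: overall share of missing values stays below 5%
    assert latest_batch.isna().mean().mean() < 0.05

def test_model_meets_accuracy_threshold():
    # Performance test: the current model must clear the agreed threshold.
    # The measured value is a placeholder for your own evaluation result.
    measured_accuracy = 0.84
    assert measured_accuracy >= ACCURACY_THRESHOLD
```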
We release models via blue-green deployment with automatic rollback.
If model performance drops by more than 5%, the previous version is automatically restored.
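The 5% guardrail can be wired up as a simple check against the model registry. This sketch builds on MLflow’s model registry and treats the drop as relative; the model name, versions, and threshold are placeholders, and the exact call may vary with your MLflow version.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "sentiment-analysis"  # placeholder registry name

def rollback_if_degraded(new_accuracy: float, baseline_accuracy: float,
                         previous_version: str, max_drop: float = 0.05) -> bool:
    """Restore the previous Production version if accuracy drops by more than 5% (relative)."""
    if new_accuracy < baseline_accuracy * (1 - max_drop):
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=previous_version,
            stage="Production",
            archive_existing_versions=True,  # demote the degraded version
        )
        return True
    return False

# Example: baseline 0.85, new model at 0.79 -> more than a 5% relative drop, so roll back.
rollback_if_degraded(new_accuracy=0.79, baseline_accuracy=0.85, previous_version="12")
```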
Common Mistakes When Transitioning to AI-Driven Workflows
I’ve seen the same mistakes at almost every client.
Here are the most common ones—and how to avoid them.
The Scrum Master Becomes the AI Product Owner
Classic mistake:
A company wants to stay agile and makes the Scrum Master the AI Product Owner.
The problem: A Scrum Master understands processes—but not data science.
They can’t prioritize experiments because they don’t know what’s realistic.
The fix: The AI Product Owner must have a technical background.
They need to understand:
- How machine learning works
- What data quality means
- How long model training takes
- Which metrics matter
For us, the AI Product Owner is always a data scientist or ML engineer with business acumen.
Never just a project manager.
Why Classic Definition of Done Doesn’t Work
Software: Feature works as specified = done.
AI: Model reaches 85% accuracy = done.
But what if the model only gets to 84%?
Is it not done, then?
Classic definitions of done lead to endless optimization loops in AI.
Our approach: Probabilistic definition of done.
Instead of “Model must reach 85% accuracy,” we say:
“Model achieves at least 80% accuracy and outperforms the current baseline.”
Plus a time limit:
“If there’s no significant improvement after four weeks of optimization, the current model is production-ready.”
This prevents perfectionism and allows for iterative improvement in production.
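Put as an explicit decision rule, the probabilistic definition of done might look like this. The thresholds come from the example above; the date handling is illustrative.

```python
from datetime import date, timedelta

def is_production_ready(accuracy: float, baseline_accuracy: float,
                        optimization_start: date,
                        min_accuracy: float = 0.80,
                        time_budget: timedelta = timedelta(weeks=4)) -> bool:
    """Probabilistic definition of done: good enough and better than the baseline,
    or the optimization time budget is used up."""
    meets_bar = accuracy >= min_accuracy and accuracy > baseline_accuracy
    time_is_up = date.today() - optimization_start >= time_budget
    return meets_bar or time_is_up

# Example: 82% accuracy against a 79% baseline -> production-ready, regardless of the date.
print(is_production_ready(0.82, 0.79, optimization_start=date(2024, 5, 1)))
```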
Change Management for Traditional Teams
The hardest part isn’t the technology.
It’s the change management.
Software engineers are used to building deterministic systems.
AI is probabilistic.
That’s a mindset shift.
Here’s what I do with every transition:
1. Expectation Management
- Communicate honestly: 80% of experiments fail
- That’s normal and valuable
- Success is measured differently
2. Pair Programming for AI
- Experienced data scientists work side-by-side with software engineers
- Knowledge transfer via code reviews
- Joint planning of experiments
3. Continuous Learning
- Weekly ML Learning Sessions
- Case studies of successful experiments
- Post-mortems for failed approaches
The transition takes three to six months.
Plan accordingly.
And celebrate small successes—even the failed experiments that deliver valuable insights.
Frequently Asked Questions
How long does it take to transition from Scrum to AI-driven ways of working?
From my experience, it takes three to six months. The team has to learn new mindsets and establish new processes. The key is to manage the change gradually, not overhaul every process at once.
Can’t you combine Scrum and AI development at all?
Actually, yes—but you’ll need to seriously adapt Scrum. Longer sprints (three or four weeks), experiment-based goals instead of feature goals, and flexible timelines. A pure, by-the-book Scrum implementation doesn’t work for AI projects.
What roles does an AI team absolutely need?
At minimum: data scientist, data engineer, and ML engineer. In smaller teams, one person may fill multiple roles, but all bases need to be covered. A product manager with AI understanding is important, too.
How do you measure the success of AI experiments?
By using predefined, measurable metrics such as accuracy, precision, recall, or business KPIs. Importantly: failed experiments are also successes if they yield learnings. We document all experiments systematically.
What are the biggest challenges when transitioning?
Change management is harder than the tech. Teams need to learn to deal with uncertainty and think probabilistically instead of deterministically. You’ll also need new tools and data/model versioning strategies.
Which tools are essential for AI teams?
MLflow for experiment tracking, DVC for data versioning, a cloud provider for computing power, and a solid documentation tool like Notion. Git alone isn’t enough—you need tools purpose-built for data science.
How do you deal with the unpredictability of AI projects?
With timeboxed experiments and clear go/no-go criteria. Set time limits on optimization cycles and define minimum performance thresholds. Plan for buffers and communicate uncertainties transparently to stakeholders.
Do classic project management tools work for AI teams?
To a limited extent. Jira can be used for task management, but you also need extra tools for experiment tracking and data lineage. We use a combination of several specialized tools.
How do you organize code reviews for machine learning code?
ML code reviews are different from software code reviews. You check not just code quality, but also experiment design, data quality, and model validation. Pair programming between experienced data scientists and software engineers helps with knowledge transfer.
What if an AI project fails completely?
That happens in 15–20% of cases—and it’s normal. The key is to recognize quickly when something isn’t working and pivot fast. Document all learnings—they’re valuable for future projects. A failed project can help others avoid the same mistakes.