
“Cutting Water”: 3 Ways to Define MVP in Data & Analytics Agile Projects

Written by Oleg Kazantsev

Agile delivery methods have, over the past 20 years, become firmly ingrained in the collective know-how of the technology world. It’s common knowledge that Agile allows small teams to be productive at scale, delivering value to a business when it’s most needed and absorbing risks and uncertainty like no other approach.

But proposing Agile methods to professionals in the Data & Analytics field is more likely to yield a resounding “Yeah…no.” Everyone agrees that agility is great, but many, many data people feel that Agile methodology just isn’t made for big data.

One of the big reasons for this methodology dysmorphia (as one might call it) is confusion around MVP. The Minimum Viable Product (MVP), central to the philosophy of Agile delivery, is a releasable and value-adding feature of a system that can be packaged and rolled out to the end-user at the completion of an iterative delivery cycle.

Agile originated in the world of application-centric startups, so an MVP is usually illustrated by Agile coaches as some widget, enhancement, or new capability. It’s an appetizer that a server brings to the restaurant table so the customers can bite into something while they wait for the main course. Cutting up a large project into mini finished ‘courses’ like this keeps everyone from getting impatient and hangry.

When a similar gastronomic metaphor was used in a recent Agile class for a Data & Analytics division of a major financial institution, everyone nodded. Then, one person raised a hand.

“But what we deal with here is DATA,” they said. “It’s more like juice. Or Wine. Or water. How do you cut water?”

Bullseye. That’s the MVP problem for data in a nutshell. It’s hard to define segments in data—there’s no clear development or test stage. It all overlaps. Contrasted with software features, data and information are much less compartmentalized into usable objects with a clear human-centric function.

Does creating and standardizing a new staging table on the backend constitute an MVP? Is a large data load an MVP? Or should a consulting team just divide a linear, waterfall-like project into multiple two-week sprints, skipping the concept of MVP altogether, and call it “as Agile as it gets?”

Let’s explore the fine art of water-cutting (that is, how to define MVP in Data & Analytics) using three real-world examples.


1. MVP as data journey: Medallion architecture

Here’s the use case: A large financial institution is looking to give its multiple divisions equal access to the troves of accumulated data in a central location, ideally a cloud. How should they divide such a large cloud transformation into meaningful MVPs?

Answer: When the scale and volume of data transformation are the main challenge, consider dividing the data journey itself, using the concept of Medallion architecture.

Medallion architecture, a term coined by Databricks, divides the data journey into three stages:

Bronze: Raw, unprocessed data delivered as-is from its source to an agreed-upon central location (a “lakehouse,” often a cloud)
Silver: Aggregated, standardized, structured data with anomalies (like null values) eliminated
Gold: Data finalized for the end users, with an understanding of their specific demands (like breaking it up by meaningful period, filtering by specific function, etc.)
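To make the three stages concrete, here is a minimal sketch in plain Python. The records, field names, and the `to_silver` and `to_gold` helpers are all hypothetical, invented for illustration; a real lakehouse would run equivalent steps on a data platform rather than on in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw events, stored in Bronze exactly as the source delivered them.
bronze = [
    {"account": "A1", "amount": "100.50", "month": "2024-01"},
    {"account": "A1", "amount": None, "month": "2024-01"},   # anomaly: null value
    {"account": "a2 ", "amount": "20", "month": "2024-02"},  # untidy key
    {"account": "A1", "amount": "5", "month": "2024-02"},
]

def to_silver(rows):
    """Standardize keys and types, and drop rows with null amounts."""
    return [
        {
            "account": row["account"].strip().upper(),
            "amount": float(row["amount"]),
            "month": row["month"],
        }
        for row in rows
        if row["amount"] is not None
    ]

def to_gold(rows):
    """Shape the data for one assumed end-user need: total amount per month."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
```

Each intermediate result (`bronze`, `silver`, `gold`) is a usable dataset in its own right, which is what makes each stage a candidate MVP.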

Example of Medallion architecture by Databricks

In standard Agile, a minimum viable product is something that can be rolled out to the end user (usually, the application consumer). So, is Gold the only possible MVP? Not necessarily. To divide the data journey into MVPs requires a strong understanding of the value of data organization-wide. Consider this:

• Large companies often act as raw data merchants for smaller, solution-centric startups. Raw but centralized data (such as event-streaming logs from Kafka or user interactions from social media sites) is a valuable resource for these transactions. This could constitute a definition of MVP for Bronze data.

Warning: If you plan to use Bronze data as MVP, prioritize involving your data governance and security experts early on. Raw data released from the company’s custody to a third party must always be validated for legal concerns.

• Aggregated, standardized data may be used by different divisions of the same company to enable their own software development lifecycle activities. For example, schematized (even if not business-ready) data could be used to train machine learning models. It may also be consumed by other in-house applications and used as test data in lower environments. All of these let us define Silver data as MVP.

Warning: If the company agrees to use real Silver data for testing, prioritize obfuscating confidential and sensitive data elements.

• No such complexities exist for the Gold data as MVP. It’s business-ready and thus constitutes the default definition of MVP (and, in fact, that of the final deliverable).
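The Silver warning above, about obfuscating sensitive elements before real data is used for testing, could look something like the following sketch. The field list, the salt, and the `obfuscate` helper are assumptions made for illustration. Salted hashing is a form of pseudonymization rather than full anonymization, so whether it is sufficient is a question for the data governance team.

```python
import hashlib

# Hypothetical list; the real one should be agreed with data governance.
SENSITIVE_FIELDS = {"name", "ssn"}

def obfuscate(record, salt):
    """Return a copy of the record with sensitive values replaced by salted hashes."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            # Shortened token: the same input and salt always give the same token,
            # so joins between test tables keep working.
            masked[field] = digest[:12]
        else:
            masked[field] = value
    return masked

silver_row = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 1200.0}
test_row = obfuscate(silver_row, salt="per-environment-secret")
```

Non-sensitive fields such as `balance` pass through unchanged, so the obfuscated rows still exercise the same data types and value ranges in lower environments.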

2. MVP as expertise area: data swimlanes

Here’s the use case: A risk management unit produces and analyzes multiple reports from various upstream sources and rolls them out to regional and global directors via a business intelligence application. How can they start delivering value early?

Answer: When an existing business process undergoes digital transformation, automation, and enhancement, consider delivering value end-to-end, one expertise area at a time. In that way, each end-to-end (E2E) delivery of a unique report or widget forms an MVP.

Critical to this approach is a clear understanding of business processes. Once it’s known how the reports and widgets can be thematically grouped, they can be divided into data swimlanes. Sometimes, a complex report with elaborate filtering will form a swimlane of its own; sometimes, multiple minor widgets drawing data from the same upstream source could be united into a single swimlane.
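As a rough illustration of that grouping, the sketch below assigns each deliverable to a swimlane: a complex report gets a lane of its own, while minor widgets that share an upstream source are pooled. The inventory, the `complex` flag, and the `build_swimlanes` helper are hypothetical; in practice the grouping comes from conversations with the business, not from a script.

```python
from collections import defaultdict

# Hypothetical inventory of reports and widgets.
deliverables = [
    {"name": "Credit exposure report", "source": "loans_db", "complex": True},
    {"name": "Overdue count widget", "source": "loans_db", "complex": False},
    {"name": "Overdue trend widget", "source": "loans_db", "complex": False},
    {"name": "FX position widget", "source": "treasury_feed", "complex": False},
]

def build_swimlanes(items):
    """Group deliverables into swimlanes keyed by lane name."""
    lanes = defaultdict(list)
    for item in items:
        if item["complex"]:
            lane = item["name"]                 # a complex report forms its own lane
        else:
            lane = f"{item['source']} widgets"  # minor widgets share a lane per source
        lanes[lane].append(item["name"])
    return dict(lanes)

swimlanes = build_swimlanes(deliverables)
```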

Why the term “swimlane?” Because a competitive swimmer always finishes last if they jump lanes or fail to finish their distance, no matter how well they swim. People need to keep to their lane.

What to consider when you define MVP by data swimlanes?

Definition of Done (DoD): When replacing or enhancing an existing business process one swimlane at a time, consider DoD to be a full production rollout of that swimlane. When the client aims to build a brand-new reporting and analytics practice, DoD should be the pre-production rollout for each swimlane, with the final release in the project being the rollout of ALL swimlanes to production users. Achieving the Definition of Done qualifies an MVP for each swimlane in turn.
Sprint length: Data-related sprint length should ideally match the length of a reporting period within the organization. For example, if the end-users engage in reporting once a month, there’s no value in releasing one MVP every week. While fast iteration is highly prized in Agile, mismatched rollouts may actually cause the team to suffer from task-shifting, which subtracts value instead of adding it.
Sprint cadence: That said, consider performing production rollout outside of reporting spikes. Here’s a real-life scenario: upstream systems submit raw data in Week 1 of every month. The end users massage the data during Week 2 and submit reports in Week 3. This leaves Week 4 of every month as the ideal time for a planned release: it won’t clash with any business process and will get the end-user attention it needs.
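The cadence reasoning above is simple enough to write down. The sketch below encodes the monthly scenario as a hypothetical calendar and picks out the weeks with no reporting activity; the dictionary layout and the `release_weeks` helper are assumptions made for illustration.

```python
# Hypothetical monthly calendar from the scenario: week number -> business activity.
reporting_calendar = {
    1: "upstream systems submit raw data",
    2: "end users massage the data",
    3: "end users submit reports",
    4: None,  # no scheduled reporting activity
}

def release_weeks(calendar):
    """Return the weeks with no reporting activity, i.e. candidate release windows."""
    return [week for week, activity in sorted(calendar.items()) if activity is None]

candidates = release_weeks(reporting_calendar)
```

For this calendar the only candidate is Week 4, matching the scenario. A team with a different reporting rhythm would fill in its own calendar.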

3. MVP as risk reduction: remediation phases

Here’s the use case: A stale data segment (meaning one that hasn’t been used or evaluated in a while) was identified and needs remediation. The data governance team hopes to audit the data with business owners and then roll it out to production in batches. How does the client ensure that all involved systems can consume the remediated data without errors? How does the client eliminate errors/issues in user-facing applications after the remediation?

Answer: When structural changes are not requested, eliminating or reducing risks in the data rollout adds value. Risk reduction builds confidence in company data and data-driven decisions, which is the basis of data governance.

Using risk reduction as an MVP first requires a clear map of business processes and data consumption within the organization. The team should also have a realistic understanding of the volume of impacted data compared to the systems’ ETL bandwidth.

(ETL stands for Extract, Transform, Load. It refers to the process of extracting data from various sources, transforming or reshaping it into a suitable format, and then loading it into a target database or data warehouse for analysis and reporting purposes in data analytics and data engineering.)

Based on this information, the team can build an Ishikawa diagram of risks and downstream effects the data change could cause. Also known as the “fishbone diagram,” this drawing will define each MVP as a diagonal “fishbone” of a known possible point of failure. Thus, the sprints or phases of the data remediation project will focus on eliminating risks at each possible cause-and-effect juncture.

Fishbone Diagram example. An MVP would be eliminating or measurably reducing risk for all the components in a single data fishbone, e.g. “Manpower.”
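One way to track this is to treat the fishbone as a simple mapping from each bone to its known risks, as in the sketch below. Apart from “Manpower,” the bone names, the risks, and both helper functions are hypothetical, chosen only to show the idea that one bone equals one MVP.

```python
# Hypothetical fishbone: each bone (category) lists its risks and whether each is mitigated.
fishbone = {
    "Manpower": [
        {"risk": "business owners unavailable for audit", "mitigated": True},
        {"risk": "no trained release operator", "mitigated": True},
    ],
    "Midstream systems": [
        {"risk": "field too short for new values", "mitigated": False},
    ],
}

def bone_is_done(diagram, bone):
    """An MVP is complete when every risk on its bone has been mitigated."""
    return all(entry["mitigated"] for entry in diagram[bone])

def next_open_bone(diagram):
    """Return the first bone with open risks, or None once every bone is done."""
    for bone in diagram:
        if not bone_is_done(diagram, bone):
            return bone
    return None

current_mvp = next_open_bone(fishbone)
```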

In the real-life scenario, the data governance team aimed to remediate around 100,000 records.

MVP 1 of risk reduction focused on the internal business audit of records.
MVP 2 brought the upstream systems’ records in line with the business recommendations for the full volume of remediated data.
MVP 3 aimed to ensure that the midstream systems’ data structure could handle the remediated data (e.g., can field XYZ handle special characters? Can it handle a value over 256 characters long?).
MVP 4 extended the same checks to the user-facing downstream applications (e.g., can the UI dropdowns handle the new values?).
MVP 5 concentrated on delivery of a single batch of remediated records end to end.
MVP 6 put the system through a load test of performing a similarly sized batch every day and across multiple environments.

In any “MVP as risk reduction” project, the process continues on like this until the project reaches the fish’s head, which is the point when the whole data volume rolls out to production with full confidence.
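The questions asked in MVP 3 of the scenario (special characters, values over 256 characters) lend themselves to a small automated check. The sketch below is one possible version: the 256-character limit comes from the example, while treating “special characters” as anything outside ASCII, the sample values, and the `field_problems` helper are assumptions made for illustration.

```python
MAX_LENGTH = 256  # limit taken from the example; the real one comes from the target schema

def field_problems(value, max_length=MAX_LENGTH):
    """List the reasons a remediated value might not fit a midstream field."""
    problems = []
    if len(value) > max_length:
        problems.append("too long")
    # Assumption: "special characters" means anything outside the ASCII range.
    if not value.isascii():
        problems.append("non-ASCII characters")
    return problems

# Hypothetical remediated values for field XYZ.
remediated_values = ["Acme Ltd", "Café Société", "x" * 300]

# Map the position of each problematic value to its list of problems.
failures = {
    index: field_problems(value)
    for index, value in enumerate(remediated_values)
    if field_problems(value)
}
```

Running a check like this over the full set of remediated records before any batch is released turns “can the field handle it?” from a guess into a measurable risk.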

Bringing it back to the table

Data science is an ever-evolving field, and the perceived fluidity of its main subject—information—shouldn’t fool people into thinking it can’t be practically divided into value-adding MVPs. When performing Agile transformation and delivery of Data & Analytics projects and programs, we must resist counterproductive urges:

Don’t lock yourself into an “all-or-nothing” mindset when it comes to data delivery
Don’t imitate Agile methodology without meaningfully transforming the product delivery into value-adding iterations
Don’t limit yourself to the MVP definitions described in this article. Medallion architecture, data swimlanes, and risk reduction are valid and valuable MVPs, but there may be others more suitable for you

Clients want to see value before a project goes to production; that’s an understandable desire and a key value of Agile methodology. Like software projects, data projects still have iterations. Like water, data projects have phases. It just takes a bit of adjustment and, yes, agility to define them. Maybe you can’t cut water—but you certainly can cut ice.


Oleg Kazantsev is a Senior Management Consultant and Agile Coach for Launch's Digital Business Transformation Studio. With a background in Data & Analytics, he has a particular passion for and expertise in storytelling through data.
