Customer Lifetime Value Prediction Model Builder

Build a predictive customer lifetime value (CLV) model using historical transaction data to forecast future revenue, segment customers by value tiers, and inform acquisition spend decisions.

Prompt Template

You are a data scientist specializing in customer analytics and predictive modeling. Build a customer lifetime value (CLV) prediction model from historical data.

**Business type:** [e.g., subscription SaaS with monthly and annual plans]
**Data available:** [e.g., 3 years of transaction history, 15,000 customers, fields: customer_id, signup_date, plan_type, monthly_revenue, churn_date, support_tickets, feature_usage_score]
**Current state:** [e.g., we calculate CLV retrospectively but can't predict it for new customers]
**Tools:** [e.g., Python (pandas, scikit-learn, lifetimes library) or SQL + spreadsheet]
**Key business questions:**
  - What is a new customer worth in their first 12/24/36 months?
  - Which customer segments have highest/lowest CLV?
  - How much should we spend to acquire a customer in each segment?

Build:
1. **CLV Methodology Selection** — compare approaches with recommendation:
   - Historical (simple, backward-looking)
   - Probabilistic (BG/NBD + Gamma-Gamma)
   - ML-based (regression/survival analysis)
2. **Data Preparation Pipeline** — code for:
   - Feature engineering from raw transactions
   - Handling censored data (customers still active)
   - Train/test split strategy for temporal data
3. **Model Implementation** — full code for recommended approach:
   - Training pipeline
   - Prediction for individual customers
   - Confidence intervals on predictions
4. **Customer Segmentation** — value-based tiers:
   - Tier definitions and thresholds
   - Profile of each tier (demographics, behavior)
   - Migration patterns between tiers
5. **Business Application** — translating model to decisions:
   - Maximum CAC by segment
   - Retention investment prioritization
   - Revenue forecasting from current cohort
6. **Model Monitoring** — how to track prediction accuracy over time

Example Output

CLV Prediction Model: SaaS Subscription

Methodology: Survival Analysis + Revenue Regression

Why not simpler approaches?

| Method | Accuracy | Handles Censoring | New Customer Prediction |
|--------|----------|-------------------|------------------------|
| Historical average | Low | No | No |
| BG/NBD | Medium | Yes | Limited |
| **Survival + Regression** ✓ | **High** | **Yes** | **Yes, from Day 1** |
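To make the "Handles Censoring" column concrete, here is a minimal sketch in plain Python. It compares the naive average tenure of churned customers with a hand-rolled Kaplan-Meier estimate that keeps still-active customers in the risk set. The cohort data and the `kaplan_meier` helper are illustrative assumptions, not part of the model output. In practice a library such as lifelines (`KaplanMeierFitter`) would do this work.

```python
from collections import Counter

def kaplan_meier(durations, churned):
    """Return {month: survival probability} at each month where a churn occurs.

    durations: observed tenure in months (time to churn, or time observed so far).
    churned:   True if the customer churned, False if still active (censored).
    """
    churns_at = Counter(d for d, c in zip(durations, churned) if c)
    survival, curve = 1.0, {}
    for month in sorted(churns_at):
        # Everyone observed for at least `month` months is still at risk,
        # including active customers who have not churned yet.
        at_risk = sum(1 for d in durations if d >= month)
        survival *= 1 - churns_at[month] / at_risk
        curve[month] = survival
    return curve

# Hypothetical cohort: 4 churned customers and 4 still active at 12 months.
durations = [2, 4, 6, 8, 12, 12, 12, 12]
churned = [True, True, True, True, False, False, False, False]

# Naive view: average tenure among churned customers only.
naive_mean_tenure = sum(d for d, c in zip(durations, churned) if c) / sum(churned)
# Censoring-aware view: survival curve over the whole cohort.
curve = kaplan_meier(durations, churned)

print(naive_mean_tenure)
print(curve)
```

In this toy cohort the naive figure suggests a 5-month lifetime, while the survival curve shows half the cohort still active after month 8.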

Implementation

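import pandas as pd
from lifelines import CoxPHFitter

# Assumes df has one row per customer, with signup_date and churn_date parsed as datetimes
# Feature engineering
snapshot_date = pd.Timestamp.now()  # censoring date for customers who are still active
df['tenure_months'] = (df['churn_date'].fillna(snapshot_date) - df['signup_date']).dt.days / 30.44
df['is_churned'] = df['churn_date'].notna().astype(int)
df['avg_monthly_revenue'] = df.groupby('customer_id')['monthly_revenue'].transform('mean')
# Floor tenure at one month so brand-new customers don't cause a divide-by-zero
df['support_ticket_rate'] = df['support_tickets'] / df['tenure_months'].clip(lower=1)
# Encode plan type numerically (assumes two plans, 'monthly' and 'annual')
df['plan_type_encoded'] = (df['plan_type'] == 'annual').astype(int)

# Survival model for retention probability
features = ['plan_type_encoded', 'feature_usage_score', 'support_ticket_rate']
cph = CoxPHFitter()
cph.fit(df[features + ['tenure_months', 'is_churned']],
        duration_col='tenure_months', event_col='is_churned')

# Predict 36-month survival curve per customer (rows = months, columns = customers)
# df_new_customers must contain the same feature columns as the training data
surv_funcs = cph.predict_survival_function(df_new_customers[features], times=range(1, 37))

# CLV = sum of (survival_probability_month_t × expected_revenue_month_t)
# Assumes monthly_revenue_predictions is a Series of expected monthly revenue, indexed like df_new_customers
clv_36m = surv_funcs.mul(monthly_revenue_predictions, axis='columns').sum(axis=0)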
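The CLV formula in the last comment can be checked by hand on a single customer. The sketch below uses plain Python and made-up inputs (a constant 5% monthly churn rate and a flat $49 per month). The `expected_clv` helper is a hypothetical name used only for illustration, and the sketch ignores discounting of future revenue.

```python
def expected_clv(survival_probs, monthly_revenue):
    """Sum survival-weighted revenue over the horizon.

    survival_probs:  P(customer is still active in month t), one value per month.
    monthly_revenue: expected revenue in month t if the customer is active.
    """
    if len(survival_probs) != len(monthly_revenue):
        raise ValueError("need one revenue figure per month of the survival curve")
    return sum(s * r for s, r in zip(survival_probs, monthly_revenue))

# Hypothetical customer: 5% monthly churn, flat $49/month, 36-month horizon.
horizon = 36
survival = [0.95 ** t for t in range(1, horizon + 1)]
revenue = [49.0] * horizon

clv_36m = expected_clv(survival, revenue)
print(round(clv_36m, 2))
```

Because survival falls every month, the result sits well below the $1,764 that 36 months of uninterrupted billing at $49 would produce.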

Customer Value Segments

| Tier | CLV Range (36mo) | % Customers | Avg Revenue/Mo | Max CAC |
|------|------------------|-------------|----------------|---------|
| Platinum | >$2,000 | 8% | $89 | $600 |
| Gold | $800-$2,000 | 22% | $49 | $250 |
| Silver | $300-$800 | 45% | $29 | $100 |
| Bronze | <$300 | 25% | $19 | $40 |
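Applying the tiers to model output is a simple threshold lookup. The sketch below hard-codes the CLV ranges and CAC caps from the table. How the boundary values are handled is an assumption: the table does not say which tier gets exactly $800 or $300, so here each lower bound is treated as inclusive, and exactly $2,000 falls in Gold. The customer IDs and CLV figures are made up.

```python
# CAC caps copied from the segment table.
MAX_CAC = {"Platinum": 600, "Gold": 250, "Silver": 100, "Bronze": 40}

def assign_tier(clv_36m):
    """Map a predicted 36-month CLV (in dollars) to a value tier."""
    if clv_36m > 2000:
        return "Platinum"
    if clv_36m >= 800:   # assumed inclusive lower bound
        return "Gold"
    if clv_36m >= 300:   # assumed inclusive lower bound
        return "Silver"
    return "Bronze"

# Hypothetical predictions for three customers.
predicted = {"cust_a": 2450.0, "cust_b": 800.0, "cust_c": 299.99}
for customer_id, clv in predicted.items():
    tier = assign_tier(clv)
    print(customer_id, tier, MAX_CAC[tier])
```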

Key Insight

**Feature usage score is the #1 predictor of CLV** — customers who use 5+ features in their first 30 days have 3.2x higher CLV than those using ≤2 features. → Invest in onboarding, not just acquisition.

Tips for Best Results

  • 💡 Always account for censored data (active customers who haven't churned yet). Dropping them, or treating their tenure so far as their full lifetime, biases CLV downward.
  • 💡 Use temporal train/test splits, not random ones. Training on 2023 data and testing on 2024 mimics real prediction conditions.
  • 💡 CLV predictions are most valuable when tied to action. If you can't change your CAC or retention strategy based on the segments, the model is academic.
  • 💡 Recalculate CLV quarterly and track prediction drift, because customer behavior changes and the model needs to keep up.
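The temporal split and drift tracking tips can be sketched in plain Python. This is one simple way to do it, not the only one: customers are split by signup date around a cutoff, and drift is measured as the mean absolute percentage error (MAPE) between predicted and realized revenue at each quarterly check-in. The helper names, the sample rows, and the 15% retraining threshold are all assumptions made for illustration.

```python
from datetime import date

def temporal_split(rows, cutoff):
    """Train on customers who signed up before the cutoff, test on the rest."""
    train = [r for r in rows if r["signup_date"] < cutoff]
    test = [r for r in rows if r["signup_date"] >= cutoff]
    return train, test

def mean_absolute_percentage_error(predicted, actual):
    """Average of |predicted - actual| / actual, skipping zero actuals."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    if not pairs:
        raise ValueError("no customers with non-zero actual revenue")
    return sum(abs(p - a) / a for p, a in pairs) / len(pairs)

# Hypothetical customers: two 2023 signups (train) and two 2024 signups (test).
rows = [
    {"customer_id": 1, "signup_date": date(2023, 3, 1)},
    {"customer_id": 2, "signup_date": date(2023, 11, 15)},
    {"customer_id": 3, "signup_date": date(2024, 1, 1)},
    {"customer_id": 4, "signup_date": date(2024, 6, 30)},
]
train, test = temporal_split(rows, cutoff=date(2024, 1, 1))
print(len(train), len(test))

# Hypothetical (predicted, realized) revenue for two quarterly check-ins.
quarterly = {
    "2024-Q1": ([500.0, 300.0], [520.0, 310.0]),
    "2024-Q2": ([500.0, 300.0], [400.0, 240.0]),
}
DRIFT_THRESHOLD = 0.15  # assumed tolerance; choose one that fits the business
status = {}
for quarter, (predicted, actual) in quarterly.items():
    mape = mean_absolute_percentage_error(predicted, actual)
    status[quarter] = "retrain" if mape > DRIFT_THRESHOLD else "ok"
    print(quarter, round(mape, 3), status[quarter])
```

With these made-up numbers the first quarter stays within tolerance and the second crosses the threshold, which is the signal to refit the model on more recent data.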