MatchMetrics

The A/B tests behind every swipe, match, and paywall.

Each week we break down a real experiment from the dating apps you use. See both variants. Guess which one won. Find out what the data really says.

11,400+ product people read this · 5 min per test · Wednesdays at 9am ET
Every test we feature meets these standards

  • Confidence level: 95%+
  • Statistical power: 80%+
  • Participants per variant: 1,000+
  • Run window: 14–28 days
  • Test: two-sided z-test
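
As a rough illustration of what these thresholds mean in practice, the Python sketch below runs a two-sided z-test on two conversion rates and estimates how many users per variant a test would need to reach 80% power. It assumes a pooled two-proportion z-test and the usual normal-approximation sample-size formula. The function names and the example counts are hypothetical and are not taken from any featured experiment.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def required_sample_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users per variant needed to detect a given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical counts: 100/1,000 conversions in A vs. 130/1,000 in B.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
    # Hypothetical planning input: 10% baseline, 30% relative lift.
    print("users per variant:", required_sample_per_variant(0.10, 0.30))
```

A result clears the 95% bar when the p-value is below 0.05. The sample-size helper shows why 1,000 participants per variant is a floor and not a target: smaller lifts or lower baseline rates need far more traffic.
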
The voices we work to

Three lines that shaped how we run experiments.

We didn't invent this discipline. We're just translating the best of it for the dating-product world.

"Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day."

Jeff Bezos, Founder, Amazon

"More experiments, more money. More success, more growth. It's a matter of speeding up. If you are slow, you get eaten."

Ton Wesseling, Founder, Online Dialogue · Organiser of The Conference

"Within hours the new headline triggered the 'too good to be true' alert. The change had increased revenue by 12%, over $100M a year. It was the best revenue idea in Bing's history, and almost no one believed in it before the test."

Ronny Kohavi, formerly GM, Analysis & Experimentation, Microsoft Bing
The $100M headline that almost died in the backlog.

In 2012 a Microsoft engineer on the Bing ads team proposed moving a few words of ad text from the secondary line into the headline. The idea sat in the backlog for months. Nobody believed in it.

When a developer finally implemented it on a slow afternoon, revenue spiked so abnormally fast it tripped the "too good to be true" alarm — the team assumed it was a tracking bug. It wasn't.

The change lifted ad revenue by 12%, worth more than $100 million a year in the United States alone. It became the single highest-ROI feature in Bing's history.

The lesson isn't about ad copy. It's that the team's best estimate of which ideas were worth shipping was off by orders of magnitude. Ship the test. Your intuition is not as good as you think it is.

Source: Ronny Kohavi, formerly GM, Analysis & Experimentation, Microsoft Bing

Don't leave money on the table.

We run the same playbook for dating, social discovery, and community apps — fractional design + experimentation leadership.

Start a project →
Test of the Week

Which Tinder Gold paywall made more people pay?

Tinder ran this 50/50 split on iOS in Q2 2025. Same offer, same price. One headline beat the other by a margin that surprised the growth team.

Paywall · Tinder · iOS · Sample: 1.2M users · 21 days

Curiosity gap vs. value framing on the Gold upsell

Same screen, same price, same CTA color. Only the headline changed.

Hypothesis: If we replace the value-stack headline with a curiosity-gap headline, among first-time paywall viewers on iOS, then we will see higher Gold subscription start rate, because unresolved social signals about oneself override rational feature evaluation at the moment of payment.
52% guessed correctly
Variant A · Curiosity

Headline: "See who already likes you"

[Mockup: Tinder Gold paywall with the headline "See who already likes you", three profile cards (Anna, Marija, Lea), and the CTA "Continue · $14.99/mo"]

Result: +18.4% lift in subscription start rate

Variant B · Value

Headline: "Get unlimited likes & 5 Super Likes a day"

[Mockup: Tinder Gold paywall with the headline "Get unlimited likes & 5 Super Likes/day", a checklist (Unlimited likes, 5 Super Likes daily, 1 free Boost monthly, Passport to anywhere, Rewind your last swipe), and the CTA "Continue · $14.99/mo"]

Result: baseline (control variant)

Which paywall converted better?

Variant A won by +18.4%

  • Subscription lift: +18.4%
  • Voters correct: 52%
  • Statistical confidence: 99.9%
  • Projected annual ARR delta: ~$11M
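
To make the headline figure concrete, the Python sketch below shows one way a relative lift and a confidence interval on the absolute difference can be derived from raw counts. The page does not publish the baseline conversion rate or the per-variant counts, so the 5% baseline and the 600,000 users per arm are invented for illustration. Only the +18.4% lift and the 1.2M total sample come from the test summary, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_with_interval(conv_control, n_control, conv_variant, n_variant,
                       confidence=0.95):
    """Relative lift plus a confidence interval on the absolute rate difference."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    relative_lift = p_v / p_c - 1
    # Unpooled standard error, the usual choice for an interval estimate.
    se = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_v - p_c
    return relative_lift, (diff - z * se, diff + z * se)


if __name__ == "__main__":
    # Hypothetical counts chosen to reproduce an 18.4% relative lift:
    # control converts 30,000 of 600,000, variant converts 35,520 of 600,000.
    lift, (low, high) = lift_with_interval(30_000, 600_000, 35_520, 600_000)
    print(f"relative lift: {lift:+.1%}")
    print(f"95% CI on absolute difference: [{low:.4%}, {high:.4%}]")
```

An interval that excludes zero is what a "statistically confident" win looks like in practice. At this sample size even a small absolute difference produces a tight interval.
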

Why it won

The "See who likes you" headline activates an unresolved curiosity loop — there are people who already like you, and you can know who. The value-stack version is rational and complete, which actually closes the loop before payment.

This is the same psychology behind unread notification badges and Hinge's "Likes You" tab. People will pay to resolve unfinished business about themselves.

Takeaway → On paywalls for social products, sell the unresolved social signal, not the feature list. Feature stacks belong on the second screen, after the user has committed to "yes, tell me more".
How we pick what to feature

Five inputs decide whether a test makes the issue.

Adapted from Ton Wesseling's customer-behaviour-study framework. We don't run highlights from press releases — every test passes all five.

V1 · View · Behaviour data

GA4 funnels, heat maps, scroll maps, session recordings. Where users actually drop, click, and bail.

V2 · Voice · What users say

Support tickets, in-app surveys, feedback widgets, moderated interviews. The story behind the numbers.

V3 · Validated · Past tests

The team's internal knowledge base of wins, losses, and inconclusive results. Don't re-run yesterday's experiment.

V4 · Verified · Outside research

Peer-reviewed behavioural science, competitor monitoring (Visualping, the optimizer plugin), industry benchmarks.

V5 · Value · Strategic fit

Mission, vision, KPIs. A test that wins on a metric the business doesn't care about is a test that loses.

Recent Tests

Four more experiments from the archive.

Each one is a real test pattern run by Hinge, Bumble, Tinder, or one of the smaller niche apps. Click to vote.

Onboarding · Hinge · Sample: 280K signups · 14 days

6 required prompts vs. 3 required prompts at signup

More content = better profiles. But does it cost completion?

38% guessed correctly

Variant A · 6 prompts

Users must answer 6 prompts before they can swipe.

[Mockup: "Answer 6 prompts" screen, step 4 of 8, showing four prompts ("My simple pleasures are…", "A shower thought I recently had…", "Don't hate me if I…", "The way to win me over is…") and the CTA "Continue (2 more)"]

Result: baseline (onboarding completion)

Variant B · 3 prompts

Users must answer 3 prompts; rest are optional later.

[Mockup: "Pick 3 prompts" screen, step 4 of 5, showing three prompts ("My simple pleasures are…", "A shower thought I recently had…", "Don't hate me if I…") and the CTA "Almost done →"]

Result: +27.1% onboarding completion

Which onboarding completed more?

Variant B won by +27.1%

  • Completion rate: +27.1%
  • D7 retention: +9.4%
  • Change in match rate: ~0%
  • Voters correct: 38%

Why it won

Every additional required field in onboarding is a tax on commitment. The match rate didn't drop, which means the marginal 3 prompts weren't carrying real signal — they were carrying friction.

The bigger surprise was D7 retention. More users got to first-swipe faster, which is the only state where the product actually has a chance to hook them.

Takeaway → "Better profiles" is a metric. "More completed profiles" is a different metric. They rarely agree. Optimize for the funnel step closest to your activation event, and make the rest optional with strong post-activation nudges.

Chat · Bumble · Sample: 410K matches · 30 days

Pre-filled icebreaker suggestion vs. blank composer

Helping users write first messages should boost reply rate. Right?

71% guessed correctly

Variant A · Pre-fill

Composer pre-filled with an AI-suggested opener about the match's profile.

[Mockup: chat with Anna, a hint that she mentioned loving hiking, the suggested opener "Hey Anna! Saw you love hiking, what's the best trail you've done recently? 🥾", and the buttons "Send" and "Rewrite"]

Result: baseline (reply rate)

Variant B · Blank

Blank composer with a soft placeholder. User writes their own first message.

[Mockup: chat with Anna, the banner "You matched with Anna! Say hi 👋", an empty composer with the placeholder "Write a message…", and the button "Send"]

Result: +12.3% reply rate

Which composer got more replies?

Variant B won by +12.3%

  • Reply rate (first msg): +12.3%
  • Messages per match: +4.1%
  • First-message send rate: −6.8%
  • Voters correct: 71%

Why it won

The pre-fill increased the volume of sent messages but tanked the reply rate. Recipients could pattern-match an AI-written opener within two lines — the formulaic "I saw you love X, what's the best Y" structure is now widely recognized as a bot tell.

Authenticity is the actual scarce good in dating product UX. Anything that makes one user's effort feel cheap reduces the other user's reason to respond.

Takeaway → Reducing sender friction is not the same as increasing match quality. In two-sided social products, optimize the recipient's reply incentive — not just the sender's send rate.

Notification · Generic dating app · Sample: 96K matches · 10 days

Full-screen "It's a match!" celebration vs. subtle toast

Modal interruption is bad UX. Except, sometimes, when it isn't.

61% guessed correctly

Variant A · Full-screen

Match triggers a full-screen takeover with both faces and a "Send Message" CTA.

[Mockup: full-screen "It's a Match!" takeover with the line "You and Anna liked each other" and the buttons "Send a message" and "Keep swiping"]

Result: +34.2% 24h message send rate

Variant B · Toast

Subtle toast at the top of the swipe deck. User keeps swiping uninterrupted.

[Mockup: swipe deck with the toast "New match: Anna" above the next profile card (Lea, 27, 2 km away)]

Result: baseline (control variant)

Which match treatment drove more first messages?

Variant A won by +34.2%

  • Messages sent in first 24h: +34.2%
  • Conversation depth: +11.6%
  • Swipes per session: −3.1%
  • Voters correct: 61%

Why it won

The fewer-swipes cost was real — the modal interrupts the swipe flow. But the message-send lift more than compensated. Emotional moments (a match) need a visual peak; without it, the match becomes just another notification in a stack of 40.

This is the same principle behind Duolingo's streak fireworks. Don't optimize away the celebration — it's the part of the loop people return for.

Takeaway → Friction in the right place is fuel for the next session. Cut friction from acquisition flows; add friction (or ceremony) to moments of payoff.

Premium feature · Tinder · Sample: 740K free users · 28 days

Super Like CTA: scarcity framing vs. benefit framing

Free users see one Super Like per day. What's the best way to label that button?

44% guessed correctly

Variant A · Scarcity

Label: "Send Super Like (1 free today)"

[Mockup: profile card (Marija, 29, Architect · 3 km) with the button "⭐ Send Super Like (1 free today)"]

Result: +41.7% Super Like usage

Variant B · Benefit

Label: "Stand out — try Super Like"

[Mockup: the same profile card (Marija, 29, Architect · 3 km) with the benefit-framed button label]

Result: baseline (control variant)

Which CTA drove more Super Like usage?

Variant A won by +41.7%

  • Super Like usage: +41.7%
  • D2 retention (free): +8.2%
  • Super Like pack purchase: +5.4%
  • Voters correct: 44%

Why it won

The "1 free today" framing does three things at once: it removes the "is this going to cost me?" uncertainty, it triggers loss aversion (use it or lose it), and it educates the user that the resource is replenishable. The benefit framing did none of these and required a second cognitive step to evaluate.

Once users tried it once, packs became plausibly worth buying. The CTA was a wedge for the entire Super Like monetization line.

Takeaway → For freemium tap-once features, name the quantity, not the benefit. Users decide on resource consumption faster than they decide on outcomes.

From the archives · Booking.com · Hostels · Sitewide test · multi-week run

The word that quietly tanked an entire hostel category

Trust signals should lift conversion. So why did mentioning "safe" do the opposite?

Hypothesis: If we remove explicit mentions of "safe" from hostel listings and replace them with implicit safety signals (cleanliness, 24/7 staff, neighbourhood quality), among users browsing hostel inventory, then we will see higher sitewide booking conversion, because explicitly naming safety activates the implicit unsafety frame in the reader's mind.
29% guessed correctly

Variant A · Explicit

Listings include direct phrases like "Really safe hostel", "Safe neighbourhood".

[Mockup: listing for Sunset Backpackers (Lisbon · ⭐ 8.6) with the badges "Really safe hostel", "Safe neighbourhood", "24/7 staff", the description "Stay in a safe hostel with security and a safe neighbourhood. Highly rated for safety…", and the CTA "Book from €18"]

Result: baseline (control variant)

Variant B · Implicit

Same hostels, with safety implied via cleanliness, 24/7 staff, and neighbourhood, never named.

[Mockup: listing for Sunset Backpackers (Lisbon · ⭐ 8.6) with the badges "Excellent cleanliness", "24/7 staff", "Wonderful location", the description "Wonderful location in central Lisbon, 24/7 reception, excellent cleanliness rated by guests…", and the CTA "Book from €18"]

Result: +6.8% sitewide conversion lift

Which listing copy converted better?

Variant B won by +6.8% sitewide

  • Sitewide conversion: +6.8%
  • Affected category: +11.2%
  • Voters correct: 29%
  • Annual impact: ~multi-€M

Why it won

Behavioural research established years earlier that safety is the most important attribute for hostel bookings — guests sleep in 8-to-16-bed dorms with strangers. So the team did the rational thing: surface safety. And conversions dropped.

The word "safe" is a stop word. Naming it explicitly activates the question it's meant to answer — is this place unsafe? Implicit signals (excellent cleanliness, 24/7 staff, wonderful location) carry the same trust content without lighting up the worry. People infer safety; they don't want to be reassured of it.

Takeaway → Don't name the concern you're trying to alleviate. Surface shoulder signals that imply it. The same logic applies to "no spam", "easy cancellation", and "no commitment" — every one of those is a stop word that can tank the funnel that mentions it.

Story originally documented by Ton Wesseling, Online Dialogue.

The golden detail · Auth screen · niche dating app · Sample: 184K new users · 18 days

"Register / Log In" vs. "Sign Up / Sign In" — does CTA pairing matter?

Two CTAs. Same intent. One pair is mismatched in verb family — the other reads as one system. Does that 1mm of polish move the needle?

Hypothesis: If we replace mismatched auth CTAs ("Register" + "Log In") with a paired verb family ("Sign Up" + "Sign In"), among first-time visitors landing on the auth screen, then we will see higher successful sign-in completion rate, because terminologically consistent pairs reduce micro-cognitive friction at the exact moment the user is choosing an identity path.
34% guessed correctly

Variant A · Mismatched pair

Primary CTA: "Register". Secondary: "Log In". Two verb families.

[Mockup: auth screen with the greeting "Welcome 👋", the tagline "Find someone worth swiping for.", email and password fields, the primary button "Register", and the link "Already a member? Log In"]

Result: baseline (successful sign-in rate)

Variant B · Paired family

Primary CTA: "Sign Up". Secondary: "Sign In". One verb family, mirrored weight.

[Mockup: the same auth screen with the primary button "Sign Up" and the link "Already with us? Sign In"]

Result: +7.2% successful sign-in rate

Which CTA pairing produced more completed sign-ins?

Variant B won by +7.2%

  • Successful sign-in rate: +7.2%
  • "Wrong button" taps: −14.6%
  • D1 retention: +3.1%
  • Voters correct: 34%

Why it won — Jović's read

"Register" and "Log In" come from two different mental dictionaries — one is administrative ("register a vehicle"), the other is conversational. Pairing them forces the user to do a half-second translation: is "Log In" the partner of "Register", or did I miss the right button? That hesitation costs you returning users who tap the wrong CTA and bounce back to the home screen.

"Sign Up" and "Sign In" share a verb stem and a rhythm. The eye reads them as a matched pair before the brain finishes parsing. The lift isn't coming from new users converting better — it's coming from returning users no longer mis-tapping. The ~15% drop in wrong-button taps is the real story behind the +7.2%.

This is what we call the golden detail. It's not a redesign. It's two words. But on an auth screen — the highest-traffic screen in the entire product — every micro-friction compounds across millions of sessions. Terminological consistency isn't polish; it's a load-bearing piece of the funnel.

Takeaway → Audit every paired CTA in your product (auth, settings, billing, account). If the verbs come from different families, they're silently leaking conversions. The fix costs one PR and zero engineering risk — and it's almost always worth running as an A/B before rolling out, because the size of the lift tells you how much of your funnel was held together with sticky tape.
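
A first pass at that audit can be automated. The Python sketch below flags CTA pairs whose labels do not open with the same word, which is a deliberately crude proxy for "same verb family". The helper names and the list of pairs are hypothetical; a real audit would pull labels from your own string resources and still needs a human read.

```python
def leading_word(label):
    """First word of a CTA label, lower-cased, as a rough stand-in for its verb family."""
    words = label.strip().lower().split()
    return words[0] if words else ""


def mismatched_pairs(pairs):
    """Return the CTA pairs whose two labels start with different words."""
    return [(a, b) for a, b in pairs if leading_word(a) != leading_word(b)]


# Hypothetical pairs, as they might appear on auth and account screens.
cta_pairs = [
    ("Register", "Log In"),
    ("Sign Up", "Sign In"),
    ("Log In", "Log Out"),
    ("Create account", "Sign In"),
]

if __name__ == "__main__":
    for primary, secondary in mismatched_pairs(cta_pairs):
        print(f'Check pairing: "{primary}" / "{secondary}"')
```

Anything the script flags is only a candidate for an A/B test. It cannot tell whether a mismatch actually costs conversions.
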
Work with us

Want this kind of testing on your product?

We run the same playbook for dating, social discovery, and community apps — fractional design + experimentation leadership. Tell us what you're working on and we'll reply within 48 hours.

  • Audit + behavioural-data deep dive
  • Mobile-first sign-up & paywall redesign
  • Statistically rigorous A/B testing (95%+ confidence)