RESEARCH COUNCIL

STRESS-TEST YOUR THESIS BEFORE THE REVIEWERS DO

ADVERSARIAL COUNCIL

Stress-Test Memo · Decision Memo

Your research question as the prompt. Your papers as context. Domain experts, methodologists, and a devil's advocate that find the holes before peer review does. Structured deliberation that turns intuition into defensible argument.

THE IMPACT

🔬

Peer review before peer review

Simulated reviewers with different expertise challenge your methodology, question your assumptions, and demand evidence. The weaknesses they find are the ones a journal reviewer would flag — but you find them first.

📚

Cross-disciplinary synthesis

Upload papers from different fields. Agents trained in statistics, domain theory, and epistemology cross-reference findings. They catch the paper in sociology that contradicts your physics assumption.

Methodology audit trail

Every methodological choice — sample size, statistical test, control variable — gets challenged and justified. The deliberation transcript is your methodology defense, written before the reviewer asks.

THE COUNCIL

METHOD-01 · T1

Research Methodologist

Evaluates study design, sampling strategy, statistical validity. Questions whether your method actually answers your research question. Thinks in p-values and effect sizes.

DOMAIN-01 · T1

Domain Expert

Deep knowledge of the specific field. Knows the canonical papers, the ongoing debates, the unspoken assumptions. Challenges whether your contribution is actually novel.

STATS-01 · T2

Statistical Analyst

Runs the numbers mentally. Checks for multiple comparisons, confounders, selection bias. Asks "did you correct for that?" before you publish something that doesn't replicate.

EPISTEM-01 · T2

Epistemologist

Questions the foundations. What counts as evidence? Is your theoretical framework internally consistent? Are you confirming what you already believe? The philosophical conscience of the council.

LIT-01 · T2

Literature Scout

Has read everything relevant. Cross-references your claims against the existing body of work. Finds the paper from 2018 that already showed what you think is novel.

IMPACT-01 · T3

Impact Assessor

Evaluates real-world significance. A statistically significant result isn't always a meaningful one. Asks "so what?" and demands practical implications.

DEVIL-01 · T3

Devil's Advocate

The Reviewer 2 you need. Attacks the emerging consensus, finds alternative explanations, demands robustness checks. If your argument survives DEVIL-01, it survives peer review.

THE CHARACTERS

Character-agents bring perspective that pure expertise misses. REVIEWER-01 is the hostile reviewer you need before submission — blunt, skeptical, looking for rejection reasons. ADVISOR-01 has decades of experience and knows which battles are worth fighting. STUDENT-01 tests whether your argument actually makes sense to someone outside the bubble.

REVIEWER-01 · T1

Simulated Reviewer

Plays the role of a skeptical journal reviewer. Reads your abstract, methods, and results with the specific goal of finding reasons to reject. Blunt, rigorous, occasionally unfair — like real reviewers.

◈ CHARACTER
ADVISOR-01 · T2

PhD Advisor

The experienced supervisor who has seen 100 theses. Knows which shortcuts work and which destroy credibility. Asks "have you considered..." with 30 years of context.

◈ CHARACTER
STUDENT-01 · T3

Fresh Eyes

A smart person from outside your field. If they can't follow your argument, your paper has a clarity problem. Asks the "obvious" questions that experts forget to ask.

◈ CHARACTER

EXAMPLE DELIBERATION

QUESTION

Is our finding that remote teams produce 23% more creative output than co-located teams robust enough to publish, or are there confounders we're missing?

CONTEXT

N=142 software teams across 3 companies. Creativity measured via patent applications and peer-rated novelty scores. Remote teams: 78 teams, co-located: 64 teams. Data collected 2023-2025. Controlled for team size, industry, and seniority. Pre-registered hypothesis.

ROUND 1

STATS-01 (Statistical Analyst)

"The 23% effect size is notable, but your groups are unbalanced (78 vs 64). Have you run sensitivity analyses? Also — patent applications have a 12-18 month lag. Your 2-year window may capture pre-remote patents attributed to remote teams. This is a serious threat to validity."

DOMAIN-01 (Domain Expert)

"The remote work creativity literature is contested. Brucks & Levav (2022, Nature) found remote pairs generate fewer ideas. Yang et al. (2022, Nature Human Behaviour) found weaker network ties. Your result contradicts both — which demands a strong explanation, not just a p-value."

METHOD-01 (Research Methodologist)

"Selection bias is the elephant. Companies that went remote may have already been more innovative (tech-forward, flexible culture). You controlled for industry and seniority but not for organizational culture or self-selection into remote work. Without randomization, causation is off the table."

REVIEWER-01 (Simulated Reviewer)

"I would desk-reject this in current form. "Creativity" operationalized as patent applications is domain-specific (software only) and lag-contaminated. Peer-rated novelty is subjective and unblinded. You need a cleaner operationalization before claiming a 23% effect across "teams.""

ROUND 2

EPISTEM-01 (Epistemologist)

"The deeper question: what do you mean by "creative output"? Patents measure formalized invention, not creativity. Peer ratings measure perceived novelty, which correlates with presentation skill. Neither captures the actual cognitive process. Your construct validity is weak."

LIT-01 (Literature Scout)

"Found 3 relevant papers you're not citing. Gibbs (2024) showed a similar effect but attributed it to schedule flexibility, not remote work per se. If Gibbs is right, your moderator is working hours, not location. Also: Bloom et al.'s original WFH study showed productivity gains, not creativity — don't conflate."

DEVIL-01 (Devil's Advocate)

"Alternative explanation: companies that allowed remote work 2023-2025 were overwhelmingly tech companies in growth phases. Growth-phase companies file more patents by default (more R&D budget, more headcount). Your "remote effect" might just be a "well-funded company" effect. Test this."

ADVISOR-01 (PhD Advisor)

"I've reviewed enough papers to know: a 23% effect in organizational behavior research is suspiciously large. Either you've found something real — in which case you need extraordinary evidence — or there's an uncontrolled variable inflating the effect. My advice: run a subsample analysis within a single company where both modes exist. If the effect holds intra-company, you have a story."

IMPACT-01 (Impact Assessor)

"Even if the effect is real, the practical implication — "go remote to be more creative" — is too simplistic for policy. You need to identify the mechanism. Is it autonomy? Reduced interruptions? Asynchronous deep work? Without the mechanism, your finding is a correlation that managers will misapply."

SYNTHESIS
ASSESSMENT

The council identifies 4 critical threats to the current manuscript: (1) patent lag contamination inflating the remote group, (2) organizational culture as unmeasured confounder, (3) weak construct validity for "creativity", and (4) contradiction with Brucks & Levav and Yang et al. without adequate theoretical reconciliation.

RECOMMENDATION

Do not submit in current form. Three actions: (a) run intra-company subsample analysis per ADVISOR-01, (b) add Gibbs 2024 and reframe around schedule flexibility as potential moderator, (c) replace patent count with a cleaner creativity measure or triangulate. If the effect survives these tests, the paper is strong. If not, you've saved yourself a rejection.

DISSENT

STATS-01 notes the pre-registration protects against p-hacking accusations but not confounders. DOMAIN-01 argues the contradiction with Nature papers requires a dedicated "reconciliation" section, not a footnote.

CONFIDENCE

0.85 — the methodological concerns are specific and addressable. The core research question is valuable.
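
For readers who want to see what the recommended checks look like in practice, here is a sketch of DEVIL-01's company confounder test and ADVISOR-01's intra-company subsample analysis. The data are synthetic, and the variable names (novelty, remote, company, team_size) and the data-generating process are invented for illustration only, not the study's actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 142
company = rng.choice(["A", "B", "C"], size=n)
# Confounded assignment: the most innovative firm is also the most remote.
remote_prob = pd.Series(company).map({"A": 0.3, "B": 0.5, "C": 0.8})
remote = (rng.random(n) < remote_prob).astype(int)
team_size = rng.integers(4, 12, size=n)
# Novelty depends on the firm, not on remote work: the true remote effect is zero.
novelty = 3.0 + pd.Series(company).map({"A": 0.0, "B": 0.4, "C": 0.9}) + rng.normal(0, 1, n)
df = pd.DataFrame({"company": company, "remote": remote,
                   "team_size": team_size, "novelty": novelty})

# Naive model picks up a spurious "remote" effect; company fixed effects absorb it.
naive = smf.ols("novelty ~ remote + team_size", data=df).fit()
fixed = smf.ols("novelty ~ remote + team_size + C(company)", data=df).fit()
print("naive remote coef:", round(naive.params["remote"], 3))
print("fixed-effects remote coef:", round(fixed.params["remote"], 3))

# ADVISOR-01's check: does the effect survive within a single company?
for firm, sub in df.groupby("company"):
    m = smf.ols("novelty ~ remote + team_size", data=sub).fit()
    print(firm, "within-firm remote coef:", round(m.params["remote"], 3))
```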

YOUR WORKFLOW

01

UPLOAD YOUR PAPERS

Your manuscript draft, cited papers, and relevant datasets. RAG chunks them and makes them citable by agents during deliberation.
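
A minimal sketch of what that ingestion step can look like: fixed-size overlapping chunks with stable IDs that agents can cite in their critiques. The chunk sizes and the ID scheme are illustrative assumptions, not the actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str   # what an agent cites in a critique, e.g. "draft.pdf#007"
    source: str
    text: str

def chunk_document(source: str, text: str,
                   size: int = 800, overlap: int = 200) -> list[Chunk]:
    """Overlapping character windows so a claim isn't split mid-sentence."""
    chunks, start, i = [], 0, 0
    while start < len(text):
        chunks.append(Chunk(f"{source}#{i:03d}", source, text[start:start + size]))
        start += size - overlap
        i += 1
    return chunks

corpus = chunk_document("draft.pdf", "Remote teams produced 23% more novel output. " * 200)
print(len(corpus), "chunks, first id:", corpus[0].chunk_id)
```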

02

ASSEMBLE YOUR REVIEWERS

Methodologist, domain expert, statistician, literature scout. Add a simulated reviewer to pre-empt the real ones. Customize their expertise to your specific field.
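
One way to picture the roster as configuration. The agent IDs and tier labels mirror the cards above; the schema itself is an illustrative assumption, not a documented config format.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    agent_id: str
    role: str
    tier: str
    expertise: str   # folded into the agent's instructions at deliberation time

council = [
    Agent("METHOD-01", "Research Methodologist", "T1",
          "study design, sampling strategy, statistical validity"),
    Agent("DOMAIN-01", "Domain Expert", "T1",
          "organizational behavior; remote-work and creativity literature"),
    Agent("STATS-01", "Statistical Analyst", "T2",
          "multiple comparisons, confounders, selection bias, power"),
    Agent("REVIEWER-01", "Simulated Reviewer", "T1",
          "reads the draft looking for reasons to reject"),
    Agent("DEVIL-01", "Devil's Advocate", "T3",
          "alternative explanations and robustness demands"),
]
```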

03

DELIBERATE YOUR THESIS

Adversarial mode: every claim gets challenged. Standard mode: structured evaluation with convergence. Cross-ecosystem: pit your theoretical framework against a competing one.
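
A skeleton of the round structure, with stub agents standing in for the real models. The critique() interface and the mode names are assumptions, shown only to make the difference between adversarial and standard mode concrete.

```python
from dataclasses import dataclass

@dataclass
class StubAgent:
    agent_id: str
    def critique(self, prompt: str, history: list) -> str:
        # Placeholder for the real model call.
        return f"{self.agent_id}: challenge to '{prompt[:40]}...'"

def deliberate(question, agents, devil, mode="adversarial", rounds=2):
    transcript = []
    for _ in range(rounds):
        for agent in agents:
            turn = agent.critique(question, transcript)
            transcript.append((agent.agent_id, turn))
            if mode == "adversarial":
                # every contribution is challenged before the next agent speaks
                transcript.append((devil.agent_id, devil.critique(turn, transcript)))
    return transcript

council = [StubAgent("METHOD-01"), StubAgent("STATS-01"), StubAgent("DOMAIN-01")]
log = deliberate("Is the 23% remote-creativity effect robust?", council, StubAgent("DEVIL-01"))
print(len(log), "turns recorded")
```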

04

ITERATE ON FEEDBACK

Start 1-on-1 discussions with specific agents. Ask STATS-01 about your power analysis. Ask REVIEWER-01 what would change their mind. Dig deeper than the synthesis.
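
As an example of where a 1-on-1 with STATS-01 can land: a quick check of the minimum detectable effect for the 78 vs 64 split from the example deliberation, using statsmodels' standard power utilities. The group sizes come from the example above; everything else is a routine two-sample t-test assumption.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
detectable_d = analysis.solve_power(
    nobs1=78,          # remote teams
    ratio=64 / 78,     # co-located teams relative to remote
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Minimum detectable effect size (Cohen's d): {detectable_d:.2f}")
```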

05

EVOLVE YOUR COUNCIL

After each paper, agents that gave useful critique rise. Those that missed real issues fade. Your research council becomes calibrated to your field, your methods, your blind spots.
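
One plausible mechanism for that calibration, sketched as an exponential moving average over per-paper usefulness scores (for example, whether a real reviewer later raised the same issue). The scoring signal and the decay rate are illustrative assumptions.

```python
def update_weights(weights: dict[str, float], usefulness: dict[str, float],
                   decay: float = 0.8) -> dict[str, float]:
    """Agents with consistently useful critiques rise; the rest fade."""
    return {
        agent_id: decay * w + (1 - decay) * usefulness.get(agent_id, 0.0)
        for agent_id, w in weights.items()
    }

weights = {"STATS-01": 0.5, "IMPACT-01": 0.5}
# Example signal: STATS-01's lag-contamination point matched a real reviewer's
# objection; IMPACT-01's critique never surfaced in review.
weights = update_weights(weights, {"STATS-01": 1.0, "IMPACT-01": 0.2})
print(weights)
```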