Polymarket Arbitrage Bible: The Real Edge is in the Math Infrastructure
Original Title: The Math Needed for Trading on Polymarket (Complete Roadmap)
Original Author: Roan, Crypto Analyst
Translation, Annotations: MrRyanChi, insiders.bot
While building @insidersdotbot, I had in-depth discussions with many high-frequency market-making teams and arbitrage teams. Their most pressing question was how to actually implement arbitrage strategies.
Our users, friends, and partners are all exploring the complex and multi-dimensional trading route of Polymarket arbitrage. If you are an active Twitter user, then I believe you have also come across tweets like "I made money from the prediction market through the XX arbitrage strategy."
However, most articles have overly simplified the underlying logic of arbitrage, turning arbitrage into a trading pattern of "I can do it, and you can too" and "using Clawdbot can solve it," without going into detail on how to systematically understand and develop your own arbitrage system.
If you want to understand how arbitrage tools on Polymarket make money, this article is the most comprehensive interpretation I have seen so far.
The original English text has many highly technical parts that call for further reading, so I have restructured and supplemented it. You should be able to follow all the key content in this one article without stopping to look things up.
Polymarket Arbitrage Is Not a Simple Math Problem
You see a market on Polymarket:
YES price $0.62, NO price $0.33.
You think: 0.62 + 0.33 = 0.95, less than 1 dollar, there is arbitrage opportunity! Buy YES and NO at the same time, spend $0.95, regardless of the outcome, you can get back $1.00, netting $0.05.
You are correct.
But here is the problem: while you are still working out this sum by hand, quant systems are doing something completely different.
They are simultaneously scanning 17,218 conditions, spanning 2^63 possible result combinations, finding all pricing inconsistencies in milliseconds. By the time you place the two orders, the price difference has already disappeared. The system has long found the same loophole in dozens of related markets, calculated the optimal position size considering order book depth and fees, executed all trades in parallel, and then moved funds to the next opportunity. [1]
The Gap is Not Just About Speed. It's About Mathematical Infrastructure.
Chapter 1: Why "Addition" Is Not Enough — The Marginal Polytope Problem
The Single Market Fallacy
Let's start with a simple example.
Market A: "Will Trump Win the Election in Pennsylvania?"
YES price $0.48, NO price $0.52. Adding up to exactly $1.00.
Seems perfect, no arbitrage opportunity, right?
Wrong.
Adding one more market changes the game.
Now consider Market B: "Will the Republican Party Lead by Over 5 Points in Pennsylvania?"
YES price $0.32, NO price $0.68. Also adding up to $1.00.
Each market seems "normal" on its own. But there is a logical dependency here:
The U.S. presidential election is not a national popular vote but a state-by-state tally. Each state is an independent "battleground," where whoever gets more votes in that state takes all of the state's electoral votes (winner takes all). Trump is the Republican candidate. So, "Republicans winning in Pennsylvania" and "Trump winning in Pennsylvania" — are the same thing. If the Republicans win by over 5 points, it not only means Trump won Pennsylvania but won by a large margin.
In other words, the YES in Market B (Republican landslide) is a subset of the YES in Market A (Trump victory) — a landslide surely means victory, but victory does not necessarily mean a landslide.
And this logical dependency creates an arbitrage opportunity.
It's like betting on two things — "Will it rain tomorrow?" and "Will there be a thunderstorm tomorrow?"
If there is a thunderstorm, it will definitely rain (a thunderstorm is a subset of rain). So, the price of "Thunderstorm YES" cannot be higher than "Rain YES." If the market prices violate this logic, you can buy low and sell high at the same time, earning a "risk-free profit," and that's arbitrage.
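The subset rule can be checked in a few lines of Python. This is a minimal sketch, not anyone's production logic: the function name `implication_arbitrage` and the thunderstorm prices are made up for illustration, and fees and slippage are ignored. At the Pennsylvania prices quoted above, the landslide market is priced below the victory market, so the rule holds and the function returns a negative number. The opportunity only appears when the subset is priced above the superset.

```python
def implication_arbitrage(superset_yes: float, subset_no: float) -> float:
    """
    B implies A (B is a subset of A). Buying A-YES and B-NO pays at least $1
    in every logically consistent outcome:
      B happens (so A happens): A-YES pays 1, B-NO pays 0
      A happens without B     : A-YES pays 1, B-NO pays 1
      neither happens         : A-YES pays 0, B-NO pays 1
    Returns the guaranteed profit per pair; a negative value means no arbitrage.
    """
    return 1.0 - (superset_yes + subset_no)

# Pennsylvania prices from the text: A-YES = 0.48, B-NO = 0.68.
pennsylvania = implication_arbitrage(superset_yes=0.48, subset_no=0.68)

# Hypothetical mispricing: "rain" YES at 0.30, "thunderstorm" YES at 0.40 (NO at 0.60).
weather = implication_arbitrage(superset_yes=0.30, subset_no=0.60)

print(f"Pennsylvania: {pennsylvania:+.2f}")
print(f"Weather:      {weather:+.2f}")
```

In the weather case the two legs cost $0.90 and pay at least $1.00 whatever happens, which is exactly the "thunderstorm cannot cost more than rain" rule turned into a trade.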
Exponential Explosion: Why Brute Force Search Doesn't Work
For any market with n conditions, there are theoretically 2^n possible outcome combinations.
Sound manageable? Consider a real-life example.
2010 NCAA Tournament Market [2]: 63 games, each with two possible outcomes. The number of possible result combinations is 2^63 = 9,223,372,036,854,775,808—over 9 quintillion possibilities. There are over 5000 markets available.
How big is 2^63? If you checked 1 billion combinations per second, it would take around 292 years to check them all. That's why "brute force search" is completely impractical here.
Checking each combination one by one? Computationally infeasible.
Now consider the 2024 US election. A research team identified 1,576 pairs of potentially correlated markets. If each market in a pair has 10 conditions, that's 20 conditions and 2^20 = 1,048,576 combinations to check per pair. Multiply that by 1,576 pairs. Your laptop would still be computing while the election results are already out.
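If you want to reproduce the arithmetic in this section, a few lines of Python will do it. The "10 conditions per market" figure is the article's own assumption, and the variable names are only illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# NCAA example: 63 games, two outcomes each.
ncaa_combinations = 2 ** 63
years_at_1e9_per_sec = ncaa_combinations / 1e9 / SECONDS_PER_YEAR

# Election example: 1,576 pairs, two markets of 10 conditions each (assumed).
pairs = 1_576
combos_per_pair = 2 ** (10 + 10)
election_checks = pairs * combos_per_pair

print(f"{ncaa_combinations:,} combinations")
print(f"about {round(years_at_1e9_per_sec)} years at 1e9 checks per second")
print(f"{election_checks:,} checks across all election pairs")
```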
Integer Programming: Using Constraints Instead of Enumeration
The solution for quant systems is not to "enumerate faster," but to not enumerate at all.
They use Integer Programming to describe "which results are valid."
Consider a real example. Duke vs. Cornell game market: each team has 7 markets (0 to 6 wins), totaling 14 conditions, with 2^14 = 16,384 possible combinations.
But there's one constraint: they can't both win 5 or more games because they would meet in the semifinals (only one can advance).
How does Integer Programming handle this? Just three constraints:
· Constraint One: Among Duke's 7 markets, exactly one is true (Duke can only have one final win count).
· Constraint Two: Among Cornell's 7 markets, exactly one is true.
· Constraint Three: Duke Wins 5 Games + Duke Wins 6 Games + Cornell Wins 5 Games + Cornell Wins 6 Games ≤ 1 (they cannot both win that many).
Three linear constraints, replacing 16,384 brute-force checks.
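To see that these three constraints really describe the valid outcomes, here is a small Python check. It is only a sanity check at toy scale: it enumerates all 16,384 vectors precisely because 14 conditions are few enough to do so, which is the opposite of what a real solver does. The layout of the vector (Duke's 7 conditions first, then Cornell's) is an assumption made for this sketch.

```python
from itertools import product

def is_valid(z):
    """z is a 0/1 vector of length 14: Duke's 'wins 0..6' flags, then Cornell's."""
    duke, cornell = z[:7], z[7:]
    c1 = sum(duke) == 1                                     # exactly one Duke win count
    c2 = sum(cornell) == 1                                  # exactly one Cornell win count
    c3 = duke[5] + duke[6] + cornell[5] + cornell[6] <= 1   # not both with 5+ wins
    return c1 and c2 and c3

all_vectors = list(product((0, 1), repeat=14))
valid = [z for z in all_vectors if is_valid(z)]

print(len(all_vectors), "candidate vectors")
print(len(valid), "satisfy all three constraints")
```

Of the 7 × 7 = 49 ways to pick one win count per team, the third constraint removes the 4 where both teams reach 5 or 6 wins, leaving 45. An integer programming solver works from the three inequalities directly and never builds the list.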

Figure: Brute Force Search vs Integer Programming
In other words, brute force search is like reading through every word in the dictionary to find a word. Integer programming is like flipping directly to that alphabetical page. You don't need to check all possibilities, you just need to describe "what a valid answer looks like," and then let the algorithm find pricing that violates the rules.
Real Data: 41% of Conditions Exhibit Arbitrage [2]
The original text mentioned that the research team analyzed data from April 2024 to April 2025:
• Checked 17,218 conditions
• 7,051 conditions exhibited single-market arbitrage (41%)
• Median price sum: $0.60 (should be $1.00)
• 13 confirmed cross-market exploitable arbitrages
A median of $0.60 where the fair value is $1.00 means that prices regularly deviate by 40%. This is not "close to efficient"; it is "readily exploitable on a large scale."
Chapter 2: Bregman Projection - Calculating the Optimal Arbitrage Trade
Identifying arbitrage is one problem. Calculating the optimal arbitrage trade is another.
You can't simply "take an average" or "fine-tune the price." You need to project the current market state onto a no-arbitrage feasible space while preserving the informational structure of prices.
Why "Straight-Line Distance" Doesn't Work
The most intuitive idea is: find the "legal price" closest to the current price and then trade the difference.
In mathematical terms, this means minimizing the squared Euclidean distance ||μ - θ||² between the current price vector θ and a valid price vector μ.
But there is a fatal flaw: it treats all price changes equally.
Going from $0.50 to $0.60, and going from $0.05 to $0.15, both entail a 10-cent increase. But their informational content is vastly different.
Why? Because prices represent implied probabilities. Going from 50% to 60% is a mild shift in viewpoint. Going from 5% to 15% is a massive belief reversal—an event that was nearly impossible suddenly becoming "somewhat likely."
Imagine you are weighing yourself. Going from 70 kilograms to 80 kilograms, you might say "I gained a bit of weight." But going from 30 kilograms to 40 kilograms (if you are an adult) would be "going from near death to severe malnutrition." Despite both being a 10-kilogram change, the meaning is entirely different. Prices operate in the same way—the closer a price change is to 0 or 1, the more informational content it holds.
Bregman Divergence: The Right "Distance"
Polymarket's market makers use LMSR (Logarithmic Market Scoring Rule)[4], where prices fundamentally represent probability distributions.
In this framework, the correct distance metric is not the Euclidean distance but the Bregman Divergence.[5]
For LMSR, the Bregman Divergence becomes the KL Divergence (Kullback-Leibler Divergence)[6]—a measure of the "informational distance" between two probability distributions.
You don't need to remember the formula. You just need to understand one thing:
The KL Divergence automatically assigns higher weight to "movements near extreme prices." A change from $0.05 to $0.15 is considered "farther" under KL Divergence than a change from $0.50 to $0.60. This aligns perfectly with our intuition—movements at extreme prices signify larger informational impacts.
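For readers who do want to see the numbers, here is a minimal Python sketch comparing the two moves. It treats each YES/NO market as a two-outcome distribution and measures KL(new || old); that direction, and the function name `bernoulli_kl`, are choices made for this illustration rather than anything specified in the original.

```python
import math

def bernoulli_kl(new_p: float, old_p: float) -> float:
    """KL(new || old) between two YES/NO price distributions, in nats."""
    return (new_p * math.log(new_p / old_p)
            + (1 - new_p) * math.log((1 - new_p) / (1 - old_p)))

mid_move = bernoulli_kl(0.60, 0.50)    # $0.50 -> $0.60
tail_move = bernoulli_kl(0.15, 0.05)   # $0.05 -> $0.15

print(f"Euclidean: {abs(0.60 - 0.50):.2f} vs {abs(0.15 - 0.05):.2f}")
print(f"KL:        {mid_move:.4f} vs {tail_move:.4f}")
```

Both moves are 10 cents in Euclidean terms, but the move near the extreme comes out roughly three and a half times "farther" under KL (about 0.070 versus 0.020 nats).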
A good example is the recent prediction market where Axiom made a comeback against Meteora at the last moment, also driven by extreme price movements.

Figure: Bregman Projection vs Euclidean Projection
Arbitrage Profit = Distance of Bregman Projection
This is one of the key conclusions the original author refers to throughout the paper:
The maximum guaranteed profit obtainable from any trade equals the Bregman divergence between the current market state and its Bregman projection onto the arbitrage-free space.
In other words, the further the market price is from the "valid space," the more money can be made. And the Bregman projection will tell you:
1. What to trade (the projection direction tells you the trading direction)
2. How much to trade (considering order book depth)
3. How much can be earned (the projection distance is the maximum profit)
The top-ranked arbitrageur made $2,009,631.76 in a year. [2] His strategy is to solve this optimization problem faster and more accurately than anyone else.

Figure: Marginal Polytope and Arbitrage
For example, imagine you are standing on top of a mountain, and at the foot of the mountain, there is a river (arbitrage-free space). Your current position (current market price) is at a distance from the river.
The Bregman projection helps you find the "shortest path from your position to the riverbank" — but not the straight-line distance, rather the shortest path considering the terrain (market structure). The length of this path is the maximum profit you can earn.
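The difference between the two kinds of projection is easiest to see in the simplest possible case: one market whose outcome prices should sum to 1 but do not. The sketch below is only that special case, not the full marginal polytope projection, and it does not compute profit. For this case the KL projection has a closed form (rescale every price by the same factor), while the Euclidean projection shifts every price by the same amount. The three-outcome prices are hypothetical.

```python
def kl_projection(prices):
    """KL projection of positive prices onto {sum = 1}: rescale proportionally."""
    total = sum(prices)
    return [p / total for p in prices]

def euclidean_projection(prices):
    """Euclidean projection onto {sum = 1}: shift every price by the same amount.
    Ignores the non-negativity constraint, which is inactive in this example."""
    shift = (1.0 - sum(prices)) / len(prices)
    return [p + shift for p in prices]

quoted = [0.90, 0.03, 0.02]   # hypothetical prices summing to 0.95

kl_fixed = kl_projection(quoted)
euclid_fixed = euclidean_projection(quoted)

print("KL:       ", [round(p, 4) for p in kl_fixed])
print("Euclidean:", [round(p, 4) for p in euclid_fixed])
```

The Euclidean fix nearly doubles the 2-cent outcome (0.02 becomes about 0.037), a huge change in implied belief, while the KL fix moves every outcome by the same relative amount of about 5%.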
Chapter 3: Frank-Wolfe Algorithm — Turning Theory into Executable Code
So, now you know: to calculate the optimal arbitrage, you need to perform the Bregman projection.
But here's the problem — calculating the Bregman projection directly is infeasible.
Why? Because the arbitrage-free space (marginal polytope M) has an exponentially large number of vertices. Standard convex optimization methods require accessing the entire constraint set, which means enumerating every valid result. As we mentioned earlier, this is impossible in a large-scale scenario.
The Core Idea of Frank-Wolfe
The brilliance of the Frank-Wolfe algorithm [7] is that it doesn't try to solve the entire problem at once; it approaches the solution step by step.
Here's how it works:
Step 1: Start from a small known feasible set.
Step 2: Optimize on this small set to find the current best solution.
Step 3: Find a new feasible solution using integer programming and add it to the set.
Step 4: Check if it's close enough to the optimal solution. If not, go back to Step 2.
Each iteration, the set only grows by one vertex. Even after 100 iterations, you only need to track 100 vertices—not 2^63.
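The loop can be sketched in a few lines of Python, under some loudly stated assumptions. This is the classic Frank-Wolfe update, not the active-set variant described in the steps above. It swaps the Bregman objective for a squared Euclidean one so the gradient stays simple, and it replaces the integer programming oracle with a brute-force scan over three hand-listed vertices. The function name `frank_wolfe` and the prices in `theta` are made up for illustration.

```python
def frank_wolfe(theta, vertices, max_iter=100, tol=1e-9):
    """Classic Frank-Wolfe minimizing 0.5 * ||mu - theta||^2 over conv(vertices)."""
    mu = list(vertices[0])                       # start from a known valid outcome
    for _ in range(max_iter):
        grad = [m - t for m, t in zip(mu, theta)]
        # Oracle call: the vertex that minimizes the linearized objective.
        # Brute force here; an integer programming solver plays this role
        # when the vertices cannot be listed.
        s = min(vertices, key=lambda v: sum(g * x for g, x in zip(grad, v)))
        d = [x - m for x, m in zip(s, mu)]
        # Convergence check: the Frank-Wolfe gap bounds the remaining error.
        gap = -sum(g * x for g, x in zip(grad, d))
        if gap <= tol:
            break
        # Exact line search along the segment from mu toward s.
        step = min(1.0, gap / sum(x * x for x in d))
        mu = [m + step * x for m, x in zip(mu, d)]
    return mu

# Valid (A, B) outcomes when B implies A: neither, A only, both.
vertices = [(0, 0), (1, 0), (1, 1)]
theta = (0.40, 0.55)    # hypothetical YES prices violating P(B) <= P(A)

mu_star = frank_wolfe(theta, vertices)
print([round(m, 4) for m in mu_star])
```

With these prices the projected point has both markets at the same price (0.475), the nearest point where "landslide" no longer costs more than "victory". The only place the set of valid outcomes enters the loop is the single oracle call, which is why the algorithm never needs the full vertex list.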

Figure: Frank-Wolfe Iteration Process
Imagine you're in a huge maze trying to find the exit.
The brute force method would be to explore every path. Frank-Wolfe's method is: take a random path, then at each junction, ask an "oracle" (an integer programming solver): "From here, which direction is most likely to lead to the exit?" and then take that step. You don't need to explore the entire maze, just make the right choices at key points.
Integer Programming Solver: The "Oracle" at Each Step
Each iteration of Frank-Wolfe requires solving an integer linear programming problem. This is theoretically NP-hard (meaning "no known fast general algorithm").
But modern solvers, like Gurobi[8], can efficiently solve well-structured problems.
The research team used Gurobi 5.5. Actual solving times:
• Early Iterations (few matches finished): Less than 1 second
• Mid-game (30-40 matches finished): 10-30 seconds
• Endgame (50+ matches have concluded): Less than 5 seconds
Why is the endgame faster? Because as the match results are determined, the feasible solution space shrinks. With fewer variables and tighter constraints, the solution is found more quickly.
Gradient Explosion Issue and Barrier Frank-Wolfe
The standard Frank-Wolfe faces a technical issue: when the price approaches 0, the gradient of LMSR tends to negative infinity. This causes the algorithm to be unstable.
The solution is Barrier Frank-Wolfe: instead of optimizing on the full polytope M, the optimization is done on a slightly "contracted" version of M. The contraction parameter ε adaptively decreases during iterations—starting further away from the boundary (stable) and gradually approaching the true boundary (accurate).
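A tiny Python sketch shows why the contraction helps. It assumes the contracted set is formed by mixing every point with a fixed interior point, (1 - ε) × μ + ε × u. The interior point, the halving schedule for ε, and the function name `contract` are all assumptions for illustration; the paper's adaptive rule is not reproduced here.

```python
import math

def contract(mu, interior, eps):
    """Pull mu toward a strictly interior point: (1 - eps) * mu + eps * interior."""
    return [(1 - eps) * m + eps * u for m, u in zip(mu, interior)]

mu = [1.0, 0.0, 0.0]               # a vertex: two prices sit exactly at 0
interior = [1 / 3, 1 / 3, 1 / 3]   # hypothetical interior point

eps = 0.1                          # initial contraction mentioned in the text
for _ in range(4):
    safe = contract(mu, interior, eps)
    # log(price) is finite on the contracted set; at price 0 it is undefined.
    print(f"eps={eps:.4f}  min price={min(safe):.5f}  log={math.log(min(safe)):.2f}")
    eps /= 2                       # illustrative schedule only
```

As ε shrinks, the smallest price approaches 0 and its logarithm grows more negative, but it stays finite at every step. That is the trade-off described above: stable far from the boundary, accurate close to it.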
Research shows that in practice, 50 to 150 rounds of iterations are sufficient for convergence.
Real Performance
A key finding in the paper [2] was:
In the first 16 matches of the NCAA tournament, the Frank-Wolfe Market Maker (FWMM) and the Simple Linear Constraint Market Maker (LCMM) performed similarly—because the integer programming solver was still too slow.
However, after 45 matches, the first successful 30-minute projection was completed.
From that point on, FWMM outperformed LCMM in spread pricing by 38%.
The turning point comes when the outcome space has shrunk enough for integer programming to solve within the trading window.
FWMM is like a student who warms up in the first half of the exam, but once in the zone, starts dominating. LCMM is like the student who consistently performs but has a limited ceiling. The key difference is: FWMM has a stronger "weapon" (Bregman projection), it just needs time to "load" (wait for the solver to finish).
Chapter 4: Execution—Why Calculating Optimal Trades Can Still Lead to Losses
You detect an arbitrage opportunity. You calculate the optimal trade using Bregman projection.
Now you need to execute.
This is where most strategies fail.
Non-Atomic Execution Issue
Polymarket uses a CLOB (Central Limit Order Book) [9]. Unlike swaps on an AMM-based decentralized exchange, which can be bundled into a single atomic transaction, orders on a CLOB are executed sequentially, so you cannot guarantee that all of your orders will be filled simultaneously.
Your arbitrage plan:
Buy YES at $0.30. Buy NO at $0.30. Total cost $0.60. Regardless of outcome, cash out at $1.00. Profit $0.40.
Reality:
· Submit YES order → Fill at $0.30 ✓
· Your order shifts the market price.
· Submit NO order → Fill at $0.78 ✗
· Total cost: $1.08. Cash out: $1.00. Actual result: Loss of $0.08.
The first leg filled at the planned price; the second did not. You've been exposed.
This is why the paper only accounts for opportunities with a profit margin greater than $0.05. Smaller spreads will be eaten up by execution risk.
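One practical way to think about this risk is to ask, once the first leg has filled, what is the worst price the second leg can fill at before the trade stops being worth it. The sketch below is a minimal illustration of that arithmetic; the function name `max_second_leg_price` is made up, and it ignores fees and partial fills.

```python
def max_second_leg_price(first_fill: float, min_edge: float = 0.05,
                         payout: float = 1.0) -> float:
    """Highest price the second leg may fill at while keeping `min_edge` of profit."""
    return payout - first_fill - min_edge

yes_fill = 0.30                        # first leg from the example above
limit = max_second_leg_price(yes_fill)

planned_no, actual_no = 0.30, 0.78
print(f"second leg must fill at or below {limit:.2f}")
print(f"planned {planned_no:.2f}: {'ok' if planned_no <= limit else 'abort'}")
print(f"actual  {actual_no:.2f}: {'ok' if actual_no <= limit else 'abort'}")
```

Submitting the second leg as a limit order at that price, rather than taking whatever the book offers, turns a guaranteed loss into an unhedged position you can still manage.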

Figure: Non-Atomic Execution Risk
VWAP: The Real Trading Price
Don't assume you'll get filled at the quoted price. Calculate the Volume Weighted Average Price (VWAP) [10].
The research team's approach is: for each block on the Polygon chain (about 2 seconds), calculate the VWAP for all YES trades and all NO trades within that block. If |VWAP_yes + VWAP_no - 1.0| > 0.02, it's recorded as an arbitrage opportunity [2].
VWAP is the "true average price you paid." If you want to buy 10,000 tokens, and there are 2,000 at $0.30, 3,000 at $0.32, and 5,000 at $0.35 on the order book, your VWAP would be (2000×0.30 + 3000×0.32 + 5000×0.35) / 10000 = 3,310 / 10,000 = $0.331. It's significantly higher than the "best" price you saw at $0.30.
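Both calculations are short enough to write out. The following Python sketch walks an order book to get the average fill price, and applies the per-block test from the text. The function names and the (price, size) list layout are assumptions for illustration, not Polymarket's API.

```python
def book_vwap(levels, quantity):
    """Average fill price for buying `quantity` tokens by walking ask levels.
    `levels` is a list of (price, size) sorted from best to worst price."""
    remaining, cost = quantity, 0.0
    for price, size in levels:
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough depth to fill the order")
    return cost / quantity

def is_block_opportunity(vwap_yes, vwap_no, threshold=0.02):
    """Per-block test from the text: |VWAP_yes + VWAP_no - 1.0| > threshold."""
    return abs(vwap_yes + vwap_no - 1.0) > threshold

asks = [(0.30, 2_000), (0.32, 3_000), (0.35, 5_000)]
print(round(book_vwap(asks, 10_000), 3))   # 3,310 / 10,000
print(round(book_vwap(asks, 2_000), 3))    # small order: best price only
print(is_block_opportunity(0.62, 0.33))
```

The same book gives $0.30 for a 2,000-token order and $0.331 for a 10,000-token order, which is why position size has to be part of the profit calculation and not an afterthought.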
Liquidity Constraint: How Much You Can Make Depends on Order Book Depth
Even if the price is indeed off, your profit potential is limited by available liquidity.
Real Example [2]:
The market shows arbitrage: the sum of YES prices is $0.85. Potential profit: $0.15 per dollar. But the order book depth at these prices is only $234. Maximum extractable profit: $234 × 0.15 = $35.10.
For cross-market arbitrage, you need to have liquidity in all positions simultaneously. The smallest one determines your ceiling.
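The ceiling is simple to compute. Here is a minimal Python sketch, with a made-up function name and hypothetical depths for the extra legs:

```python
def max_extractable_profit(edge_per_dollar, depth_by_leg):
    """Profit ceiling: the thinnest leg caps the size of the whole trade."""
    return edge_per_dollar * min(depth_by_leg)

single_market = max_extractable_profit(0.15, [234.0])
# Hypothetical cross-market trade with three legs of different depth.
cross_market = max_extractable_profit(0.15, [234.0, 1_500.0, 90.0])

print(f"single market: ${single_market:.2f}")
print(f"cross market:  ${cross_market:.2f}")
```

Adding legs can only lower the ceiling: here the $90 leg cuts the maximum from $35.10 to $13.50 even though the edge per dollar is unchanged.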
This is also why existing quant platforms make a point of showing how the size of an order affects its execution price.

Chapter 5: Full System—What Was Actually Deployed
The theory is clean. The production environment is messy.
Here is what a truly runnable arbitrage system looks like [2].
Data Pipeline
Real-Time Data: WebSocket connection to the Polymarket API [9], receiving order book updates (price/quantity changes), trade pushes, market creation/settlement events.
Historical Data: Querying contract events via the Alchemy Polygon node API—OrderFilled (trade execution), PositionSplit (new token mint), PositionsMerge (token burn).
The research team analyzed 86 million trades [2]. At this scale, it requires infrastructure; it's not something a script can handle.
There are also plans to open-source a high-speed trading API. If you plan to use a similar trading model and would like to try the API firsthand, feel free to DM me anytime.
Dependency Detection Layer
For 305 US election markets, there are 305 × 304 / 2 = 46,360 possible pairs to check. Manual analysis is not feasible.
The research team used a clever approach: employing the DeepSeek-R1-Distill-Qwen-32B large language model for preliminary screening.
Input: descriptions of the conditions for two markets. Output: JSON of valid result combinations.
This was followed by a three-tier validation: Does each market have exactly one true condition? Is the number of valid combinations fewer than n × m (meaning a dependency exists)? Do the dependent subsets meet the arbitrage condition?
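The first two checks are mechanical and can be sketched in Python. The JSON shape below is hypothetical (the text does not give the model's exact output format), and the third check, whether a dependent pair actually meets the arbitrage condition, is left out because it needs live prices.

```python
import json

def validate_pair(llm_json: str, n: int, m: int) -> str:
    """Classify an LLM answer listing valid joint outcomes for two markets.
    Assumed shape: {"valid": [[a_vec, b_vec], ...]}, where a_vec and b_vec
    are 0/1 lists of length n and m (one flag per condition)."""
    combos = json.loads(llm_json)["valid"]
    for a_vec, b_vec in combos:
        # Check 1: exactly one condition is true in each market.
        if len(a_vec) != n or len(b_vec) != m or sum(a_vec) != 1 or sum(b_vec) != 1:
            return "invalid"
    unique = {(tuple(a), tuple(b)) for a, b in combos}
    # Check 2: fewer than n * m joint outcomes means the markets are dependent.
    return "dependent" if len(unique) < n * m else "independent"

# Market A = [win, lose], Market B = [landslide, no landslide].
# "Lose + landslide" is impossible, so only 3 of the 4 combinations are listed.
answer = json.dumps({"valid": [
    [[1, 0], [1, 0]],
    [[1, 0], [0, 1]],
    [[0, 1], [0, 1]],
]})
print(validate_pair(answer, n=2, m=2))
```

Cheap structural checks like these catch malformed model output before any expensive optimization runs, which matters when the model is right only about four times out of five on complex markets.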
Results: 40,057 independent pairs (no arbitrage) → 1,576 dependent pairs (potential arbitrage) → 374 meeting strict conditions → 13 deemed exploitable after manual verification [2].
The LLM accuracy on complex multi-condition markets is 81.45%. Sufficient for preliminary screening but requires manual validation before execution.
Three-Tier Optimization Engine
· Tier 1: Simple Linear Constraints (LCMM). Quickly checks basic rules – "sum of probabilities equals 1," "if A implies B, then P(A) cannot exceed P(B)." Completed in milliseconds, eliminating obvious pricing errors.
· Tier 2: Integer Program Projection (Frank-Wolfe + Gurobi). This is the core. Parameters: Alpha = 0.9 (extract at least 90% of available arbitrage), initial ε = 0.1 (10% shrink), convergence threshold = 1e-6, time limit = 30 minutes. Typical iterations: 50-150. Solve time per iteration: 1-30 seconds.
· Tier 3: Execution Validation. Before submitting orders, simulate trades on the current order book. Check: Is liquidity sufficient? What is the expected slippage? After deducting slippage, what is the guaranteed profit? Does the profit exceed the minimum threshold ($0.05)? Only proceed if all criteria are met.
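The parameters and the final gate can be written down as a small Python sketch. The class and field names are made up for illustration; only the numeric values come from the list above. The Tier 3 gate takes slippage and depth as inputs rather than computing them, so this is a skeleton rather than a working engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectionConfig:
    alpha: float = 0.9              # extract at least 90% of available arbitrage
    initial_epsilon: float = 0.1    # 10% initial contraction
    tolerance: float = 1e-6         # convergence threshold
    time_limit_s: int = 30 * 60     # 30-minute time limit

@dataclass(frozen=True)
class ExecutionCheck:
    required_size: float       # tokens the trade needs
    available_depth: float     # tokens available at acceptable prices
    gross_profit: float        # per-unit profit at quoted prices
    expected_slippage: float   # per-unit cost of walking the book

def passes_tier3(check: ExecutionCheck, min_profit: float = 0.05) -> bool:
    """Tier 3 gate: enough liquidity, and profit after slippage above the threshold."""
    if check.available_depth < check.required_size:
        return False
    return check.gross_profit - check.expected_slippage > min_profit

cfg = ProjectionConfig()
print(cfg)
print(passes_tier3(ExecutionCheck(1_000, 5_000, 0.15, 0.03)))   # profit after slippage 0.12
print(passes_tier3(ExecutionCheck(1_000, 5_000, 0.07, 0.03)))   # profit after slippage 0.04
print(passes_tier3(ExecutionCheck(1_000, 400, 0.15, 0.03)))     # book too thin
```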
Position Sizing: Enhanced Kelly Formula
The standard Kelly formula [11] tells you what proportion of funds to put into a trade. But in an arbitrage scenario, adjustment for execution risk is needed:
f = ((b × p - q) / b) × √p
Where b is the arbitrage profit percentage, p is the probability of full execution (estimated based on order book depth), q = 1 - p.
Upper limit: 50% of order book depth. Above this ratio, your order itself would significantly move the market.
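Here is a minimal Python sketch of the sizing rule. It reads the formula as ((b × p - q) / b) × √p, floors the result at zero (an addition of this sketch, since a negative fraction means "don't trade"), and applies the 50% depth cap. Function names and example inputs are hypothetical.

```python
import math

def arbitrage_kelly_fraction(b: float, p: float) -> float:
    """f = ((b * p - q) / b) * sqrt(p), floored at zero."""
    q = 1.0 - p
    return max(0.0, (b * p - q) / b * math.sqrt(p))

def position_size(bankroll: float, b: float, p: float,
                  book_depth: float, depth_cap: float = 0.5) -> float:
    """Dollar size: Kelly fraction of bankroll, capped at 50% of order book depth."""
    return min(bankroll * arbitrage_kelly_fraction(b, p), depth_cap * book_depth)

f_good = arbitrage_kelly_fraction(b=0.15, p=0.95)
f_thin = arbitrage_kelly_fraction(b=0.05, p=0.90)

print(f"15% edge, 95% fill probability: f = {f_good:.3f}")
print(f" 5% edge, 90% fill probability: f = {f_thin:.3f}")
print(f"size with $10,000 bankroll and $234 depth: ${position_size(10_000, 0.15, 0.95, 234):.2f}")
```

Two things stand out. A 5% edge with a 10% chance of a failed fill sizes to zero, so small spreads need near-certain execution. And with thin books, the depth cap rather than the Kelly fraction usually sets the final size.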
Final Result
From April 2024 to April 2025, total extracted profits:
Single-Condition Arbitrage: Buy Both Sides Low $5,899,287 + Sell Both Sides High $4,682,075 = $10,581,362
Market Rebalance: Buy All YES Low $11,092,286 + Sell All YES High $612,189 + Buy All NO $17,307,114 = $29,011,589
Cross-Market Composite Arbitrage: $95,634
Total: $39,688,585
The top 10 arbitrageurs took home $8,127,849 (20.5% of the total). Top-ranked arbitrageur: $2,009,632, from 4,049 trades, averaging $496 per trade[2].
Not a lottery. Not luck. It is the systematic execution of mathematical precision.
The Ultimate Reality
While traders are still reading "10 Tips for Predicting Markets," what are quantitative systems doing?
They are detecting dependencies among 17,218 conditions using integer programming. They are calculating optimal arbitrage trades using Bregman projection. They are running Frank-Wolfe to make that projection tractable, with the barrier variant to handle gradient explosions. They are estimating slippage with VWAP and executing orders in parallel. They are systematically extracting $40 million in guaranteed profits.
The gap is not luck. It is mathematical infrastructure.
The paper is public [2]. The algorithms are known. The profit is real.
The question is: Can you build before the next $40 million is extracted?
Concept Quick Lookup
• Marginal Polytope → The space of all "valid prices." Prices must be within this space to avoid arbitrage. Can be understood as the "valid region of prices."
• Integer Programming → Describes valid solutions with linear constraints to avoid brute force enumeration. Compresses 2^63 checks into a few constraints [3]
• Bregman Divergence / KL Divergence → Method to measure "distance" between two probability distributions, more suitable for pricing/probability scenarios than Euclidean distance. Weights changes near extreme prices more heavily [5][6]
• LMSR (Logarithmic Market Scoring Rule) → Pricing mechanism used by Polymarket market makers, where prices represent implied probabilities [4]
• Frank-Wolfe Algorithm → An iterative optimization algorithm that adds only one new vertex per round, avoiding exponentially many valid solutions [7]
• Gurobi → Leading industry integer programming solver, the "guide" for each Frank-Wolfe iteration [8]
• CLOB (Central Limit Order Book) → Polymarket's order matching mechanism, where orders are executed in sequence, atomicity not guaranteed [9]
• VWAP (Volume-Weighted Average Price) → The average price you actually pay, considering order book depth. More realistic than "best price" [10]
• Kelly Criterion → Tells you what proportion of your funds to put into a trade to balance risk and return [11]
• Non-Atomic Execution → The issue where multiple orders cannot guarantee simultaneous execution. One leg executes while the other does not = exposure risk
• DeepSeek → Large language model used for initial screening of market dependencies, accuracy 81.45%
References
[1] Original Post: https://x.com/RohOnChain/status/2017314080395296995
[2] Research Paper "Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets": https://arxiv.org/abs/2508.03474
[3] Theoretical Paper "Arbitrage-Free Combinatorial Market Making via Integer Programming": https://arxiv.org/abs/1606.02825
[4] LMSR Logarithmic Market Scoring Rule Explanation: https://www.cultivatelabs.com/crowdsourced-forecasting-guide/how-does-logarithmic-market-scoring-rule-lmsr-work
[5] Introduction to Bregman Divergence: https://mark.reid.name/blog/meet-the-bregman-divergences.html
[6] KL Divergence - Wikipedia: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
[7] Frank-Wolfe Algorithm - Wikipedia: https://en.wikipedia.org/wiki/Frank%E2%80%93Wolfe_algorithm
[8] Gurobi Optimizer: https://www.gurobi.com/
[9] Polymarket CLOB API Documentation: https://docs.polymarket.com/
[10] VWAP Explanation - Investopedia: https://www.investopedia.com/terms/v/vwap.asp
[11] Kelly Criterion - Investopedia: https://www.investopedia.com/articles/trading/04/091504.asp
[12] Decrypt Article "The $40 Million Free Money Glitch": https://decrypt.co/339958/40-million-free-money-glitch-crypto-prediction-markets