Orply.

Agents Often Claim Web Access After Being Blocked or Challenged

Rafael LeviAI EngineerWednesday, June 17, 20269 min read

Rafael Levi of Bright Data argues that many web-dependent agents fail not because they cannot produce answers, but because they report success after web access has broken. In a demo using Bright Data’s Web MCP, Levi shows the same agent failing against sites such as LinkedIn, Instagram, Amazon and TikTok without live access, then producing usable results when given infrastructure for search, scraping, JavaScript rendering and CAPTCHA handling. His broader case is that reliable agents need a real public-web access layer, not prompts that assume the model saw the page.

The failure mode is not that agents cannot reach the web. It is that they say they did.

Rafael Levi’s central claim is that agents are routinely giving users the impression of successful web access after the access has failed. The failure can be mundane — a blocked request, a CAPTCHA, an empty page, JavaScript that did not render — or more adversarial, such as a system feeding the agent fake content. In either case, Levi said, the damaging behavior is the same: the model reports back as if it got the data.

Bright Data, which Levi described as a web access platform for collecting public data at scale, frames this as a reliability problem for LLM applications rather than a cosmetic defect in browsing. Levi’s complaint is not merely that models hallucinate. It is that the agent’s interaction with the web creates a hidden gap between what the user thinks happened and what the agent actually saw.

I would rather LLM tell me, no, I can't. But it never does. It always tries to make things up.
Rafael Levi · Source

The pattern begins with the assumption that an agent can “search the web” or “go to a URL” in the way a human does. But the web has spent years hardening itself against robots and automation. CAPTCHAs are the obvious example. More recent anti-bot systems block AI crawlers by default, challenge suspicious traffic, or return content that looks meaningful but is not the page the user expected.

A Bright Data visual summarized the failure plainly: “AI agents get blocked, fed fake content, and hit CAPTCHAs. Then they report back as if nothing went wrong.”

Levi argued that this is especially dangerous when the model falls back to training data. If a request fails, the model may still have enough stale context to produce a plausible answer. In his example, a model trained on 2024 material answering in 2026 may present old information as if it were current. That creates a failure that is harder to detect than an explicit error, because the output may sound fluent, cite sources, and appear responsive.

Blocked pages, fake pages, and stale memory all collapse into the same user experience

The mechanism Levi called the “invisible failure loop” has four parts: the agent sends what looks like a normal HTTP request; an anti-bot system intercepts it; the agent receives an empty page, fake content, or challenge HTML; and the agent reports success anyway.

No errors. No warnings. Just wrong answers presented as truth.

The absence of a clear failure signal is what makes the pattern persistent. Bright Data described an HTTP 403 or CAPTCHA as something that may not behave like an exception for the agent: no errors thrown, no retries triggered, and the agent “genuinely believes it has the data.” Levi also said that when an agent receives an empty page, it often does not tell the user it got an empty page. It tries to answer anyway. The user sees a fluent response, not the block, challenge, or missing page behind it.

Levi connected this directly to hallucinated citations. He said he has seen models “literally make numbers up” and provide citations that lead to 404 pages. He also cited a Columbia / Tow Center for Digital Journalism test represented in Bright Data’s material, which said that nine major AI search engines had a citation failure rate above 60%, with more than 60% of citations referencing content the AI never successfully retrieved.

60%+
citation failure rate shown in Bright Data’s material for a Columbia / Tow Center test of nine major AI search engines

Levi’s consumer example was product search: asking an AI system to find an item online, receiving a product link, and discovering that the URL or product does not exist. The agent has satisfied the form of the request — a product name, a price, a link — while failing the substance of it.

The deeper problem is that the user cannot easily tell whether the model retrieved the cited page, saw a bot challenge, used stale memory, or invented a plausible URL. All of those paths can produce the same surface-level output: a confident answer.

The live comparison was designed to isolate web access, not prompt quality

Rafael Levi’s demo compared the same agent and the same prompts with and without Bright Data’s Web MCP. The task list covered five sites and use cases: Rightmove rentals in London, LinkedIn company data for Mistral AI, an Instagram profile, an Amazon product identified by ASIN, and TikTok content from Google’s account.

He said he chose targets he expected would fail without Bright Data’s infrastructure. In the version without MCP, the agent had no live web data access and no browsing tools. Levi described it as a default, out-of-the-box GPT-5 setup with “tools none available.” The result, he said, was zero successes across the five tasks.

With Bright Data’s MCP connected, the terminal showed a connection to the Bright Data MCP server and 69 available tools, including search_engine, scrape_as_markdown, search_engine_batch, discover, session_stats, web_data_amazon_product, and web_data_walmart_product. Levi described the MCP in his narration as having 66 tools.

Levi described the tools in practical terms. search_engine gives the model access to real Google, Bing, and DuckDuckGo searches rather than an opaque or assumed “web search.” scrape_as_markdown can request a URL and return markdown rather than raw HTML, reducing the token cost of parsing tags. search_engine_batch can send many keywords at once. The system also includes prebuilt APIs for specific sites and a remote browser infrastructure that the LLM can open and navigate.

The remote browser is central to Bright Data’s access story. Levi said it can solve CAPTCHAs automatically, uses unique fingerprints, and can open 100 browsers navigating the same website without getting blocked. In Bright Data’s framing, the agent is not just being handed another tool; it is being given an access layer intended to handle the anti-bot, CAPTCHA, and rendering problems that raw agent requests often encounter.

ConditionLevi’s descriptionObserved demo outcome described by Levi
Without MCPNo live web data access; no browsing tools; raw/default accessZero successes across five tasks
With Bright Data Web MCPSearch, scraping, browser infrastructure, CAPTCHA handling, JavaScript rendering, structured toolsSuccessful results were described for Rightmove, LinkedIn, Instagram, and Amazon; TikTok was part of the task list but is not clearly confirmed in the excerpt
The demo compared identical prompts with and without Bright Data’s web-access infrastructure.

The demo also asked the LLM to compare its own results from the no-MCP and MCP runs. Levi said the comparison listed the no-MCP attempts as failed due to no live web access and the MCP attempts as successful for the tested sites.

The proposed fix is infrastructure, not code

Rafael Levi’s proposed remedy is to make sure the agent can actually access the public web before trusting its answer. He characterized the failure as an infrastructure problem, not something developers can reliably code around by themselves.

Bright Data’s account of the fix is an access layer that handles IP rotation, CAPTCHA solving, and JavaScript rendering transparently. The premise is that a developer cannot assume a raw request will see the same page a human sees, or that the model will surface the difference when it does not.

Levi returned to Cloudflare’s AI Labyrinth as the clearest example of why the web access layer matters. He said Cloudflare blocks AI crawling for about 20% of the web by default and has released an AI Labyrinth system that detects bots, traps them, and feeds them fake data rather than simply blocking them. Bright Data described the fake content as “plausible looking disinformation.”

20%
share of the web Levi said Cloudflare blocks for AI crawlers by default

Asked how Bright Data detects whether an agent has encountered Cloudflare’s AI Labyrinth, Levi answered that Bright Data’s approach is not to identify the labyrinth after the fact. Instead, it tries to prevent the trigger by making the agent look like a human user. He described pre-recorded mouse movement and human-like typing behavior, so the protection system does not ask whether the visitor is a robot in the first place.

Levi did not claim to have precise statistics on AI Labyrinth behavior. He said the system had been released “like a month ago” and that he did not have detailed information on exactly how it works. What he did claim is that Bright Data had not seen a degradation in its results, despite collecting and caching very large volumes of data and comparing outputs.

Levi treated misleading data as one of the hardest problems in web access. He gave the example of hotel sites in Asia returning different prices depending on whether a user visits from a phone, a computer, or a proxy. In that case, the question “which price is correct?” may not have a simple answer. His practical answer was narrower: make the agent look as much like a human as possible and “hope for the best.”

For builders, the boundary is public data, narrow tools, and extraction that does not waste model context

Rafael Levi drew a repeated boundary around the kind of web access Bright Data is offering: public data. Asked whether Bright Data uses accounts it runs for social profiles, he said no. Bright Data, in his account, does not collect data behind login and treats logged-in collection as legally problematic because creating an account means accepting a site’s terms and conditions.

That distinction mattered during questions about LinkedIn and Instagram. When an audience member suggested that Instagram may not expose much public data without a logged-in session, Levi demonstrated a public LinkedIn company page for Mistral AI in an incognito browser. The page showed company information and a “Discover all 415 employees” prompt. His point was not that every social profile is fully accessible without authentication, but that some useful company and profile data is publicly visible and can be collected within Bright Data’s stated boundary.

For individual profiles, Levi said access depends heavily on traffic quality. A large event Wi-Fi network may be blocked because it resembles a data-center or low-quality IP. A home connection may allow a small number of profiles before requiring login. Bright Data’s position, as Levi stated it, is to avoid logged-in access and work only with data visible publicly.

Levi also described a second pattern for applications that do not need live retrieval. Bright Data has datasets that can be filtered directly. If a user wants, for example, LinkedIn people matching “AI engineers” in a particular area, Levi said an agent can query the dataset rather than live-scrape the site. The tradeoff is freshness: the user must be willing to accept data that may be a few months old.

The operational advice was not simply “give the agent every web tool.” The MCP server exposed dozens of tools in the demo, but Levi said developers should filter them. If the agent only needs markdown scraping and search, it should load only those two. Otherwise, he said, the developer is “flooding context with irrelevant data.”

That constraint also shaped his advice on extraction. While pointing to Bright Data’s GitHub resources and a “skills” page, Levi said the skills can teach an agent how to build a scraper or pipeline using Bright Data’s APIs. For larger jobs, he argued against having the LLM parse every page directly.

Don't parse with LLM! LLM builds the parser and then the script runs it.
Rafael Levi · Source

Levi claimed that approach can save about 99% of tokens compared with parsing large numbers of pages directly through the model. The model’s job, in that pattern, is to produce the parser or pipeline; the repetitive extraction work should be done by code.

A final clarification narrowed what the demo tested. Asked whether the initial experiment used OpenAI’s web search tool through the API, Levi said no. He told the model to go to specific URLs and see whether it could load them. Levi’s broader claim was about user expectations: people often expect “web search” to mean something like a Google search, while an LLM may not be doing that. He said Bright Data’s MCP can perform actual Google searches and positioned that as one of its major benefits.

The practical offer attached to the session was a free tier of 5,000 requests per month, which Levi described as enough for an MVP, prototype, or hackathon project, with pay-as-you-go available beyond that.

The frontier, in your inbox tomorrow at 08:00.

Sign up free. Pick the industry Briefs you want. Tomorrow morning, they land. No credit card.

Sign up free