The "page is published but never appears in Google" debugging game has two phases: can Googlebot crawl the page, and if so, will Google index it? Most teams treat these as the same problem and chase the wrong fix. The page that returns 200 OK but never appears in search isn't usually a "Google penalty." It's almost always one of half a dozen mechanical issues sitting in the gap between crawled and indexed.
This guide skips the per-page mechanics (the two tools below cover those) and focuses on the mental model: the two gates Google walks through, the order they run in, and the audit workflow for fixing pages stuck on the wrong side of either gate.
The two gates: crawlability vs indexability
The single mental-model distinction that closes most "why isn't this page ranking?" tickets:
- Crawlability - Can Googlebot fetch this URL? Determined by robots.txt, the HTTP status code, authentication walls, server availability, and discoverability via internal links or the sitemap.
- Indexability - Will Google include this URL in the index? Determined by the noindex directive (meta tag or X-Robots-Tag header), the canonical tag, content quality, near-duplicate detection, and Google's render-time evaluation.
Crawl runs first. A page can be crawled but not indexed (Google fetches it, then chooses not to include it - this is the "Crawled - currently not indexed" status in Search Console, and it's usually a quality signal). A page's content cannot be indexed without first being crawled - but if you block crawling, the bare URL can still appear in Google's index based on external links, with the gray "no information is available" snippet.
That last point is the most expensive misunderstanding on this topic. robots.txt Disallow blocks the crawl, not the index. If you want a URL out of search results, the directive is noindex, served on a page Google is allowed to crawl. Counterintuitive, but it's the rule that fixes the bug.
Check crawlability for any URL
Paste a URL below to see whether Googlebot can crawl it: HTTP status, robots.txt rules that apply, and the meta-robots / X-Robots-Tag directives served by the page.
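If you'd rather script this pass than paste URLs one at a time, here is a minimal sketch of the same crawlability check. It assumes Python with the requests library; note that urllib.robotparser matches rules more simply than Google does, and the URL at the bottom is a placeholder.

```python
import requests
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def check_crawlability(url: str, user_agent: str = "Googlebot") -> dict:
    """Rough crawlability pass: robots.txt verdict plus HTTP status."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"

    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # fetch and parse robots.txt
    # Caveat: urllib.robotparser's matching is simpler than Google's
    # (wildcard handling and Allow/Disallow precedence can differ).
    allowed = rp.can_fetch(user_agent, url)

    resp = requests.head(url, allow_redirects=True, timeout=10)

    return {
        "robots_txt_allows": allowed,
        "http_status": resp.status_code,
        "final_url": resp.url,  # where any redirects end up
    }

print(check_crawlability("https://example.com/some-page"))  # placeholder URL
```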
Check indexability for any URL
Now run the same URL through the indexability checker. The questions are different: is there a noindex tag? Does the canonical point at this URL or somewhere else? Does the HTTP status allow indexing?
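A matching sketch for the indexability side, again assuming the requests library and a placeholder URL. The regexes are a shortcut for illustration only; a production checker should use a real HTML parser and handle directives split across multiple meta tags.

```python
import re
import requests

def check_indexability(url: str) -> dict:
    """Rough indexability pass: noindex (header or meta) and declared canonical."""
    resp = requests.get(url, timeout=10)
    html = resp.text

    header = resp.headers.get("X-Robots-Tag", "")
    # Quick-and-dirty extraction; assumes name= appears before content=, etc.
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
                     html, re.IGNORECASE)
    canonical = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']*)["\']',
                          html, re.IGNORECASE)

    meta_value = meta.group(1) if meta else ""
    declared = canonical.group(1) if canonical else None
    return {
        "http_status": resp.status_code,
        "noindex_via_header": "noindex" in header.lower(),
        "noindex_via_meta": "noindex" in meta_value.lower(),
        "declared_canonical": declared,
        "canonical_is_self": declared == url if declared else None,
    }

print(check_indexability("https://example.com/some-page"))  # placeholder URL
```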
The directive decision tree
Three directives are commonly used to control crawl and index behaviour, and they do different things. Pick the wrong one and the URL ends up indexed when you wanted it hidden, or hidden when you wanted it indexed.
"I want this URL out of search results"
Use <meta name="robots" content="noindex"> (or the X-Robots-Tag: noindex response header for non-HTML responses like PDFs). Leave robots.txt allowing the crawl - Googlebot has to be able to fetch the page to see the noindex tag.
For permanent removals (deleted pages, deactivated accounts), pair the noindex with a 410 Gone status code. The combination drops URLs from the index faster than noindex alone.
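As a sketch of what that pairing can look like at the application layer - Flask is used here purely for illustration, and the account_is_deleted lookup is hypothetical - a deleted-account route might return 410 plus the noindex header:

```python
from flask import Flask

app = Flask(__name__)

def account_is_deleted(account_id: str) -> bool:
    # Hypothetical lookup against your own data store.
    return account_id in {"1234", "5678"}

@app.route("/account/<account_id>")
def account_page(account_id):
    if account_is_deleted(account_id):
        # 410 Gone signals permanent removal; the header also covers non-HTML responses.
        return "This account no longer exists.", 410, {"X-Robots-Tag": "noindex"}
    return f"Account {account_id}"  # normal, indexable page
```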
"I want to save crawl budget on infinite low-value URLs"
Use robots.txt Disallow. Faceted filter combinations, internal search results, and parameter-decorated URLs (millions of /products?color=red&size=L&page=4-style variants) are the legitimate use case. Googlebot doesn't waste cycles fetching them, so it has more budget for your real pages.
Caveat: Disallowed URLs can still appear in search results if they're linked from elsewhere - and adding noindex won't help, because the Disallow stops Googlebot from ever seeing it. Use Disallow only for URLs you can live with surfacing as a bare, description-less result; if a URL must stay out of results entirely, use the noindex flow above instead.
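For the faceted-navigation case, the rules might look something like this - paths and parameter names are invented for illustration, and wildcard / $ matching is supported by Googlebot but not by every crawler:

```
User-agent: *
# Block internal search results and parameter-decorated filter variants
Disallow: /search/
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*&page=

Sitemap: https://www.example.com/sitemap.xml
```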
"I have multiple URLs serving the same content; pick one as the master"
Use <link rel="canonical" href="...">. Set every indexable page's canonical to itself (self-referencing). For variants - parameterised URLs, paginated archive pages, AMP versions - set the canonical to the master URL. Google consolidates ranking signals onto the canonical and treats the variants as duplicates.
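Concretely (URLs invented for illustration), the master category page self-references while a filtered variant points back at it:

```html
<!-- On https://example.com/products/shoes (the master): self-referencing -->
<link rel="canonical" href="https://example.com/products/shoes">

<!-- On https://example.com/products/shoes?color=red&page=2 (a variant): point at the master -->
<link rel="canonical" href="https://example.com/products/shoes">
```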
Audit move: open Search Console URL Inspection on any URL whose ranking is unclear, and look at the "Canonical" section. If "Google-selected canonical" differs from "User-declared canonical," Google has overridden your hint - usually because stronger signals (more inbound links, better authority) point at the URL Google chose.
The biggest mistake: using robots.txt to deindex
This is the single most common "why is my page still in Google?" ticket on every SEO forum. The pattern: a developer adds a URL to robots.txt Disallow expecting it to disappear. A week later it's still in the index, often with the gray "no information is available for this page" snippet. The page is then sometimes also noindexed in a panic, which changes nothing - the Disallow prevents Googlebot from crawling the page, so it never sees the noindex.
The correct fix, in order:
- Remove the Disallow rule from robots.txt. Googlebot needs to fetch the page.
- Add <meta name="robots" content="noindex"> to the page (or X-Robots-Tag: noindex for non-HTML).
- Wait for Googlebot to recrawl. Speed it up with a Search Console URL Inspection → "Request indexing" pass.
- Once the URL drops from the index, you can re-add the Disallow rule if you also want to save crawl budget.
The order matters. If you Disallow before the noindex is seen, Google may keep the URL indexed for months because it can no longer fetch the deindex signal.
What a clean crawl/index audit looks like
Run this whenever pages aren't ranking despite "looking fine," after migrations, after CMS upgrades, and on a quarterly cadence regardless. Takes about 25 minutes for a small site.
- Open Search Console → Pages report. Group "Why pages aren't indexed" by reason. The top three categories - "Crawled - currently not indexed," "Discovered - currently not indexed," and "Submitted URL has 'noindex'" - each have a different fix. Don't lump them together.
- For "Crawled - currently not indexed": fix the page, not the directive. Google read the page and chose not to index it. That's a quality, near-duplicate, or low-authority signal. Check for thin content, near-duplicate of another page on your own site, or insufficient internal-link authority. Improve the page, then request re-indexing.
- For "Discovered - currently not indexed": improve internal linking. Google found the URL but hasn't crawled it yet. Almost always a sign of weak internal-link authority or insufficient crawl budget. Add prominent internal links from your highest-authority pages and the URL gets crawled within days.
- For "Submitted URL has 'noindex'": pick one signal. Either remove the URL from your sitemap (you've told Google not to index it, so why submit it?) or remove the noindex tag (you want it indexed after all).
- For "Blocked by robots.txt": double-check intent. Anything in this list is a deliberate block. Confirm you actually meant to block each URL. If a legitimate ranking page is here, it's the highest-priority fix.
- Spot-check 10 priority URLs via URL Inspection. Pick 10 high-traffic or high-conversion URLs. For each: confirm it's "URL is on Google," confirm "User-declared canonical" matches "Google-selected canonical," confirm the rendered HTML contains the content you expect (the rendered tab, not the source tab). This catches render-time issues that crawl-only checks miss.
- Verify the mobile render. Google indexes the mobile version. If your mobile template hides 60% of the body content behind a tab that only loads after a user tap, Google may not index that content. Use URL Inspection's "Test live URL" with the smartphone Googlebot and inspect the rendered HTML. A rough scripted first pass over the same priority URLs is sketched after this list.
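For the spot-check steps above, a small batch script can flag obvious problems before you open URL Inspection. This sketch assumes the requests library; the URL list is a stand-in for your own priority pages, the user-agent string only approximates Googlebot Smartphone, and a plain fetch does not execute JavaScript, so it cannot replace Search Console's rendered-HTML view.

```python
import requests

# Swap in your own high-traffic / high-conversion URLs.
PRIORITY_URLS = [
    "https://example.com/",
    "https://example.com/pricing",
]

# Approximates Googlebot Smartphone. This fetch does NOT run JavaScript,
# so treat it as a first pass only - confirm in URL Inspection's rendered HTML.
SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

for url in PRIORITY_URLS:
    resp = requests.get(url, headers={"User-Agent": SMARTPHONE_UA}, timeout=10)
    flags = []
    if resp.status_code != 200:
        flags.append(f"status {resp.status_code}")
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        flags.append("noindex header")
    body = resp.text.lower()
    if 'name="robots"' in body and "noindex" in body:
        flags.append("possible noindex meta (verify manually)")
    print(url, "->", ", ".join(flags) or "looks ok")
```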
Grab the one-page audit checklist
A printable version of the audit above, plus a one-page directive decision tree (the noindex vs Disallow vs canonical reference) you can pin near your monitor.
Quick quiz: are you ready to audit your own crawl/index state?
Five questions, takes two minutes. We'll show you the right answer and a one-line explanation after each one.
Crawlability & indexability - quick check
5 randomized questions drawn from a pool of 12. Different every time you take it. Takes about two minutes.
Next up in Technical SEO
You've covered status codes, robots.txt, sitemaps, and the crawl/index gates. The last piece of the Technical SEO pillar:
- Mixed content and HTTPS - the audit that takes 30 seconds and often reveals a year's worth of debt left over from your last platform migration.