Server Log Analyzer (Crawl Log)

Linkilo reads your server's access logs and shows you which pages Googlebot, Bingbot, GPTBot, ClaudeBot, and other crawlers actually visit. Real-time bot tracking, verified-vs-spoofed detection, and AI crawler insights.

What this is

Crawl log analysis is the job big SEO teams hand to Screaming Frog Log File Analyser or other external tools. Linkilo brings that capability inside WordPress, focused on the data that matters: which posts crawlers care about, which they ignore, and which fake bots are pretending to be Googlebot.

If you've installed Linkilo's Server Log Analyzer add-on, this report appears in your admin menu.

Turn it on

  1. Go to Linkilo → Settings → Server Log Analyzer (under the Tools group).

The settings page has two cards.

Card 1: Crawler tracking

  • Record crawler visits — master toggle. When OFF, nothing is recorded and the Crawl Log report disappears from the menu.
  • Keep visit history for — 7 days / 30 days / 90 days / 200 days (recommended). The "Days since last crawl" column needs at least 200 days of history to be useful.

Card 2: Pick which bots to record

By default, everything is on. Use bulk buttons for quick selection:

  • Select all / Deselect all
  • Search engines only — Googlebot, Bingbot, Yandex, Baidu, etc.
  • AI crawlers only — GPTBot, ChatGPT-User, ClaudeBot, Perplexity, etc.

Below the buttons are individual checkboxes grouped into categories: Search engines, AI crawlers, Social previews, SEO tools, Other.
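To make the grouping concrete, here is a minimal sketch of how a user-agent string could be mapped to a category. The `BOT_CATEGORIES` map and the `categorize` function are hypothetical names for illustration, not Linkilo's code, and only two of the five categories are filled in.

```python
from typing import Optional, Tuple

# Hypothetical map: category -> user-agent tokens (only bots named in this article).
BOT_CATEGORIES = {
    "Search engines": ["Googlebot", "bingbot", "YandexBot", "Baiduspider"],
    "AI crawlers": ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"],
}

def categorize(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (category, token) for the first matching token, or None for non-bots."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        for token in tokens:
            if token.lower() in ua:
                return category, token
    return None
```

A match here only means the visitor claims to be that bot; verification is a separate step.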

Click Save Settings.

View the crawl log

Go to Linkilo → Crawl Log in the WordPress sidebar (top-level menu).

You'll see panels for:

Most Crawled URLs

Your most-visited-by-bots pages, with a crawl-ratio chip showing whether each is hot (≥200% of the median crawl count), warm (50% to under 200%), or cool (<50%).
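If you want to sanity-check this panel against a raw log, a rough sketch like the one below counts hits per URL. It assumes the log is in the common "combined" format used by Apache and nginx; `LOG_LINE` and `most_crawled` are illustrative names, and this is not how Linkilo itself reads logs.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def most_crawled(lines, bot_token="Googlebot", top=10):
    """Count requests per path whose user-agent contains bot_token (case-insensitive)."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and bot_token.lower() in m["ua"].lower():
            hits[m["path"]] += 1
    return hits.most_common(top)
```

Lines that don't match the pattern are skipped silently, so a custom log format would need its own regex.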

Days Since Last Crawl

Pages bots haven't visited recently, with indexing-status badges:

  • robots.txt blocked
  • noindex set (Yoast / Rank Math / AIOSEO / SEOPress / Genesis)
  • Submitted but not indexed
  • Crawled, currently not indexed
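The "days since" number itself is simple arithmetic. As a sketch, assuming crawl events are available as (URL, timestamp) pairs (a hypothetical shape, not Linkilo's schema), it could be computed like this:

```python
from datetime import datetime, timezone

def days_since_last_crawl(events, now):
    """events: iterable of (url, crawled_at) pairs. Returns {url: whole days since latest crawl}."""
    latest = {}
    for url, crawled_at in events:
        if url not in latest or crawled_at > latest[url]:
            latest[url] = crawled_at
    return {url: (now - crawled_at).days for url, crawled_at in latest.items()}
```

A URL with no events inside the retention window never shows up in the result, which is why a short retention setting limits what this panel can tell you.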

Crawl Waste + Crawl Budget

Bots wasting time on URLs you don't care about — parameters, paginated archives, junk pages. Helps you spot pages that should be canonical-tagged or noindex'd.
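As an illustration of the idea, the sketch below flags two of the patterns mentioned: URLs with query parameters and WordPress-style paginated archives (`/page/N/`). The rules and the name `waste_reason` are assumptions made for this example; Linkilo's own criteria may differ.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumed pattern for WordPress-style paginated archives, e.g. /category/news/page/3/
PAGINATED = re.compile(r"/page/\d+/?$")

def waste_reason(url):
    """Return a short reason if the URL looks like crawl waste, else None."""
    parts = urlsplit(url)
    if parse_qsl(parts.query, keep_blank_values=True):
        return "query parameters"
    if PAGINATED.search(parts.path):
        return "paginated archive"
    return None
```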

Bot Activity

A breakdown of which bots are crawling and how often: total visits per bot and verified-vs-spoofed percentages. AI bots are bucketed separately from search engines.

Real vs. Spoofed bots

Some traffic claims to be Googlebot but isn't: scrapers, malicious crawlers, security probes. Linkilo verifies bot identity with both a reverse-DNS lookup and user-agent matching. Spoofed bots show a red ✗ badge; verified ones show a green ✓.

Verification results are cached per IP for 30 days, so repeat checks for the same IP are cheap.
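Here is a minimal in-memory sketch of a per-IP cache with a 30-day lifetime. The class name `VerificationCache` and the injected `verify` callable are hypothetical; where Linkilo actually stores its cache is not covered here.

```python
import time

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as described above

class VerificationCache:
    def __init__(self, verify, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._verify = verify    # callable: ip -> bool (the expensive DNS check)
        self._ttl = ttl
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # ip -> (expires_at, verdict)

    def is_verified(self, ip):
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the lookup
        verdict = self._verify(ip)
        self._entries[ip] = (now + self._ttl, verdict)
        return verdict
```

Both verdicts are cached, so a spoofing IP is not re-checked on every request either.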

GSC Inspect

Every URL in the report has a GSC Inspect button. Click it to send the URL through Google Search Console's URL Inspection API. You get back the current indexing state ("Submitted and indexed," "Crawled - currently not indexed," "Excluded by 'noindex'," "Blocked by robots.txt," etc.) without leaving the page.

Results are cached in postmeta (_linkilo_gsc_inspection_data) for 7 days, so repeat clicks are instant.
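For the curious, the sketch below shows roughly what a URL Inspection request and response look like, plus a 7-day freshness check. The endpoint and the `inspectionUrl`, `siteUrl`, and `coverageState` fields come from Google's public API; the helper names are made up for this example, and the real call also needs an OAuth access token, which is omitted here.

```python
import json
from datetime import datetime, timedelta, timezone

INSPECT_ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"
CACHE_TTL = timedelta(days=7)  # matches the 7-day cache described above

def build_inspect_body(page_url, site_url):
    """JSON body for a POST to INSPECT_ENDPOINT."""
    return json.dumps({"inspectionUrl": page_url, "siteUrl": site_url})

def coverage_state(response):
    """Pull the human-readable indexing state out of a decoded API response."""
    return (
        response.get("inspectionResult", {})
        .get("indexStatusResult", {})
        .get("coverageState")
    )

def is_fresh(cached_at, now):
    """True while a cached result is younger than CACHE_TTL."""
    return now - cached_at < CACHE_TTL
```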

Requires a connected Google Search Console account; see Google Search Console.

Common Questions

My host doesn't give me access to access logs

Some shared hosts hide access logs by default. Contact your host and ask if they can:

  • Give you log file access via FTP/SFTP
  • Set up log shipping to a folder you can read
  • Enable access logs on your account

Most managed WordPress hosts (Kinsta, WP Engine, BigScoots) can do this.

Why is GSC Inspect telling me I'm not connected?

You need to connect Google Search Console first — see Google Search Console.

"Crawl Ratio" tooltip explained

The crawl-ratio chip compares each URL's crawl count to the median crawl count across all URLs in the date window:

  • 🟠 Hot (≥200% of median) — Google REALLY cares about this URL
  • 🔵 Warm (50–200%) — getting normal attention
  • ⚪ Cool (<50%) — barely visited

It's a quick way to spot which pages Google values most vs. which it barely crawls.
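The thresholds can be worked through in a few lines. This sketch assumes the median is taken over per-URL crawl counts in the window; the function name and data shape are illustrative only.

```python
from statistics import median

def crawl_ratio_chips(crawl_counts):
    """crawl_counts: {url: crawls in the window}. Returns {url: (ratio_percent, chip)}."""
    if not crawl_counts:
        return {}
    mid = median(crawl_counts.values())
    chips = {}
    for url, count in crawl_counts.items():
        ratio = 100 * count / mid if mid else 0.0  # guard against a zero median
        if ratio >= 200:
            chip = "hot"
        elif ratio >= 50:
            chip = "warm"
        else:
            chip = "cool"
        chips[url] = (ratio, chip)
    return chips
```

With this reading, a URL crawled exactly twice as often as the median counts as hot, and one crawled exactly half as often counts as warm.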

Why are some Googlebot visits marked as spoofed?

That's normal: anyone can set their user-agent to "Googlebot". Linkilo verifies by doing a reverse-DNS lookup on the IP. Real Googlebot IPs reverse-resolve to hostnames ending in .googlebot.com or .google.com. If the IP doesn't reverse-resolve to one of those, the visit gets flagged as spoofed.
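A reverse-DNS check along these lines can be sketched with Python's standard `socket` module. The forward-confirmation step (resolving the hostname back and checking it returns the same IP) is part of Google's published verification advice; whether Linkilo performs it is an assumption here, and the function names are hypothetical.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")  # suffixes named in this article

def reverse_dns(ip):
    """PTR lookup; returns the hostname or None if the IP has no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def forward_dns(hostname):
    """Forward lookup; returns the IPv4 addresses for hostname (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []

def is_real_googlebot(ip, reverse=reverse_dns, forward=forward_dns):
    """Resolvers are injectable so the logic can be exercised without network access."""
    hostname = reverse(ip)
    if not hostname or not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP.
    return ip in forward(hostname)
```

Note that `socket.gethostbyname_ex` only returns IPv4 addresses, so an IPv6 visitor would need `socket.getaddrinfo` instead.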

Spoofed traffic often comes from scrapers and SEO tools that disguise their user-agent, security scanners, or malicious actors probing for vulnerabilities. It's useful to know how much of your "Googlebot" traffic is genuine.

What AI bots does Linkilo track?

Major ones:

  • GPTBot (OpenAI training)
  • ChatGPT-User (real-time ChatGPT browsing)
  • ClaudeBot (Anthropic)
  • Claude-User (real-time Claude browsing)
  • PerplexityBot
  • Google-Extended (Bard / Gemini training)
  • CCBot (Common Crawl, used by many AI services)

Plus others as they're identified. You can untick any you don't care about in Card 2.

What's the retention setting for?

How long Linkilo keeps each crawl event in the database. Old events are pruned automatically.

  • 7 days — minimal disk usage, only useful for "what crawled my site today"
  • 30 / 90 days — good for short-term analysis
  • 200 days (recommended) — needed for the "Days since last crawl" column to find pages that fell off Google's radar
  • Nothing longer than 200 days is offered, because it would take up disk space without proportionate insight
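Pruning boils down to dropping events older than a cutoff date. The following sketch assumes events are (URL, timestamp) pairs held in a list; in practice this would be a database delete, and the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_RETENTION_DAYS = (7, 30, 90, 200)  # the options listed above

def prune_events(events, retention_days, now):
    """Keep only (url, crawled_at) events that fall inside the retention window."""
    if retention_days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"unsupported retention: {retention_days}")
    cutoff = now - timedelta(days=retention_days)
    return [event for event in events if event[1] >= cutoff]
```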


© Copyright 2024, All Rights Reserved