Add Broken Links detection and SEO Analysis features

Database Schema:
- Added meta_description TEXT field to pages table
- Added index on status_code for faster broken link queries

Backend Changes:
- Crawler now extracts meta descriptions from pages
- New API endpoint: broken-links (finds 404s and server errors)
- New API endpoint: seo-analysis (analyzes titles and meta descriptions)
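
For illustration only, here is a minimal sketch of the kind of query the broken-links endpoint could run. It is not the code from this commit. It uses Python with an in-memory SQLite table as a stand-in for the MySQL pages table, the function name find_broken_links is hypothetical, and it assumes "broken" means any status code of 400 or above, since the exact condition is not shown here.

```python
import sqlite3


def find_broken_links(conn, crawl_job_id):
    """Return (url, status_code) tuples for pages of one crawl job that look broken."""
    # Assumption: "broken" means any HTTP status >= 400 (client and server errors).
    cursor = conn.execute(
        "SELECT url, status_code FROM pages "
        "WHERE crawl_job_id = ? AND status_code >= 400 "
        "ORDER BY status_code, url",
        (crawl_job_id,),
    )
    return cursor.fetchall()


# In-memory stand-in for the pages table (SQLite, not the project's MySQL schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (crawl_job_id INT, url TEXT, status_code INT)")
conn.execute("CREATE INDEX idx_status_code ON pages (status_code)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        (1, "https://example.com/", 200),
        (1, "https://example.com/missing", 404),
        (1, "https://example.com/crash", 500),
        (2, "https://example.org/gone", 404),
    ],
)

broken = find_broken_links(conn, 1)
for url, status in broken:
    print(status, url)
```

Filtering on crawl_job_id first keeps the result scoped to a single crawl. The status_code index mainly helps when the query scans across jobs.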

SEO Analysis Features:
- Title length validation (optimal: 30-60 chars)
- Meta description length validation (optimal: 70-160 chars)
- Detection of missing titles/descriptions
- Duplicate detection across pages (identical titles and meta descriptions)
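
The checks listed above can be sketched as follows. This is a hedged illustration in Python, not the implementation from this commit. The names length_issue and analyze, the issue labels, and the input shape (a list of dicts with url, title and meta_description) are all assumptions. Only the 30-60 and 70-160 character ranges come from the list above.

```python
from collections import defaultdict

# Optimal ranges taken from the commit message; everything else is hypothetical.
TITLE_RANGE = (30, 60)
DESCRIPTION_RANGE = (70, 160)


def length_issue(text, low, high):
    """Return 'missing', 'too_short', 'too_long', or None when the length is in range."""
    if text is None or not text.strip():
        return "missing"
    length = len(text.strip())
    if length < low:
        return "too_short"
    if length > high:
        return "too_long"
    return None


def analyze(pages):
    """Return (issues per URL, duplicate titles and descriptions) for a list of pages."""
    issues = {}
    by_title = defaultdict(list)
    by_description = defaultdict(list)
    for page in pages:
        url = page["url"]
        title_issue = length_issue(page.get("title"), *TITLE_RANGE)
        desc_issue = length_issue(page.get("meta_description"), *DESCRIPTION_RANGE)
        found = {}
        if title_issue:
            found["title"] = title_issue
        if desc_issue:
            found["meta_description"] = desc_issue
        if found:
            issues[url] = found
        # Only non-missing values take part in duplicate detection.
        if title_issue != "missing":
            by_title[page["title"].strip()].append(url)
        if desc_issue != "missing":
            by_description[page["meta_description"].strip()].append(url)
    duplicates = {
        "titles": {t: urls for t, urls in by_title.items() if len(urls) > 1},
        "meta_descriptions": {
            d: urls for d, urls in by_description.items() if len(urls) > 1
        },
    }
    return issues, duplicates


shared_title = "A reasonably descriptive page title here"
shared_description = "Guide to the crawler. " * 5
pages = [
    {"url": "/a", "title": "Short", "meta_description": None},
    {"url": "/b", "title": shared_title, "meta_description": shared_description},
    {"url": "/c", "title": shared_title, "meta_description": shared_description},
]
issues, duplicates = analyze(pages)
```

Here /a is flagged for a short title and a missing description, while /b and /c pass the length checks but are reported as duplicates of each other.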

Frontend Changes:
- Added "Broken Links" tab showing pages with errors
- Added "SEO Analysis" tab with:
  * Statistics overview
  * Pages with SEO issues
  * Duplicate content report

All quality checks pass:
- PHPStan Level 8: 0 errors
- PHPCS PSR-12: 0 warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-04 09:26:33 +02:00
parent 9e61572747
commit f7be09ec63
4 changed files with 220 additions and 4 deletions

@@ -28,12 +28,14 @@ CREATE TABLE IF NOT EXISTS pages (
     crawl_job_id INT NOT NULL,
     url VARCHAR(2048) NOT NULL,
     title VARCHAR(500),
+    meta_description TEXT,
     status_code INT,
     content_type VARCHAR(100),
     crawled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
     FOREIGN KEY (crawl_job_id) REFERENCES crawl_jobs(id) ON DELETE CASCADE,
     INDEX idx_crawl_job (crawl_job_id),
     INDEX idx_url (url(255)),
+    INDEX idx_status_code (status_code),
     UNIQUE KEY unique_job_url (crawl_job_id, url(255))
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;