Add pagination to all data tables using jQuery DataTables

Libraries Added:
- jQuery 3.7.1 from CDN
- DataTables 1.13.7 (CSS + JS) from CDN

Custom Styling:
- Integrated DataTables styling with existing design
- Custom pagination button styles
- Responsive search and filter inputs

Paginated Tables:
- jobsTable: Crawl jobs (25/page, sorted by ID desc)
- pagesTable: Crawled pages (50/page)
- linksTable: Found links (50/page)
- brokenTable: Broken links (25/page)
- redirectsTable: Redirects (25/page)
- seoTable: SEO issues (25/page)

Features:
- Search functionality per table
- Column sorting
- Configurable entries per page
- German localization
- Automatic reinitialization on data reload
- Navigation controls (First/Previous/Next/Last)
- Entry count display

All quality checks pass:
- PHPStan Level 8: 0 errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
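
The "automatic reinitialization on data reload" and "German localization" items repeat the same options object for every table in `src/index.php`. A minimal sketch of how that could be factored out follows. The names `GERMAN_LANGUAGE`, `buildTableOptions`, and `reloadTable` are hypothetical and not part of this change; the DataTables calls (`$.fn.DataTable.isDataTable`, `destroy()`) are the ones the diff already uses.

```javascript
// German UI strings, copied from the DataTables calls in this change.
const GERMAN_LANGUAGE = {
    search: 'Suchen:',
    lengthMenu: 'Zeige _MENU_ Einträge',
    info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
    infoEmpty: 'Keine Einträge verfügbar',
    infoFiltered: '(gefiltert von _MAX_ Einträgen)',
    paginate: {
        first: 'Erste',
        last: 'Letzte',
        next: 'Nächste',
        previous: 'Vorherige'
    }
};

// Hypothetical helper: builds the options object for one table.
// `extra` lets a caller add table-specific settings such as `order`.
function buildTableOptions(pageLength, extra = {}) {
    return { pageLength, language: GERMAN_LANGUAGE, ...extra };
}

// Hypothetical helper: destroys an existing DataTable instance (if any),
// replaces the table body, then initializes DataTables again.
// Assumes jQuery and DataTables are loaded, so the global `$` exists.
function reloadTable(selector, rowsHtml, pageLength, extra = {}) {
    if ($.fn.DataTable.isDataTable(selector)) {
        $(selector).DataTable().destroy();
    }
    $(selector).find('tbody').html(rowsHtml);
    return $(selector).DataTable(buildTableOptions(pageLength, extra));
}
```

Under these assumptions, the jobs table would be refreshed with `reloadTable('#jobsTable', rowsHtml, 25, { order: [[0, 'desc']] })`. Destroying before the body is rewritten matches the order used in the diff.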
Add redirect tracking and analysis features
2025-10-04 09:49:39 +02:00 · 2025-10-04 09:40:26 +02:00 · 2025-10-04 09:29:05 +02:00 · 2025-10-04 09:26:33 +02:00 · 2025-10-04 09:07:50 +02:00 · 2025-10-04 08:58:28 +02:00
15 changed files with 741 additions and 573 deletions
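
The redirect statistics added in the `redirects` action of `src/api.php` group status codes into permanent (301, 308) and temporary (302, 303, 307) and count entries whose `redirect_count` exceeds a threshold of 3. The sketch below mirrors that counting in JavaScript for illustration only. `summarizeRedirects` is a hypothetical name, and the input shape is assumed to match the rows returned by that action.

```javascript
// Status code groups as used by the `redirects` action in src/api.php.
const PERMANENT_CODES = new Set([301, 308]);
const TEMPORARY_CODES = new Set([302, 303, 307]);

// Hypothetical client-side mirror of the server-side counting loop.
// `redirects` is an array of rows with `status_code` and `redirect_count`.
function summarizeRedirects(redirects, threshold = 3) {
    const stats = {
        total: redirects.length,
        permanent: 0,
        temporary: 0,
        excessive: 0,
        threshold
    };
    for (const row of redirects) {
        // Values may arrive as strings from JSON, so normalize to numbers.
        const code = Number(row.status_code);
        if (PERMANENT_CODES.has(code)) {
            stats.permanent++;
        } else if (TEMPORARY_CODES.has(code)) {
            stats.temporary++;
        }
        if (Number(row.redirect_count) > threshold) {
            stats.excessive++;
        }
    }
    return stats;
}
```

One point that may be worth checking: if the stored `status_code` is that of the final response after Guzzle has followed the redirects, the permanent and temporary counters could stay at zero. With `track_redirects` enabled, Guzzle also exposes the intermediate codes in the `X-Guzzle-Redirect-Status-History` header.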
--- a/README.md
+++ b/README.md
@@ -1,7 +1,17 @@
-# PHP Docker Anwendung
+# Web Crawler

 Eine PHP-Anwendung mit MariaDB, die in Docker läuft.

+## Copyright & Lizenz
+
+**Copyright © 2025 Martin Kiesewetter**
+
+- **Autor:** Martin Kiesewetter
+- **E-Mail:** mki@kies-media.de
+- **Website:** [https://kies-media.de](https://kies-media.de)
+
+---
+
 ## Anforderungen

 - Docker
@@ -43,14 +53,23 @@ docker-compose up -d --build
 ```
 .
 ├── docker-compose.yml      # Docker Compose Konfiguration
-├── Dockerfile             # PHP Container Image
-├── start.sh               # Container Start-Script
-├── init.sql               # Datenbank Initialisierung
-├── config/
+├── Dockerfile              # PHP Container Image
+├── config/                 # Konfigurationsdateien
+│   ├── docker/
+│   │   ├── init.sql        # Datenbank Initialisierung
+│   │   └── start.sh        # Container Start-Script (unused)
 │   └── nginx/
-│       └── default.conf   # Nginx Konfiguration
-└── src/
-    └── index.php          # Hauptanwendung
+│       └── default.conf    # Nginx Konfiguration
+├── src/                    # Anwendungscode
+│   ├── api.php
+│   ├── index.php
+│   ├── classes/
+│   └── crawler-worker.php
+├── tests/                  # Test Suite
+│   ├── Unit/
+│   └── Integration/
+├── phpstan.neon            # PHPStan Konfiguration
+└── phpcs.xml               # PHPCS Konfiguration
 ```

 ## Entwicklung
--- a/composer.json
+++ b/composer.json
@@ -1,4 +1,5 @@
 {
+    "_comment": "Web Crawler - Composer Configuration | Copyright (c) 2025 Martin Kiesewetter <mki@kies-media.de> | https://kies-media.de",
    "name": "web-crawler/app",
    "description": "Web Crawler Application with Parallel Processing",
    "type": "project",
--- a/config/docker/init.sql
+++ b/config/docker/init.sql
@@ -1,3 +1,11 @@
+/**
+ * Web Crawler - Database Schema
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
 -- Database initialization script for Web Crawler

 -- Crawl Jobs Table
@@ -20,12 +28,17 @@ CREATE TABLE IF NOT EXISTS pages (
    crawl_job_id INT NOT NULL,
    url VARCHAR(2048) NOT NULL,
    title VARCHAR(500),
+    meta_description TEXT,
    status_code INT,
    content_type VARCHAR(100),
+    redirect_url VARCHAR(2048),
+    redirect_count INT DEFAULT 0,
    crawled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (crawl_job_id) REFERENCES crawl_jobs(id) ON DELETE CASCADE,
    INDEX idx_crawl_job (crawl_job_id),
    INDEX idx_url (url(255)),
+    INDEX idx_status_code (status_code),
+    INDEX idx_redirect_count (redirect_count),
    UNIQUE KEY unique_job_url (crawl_job_id, url(255))
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

--- a/config/docker/start.sh
+++ b/config/docker/start.sh
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,3 +1,9 @@
+# Web Crawler - Docker Compose Configuration
+#
+# @copyright Copyright (c) 2025 Martin Kiesewetter
+# @author    Martin Kiesewetter <mki@kies-media.de>
+# @link      https://kies-media.de
+
 version: '3.8'

 services:
@@ -34,7 +40,7 @@ services:
      - "3307:3306"
    volumes:
      - mariadb_data:/var/lib/mysql
-      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
+      - ./config/docker/init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - app-network

--- a/index.php
+++ b/index.php
@@ -1,11 +0,0 @@
-<?php
-declare(strict_types=1);
-
-error_reporting(E_ALL);
-ini_set('display_errors', '1');
-
-require_once 'webanalyse.php';
-$wa = new WebAnalyse();
-$db = mysqli_connect('localhost', 'root', '', 'screaming_frog');
-
-$wa->doCrawl(1);
--- a/setnew.php
+++ b/setnew.php
@@ -1,11 +0,0 @@
-<?php
-$db = mysqli_connect("localhost", "root", "", "screaming_frog");
-
-$db->query("truncate table crawl");
-// $db->query("insert into crawl (start_url, user_id) values ('https://kies-media.de/', 1)");
-$db->query("insert into crawl (start_url, user_id) values ('https://kies-media.de/leistungen/externer-ausbilder-fuer-fachinformatiker/', 1)");
-
-$db->query("truncate table urls");
-$urls = $db->query("insert ignore into urls (id, url, crawl_id) select 1,start_url, id from crawl where id = 1"); #->fetch_all(MYSQLI_ASSOC)
-
-$db->query("truncate table links");
--- a/src/api.php
+++ b/src/api.php
@@ -1,5 +1,13 @@
 <?php

+/**
+ * Web Crawler - API Endpoint
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
 require_once __DIR__ . '/vendor/autoload.php';

 use App\Database;
@@ -108,6 +116,148 @@ try {
            ]);
            break;

+        case 'broken-links':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT * FROM pages " .
+                "WHERE crawl_job_id = ? AND (status_code >= 400 OR status_code = 0) " .
+                "ORDER BY status_code DESC, url"
+            );
+            $stmt->execute([$jobId]);
+            $brokenLinks = $stmt->fetchAll();
+
+            echo json_encode([
+                'success' => true,
+                'broken_links' => $brokenLinks
+            ]);
+            break;
+
+        case 'seo-analysis':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT id, url, title, meta_description, status_code FROM pages " .
+                "WHERE crawl_job_id = ? ORDER BY url"
+            );
+            $stmt->execute([$jobId]);
+            $pages = $stmt->fetchAll();
+
+            $issues = [];
+            foreach ($pages as $page) {
+                $pageIssues = [];
+                $titleLen = mb_strlen($page['title'] ?? '');
+                $descLen = mb_strlen($page['meta_description'] ?? '');
+
+                // Title issues (Google: 50-60 chars optimal)
+                if (empty($page['title'])) {
+                    $pageIssues[] = 'Title missing';
+                } elseif ($titleLen < 30) {
+                    $pageIssues[] = "Title too short ({$titleLen} chars)";
+                } elseif ($titleLen > 60) {
+                    $pageIssues[] = "Title too long ({$titleLen} chars)";
+                }
+
+                // Meta description issues (Google: 120-160 chars optimal)
+                if (empty($page['meta_description'])) {
+                    $pageIssues[] = 'Meta description missing';
+                } elseif ($descLen < 70) {
+                    $pageIssues[] = "Meta description too short ({$descLen} chars)";
+                } elseif ($descLen > 160) {
+                    $pageIssues[] = "Meta description too long ({$descLen} chars)";
+                }
+
+                if (!empty($pageIssues)) {
+                    $issues[] = [
+                        'url' => $page['url'],
+                        'title' => $page['title'],
+                        'title_length' => $titleLen,
+                        'meta_description' => $page['meta_description'],
+                        'meta_length' => $descLen,
+                        'issues' => $pageIssues
+                    ];
+                }
+            }
+
+            // Find duplicates
+            $titleCounts = [];
+            $descCounts = [];
+            foreach ($pages as $page) {
+                if (!empty($page['title'])) {
+                    $titleCounts[$page['title']][] = $page['url'];
+                }
+                if (!empty($page['meta_description'])) {
+                    $descCounts[$page['meta_description']][] = $page['url'];
+                }
+            }
+
+            $duplicates = [];
+            foreach ($titleCounts as $title => $urls) {
+                if (count($urls) > 1) {
+                    $duplicates[] = [
+                        'type' => 'title',
+                        'content' => $title,
+                        'urls' => $urls
+                    ];
+                }
+            }
+            foreach ($descCounts as $desc => $urls) {
+                if (count($urls) > 1) {
+                    $duplicates[] = [
+                        'type' => 'meta_description',
+                        'content' => $desc,
+                        'urls' => $urls
+                    ];
+                }
+            }
+
+            echo json_encode([
+                'success' => true,
+                'issues' => $issues,
+                'duplicates' => $duplicates,
+                'total_pages' => count($pages)
+            ]);
+            break;
+
+        case 'redirects':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT url, title, status_code, redirect_url, redirect_count FROM pages " .
+                "WHERE crawl_job_id = ? AND redirect_count > 0 " .
+                "ORDER BY redirect_count DESC, url"
+            );
+            $stmt->execute([$jobId]);
+            $redirects = $stmt->fetchAll();
+
+            // Count redirect types
+            $permanent = 0;
+            $temporary = 0;
+            $excessive = 0;
+            $maxThreshold = 3; // From Config::MAX_REDIRECT_THRESHOLD
+
+            foreach ($redirects as $redirect) {
+                $code = $redirect['status_code'];
+                if ($code == 301 || $code == 308) {
+                    $permanent++;
+                } elseif ($code == 302 || $code == 303 || $code == 307) {
+                    $temporary++;
+                }
+                if ($redirect['redirect_count'] > $maxThreshold) {
+                    $excessive++;
+                }
+            }
+
+            echo json_encode([
+                'success' => true,
+                'redirects' => $redirects,
+                'stats' => [
+                    'total' => count($redirects),
+                    'permanent' => $permanent,
+                    'temporary' => $temporary,
+                    'excessive' => $excessive,
+                    'threshold' => $maxThreshold
+                ]
+            ]);
+            break;
+
        case 'delete':
            $jobId = $_POST['job_id'] ?? 0;
            $stmt = $db->prepare("DELETE FROM crawl_jobs WHERE id = ?");
@@ -119,6 +269,42 @@ try {
            ]);
            break;

+        case 'recrawl':
+            $jobId = $_POST['job_id'] ?? 0;
+            $domain = $_POST['domain'] ?? '';
+
+            if (empty($domain)) {
+                throw new Exception('Domain is required');
+            }
+
+            // Delete all related data for this job
+            $stmt = $db->prepare("DELETE FROM crawl_queue WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            $stmt = $db->prepare("DELETE FROM links WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            $stmt = $db->prepare("DELETE FROM pages WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            // Reset job status
+            $stmt = $db->prepare(
+                "UPDATE crawl_jobs SET status = 'pending', total_pages = 0, total_links = 0, " .
+                "started_at = NULL, completed_at = NULL WHERE id = ?"
+            );
+            $stmt->execute([$jobId]);
+
+            // Start crawling in background
+            $cmd = "php " . __DIR__ . "/crawler-worker.php " . escapeshellarg((string) $jobId) . " > /dev/null 2>&1 &";
+            exec($cmd);
+
+            echo json_encode([
+                'success' => true,
+                'job_id' => $jobId,
+                'message' => 'Recrawl started'
+            ]);
+            break;
+
        default:
            throw new Exception('Invalid action');
    }
--- a/src/classes/Config.php
+++ b/src/classes/Config.php
@@ -0,0 +1,29 @@
+<?php
+
+/**
+ * Web Crawler - Configuration Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+namespace App;
+
+class Config
+{
+    /**
+     * Maximum number of redirects before warning
+     */
+    public const int MAX_REDIRECT_THRESHOLD = 3;
+
+    /**
+     * Maximum crawl depth
+     */
+    public const int MAX_CRAWL_DEPTH = 50;
+
+    /**
+     * Number of parallel requests
+     */
+    public const int CONCURRENCY = 10;
+}
--- a/src/classes/Crawler.php
+++ b/src/classes/Crawler.php
@@ -1,5 +1,13 @@
 <?php

+/**
+ * Web Crawler - Crawler Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
 namespace App;

 use GuzzleHttp\Client;
@@ -25,6 +33,10 @@ class Crawler
        $this->client = new Client([
            'timeout' => 30,
            'verify' => false,
+            'allow_redirects' => [
+                'max' => 10,
+                'track_redirects' => true
+            ],
            'headers' => [
                'User-Agent' => 'WebCrawler/1.0'
            ]
@@ -136,30 +148,61 @@ class Crawler
        $contentType = $response->getHeaderLine('Content-Type');
        $body = $response->getBody()->getContents();

+        // Track redirects
+        $redirectUrl = null;
+        $redirectCount = 0;
+        if ($response->hasHeader('X-Guzzle-Redirect-History')) {
+            $redirectHistory = $response->getHeader('X-Guzzle-Redirect-History');
+            $redirectCount = count($redirectHistory);
+            if ($redirectCount > 0) {
+                $redirectUrl = end($redirectHistory);
+            }
+        }
+
        // Save page
        $domCrawler = new DomCrawler($body, $url);
        $title = $domCrawler->filter('title')->count() > 0
            ? $domCrawler->filter('title')->text()
            : '';

+        $metaDescription = $domCrawler->filter('meta[name="description"]')->count() > 0
+            ? $domCrawler->filter('meta[name="description"]')->attr('content')
+            : '';
+
        $stmt = $this->db->prepare(
-            "INSERT INTO pages (crawl_job_id, url, title, status_code, content_type)
-            VALUES (?, ?, ?, ?, ?)
-            ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), status_code = VALUES(status_code)"
+            "INSERT INTO pages (crawl_job_id, url, title, meta_description, status_code, " .
+            "content_type, redirect_url, redirect_count) " .
+            "VALUES (?, ?, ?, ?, ?, ?, ?, ?) " .
+            "ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), status_code = VALUES(status_code), " .
+            "meta_description = VALUES(meta_description), redirect_url = VALUES(redirect_url), " .
+            "redirect_count = VALUES(redirect_count)"
        );

-        $stmt->execute([$this->crawlJobId, $url, $title, $statusCode, $contentType]);
+        $stmt->execute([
+            $this->crawlJobId,
+            $url,
+            $title,
+            $metaDescription,
+            $statusCode,
+            $contentType,
+            $redirectUrl,
+            $redirectCount
+        ]);
        $pageId = $this->db->lastInsertId();

        // If pageId is 0, fetch it manually
-        if ($pageId == 0) {
+        if ($pageId == 0 || $pageId === '0') {
            $stmt = $this->db->prepare("SELECT id FROM pages WHERE crawl_job_id = ? AND url = ?");
            $stmt->execute([$this->crawlJobId, $url]);
-            $pageId = $stmt->fetchColumn();
+            $fetchedId = $stmt->fetchColumn();
+            $pageId = is_numeric($fetchedId) ? (int)$fetchedId : 0;
        }

+        // Ensure pageId is an integer
+        $pageId = is_numeric($pageId) ? (int)$pageId : 0;
+
        // Extract and save links
-        if (str_contains($contentType, 'text/html') && is_int($pageId)) {
+        if (str_contains($contentType, 'text/html') && $pageId > 0) {
            echo "Extracting links from: $url (pageId: $pageId)\n";
            $this->extractLinks($domCrawler, $url, $pageId, $depth);
        } else {
@@ -199,8 +242,8 @@ class Crawler

                // Save link
                $stmt = $this->db->prepare(
-                    "INSERT INTO links (page_id, crawl_job_id, source_url, target_url, link_text, is_nofollow, is_internal)
-                    VALUES (?, ?, ?, ?, ?, ?, ?)"
+                    "INSERT INTO links (page_id, crawl_job_id, source_url, target_url, " .
+                    "link_text, is_nofollow, is_internal) VALUES (?, ?, ?, ?, ?, ?, ?)"
                );
                $stmt->execute([
                    $pageId,
--- a/src/classes/Database.php
+++ b/src/classes/Database.php
@@ -1,5 +1,13 @@
 <?php

+/**
+ * Web Crawler - Database Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
 namespace App;

 use PDO;
--- a/src/composer.json
+++ b/src/composer.json
@@ -1,4 +1,5 @@
 {
+    "_comment": "Web Crawler - Composer Configuration | Copyright (c) 2025 Martin Kiesewetter <mki@kies-media.de> | https://kies-media.de",
    "name": "web-crawler/app",
    "description": "Web Crawler Application with Parallel Processing",
    "type": "project",
--- a/src/crawler-worker.php
+++ b/src/crawler-worker.php
@@ -1,6 +1,14 @@
 #!/usr/bin/env php
 <?php

+/**
+ * Web Crawler - Background Worker
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
 require_once __DIR__ . '/vendor/autoload.php';

 use App\Database;
--- a/src/index.php
+++ b/src/index.php
@@ -1,9 +1,28 @@
 <!DOCTYPE html>
+<!--
+/**
+ * Web Crawler - Main Interface
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+-->
 <html lang="de">
 <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Web Crawler</title>
+
+    <!-- jQuery -->
+    <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
+
+    <!-- DataTables CSS -->
+    <link rel="stylesheet" href="https://cdn.datatables.net/1.13.7/css/jquery.dataTables.min.css">
+
+    <!-- DataTables JS -->
+    <script src="https://cdn.datatables.net/1.13.7/js/jquery.dataTables.min.js"></script>
+
    <style>
        * {
            margin: 0;
@@ -198,6 +217,58 @@
            text-overflow: ellipsis;
            white-space: nowrap;
        }
+
+        /* DataTables Styling */
+        .dataTables_wrapper {
+            padding: 20px 0;
+        }
+
+        .dataTables_filter input {
+            padding: 8px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            margin-left: 10px;
+        }
+
+        .dataTables_length select {
+            padding: 6px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            margin: 0 10px;
+        }
+
+        .dataTables_info {
+            padding-top: 10px;
+            color: #7f8c8d;
+        }
+
+        .dataTables_paginate {
+            padding-top: 10px;
+        }
+
+        .dataTables_paginate .paginate_button {
+            padding: 6px 12px;
+            margin: 0 2px;
+            border: 1px solid #e0e0e0;
+            border-radius: 4px;
+            background: white;
+            cursor: pointer;
+        }
+
+        .dataTables_paginate .paginate_button.current {
+            background: #3498db;
+            color: white !important;
+            border-color: #3498db;
+        }
+
+        .dataTables_paginate .paginate_button:hover {
+            background: #ecf0f1;
+        }
+
+        .dataTables_paginate .paginate_button.disabled {
+            cursor: not-allowed;
+            opacity: 0.5;
+        }
    </style>
 </head>
 <body>
@@ -214,7 +285,7 @@

        <div class="card">
            <h2>Crawl Jobs</h2>
-            <table id="jobsTable">
+            <table id="jobsTable" class="display">
                <thead>
                    <tr>
                        <th>ID</th>
@@ -241,10 +312,13 @@
                <div class="tabs">
                    <button class="tab active" onclick="switchTab('pages')">Seiten</button>
                    <button class="tab" onclick="switchTab('links')">Links</button>
+                    <button class="tab" onclick="switchTab('broken')">Broken Links</button>
+                    <button class="tab" onclick="switchTab('redirects')">Redirects</button>
+                    <button class="tab" onclick="switchTab('seo')">SEO Analysis</button>
                </div>

                <div class="tab-content active" id="pages-tab">
-                    <table>
+                    <table id="pagesTable" class="display">
                        <thead>
                            <tr>
                                <th>URL</th>
@@ -260,7 +334,7 @@
                </div>

                <div class="tab-content" id="links-tab">
-                    <table>
+                    <table id="linksTable" class="display">
                        <thead>
                            <tr>
                                <th>Von</th>
@@ -275,6 +349,62 @@
                        </tbody>
                    </table>
                </div>
+
+                <div class="tab-content" id="broken-tab">
+                    <table id="brokenTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Status Code</th>
+                                <th>Titel</th>
+                                <th>Gecrawlt</th>
+                            </tr>
+                        </thead>
+                        <tbody id="brokenBody">
+                            <tr><td colspan="4" class="loading">Keine defekten Links gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="redirects-tab">
+                    <h3>Redirect Statistics</h3>
+                    <div id="redirectStats" class="stats" style="margin-bottom: 20px;"></div>
+                    <table id="redirectsTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Redirect To</th>
+                                <th>Status Code</th>
+                                <th>Redirect Count</th>
+                                <th>Type</th>
+                            </tr>
+                        </thead>
+                        <tbody id="redirectsBody">
+                            <tr><td colspan="5" class="loading">Keine Redirects gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="seo-tab">
+                    <h3>SEO Issues</h3>
+                    <div id="seoStats" style="margin-bottom: 20px;"></div>
+                    <table id="seoTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Title (Länge)</th>
+                                <th>Meta Description (Länge)</th>
+                                <th>Issues</th>
+                            </tr>
+                        </thead>
+                        <tbody id="seoIssuesBody">
+                            <tr><td colspan="4" class="loading">Keine SEO-Probleme gefunden</td></tr>
+                        </tbody>
+                    </table>
+
+                    <h3 style="margin-top: 30px;">Duplicate Content</h3>
+                    <div id="seoDuplicatesBody"></div>
+                </div>
            </div>
        </div>
    </div>
@@ -312,12 +442,19 @@
            }
        }

+        let jobsDataTable = null;
+
        async function loadJobs() {
            try {
                const response = await fetch('/api.php?action=jobs');
                const data = await response.json();

                if (data.success) {
+                    // Destroy existing DataTable if it exists
+                    if (jobsDataTable) {
+                        jobsDataTable.destroy();
+                    }
+
                    const tbody = document.getElementById('jobsBody');
                    tbody.innerHTML = data.jobs.map(job => `
                        <tr>
@@ -329,10 +466,30 @@
                            <td>${job.started_at || '-'}</td>
                            <td>
                                <button class="action-btn" onclick="viewJob(${job.id})">Ansehen</button>
+                                <button class="action-btn" onclick="recrawlJob(${job.id}, '${job.domain}')">Recrawl</button>
                                <button class="action-btn" onclick="deleteJob(${job.id})">Löschen</button>
                            </td>
                        </tr>
                    `).join('');
+
+                    // Initialize DataTable
+                    jobsDataTable = $('#jobsTable').DataTable({
+                        pageLength: 25,
+                        order: [[0, 'desc']],
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
                }
            } catch (e) {
                console.error('Fehler beim Laden der Jobs:', e);
@@ -404,6 +561,10 @@
                const pagesResponse = await fetch(`/api.php?action=pages&job_id=${currentJobId}`);
                const pagesData = await pagesResponse.json();

+                if ($.fn.DataTable.isDataTable('#pagesTable')) {
+                    $('#pagesTable').DataTable().destroy();
+                }
+
                if (pagesData.success && pagesData.pages.length > 0) {
                    document.getElementById('pagesBody').innerHTML = pagesData.pages.map(page => `
                        <tr>
@@ -413,12 +574,33 @@
                            <td>${page.crawled_at}</td>
                        </tr>
                    `).join('');
+
+                    $('#pagesTable').DataTable({
+                        pageLength: 50,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
                }

                // Load links
                const linksResponse = await fetch(`/api.php?action=links&job_id=${currentJobId}`);
                const linksData = await linksResponse.json();

+                if ($.fn.DataTable.isDataTable('#linksTable')) {
+                    $('#linksTable').DataTable().destroy();
+                }
+
                if (linksData.success && linksData.links.length > 0) {
                    document.getElementById('linksBody').innerHTML = linksData.links.map(link => `
                        <tr>
@@ -429,6 +611,205 @@
                            <td>${link.is_internal ? 'Intern' : '<span class="external">Extern</span>'}</td>
                        </tr>
                    `).join('');
+
+                    $('#linksTable').DataTable({
+                        pageLength: 50,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                }
+
+                // Load broken links
+                const brokenResponse = await fetch(`/api.php?action=broken-links&job_id=${currentJobId}`);
+                const brokenData = await brokenResponse.json();
+
+                if ($.fn.DataTable.isDataTable('#brokenTable')) {
+                    $('#brokenTable').DataTable().destroy();
+                }
+
+                if (brokenData.success && brokenData.broken_links.length > 0) {
+                    document.getElementById('brokenBody').innerHTML = brokenData.broken_links.map(page => `
+                        <tr>
+                            <td class="url-cell" title="${page.url}">${page.url}</td>
+                            <td><span class="status failed">${page.status_code || 'Error'}</span></td>
+                            <td>${page.title || '-'}</td>
+                            <td>${page.crawled_at}</td>
+                        </tr>
+                    `).join('');
+
+                    $('#brokenTable').DataTable({
+                        pageLength: 25,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                } else {
+                    document.getElementById('brokenBody').innerHTML = '<tr><td colspan="4" class="loading">Keine defekten Links gefunden</td></tr>';
+                }
+
+                // Load SEO analysis
+                const seoResponse = await fetch(`/api.php?action=seo-analysis&job_id=${currentJobId}`);
+                const seoData = await seoResponse.json();
+
+                if (seoData.success) {
+                    // SEO Stats
+                    document.getElementById('seoStats').innerHTML = `
+                        <div class="stat-box">
+                            <div class="stat-label">Total Pages</div>
+                            <div class="stat-value">${seoData.total_pages}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Pages with Issues</div>
+                            <div class="stat-value">${seoData.issues.length}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Duplicates Found</div>
+                            <div class="stat-value">${seoData.duplicates.length}</div>
+                        </div>
+                    `;
+
+                    // SEO Issues
+                    if ($.fn.DataTable.isDataTable('#seoTable')) {
+                        $('#seoTable').DataTable().destroy();
+                    }
+
+                    if (seoData.issues.length > 0) {
+                        document.getElementById('seoIssuesBody').innerHTML = seoData.issues.map(item => `
+                            <tr>
+                                <td class="url-cell" title="${item.url}">${item.url}</td>
+                                <td>${item.title || '-'} (${item.title_length})</td>
+                                <td>${item.meta_description ? (item.meta_description.length > 50 ? item.meta_description.substring(0, 50) + '...' : item.meta_description) : '-'} (${item.meta_length})</td>
+                                <td><span class="nofollow">${item.issues.join(', ')}</span></td>
+                            </tr>
+                        `).join('');
+
+                        $('#seoTable').DataTable({
+                            pageLength: 25,
+                            language: {
+                                search: 'Suchen:',
+                                lengthMenu: 'Zeige _MENU_ Einträge',
+                                info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                                infoEmpty: 'Keine Einträge verfügbar',
+                                infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                                paginate: {
+                                    first: 'Erste',
+                                    last: 'Letzte',
+                                    next: 'Nächste',
+                                    previous: 'Vorherige'
+                                }
+                            }
+                        });
+                    } else {
+                        document.getElementById('seoIssuesBody').innerHTML = '<tr><td colspan="4" class="loading">Keine SEO-Probleme gefunden</td></tr>';
+                    }
+
+                    // Duplicates
+                    if (seoData.duplicates.length > 0) {
+                        document.getElementById('seoDuplicatesBody').innerHTML = seoData.duplicates.map(dup => `
+                            <div class="stat-box" style="margin-bottom: 15px;">
+                                <div class="stat-label">Duplicate ${dup.type}</div>
+                                <div style="font-size: 14px; margin: 10px 0;"><strong>${dup.content}</strong></div>
+                                <div style="font-size: 12px;">Found on ${dup.urls.length} pages:</div>
+                                <ul style="margin-top: 5px; font-size: 12px;">
+                                    ${dup.urls.map(url => `<li>${url}</li>`).join('')}
+                                </ul>
+                            </div>
+                        `).join('');
+                    } else {
+                        document.getElementById('seoDuplicatesBody').innerHTML = '<p>Keine doppelten Inhalte gefunden</p>';
+                    }
+                }
+
+                // Load redirects
+                const redirectsResponse = await fetch(`/api.php?action=redirects&job_id=${currentJobId}`);
+                const redirectsData = await redirectsResponse.json();
+
+                if (redirectsData.success) {
+                    const stats = redirectsData.stats;
+
+                    // Redirect Stats
+                    document.getElementById('redirectStats').innerHTML = `
+                        <div class="stat-box">
+                            <div class="stat-label">Total Redirects</div>
+                            <div class="stat-value">${stats.total}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Permanent (301/308)</div>
+                            <div class="stat-value">${stats.permanent}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Temporary (302/303/307)</div>
+                            <div class="stat-value">${stats.temporary}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Excessive (>${stats.threshold})</div>
+                            <div class="stat-value" style="color: ${stats.excessive > 0 ? '#e74c3c' : '#27ae60'}">${stats.excessive}</div>
+                            <div class="stat-sublabel">threshold: ${stats.threshold}</div>
+                        </div>
+                    `;
+
+                    // Redirect Table
+                    if ($.fn.DataTable.isDataTable('#redirectsTable')) {
+                        $('#redirectsTable').DataTable().destroy();
+                    }
+
+                    if (redirectsData.redirects.length > 0) {
+                        document.getElementById('redirectsBody').innerHTML = redirectsData.redirects.map(redirect => {
+                            const isExcessive = redirect.redirect_count > stats.threshold;
+                            const isPermRedirect = redirect.status_code == 301 || redirect.status_code == 308;
+                            const redirectType = isPermRedirect ? 'Permanent' : 'Temporary';
+
+                            return `
+                                <tr style="${isExcessive ? 'background-color: #fff3cd;' : ''}">
+                                    <td class="url-cell" title="${redirect.url}">${redirect.url}</td>
+                                    <td class="url-cell" title="${redirect.redirect_url || '-'}">${redirect.redirect_url || '-'}</td>
+                                    <td><span class="status ${isPermRedirect ? 'completed' : 'running'}">${redirect.status_code}</span></td>
+                                    <td><strong ${isExcessive ? 'style="color: #e74c3c;"' : ''}>${redirect.redirect_count}</strong></td>
+                                    <td>${redirectType}</td>
+                                </tr>
+                            `;
+                        }).join('');
+
+                        $('#redirectsTable').DataTable({
+                            pageLength: 25,
+                            language: {
+                                search: 'Suchen:',
+                                lengthMenu: 'Zeige _MENU_ Einträge',
+                                info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                                infoEmpty: 'Keine Einträge verfügbar',
+                                infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                                paginate: {
+                                    first: 'Erste',
+                                    last: 'Letzte',
+                                    next: 'Nächste',
+                                    previous: 'Vorherige'
+                                }
+                            }
+                        });
+                    } else {
+                        document.getElementById('redirectsBody').innerHTML = '<tr><td colspan="5" class="loading">Keine Redirects gefunden</td></tr>';
+                    }
                }

                // Update jobs table
@@ -463,6 +844,31 @@
            }
        }

+        async function recrawlJob(jobId, domain) {
+            if (!confirm('Job-Ergebnisse löschen und neu crawlen?')) return;
+
+            const formData = new FormData();
+            formData.append('job_id', jobId);
+            formData.append('domain', domain);
+
+            try {
+                const response = await fetch('/api.php?action=recrawl', {
+                    method: 'POST',
+                    body: formData
+                });
+                const data = await response.json();
+
+                if (data.success) {
+                    loadJobs();
+                    alert('Recrawl gestartet! Job ID: ' + data.job_id);
+                } else {
+                    alert('Fehler: ' + data.error);
+                }
+            } catch (e) {
+                alert('Fehler beim Recrawl: ' + e.message);
+            }
+        }
+
        function switchTab(tab) {
            document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
            document.querySelectorAll('.tab-content').forEach(c => c.classList.remove('active'));
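The row templates in the `index.php` changes above interpolate crawled values such as `page.url`, `item.title`, and `dup.content` directly into `innerHTML`, so any markup contained in a crawled URL or title would be parsed as HTML by the browser. A minimal sketch of one way to guard against that, assuming a hypothetical `escapeHtml` helper and a hypothetical `renderBrokenRow` function (neither is part of the original diff):

```javascript
// Hypothetical helper, not in the original diff: escapes the five characters
// that are significant in HTML text and in quoted attribute values.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(value) {
    // null and undefined become an empty string; everything else is stringified.
    if (value === null || value === undefined) return '';
    return String(value).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch]);
}

// Hypothetical row renderer in the style of the brokenTable template:
// same columns and fallbacks, but every interpolated value is escaped first.
function renderBrokenRow(page) {
    const url = escapeHtml(page.url);
    return `<tr>
    <td class="url-cell" title="${url}">${url}</td>
    <td><span class="status failed">${escapeHtml(page.status_code || 'Error')}</span></td>
    <td>${escapeHtml(page.title || '-')}</td>
    <td>${escapeHtml(page.crawled_at)}</td>
</tr>`;
}
```

The fallbacks (`|| 'Error'`, `|| '-'`) are applied before escaping so that the displayed defaults match the original templates.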
--- a/webanalyse.php
+++ b/webanalyse.php
@@ -1,530 +0,0 @@
-<?php
-
-declare(strict_types=1);
-
-/**
- * Koordiniert Webseiten-Crawls und persistiert Antwortdaten in der Screaming Frog Datenbank.
- */
-class WebAnalyse
-{
-    private const USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';
-    private const CURL_TIMEOUT = 30;
-
-    /**
-     * @var mysqli Verbindung zur Screaming Frog Datenbank.
-     */
-    private mysqli $db;
-
-    public function __construct(?mysqli $connection = null)
-    {
-        $connection ??= mysqli_connect('localhost', 'root', '', 'screaming_frog');
-
-        if (!$connection instanceof mysqli) {
-            throw new RuntimeException('Verbindung zur Datenbank konnte nicht hergestellt werden: ' . mysqli_connect_error());
-        }
-
-        $connection->set_charset('utf8mb4');
-        $this->db = $connection;
-    }
-
-    /**
-     * Holt eine einzelne URL und gibt Response-Metadaten zurueck.
-     *
-     * @param string $url Zieladresse fuer den Abruf.
-     * @return array<string,mixed> Antwortdaten oder ein "error"-Schluessel.
-     */
-    public function getWebsite(string $url): array
-    {
-        $handle = $this->createCurlHandle($url);
-        $response = curl_exec($handle);
-
-        if ($response === false) {
-            $error = curl_error($handle);
-            curl_close($handle);
-            return ['error' => $error];
-        }
-
-        $info = curl_getinfo($handle);
-        curl_close($handle);
-
-        return $this->buildResponsePayload($response, $info);
-    }
-
-    /**
-     * Ruft mehrere URLs parallel via curl_multi ab.
-     *
-     * @param array<int,string> $urls Liste von Ziel-URLs.
-     * @return array<string,array<string,mixed>> Antworten je URL.
-     */
-    public function getMultipleWebsites(array $urls): array
-    {
-        if ($urls === []) {
-            return [];
-        }
-
-        $results = [];
-        $multiHandle = curl_multi_init();
-        $handles = [];
-
-        foreach ($urls as $url) {
-            $handle = $this->createCurlHandle($url);
-            $handles[$url] = $handle;
-            curl_multi_add_handle($multiHandle, $handle);
-        }
-
-        $running = null;
-        do {
-            $status = curl_multi_exec($multiHandle, $running);
-        } while ($status === CURLM_CALL_MULTI_PERFORM);
-
-        while ($running && $status === CURLM_OK) {
-            if (curl_multi_select($multiHandle, 1.0) === -1) {
-                usleep(100000);
-            }
-
-            do {
-                $status = curl_multi_exec($multiHandle, $running);
-            } while ($status === CURLM_CALL_MULTI_PERFORM);
-        }
-
-        foreach ($handles as $url => $handle) {
-            $response = curl_multi_getcontent($handle);
-
-            if ($response === false) {
-                $results[$url] = ['error' => curl_error($handle)];
-            } else {
-                $results[$url] = $this->buildResponsePayload($response, curl_getinfo($handle));
-            }
-
-            curl_multi_remove_handle($multiHandle, $handle);
-            curl_close($handle);
-        }
-
-        curl_multi_close($multiHandle);
-
-        return $results;
-    }
-
-    /**
-     * Persistiert Response-Daten und stoesst die Analyse der gefundenen Links an.
-     *
-     * @param int $crawlID Identifier der Crawl-Session.
-     * @param string $url Ursprung-URL, deren Antwort verarbeitet wird.
-     * @param array<string,mixed> $data Ergebnis der HTTP-Abfrage.
-     */
-    public function processResults(int $crawlID, string $url, array $data): void
-    {
-        if (isset($data['error'])) {
-            error_log(sprintf('Fehler bei der Analyse von %s: %s', $url, $data['error']));
-            return;
-        }
-
-        $body = (string)($data['body'] ?? '');
-
-        $update = $this->db->prepare(
-            'UPDATE urls
-             SET status_code = ?, response_time = ?, body_size = ?, date = NOW(), body = ?
-             WHERE url = ? AND crawl_id = ?
-             LIMIT 1'
-        );
-
-        if ($update === false) {
-            throw new RuntimeException('Update-Statement konnte nicht vorbereitet werden: ' . $this->db->error);
-        }
-
-        $statusCode = (int)($data['status_code'] ?? 0);
-        $responseTimeMs = (int)round(((float)($data['response_time'] ?? 0)) * 1000);
-        $bodySize = (int)($data['body_size'] ?? strlen($body));
-
-        $update->bind_param('iiissi', $statusCode, $responseTimeMs, $bodySize, $body, $url, $crawlID);
-        $update->execute();
-        $update->close();
-
-        $this->findNewUrls($crawlID, $body, $url);
-    }
-
-    /**
-     * Extrahiert Links aus einer Antwort und legt neue URL-Datensaetze an.
-     *
-     * @param int $crawlID Identifier der Crawl-Session.
-     * @param string $body HTML-Koerper der Antwort.
-     * @param string $url Bearbeitete URL, dient als Kontext fuer relative Links.
-     */
-    public function findNewUrls(int $crawlID, string $body, string $url): void
-    {
-        if ($body === '') {
-            return;
-        }
-
-        $links = $this->extractLinks($body, $url);
-        if ($links === []) {
-            return;
-        }
-
-        $originId = $this->resolveUrlId($crawlID, $url);
-        if ($originId === null) {
-            return;
-        }
-
-        $deleteLinksStmt = $this->db->prepare('DELETE FROM links WHERE von = ?');
-        if ($deleteLinksStmt !== false) {
-            $deleteLinksStmt->bind_param('i', $originId);
-            $deleteLinksStmt->execute();
-            $deleteLinksStmt->close();
-        }
-
-        $insertUrlStmt = $this->db->prepare('INSERT IGNORE INTO urls (url, crawl_id) VALUES (?, ?)');
-        $selectUrlStmt = $this->db->prepare('SELECT id FROM urls WHERE url = ? AND crawl_id = ? LIMIT 1');
-        $insertLinkStmt = $this->db->prepare('INSERT IGNORE INTO links (von, nach, linktext, dofollow) VALUES (?, ?, ?, ?)');
-
-        if (!$insertUrlStmt || !$selectUrlStmt || !$insertLinkStmt) {
-            throw new RuntimeException('Vorbereitete Statements konnten nicht erstellt werden: ' . $this->db->error);
-        }
-
-        foreach ($links as $link) {
-            $absoluteUrl = (string)$link['absolute_url'];
-
-            $insertUrlStmt->bind_param('si', $absoluteUrl, $crawlID);
-            $insertUrlStmt->execute();
-
-            $targetId = $this->db->insert_id;
-            if ($targetId === 0) {
-                $selectUrlStmt->bind_param('si', $absoluteUrl, $crawlID);
-                $selectUrlStmt->execute();
-                $result = $selectUrlStmt->get_result();
-                $targetId = $result ? (int)($result->fetch_assoc()['id'] ?? 0) : 0;
-            }
-
-            if ($targetId === 0) {
-                continue;
-            }
-
-            $linkText = $this->normaliseText((string)($link['text'] ?? ''));
-            $isFollow = (int)(strpos((string)($link['rel'] ?? ''), 'nofollow') !== false ? 0 : 1);
-
-            $insertLinkStmt->bind_param('iisi', $originId, $targetId, $linkText, $isFollow);
-            $insertLinkStmt->execute();
-        }
-
-        $insertUrlStmt->close();
-        $selectUrlStmt->close();
-        $insertLinkStmt->close();
-    }
-
-    /**
-     * Startet einen Crawl-Durchlauf fuer unbehandelte URLs.
-     *
-     * @param int $crawlID Identifier der Crawl-Session.
-     */
-    public function doCrawl(int $crawlID): void
-    {
-        $statement = $this->db->prepare(
-            'SELECT url FROM urls WHERE crawl_id = ? AND date IS NULL LIMIT 50'
-        );
-
-        if ($statement === false) {
-            return;
-        }
-
-        $statement->bind_param('i', $crawlID);
-        $statement->execute();
-        $result = $statement->get_result();
-
-        if (!$result instanceof mysqli_result) {
-            $statement->close();
-            return;
-        }
-
-        $urls = [];
-        while ($row = $result->fetch_assoc()) {
-            $urls[] = $row['url'];
-        }
-
-        $result->free();
-        $statement->close();
-
-        if ($urls === []) {
-            return;
-        }
-
-        foreach ($this->getMultipleWebsites($urls) as $url => $data) {
-            $this->processResults($crawlID, $url, $data);
-        }
-    }
-
-    /**
-     * Parst HTML-Inhalt und liefert eine strukturierte Liste gefundener Links.
-     *
-     * @param string $html Rohes HTML-Dokument.
-     * @param string $baseUrl Basis-URL fuer die Aufloesung relativer Pfade.
-     * @return array<int,array<string,mixed>> Gesammelte Linkdaten.
-     */
-    public function extractLinks(string $html, string $baseUrl = ''): array
-    {
-        $links = [];
-
-        $dom = new DOMDocument();
-        $previous = libxml_use_internal_errors(true);
-        $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
-        libxml_clear_errors();
-        libxml_use_internal_errors($previous);
-
-        foreach ($dom->getElementsByTagName('a') as $index => $aTag) {
-            $href = trim($aTag->getAttribute('href'));
-            if ($href === '') {
-                continue;
-            }
-
-            $absoluteUrl = $this->resolveUrl($href, $baseUrl);
-            $text = $this->normaliseText(trim($aTag->textContent));
-            $rel = $aTag->getAttribute('rel');
-            $title = $aTag->getAttribute('title');
-            $target = $aTag->getAttribute('target');
-
-            $links[] = [
-                'index' => $index + 1,
-                'href' => $href,
-                'absolute_url' => $absoluteUrl,
-                'text' => $text,
-                'rel' => $rel !== '' ? $rel : null,
-                'title' => $title !== '' ? $title : null,
-                'target' => $target !== '' ? $target : null,
-                'is_external' => $this->isExternalLink($absoluteUrl, $baseUrl),
-                'link_type' => $this->getLinkType($href),
-                'is_internal' => $this->isInternalLink($absoluteUrl, $baseUrl) ? 1 : 0,
-            ];
-        }
-
-        return $links;
-    }
-
-    /**
-     * Prueft, ob ein Link aus Sicht der Basis-URL extern ist.
-     *
-     * @param string $href Ziel des Links.
-     * @param string $baseUrl Ausgangsadresse zur Domainabgleichung.
-     * @return bool|null True fuer extern, false fuer intern, null falls undefiniert.
-     */
-    private function isExternalLink(string $href, string $baseUrl): ?bool
-    {
-        if ($baseUrl === '') {
-            return null;
-        }
-
-        $baseDomain = parse_url($baseUrl, PHP_URL_HOST);
-        $linkDomain = parse_url($href, PHP_URL_HOST);
-
-        if ($baseDomain === null || $linkDomain === null) {
-            return null;
-        }
-
-        return !hash_equals($baseDomain, $linkDomain);
-    }
-
-    /**
-     * Prueft, ob ein Link derselben Domain wie die Basis-URL entspricht.
-     *
-     * @param string $href Ziel des Links.
-     * @param string $baseUrl Ausgangsadresse zur Domainabgleichung.
-     * @return bool|null True fuer intern, false fuer extern, null falls undefiniert.
-     */
-    private function isInternalLink(string $href, string $baseUrl): ?bool
-    {
-        if ($baseUrl === '') {
-            return null;
-        }
-
-        $baseDomain = parse_url($baseUrl, PHP_URL_HOST);
-        $linkDomain = parse_url($href, PHP_URL_HOST);
-
-        if ($baseDomain === null || $linkDomain === null) {
-            return null;
-        }
-
-        return hash_equals($baseDomain, $linkDomain);
-    }
-
-    /**
-     * Leitet den Link-Typ anhand gaengiger Protokolle und Muster ab.
-     *
-     * @param string $href Ziel des Links.
-     * @return string Beschreibender Typ wie "absolute" oder "email".
-     */
-    private function getLinkType(string $href): string
-    {
-        if ($href === '') {
-            return 'empty';
-        }
-
-        $lower = strtolower($href);
-        if (strpos($lower, 'mailto:') === 0) {
-            return 'email';
-        }
-        if (strpos($lower, 'tel:') === 0) {
-            return 'phone';
-        }
-        if (strpos($lower, '#') === 0) {
-            return 'anchor';
-        }
-        if (strpos($lower, 'javascript:') === 0) {
-            return 'javascript';
-        }
-        if (filter_var($href, FILTER_VALIDATE_URL)) {
-            return 'absolute';
-        }
-
-        return 'relative';
-    }
-
-    /**
-     * Gruppiert Links anhand ihres vorab bestimmten Typs.
-     *
-     * @param array<int,array<string,mixed>> $links Liste der extrahierten Links.
-     * @return array<string,array<int,array<string,mixed>>> Links nach Typ gruppiert.
-     */
-    public function groupLinksByType(array $links): array
-    {
-        $grouped = [];
-
-        foreach ($links as $link) {
-            $type = (string)($link['link_type'] ?? 'unknown');
-            $grouped[$type][] = $link;
-        }
-
-        return $grouped;
-    }
-
-    /**
-     * Erstellt ein konfiguriertes Curl-Handle fuer einen Request.
-     *
-     * @return CurlHandle
-     */
-    private function createCurlHandle(string $url)
-    {
-        $handle = curl_init($url);
-        if ($handle === false) {
-            throw new RuntimeException('Konnte Curl-Handle nicht initialisieren: ' . $url);
-        }
-
-        curl_setopt_array($handle, [
-            CURLOPT_URL => $url,
-            CURLOPT_RETURNTRANSFER => true,
-            CURLOPT_HEADER => true,
-            CURLOPT_FOLLOWLOCATION => true,
-            CURLOPT_TIMEOUT => self::CURL_TIMEOUT,
-            CURLOPT_USERAGENT => self::USER_AGENT,
-            CURLOPT_SSL_VERIFYPEER => false,
-        ]);
-
-        return $handle;
-    }
-
-    /**
-     * Splittet Header und Body und bereitet das Antwort-Array auf.
-     *
-     * @param string $response Vollstaendige Response inkl. Header.
-     * @param array<string,mixed> $info curl_getinfo Ergebnis.
-     * @return array<string,mixed>
-     */
-    private function buildResponsePayload(string $response, array $info): array
-    {
-        $headerSize = (int)($info['header_size'] ?? 0);
-        $headers = substr($response, 0, $headerSize);
-        $body = substr($response, $headerSize);
-
-        return [
-            'url' => $info['url'] ?? ($info['redirect_url'] ?? ''),
-            'status_code' => (int)($info['http_code'] ?? 0),
-            'headers_parsed' => $this->parseHeaders($headers),
-            'body' => $body,
-            'response_time' => (float)($info['total_time'] ?? 0.0),
-            'body_size' => strlen($body),
-        ];
-    }
-
-    /**
-     * Wandelt Header-String in ein assoziatives Array um.
-     *
-     * @param string $headers Roh-Header.
-     * @return array<string,string>
-     */
-    private function parseHeaders(string $headers): array
-    {
-        $parsed = [];
-        foreach (preg_split('/\r?\n/', trim($headers)) as $line) {
-            if ($line === '' || strpos($line, ':') === false) {
-                continue;
-            }
-
-            [$key, $value] = explode(':', $line, 2);
-            $parsed[trim($key)] = trim($value);
-        }
-
-        return $parsed;
-    }
-
-    /**
-     * Normalisiert relativen Pfad gegenueber einer Basis-URL zu einer absoluten Adresse.
-     */
-    private function resolveUrl(string $href, string $baseUrl): string
-    {
-        if ($href === '' || filter_var($href, FILTER_VALIDATE_URL)) {
-            return $href;
-        }
-
-        if ($baseUrl === '') {
-            return $href;
-        }
-
-        $baseParts = parse_url($baseUrl);
-        if ($baseParts === false || !isset($baseParts['scheme'], $baseParts['host'])) {
-            return $href;
-        }
-
-        $scheme = $baseParts['scheme'];
-        $host = $baseParts['host'];
-        $port = isset($baseParts['port']) ? ':' . $baseParts['port'] : '';
-        $basePath = $baseParts['path'] ?? '/';
-
-        if (strpos($href, '/') === 0) {
-            $path = $href;
-        } else {
-            if (substr($basePath, -1) !== '/') {
-                $basePath = preg_replace('#/[^/]*$#', '/', $basePath) ?: '/';
-            }
-            $path = $basePath . $href;
-        }
-
-        return sprintf('%s://%s%s%s', $scheme, $host, $port, '/' . ltrim($path, '/'));
-    }
-
-    /**
-     * Sorgt fuer sauberen UTF-8 Text ohne Steuerzeichen.
-     */
-    private function normaliseText(string $text): string
-    {
-        $normalized = preg_replace('/\s+/u', ' ', $text) ?? '';
-        $encoding = mb_detect_encoding($normalized, ['UTF-8', 'ISO-8859-1', 'Windows-1252'], true) ?: 'UTF-8';
-
-        return trim(mb_convert_encoding($normalized, 'UTF-8', $encoding));
-    }
-
-    /**
-     * Ermittelt die ID einer URL innerhalb eines Crawl-Durchlaufs.
-     */
-    private function resolveUrlId(int $crawlID, string $url): ?int
-    {
-        $statement = $this->db->prepare('SELECT id FROM urls WHERE url = ? AND crawl_id = ? LIMIT 1');
-        if ($statement === false) {
-            return null;
-        }
-
-        $statement->bind_param('si', $url, $crawlID);
-        $statement->execute();
-        $result = $statement->get_result();
-        $id = $result ? $result->fetch_assoc()['id'] ?? null : null;
-        $statement->close();
-
-        return $id !== null ? (int)$id : null;
-    }
-}
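Looking back at the `index.php` changes in this diff, each table repeats the same German `language` object and the same destroy-then-initialize sequence. A minimal sketch of how that could be factored out, assuming two hypothetical helpers, `germanTableOptions` and `resetDataTable` (neither appears in the original code). `resetDataTable` relies only on the calls already used in the diff: `$.fn.DataTable.isDataTable`, `$(selector).DataTable().destroy()`, and `$(selector).DataTable(options)`.

```javascript
// Shared German labels, copied from the language blocks in the diff.
const GERMAN_LANGUAGE = {
    search: 'Suchen:',
    lengthMenu: 'Zeige _MENU_ Einträge',
    info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
    infoEmpty: 'Keine Einträge verfügbar',
    infoFiltered: '(gefiltert von _MAX_ Einträgen)',
    paginate: { first: 'Erste', last: 'Letzte', next: 'Nächste', previous: 'Vorherige' }
};

// Hypothetical helper: builds a fresh options object for each table so that
// no two tables share (and accidentally mutate) the same language object.
function germanTableOptions(pageLength, extra = {}) {
    return {
        pageLength,
        language: { ...GERMAN_LANGUAGE, paginate: { ...GERMAN_LANGUAGE.paginate } },
        ...extra
    };
}

// Hypothetical helper: destroys an existing DataTable, writes the new rows,
// and re-initializes only when there is data (mirroring the diff's if/else).
// jQuery is passed in as `$` so the function has no hidden global dependency.
function resetDataTable($, selector, tbody, rowsHtml, emptyHtml, options) {
    if ($.fn.DataTable.isDataTable(selector)) {
        $(selector).DataTable().destroy();
    }
    if (rowsHtml.length === 0) {
        tbody.innerHTML = emptyHtml;
        return false;
    }
    tbody.innerHTML = rowsHtml.join('');
    $(selector).DataTable(options);
    return true;
}
```

On the page this might be called as `resetDataTable(jQuery, '#brokenTable', document.getElementById('brokenBody'), rows, emptyRow, germanTableOptions(25))`, where `rows` and `emptyRow` stand for the row strings and placeholder row built in the existing code.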
Author	SHA1	Message	Date
Martin	1588f83624	Add pagination to all data tables using jQuery DataTables Libraries Added: - jQuery 3.7.1 from CDN - DataTables 1.13.7 (CSS + JS) from CDN Custom Styling: - Integrated DataTables styling with existing design - Custom pagination button styles - Responsive search and filter inputs Paginated Tables: - jobsTable: Crawl jobs (25/page, sorted by ID desc) - pagesTable: Crawled pages (50/page) - linksTable: Found links (50/page) - brokenTable: Broken links (25/page) - redirectsTable: Redirects (25/page) - seoTable: SEO issues (25/page) Features: - Search functionality per table - Column sorting - Configurable entries per page - German localization - Automatic reinitialization on data reload - Navigation controls (First/Previous/Next/Last) - Entry count display All quality checks pass: - PHPStan Level 8: 0 errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:49:39 +02:00
Martin	c40d44e4c9	Add redirect tracking and analysis features Database Schema: - Added redirect_url VARCHAR(2048) to pages table - Added redirect_count INT DEFAULT 0 to pages table - Added index on redirect_count for faster queries Configuration: - Created Config class with typed constants (PHP 8.3+) - MAX_REDIRECT_THRESHOLD = 3 (configurable warning threshold) - MAX_CRAWL_DEPTH = 50 - CONCURRENCY = 10 Backend Changes: - Crawler now tracks redirects using Guzzle's redirect tracking - Extracts redirect history from response headers - Records redirect count and final destination URL - Guzzle configured with max 10 redirects and tracking enabled API Endpoint: - New endpoint: /api.php?action=redirects - Analyzes redirect types (permanent 301/308 vs temporary 302/303/307) - Identifies excessive redirects (> threshold) - Returns statistics and detailed redirect information Frontend Changes: - Added "Redirects" tab with: * Statistics overview (Total, Permanent, Temporary, Excessive) * Detailed table showing all redirects * Visual warnings for excessive redirects (yellow background) * Color-coded redirect counts (red when > threshold) * Status code badges (green for permanent, blue for temporary) All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:40:26 +02:00
Martin	e6b75410ed	Add copyright information to README Added visible copyright section with author information: - Martin Kiesewetter - mki@kies-media.de - https://kies-media.de Also updated project title from "PHP Docker Anwendung" to "Web Crawler" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:29:05 +02:00
Martin	f7be09ec63	Add Broken Links detection and SEO Analysis features Database Schema: - Added meta_description TEXT field to pages table - Added index on status_code for faster broken link queries Backend Changes: - Crawler now extracts meta descriptions from pages - New API endpoint: broken-links (finds 404s and server errors) - New API endpoint: seo-analysis (analyzes titles and meta descriptions) SEO Analysis Features: - Title length validation (optimal: 30-60 chars) - Meta description length validation (optimal: 70-160 chars) - Detection of missing titles/descriptions - Duplicate content detection (titles and meta descriptions) Frontend Changes: - Added "Broken Links" tab showing pages with errors - Added "SEO Analysis" tab with: * Statistics overview * Pages with SEO issues * Duplicate content report All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:26:33 +02:00
Martin	9e61572747	Add recrawl functionality and fix PHPCS warnings - Added "Recrawl" button in jobs table UI - Implemented recrawl API endpoint that deletes all job data and restarts crawl - Fixed PHPCS line length warnings in api.php and Crawler.php All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:07:50 +02:00
Martin	11fd8fa673	Add copyright headers to configuration files Extended copyright headers to SQL, YAML, and JSON configuration files: - config/docker/init.sql (SQL comment block) - docker-compose.yml (YAML comment) - composer.json and src/composer.json (JSON _comment field) All files validated and tested successfully. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:58:28 +02:00
Martin	cbf099701b	Add copyright headers to all application files Added copyright headers to all PHP files in the application with proper author information (Martin Kiesewetter) and contact details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:47:44 +02:00
Martin	ad274c0738	Update paths for config/ directory structure Adjusted all references to match new config/ structure: - docker/config/nginx/default.conf → config/nginx/default.conf - docker/init.sql → config/docker/init.sql - docker/start.sh → config/docker/start.sh Updated files: - docker-compose.yml: Updated volume mount paths - README.md: Updated project structure documentation New structure consolidates all configuration files under config/ for better organization and clarity. Tested and verified all services running correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:36:58 +02:00
Martin	de4d2e53d9	Reorganize Docker-related files into docker/ directory Moved Docker infrastructure files to dedicated docker/ folder: - config/nginx/default.conf → docker/config/nginx/default.conf - init.sql → docker/init.sql - start.sh → docker/start.sh (currently unused) Updated: - docker-compose.yml: Adjusted volume paths - README.md: Updated project structure documentation Benefits: - Clear separation between infrastructure (docker/) and application (src/) - Better project organization - Easier to understand for new developers Docker Compose and Dockerfile remain in root for convenience. All services tested and working correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:31:47 +02:00
Martin	daa76b2141	Remove legacy PHP files from root directory Removed unused legacy files: - index.php (old crawler entry point) - webanalyse.php (old crawler implementation) - setnew.php (database reset script) These files are no longer used. The current application uses: - src/index.php (web interface) - src/api.php (API endpoints) - src/classes/Crawler.php (crawler implementation) - src/crawler-worker.php (background worker) The legacy code remains in git history if needed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:24:43 +02:00
Martin	09d5b61779	Fix link extraction bug caused by type checking The PHPStan fix inadvertently broke link extraction by using is_int() on $pageId, which failed when lastInsertId() or fetchColumn() returned a string instead of an int. Changes: - Convert $pageId to int explicitly after fetching - Use $pageId > 0 instead of is_int($pageId) for validation - Handle both 0 and '0' cases when fetching manually This ensures link extraction works again while maintaining type safety. Tests pass, PHPStan clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:18:52 +02:00
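
The type-checking bug described in commit 09d5b61779 can be illustrated with a minimal sketch. It assumes the ID arrives as a string, which is how `PDO::lastInsertId()` returns it; the helper name `normalizePageId` is hypothetical and is not taken from `Crawler.php`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the project's actual code): normalize an ID
 * value as it may come back from PDO into a plain int.
 *
 * PDO::lastInsertId() returns a string (or false), and fetchColumn()
 * may also return a string, so an is_int() check rejects valid IDs.
 */
function normalizePageId(mixed $raw): int
{
    // Non-numeric values (false, null, '') are treated as "no ID" => 0.
    return is_numeric($raw) ? (int) $raw : 0;
}

// Shape of a value returned by PDO::lastInsertId(): a numeric string.
$raw = '42';

// The broken check: a numeric string is not an int, so this is false.
$passesOldCheck = is_int($raw);

// The fixed check: cast first, then validate with a > 0 comparison.
$pageId = normalizePageId($raw);
$passesNewCheck = $pageId > 0;

echo 'old check: ' . var_export($passesOldCheck, true) . PHP_EOL;
echo 'new check: ' . var_export($passesNewCheck, true) . PHP_EOL;
```

Casting before validating keeps the variable typed as `int` for static analysis, while both `0` and `'0'` still fall through to the "no ID" branch.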