Add pagination to all data tables using jQuery DataTables

Libraries Added:
- jQuery 3.7.1 from CDN
- DataTables 1.13.7 (CSS + JS) from CDN

Custom Styling:
- Integrated DataTables styling with existing design
- Custom pagination button styles
- Responsive search and filter inputs

Paginated Tables:
- jobsTable: Crawl jobs (25/page, sorted by ID desc)
- pagesTable: Crawled pages (50/page)
- linksTable: Found links (50/page)
- brokenTable: Broken links (25/page)
- redirectsTable: Redirects (25/page)
- seoTable: SEO issues (25/page)

Features:
- Search functionality per table
- Column sorting
- Configurable entries per page
- German localization
- Automatic reinitialization on data reload
- Navigation controls (First/Previous/Next/Last)
- Entry count display

All quality checks pass:
- PHPStan Level 8: 0 errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add redirect tracking and analysis features
2025-10-04 09:49:39 +02:00 · 2025-10-04 09:40:26 +02:00 · 2025-10-04 09:29:05 +02:00 · 2025-10-04 09:26:33 +02:00 · 2025-10-04 09:07:50 +02:00 · 2025-10-04 08:58:28 +02:00
27 changed files with 5337 additions and 484 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,6 @@
+# Database Configuration
+DB_HOST=mariadb
+DB_NAME=app_database
+DB_USER=app_user
+DB_PASSWORD=app_password
+DB_ROOT_PASSWORD=root_password
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,26 @@
-/.idea/
+# IDE
+.idea/
+.vscode/
+
+# Dependencies
+vendor/
+node_modules/
+
+# Environment
+.env
+.env.local
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+
+# Temporary files
+*.tmp
+*.cache
+
+# Docker
+docker-compose.override.yml
+/.claude/settings.local.json
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,13 +4,109 @@
 The codebase is intentionally lean. `index.php` bootstraps the crawl by instantiating `webanalyse` and handing off the crawl identifier. Core crawling logic lives in `webanalyse.php`, which houses HTTP fetching, link extraction, and database persistence. Use `setnew.php` to reset seed data inside the `screaming_frog` schema before a rerun. Keep new helpers in their own PHP files under this root so the autoload includes stay predictable; group SQL migrations or fixtures under a `database/` folder if you add them. IDE settings reside in `.idea/`.

 ## Build, Test, and Development Commands
-Run the project through Apache in XAMPP or start the PHP built-in server with `php -S localhost:8080 index.php` from this directory. Validate syntax quickly via `php -l webanalyse.php` (repeat for any new file). When iterating on crawl logic, truncate runtime tables with `php setnew.php` to restore the baseline dataset.
+
+### Docker Development
+The project runs in Docker containers. Use these commands:
+
+```bash
+# Start containers
+docker-compose up -d
+
+# Stop containers
+docker-compose down
+
+# Rebuild containers
+docker-compose up -d --build
+
+# View logs
+docker-compose logs -f php
+```
+
+### Running Tests
+The project uses PHPUnit for automated testing:
+
+```bash
+# Run all tests (Unit + Integration)
+docker-compose exec php sh -c "php /var/www/html/vendor/bin/phpunit /var/www/tests/"
+
+# Or use the composer shortcut
+docker-compose exec php composer test
+```
+
+**Test Structure:**
+- `tests/Unit/` - Unit tests for individual components
+- `tests/Integration/` - Integration tests for full crawl workflows
+- All tests run in isolated database transactions
+
+### Static Code Analysis
+PHPStan is configured at Level 8, a high strictness level, to ensure type safety:
+
+```bash
+# Run PHPStan analysis
+docker-compose exec php sh -c "php -d memory_limit=512M /var/www/html/vendor/bin/phpstan analyse -c /var/www/phpstan.neon"
+
+# Or use the composer shortcut
+docker-compose exec php composer phpstan
+```
+
+**PHPStan Configuration:**
+- Level: 8 (high strictness; PHPStan defines stricter levels above it)
+- Analyzes: `src/` and `tests/`
+- Excludes: `vendor/`
+- Config file: `phpstan.neon`
+
+All code must pass PHPStan Level 8 with zero errors before merging.
+
+### Code Style Checking
+PHP_CodeSniffer enforces PSR-12 coding standards:
+
+```bash
+# Check code style
+docker-compose exec php composer phpcs
+
+# Automatically fix code style issues
+docker-compose exec php composer phpcbf
+```
+
+**PHPCS Configuration:**
+- Standard: PSR-12
+- Analyzes: `src/` and `tests/`
+- Excludes: `vendor/`
+- Auto-fix available via `phpcbf`
+
+Run `phpcbf` before committing to automatically fix most style violations.

 ## Coding Style & Naming Conventions
 Follow PSR-12 style cues already in use: 4-space indentation, brace-on-new-line for functions, and `declare(strict_types=1);` at the top of entry scripts. Favour descriptive camelCase for methods (`getMultipleWebsites`) and snake_case only for direct SQL field names. Maintain `mysqli` usage for consistency, and gate new configuration through constants or clearly named environment variables.

 ## Testing Guidelines
-There is no automated suite yet; treat each crawl as an integration test. After code changes, run `php setnew.php` followed by a crawl and confirm that `crawl`, `urls`, and `links` tables reflect the expected row counts. Log anomalies with `error_log()` while developing, and remove or downgrade to structured responses before merging.
+
+### Automated Testing
+The project has a comprehensive test suite using PHPUnit:
+
+- **Write tests first**: Follow TDD principles when adding new features
+- **Unit tests** (`tests/Unit/`): Test individual classes and methods in isolation
+- **Integration tests** (`tests/Integration/`): Test full crawl workflows with real HTTP requests
+- **Database isolation**: Tests use transactions that roll back automatically
+- **Coverage**: Aim for high test coverage on critical crawl logic
+
+### Quality Gates
+Before committing code, ensure:
+1. All tests pass: `docker-compose exec php composer test`
+2. PHPStan analysis passes: `docker-compose exec php composer phpstan`
+3. Code style is correct: `docker-compose exec php composer phpcs`
+4. Auto-fix style issues: `docker-compose exec php composer phpcbf`
+
+**Pre-commit Checklist:**
+- ✅ Tests pass
+- ✅ PHPStan Level 8 with 0 errors
+- ✅ PHPCS PSR-12 compliance (warnings acceptable)
+
+### Manual Testing
+For UI changes, manually test the crawler interface at http://localhost:8080. Verify:
+- Job creation and status updates
+- Page and link extraction accuracy
+- Error handling for invalid URLs or network issues

 ## Commit & Pull Request Guidelines
 Author commit messages in the present tense with a concise summary (`Add link grouping for external URLs`). Group related SQL adjustments with their PHP changes in the same commit. For pull requests, include: a short context paragraph, reproduction steps, screenshots of key output tables when behaviour changes, and any follow-up tasks. Link tracking tickets or issues so downstream agents can trace decisions.
--- a/Dockerfile
+++ b/Dockerfile
@@ -0,0 +1,42 @@
+FROM php:8.3-fpm
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    nginx \
+    libpng-dev \
+    libjpeg-dev \
+    libfreetype6-dev \
+    libzip-dev \
+    zip \
+    unzip \
+    git \
+    curl \
+    && docker-php-ext-configure gd --with-freetype --with-jpeg \
+    && docker-php-ext-install -j$(nproc) gd pdo pdo_mysql mysqli zip \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Composer
+COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
+
+# Configure nginx
+RUN rm -rf /etc/nginx/sites-enabled/default
+
+# Configure PHP-FPM
+RUN sed -i 's/listen = 127.0.0.1:9000/listen = 9000/g' /usr/local/etc/php-fpm.d/www.conf
+
+# Set working directory
+WORKDIR /var/www/html
+
+# Copy application files
+COPY ./src /var/www/html
+
+# Set permissions
+RUN chown -R www-data:www-data /var/www/html \
+    && chmod -R 755 /var/www/html
+
+# Expose port 80
+EXPOSE 80
+
+# Start PHP-FPM and Nginx
+CMD php-fpm -D && nginx -g 'daemon off;'
--- a/README.md
+++ b/README.md
@@ -0,0 +1,131 @@
+# Web Crawler
+
+A PHP application with MariaDB that runs in Docker.
+
+## Copyright & License
+
+**Copyright © 2025 Martin Kiesewetter**
+
+- **Author:** Martin Kiesewetter
+- **Email:** mki@kies-media.de
+- **Website:** [https://kies-media.de](https://kies-media.de)
+
+---
+
+## Requirements
+
+- Docker
+- Docker Compose
+
+## Installation & Startup
+
+1. Start the containers:
+```bash
+docker-compose up -d
+```
+
+2. Stop the containers:
+```bash
+docker-compose down
+```
+
+3. Rebuild the containers:
+```bash
+docker-compose up -d --build
+```
+
+## Services
+
+- **PHP application**: http://localhost:8080
+- **phpMyAdmin**: http://localhost:8081
+- **MariaDB**: port 3307 on the host (3306 inside the Docker network)
+
+## Database Credentials
+
+- **Host**: mariadb
+- **Database**: app_database
+- **User**: app_user
+- **Password**: app_password
+- **Root password**: root_password
+
+## Structure
+
+```
+.
+├── docker-compose.yml      # Docker Compose configuration
+├── Dockerfile              # PHP container image
+├── config/                 # Configuration files
+│   ├── docker/
+│   │   ├── init.sql        # Database initialization
+│   │   └── start.sh        # Container start script (unused)
+│   └── nginx/
+│       └── default.conf    # Nginx configuration
+├── src/                    # Application code
+│   ├── api.php
+│   ├── index.php
+│   ├── classes/
+│   └── crawler-worker.php
+├── tests/                  # Test suite
+│   ├── Unit/
+│   └── Integration/
+├── phpstan.neon            # PHPStan configuration
+└── phpcs.xml               # PHPCS configuration
+```
+
+## Development
+
+The application files live in the `src/` directory and are mounted into the container as a volume, so changes are visible immediately.
+
+## Tests & Code Quality
+
+### Running the Tests
+
+The application uses PHPUnit for unit and integration tests:
+
+```bash
+# Run all tests
+docker-compose exec php sh -c "php /var/www/html/vendor/bin/phpunit /var/www/tests/"
+
+# Alternative using the Composer script
+docker-compose exec php composer test
+```
+
+The tests are located in:
+- `tests/Unit/` - Unit tests
+- `tests/Integration/` - Integration tests
+
+### Static Code Analysis with PHPStan
+
+PHPStan is configured at level 8 and analyzes the entire codebase:
+
+```bash
+# Run PHPStan
+docker-compose exec php sh -c "php -d memory_limit=512M /var/www/html/vendor/bin/phpstan analyse -c /var/www/phpstan.neon"
+
+# Alternative using the Composer script
+docker-compose exec php composer phpstan
+```
+
+**PHPStan Configuration:**
+- Level: 8
+- Analyzed paths: `src/` and `tests/`
+- Excluded: `vendor/` directory
+- Configuration file: `phpstan.neon`
+
+### Code Style Checks with PHP_CodeSniffer
+
+PHP_CodeSniffer (PHPCS) checks the code against the PSR-12 standard:
+
+```bash
+# Check code style
+docker-compose exec php composer phpcs
+
+# Fix code style automatically
+docker-compose exec php composer phpcbf
+```
+
+**PHPCS Configuration:**
+- Standard: PSR-12
+- Analyzed paths: `src/` and `tests/`
+- Excluded: `vendor/` directory
+- Auto-fix available via `phpcbf`
--- a/composer.json
+++ b/composer.json
@@ -0,0 +1,28 @@
+{
+    "_comment": "Web Crawler - Composer Configuration | Copyright (c) 2025 Martin Kiesewetter <mki@kies-media.de> | https://kies-media.de",
+    "name": "web-crawler/app",
+    "description": "Web Crawler Application with Parallel Processing",
+    "type": "project",
+    "require": {
+        "php": "^8.3",
+        "guzzlehttp/guzzle": "^7.8",
+        "symfony/dom-crawler": "^7.0",
+        "symfony/css-selector": "^7.0"
+    },
+    "require-dev": {
+        "phpunit/phpunit": "^11.0"
+    },
+    "autoload": {
+        "psr-4": {
+            "App\\": "classes/"
+        }
+    },
+    "autoload-dev": {
+        "psr-4": {
+            "Tests\\": "tests/"
+        }
+    },
+    "scripts": {
+        "test": "phpunit"
+    }
+}
--- a/config/docker/init.sql
+++ b/config/docker/init.sql
@@ -0,0 +1,79 @@
+/**
+ * Web Crawler - Database Schema
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+-- Database initialization script for Web Crawler
+
+-- Crawl Jobs Table
+CREATE TABLE IF NOT EXISTS crawl_jobs (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    domain VARCHAR(255) NOT NULL,
+    status ENUM('pending', 'running', 'completed', 'failed') DEFAULT 'pending',
+    total_pages INT DEFAULT 0,
+    total_links INT DEFAULT 0,
+    started_at TIMESTAMP NULL,
+    completed_at TIMESTAMP NULL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    INDEX idx_domain (domain),
+    INDEX idx_status (status)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
+
+-- Pages Table
+CREATE TABLE IF NOT EXISTS pages (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    crawl_job_id INT NOT NULL,
+    url VARCHAR(2048) NOT NULL,
+    title VARCHAR(500),
+    meta_description TEXT,
+    status_code INT,
+    content_type VARCHAR(100),
+    redirect_url VARCHAR(2048),
+    redirect_count INT DEFAULT 0,
+    crawled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (crawl_job_id) REFERENCES crawl_jobs(id) ON DELETE CASCADE,
+    INDEX idx_crawl_job (crawl_job_id),
+    INDEX idx_url (url(255)),
+    INDEX idx_status_code (status_code),
+    INDEX idx_redirect_count (redirect_count),
+    UNIQUE KEY unique_job_url (crawl_job_id, url(255))
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
+
+-- Links Table
+CREATE TABLE IF NOT EXISTS links (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    page_id INT NOT NULL,
+    crawl_job_id INT NOT NULL,
+    source_url VARCHAR(2048) NOT NULL,
+    target_url VARCHAR(2048) NOT NULL,
+    link_text VARCHAR(1000),
+    is_nofollow BOOLEAN DEFAULT FALSE,
+    is_internal BOOLEAN DEFAULT TRUE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    FOREIGN KEY (page_id) REFERENCES pages(id) ON DELETE CASCADE,
+    FOREIGN KEY (crawl_job_id) REFERENCES crawl_jobs(id) ON DELETE CASCADE,
+    INDEX idx_page (page_id),
+    INDEX idx_crawl_job (crawl_job_id),
+    INDEX idx_source_url (source_url(255)),
+    INDEX idx_target_url (target_url(255)),
+    INDEX idx_nofollow (is_nofollow)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
+
+-- Queue Table for parallel processing
+CREATE TABLE IF NOT EXISTS crawl_queue (
+    id INT AUTO_INCREMENT PRIMARY KEY,
+    crawl_job_id INT NOT NULL,
+    url VARCHAR(2048) NOT NULL,
+    depth INT DEFAULT 0,
+    status ENUM('pending', 'processing', 'completed', 'failed') DEFAULT 'pending',
+    retry_count INT DEFAULT 0,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    processed_at TIMESTAMP NULL,
+    FOREIGN KEY (crawl_job_id) REFERENCES crawl_jobs(id) ON DELETE CASCADE,
+    INDEX idx_status (status),
+    INDEX idx_crawl_job (crawl_job_id),
+    UNIQUE KEY unique_job_url (crawl_job_id, url(255))
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
--- a/config/docker/start.sh
+++ b/config/docker/start.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+# Start PHP-FPM
+php-fpm -D
+
+# Start Nginx in foreground
+nginx -g 'daemon off;'
--- a/config/nginx/default.conf
+++ b/config/nginx/default.conf
@@ -0,0 +1,31 @@
+server {
+    listen 80;
+    server_name localhost;
+    root /var/www/html;
+    index index.php index.html;
+
+    error_log /var/log/nginx/error.log;
+    access_log /var/log/nginx/access.log;
+
+    location / {
+        try_files $uri $uri/ /index.php?$query_string;
+    }
+
+    location ~ \.php$ {
+        try_files $uri =404;
+        fastcgi_split_path_info ^(.+\.php)(/.+)$;
+        fastcgi_pass localhost:9000;
+        fastcgi_index index.php;
+        include fastcgi_params;
+        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
+        fastcgi_param PATH_INFO $fastcgi_path_info;
+    }
+
+    location ~ /\.ht {
+        deny all;
+    }
+
+    location ~ /\.git {
+        deny all;
+    }
+}
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,68 @@
+# Web Crawler - Docker Compose Configuration
+#
+# @copyright Copyright (c) 2025 Martin Kiesewetter
+# @author    Martin Kiesewetter <mki@kies-media.de>
+# @link      https://kies-media.de
+
+version: '3.8'
+
+services:
+  php:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    container_name: php_app
+    ports:
+      - "8080:80"
+    volumes:
+      - ./src:/var/www/html
+      - ./tests:/var/www/tests
+      - ./composer.json:/var/www/composer.json
+      - ./composer.lock:/var/www/composer.lock
+      - ./phpstan.neon:/var/www/phpstan.neon
+      - ./phpcs.xml:/var/www/phpcs.xml
+      - ./config/nginx/default.conf:/etc/nginx/conf.d/default.conf
+    depends_on:
+      - mariadb
+    networks:
+      - app-network
+
+  mariadb:
+    image: mariadb:11.5
+    container_name: mariadb_db
+    restart: unless-stopped
+    environment:
+      MYSQL_ROOT_PASSWORD: root_password
+      MYSQL_DATABASE: app_database
+      MYSQL_USER: app_user
+      MYSQL_PASSWORD: app_password
+    ports:
+      - "3307:3306"
+    volumes:
+      - mariadb_data:/var/lib/mysql
+      - ./config/docker/init.sql:/docker-entrypoint-initdb.d/init.sql
+    networks:
+      - app-network
+
+  phpmyadmin:
+    image: phpmyadmin:latest
+    container_name: phpmyadmin
+    restart: unless-stopped
+    environment:
+      PMA_HOST: mariadb
+      PMA_PORT: 3306
+      MYSQL_ROOT_PASSWORD: root_password
+    ports:
+      - "8081:80"
+    depends_on:
+      - mariadb
+    networks:
+      - app-network
+
+networks:
+  app-network:
+    driver: bridge
+
+volumes:
+  mariadb_data:
+    driver: local
--- a/index.php
+++ b/index.php
@@ -1,13 +0,0 @@
-<?php
-declare(strict_types=1);
-
-Error_reporting(E_ALL);
-ini_set('display_errors', 1);
-
-require_once 'webanalyse.php';
-$wa = new webanalyse();
-$db = mysqli_connect("localhost", "root", "", "screaming_frog");
-
-
-$wa-> doCrawl(1);
-
--- a/phpcs.xml
+++ b/phpcs.xml
@@ -0,0 +1,19 @@
+<?xml version="1.0"?>
+<ruleset name="ScreamingFrog">
+    <description>PHP_CodeSniffer configuration</description>
+
+    <!-- Use PSR-12 coding standard -->
+    <rule ref="PSR12"/>
+
+    <!-- Paths to check -->
+    <file>/var/www/html</file>
+    <file>/var/www/tests</file>
+
+    <!-- Exclude vendor directory -->
+    <exclude-pattern>/var/www/html/vendor/*</exclude-pattern>
+    <exclude-pattern>*/vendor/*</exclude-pattern>
+
+    <!-- Show progress and colors -->
+    <arg name="colors"/>
+    <arg value="sp"/>
+</ruleset>
--- a/phpstan.neon
+++ b/phpstan.neon
@@ -0,0 +1,7 @@
+parameters:
+    level: 8
+    paths:
+        - /var/www/html
+        - /var/www/tests
+    excludePaths:
+        - /var/www/html/vendor
--- a/phpunit.xml
+++ b/phpunit.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/11.0/phpunit.xsd"
+         bootstrap="vendor/autoload.php"
+         colors="true"
+         cacheDirectory=".phpunit.cache"
+         testdox="true">
+    <testsuites>
+        <testsuite name="Unit Tests">
+            <directory>tests/Unit</directory>
+        </testsuite>
+        <testsuite name="Integration Tests">
+            <directory>tests/Integration</directory>
+        </testsuite>
+    </testsuites>
+    <source>
+        <include>
+            <directory>src/classes</directory>
+        </include>
+    </source>
+</phpunit>
--- a/setnew.php
+++ b/setnew.php
@@ -1,11 +0,0 @@
-<?php
-$db = mysqli_connect("localhost", "root", "", "screaming_frog");
-
-$db->query("truncate table crawl");
-// $db->query("insert into crawl (start_url, user_id) values ('https://kies-media.de/', 1)");
-$db->query("insert into crawl (start_url, user_id) values ('https://kies-media.de/leistungen/externer-ausbilder-fuer-fachinformatiker/', 1)");
-
-$db->query("truncate table urls");
-$urls = $db->query("insert ignore into urls (id, url, crawl_id) select 1,start_url, id from crawl where id = 1"); #->fetch_all(MYSQLI_ASSOC)
-
-$db->query("truncate table links");
--- a/src/api.php
+++ b/src/api.php
@@ -0,0 +1,317 @@
+<?php
+
+/**
+ * Web Crawler - API Endpoint
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+require_once __DIR__ . '/vendor/autoload.php';
+
+use App\Database;
+use App\Crawler;
+
+header('Content-Type: application/json');
+
+$db = Database::getInstance();
+
+$action = $_GET['action'] ?? '';
+
+try {
+    switch ($action) {
+        case 'start':
+            $domain = $_POST['domain'] ?? '';
+            if (empty($domain)) {
+                throw new Exception('Domain is required');
+            }
+
+            // Validate and format URL
+            if (!preg_match('/^https?:\/\//', $domain)) {
+                $domain = 'https://' . $domain;
+            }
+
+            // Create crawl job
+            $stmt = $db->prepare("INSERT INTO crawl_jobs (domain, status) VALUES (?, 'pending')");
+            $stmt->execute([$domain]);
+            $jobId = $db->lastInsertId();
+
+            // Start crawling in background (using exec for async)
+            $cmd = "php " . __DIR__ . "/crawler-worker.php $jobId > /dev/null 2>&1 &";
+            exec($cmd);
+
+            echo json_encode([
+                'success' => true,
+                'job_id' => $jobId,
+                'message' => 'Crawl job started'
+            ]);
+            break;
+
+        case 'status':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare("SELECT * FROM crawl_jobs WHERE id = ?");
+            $stmt->execute([$jobId]);
+            $job = $stmt->fetch();
+
+            if (!$job) {
+                throw new Exception('Job not found');
+            }
+
+            // Get queue statistics
+            $stmt = $db->prepare("
+                SELECT
+                    COUNT(*) as total,
+                    SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) as pending,
+                    SUM(CASE WHEN status = 'processing' THEN 1 ELSE 0 END) as processing,
+                    SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
+                    SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed
+                FROM crawl_queue
+                WHERE crawl_job_id = ?
+            ");
+            $stmt->execute([$jobId]);
+            $queueStats = $stmt->fetch();
+
+            echo json_encode([
+                'success' => true,
+                'job' => $job,
+                'queue' => $queueStats
+            ]);
+            break;
+
+        case 'jobs':
+            $stmt = $db->query("SELECT * FROM crawl_jobs ORDER BY created_at DESC LIMIT 50");
+            if ($stmt === false) {
+                throw new Exception('Failed to query jobs');
+            }
+            $jobs = $stmt->fetchAll();
+
+            echo json_encode([
+                'success' => true,
+                'jobs' => $jobs
+            ]);
+            break;
+
+        case 'pages':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare("SELECT * FROM pages WHERE crawl_job_id = ? ORDER BY id DESC LIMIT 1000");
+            $stmt->execute([$jobId]);
+            $pages = $stmt->fetchAll();
+
+            echo json_encode([
+                'success' => true,
+                'pages' => $pages
+            ]);
+            break;
+
+        case 'links':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare("SELECT * FROM links WHERE crawl_job_id = ? ORDER BY id DESC LIMIT 1000");
+            $stmt->execute([$jobId]);
+            $links = $stmt->fetchAll();
+
+            echo json_encode([
+                'success' => true,
+                'links' => $links
+            ]);
+            break;
+
+        case 'broken-links':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT * FROM pages " .
+                "WHERE crawl_job_id = ? AND (status_code >= 400 OR status_code = 0) " .
+                "ORDER BY status_code DESC, url"
+            );
+            $stmt->execute([$jobId]);
+            $brokenLinks = $stmt->fetchAll();
+
+            echo json_encode([
+                'success' => true,
+                'broken_links' => $brokenLinks
+            ]);
+            break;
+
+        case 'seo-analysis':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT id, url, title, meta_description, status_code FROM pages " .
+                "WHERE crawl_job_id = ? ORDER BY url"
+            );
+            $stmt->execute([$jobId]);
+            $pages = $stmt->fetchAll();
+
+            $issues = [];
+            foreach ($pages as $page) {
+                $pageIssues = [];
+                $titleLen = mb_strlen($page['title'] ?? '');
+                $descLen = mb_strlen($page['meta_description'] ?? '');
+
+                // Title issues (Google: 50-60 chars optimal)
+                if (empty($page['title'])) {
+                    $pageIssues[] = 'Title missing';
+                } elseif ($titleLen < 30) {
+                    $pageIssues[] = "Title too short ({$titleLen} chars)";
+                } elseif ($titleLen > 60) {
+                    $pageIssues[] = "Title too long ({$titleLen} chars)";
+                }
+
+                // Meta description issues (Google: 120-160 chars optimal)
+                if (empty($page['meta_description'])) {
+                    $pageIssues[] = 'Meta description missing';
+                } elseif ($descLen < 70) {
+                    $pageIssues[] = "Meta description too short ({$descLen} chars)";
+                } elseif ($descLen > 160) {
+                    $pageIssues[] = "Meta description too long ({$descLen} chars)";
+                }
+
+                if (!empty($pageIssues)) {
+                    $issues[] = [
+                        'url' => $page['url'],
+                        'title' => $page['title'],
+                        'title_length' => $titleLen,
+                        'meta_description' => $page['meta_description'],
+                        'meta_length' => $descLen,
+                        'issues' => $pageIssues
+                    ];
+                }
+            }
+
+            // Find duplicates
+            $titleCounts = [];
+            $descCounts = [];
+            foreach ($pages as $page) {
+                if (!empty($page['title'])) {
+                    $titleCounts[$page['title']][] = $page['url'];
+                }
+                if (!empty($page['meta_description'])) {
+                    $descCounts[$page['meta_description']][] = $page['url'];
+                }
+            }
+
+            $duplicates = [];
+            foreach ($titleCounts as $title => $urls) {
+                if (count($urls) > 1) {
+                    $duplicates[] = [
+                        'type' => 'title',
+                        'content' => $title,
+                        'urls' => $urls
+                    ];
+                }
+            }
+            foreach ($descCounts as $desc => $urls) {
+                if (count($urls) > 1) {
+                    $duplicates[] = [
+                        'type' => 'meta_description',
+                        'content' => $desc,
+                        'urls' => $urls
+                    ];
+                }
+            }
+
+            echo json_encode([
+                'success' => true,
+                'issues' => $issues,
+                'duplicates' => $duplicates,
+                'total_pages' => count($pages)
+            ]);
+            break;
+
+        case 'redirects':
+            $jobId = $_GET['job_id'] ?? 0;
+            $stmt = $db->prepare(
+                "SELECT url, title, status_code, redirect_url, redirect_count FROM pages " .
+                "WHERE crawl_job_id = ? AND redirect_count > 0 " .
+                "ORDER BY redirect_count DESC, url"
+            );
+            $stmt->execute([$jobId]);
+            $redirects = $stmt->fetchAll();
+
+            // Count redirect types
+            $permanent = 0;
+            $temporary = 0;
+            $excessive = 0;
+            $maxThreshold = 3; // From Config::MAX_REDIRECT_THRESHOLD
+
+            foreach ($redirects as $redirect) {
+                $code = $redirect['status_code'];
+                if ($code == 301 || $code == 308) {
+                    $permanent++;
+                } elseif ($code == 302 || $code == 303 || $code == 307) {
+                    $temporary++;
+                }
+                if ($redirect['redirect_count'] > $maxThreshold) {
+                    $excessive++;
+                }
+            }
+
+            echo json_encode([
+                'success' => true,
+                'redirects' => $redirects,
+                'stats' => [
+                    'total' => count($redirects),
+                    'permanent' => $permanent,
+                    'temporary' => $temporary,
+                    'excessive' => $excessive,
+                    'threshold' => $maxThreshold
+                ]
+            ]);
+            break;
+
+        case 'delete':
+            $jobId = $_POST['job_id'] ?? 0;
+            $stmt = $db->prepare("DELETE FROM crawl_jobs WHERE id = ?");
+            $stmt->execute([$jobId]);
+
+            echo json_encode([
+                'success' => true,
+                'message' => 'Job deleted'
+            ]);
+            break;
+
+        case 'recrawl':
+            $jobId = (int) ($_POST['job_id'] ?? 0); // cast: value is interpolated into exec() below
+            $domain = $_POST['domain'] ?? '';
+
+            if (empty($domain)) {
+                throw new Exception('Domain is required');
+            }
+
+            // Delete all related data for this job
+            $stmt = $db->prepare("DELETE FROM crawl_queue WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            $stmt = $db->prepare("DELETE FROM links WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            $stmt = $db->prepare("DELETE FROM pages WHERE crawl_job_id = ?");
+            $stmt->execute([$jobId]);
+
+            // Reset job status
+            $stmt = $db->prepare(
+                "UPDATE crawl_jobs SET status = 'pending', total_pages = 0, total_links = 0, " .
+                "started_at = NULL, completed_at = NULL WHERE id = ?"
+            );
+            $stmt->execute([$jobId]);
+
+            // Start crawling in background
+            $cmd = "php " . __DIR__ . "/crawler-worker.php $jobId > /dev/null 2>&1 &";
+            exec($cmd);
+
+            echo json_encode([
+                'success' => true,
+                'job_id' => $jobId,
+                'message' => 'Recrawl started'
+            ]);
+            break;
+
+        default:
+            throw new Exception('Invalid action');
+    }
+} catch (Exception $e) {
+    http_response_code(400);
+    echo json_encode([
+        'success' => false,
+        'error' => $e->getMessage()
+    ]);
+}
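
The `redirects` action in `src/api.php` counts permanent, temporary, and excessive redirects inline. As a rough sketch of the same counting pulled out into a pure function, the following may be easier to unit test. `summarizeRedirects` is a hypothetical name and does not appear in the diff; the status-code groupings and the default threshold of 3 are copied from the code above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the diff): summarise redirect rows
 * using the same rules as the `redirects` action.
 *
 * @param list<array{status_code: int|string, redirect_count: int|string}> $rows
 * @return array{total: int, permanent: int, temporary: int, excessive: int, threshold: int}
 */
function summarizeRedirects(array $rows, int $threshold = 3): array
{
    $stats = [
        'total' => count($rows),
        'permanent' => 0,
        'temporary' => 0,
        'excessive' => 0,
        'threshold' => $threshold,
    ];

    foreach ($rows as $row) {
        // PDO may return numeric columns as strings, so cast before comparing.
        $code = (int) $row['status_code'];
        if (in_array($code, [301, 308], true)) {
            $stats['permanent']++;
        } elseif (in_array($code, [302, 303, 307], true)) {
            $stats['temporary']++;
        }

        // A chain is "excessive" only when it is strictly longer than the threshold.
        if ((int) $row['redirect_count'] > $threshold) {
            $stats['excessive']++;
        }
    }

    return $stats;
}
```

Casting to `int` before a strict comparison avoids relying on the loose `==` used in the endpoint; whether the database driver returns strings or integers here depends on PDO settings that this section of the diff does not show.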
--- a/src/classes/Config.php
+++ b/src/classes/Config.php
@@ -0,0 +1,29 @@
+<?php
+
+/**
+ * Web Crawler - Configuration Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+namespace App;
+
+class Config
+{
+    /**
+     * Maximum number of redirects before warning
+     */
+    public const int MAX_REDIRECT_THRESHOLD = 3;
+
+    /**
+     * Maximum crawl depth
+     */
+    public const int MAX_CRAWL_DEPTH = 50;
+
+    /**
+     * Number of parallel requests
+     */
+    public const int CONCURRENCY = 10;
+}
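
`Config` declares its limits as typed class constants, which requires PHP 8.3 (matching the `^8.3` constraint in `composer.json`). The sketch below shows one way calling code could read the threshold from such a constant instead of repeating the literal `3`. `ConfigSketch` is a stand-in copy so the example runs on its own, and `isExcessiveRedirectChain` is a hypothetical helper; neither is part of the diff.

```php
<?php

declare(strict_types=1);

namespace App;

// Stand-in for App\Config so this sketch is self-contained (PHP 8.3+).
final class ConfigSketch
{
    public const int MAX_REDIRECT_THRESHOLD = 3;
}

/**
 * Hypothetical helper: true when a redirect chain is strictly longer
 * than the configured threshold.
 */
function isExcessiveRedirectChain(int $redirectCount): bool
{
    return $redirectCount > ConfigSketch::MAX_REDIRECT_THRESHOLD;
}
```

Reading the value from the constant keeps the API response and any crawler-side warning in agreement if the threshold changes later.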
--- a/src/classes/Crawler.php
+++ b/src/classes/Crawler.php
@@ -0,0 +1,355 @@
+<?php
+
+/**
+ * Web Crawler - Crawler Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+namespace App;
+
+use GuzzleHttp\Client;
+use GuzzleHttp\Pool;
+use GuzzleHttp\Psr7\Request;
+use GuzzleHttp\Exception\RequestException;
+use Symfony\Component\DomCrawler\Crawler as DomCrawler;
+
+class Crawler
+{
+    private \PDO $db;
+    private Client $client;
+    private int $concurrency = Config::CONCURRENCY; // Parallel requests
+    /** @var array<string, bool> */
+    private array $visited = [];
+    private int $crawlJobId;
+    private string $baseDomain;
+
+    public function __construct(int $crawlJobId)
+    {
+        $this->db = Database::getInstance();
+        $this->crawlJobId = $crawlJobId;
+        $this->client = new Client([
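+            'timeout' => 30,
+            // Deliver 4xx/5xx responses to handleResponse() so their status codes are stored
+            'http_errors' => false,
+            // NOTE: TLS certificate verification is disabled, so HTTPS responses are not authenticated
+            'verify' => false,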
+            'allow_redirects' => [
+                'max' => 10,
+                'track_redirects' => true
+            ],
+            'headers' => [
+                'User-Agent' => 'WebCrawler/1.0'
+            ]
+        ]);
+    }
+
+    public function start(string $startUrl): void
+    {
+        $host = parse_url($startUrl, PHP_URL_HOST);
+        $this->baseDomain = strtolower($host ?: '');
+
+        // Update job status
+        $stmt = $this->db->prepare("UPDATE crawl_jobs SET status = 'running', started_at = NOW() WHERE id = ?");
+        $stmt->execute([$this->crawlJobId]);
+
+        // Normalize and add start URL to queue
+        $normalizedStartUrl = $this->normalizeUrl($startUrl);
+        $this->addToQueue($normalizedStartUrl, 0);
+
+        // Process queue
+        $this->processQueue();
+
+        // Update job status
+        $this->updateJobStats();
+        $stmt = $this->db->prepare("UPDATE crawl_jobs SET status = 'completed', completed_at = NOW() WHERE id = ?");
+        $stmt->execute([$this->crawlJobId]);
+    }
+
+    private function addToQueue(string $url, int $depth): void
+    {
+        if (isset($this->visited[$url])) {
+            return;
+        }
+
+        try {
+            $stmt = $this->db->prepare(
+                "INSERT IGNORE INTO crawl_queue (crawl_job_id, url, depth) VALUES (?, ?, ?)"
+            );
+            $stmt->execute([$this->crawlJobId, $url, $depth]);
+        } catch (\Exception $e) {
+            // URL already in queue
+        }
+    }
+
+    private function processQueue(): void
+    {
+        while (true) {
+            // Get pending URLs
+            $stmt = $this->db->prepare(
+                "SELECT id, url, depth FROM crawl_queue
+                WHERE crawl_job_id = ? AND status = 'pending'
+                LIMIT ?"
+            );
+            $stmt->execute([$this->crawlJobId, $this->concurrency]);
+            $urls = $stmt->fetchAll();
+
+            if (empty($urls)) {
+                break;
+            }
+
+            $this->crawlBatch($urls);
+        }
+    }
+
+    /**
+     * @param array<int, array{id: int, url: string, depth: int}> $urls
+     */
+    private function crawlBatch(array $urls): void
+    {
+        $requests = function () use ($urls) {
+            foreach ($urls as $item) {
+                // Mark as processing
+                $stmt = $this->db->prepare("UPDATE crawl_queue SET status = 'processing' WHERE id = ?");
+                $stmt->execute([$item['id']]);
+
+                yield function () use ($item) {
+                    return $this->client->getAsync($item['url']);
+                };
+            }
+        };
+
+        $pool = new Pool($this->client, $requests(), [
+            'concurrency' => $this->concurrency,
+            'fulfilled' => function ($response, $index) use ($urls) {
+                $item = $urls[$index];
+                $this->handleResponse($item, $response);
+            },
+            'rejected' => function ($reason, $index) use ($urls) {
+                $item = $urls[$index];
+                $this->handleError($item, $reason);
+            },
+        ]);
+
+        $pool->promise()->wait();
+    }
+
+    /**
+     * @param array{id: int, url: string, depth: int} $queueItem
+     * @param \Psr\Http\Message\ResponseInterface $response
+     */
+    private function handleResponse(array $queueItem, $response): void
+    {
+        $url = $queueItem['url'];
+        $depth = $queueItem['depth'];
+
+        $this->visited[$url] = true;
+
+        $statusCode = $response->getStatusCode();
+        $contentType = $response->getHeaderLine('Content-Type');
+        $body = $response->getBody()->getContents();
+
+        // Track redirects
+        $redirectUrl = null;
+        $redirectCount = 0;
+        if ($response->hasHeader('X-Guzzle-Redirect-History')) {
+            $redirectHistory = $response->getHeader('X-Guzzle-Redirect-History');
+            $redirectCount = count($redirectHistory);
+            if ($redirectCount > 0) {
+                $redirectUrl = end($redirectHistory);
+            }
+        }
+
+        // Save page
+        $domCrawler = new DomCrawler($body, $url);
+        $title = $domCrawler->filter('title')->count() > 0
+            ? $domCrawler->filter('title')->text()
+            : '';
+
+        $metaDescription = $domCrawler->filter('meta[name="description"]')->count() > 0
+            ? $domCrawler->filter('meta[name="description"]')->attr('content')
+            : '';
+
+        $stmt = $this->db->prepare(
+            "INSERT INTO pages (crawl_job_id, url, title, meta_description, status_code, " .
+            "content_type, redirect_url, redirect_count) " .
+            "VALUES (?, ?, ?, ?, ?, ?, ?, ?) " .
+            "ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), title = VALUES(title), " .
+            "status_code = VALUES(status_code), content_type = VALUES(content_type), " .
+            "meta_description = VALUES(meta_description), redirect_url = VALUES(redirect_url), " .
+            "redirect_count = VALUES(redirect_count)"
+        );
+
+        $stmt->execute([
+            $this->crawlJobId,
+            $url,
+            $title,
+            $metaDescription,
+            $statusCode,
+            $contentType,
+            $redirectUrl,
+            $redirectCount
+        ]);
+        // lastInsertId() returns a string (or false), so cast once up front
+        $pageId = (int)$this->db->lastInsertId();
+
+        // If pageId is 0, fetch it manually
+        if ($pageId === 0) {
+            $stmt = $this->db->prepare("SELECT id FROM pages WHERE crawl_job_id = ? AND url = ?");
+            $stmt->execute([$this->crawlJobId, $url]);
+            $fetchedId = $stmt->fetchColumn();
+            $pageId = is_numeric($fetchedId) ? (int)$fetchedId : 0;
+        }
+
+        // Extract and save links
+        if (str_contains($contentType, 'text/html') && $pageId > 0) {
+            echo "Extracting links from: $url (pageId: $pageId)\n";
+            $this->extractLinks($domCrawler, $url, $pageId, $depth);
+        } else {
+            echo "Skipping link extraction - content type: $contentType\n";
+        }
+
+        // Mark as completed
+        $stmt = $this->db->prepare("UPDATE crawl_queue SET status = 'completed', processed_at = NOW() WHERE id = ?");
+        $stmt->execute([$queueItem['id']]);
+    }
+
+    private function extractLinks(DomCrawler $crawler, string $sourceUrl, int $pageId, int $depth): void
+    {
+        $linkCount = 0;
+        $crawler->filter('a')->each(function (DomCrawler $node) use ($sourceUrl, $pageId, $depth, &$linkCount) {
+            try {
+                $linkCount++;
+                $href = $node->attr('href');
+                if (!$href || $href === '#') {
+                    return;
+                }
+
+                // Convert relative URLs to absolute
+                $targetUrl = $this->makeAbsoluteUrl($href, $sourceUrl);
+
+                // Get link text
+                $linkText = trim($node->text());
+
+                // Check nofollow
+                $rel = $node->attr('rel') ?? '';
+                $isNofollow = str_contains($rel, 'nofollow');
+
+                // Check if internal (same domain, no subdomains)
+                $targetHost = parse_url($targetUrl, PHP_URL_HOST);
+                $targetDomain = strtolower($targetHost ?: '');
+                $isInternal = ($targetDomain === $this->baseDomain);
+
+                // Save link
+                $stmt = $this->db->prepare(
+                    "INSERT INTO links (page_id, crawl_job_id, source_url, target_url, " .
+                    "link_text, is_nofollow, is_internal) VALUES (?, ?, ?, ?, ?, ?, ?)"
+                );
+                $stmt->execute([
+                    $pageId,
+                    $this->crawlJobId,
+                    $sourceUrl,
+                    $targetUrl,
+                    $linkText,
+                    $isNofollow ? 1 : 0,
+                    $isInternal ? 1 : 0
+                ]);
+
+                // Add to queue if internal and not nofollow
+                if ($isInternal && !$isNofollow && $depth < Config::MAX_CRAWL_DEPTH) {
+                    // Normalize URL (remove fragment, trailing slash)
+                    $normalizedUrl = $this->normalizeUrl($targetUrl);
+                    $this->addToQueue($normalizedUrl, $depth + 1);
+                }
+            } catch (\Exception $e) {
+                echo "Error processing link: " . $e->getMessage() . "\n";
+            }
+        });
+        echo "Processed $linkCount links from $sourceUrl\n";
+    }
+
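+    private function makeAbsoluteUrl(string $url, string $base): string
+    {
+        if (filter_var($url, FILTER_VALIDATE_URL)) {
+            return $url;
+        }
+
+        $parts = parse_url($base);
+        $scheme = $parts['scheme'] ?? 'http';
+        $host = $parts['host'] ?? '';
+        $path = $parts['path'] ?? '/';
+
+        // Protocol-relative URL (//host/path): inherit only the scheme of the base
+        if (str_starts_with($url, '//')) {
+            return "$scheme:$url";
+        }
+
+        // Root-relative URL (/path); str_starts_with() is also safe for an empty string
+        if (str_starts_with($url, '/')) {
+            return "$scheme://$host$url";
+        }
+
+        $basePath = substr($path, 0, strrpos($path, '/') + 1);
+        return "$scheme://$host$basePath$url";
+    }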
+
+    /**
+     * @param array{id: int, url: string, depth: int} $queueItem
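+     * @param \Throwable $reason Rejection reason (Guzzle connection errors are not RequestExceptions)
+     */
+    private function handleError(array $queueItem, $reason): void
+    {
+        echo "Request failed for {$queueItem['url']}: " . $reason->getMessage() . "\n";
+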
+        $stmt = $this->db->prepare(
+            "UPDATE crawl_queue SET status = 'failed', processed_at = NOW(), retry_count = retry_count + 1 WHERE id = ?"
+        );
+        $stmt->execute([$queueItem['id']]);
+    }
+
+    private function updateJobStats(): void
+    {
+        $stmt = $this->db->prepare(
+            "UPDATE crawl_jobs SET
+            total_pages = (SELECT COUNT(*) FROM pages WHERE crawl_job_id = ?),
+            total_links = (SELECT COUNT(*) FROM links WHERE crawl_job_id = ?)
+            WHERE id = ?"
+        );
+        $stmt->execute([$this->crawlJobId, $this->crawlJobId, $this->crawlJobId]);
+    }
+
+    private function normalizeUrl(string $url): string
+    {
+        // Parse URL
+        $parts = parse_url($url);
+
+        if (!$parts) {
+            return $url;
+        }
+
+        // Remove fragment
+        unset($parts['fragment']);
+
+        // Normalize domain (add www if base domain has it, or remove if base doesn't)
+        if (isset($parts['host'])) {
+            // Always convert to lowercase
+            $parts['host'] = strtolower($parts['host']);
+
+            // Match www pattern with base domain
+            $baseHasWww = str_starts_with($this->baseDomain, 'www.');
+            $urlHasWww = str_starts_with($parts['host'], 'www.');
+
+            if ($baseHasWww && !$urlHasWww) {
+                $parts['host'] = 'www.' . $parts['host'];
+            } elseif (!$baseHasWww && $urlHasWww) {
+                $parts['host'] = substr($parts['host'], 4);
+            }
+        }
+
+        // Normalize path - remove trailing slash except for root
+        if (isset($parts['path']) && $parts['path'] !== '/') {
+            $parts['path'] = rtrim($parts['path'], '/');
+        }
+
+        // Rebuild URL
+        $scheme = isset($parts['scheme']) ? $parts['scheme'] . '://' : '';
+        $host = $parts['host'] ?? '';
+        $port = isset($parts['port']) ? ':' . $parts['port'] : '';
+        $path = $parts['path'] ?? '/';
+        $query = isset($parts['query']) ? '?' . $parts['query'] : '';
+
+        return $scheme . $host . $port . $path . $query;
+    }
+}
--- a/src/classes/Database.php
+++ b/src/classes/Database.php
@@ -0,0 +1,44 @@
+<?php
+
+/**
+ * Web Crawler - Database Class
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+namespace App;
+
+use PDO;
+use PDOException;
+
+class Database
+{
+    private static ?PDO $instance = null;
+
+    private function __construct()
+    {
+    }
+
+    public static function getInstance(): PDO
+    {
+        if (self::$instance === null) {
+            try {
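+                // Read settings from the environment (see .env.example); fall back to the Docker defaults
+                $host = getenv('DB_HOST') ?: 'mariadb';
+                $name = getenv('DB_NAME') ?: 'app_database';
+
+                self::$instance = new PDO(
+                    "mysql:host=$host;dbname=$name;charset=utf8mb4",
+                    getenv('DB_USER') ?: 'app_user',
+                    getenv('DB_PASSWORD') ?: 'app_password',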
+                    [
+                        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
+                        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
+                        PDO::ATTR_EMULATE_PREPARES => false,
+                    ]
+                );
+            } catch (PDOException $e) {
+                throw new \Exception("Database connection failed: " . $e->getMessage(), 0, $e);
+            }
+        }
+        return self::$instance;
+    }
+}
--- a/src/composer.json
+++ b/src/composer.json
@@ -0,0 +1,33 @@
+{
+    "_comment": "Web Crawler - Composer Configuration | Copyright (c) 2025 Martin Kiesewetter <mki@kies-media.de> | https://kies-media.de",
+    "name": "web-crawler/app",
+    "description": "Web Crawler Application with Parallel Processing",
+    "type": "project",
+    "require": {
+        "php": "^8.3",
+        "guzzlehttp/guzzle": "^7.8",
+        "symfony/dom-crawler": "^7.0",
+        "symfony/css-selector": "^7.0"
+    },
+    "require-dev": {
+        "phpunit/phpunit": "^11.0",
+        "phpstan/phpstan": "^2.1",
+        "squizlabs/php_codesniffer": "^4.0"
+    },
+    "autoload": {
+        "psr-4": {
+            "App\\": "classes/"
+        }
+    },
+    "autoload-dev": {
+        "psr-4": {
+            "Tests\\": "tests/"
+        }
+    },
+    "scripts": {
+        "test": "phpunit",
+        "phpstan": "phpstan analyse -c ../phpstan.neon --memory-limit=512M",
+        "phpcs": "phpcs --standard=PSR12 --ignore=/var/www/html/vendor /var/www/html /var/www/tests",
+        "phpcbf": "phpcbf --standard=PSR12 --ignore=/var/www/html/vendor /var/www/html /var/www/tests"
+    }
+}
--- a/src/composer.lock
+++ b/src/composer.lock
--- a/src/crawler-worker.php
+++ b/src/crawler-worker.php
@@ -0,0 +1,48 @@
+#!/usr/bin/env php
+<?php
+
+/**
+ * Web Crawler - Background Worker
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+
+require_once __DIR__ . '/vendor/autoload.php';
+
+use App\Database;
+use App\Crawler;
+
+if ($argc < 2) {
+    fwrite(STDERR, "Usage: php crawler-worker.php <job_id>\n");
+    exit(1);
+}
+
+$jobId = (int)$argv[1];
+
+try {
+    $db = Database::getInstance();
+
+    // Get job details
+    $stmt = $db->prepare("SELECT domain FROM crawl_jobs WHERE id = ?");
+    $stmt->execute([$jobId]);
+    $job = $stmt->fetch();
+
+    if (!$job) {
+        fwrite(STDERR, "Job not found\n");
+        exit(1);
+    }
+
+    echo "Starting crawl for: {$job['domain']}\n";
+
+    $crawler = new Crawler($jobId);
+    $crawler->start($job['domain']);
+
+    echo "Crawl completed\n";
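+} catch (Throwable $e) {
+    echo "Error: " . $e->getMessage() . "\n";
+
+    // Mark job as failed; guard the update because the database itself may be what failed
+    try {
+        $db = Database::getInstance();
+        $stmt = $db->prepare("UPDATE crawl_jobs SET status = 'failed' WHERE id = ?");
+        $stmt->execute([$jobId]);
+    } catch (Throwable $dbError) {
+        echo "Could not mark job as failed: " . $dbError->getMessage() . "\n";
+    }
+    exit(1);
+}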
--- a/src/index.php
+++ b/src/index.php
@@ -0,0 +1,885 @@
+<!DOCTYPE html>
+<!--
+/**
+ * Web Crawler - Main Interface
+ *
+ * @copyright Copyright (c) 2025 Martin Kiesewetter
+ * @author    Martin Kiesewetter <mki@kies-media.de>
+ * @link      https://kies-media.de
+ */
+-->
+<html lang="de">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Web Crawler</title>
+
+    <!-- jQuery -->
+    <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
+
+    <!-- DataTables CSS -->
+    <link rel="stylesheet" href="https://cdn.datatables.net/1.13.7/css/jquery.dataTables.min.css">
+
+    <!-- DataTables JS -->
+    <script src="https://cdn.datatables.net/1.13.7/js/jquery.dataTables.min.js"></script>
+
+    <style>
+        * {
+            margin: 0;
+            padding: 0;
+            box-sizing: border-box;
+        }
+
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+            background: #ffe4e9;
+            padding: 20px;
+        }
+
+        .container {
+            max-width: 1400px;
+            margin: 0 auto;
+        }
+
+        h1 {
+            color: #2c3e50;
+            margin-bottom: 30px;
+        }
+
+        .card {
+            background: white;
+            border-radius: 8px;
+            padding: 25px;
+            margin-bottom: 20px;
+            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+        }
+
+        .input-group {
+            display: flex;
+            gap: 10px;
+            margin-bottom: 20px;
+        }
+
+        input[type="text"] {
+            flex: 1;
+            padding: 12px 16px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            font-size: 16px;
+        }
+
+        input[type="text"]:focus {
+            outline: none;
+            border-color: #3498db;
+        }
+
+        button {
+            padding: 12px 24px;
+            background: #3498db;
+            color: white;
+            border: none;
+            border-radius: 6px;
+            font-size: 16px;
+            cursor: pointer;
+            transition: background 0.3s;
+        }
+
+        button:hover {
+            background: #2980b9;
+        }
+
+        button:disabled {
+            background: #bdc3c7;
+            cursor: not-allowed;
+        }
+
+        .status {
+            display: inline-block;
+            padding: 4px 12px;
+            border-radius: 4px;
+            font-size: 12px;
+            font-weight: 600;
+            text-transform: uppercase;
+        }
+
+        .status.pending { background: #f39c12; color: white; }
+        .status.running { background: #3498db; color: white; }
+        .status.completed { background: #27ae60; color: white; }
+        .status.failed { background: #e74c3c; color: white; }
+
+        table {
+            width: 100%;
+            border-collapse: collapse;
+        }
+
+        th, td {
+            padding: 12px;
+            text-align: left;
+            border-bottom: 1px solid #ecf0f1;
+        }
+
+        th {
+            background: #f8f9fa;
+            font-weight: 600;
+            color: #2c3e50;
+        }
+
+        tr:hover {
+            background: #f8f9fa;
+        }
+
+        .tabs {
+            display: flex;
+            gap: 10px;
+            margin-bottom: 20px;
+            border-bottom: 2px solid #ecf0f1;
+        }
+
+        .tab {
+            padding: 12px 20px;
+            background: none;
+            border: none;
+            border-bottom: 3px solid transparent;
+            cursor: pointer;
+            color: #7f8c8d;
+            font-weight: 500;
+        }
+
+        .tab.active {
+            color: #3498db;
+            border-bottom-color: #3498db;
+        }
+
+        .tab-content {
+            display: none;
+        }
+
+        .tab-content.active {
+            display: block;
+        }
+
+        .stats {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
+            gap: 15px;
+            margin-top: 15px;
+        }
+
+        .stat-box {
+            background: #ecf0f1;
+            padding: 15px;
+            border-radius: 6px;
+        }
+
+        .stat-label {
+            font-size: 12px;
+            color: #7f8c8d;
+            text-transform: uppercase;
+            margin-bottom: 5px;
+        }
+
+        .stat-value {
+            font-size: 24px;
+            font-weight: 700;
+            color: #2c3e50;
+        }
+
+        .stat-sublabel {
+            font-size: 11px;
+            color: #95a5a6;
+            margin-top: 3px;
+        }
+
+        .nofollow {
+            color: #e74c3c;
+            font-weight: 600;
+        }
+
+        .external {
+            color: #3498db;
+        }
+
+        .loading {
+            text-align: center;
+            padding: 40px;
+            color: #7f8c8d;
+        }
+
+        .action-btn {
+            padding: 6px 12px;
+            font-size: 14px;
+            margin-right: 5px;
+        }
+
+        .url-cell {
+            max-width: 400px;
+            overflow: hidden;
+            text-overflow: ellipsis;
+            white-space: nowrap;
+        }
+
+        /* DataTables Styling */
+        .dataTables_wrapper {
+            padding: 20px 0;
+        }
+
+        .dataTables_filter input {
+            padding: 8px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            margin-left: 10px;
+        }
+
+        .dataTables_length select {
+            padding: 6px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            margin: 0 10px;
+        }
+
+        .dataTables_info {
+            padding-top: 10px;
+            color: #7f8c8d;
+        }
+
+        .dataTables_paginate {
+            padding-top: 10px;
+        }
+
+        .dataTables_paginate .paginate_button {
+            padding: 6px 12px;
+            margin: 0 2px;
+            border: 1px solid #e0e0e0;
+            border-radius: 4px;
+            background: white;
+            cursor: pointer;
+        }
+
+        .dataTables_paginate .paginate_button.current {
+            background: #3498db;
+            color: white !important;
+            border-color: #3498db;
+        }
+
+        .dataTables_paginate .paginate_button:hover {
+            background: #ecf0f1;
+        }
+
+        .dataTables_paginate .paginate_button.disabled {
+            cursor: not-allowed;
+            opacity: 0.5;
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <h1>🕷️ Web Crawler</h1>
+
+        <div class="card">
+            <h2>Neue Domain crawlen</h2>
+            <div class="input-group">
+                <input type="text" id="domainInput" placeholder="example.com oder https://example.com" />
+                <button onclick="startCrawl()">Crawl starten</button>
+            </div>
+        </div>
+
+        <div class="card">
+            <h2>Crawl Jobs</h2>
+            <table id="jobsTable" class="display">
+                <thead>
+                    <tr>
+                        <th>ID</th>
+                        <th>Domain</th>
+                        <th>Status</th>
+                        <th>Seiten</th>
+                        <th>Links</th>
+                        <th>Gestartet</th>
+                        <th>Aktionen</th>
+                    </tr>
+                </thead>
+                <tbody id="jobsBody">
+                    <tr><td colspan="7" class="loading">Lade...</td></tr>
+                </tbody>
+            </table>
+        </div>
+
+        <div id="jobDetails" style="display: none;">
+            <div class="card">
+                <h2>Job Details: <span id="jobDomain"></span></h2>
+
+                <div class="stats" id="jobStats"></div>
+
+                <div class="tabs">
+                    <button class="tab active" onclick="switchTab('pages')">Seiten</button>
+                    <button class="tab" onclick="switchTab('links')">Links</button>
+                    <button class="tab" onclick="switchTab('broken')">Broken Links</button>
+                    <button class="tab" onclick="switchTab('redirects')">Redirects</button>
+                    <button class="tab" onclick="switchTab('seo')">SEO Analysis</button>
+                </div>
+
+                <div class="tab-content active" id="pages-tab">
+                    <table id="pagesTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Titel</th>
+                                <th>Status</th>
+                                <th>Gecrawlt</th>
+                            </tr>
+                        </thead>
+                        <tbody id="pagesBody">
+                            <tr><td colspan="4" class="loading">Keine Seiten gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="links-tab">
+                    <table id="linksTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>Von</th>
+                                <th>Nach</th>
+                                <th>Link-Text</th>
+                                <th>Nofollow</th>
+                                <th>Typ</th>
+                            </tr>
+                        </thead>
+                        <tbody id="linksBody">
+                            <tr><td colspan="5" class="loading">Keine Links gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="broken-tab">
+                    <table id="brokenTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Status Code</th>
+                                <th>Titel</th>
+                                <th>Gecrawlt</th>
+                            </tr>
+                        </thead>
+                        <tbody id="brokenBody">
+                            <tr><td colspan="4" class="loading">Keine defekten Links gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="redirects-tab">
+                    <h3>Redirect Statistics</h3>
+                    <div id="redirectStats" class="stats" style="margin-bottom: 20px;"></div>
+                    <table id="redirectsTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Redirect To</th>
+                                <th>Status Code</th>
+                                <th>Redirect Count</th>
+                                <th>Type</th>
+                            </tr>
+                        </thead>
+                        <tbody id="redirectsBody">
+                            <tr><td colspan="5" class="loading">Keine Redirects gefunden</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+
+                <div class="tab-content" id="seo-tab">
+                    <h3>SEO Issues</h3>
+                    <div id="seoStats" style="margin-bottom: 20px;"></div>
+                    <table id="seoTable" class="display">
+                        <thead>
+                            <tr>
+                                <th>URL</th>
+                                <th>Title (Länge)</th>
+                                <th>Meta Description (Länge)</th>
+                                <th>Issues</th>
+                            </tr>
+                        </thead>
+                        <tbody id="seoIssuesBody">
+                            <tr><td colspan="4" class="loading">Keine SEO-Probleme gefunden</td></tr>
+                        </tbody>
+                    </table>
+
+                    <h3 style="margin-top: 30px;">Duplicate Content</h3>
+                    <div id="seoDuplicatesBody"></div>
+                </div>
+            </div>
+        </div>
+    </div>
+
+    <script>
+        let currentJobId = null;
+        let refreshInterval = null;
+
+        async function startCrawl() {
+            const domain = document.getElementById('domainInput').value.trim();
+            if (!domain) {
+                alert('Bitte Domain eingeben');
+                return;
+            }
+
+            const formData = new FormData();
+            formData.append('domain', domain);
+
+            try {
+                const response = await fetch('/api.php?action=start', {
+                    method: 'POST',
+                    body: formData
+                });
+                const data = await response.json();
+
+                if (data.success) {
+                    document.getElementById('domainInput').value = '';
+                    loadJobs();
+                    alert('Crawl gestartet! Job ID: ' + data.job_id);
+                } else {
+                    alert('Fehler: ' + data.error);
+                }
+            } catch (e) {
+                alert('Fehler beim Starten: ' + e.message);
+            }
+        }
+
+        let jobsDataTable = null;
+
+        async function loadJobs() {
+            try {
+                const response = await fetch('/api.php?action=jobs');
+                const data = await response.json();
+
+                if (data.success) {
+                    // Destroy existing DataTable if it exists
+                    if (jobsDataTable) {
+                        jobsDataTable.destroy();
+                    }
+
+                    // Escape API values before interpolating them into HTML
+                    const esc = value => String(value).replace(/[&<>"']/g,
+                        ch => ({'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'}[ch]));
+
+                    const tbody = document.getElementById('jobsBody');
+                    tbody.innerHTML = data.jobs.map(job => `
+                        <tr>
+                            <td>${job.id}</td>
+                            <td>${esc(job.domain)}</td>
+                            <td><span class="status ${job.status}">${job.status}</span></td>
+                            <td>${job.total_pages}</td>
+                            <td>${job.total_links}</td>
+                            <td>${job.started_at || '-'}</td>
+                            <td>
+                                <button class="action-btn" onclick="viewJob(${job.id})">Ansehen</button>
+                                <button class="action-btn" onclick="recrawlJob(${job.id}, ${esc(JSON.stringify(job.domain))})">Recrawl</button>
+                                <button class="action-btn" onclick="deleteJob(${job.id})">Löschen</button>
+                            </td>
+                        </tr>
+                    `).join('');
+
+                    // Initialize DataTable
+                    jobsDataTable = $('#jobsTable').DataTable({
+                        pageLength: 25,
+                        order: [[0, 'desc']],
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                }
+            } catch (e) {
+                console.error('Fehler beim Laden der Jobs:', e);
+            }
+        }
+
+        async function viewJob(jobId) {
+            currentJobId = jobId;
+            document.getElementById('jobDetails').style.display = 'block';
+
+            // Start auto-refresh every 1 second
+            if (refreshInterval) clearInterval(refreshInterval);
+            loadJobDetails();
+            refreshInterval = setInterval(loadJobDetails, 1000);
+        }
+
+        async function loadJobDetails() {
+            if (!currentJobId) return;
+
+            try {
+                // Load job status
+                const statusResponse = await fetch(`/api.php?action=status&job_id=${currentJobId}`);
+                const statusData = await statusResponse.json();
+
+                if (statusData.success) {
+                    const job = statusData.job;
+                    const queue = statusData.queue;
+                    document.getElementById('jobDomain').textContent = job.domain;
+
+                    const queueInfo = queue ? `
+                        <div class="stat-box">
+                            <div class="stat-label">Warteschlange</div>
+                            <div class="stat-value">${queue.pending || 0}</div>
+                            <div class="stat-sublabel">noch zu crawlen</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Verarbeitet</div>
+                            <div class="stat-value">${queue.completed || 0}</div>
+                            <div class="stat-sublabel">abgeschlossen</div>
+                        </div>
+                    ` : '';
+
+                    document.getElementById('jobStats').innerHTML = `
+                        <div class="stat-box">
+                            <div class="stat-label">Status</div>
+                            <div class="stat-value"><span class="status ${job.status}">${job.status}</span></div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Seiten</div>
+                            <div class="stat-value">${job.total_pages}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Links</div>
+                            <div class="stat-value">${job.total_links}</div>
+                        </div>
+                        ${queueInfo}
+                    `;
+
+                    // Stop refresh if completed or failed
+                    if (job.status === 'completed' || job.status === 'failed') {
+                        if (refreshInterval) {
+                            clearInterval(refreshInterval);
+                            refreshInterval = null;
+                        }
+                    }
+                }
+
+                // Load pages
+                const pagesResponse = await fetch(`/api.php?action=pages&job_id=${currentJobId}`);
+                const pagesData = await pagesResponse.json();
+
+                if ($.fn.DataTable.isDataTable('#pagesTable')) {
+                    $('#pagesTable').DataTable().destroy();
+                }
+
+                if (pagesData.success && pagesData.pages.length > 0) {
+                    document.getElementById('pagesBody').innerHTML = pagesData.pages.map(page => `
+                        <tr>
+                            <td class="url-cell" title="${page.url}">${page.url}</td>
+                            <td>${page.title || '-'}</td>
+                            <td>${page.status_code}</td>
+                            <td>${page.crawled_at}</td>
+                        </tr>
+                    `).join('');
+
+                    $('#pagesTable').DataTable({
+                        pageLength: 50,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                }
+
+                // Load links
+                const linksResponse = await fetch(`/api.php?action=links&job_id=${currentJobId}`);
+                const linksData = await linksResponse.json();
+
+                if ($.fn.DataTable.isDataTable('#linksTable')) {
+                    $('#linksTable').DataTable().destroy();
+                }
+
+                if (linksData.success && linksData.links.length > 0) {
+                    document.getElementById('linksBody').innerHTML = linksData.links.map(link => `
+                        <tr>
+                            <td class="url-cell" title="${link.source_url}">${link.source_url}</td>
+                            <td class="url-cell" title="${link.target_url}">${link.target_url}</td>
+                            <td>${link.link_text || '-'}</td>
+                            <td>${link.is_nofollow ? '<span class="nofollow">Ja</span>' : 'Nein'}</td>
+                            <td>${link.is_internal ? 'Intern' : '<span class="external">Extern</span>'}</td>
+                        </tr>
+                    `).join('');
+
+                    $('#linksTable').DataTable({
+                        pageLength: 50,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                }
+
+                // Load broken links
+                const brokenResponse = await fetch(`/api.php?action=broken-links&job_id=${currentJobId}`);
+                const brokenData = await brokenResponse.json();
+
+                if ($.fn.DataTable.isDataTable('#brokenTable')) {
+                    $('#brokenTable').DataTable().destroy();
+                }
+
+                if (brokenData.success && brokenData.broken_links.length > 0) {
+                    document.getElementById('brokenBody').innerHTML = brokenData.broken_links.map(page => `
+                        <tr>
+                            <td class="url-cell" title="${page.url}">${page.url}</td>
+                            <td><span class="status failed">${page.status_code || 'Error'}</span></td>
+                            <td>${page.title || '-'}</td>
+                            <td>${page.crawled_at}</td>
+                        </tr>
+                    `).join('');
+
+                    $('#brokenTable').DataTable({
+                        pageLength: 25,
+                        language: {
+                            search: 'Suchen:',
+                            lengthMenu: 'Zeige _MENU_ Einträge',
+                            info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                            infoEmpty: 'Keine Einträge verfügbar',
+                            infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                            paginate: {
+                                first: 'Erste',
+                                last: 'Letzte',
+                                next: 'Nächste',
+                                previous: 'Vorherige'
+                            }
+                        }
+                    });
+                } else {
+                    document.getElementById('brokenBody').innerHTML = '<tr><td colspan="4" class="loading">Keine defekten Links gefunden</td></tr>';
+                }
+
+                // Load SEO analysis
+                const seoResponse = await fetch(`/api.php?action=seo-analysis&job_id=${currentJobId}`);
+                const seoData = await seoResponse.json();
+
+                if (seoData.success) {
+                    // SEO Stats
+                    document.getElementById('seoStats').innerHTML = `
+                        <div class="stat-box">
+                            <div class="stat-label">Total Pages</div>
+                            <div class="stat-value">${seoData.total_pages}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Pages with Issues</div>
+                            <div class="stat-value">${seoData.issues.length}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Duplicates Found</div>
+                            <div class="stat-value">${seoData.duplicates.length}</div>
+                        </div>
+                    `;
+
+                    // SEO Issues
+                    if ($.fn.DataTable.isDataTable('#seoTable')) {
+                        $('#seoTable').DataTable().destroy();
+                    }
+
+                    if (seoData.issues.length > 0) {
+                        document.getElementById('seoIssuesBody').innerHTML = seoData.issues.map(item => `
+                            <tr>
+                                <td class="url-cell" title="${item.url}">${item.url}</td>
+                                <td>${item.title || '-'} (${item.title_length})</td>
+                                <td>${item.meta_description ? (item.meta_description.length > 50 ? item.meta_description.substring(0, 50) + '...' : item.meta_description) : '-'} (${item.meta_length})</td>
+                                <td><span class="nofollow">${item.issues.join(', ')}</span></td>
+                            </tr>
+                        `).join('');
+
+                        $('#seoTable').DataTable({
+                            pageLength: 25,
+                            language: {
+                                search: 'Suchen:',
+                                lengthMenu: 'Zeige _MENU_ Einträge',
+                                info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                                infoEmpty: 'Keine Einträge verfügbar',
+                                infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                                paginate: {
+                                    first: 'Erste',
+                                    last: 'Letzte',
+                                    next: 'Nächste',
+                                    previous: 'Vorherige'
+                                }
+                            }
+                        });
+                    } else {
+                        document.getElementById('seoIssuesBody').innerHTML = '<tr><td colspan="4" class="loading">Keine SEO-Probleme gefunden</td></tr>';
+                    }
+
+                    // Duplicates
+                    if (seoData.duplicates.length > 0) {
+                        document.getElementById('seoDuplicatesBody').innerHTML = seoData.duplicates.map(dup => `
+                            <div class="stat-box" style="margin-bottom: 15px;">
+                                <div class="stat-label">Duplicate ${dup.type}</div>
+                                <div style="font-size: 14px; margin: 10px 0;"><strong>${dup.content}</strong></div>
+                                <div style="font-size: 12px;">Found on ${dup.urls.length} pages:</div>
+                                <ul style="margin-top: 5px; font-size: 12px;">
+                                    ${dup.urls.map(url => `<li>${url}</li>`).join('')}
+                                </ul>
+                            </div>
+                        `).join('');
+                    } else {
+                        document.getElementById('seoDuplicatesBody').innerHTML = '<p>Keine doppelten Inhalte gefunden</p>';
+                    }
+                }
+
+                // Load redirects
+                const redirectsResponse = await fetch(`/api.php?action=redirects&job_id=${currentJobId}`);
+                const redirectsData = await redirectsResponse.json();
+
+                if (redirectsData.success) {
+                    const stats = redirectsData.stats;
+
+                    // Redirect Stats
+                    document.getElementById('redirectStats').innerHTML = `
+                        <div class="stat-box">
+                            <div class="stat-label">Total Redirects</div>
+                            <div class="stat-value">${stats.total}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Permanent (301/308)</div>
+                            <div class="stat-value">${stats.permanent}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Temporary (302/303/307)</div>
+                            <div class="stat-value">${stats.temporary}</div>
+                        </div>
+                        <div class="stat-box">
+                            <div class="stat-label">Excessive (>${stats.threshold})</div>
+                            <div class="stat-value" style="color: ${stats.excessive > 0 ? '#e74c3c' : '#27ae60'}">${stats.excessive}</div>
+                            <div class="stat-sublabel">threshold: ${stats.threshold}</div>
+                        </div>
+                    `;
+
+                    // Redirect Table
+                    if ($.fn.DataTable.isDataTable('#redirectsTable')) {
+                        $('#redirectsTable').DataTable().destroy();
+                    }
+
+                    if (redirectsData.redirects.length > 0) {
+                        document.getElementById('redirectsBody').innerHTML = redirectsData.redirects.map(redirect => {
+                            const isExcessive = redirect.redirect_count > stats.threshold;
+                            const isPermRedirect = redirect.status_code == 301 || redirect.status_code == 308;
+                            const redirectType = isPermRedirect ? 'Permanent' : 'Temporary';
+
+                            return `
+                                <tr style="${isExcessive ? 'background-color: #fff3cd;' : ''}">
+                                    <td class="url-cell" title="${redirect.url}">${redirect.url}</td>
+                                    <td class="url-cell" title="${redirect.redirect_url || '-'}">${redirect.redirect_url || '-'}</td>
+                                    <td><span class="status ${isPermRedirect ? 'completed' : 'running'}">${redirect.status_code}</span></td>
+                                    <td><strong ${isExcessive ? 'style="color: #e74c3c;"' : ''}>${redirect.redirect_count}</strong></td>
+                                    <td>${redirectType}</td>
+                                </tr>
+                            `;
+                        }).join('');
+
+                        $('#redirectsTable').DataTable({
+                            pageLength: 25,
+                            language: {
+                                search: 'Suchen:',
+                                lengthMenu: 'Zeige _MENU_ Einträge',
+                                info: 'Zeige _START_ bis _END_ von _TOTAL_ Einträgen',
+                                infoEmpty: 'Keine Einträge verfügbar',
+                                infoFiltered: '(gefiltert von _MAX_ Einträgen)',
+                                paginate: {
+                                    first: 'Erste',
+                                    last: 'Letzte',
+                                    next: 'Nächste',
+                                    previous: 'Vorherige'
+                                }
+                            }
+                        });
+                    } else {
+                        document.getElementById('redirectsBody').innerHTML = '<tr><td colspan="5" class="loading">Keine Redirects gefunden</td></tr>';
+                    }
+                }
+
+                // Update jobs table
+                loadJobs();
+            } catch (e) {
+                console.error('Fehler beim Laden der Details:', e);
+            }
+        }
+
+        async function deleteJob(jobId) {
+            if (!confirm('Job wirklich löschen?')) return;
+
+            const formData = new FormData();
+            formData.append('job_id', jobId);
+
+            try {
+                const response = await fetch('/api.php?action=delete', {
+                    method: 'POST',
+                    body: formData
+                });
+                const data = await response.json();
+
+                if (data.success) {
+                    loadJobs();
+                    if (currentJobId === jobId) {
+                        document.getElementById('jobDetails').style.display = 'none';
+                        currentJobId = null;
+                    }
+                }
+            } catch (e) {
+                alert('Fehler beim Löschen: ' + e.message);
+            }
+        }
+
+        async function recrawlJob(jobId, domain) {
+            if (!confirm('Job-Ergebnisse löschen und neu crawlen?')) return;
+
+            const formData = new FormData();
+            formData.append('job_id', jobId);
+            formData.append('domain', domain);
+
+            try {
+                const response = await fetch('/api.php?action=recrawl', {
+                    method: 'POST',
+                    body: formData
+                });
+                const data = await response.json();
+
+                if (data.success) {
+                    loadJobs();
+                    alert('Recrawl gestartet! Job ID: ' + data.job_id);
+                } else {
+                    alert('Fehler: ' + data.error);
+                }
+            } catch (e) {
+                alert('Fehler beim Recrawl: ' + e.message);
+            }
+        }
+
+        function switchTab(tab) {
+            document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
+            document.querySelectorAll('.tab-content').forEach(c => c.classList.remove('active'));
+
+            event.target.classList.add('active');
+            document.getElementById(tab + '-tab').classList.add('active');
+        }
+
+        // Initial load
+        loadJobs();
+        setInterval(loadJobs, 5000);
+    </script>
+</body>
+</html>
--- a/tests/Integration/CrawlerIntegrationTest.php
+++ b/tests/Integration/CrawlerIntegrationTest.php
@@ -0,0 +1,67 @@
+<?php
+
+namespace Tests\Integration;
+
+use PHPUnit\Framework\TestCase;
+use App\Crawler;
+use App\Database;
+
+class CrawlerIntegrationTest extends TestCase
+{
+    private int $testJobId;
+    private \PDO $db;
+
+    protected function setUp(): void
+    {
+        $this->db = Database::getInstance();
+
+        // Create a test job
+        $stmt = $this->db->prepare("INSERT INTO crawl_jobs (domain, status) VALUES (?, 'pending')");
+        $stmt->execute(['https://httpbin.org']);
+        $lastId = $this->db->lastInsertId();
+        $this->testJobId = is_numeric($lastId) ? (int)$lastId : 0;
+    }
+
+    protected function tearDown(): void
+    {
+        // Clean up test data
+        $stmt = $this->db->prepare("DELETE FROM crawl_jobs WHERE id = ?");
+        $stmt->execute([$this->testJobId]);
+    }
+
+    public function testCrawlerUpdatesJobStatusToRunning(): void
+    {
+        $crawler = new Crawler($this->testJobId);
+
+        // Start crawl (may throw in the test environment, but should still update the status)
+        try {
+            $crawler->start('https://httpbin.org/html');
+        } catch (\Exception $e) {
+            // Expected to fail in test environment
+        }
+
+        $stmt = $this->db->prepare("SELECT status FROM crawl_jobs WHERE id = ?");
+        $stmt->execute([$this->testJobId]);
+        $job = $stmt->fetch();
+
+        // Status should be one of 'running', 'completed', or 'failed'
+        $this->assertContains($job['status'], ['running', 'completed', 'failed']);
+    }
+
+    public function testCrawlerCreatesQueueEntries(): void
+    {
+        $crawler = new Crawler($this->testJobId);
+
+        try {
+            $crawler->start('https://httpbin.org/html');
+        } catch (\Exception $e) {
+            // Expected to fail in test environment
+        }
+
+        $stmt = $this->db->prepare("SELECT COUNT(*) as count FROM crawl_queue WHERE crawl_job_id = ?");
+        $stmt->execute([$this->testJobId]);
+        $result = $stmt->fetch();
+
+        $this->assertGreaterThan(0, $result['count']);
+    }
+}
--- a/tests/Unit/CrawlerTest.php
+++ b/tests/Unit/CrawlerTest.php
@@ -0,0 +1,49 @@
+<?php
+
+namespace Tests\Unit;
+
+use PHPUnit\Framework\TestCase;
+use App\Crawler;
+use App\Database;
+
+class CrawlerTest extends TestCase
+{
+    private int $testJobId;
+
+    protected function setUp(): void
+    {
+        $db = Database::getInstance();
+
+        // Create a test job
+        $stmt = $db->prepare("INSERT INTO crawl_jobs (domain, status) VALUES (?, 'pending')");
+        $stmt->execute(['https://example.com']);
+        $lastId = $db->lastInsertId();
+        $this->testJobId = is_numeric($lastId) ? (int)$lastId : 0;
+    }
+
+    protected function tearDown(): void
+    {
+        $db = Database::getInstance();
+
+        // Clean up test data
+        $stmt = $db->prepare("DELETE FROM crawl_jobs WHERE id = ?");
+        $stmt->execute([$this->testJobId]);
+    }
+
+    public function testCrawlerCanBeInstantiated(): void
+    {
+        $crawler = new Crawler($this->testJobId);
+        $this->assertInstanceOf(Crawler::class, $crawler);
+    }
+
+    public function testCrawlerCreatesJobWithCorrectStatus(): void
+    {
+        $db = Database::getInstance();
+
+        $stmt = $db->prepare("SELECT status FROM crawl_jobs WHERE id = ?");
+        $stmt->execute([$this->testJobId]);
+        $job = $stmt->fetch();
+
+        $this->assertEquals('pending', $job['status']);
+    }
+}
--- a/tests/Unit/DatabaseTest.php
+++ b/tests/Unit/DatabaseTest.php
@@ -0,0 +1,58 @@
+<?php
+
+namespace Tests\Unit;
+
+use PHPUnit\Framework\TestCase;
+use App\Database;
+use PDO;
+
+class DatabaseTest extends TestCase
+{
+    public function testGetInstanceReturnsPDO(): void
+    {
+        $db = Database::getInstance();
+        $this->assertInstanceOf(PDO::class, $db);
+    }
+
+    public function testGetInstanceReturnsSameInstance(): void
+    {
+        $db1 = Database::getInstance();
+        $db2 = Database::getInstance();
+        $this->assertSame($db1, $db2);
+    }
+
+    public function testDatabaseConnectionHasCorrectAttributes(): void
+    {
+        $db = Database::getInstance();
+
+        // Test error mode
+        $this->assertEquals(
+            PDO::ERRMODE_EXCEPTION,
+            $db->getAttribute(PDO::ATTR_ERRMODE)
+        );
+
+        // Test fetch mode
+        $this->assertEquals(
+            PDO::FETCH_ASSOC,
+            $db->getAttribute(PDO::ATTR_DEFAULT_FETCH_MODE)
+        );
+    }
+
+    public function testCanExecuteQuery(): void
+    {
+        $db = Database::getInstance();
+        $stmt = $db->query('SELECT 1 as test');
+        $this->assertNotFalse($stmt, 'Query failed');
+        $result = $stmt->fetch();
+
+        $this->assertEquals(['test' => 1], $result);
+    }
+
+    public function testCanPrepareStatement(): void
+    {
+        $db = Database::getInstance();
+        $stmt = $db->prepare('SELECT ? as test');
+
+        $this->assertInstanceOf(\PDOStatement::class, $stmt);
+    }
+}
--- a/webanalyse.php
+++ b/webanalyse.php
@@ -1,457 +0,0 @@
-<?php
-
-
-/**
- * Klasse uebernimmt das Crawlen von Websites und persistiert Metadaten in MySQL.
- */
-class webanalyse
-{
-    /**
-     * @var mysqli|null Verbindung zur Screaming Frog Datenbank.
-     */
-    var $db;
-
-    /**
-     * Initialisiert die Datenbankverbindung fuer die Crawl-Session.
-     */
-    function __construct()
-    {
-        $this->db = mysqli_connect("localhost", "root", "", "screaming_frog");
-    }
-
-
-    /**
-     * Holt eine einzelne URL via cURL und liefert Response-Metadaten.
-     *
-     * @param string $url Zieladresse fuer den Abruf.
-     * @return array<string,mixed> Antwortdaten oder ein "error"-Schluessel.
-     */
-    function getWebsite($url)
-    {
-        // cURL-Session initialisieren
-        $ch = curl_init();
-
-        // cURL-Optionen setzen
-        curl_setopt($ch, CURLOPT_URL, $url);
-        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // Antwort als String zurückgeben
-        curl_setopt($ch, CURLOPT_HEADER, true);          // Header in der Antwort einschließen
-        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // Weiterleitungen folgen
-        curl_setopt($ch, CURLOPT_TIMEOUT, 30);           // Timeout nach 30 Sekunden
-        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'); // User Agent setzen
-        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // SSL-Zertifikat nicht prüfen (nur für Tests)
-
-        // Anfrage ausführen
-        $response = curl_exec($ch);
-
-        // Fehler überprüfen
-        if (curl_errno($ch)) {
-            $error = curl_error($ch);
-            curl_close($ch);
-            return ['error' => $error];
-        }
-
-        // Informationen abrufen
-        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
-        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
-        $totalTime = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
-        $effectiveUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
-
-        // cURL-Session schließen
-        curl_close($ch);
-
-        // Header und Body trennen
-        $headers = substr($response, 0, $headerSize);
-        $body = substr($response, $headerSize);
-
-        // Header in Array umwandeln
-        $headerLines = explode("\r\n", trim($headers));
-        $parsedHeaders = [];
-
-        foreach ($headerLines as $line) {
-            if (strpos($line, ':') !== false) {
-                list($key, $value) = explode(':', $line, 2);
-                $parsedHeaders[trim($key)] = trim($value);
-            }
-        }
-
-        return [
-            'url' => $effectiveUrl,
-            'status_code' => $httpCode,
-            // 'headers_raw' => $headers,
-            'headers_parsed' => $parsedHeaders,
-            'body' => $body,
-            'response_time' => $totalTime,
-            'body_size' => strlen($body)
-        ];
-    }
-
-    /**
-     * Ruft mehrere URLs parallel via curl_multi ab.
-     *
-     * @param array<int,string> $urls Liste von Ziel-URLs.
-     * @return array<string,array<string,mixed>> Antworten je URL.
-     */
-    function getMultipleWebsites($urls)
-    {
-
-        $results = [];
-        $curlHandles = [];
-        $multiHandle = curl_multi_init();
-
-        // Einzelne cURL-Handles für jede URL erstellen
-        foreach ($urls as $url) {
-            $ch = curl_init();
-
-            // cURL-Optionen setzen (gleich wie bei getWebsite)
-            curl_setopt($ch, CURLOPT_URL, $url);
-            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
-            curl_setopt($ch, CURLOPT_HEADER, true);
-            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
-            curl_setopt($ch, CURLOPT_TIMEOUT, 30);
-            curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
-            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
-
-            // Handle zum Multi-Handle hinzufügen
-            curl_multi_add_handle($multiHandle, $ch);
-            $curlHandles[$url] = $ch;
-        }
-
-        // Alle Anfragen parallel ausführen
-        $running = null;
-        do {
-            curl_multi_exec($multiHandle, $running);
-            curl_multi_select($multiHandle);
-        } while ($running > 0);
-
-
-        // Ergebnisse verarbeiten
-        foreach ($urls as $url) {
-            $ch = $curlHandles[$url];
-            $response = curl_multi_getcontent($ch);
-
-            // Fehler überprüfen
-            if (curl_errno($ch)) {
-                $error = curl_error($ch);
-                $results[$url] = ['error' => $error];
-            } else {
-                // Informationen abrufen
-                $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
-                $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
-                $totalTime = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
-                $effectiveUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
-
-                // Header und Body trennen
-                $headers = substr($response, 0, $headerSize);
-                $body = substr($response, $headerSize);
-
-                // Header in Array umwandeln
-                $headerLines = explode("\r\n", trim($headers));
-                $parsedHeaders = [];
-
-                foreach ($headerLines as $line) {
-                    if (strpos($line, ':') !== false) {
-                        list($key, $value) = explode(':', $line, 2);
-                        $parsedHeaders[trim($key)] = trim($value);
-                    }
-                }
-
-                $results[$url] = [
-                    'url' => $effectiveUrl,
-                    'status_code' => $httpCode,
-                    'headers_parsed' => $parsedHeaders,
-                    'body' => $body,
-                    'response_time' => $totalTime,
-                    'body_size' => strlen($body)
-                ];
-            }
-
-            // Handle aus Multi-Handle entfernen und schließen
-            curl_multi_remove_handle($multiHandle, $ch);
-            curl_close($ch);
-        }
-
-        // Multi-Handle schließen
-        curl_multi_close($multiHandle);
-
-        return $results;
-    }
-
-
-
-
-    /**
-     * Persistiert Response-Daten und stoesst die Analyse der gefundenen Links an.
-     *
-     * @param int $crawlID Identifier der Crawl-Session.
-     * @param string $url Ursprung-URL, deren Antwort verarbeitet wird.
-     * @param array<string,mixed> $data Ergebnis der HTTP-Abfrage.
-     * @return void
-     */
-    function processResults(int $crawlID, string $url, array $data)
-    {
-        if (!isset($data['error'])) {
-            $status_code = $data['status_code'];
-            $response_time = $data['response_time'];
-            $body_size = $data['body_size'];
-            $date = date('Y-m-d H:i:s');
-            $body = $data['body'];
-
-            $sql = "UPDATE urls SET 
-            status_code = " . $status_code . ", 
-            response_time = " . ($response_time * 1000) . ", 
-            body_size = " . $body_size . ", 
-            date = now(),
-            body = '" . $this->db->real_escape_string($body) . "'
-
-            WHERE url = '" . $this->db->real_escape_string($url) . "' AND crawl_id = " . $crawlID . " LIMIT 1";
-            // echo $sql;
-
-            $this->db->query($sql);
-        } else {
-            // Handle error case if needed
-            echo "Fehler bei der Analyse von $url: " . $data['error'] . "\n";
-        }
-
-        $this->findNewUrls($crawlID, $body, $url);
-    }
-
-
-    /**
-     * Extracts links from a response and creates new URL records.
-     *
-     * @param int $crawlID Identifier of the crawl session.
-     * @param string $body HTML body of the response.
-     * @param string $url Processed URL, used as context for relative links.
-     * @return void
-     */
-    function findNewUrls(int $crawlID, string $body, string $url) {
-
-
-
-
-        $links = $this->extractLinks($body, $url);
-
-        $temp = $this->db->query("select id from urls where url = '".$this->db->real_escape_string($url)."' and crawl_id = ".$crawlID." LIMIT 1")->fetch_all(MYSQLI_ASSOC);
-        $vonUrlId = $temp[0]['id'];
-
-
-        $this->db->query("delete from links where von = ".$vonUrlId);
-
-        foreach($links as $l) {
-
-            $u = $this->db->query("insert ignore into urls (url, crawl_id) values ('".$this->db->real_escape_string($l['absolute_url'])."',".$crawlID.")");
-            $id = $this->db->insert_id;
-            if ($id === 0) {
-                $qwer = $this->db->query("select id from urls where url = '".$this->db->real_escape_string($l['absolute_url'])."' and crawl_id = ".$crawlID." LIMIT 1")->fetch_all(MYSQLI_ASSOC);
-                $id = $qwer[0]['id'];
-            }
-
-
-
-
-
-            $sql_links = "insert ignore into links (von, nach, linktext, dofollow) values (
-            ".$vonUrlId.",
-            ".$id.",
-
-
-            '".$this->db->real_escape_string(mb_convert_encoding($l['text'],"UTF-8"))."',
-            ".(strstr($l['rel']??"", 'nofollow') === false ? 1 : 0)."
-
-
-            )";
-
-            echo $sql_links;
-
-            $u = $this->db->query($sql_links);
-        
-            
-
-        }
-
-
-
-        print_r($links);
-
-
-    }
-
-
-    /**
-     * Starts a crawl run for unprocessed URLs.
-     *
-     * @param int $crawlID Identifier of the crawl session.
-     * @return void
-     */
-    function doCrawl(int $crawlID)
-    {
-
-        $urls2toCrawl = $this->db->query("select * from urls where crawl_id = " . $crawlID . " and date is null LIMIT 2")->fetch_all(MYSQLI_ASSOC); // and date is not null
-
-
-        $urls = [];
-        foreach ($urls2toCrawl as $u) {
-            $urls[] = $u['url'];
-        }
-
-        $multipleResults = $this->getMultipleWebsites($urls);
-
-        // print_r($multipleResults);
-        foreach ($multipleResults as $url => $data) {
-
-            $this->processResults($crawlID, $url, $data);
-        }
-    }
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    /**
-     * Parses HTML content and returns a structured list of the links found.
-     *
-     * @param string $html Raw HTML document.
-     * @param string $baseUrl Base URL for resolving relative paths.
-     * @return array<int,array<string,mixed>> Collected link data.
-     */
-    function extractLinks($html, $baseUrl = '')
-    {
-        $links = [];
-
-        // Create a DOMDocument and load the HTML
-        $dom = new DOMDocument();
-
-        // Collect libxml errors internally so invalid HTML does not emit warnings
-        libxml_use_internal_errors(true);
-        $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
-        libxml_clear_errors();
-
-        // Find all <a> tags
-        $aTags = $dom->getElementsByTagName('a');
-
-        foreach ($aTags as $index => $aTag) {
-            $href = $aTag->getAttribute('href');
-            $text = trim($aTag->textContent);
-            $rel = $aTag->getAttribute('rel');
-            $title = $aTag->getAttribute('title');
-            $target = $aTag->getAttribute('target');
-
-            // Only links with a non-empty href attribute
-            if (!empty($href)) {
-                // Convert relative URLs to absolute URLs
-                $absoluteUrl = $href;
-                if (!empty($baseUrl) && !preg_match('/^https?:\/\//', $href)) {
-                    $absoluteUrl = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
-                }
-
-                $links[] = [
-                    'index' => $index + 1,
-                    'href' => $href,
-                    'absolute_url' => $absoluteUrl,
-                    'text' => $text,
-                    'rel' => $rel ?: null,
-                    'title' => $title ?: null,
-                    'target' => $target ?: null,
-                    'is_external' => $this->isExternalLink($href, $baseUrl),
-                    'link_type' => $this->getLinkType($href),
-                    'is_internal' => $this->isInternalLink($href, $baseUrl)?1:0
-                ];
-            }
-        }
-
-        return $links;
-    }
-
-    /**
-     * Checks whether a link is external relative to the base URL.
-     *
-     * @param string $href Link target.
-     * @param string $baseUrl Base address used for the domain comparison.
-     * @return bool|null True for external, false for internal, null if undefined.
-     */
-    private function isExternalLink($href, $baseUrl)
-    {
-        if (empty($baseUrl)) return null;
-
-        // Relative links are internal
-        if (!preg_match('/^https?:\/\//', $href)) {
-            return false;
-        }
-
-        $baseDomain = parse_url($baseUrl, PHP_URL_HOST);
-        $linkDomain = parse_url($href, PHP_URL_HOST);
-
-        return $baseDomain !== $linkDomain;
-    }
-
-    /**
-     * Checks whether a link belongs to the same domain as the base URL.
-     *
-     * @param string $href Link target.
-     * @param string $baseUrl Base address used for the domain comparison.
-     * @return bool|null True for internal, false for external, null if undefined.
-     */
-    private function isInternalLink($href, $baseUrl)
-    {
-        if (empty($baseUrl)) return null;
-
-        // Relative links are internal
-        if (!preg_match('/^https?:\/\//', $href)) {
-            return true;
-        }
-
-        $baseDomain = parse_url($baseUrl, PHP_URL_HOST);
-        $linkDomain = parse_url($href, PHP_URL_HOST);
-
-        return $baseDomain === $linkDomain;
-    }
-
-    /**
-     * Derives the link type from common protocols and patterns.
-     *
-     * @param string $href Link target.
-     * @return string Descriptive type such as "absolute" or "email".
-     */
-    private function getLinkType($href)
-    {
-        if (empty($href)) return 'empty';
-        if (strpos($href, 'mailto:') === 0) return 'email';
-        if (strpos($href, 'tel:') === 0) return 'phone';
-        if (strpos($href, '#') === 0) return 'anchor';
-        if (strpos($href, 'javascript:') === 0) return 'javascript';
-        if (preg_match('/^https?:\/\//', $href)) return 'absolute';
-        return 'relative';
-    }
-
-
-    /**
-     * Groups links by their previously determined type.
-     *
-     * @param array<int,array<string,mixed>> $links List of extracted links.
-     * @return array<string,array<int,array<string,mixed>>> Links grouped by type.
-     */
-    function groupLinksByType($links)
-    {
-        $grouped = [];
-
-        foreach ($links as $link) {
-            $type = $link['link_type'];
-            if (!isset($grouped[$type])) {
-                $grouped[$type] = [];
-            }
-            $grouped[$type][] = $link;
-        }
-
-        return $grouped;
-    }
-}
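
Note on the removed `extractLinks`: it builds absolute URLs by concatenating the base URL and the href, so a page at `https://example.com/blog/post.html` linking to `next.html` yields `https://example.com/blog/post.html/next.html`. It also treats `mailto:`, `tel:`, and `#fragment` hrefs as relative paths. The following is a minimal sketch of a more careful resolution step, not the code used in `src/classes/Crawler.php`. The function name `resolveUrl` is hypothetical, and dot-segment handling is simplified compared with RFC 3986 (a trailing `..` or `.` does not keep its trailing slash).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not in the original code base): resolves $href
 * against $baseUrl. Returns null for hrefs that are not crawlable over
 * HTTP(S), such as mailto:, tel:, javascript: or fragment-only links.
 */
function resolveUrl(string $baseUrl, string $href): ?string
{
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    // Already an absolute http(s) URL: keep it unchanged.
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href;
    }
    // Any other scheme (mailto:, tel:, javascript:, ...) is skipped.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) === 1) {
        return null;
    }

    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null;
    }
    // Protocol-relative link: reuse the scheme of the base URL.
    if (str_starts_with($href, '//')) {
        return $base['scheme'] . ':' . $href;
    }

    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $basePath = $base['path'] ?? '/';

    if ($href[0] === '/') {
        // Root-relative link.
        $path = $href;
    } elseif ($href[0] === '?') {
        // Query-only link: keep the path of the base document.
        $path = $basePath . $href;
    } else {
        // Path-relative link: resolve against the directory of the base path.
        $path = substr($basePath, 0, (int) strrpos($basePath, '/') + 1) . $href;
    }

    // Split off query/fragment so only the path part is normalized.
    $cut = strcspn($path, '?#');
    $suffix = substr($path, $cut);

    // Simplified dot-segment removal ("." and "..").
    $segments = [];
    foreach (explode('/', ltrim(substr($path, 0, $cut), '/')) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }

    return $origin . '/' . implode('/', $segments) . $suffix;
}
```

A caller would skip any link for which the helper returns null instead of inserting it into the URL table. `str_starts_with` requires PHP 8.0 or later.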
Author	SHA1	Message	Date
Martin	1588f83624	Add pagination to all data tables using jQuery DataTables Libraries Added: - jQuery 3.7.1 from CDN - DataTables 1.13.7 (CSS + JS) from CDN Custom Styling: - Integrated DataTables styling with existing design - Custom pagination button styles - Responsive search and filter inputs Paginated Tables: - jobsTable: Crawl jobs (25/page, sorted by ID desc) - pagesTable: Crawled pages (50/page) - linksTable: Found links (50/page) - brokenTable: Broken links (25/page) - redirectsTable: Redirects (25/page) - seoTable: SEO issues (25/page) Features: - Search functionality per table - Column sorting - Configurable entries per page - German localization - Automatic reinitialization on data reload - Navigation controls (First/Previous/Next/Last) - Entry count display All quality checks pass: - PHPStan Level 8: 0 errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:49:39 +02:00
Martin	c40d44e4c9	Add redirect tracking and analysis features Database Schema: - Added redirect_url VARCHAR(2048) to pages table - Added redirect_count INT DEFAULT 0 to pages table - Added index on redirect_count for faster queries Configuration: - Created Config class with typed constants (PHP 8.3+) - MAX_REDIRECT_THRESHOLD = 3 (configurable warning threshold) - MAX_CRAWL_DEPTH = 50 - CONCURRENCY = 10 Backend Changes: - Crawler now tracks redirects using Guzzle's redirect tracking - Extracts redirect history from response headers - Records redirect count and final destination URL - Guzzle configured with max 10 redirects and tracking enabled API Endpoint: - New endpoint: /api.php?action=redirects - Analyzes redirect types (permanent 301/308 vs temporary 302/303/307) - Identifies excessive redirects (> threshold) - Returns statistics and detailed redirect information Frontend Changes: - Added "Redirects" tab with: * Statistics overview (Total, Permanent, Temporary, Excessive) * Detailed table showing all redirects * Visual warnings for excessive redirects (yellow background) * Color-coded redirect counts (red when > threshold) * Status code badges (green for permanent, blue for temporary) All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:40:26 +02:00
Martin	e6b75410ed	Add copyright information to README Added visible copyright section with author information: - Martin Kiesewetter - mki@kies-media.de - https://kies-media.de Also updated project title from "PHP Docker Anwendung" to "Web Crawler" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:29:05 +02:00
Martin	f7be09ec63	Add Broken Links detection and SEO Analysis features Database Schema: - Added meta_description TEXT field to pages table - Added index on status_code for faster broken link queries Backend Changes: - Crawler now extracts meta descriptions from pages - New API endpoint: broken-links (finds 404s and server errors) - New API endpoint: seo-analysis (analyzes titles and meta descriptions) SEO Analysis Features: - Title length validation (optimal: 30-60 chars) - Meta description length validation (optimal: 70-160 chars) - Detection of missing titles/descriptions - Duplicate content detection (titles and meta descriptions) Frontend Changes: - Added "Broken Links" tab showing pages with errors - Added "SEO Analysis" tab with: * Statistics overview * Pages with SEO issues * Duplicate content report All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:26:33 +02:00
Martin	9e61572747	Add recrawl functionality and fix PHPCS warnings - Added "Recrawl" button in jobs table UI - Implemented recrawl API endpoint that deletes all job data and restarts crawl - Fixed PHPCS line length warnings in api.php and Crawler.php All quality checks pass: - PHPStan Level 8: 0 errors - PHPCS PSR-12: 0 warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 09:07:50 +02:00
Martin	11fd8fa673	Add copyright headers to configuration files Extended copyright headers to SQL, YAML, and JSON configuration files: - config/docker/init.sql (SQL comment block) - docker-compose.yml (YAML comment) - composer.json and src/composer.json (JSON _comment field) All files validated and tested successfully. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:58:28 +02:00
Martin	cbf099701b	Add copyright headers to all application files Added copyright headers to all PHP files in the application with proper author information (Martin Kiesewetter) and contact details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:47:44 +02:00
Martin	ad274c0738	Update paths for config/ directory structure Adjusted all references to match new config/ structure: - docker/config/nginx/default.conf → config/nginx/default.conf - docker/init.sql → config/docker/init.sql - docker/start.sh → config/docker/start.sh Updated files: - docker-compose.yml: Updated volume mount paths - README.md: Updated project structure documentation New structure consolidates all configuration files under config/ for better organization and clarity. Tested and verified all services running correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:36:58 +02:00
Martin	de4d2e53d9	Reorganize Docker-related files into docker/ directory Moved Docker infrastructure files to dedicated docker/ folder: - config/nginx/default.conf → docker/config/nginx/default.conf - init.sql → docker/init.sql - start.sh → docker/start.sh (currently unused) Updated: - docker-compose.yml: Adjusted volume paths - README.md: Updated project structure documentation Benefits: - Clear separation between infrastructure (docker/) and application (src/) - Better project organization - Easier to understand for new developers Docker Compose and Dockerfile remain in root for convenience. All services tested and working correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:31:47 +02:00
Martin	daa76b2141	Remove legacy PHP files from root directory Removed unused legacy files: - index.php (old crawler entry point) - webanalyse.php (old crawler implementation) - setnew.php (database reset script) These files are no longer used. The current application uses: - src/index.php (web interface) - src/api.php (API endpoints) - src/classes/Crawler.php (crawler implementation) - src/crawler-worker.php (background worker) The legacy code remains in git history if needed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:24:43 +02:00
Martin	09d5b61779	Fix link extraction bug caused by type checking The PHPStan fix inadvertently broke link extraction by using is_int() on $pageId, which failed when lastInsertId() or fetchColumn() returned a string instead of an int. Changes: - Convert $pageId to int explicitly after fetching - Use $pageId > 0 instead of is_int($pageId) for validation - Handle both 0 and '0' cases when fetching manually This ensures link extraction works again while maintaining type safety. Tests pass, PHPStan clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-04 08:18:52 +02:00
Martin	e569d189d5	Add comprehensive quality tooling and fix code style issues Quality Tools Added: - PHPStan (Level 8) for static analysis - PHP_CodeSniffer (PSR-12) for code style - Updated PHPUnit test suite with type safety Code Improvements: - Fixed all PHPStan Level 8 errors (13 issues) - Auto-fixed 25 PSR-12 code style violations - Added proper type hints for arrays and method parameters - Fixed PDOStatement\|false handling in api.php and tests - Improved null-safety for parse_url() calls Configuration: - phpstan.neon: Level 8, analyzes src/ and tests/ - phpcs.xml: PSR-12 standard, excludes vendor/ - docker-compose.yml: Mount config files for tooling - composer.json: Add phpstan, phpcs, phpcbf scripts Documentation: - Updated README.md with testing and quality sections - Updated AGENTS.md with quality gates and workflows - Added pre-commit checklist for developers All tests pass (9/9), PHPStan clean (0 errors), PHPCS compliant (1 warning) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-03 23:58:21 +02:00
Martin	b5640ad131	docker-compose	2025-10-03 23:26:28 +02:00
Martin	5b5a627662	gitignore	2025-10-03 23:19:49 +02:00
Martin	4e868ca8e9	Miscellaneous	2025-10-03 20:22:17 +02:00
Martin	a6e2a7733e	Fix Docker container startup and API endpoint configuration - Update Dockerfile to use inline CMD instead of external start.sh script to resolve execution issues with CRLF line endings - Fix nginx fastcgi_pass configuration to use localhost:9000 for PHP-FPM communication - Correct API endpoint paths in frontend from /src/api.php to /api.php to match nginx document root configuration - Ensure Composer dependencies are properly installed with PHP 8.3 compatibility 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-03 20:20:20 +02:00
Martin	67390a76f3	Merge conflict resolved in .gitignore	2025-10-03 19:57:58 +02:00
Martin	f568875b2c	Add PHPUnit tests and update UI - Add PHPUnit 11.0 testing framework - Create unit tests for Database and Crawler classes - Create integration tests for Crawler - Add phpunit.xml configuration - Change UI background color to rose - All 9 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-03 14:13:04 +02:00
Martin	2f301cec42	Initial push by Martin 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-03 14:02:44 +02:00