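Show HN: Self-hosted Reddit – 2.38 billion posts, works offline, yours forever
This Show HN introduces Redd-Archiver, a PostgreSQL-backed archive generator that creates browsable HTML archives from link aggregator platforms such as Reddit, Voat, and Ruqqus, so users can self-host the content and keep it available offline.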
A PostgreSQL-backed archive generator that creates browsable HTML archives from link aggregator platforms including Reddit, Voat, and Ruqqus.
19-84/redd-archiver
Redd-Archiver
Transform compressed data dumps into browsable HTML archives with flexible deployment options. Redd-Archiver supports offline browsing via sorted index pages OR full-text search with Docker deployment. Features mobile-first design, multi-platform support, and enterprise-grade performance with PostgreSQL full-text indexing.
Supported Platforms:
Tracked content: 2.384 billion posts across 68,883 communities (Reddit full Pushshift dataset through Dec 31 2024, Voat/Ruqqus complete archives)
Version 1.0 features multi-platform archiving, REST API with 30+ endpoints, MCP server for AI integration, and PostgreSQL-backed architecture for large-scale processing.
🚀 Quick Start
Try the live demo: Browse Example Archive →
New to Redd-Archiver? Start here: QUICKSTART.md
Get running in 2-15 minutes with our step-by-step guide covering:
🎯 Key Features
🌐 Multi-Platform Support
Archive content from multiple link aggregator platforms in a single unified archive:
🤖 MCP Server (AI Integration)
29 MCP tools auto-generated from OpenAPI for AI assistants:
See MCP Server Documentation for complete setup guide.
Core Functionality
Technical Excellence
Deployment Options
📦 Deployment Options
Redd-Archiver generates static HTML files that can be browsed offline OR deployed with full-text search:
Offline Browsing Features:
With Search Server:
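Offline browsing by score, comments, or date is possible without a server because each sort order can be written out ahead of time as its own set of static pages. The sketch below illustrates that idea only. The `Post` fields, the `build_index_pages` helper, and the page size are assumptions made for the example and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Post:
    # Hypothetical fields; the project's actual schema may differ.
    id: str
    score: int
    num_comments: int
    created_utc: int


# The three sort orders this README mentions for index pages.
SORT_KEYS = {
    "score": lambda p: p.score,
    "comments": lambda p: p.num_comments,
    "date": lambda p: p.created_utc,
}


def build_index_pages(posts: List[Post], page_size: int) -> Dict[str, List[List[str]]]:
    """Return, for each sort order, pages of post ids with the highest value first."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    pages: Dict[str, List[List[str]]] = {}
    for name, key in SORT_KEYS.items():
        ids = [p.id for p in sorted(posts, key=key, reverse=True)]
        # Slice the ordered ids into fixed-size pages.
        pages[name] = [ids[i:i + page_size] for i in range(0, len(ids), page_size)]
    return pages


posts = [
    Post("a", score=10, num_comments=3, created_utc=300),
    Post("b", score=50, num_comments=1, created_utc=100),
    Post("c", score=20, num_comments=9, created_utc=200),
]
index = build_index_pages(posts, page_size=2)
print(index["score"])     # [['b', 'c'], ['a']]
print(index["comments"])  # [['c', 'a'], ['b']]
print(index["date"])      # [['a', 'c'], ['b']]
```

Each inner list would correspond to one static HTML page, so switching sort order in the browser is just a link to a different precomputed file.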
🚨 Get Involved: Help Preserve Internet History
Internet content disappears every day. Communities get banned, platforms shut down, and valuable discussions vanish. You can help prevent this.
📥 Download & Mirror Data Now
Don't wait for content to disappear. Download these datasets today:
† Voat Performance Tip: Use pre-split files for much faster imports (2-5 min vs. 30+ min per subverse)
‡ Ruqqus: Docker image includes p7zip for automatic .7z decompression
Every mirror matters. Store locally, seed torrents, share with researchers. Be part of the preservation network.
🌐 Join the Registry: Deploy Your Instance
Already running an archive? Register it on our public leaderboard:
Benefits:
👉 Register Your Instance Now →
🆕 Submit New Data Sources
Found a new platform dataset? Help expand the archive network:
👉 Submit Data Source →
Why submit?
📸 Screenshots
Dashboard

Main landing page showing archive overview with statistics for 9,592 posts across Reddit, Voat, and Ruqqus. Features customizable branding (site name, project URL), responsive cards, activity metrics, and content statistics. (Works offline)
Subreddit Index

Post listing with sorting options (score, comments, date), pagination, and badge coloring. Includes navigation and theme toggle. (Works offline - sorted by score/comments/date)
Post Page with Comments

Individual post displaying nested comment threads with collapsible UI, user flair, and timestamps. Comments include anchor links for direct navigation from user pages. (Works offline)
Mobile Responsive Design

Fully optimized for mobile devices with touch-friendly navigation and responsive layout.
Search Interface

PostgreSQL full-text search with Google-style operators. Supports filtering by subreddit, author, date range, and score. (Requires Docker deployment)

Search results with highlighted excerpts using PostgreSQL ts_headline(). Sub-second response times with GIN indexing. (Server-based, Tor-compatible)
Sample Archive: Multi-platform archive featuring programming and technology communities from Reddit, Voat, and Ruqqus · See all screenshots →
🛠️ Installation
Prerequisites
Python Dependencies
Redd-Archiver uses modern, performance-focused dependencies:
Core:
HTML Generation:
Performance:
Quick Start
Review the CHANGELOG.md for version updates and changes.
📊 Usage
1. Prepare Your Data
Redd-Archiver processes data dumps from multiple platforms:
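For orientation, the Pushshift Reddit dumps are zstandard-compressed files of newline-delimited JSON, one object per line. The sketch below shows how such a stream can be filtered by community one line at a time, so memory use does not grow with file size. It assumes the lines have already been decompressed (a third-party library such as `zstandard` would be needed for that step), and `iter_records` is a hypothetical helper, not part of Redd-Archiver.

```python
import io
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str], subreddit: str) -> Iterator[dict]:
    """Yield JSON objects whose `subreddit` field matches, one line at a time."""
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the whole run
        if isinstance(record, dict) and str(record.get("subreddit", "")).lower() == wanted:
            yield record


# A small in-memory stand-in for a decompressed dump file.
sample = io.StringIO(
    '{"id": "1", "subreddit": "programming", "title": "Hello"}\n'
    '{"id": "2", "subreddit": "python", "title": "World"}\n'
    'not json\n'
    '{"id": "3", "subreddit": "Programming", "title": "Again"}\n'
)
matches = list(iter_records(sample, "programming"))
print([r["id"] for r in matches])  # ['1', '3']
```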
2. Identify High-Priority Communities (Optional)
Scanner Tools help you identify which communities to archive first based on priority scores:
What the scanners do:
Example output:
Use cases:
Output files (included in tools/ directory):
View the complete data catalog to browse all communities and their priority scores.
3. Configure PostgreSQL
Ensure DATABASE_URL is set (see Installation above):
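A quick way to check the variable before a long import is to parse it up front. The sketch below assumes `DATABASE_URL` uses the standard PostgreSQL connection URI form (`postgresql://user:password@host:port/dbname`). The helper names and the example credentials are placeholders for illustration, not values from the project.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def parse_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URI into its parts, failing early on obvious mistakes."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    database = parsed.path.lstrip("/")
    if not parsed.hostname or not database:
        raise ValueError("DATABASE_URL must include a host and a database name")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # 5432 is PostgreSQL's default port
        "user": parsed.username,
        "database": database,
    }


def settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Read DATABASE_URL from the given environment mapping."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return parse_database_url(url)


# Placeholder environment for the example; real runs would use os.environ.
settings = settings_from_env(
    {"DATABASE_URL": "postgresql://archiver:secret@localhost:5432/archive"}
)
print(settings)
```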
4. Generate Your Archive
Reddit Archives (.zst files):
Voat Archives (SQL dumps):
Ruqqus Archives (.7z files):
Multi-Platform Mixed Archive:
With filtering and SEO:
Import/Export workflow (for large datasets):
5. Deploy Your Archive
Multiple deployment options are available:
Local/Development (HTTP):
Production HTTPS (Let's Encrypt):
Homelab/Tor (.onion hidden service):
Dual-Mode (HTTPS + Tor):
Static Hosting (GitHub/Codeberg Pages):
See deployment guides:
6. Advanced CLI Options
Processing Control:
Logging:
Performance Tuning:
Environment Variables:
🏗️ Architecture
Redd-Archiver features a clean modular architecture with specialized components:
Project Structure
HTML Modules (18 specialized modules)
Jinja2 Templates (15 templates)
Database Schema
🔍 PostgreSQL Full-Text Search
Lightning-Fast Database Search
Redd-Archiver v1.0 uses PostgreSQL full-text search with GIN indexing for blazing-fast search capabilities:
Key Features:
Search API
PostgreSQL search is exposed via postgres_search.py (CLI) and search_server.py (Web API):
Command-Line Interface:
Web API (✅ Implemented):
Features:
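To make the search described in this section concrete, the sketch below builds the kind of parameterized SQL that PostgreSQL full-text search typically involves: a `tsvector` column matched with `@@`, ranked with `ts_rank`, and excerpted with `ts_headline`. It is an illustration under assumptions, not the project's actual query. The table `posts`, its columns, and the `build_search_query` helper are hypothetical; `websearch_to_tsquery` (PostgreSQL 11+) is used here as one built-in way to accept quoted phrases and `-` exclusions; placeholders follow the named `%(name)s` style used by psycopg.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed schema for illustration:
#   posts(id, title, subreddit, author, score, search_vector tsvector)
#   CREATE INDEX posts_search_idx ON posts USING GIN (search_vector);


def build_search_query(
    text: str,
    subreddit: Optional[str] = None,
    author: Optional[str] = None,
    min_score: Optional[int] = None,
    limit: int = 25,
) -> Tuple[str, Dict[str, Any]]:
    """Return (sql, params) for a ranked full-text search with optional filters."""
    where = ["search_vector @@ q"]
    params: Dict[str, Any] = {"text": text, "limit": limit}
    if subreddit is not None:
        where.append("subreddit = %(subreddit)s")
        params["subreddit"] = subreddit
    if author is not None:
        where.append("author = %(author)s")
        params["author"] = author
    if min_score is not None:
        where.append("score >= %(min_score)s")
        params["min_score"] = min_score
    sql = (
        "SELECT id, title, "
        "ts_headline('english', title, q) AS excerpt, "
        "ts_rank(search_vector, q) AS rank "
        "FROM posts, websearch_to_tsquery('english', %(text)s) AS q "
        "WHERE " + " AND ".join(where) + " "
        "ORDER BY rank DESC LIMIT %(limit)s"
    )
    return sql, params


sql, params = build_search_query('"static site" -wordpress', subreddit="programming", min_score=10)
print(sql)
print(params)
```

User input only ever travels in `params`, never in the SQL string, which keeps the query safe to pass to a driver's `execute(sql, params)`.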
🌐 REST API & Registry
REST API v1
Full-featured API with 30+ endpoints for programmatic access and MCP/AI integration:
MCP/AI-Optimized Features:
Rate limited to 100 requests/minute. See API Documentation for complete reference.
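A client that makes many calls can stay under the stated 100 requests/minute by pacing itself. The sketch below is a client-side sliding-window limiter. How the server counts requests (fixed or sliding window, per IP or per key) is not stated in this README, so treat the window model as an assumption. The class name and the injectable clock are illustrative choices.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Track call timestamps and report how long to wait before the next call."""

    def __init__(
        self,
        max_calls: int = 100,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until another call fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self._calls.append(self._clock())


# Demonstration with a fake clock and a tiny limit so the behavior is visible.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, period=60.0, clock=lambda: fake_now[0])
limiter.record()            # call at t=0
fake_now[0] = 10.0
limiter.record()            # call at t=10
print(limiter.wait_time())  # 50.0
fake_now[0] = 60.0
print(limiter.wait_time())  # 0.0
```

In real use, call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it.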
Instance Registry & Leaderboard
Redd-Archiver supports a distributed registry system for tracking archive instances:
See Registry Setup Guide for configuration.
📈 Performance & Optimization
PostgreSQL Backend Performance (v1.0+)
Constant Memory Usage:
Database Storage:
Processing Speed:
Search Performance
Performance varies based on dataset size, query complexity, and hardware:
Architecture Benefits
PostgreSQL v1.0 Features:
🔀 Scaling for Very Large Archives
Single Instance Limits
Redd-Archiver has been tested with archives up to hundreds of gigabytes. For optimal performance:
Horizontal Scaling Strategy
For very large archive collections (multiple terabytes), deploy multiple instances divided by topic:
Architecture:
Benefits:
Deployment Options:
Example Multi-Instance Setup:
When to Use:
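One way to picture the topic-based split is a small lookup that sends each community to the instance holding its topic. The sketch below is purely illustrative: the topic names, community assignments, and `.example` hostnames are made up, and this README does not describe how the project itself routes between instances.

```python
from typing import Optional

# Hypothetical layout: one archive instance per topic.
INSTANCES = {
    "technology": "https://tech.archive.example",
    "gaming": "https://gaming.archive.example",
}

# Hypothetical assignment of communities to topics.
COMMUNITY_TOPICS = {
    "programming": "technology",
    "linux": "technology",
    "pcgaming": "gaming",
}


def instance_for(community: str) -> Optional[str]:
    """Return the base URL of the instance that holds a community, if any."""
    topic = COMMUNITY_TOPICS.get(community.lower())
    return INSTANCES.get(topic) if topic else None


print(instance_for("Linux"))    # https://tech.archive.example
print(instance_for("cooking"))  # None
```

Because each instance has its own database, a lookup like this is the only shared state the split needs.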
🎯 Use Cases
Research & Academia
Community Archiving
Investigation & Analysis
📚 Documentation
Deployment Guides
API & Integration
Project Documentation
🤝 Contributing
We welcome contributions! Please see CONTRIBUTING.md for development guidelines, code structure, and testing procedures.
Key areas for contribution:
See our modular architecture (18 specialized modules) for easy entry points to contribute.
📝 License
This is free and unencumbered software released into the public domain. See the LICENSE file (Unlicense) for details.
Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software for any purpose, commercial or non-commercial, and by any means.
📦 Data Sources
This project leverages public datasets from the following sources:
🙏 Acknowledgments
This project builds upon the work of several excellent archival projects:
📧 Contact
💰 Support the Project
Redd-Archiver was built by one person over 6 months as a labor of love to preserve internet history before it disappears forever.
This isn't backed by a company or institution—just an individual committed to keeping valuable discussions accessible. Your support helps:
Every donation, no matter the size, helps keep this preservation effort alive.
Bitcoin (BTC)

Monero (XMR)

Thank you for supporting internet archival efforts! Every contribution helps maintain and improve this project.
This software is provided "as is" under the Unlicense. See LICENSE for details. Users are responsible for compliance with applicable laws and terms of service when processing data.