Ubuntu Mirrors Scraper
A Python program designed to automatically scrape and organize a comprehensive list of Ubuntu archive and CD (ISO) mirrors directly from the official Launchpad website. This tool helps users and systems identify and utilize up-to-date mirror servers for faster and more reliable access to Ubuntu packages and ISO images.
Features
- Automated Scraping: Automatically fetches mirror data from Launchpad.
- Dual Mirror Support: Scrapes both archive (package) and CD (ISO) mirrors.
- Multiple Output Formats: Generates mirror lists in CSV, JSON, and plain text formats for easy consumption by various applications and scripts.
- Organized Output: Stores mirror data in a structured data/mirrors directory, separated by mirror type.
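As a rough illustration of the output features above, the sketch below writes one mirror list in all three formats under a per-type directory. The function name `write_mirror_lists`, the file names `mirrors.csv`, `mirrors.json`, and `mirrors.txt`, and the `name` and `url` fields are assumptions made for this example; the project's actual names and fields may differ.

```python
import csv
import json
from pathlib import Path


def write_mirror_lists(mirrors, mirror_type, base_dir="data/mirrors"):
    """Write one mirror list as CSV, JSON, and plain text; return the output directory.

    `mirrors` is a list of dicts with hypothetical "name" and "url" keys.
    """
    out_dir = Path(base_dir) / mirror_type
    out_dir.mkdir(parents=True, exist_ok=True)

    # CSV: a header row followed by one row per mirror
    with open(out_dir / "mirrors.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "url"])
        writer.writeheader()
        writer.writerows(mirrors)

    # JSON: the same records as a list of objects
    (out_dir / "mirrors.json").write_text(
        json.dumps(mirrors, indent=2), encoding="utf-8"
    )

    # Plain text: one mirror URL per line
    (out_dir / "mirrors.txt").write_text(
        "".join(m["url"] + "\n" for m in mirrors), encoding="utf-8"
    )
    return out_dir
```

Calling `write_mirror_lists(mirrors, "archive")` and `write_mirror_lists(mirrors, "cd")` would then keep the two mirror types in separate subdirectories.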
Usage
To run this scraper and generate the mirror lists, follow these steps:
- Install pipenv: If you don’t have pipenv installed, you can install it using pip:
pip install --upgrade pipenv
- Install Dependencies: Navigate to the project root directory and install the required Python packages using pipenv:
pipenv install
- Run the Scraper: Execute the main.py script using pipenv. This will initiate the scraping process and generate the mirror files in the data/mirrors directory.
pipenv run python main.py
Output Files
The scraper generates the following files, categorized by mirror type:
Archive Mirrors
These files contain a list of mirrors for Ubuntu package archives.
- CSV File: Comma-separated values, suitable for spreadsheet applications or data analysis.
- JSON File: JavaScript Object Notation, ideal for programmatic access and web applications.
- Text File: Plain text file, with one mirror URL per line.
CD Mirrors
These files contain a list of mirrors for Ubuntu CD/DVD (ISO) images, in the same CSV, JSON, and plain text formats as the archive mirrors.
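For scripts that consume the generated lists, the plain text file is the simplest to read because it holds one mirror URL per line. The helper below is a minimal sketch; the name `load_mirror_urls` is hypothetical, and the path you pass depends on the file names the scraper actually writes.

```python
from pathlib import Path


def load_mirror_urls(path):
    """Return the non-empty lines of a plain text mirror list."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Drop blank lines and surrounding whitespace
    return [line.strip() for line in lines if line.strip()]
```

The returned list can be iterated directly, for example to probe each mirror for latency before picking one.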
Technologies Used
- BeautifulSoup4: For parsing HTML and XML documents.
- Requests: For making HTTP requests to fetch web pages.
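To show how the two libraries fit together, here is a minimal sketch, not the project's actual code. It assumes that the Launchpad mirror pages label each mirror's links by protocol (for example "http" or "https"); the function names `extract_http_mirrors` and `fetch_mirrors` are hypothetical, and the selection logic may need adjusting if the page layout differs.

```python
import requests
from bs4 import BeautifulSoup

# Launchpad's public mirror listings for Ubuntu
MIRROR_PAGES = {
    "archive": "https://launchpad.net/ubuntu/+archivemirrors",
    "cd": "https://launchpad.net/ubuntu/+cdmirrors",
}


def extract_http_mirrors(html):
    """Return unique hrefs of links whose visible text is 'http' or 'https'.

    Assumption: the page labels each mirror link with its protocol name.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        label = link.get_text(strip=True).lower()
        if label in ("http", "https") and link["href"] not in urls:
            urls.append(link["href"])
    return urls


def fetch_mirrors(mirror_type):
    """Download one listing page with Requests and parse it with BeautifulSoup."""
    response = requests.get(MIRROR_PAGES[mirror_type], timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_http_mirrors(response.text)
```

Keeping the parsing step separate from the network call makes it possible to check the parser against a saved HTML snippet without contacting Launchpad.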