| single | # url-normalize
[tests]
[Coveralls]
[PyPI]
A Python library for standardizing and normalizing URLs with support for
internationalized domain names (IDN).
## Table of Contents
- [Introduction]
- [Features]
- [Installation]
- [Usage]
  - [Python API]
  - [Command Line]
- [Documentation]
- [Contributing]
- [License]
## Introduction
url-normalize provides a robust URI normalization function that:
- Takes care of IDN domains.
- Always provides the URI scheme in lowercase characters.
- Always provides the host, if any, in lowercase characters.
- Only performs percent-encoding where it is essential.
- Always uses uppercase A-through-F characters when percent-encoding.
- Prevents dot-segments appearing in non-relative URI paths.
- For schemes that define a default authority, uses an empty authority if
the
  default is desired.
- For schemes that define an empty path to be equivalent to a path of "/",
  uses "/".
- For schemes that define a port, uses an empty port if the default is
desired
- Ensures all portions of the URI are utf-8 encoded NFC from Unicode
strings
Inspired by Sam Ruby's [urlnorm.py]
## Features
- **IDN Support**: Full internationalized domain name handling
- **Configurable Defaults**:
  - Customizable default scheme (https by default)
  - Configurable default domain for absolute paths
- **Query Parameter Control**:
  - Parameter filtering with allowlists
  - Support for domain-specific parameter rules
- **Versatile URL Handling**:
  - Empty string URLs
  - Double slash URLs (//domain.tld)
  - Shebang (#!) URLs
- **Developer Friendly**:
  - Cross-version Python compatibility (3.8+)
  - 100% test coverage
  - Modern type hints and string handling
## Installation
```sh
pip install url-normalize
```
## Usage
### Python API
```python
from url_normalize import url_normalize
# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com/foo
# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo
# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test",
filter_params=True))
# Output: https://www.google.com/search?q=test
# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com?page=1&id=123
# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"] |