python-charset-normalizer
Port variant py38
Summary Charset Detection, for Everyone (PY38)
Package version 2.0.7
Homepage https://github.com/ousret/charset_normalizer
Keywords python
Maintainer Python Automaton
License Not yet specified
Other variants py39
Ravenports Buildsheet | History
Ravensource Port Directory | History
Last modified 11 OCT 2021, 22:56:49 UTC
Port created 15 JUL 2021, 22:32:01 UTC
Subpackage Descriptions
single

Charset Detection, for Everyone 👋

The Real First Universal Charset Detector

> A library that helps you read text from an unknown charset encoding. Motivated by `chardet`, I'm trying to resolve the issue by taking a new approach. All IANA character set names for which the Python core library provides codecs are supported.

>>>>> 👉 Try Me Online Now, Then Adopt Me 👈 <<<<<

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet] | Charset Normalizer | [cChardet] |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `Free & Open` | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1 | MIT | MPL-1.1 |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `Supported Encoding` | 30 | :tada: [93] | 40 |

*\*\* : They clearly use encoding-specific code, even if it covers most of the encodings in common use.*
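For readers coming from Chardet, charset-normalizer also exposes a chardet-style `detect` helper that returns a dict with `encoding`, `language` and `confidence` keys. The sketch below is illustrative and not taken from this port page; the sample string and the cp1252 codec are arbitrary choices.

```python
from charset_normalizer import detect

# Raw bytes whose encoding we do not know ahead of time
# (sample text and cp1252 are arbitrary for demonstration).
payload = "Héritage précieux à préserver.".encode("cp1252")

guess = detect(payload)
# The returned dict mirrors chardet's output shape:
# {'encoding': ..., 'language': ..., 'confidence': ...}
print(guess["encoding"], guess["confidence"])
```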
## ⭐ Your support

*Fork, test-it, star-it, submit your ideas! We do listen.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | Files per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 92 % | 220 ms | 5 files/sec |
| charset-normalizer | **98 %** | **40 ms** | 25 files/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 1115 ms | 300 ms | 27 ms |
| charset-normalizer | 460 ms | 240 ms | 18 ms |

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.

[cchardet] is a non-native (cpp binding) faster alternative. If speed is the most important factor, you should try it.

## ✨ Installation

Using PyPI for the latest stable release:

```sh
pip install charset-normalizer -U
```

If you want a more up-to-date `unicodedata` than the one available in your Python setup:

```sh
pip install charset-normalizer[unicode_backport] -U
```
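After installation, a minimal usage sketch against the charset-normalizer 2.0 API (the file path below is illustrative):

```python
from charset_normalizer import from_path

# Probe a file whose encoding is unknown; the result is a list of
# candidate matches ordered from most to least plausible.
matches = from_path("./unknown.txt")

best = matches.best()  # None if nothing could be decoded at all
if best is not None:
    print(best.encoding)  # detected codec name, e.g. "utf_8"
    print(str(best))      # the payload decoded to a Unicode string
```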

Configuration Switches (platform-specific settings discarded)
PY38 ON Build using Python 3.8
PY39 OFF Build using Python 3.9
Package Dependencies by Type
Build (only) python-pip:single:py38
autoselect-python:single:standard
Build and Runtime python38:single:standard
Download groups
main mirror://PYPIWHL/de/c8/820b1546c68efcbbe3c1b10dd925fbd84a0dda7438bc18db0ef1fa567733
Distribution File Information
f7af805c321bfa1ce6714c51f254e0d5bb5e5834039bc17db7ebe3a4cec9492b 38247 charset_normalizer-2.0.7-py3-none-any.whl
Ports that require python-charset-normalizer:py38
python-requests:py38 Python HTTP for Humans (PY38)