Charset Detection, for Everyone 👋

*The Real First Universal Charset Detector*

> A library that helps you read text from an unknown charset encoding.
> Motivated by `chardet`, I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.
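
To make the last point concrete, here is a minimal sketch that checks whether Python's standard library ships a codec for a given charset label. It uses only the standard `codecs` module; the helper name `python_codec_name` is hypothetical and is not part of charset-normalizer.

```python
import codecs
from typing import Optional


def python_codec_name(charset_label: str) -> Optional[str]:
    """Return Python's canonical codec name for a label, or None if unsupported.

    Hypothetical helper for illustration; not part of charset-normalizer.
    """
    try:
        # codecs.lookup() resolves aliases (for example IANA names) to a codec.
        return codecs.lookup(charset_label).name
    except LookupError:
        # Raised when the standard library has no codec for this label.
        return None


for label in ("ISO-8859-1", "windows-1252", "Shift_JIS", "no-such-charset"):
    print(label, "->", python_codec_name(label))
```

Labels that resolve to a codec name are the kind of charset the statement above refers to; labels that return `None` have no codec in the Python core library.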


This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | Chardet | Charset Normalizer | cChardet |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1 _restrictive_ | MIT | MPL-1.1 _restrictive_ |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
| `Supported Encoding` | 33 | 🎉 90 | 40 |

*\*\* : They clearly use encoding-specific code, even if it covers most of the encodings in use.*

Did you get here because of the logs? See https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html

## ⭐ Your support

*Fork, test-it, star-it, submit your ideas! We do listen.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
| chardet | 86 % | 200 ms | 5 file/sec |
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
| chardet | 1200 ms | 287 ms | 23 ms |
| charset-normalizer | 100 ms | 50 ms | 5 ms |

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy versus ours is measured using Chardet's initial capability
> (e.g. Supported Encoding). Challenge them if you want.

## ✨ Installation

Using PyPI for the latest stable release:

```sh
pip install charset-normalizer -U
```

## 🚀 Basic Usage

### CLI

This package comes with a CLI.
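
Besides the CLI, the library can be called from Python code. The following is a minimal sketch rather than a reference implementation: it relies on the package's `from_bytes` function and the `best()` method of its result, while the wrapper `decode_unknown` and its lenient UTF-8 fallback are assumptions made for this illustration only.

```python
def decode_unknown(payload: bytes) -> str:
    """Decode bytes of unknown encoding, preferring charset-normalizer.

    Hypothetical wrapper for illustration; the UTF-8 fallback is an
    assumption of this sketch, not behaviour of the library.
    """
    try:
        from charset_normalizer import from_bytes
    except ImportError:
        # Package not installed: fall back to lenient UTF-8 decoding.
        return payload.decode("utf-8", errors="replace")

    best_guess = from_bytes(payload).best()  # None when no encoding fits
    if best_guess is None:
        return payload.decode("utf-8", errors="replace")
    return str(best_guess)  # str() on a match yields the decoded text


sample = "Les caractères accentués, comme é, è ou ç, trahissent souvent l'encodage."
print(decode_unknown(sample.encode("utf-8")))
```

Handling the `None` case explicitly keeps the caller from hitting an attribute error when no plausible encoding is found.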

Port variant	py39
Summary	Charset Detection, for Everyone (3.9)
Package version	3.0.1
Homepage	https://github.com/Ousret/charset_normalizer
Keywords	python
Maintainer	Python Automaton
License	Not yet specified
Other variants	py310
Ravenports	Buildsheet \| History
Ravensource	Port Directory \| History
Last modified	19 NOV 2022, 23:30:14 UTC
Port created	15 JUL 2021, 22:32:01 UTC

Build (only)	python-pip:single:py39 autoselect-python:single:standard
Build and Runtime	python39:single:standard

python-aiohttp:py39	Async http client/server framework (3.9)
python-requests:py39	Python HTTP for Humans (3.9)