python-charset-normalizer
Port variant py38
Summary Charset Detection, for Everyone (PY38)
Package version 2.0.7
Homepage https://github.com/ousret/charset_normalizer
Keywords python
Maintainer Python Automaton
License Not yet specified
Other variants py39
Ravenports Buildsheet | History
Ravensource Port Directory | History
Last modified 11 OCT 2021, 22:56:49 UTC
Port created 15 JUL 2021, 22:32:01 UTC
Subpackage Descriptions
single

Charset Detection, for Everyone 👋

The Real First Universal Charset Detector

> A library that helps you read text from an unknown charset encoding. Motivated by `chardet`, I'm trying to resolve the issue by taking a new approach. All IANA character set names for which the Python core library provides codecs are supported.

>>>>> 👉 Try Me Online Now, Then Adopt Me 👈 <<<<<

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet] | Charset Normalizer | [cChardet] |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `Free & Open` | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1 | MIT | MPL-1.1 |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `Supported Encoding` | 30 | :tada: [93] | 40 |

*\*\* : They clearly use encoding-specific code, even if it covers most of the encodings in common use.*
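For readers coming from Chardet, charset-normalizer also exposes a chardet-style `detect` helper that returns a dict with `encoding`, `language` and `confidence` keys. The sketch below is illustrative and not taken from this port page; the sample string and the cp1252 codec are arbitrary choices.

```python
from charset_normalizer import detect

# Raw bytes whose encoding we do not know ahead of time
# (sample text and cp1252 are arbitrary for demonstration).
payload = "Héritage précieux à préserver.".encode("cp1252")

guess = detect(payload)
# The returned dict mirrors chardet's output shape:
# {'encoding': ..., 'language': ..., 'confidence': ...}
print(guess["encoding"], guess["confidence"])
```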
## ⭐ Your support

*Fork, test-it, star-it, submit your ideas! We do listen.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | Files per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 92 % | 220 ms | 5 files/sec |
| charset-normalizer | **98 %** | **40 ms** | 25 files/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 1115 ms | 300 ms | 27 ms |
| charset-normalizer | 460 ms | 240 ms | 18 ms |

Chardet's performance on larger files (1 MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities. The factors should remain the same.

[cchardet] is a non-native (cpp binding) faster alternative. If speed is the most important factor, you should try it.

## ✨ Installation

Using PyPI for the latest stable release:

```sh
pip install charset-normalizer -U
```

If you want a more up-to-date `unicodedata` than the one available in your Python setup:

```sh
pip install charset-normalizer[unicode_backport] -U
```
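After installation, a minimal usage sketch against the charset-normalizer 2.0 API (the file path below is illustrative):

```python
from charset_normalizer import from_path

# Probe a file whose encoding is unknown; the result is a list of
# candidate matches ordered from most to least plausible.
matches = from_path("./unknown.txt")

best = matches.best()  # None if nothing could be decoded at all
if best is not None:
    print(best.encoding)  # detected codec name, e.g. "utf_8"
    print(str(best))      # the payload decoded to a Unicode string
```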

Configuration Switches (platform-specific settings discarded)
PY38 ON Build using Python 3.8
PY39 OFF Build using Python 3.9
Package Dependencies by Type
Build (only) python-pip:single:py38
autoselect-python:single:standard
Build and Runtime python38:single:standard
Download groups
main mirror://PYPIWHL/de/c8/820b1546c68efcbbe3c1b10dd925fbd84a0dda7438bc18db0ef1fa567733
Distribution File Information
f7af805c321bfa1ce6714c51f254e0d5bb5e5834039bc17db7ebe3a4cec9492b 38247 charset_normalizer-2.0.7-py3-none-any.whl
Ports that require python-charset-normalizer:py38
python-requests:py38 Python HTTP for Humans (PY38)