python-charset-normalizer
Port variant py39
Summary Charset Detection, for Everyone (3.9)
Package version 2.0.12
Homepage https://github.com/ousret/charset_normalizer
Keywords python
Maintainer Python Automaton
License Not yet specified
Other variants py310
Ravenports Buildsheet | History
Ravensource Port Directory | History
Last modified 22 FEB 2022, 02:10:24 UTC
Port created 15 JUL 2021, 22:32:01 UTC
Subpackage Descriptions
single

Charset Detection, for Everyone 👋

The Real First Universal Charset Detector

> A library that helps you read text from an unknown charset encoding.
> Motivated by `chardet`, I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.

>>>>> 👉 Try Me Online Now, Then Adopt Me 👈 <<<<<

This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet] | Charset Normalizer | [cChardet] |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `Free & Open` | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1 | MIT | MPL-1.1 |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `Supported Encoding` | 30 | :tada: [93] | 40 |

*\*\* : They clearly use code specific to particular encodings, even if those cover most of the ones in common use.*
Did you get here because of the logs? See https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html

## ⭐ Your support

*Fork it, test it, star it, submit your ideas! We do listen.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 92 % | 220 ms | 5 file/sec |
| charset-normalizer | **98 %** | **40 ms** | 25 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet] | 1115 ms | 300 ms | 27 ms |
| charset-normalizer | 460 ms | 240 ms | 18 ms |

Chardet's performance on larger files (1 MB+) is very poor; expect a huge difference on large payloads.

> Stats are generated from 400+ files using default parameters. For details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays depend heavily on your CPU capabilities, but the relative factors should remain the same.

[cChardet] is a non-native (C++ binding), unmaintained, faster alternative with better accuracy than Chardet but lower than this package. If speed is the most important factor, you should try it.

## ✨ Installation

Using PyPI for the latest stable release:

```sh
pip install charset-normalizer -U
```

If you want a more up-to-date `unicodedata` than the one available in your Python setup.
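To show what the library actually does once installed, here is a minimal usage sketch. It assumes charset-normalizer 2.0.x is available and uses its public `from_bytes` helper; the sample French sentence is an arbitrary illustration, not taken from the project's documentation.

```python
# Minimal detection sketch for charset-normalizer 2.0.x.
# from_bytes() returns candidate matches ordered by probability;
# .best() picks the most likely one, or None if nothing plausible fits.
from charset_normalizer import from_bytes

# Arbitrary sample text with accented characters, encoded as UTF-8.
payload = "Le cœur a ses raisons que la raison ne connaît point.".encode("utf-8")

best = from_bytes(payload).best()
if best is not None:
    print(best.encoding)  # name of the detected charset
    print(str(best))      # payload decoded using the detected charset
```

The library also ships `from_path` for files and a chardet-compatible `detect()` function, which eases migration from Chardet-based code.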

Configuration Switches (platform-specific settings discarded)
PY310 OFF Build using Python 3.10
PY39 ON Build using Python 3.9
Package Dependencies by Type
Build (only) python-pip:single:py39
autoselect-python:single:standard
Build and Runtime python39:single:standard
Download groups
main mirror://PYPIWHL/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726
Distribution File Information
6881edbebdb17b39b4eaaa821b438bf6eddffb4468cf344f09f89def34a8b1df 39623 charset_normalizer-2.0.12-py3-none-any.whl
Ports that require python-charset-normalizer:py39
python-requests:py39 Python HTTP for Humans (3.9)