single |
This module provides regular expressions according to `RFC 3986 "Uniform
Resource Identifier (URI): Generic Syntax"
<http://tools.ietf.org/html/rfc3986>`_ and `RFC 3987 "Internationalized
Resource Identifiers (IRIs)" <http://tools.ietf.org/html/rfc3987>`_, and
utilities for composition and relative resolution of references.

API
---
**match** (string, rule='IRI_reference')
    Convenience function for checking if `string` matches a specific rule.

    Returns a match object or None::

        >>> assert match('%C7X', 'pct_encoded') is None
        >>> assert match('%C7', 'pct_encoded')
        >>> assert match('%c7', 'pct_encoded')
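To illustrate what a whole-string rule check involves, here is a minimal sketch using only the standard library. The pattern is written by hand from the ``pct-encoded`` production of RFC 3986 (section 2.1) and is an assumption, not this module's own pattern; ``matches_pct_encoded`` is a hypothetical helper name.

```python
import re

# Hand-written stand-in for the 'pct_encoded' rule: "%" HEXDIG HEXDIG.
# This is an assumption for illustration, not the module's own pattern.
PCT_ENCODED = re.compile(r'%[0-9A-Fa-f]{2}')


def matches_pct_encoded(string):
    """Hypothetical helper: return a match object only if the
    whole string is a single percent-encoded octet, else None."""
    return PCT_ENCODED.fullmatch(string)


assert matches_pct_encoded('%C7X') is None   # trailing character
assert matches_pct_encoded('%C7')            # upper-case hex digits
assert matches_pct_encoded('%c7')            # lower-case hex digits
```

Using ``fullmatch`` rather than ``match`` is what rejects ``'%C7X'``: a prefix match alone would accept it.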
**parse** (string, rule='IRI_reference')
    Parses `string` according to `rule` into a dict of subcomponents.

    If `rule` is None, parse an IRI_reference without validation.

    If regex_ is available, any rule is supported; with re_, `rule` must be
    'IRI_reference' or some special case thereof ('IRI', 'absolute_IRI',
    'irelative_ref', 'irelative_part', 'URI_reference', 'URI', 'absolute_URI',
    'relative_ref', 'relative_part'). ::

        >>> d = parse('http://tools.ietf.org/html/rfc3986#appendix-A',
        ...           rule='URI')
        >>> assert all([ d['scheme'] == 'http',
        ...              d['authority'] == 'tools.ietf.org',
        ...              d['path'] == '/html/rfc3986',
        ...              d['query'] is None,
        ...              d['fragment'] == 'appendix-A'])
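Splitting a reference into its five components without validating it can be sketched with the regular expression given in RFC 3986, appendix B. The group names and the helper ``split_reference`` below are added for illustration and are assumptions; whether this module uses the same expression internally is not stated here.

```python
import re

# Regular expression from RFC 3986, appendix B, with group names added
# for illustration. Every part is optional, so it matches any string:
# it splits a reference but does not validate it.
SPLIT = re.compile(
    r'^(?:(?P<scheme>[^:/?#]+):)?'
    r'(?://(?P<authority>[^/?#]*))?'
    r'(?P<path>[^?#]*)'
    r'(?:\?(?P<query>[^#]*))?'
    r'(?:#(?P<fragment>.*))?'
)


def split_reference(string):
    """Hypothetical helper: return a dict of the five generic components.
    Components that are absent are None; the path is always a string."""
    return SPLIT.match(string).groupdict()


d = split_reference('http://tools.ietf.org/html/rfc3986#appendix-A')
assert d['scheme'] == 'http'
assert d['authority'] == 'tools.ietf.org'
assert d['path'] == '/html/rfc3986'
assert d['query'] is None
assert d['fragment'] == 'appendix-A'
```

Note the difference between an absent component (``None``) and an empty one (``''``), which matters when the parts are put back together.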
**compose** (\*\*parts)
    Returns a URI composed_ from named parts.

    .. _composed: http://tools.ietf.org/html/rfc3986#section-5.3
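The recomposition step of RFC 3986, section 5.3 is short enough to sketch directly. The function name ``recompose`` and its keyword arguments are assumptions chosen for this example; the sketch follows the RFC pseudocode and is not necessarily how ``compose`` is implemented.

```python
def recompose(scheme=None, authority=None, path='', query=None, fragment=None):
    """Sketch of RFC 3986, section 5.3: join components into a reference.
    A component that is None is treated as undefined and left out together
    with its delimiter; an empty string still emits the delimiter."""
    result = []
    if scheme is not None:
        result.append(scheme + ':')
    if authority is not None:
        result.append('//' + authority)
    result.append(path)
    if query is not None:
        result.append('?' + query)
    if fragment is not None:
        result.append('#' + fragment)
    return ''.join(result)


assert recompose(scheme='http', authority='tools.ietf.org',
                 path='/html/rfc3986', fragment='appendix-A') == \
    'http://tools.ietf.org/html/rfc3986#appendix-A'
assert recompose(scheme='urn', path='name') == 'urn:name'
```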
**resolve** (base, uriref, strict=True, return_parts=False)
    Resolves_ a `URI reference` relative to a `base` URI.

    Test cases::

        >>> base = resolve.test_cases_base
        >>> for relative, resolved in resolve.test_cases.items():
        ...     assert resolve(base, relative) == resolved

    If `return_parts` is True, returns a dict of named parts instead of
    a string.

    Examples::

        >>> assert resolve('urn:rootless', '../../name') == 'urn:name'
        >>> assert resolve('urn:root/less', '../../name') == 'urn:/name'
        >>> assert resolve('http://a/b', 'http:g') == 'http:g'
        >>> assert resolve('http://a/b', 'http:g', strict=False) == 'http://a/g'

    .. _Resolves: http://tools.ietf.org/html/rfc3986#section-5.2
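The ``urn:`` examples above depend on the dot-segment removal step of reference resolution (RFC 3986, section 5.2.4). The following is a minimal sketch of that step only, written from the RFC's description; ``remove_dot_segments`` is a hypothetical helper and not necessarily how this module implements it.

```python
def remove_dot_segments(path):
    """Sketch of RFC 3986, section 5.2.4: interpret and remove the
    '.' and '..' segments of an already merged path."""
    output = []
    while path:
        if path.startswith('../'):
            path = path[3:]                 # A: drop leading '../'
        elif path.startswith('./'):
            path = path[2:]                 # A: drop leading './'
        elif path.startswith('/./'):
            path = path[2:]                 # B: '/./' becomes '/'
        elif path == '/.':
            path = '/'                      # B: trailing '/.' becomes '/'
        elif path.startswith('/../'):
            path = path[3:]                 # C: '/../' becomes '/' ...
            if output:
                output.pop()                # ... and the last segment goes
        elif path == '/..':
            path = '/'
            if output:
                output.pop()
        elif path in ('.', '..'):
            path = ''                       # D: lone dot segments vanish
        else:
            # E: move the first segment (with its leading '/', if any)
            # to the output, up to but excluding the next '/'.
            start = 1 if path.startswith('/') else 0
            end = path.find('/', start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return ''.join(output)


# Merged paths for the two 'urn:' examples above.
assert remove_dot_segments('../../name') == 'name'
assert remove_dot_segments('root/../../name') == '/name'
```

For a base without authority, the merge step keeps the base path up to its last ``/``, so ``urn:root/less`` contributes ``root/`` and the merged path ``root/../../name`` reduces to ``/name``.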
**patterns**
    A dict of regular expressions with useful group names.
    Compilable (with regex_ only) without need for any particular compilation
    flag.
**[bmp_][u]patterns[_no_names]**
    Alternative versions of `patterns`.
    [u]nicode strings without group names for the re_ module.
    BMP only for narrow builds.
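Whether an interpreter is a narrow build can be checked with the standard library alone. The sketch below assumes that "narrow" means strings are stored as 16-bit code units, so a character outside the BMP occupies two units; ``IS_NARROW`` is a name chosen for this example and is not this module's ``NARROW_BUILD`` constant.

```python
import sys

# Assumption: a narrow build reports 0xFFFF as its largest code point.
# Interpreters from Python 3.3 onwards always report 0x10FFFF.
IS_NARROW = sys.maxunicode == 0xFFFF

# OLD ITALIC LETTER A lies outside the Basic Multilingual Plane.
OLD_ITALIC_A = '\U00010300'

# On a narrow build the character is stored as a surrogate pair (length 2).
assert (len(OLD_ITALIC_A) == 2) == IS_NARROW
```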
**get_compiled_pattern** (rule, flags=0)
    Returns a compiled pattern object for a rule name or template string.

    Usage for validation::

        >>> uri = get_compiled_pattern('^%(URI)s$')
        >>> assert uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
        >>> assert not get_compiled_pattern('^%(relative_ref)s$').match('#f#g')
        >>> from unicodedata import lookup
        >>> smp = 'urn:' + lookup('OLD ITALIC LETTER A')  # U+00010300
        >>> assert not uri.match(smp)
        >>> m = get_compiled_pattern('^%(IRI)s$').match(smp)

    On narrow builds, non-BMP characters are (incorrectly) excluded::

        >>> assert NARROW_BUILD == (not m)
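The template strings used above, such as ``'^%(URI)s$'``, follow Python's ``%``-formatting with a mapping. A minimal sketch of that mechanism is shown below, assuming a tiny hand-written rule table; ``rules`` and ``compile_template`` are hypothetical names and do not reproduce this module's ``patterns`` dict.

```python
import re

# Hypothetical rule table. Each rule is a regex fragment, and rules may
# be built from earlier ones by %-formatting against the same dict.
rules = {'HEXDIG': '[0-9A-Fa-f]'}
# '%%' yields a literal '%', '%(HEXDIG)s' is replaced by the fragment.
rules['pct_encoded'] = '%%%(HEXDIG)s{2}' % rules


def compile_template(template, flags=0):
    """Hypothetical helper: substitute rule names into a template string
    and compile the result."""
    return re.compile(template % rules, flags)


pct = compile_template('^%(pct_encoded)s$')
assert pct.pattern == '^%[0-9A-Fa-f]{2}$'
assert pct.match('%C7')
assert pct.match('%C7X') is None
```

Anchoring with ``^`` and ``$`` in the template is what turns a rule fragment into a validator for the whole string.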
    For parsing, some subcomponents are captured in named groups (*only if*
    regex_ is available, otherwise see `parse`)::

        >>> match = uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
|