single |
mwparserfromhell
================
**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for
MediaWiki_
wikicode. It supports Python 3.8+.
Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Full documentation is available on ReadTheDocs_. Development occurs on
GitHub_.
Installation
------------
The easiest way to install the parser is through the `Python Package
Index`_;
you can install the latest release with pip install mwparserfromhell
(`get pip`_). Make sure your pip is up-to-date first, especially on
Windows.
Alternatively, get the latest development version::
git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install
The comprehensive unit testing suite requires `pytest`_ (pip install
pytest)
and can be run with ``python -m pytest``.
Usage
-----
Normal usage is rather straightforward (where text is page text):
>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)
wikicode is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary str object with some extra methods. For example:
>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam
Since nodes can contain other nodes, getting nested templates is trivial:
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}',
'{{spam}}']
You can also pass ``recursive=False to filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
Wikicode objects:
>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template
Templates can be easily modified to add, remove, or alter params. Wikicode
objects can be treated like lists, with ``append()``, ``insert()``,
``remove()``, ``replace()``, and more. They also have a ``matches()``
method
for comparing page or template names, which takes care of capitalization
and
whitespace:
>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
... if template.name.matches("Cleanup") and not template.has("date"):
... template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
|