Containers and iteration¶
import pandoc
from pandoc.types import *
Container¶
All concrete[1] Pandoc elements (of type Pandoc, Para, Str, etc.) are list-like;
their items are the arguments passed to their constructor.
This section presents several familiar ways to access this content,
illustrated with the "Hello world!" document:
meta = Meta({})
blocks = [Para([Str('Hello'), Space(), Str('world!')])]
doc = Pandoc(meta, blocks)
Random access¶
Indexing and slicing for this element work pretty much as in lists:
>>> doc[0]
Meta({})
>>> doc[1]
[Para([Str('Hello'), Space(), Str('world!')])]
>>> meta, blocks = doc[:]
>>> meta
Meta({})
>>> blocks
[Para([Str('Hello'), Space(), Str('world!')])]
The same patterns apply to change the element contents:
>>> maths = [Para([Math(InlineMath(), 'a=1')])]
>>> doc[1] = maths
>>> doc
Pandoc(Meta({}), [Para([Math(InlineMath(), 'a=1')])])
>>> meta = Meta({'title': MetaInlines([Str('Maths')])})
>>> doc[:] = meta, maths
>>> doc
Pandoc(Meta({'title': MetaInlines([Str('Maths')])}), [Para([Math(InlineMath(), 'a=1')])])
Length¶
The length of an element is the number of items it contains.
For doc, these are the meta and blocks arguments of its constructor:
>>> len(doc)
2
>>> len(doc) == len(doc[:])
True
Equality¶
Pandoc elements can be compared. The equality test checks for equality of type, then (recursively if needed) for equality of contents:
>>> para = doc[1][0]
>>> para == Para([Math(InlineMath(), 'a=1')])
True
>>> para == Para([Math(DisplayMath(), 'a=1')])
False
>>> para == Para([Math(InlineMath(), 'a=2')])
False
Membership¶
A membership test, which leverages the equality test, is also available:
>>> Meta({}) in doc
False
>>> Meta({'title': MetaInlines([Str('Maths')])}) in doc
True
Iteration¶
All pandoc items can be iterated. Consider
doc = pandoc.read("Hello world!")
We have:
>>> for elt in doc:
...     print(elt)
Meta({})
[Para([Str('Hello'), Space(), Str('world!')])]
>>> meta, blocks = doc[:]
>>> for elt in meta:
...     print(elt)
{}
>>> para = blocks[0]
>>> for elt in para:
...     print(elt)
[Str('Hello'), Space(), Str('world!')]
>>> world = para[0][2]
>>> for elt in world:
...     print(elt)
world!
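To make the list-like behaviour described above concrete, here is a minimal sketch of how a container whose items are its constructor arguments could be written. The Node, Str and Para classes below are hypothetical stand-ins defined only for this illustration; they are not the classes from pandoc.types and are not meant to describe how the library is actually implemented.

```python
class Node:
    """Hypothetical list-like element: its items are its constructor arguments."""

    def __init__(self, *args):
        self._args = list(args)

    def __getitem__(self, key):
        # Accepts integer indices and slices, like a list.
        return self._args[key]

    def __setitem__(self, key, value):
        self._args[key] = value

    def __len__(self):
        return len(self._args)

    def __iter__(self):
        # Non-recursive: yields the direct children only.
        return iter(self._args)

    def __eq__(self, other):
        # Same type first, then (recursively) same contents.
        return type(self) is type(other) and self._args == other._args

    def __repr__(self):
        args = ", ".join(repr(arg) for arg in self._args)
        return f"{type(self).__name__}({args})"


class Str(Node):
    pass


class Para(Node):
    pass


para = Para([Str("Hello"), Str("world!")])
print(len(para))                # 1: a single constructor argument
print(para[0])                  # [Str('Hello'), Str('world!')]
print(Str("Hello") in para[0])  # True: membership relies on __eq__
```

In this sketch no __contains__ method is needed: Python falls back on __iter__ and the equality test to answer membership queries.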
Pattern matching¶
With Python 3.10 (or newer), pattern matching can be used for every Pandoc element:
>>> doc = pandoc.read("Hello world!")
>>> match doc:
...     case Pandoc(Meta(meta), [Para(inlines)]):
...         assert meta == {}
...         print(inlines)
[Str('Hello'), Space(), Str('world!')]
Tree Iteration¶
Depth-first traversal¶
Python's built-in iter, which is used implicitly in for loops,
yields the children of the pandoc element, that is, the arguments
that were given to its constructor;
it is non-recursive: the contents of these children are not explored.
In contrast, pandoc.iter iterates over a pandoc item recursively,
in document order. It performs a (preorder) depth-first traversal:
the iteration first yields the element given as argument to pandoc.iter
(the root), then its first child (if any), then the first child of this child
(if any), etc. recursively, before it yields the second child of the root (if
any), then the first child of this child, etc.
For example, with the following document
>>> doc = pandoc.read("""
... # Title
... Content
... """)
>>> doc
Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])
we have on one hand
>>> for elt in iter(doc):
...     print(elt)
Meta({})
[Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])]
and on the other hand
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])
Meta({})
{}
[Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])]
Header(1, ('title', [], []), [Str('Title')])
1
('title', [], [])
title
[]
[]
[Str('Title')]
Str('Title')
Title
Para([Str('Content')])
[Str('Content')]
Str('Content')
Content
Python built-in types¶
Numbers¶
Tree iteration can be applied to Python built-in types, including those which
are not usually considered containers and thus are not iterable with the
built-in iter function. The Bool, Int and Double primitive types
(that is, bool, int and float) fall into this category:
>>> assert isinstance(True, Bool)
>>> iter(True)
Traceback (most recent call last):
...
TypeError: 'bool' object is not iterable
>>> assert isinstance(1, Int)
>>> iter(1)
Traceback (most recent call last):
...
TypeError: 'int' object is not iterable
>>> assert isinstance(3.14, Double)
>>> iter(3.14)
Traceback (most recent call last):
...
TypeError: 'float' object is not iterable
Since these elements have no children, tree iteration only yields the elements themselves:
>>> for elt in pandoc.iter(True):
...     print(elt)
True
>>> for elt in pandoc.iter(1):
...     print(elt)
1
>>> for elt in pandoc.iter(3.14):
...     print(elt)
3.14
Strings¶
Python strings are iterable, but in the context of tree iteration, we consider
them as atomic objects, like booleans, integers and doubles. Thus, unlike the
built-in iter function, pandoc.iter does not iterate over their characters:
>>> isinstance("Hello!", Text)
True
>>> for elt in "Hello!":
...     print(elt)
H
e
l
l
o
!
>>> for elt in pandoc.iter("Hello!"):
...     print(elt)
Hello!
Tuples, lists, dicts¶
Tree iteration for tuples holds no surprise:
>>> elts = (1, (2, 3))
>>> for elt in elts:
...     print(elt)
1
(2, 3)
>>> for elt in pandoc.iter(elts):
...     print(elt)
(1, (2, 3))
1
(2, 3)
2
3
List iteration is very similar:
>>> elts = [1, [2, 3]]
>>> for elt in elts:
...     print(elt)
1
[2, 3]
>>> for elt in pandoc.iter(elts):
...     print(elt)
[1, [2, 3]]
1
[2, 3]
2
3
For maps/dicts, tree iteration combines recursion with iteration over key-value
pairs, while standard iteration is flat and iterates over keys only. In other
words, tree iteration adds recursion to the dict items iterator:
>>> elts = {"a": True, "b": [1, 2]}
>>> for elt in elts:
...     print(elt)
a
b
>>> for elt in elts.items():
...     print(elt)
('a', True)
('b', [1, 2])
>>> for elt in pandoc.iter(elts):
...     print(elt)
{'a': True, 'b': [1, 2]}
('a', True)
a
True
('b', [1, 2])
b
[1, 2]
1
2
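The rules above for numbers, strings, tuples, lists and dicts can be summarized by a short generator. This is only a sketch restricted to built-in types: tree_iter is a hypothetical helper written for this illustration, not pandoc.iter itself, and it does not handle pandoc elements.

```python
def tree_iter(elt):
    """Preorder depth-first traversal of nested built-in containers.

    Hypothetical helper written for illustration; it is not pandoc.iter.
    """
    yield elt  # the root comes first (preorder)
    if isinstance(elt, dict):
        children = elt.items()  # key-value pairs, not keys only
    elif isinstance(elt, (list, tuple)):
        children = elt
    else:
        children = ()  # numbers and strings are treated as atoms
    for child in children:
        yield from tree_iter(child)


for elt in tree_iter({"a": True, "b": [1, 2]}):
    print(elt)
```

Under these assumptions, the loop prints the same nine lines as the pandoc.iter example on dicts above.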
Path¶
Principles¶
The function pandoc.iter accepts an optional boolean argument path.
When it is set to True, the iteration yields (elt, path) pairs.
In each pair, elt is equal to what the iteration with path set to False
would have yielded, and path contains additional information about the
location of elt in the iteration root.
The path is a list of (holder, i) pairs which is empty only when elt is the root,
and such that:
- the first holder in the path is the root of the iteration,
- the i-th item in holder is the next holder in the path ...
- or elt if we are at the end of the path.
Here, "the i-th item in holder" should be understood as holder[i]
unless holder is a dict. In this special case, it is its i-th key-value pair:
def getitem(elt, i):
    if isinstance(elt, dict):
        # dict views are not subscriptable: materialize the key-value pairs
        elt = list(elt.items())
    return elt[i]
In any case, the following assertion is always valid:
def check(root, elt, path):
    if path == []:
        assert elt is root
    else:
        assert path[0][0] is root
        for i, (holder, index) in enumerate(path):
            next_elt = getitem(holder, index)
            if i < len(path) - 1:
                assert next_elt is path[i+1][0]
            else:
                assert next_elt is elt
And indeed, if we consider the following document:
doc = pandoc.read("""
# Title
Content
""")
the check works at any level:
>>> for elt, path in pandoc.iter(doc, path=True):
...     check(doc, elt, path)
Use cases¶
The length of path provides the depth of elt with respect to the root:
>>> for elt, path in pandoc.iter(doc, path=True):
...     print(f"{len(path)} - {elt!r}")
0 - Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])
1 - Meta({})
2 - {}
1 - [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])]
2 - Header(1, ('title', [], []), [Str('Title')])
3 - 1
3 - ('title', [], [])
4 - 'title'
4 - []
4 - []
3 - [Str('Title')]
4 - Str('Title')
5 - 'Title'
2 - Para([Str('Content')])
3 - [Str('Content')]
4 - Str('Content')
5 - 'Content'
The last item of path provides the parent of the current element
and its index in this parent:
>>> for elt, path in pandoc.iter(doc, path=True):
...     try:
...         holder, index = path[-1]
...         print(f"{elt!r} == {holder!r}[{index}]")
...     except IndexError:
...         assert elt is doc
Meta({}) == Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])[0]
{} == Meta({})[0]
[Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])] == Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])[1]
Header(1, ('title', [], []), [Str('Title')]) == [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])][0]
1 == Header(1, ('title', [], []), [Str('Title')])[0]
('title', [], []) == Header(1, ('title', [], []), [Str('Title')])[1]
'title' == ('title', [], [])[0]
[] == ('title', [], [])[1]
[] == ('title', [], [])[2]
[Str('Title')] == Header(1, ('title', [], []), [Str('Title')])[2]
Str('Title') == [Str('Title')][0]
'Title' == Str('Title')[0]
Para([Str('Content')]) == [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])][1]
[Str('Content')] == Para([Str('Content')])[0]
Str('Content') == [Str('Content')][0]
'Content' == Str('Content')[0]
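Knowing the parent and the index of an element is what makes in-place edits possible: holder[index] can be reassigned. The sketch below shows the idea on plain nested lists. The helper iter_with_path is hypothetical and only mimics the (elt, path) pairs described above for lists; it is not pandoc.iter, and the sample tree is invented for the example.

```python
def iter_with_path(elt, path=()):
    """Yield (elt, path) pairs in preorder; path is a list of (holder, index).

    Hypothetical helper for nested lists only; it is not pandoc.iter.
    """
    yield elt, list(path)
    if isinstance(elt, list):
        for index, child in enumerate(elt):
            yield from iter_with_path(child, (*path, (elt, index)))


tree = ["Title", ["draft", "Content"], "draft"]

# Collect the locations first, so the tree is not modified while it is traversed.
matches = [path[-1] for elt, path in iter_with_path(tree) if elt == "draft"]

for holder, index in matches:
    holder[index] = "final"

print(tree)  # ['Title', ['final', 'Content'], 'final']
```

Collecting the (holder, index) pairs before modifying anything is a deliberate choice here: it avoids changing a container while it is still being iterated.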
Grand-parents are available in the previous path items, all the way up to the root, allowing us to locate the current element with respect to the root if needed:
>>> for elt, path in pandoc.iter(doc, path=True):
...     indices = [i for holder, i in path]
...     z = "".join(f"[{i}]" for i in indices)
...     print(f"doc{z} == {elt!r}")
doc == Pandoc(Meta({}), [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])])
doc[0] == Meta({})
doc[0][0] == {}
doc[1] == [Header(1, ('title', [], []), [Str('Title')]), Para([Str('Content')])]
doc[1][0] == Header(1, ('title', [], []), [Str('Title')])
doc[1][0][0] == 1
doc[1][0][1] == ('title', [], [])
doc[1][0][1][0] == 'title'
doc[1][0][1][1] == []
doc[1][0][1][2] == []
doc[1][0][2] == [Str('Title')]
doc[1][0][2][0] == Str('Title')
doc[1][0][2][0][0] == 'Title'
doc[1][1] == Para([Str('Content')])
doc[1][1][0] == [Str('Content')]
doc[1][1][0][0] == Str('Content')
doc[1][1][0][0][0] == 'Content'
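The list of indices extracted from a path can also be followed back down from the root. A minimal sketch, assuming every holder along the way supports holder[i] (so dicts, which need the key-value pair lookup described earlier, are not covered); resolve is a hypothetical helper name and the sample tree is a plain nested structure made up for the example:

```python
from functools import reduce


def resolve(root, indices):
    """Follow a sequence of indices down from root (hypothetical helper)."""
    return reduce(lambda holder, i: holder[i], indices, root)


tree = [1, ("title", [], []), ["Title"]]
print(resolve(tree, []))      # the root itself
print(resolve(tree, [1, 0]))  # title
print(resolve(tree, [2, 0]))  # Title
```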
[1] Any custom pandoc type that can be instantiated. If needed, refer to the kind of types section of the documentation for additional explanations.