Labs 🧪
Warning
The pandoc.labs module is an experiment; its interface is highly unstable. Don't build anything serious on top of it!
import pandoc
from pandoc.types import *
from pandoc.labs import *
HELLOWORLD_DOC = pandoc.read("Hello world!")
from urllib.request import urlopen
PATH = "raw.githubusercontent.com/commonmark/commonmark-spec"
HASH = "499ebbad90163881f51498c4c620652d0c66fb2e" # pinned version
URL = f"https://{PATH}/{HASH}/spec.txt"
COMMONMARK_SPEC = urlopen(URL).read().decode("utf-8")
COMMONMARK_DOC = pandoc.read(COMMONMARK_SPEC)
>>> HELLOWORLD_DOC
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> query(HELLOWORLD_DOC)
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
TODO. Explain what query does: it returns a collection that stores one or several document elements, on which operations can be applied in parallel, and whose items "automagically" know their location within the root document. Also explain the "no-failure" flavor: operations don't fail, they return the empty collection.
>>> q = query(HELLOWORLD_DOC)
>>> isinstance(q, Query)
True
TODO: consider a change of name for Query: Results, Match, Collection, etc.?
At this stage, the query only contains the document itself.
>>> q
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Search by type
The find method selects items within the initial collection.
To begin with, we can search items by type:
>>> q.find(Meta)
- Meta({})
>>> q.find(Para)
- Para([Str('Hello'), Space(), Str('world!')])
Abstract types can also be used:
>>> q.find(Block)
- Para([Str('Hello'), Space(), Str('world!')])
>>> q.find(Inline)
- Str('Hello')
- Space()
- Str('world!')
To find all pandoc elements:
>>> q.find(Type)
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
- Meta({})
- Para([Str('Hello'), Space(), Str('world!')])
- Str('Hello')
- Space()
- Str('world!')
Finding Python built-in types works too:
>>> q.find(dict)
- {}
>>> q.find(list)
- [Para([Str('Hello'), Space(), Str('world!')])]
- [Str('Hello'), Space(), Str('world!')]
>>> q.find(str)
- 'Hello'
- 'world!'
To get every possible item, in document order, we can search for Python objects:
>>> q.find(object)
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
- Meta({})
- {}
- [Para([Str('Hello'), Space(), Str('world!')])]
- Para([Str('Hello'), Space(), Str('world!')])
- [Str('Hello'), Space(), Str('world!')]
- Str('Hello')
- 'Hello'
- Space()
- Str('world!')
- 'world!'
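The ordering above is consistent with a depth-first, pre-order traversal: each item comes first, followed by its contents. The sketch below reproduces that ordering in plain Python. The Node class and the iter_document_order function are hypothetical stand-ins introduced for illustration; they are not part of pandoc.labs and say nothing about how find is actually implemented.

```python
class Node:
    """Hypothetical stand-in for a pandoc element: a type name plus its arguments."""

    def __init__(self, name, *args):
        self.name = name
        self.args = list(args)

    def __repr__(self):
        return f"{self.name}({', '.join(map(repr, self.args))})"


def iter_document_order(elt):
    """Yield elt, then all of its descendants, depth-first (pre-order)."""
    yield elt
    if isinstance(elt, Node):
        children = elt.args
    elif isinstance(elt, list):
        children = elt
    elif isinstance(elt, dict):
        children = list(elt.values())  # simplification: keys are ignored
    else:
        children = []  # strings and other atoms have no children
    for child in children:
        yield from iter_document_order(child)


# Stand-in for the "Hello world!" document used throughout this page.
doc = Node(
    "Pandoc",
    Node("Meta", {}),
    [Node("Para", [Node("Str", "Hello"), Node("Space"), Node("Str", "world!")])],
)
order = [repr(item) for item in iter_document_order(doc)]
for line in order:
    print("-", line)
```

With this stand-in document, the loop prints eleven items, in the same order as the q.find(object) listing.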
Selectors
Types are not the only possible selectors. Predicates -- functions that take a pandoc element and return a boolean value -- can be used too:
>>> def startswith_H(elt):
...     return isinstance(elt, Str) and elt[0].startswith("H")
...
>>> q.find(startswith_H)
- Str('Hello')
You can use predicates to define and select "virtual types" in a document. For example:
def AttrHolder(elt):
    return isinstance(elt, (Code, Link, Image, Span, Div, CodeBlock, Header, Table))
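Since types and predicates can both serve as selectors, one way to handle them uniformly is to turn every selector into a predicate first. The following is a minimal sketch of that idea using only built-in types; to_predicate is a hypothetical helper, not a function of pandoc.labs.

```python
def to_predicate(selector):
    """Turn a type, a tuple of types, or a callable into a predicate."""
    # Types are callable too, so they must be tested before plain callables.
    if isinstance(selector, (type, tuple)):
        return lambda elt: isinstance(elt, selector)
    if callable(selector):
        return selector
    raise TypeError(f"unsupported selector: {selector!r}")


items = ["Hello", 3, "world!", 2.5]

is_str = to_predicate(str)
starts_with_h = to_predicate(lambda elt: isinstance(elt, str) and elt.startswith("H"))
is_number = to_predicate((int, float))  # a small "virtual type"

strings = [x for x in items if is_str(x)]
h_words = [x for x in items if starts_with_h(x)]
numbers = [x for x in items if is_number(x)]
```

Once selectors are normalized this way, a predicate such as AttrHolder and a concrete type such as Str can be treated identically by the code that performs the search.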
TODO: match by attributes (id, classes, key-values); use keyword arguments in find with "or" semantics for lists; allow for predicates. For key values match, match for key existence, key-value pair, predicate as a whole or just for value.
Combine requirements
We can search for items that match one of several conditions:
>>> q.find(Str, Space)
- Str('Hello')
- Space()
- Str('world!')
If the list of arguments is empty, everything is a match:
>>> q.find()
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
- Meta({})
- {}
- [Para([Str('Hello'), Space(), Str('world!')])]
- Para([Str('Hello'), Space(), Str('world!')])
- [Str('Hello'), Space(), Str('world!')]
- Str('Hello')
- 'Hello'
- Space()
- Str('world!')
- 'world!'
A query with no match is falsy, so it can be tested directly:
>>> bool(q.find(float))
False
>>> if not q.find(float):
...     print("no result")
...
no result
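The "no-failure" flavor mentioned earlier can be illustrated with a small wrapper around a list. This is only a sketch under stated assumptions: Results is a hypothetical class (one of the candidate names for Query), its find only filters the items it already holds rather than searching their descendants, and the out-of-range indexing behavior is an assumption, not something this page documents.

```python
class Results:
    """Hypothetical stand-in for a query result: a thin wrapper around a list."""

    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):
        # An empty collection is falsy, a non-empty one is truthy.
        return bool(self.items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # Assumption: out-of-range access yields an empty collection
        # instead of raising IndexError.
        try:
            return Results([self.items[index]])
        except IndexError:
            return Results()

    def find(self, type_):
        # Simplification: filters the current items only.
        return Results(x for x in self.items if isinstance(x, type_))

    def __repr__(self):
        return "\n".join(f"- {x!r}" for x in self.items)


r = Results(["Hello", "world!"])
nothing = r.find(float)             # no match: empty, but not an error
still_nothing = nothing.find(str)   # chaining on an empty result is safe
out_of_range = r[5]                 # empty as well, no exception raised

if not nothing:
    print("no result")
```

The point of the design is that chained calls never need to be guarded: a search that fails midway simply propagates an empty collection to the end of the chain.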
To match several conditions at once, the filter method can be used:
>>> q.find(Inline).filter(Str)
- Str('Hello')
- Str('world!')
The filter method can also be used implicitly, since a query is callable:
>>> q.find(Inline)(Str)
- Str('Hello')
- Str('world!')
We can also match the negation of a condition:
>>> q.find(Inline)(not_(Space))
- Str('Hello')
- Str('world!')
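The three ways of combining requirements shown in this section ("or" with several arguments, "and" with chained filters, negation) amount to boolean combinators on selectors. Here is a minimal sketch on plain Python values; matches, any_of, all_of and negate are hypothetical names, and the way pandoc.labs implements not_ may well differ.

```python
def matches(elt, selector):
    """Apply a selector that is either a type or a predicate."""
    if isinstance(selector, type):
        return isinstance(elt, selector)
    return bool(selector(elt))


def any_of(*selectors):
    """'Or' semantics; with no selector at all, everything matches."""
    if not selectors:
        return lambda elt: True
    return lambda elt: any(matches(elt, s) for s in selectors)


def all_of(*selectors):
    """'And' semantics, comparable to chaining filters."""
    return lambda elt: all(matches(elt, s) for s in selectors)


def negate(selector):
    """Match exactly the items that the selector rejects."""
    return lambda elt: not matches(elt, selector)


def is_blank(elt):
    return isinstance(elt, str) and elt.isspace()


items = ["Hello", " ", "world!", 42]

either = [x for x in items if any_of(int, is_blank)(x)]
words = [x for x in items if all_of(str, negate(is_blank))(x)]
everything = [x for x in items if any_of()(x)]
```

Here words plays the role of q.find(Inline)(not_(Space)): keep the strings, then drop those that are only whitespace.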
Navigation
TODO. Parent, children, next, previous, next_sibling, previous_sibling.
>>> q
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> q.next
- Meta({})
>>> q.next.next
- {}
>>> q.next.next.next
- [Para([Str('Hello'), Space(), Str('world!')])]
>>> q.next.next.next.next
- Para([Str('Hello'), Space(), Str('world!')])
>>> q.next.next.next.next.next
- [Str('Hello'), Space(), Str('world!')]
>>> q.next.next.next.next.next.next
- Str('Hello')
>>> q.next.next.next.next.next.next.next
- 'Hello'
>>> q.next.next.next.next.next.next.next.next
- Space()
>>> q.next.next.next.next.next.next.next.next.next
- Str('world!')
>>> q.next.next.next.next.next.next.next.next.next.next
- 'world!'
>>> q.next.next.next.next.next.next.next.next.next.next.next
<BLANKLINE>
>>> q.find(str)
- 'Hello'
- 'world!'
>>> q.find(str)[1]
- 'world!'
>>> w = q.find(str)[1]
>>> w.previous
- Str('world!')
>>> w.previous.previous
- Space()
>>> w.previous.previous.previous
- 'Hello'
>>> w.previous.previous.previous.previous
- Str('Hello')
>>> w.previous.previous.previous.previous.previous
- [Str('Hello'), Space(), Str('world!')]
>>> w.previous.previous.previous.previous.previous.previous
- Para([Str('Hello'), Space(), Str('world!')])
>>> w.previous.previous.previous.previous.previous.previous.previous
- [Para([Str('Hello'), Space(), Str('world!')])]
>>> w.previous.previous.previous.previous.previous.previous.previous.previous
- {}
>>> w.previous.previous.previous.previous.previous.previous.previous.previous.previous
- Meta({})
>>> w.previous.previous.previous.previous.previous.previous.previous.previous.previous.previous
- Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> w.previous.previous.previous.previous.previous.previous.previous.previous.previous.previous.previous
<BLANKLINE>
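The next and previous steps above walk through the items in document order, one at a time, and end on an empty result at either end. A way to picture this is a cursor moving over the flattened document. The sketch below uses nested lists of strings as a stand-in document; flatten and Cursor are hypothetical names, not part of pandoc.labs.

```python
def flatten(elt):
    """Yield elt and its descendants in document order (lists are the only containers)."""
    yield elt
    if isinstance(elt, list):
        for child in elt:
            yield from flatten(child)


class Cursor:
    """Hypothetical cursor: a position in the document-order sequence."""

    def __init__(self, sequence, index=None):
        self.sequence = sequence
        self.index = index  # None means "empty": we walked off either end

    def __bool__(self):
        return self.index is not None

    @property
    def value(self):
        return None if self.index is None else self.sequence[self.index]

    def _move(self, step):
        if self.index is None:
            return self  # moving from an empty cursor stays empty, no error
        i = self.index + step
        if 0 <= i < len(self.sequence):
            return Cursor(self.sequence, i)
        return Cursor(self.sequence)

    @property
    def next(self):
        return self._move(+1)

    @property
    def previous(self):
        return self._move(-1)


doc = [["Hello", " ", "world!"]]
sequence = list(flatten(doc))  # doc, inner list, 'Hello', ' ', 'world!'

start = Cursor(sequence, 0)
last = Cursor(sequence, len(sequence) - 1)
```

In this model start.next.next.value is 'Hello' and last.previous.value is ' ', while start.previous and last.next are both empty cursors that can still be navigated without raising.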
Note: finding lists of inlines is difficult. Finding non-empty lists of inlines is easy, but finding empty ones is harder: we need to use some knowledge of the type hierarchy.
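To make the difficulty concrete, here is a sketch of a predicate for non-empty lists of inlines. The classes below are hypothetical stand-ins for the corresponding pandoc.types classes, defined locally so that the example runs without pandoc.

```python
class Inline:
    """Stand-in for the abstract Inline type."""


class Block:
    """Stand-in for the abstract Block type."""


class Str(Inline):
    def __init__(self, text):
        self.text = text


class Space(Inline):
    pass


class Para(Block):
    def __init__(self, inlines):
        self.inlines = inlines


def is_nonempty_inline_list(elt):
    """True for a list with at least one item, all of them inlines."""
    return (
        isinstance(elt, list)
        and len(elt) > 0
        and all(isinstance(item, Inline) for item in elt)
    )


inlines = [Str("Hello"), Space(), Str("world!")]
blocks = [Para(inlines)]

# Without the length check, an empty list would pass vacuously,
# whether it was meant to hold inlines or blocks.
vacuous = all(isinstance(item, Inline) for item in [])
```

An empty list carries no information about its intended item type, which is why the empty case can only be settled by looking at where the list sits in its parent, that is, by using the type hierarchy.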