API Reference

import pandoc
from pandoc.types import *

`pandoc`

read(source=None, file=None, format=None, options=None)

Read a source document.

The source document must be specified by either source or file. When no format is given, it is inferred from the filename extension when possible¹; otherwise the markdown format is assumed. The input format can also be specified explicitly with format. Extra options can be passed to the pandoc command-line tool with options.

Arguments

source: the document content, as a string or as utf-8 encoded bytes.
file: the document, provided as a file or filename.
format: the document format (such as "markdown", "odt", "docx", "html", etc.)

Refer to Pandoc's README for the list of supported input formats.
options: additional pandoc options (a list of strings).

Refer to Pandoc's user guide for a complete list of options.

Returns

doc: the document, as a Pandoc object.

Usage

Read documents from strings:

>>> markdown = "Hello world!"
>>> pandoc.read(markdown)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> html = "<p>Hello world!</p>"
>>> pandoc.read(html, format="html")
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

Read documents from files:

>>> filename = "doc.html"
>>> with open(filename, "w", encoding="utf-8") as file:
...     _ = file.write(html)
>>> pandoc.read(file=filename) # html format inferred from filename
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> file = open(filename, encoding="utf-8")
>>> pandoc.read(file=file, format="html") # but here it must be explicit
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

Use extra pandoc options:

>>> pandoc.read(markdown, options=["-M", "id=hello"]) # add metadata
Pandoc(Meta({'id': MetaString('hello')}), [Para([Str('Hello'), Space(), Str('world!')])])

write(doc, file=None, format=None, options=None)

Write a pandoc document (or document fragment) to a file and return its contents.

Inline document fragments are automatically wrapped into a Plain block; block document fragments are automatically wrapped into a Pandoc element with no metadata.

When no format is given, it is inferred from the filename extension when possible¹; otherwise the markdown format is assumed. The output format can also be specified explicitly with format. Extra options can be passed to the pandoc command-line tool with options.

Arguments

doc: a Pandoc object or a document fragment (Inline, [Inline], MetaInlines, Block, [Block] or MetaBlocks).
file: a file, filename or None.
format: the document format (such as "markdown", "odt", "docx", "html", etc.)

Refer to Pandoc's README for the list of supported output formats.
options: additional pandoc options (a list of strings).

Refer to Pandoc's user guide for a complete list of options.

Returns

output: the output document, as a string or as a byte sequence.

Bytes are only used for binary output formats (docx, odt, pptx, etc.).

Usage

Write documents to markdown strings:

>>> doc = pandoc.read("Hello world!")
>>> doc
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> print(pandoc.write(doc))  # doctest: +NORMALIZE_WHITESPACE
Hello world!

Write document fragments to markdown strings:

>>> md = lambda elt: print(pandoc.write(elt))
>>> md(Str("Hello!")) # doctest: +NORMALIZE_WHITESPACE
Hello!
>>> md([Str('Hello'), Space(), Str('world!')]) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(Para([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ])
Hello world!
<BLANKLINE>
Hello world!
>>> md(MetaInlines([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(MetaBlocks([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ]))
Hello world!
<BLANKLINE>
Hello world!

Use alternate (text or binary) output formats:

>>> output = pandoc.write(doc, format="html") # html output
>>> type(output)
<class 'str'>
>>> print(output)
<p>Hello world!</p>
<BLANKLINE>
>>> output = pandoc.write(doc, format="odt")
>>> type(output)
<class 'bytes'>
>>> output # doctest: +ELLIPSIS
b'PK...'
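Since the output may be either a string or a byte sequence, code that stores it by hand has to pick the matching file mode. The sketch below shows one way to do that; save_output is a hypothetical helper (not part of the library), and the two values are stand-ins shaped like the results shown above. Passing file=... to write avoids the issue altogether.

```python
import os
import tempfile

def save_output(output, filename):
    """Write text output in text mode and binary output in binary mode."""
    if isinstance(output, bytes):
        with open(filename, "wb") as stream:
            stream.write(output)
    else:
        with open(filename, "w", encoding="utf-8") as stream:
            stream.write(output)

# Stand-in values; real ones would come from pandoc.write.
with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "doc.html")
    binary_path = os.path.join(folder, "doc.odt")
    save_output("<p>Hello world!</p>\n", text_path)
    save_output(b"PK\x03\x04", binary_path)
    with open(binary_path, "rb") as stream:
        print(stream.read()[:2])  # b'PK'
```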

Write documents to files:

>>> _ = pandoc.write(doc, file="doc.md")
>>> open("doc.md", encoding="utf-8").read()
'Hello world!\n'
>>> _ = pandoc.write(doc, file="doc.html")
>>> open("doc.html", encoding="utf-8").read()
'<p>Hello world!</p>\n'
>>> _ = pandoc.write(doc, file="doc.pdf")
>>> open("doc.pdf", "rb").read() # doctest: +ELLIPSIS
b'%PDF...'

Use extra pandoc options:

>>> output = pandoc.write(
...     doc, 
...     format="html", 
...     options=["--standalone", "-V", "lang=en"]
... )
>>> print(output) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<body>
<p>Hello world!</p>
</body>
</html>

iter(elt, path=False)

Iterate on document elements in document order.

Arguments

elt: a pandoc item (or more generally any Python object).
path: a boolean; defaults to False.

Returns

iterator: a depth-first tree iterator.
elt_path (when path==True): a list of (elt, index) pairs.

Usage

This iterator may be used as a general-purpose tree iterator:

>>> tree = [1, [2, [3]]]
>>> for elt in pandoc.iter(tree):
...     print(elt)
[1, [2, [3]]]
1
[2, [3]]
2
[3]
3

Non-iterable objects yield themselves:

>>> root = 1
>>> for elt in pandoc.iter(root):
...     print(elt)
1

But it is really meant to be used with pandoc objects:

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Meta({})
{}
[Para([Str('Hello'), Space(), Str('world!')])]
Para([Str('Hello'), Space(), Str('world!')])
[Str('Hello'), Space(), Str('world!')]
Str('Hello')
Hello
Space()
Str('world!')
world!

Two gotchas: characters in strings are not iterated (strings are considered "atomic"):

>>> root = "Hello world!"
>>> for elt in pandoc.iter(root):
...     print(elt)
Hello world!

and dicts yield their key-value pairs (and not only their keys):

>>> root = {"a": 1, "b": 2}
>>> for elt in pandoc.iter(root):
...      print(elt)
{'a': 1, 'b': 2}
('a', 1)
a
1
('b', 2)
b
2
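The traversal rules illustrated above (parent before children, strings treated as atomic, dicts contributing key-value pairs, non-iterable objects as leaves) can be reproduced on plain Python containers. The function walk below is a simplified sketch written for this purpose; it is not the library's implementation and it does not handle pandoc types specifically.

```python
def walk(elt):
    """Yield elt, then its descendants, depth-first, in document order."""
    yield elt
    if isinstance(elt, dict):
        children = elt.items()   # dicts contribute (key, value) pairs
    elif isinstance(elt, (str, bytes)):
        children = ()            # strings are atomic: no per-character descent
    else:
        try:
            children = iter(elt)
        except TypeError:
            children = ()        # non-iterable objects are leaves
    for child in children:
        yield from walk(child)

print(list(walk([1, [2, [3]]])))
# [[1, [2, [3]]], 1, [2, [3]], 2, [3], 3]
print(list(walk({"a": 1, "b": 2})))
# [{'a': 1, 'b': 2}, ('a', 1), 'a', 1, ('b', 2), 'b', 2]
```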

Use path=True when you need to locate the element in the document. You can get the element parent and index within this parent as path[-1], the grand-parent and the index of the parent within the grand-parent as path[-2], etc. up to the document root.

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> world = Str("world!")
>>> for elt, path in pandoc.iter(doc, path=True): # find the path to Str("world!")
...     if elt == world:
...         break
>>> for elt, index in path:
...     print(f"At index {index} in {elt}:")
... else:
...     print(world)
At index 1 in Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])]):
At index 0 in [Para([Str('Hello'), Space(), Str('world!')])]:
At index 0 in Para([Str('Hello'), Space(), Str('world!')]):
At index 2 in [Str('Hello'), Space(), Str('world!')]:
Str('world!')
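A typical use of the last path entry is to replace an element inside its parent. The sketch below uses plain lists as stand-ins for document elements and assumes that the parent supports item assignment, as lists do; replace_at is a hypothetical helper, not a library function.

```python
def replace_at(path, new_elt):
    """Replace the element located by path with new_elt, in place."""
    holder, index = path[-1]  # parent of the element and position within it
    holder[index] = new_elt

# Plain lists stand in for document elements here.
inlines = ["Hello", " ", "world!"]
blocks = [inlines]
path = [(blocks, 0), (inlines, 2)]  # same shape as a path: (elt, index) pairs
replace_at(path, "pandoc!")
print(blocks)  # [['Hello', ' ', 'pandoc!']]
```

When several elements must be changed, it is usually safer to collect the paths first and apply the replacements after the iteration has finished, rather than modifying the tree while it is being traversed.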

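configure(auto=False, path=None, version=None, pandoc_types_version=None, read=False, reset=False)

Configure the pandoc Python library: select the pandoc executable and the versions to use, and optionally read or reset the current configuration.

Arguments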

auto: a boolean; defaults to False; set to True to infer the configuration from the pandoc in your path.
path: the path to the pandoc executable, such as "/usr/bin/pandoc".
version: the pandoc command-line tool version, such as "2.14.2".
pandoc_types_version: the pandoc-types version, such as "1.22.1".
read: a boolean; defaults to False. Return the configuration dictionary.
reset: a boolean; defaults to False. Delete the current configuration.

Returns

configuration (if read==True): the configuration dictionary, with entries "auto", "path", "version" and "pandoc_types_version".

Usage

The configuration step is triggered when you import pandoc.types or call pandoc.read or pandoc.write. It automatically infers the configuration from the pandoc executable found in the path, and fails if there is none.

>>> config = pandoc.configure(read=True)
>>> config # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': True, 
 'path': ..., 
 'version': '3.2.1', 
 'pandoc_types_version': '1.23.1'}

To avoid this automatic step, call pandoc.configure(...) yourself beforehand. Alternatively, manually select your pandoc executable afterwards:

>>> pandoc.configure(reset=True)
>>> pandoc.configure(read=True) is None
True
>>> config["auto"] = False
>>> pandoc.configure(**config)
>>> pandoc.configure(read=True) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': False, 
 'path': ..., 
 'version': '3.2.1', 
 'pandoc_types_version': '1.23.1'}
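The version entries of the configuration dictionary are strings, so comparing them directly can mislead. The sketch below converts them to tuples of integers first; version_tuple is a hypothetical helper, the dictionary is a hand-written stand-in for the value returned by pandoc.configure(read=True), and the helper assumes purely numeric dotted versions.

```python
def version_tuple(version):
    """Turn a dotted version string such as "3.2.1" into (3, 2, 1)."""
    return tuple(int(part) for part in version.split("."))

# A dictionary shaped like the one returned by pandoc.configure(read=True);
# the values are placeholders for this sketch.
config = {
    "auto": True,
    "path": "/usr/bin/pandoc",
    "version": "3.2.1",
    "pandoc_types_version": "1.23.1",
}

print(version_tuple(config["version"]) >= (2, 14))   # True
print("2.9" > "2.14")                                # True: string comparison misleads
print(version_tuple("2.9") > version_tuple("2.14"))  # False
```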

`pandoc.types`

AlignCenter

Concrete data type

Signature

AlignCenter()

AlignDefault

Concrete data type

Signature

AlignDefault()

AlignLeft

Concrete data type

Signature

AlignLeft()

AlignRight

Concrete data type

Signature

AlignRight()

Alignment

Abstract data type

Signature

Alignment = AlignLeft()
          | AlignRight()
          | AlignCenter()
          | AlignDefault()

Signature

Attr = (Text, [Text], [(Text, Text)])
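In the pandoc document model, the three components of Attr are an identifier, a list of classes and a list of key-value pairs. Since Attr is a plain tuple of strings and lists, it can be built and unpacked without any library call; the values below are made up for illustration.

```python
# Attr = (identifier, classes, key-value pairs)
attr = ("intro", ["note", "draft"], [("lang", "en")])

identifier, classes, key_values = attr
attributes = dict(key_values)  # convenient lookup; assumes keys are unique

print(identifier)          # intro
print("draft" in classes)  # True
print(attributes["lang"])  # en

# An element without any attribute uses the empty Attr.
empty_attr = ("", [], [])
```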

Signature

AuthorInText()

Block

Abstract data type

Signature

Block = Plain([Inline])
      | Para([Inline])
      | LineBlock([[Inline]])
      | CodeBlock(Attr, Text)
      | RawBlock(Format, Text)
      | BlockQuote([Block])
      | OrderedList(ListAttributes, [[Block]])
      | BulletList([[Block]])
      | DefinitionList([([Inline], [[Block]])])
      | Header(Int, Attr, [Inline])
      | HorizontalRule()
      | Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)
      | Figure(Attr, Caption, [Block])
      | Div(Attr, [Block])

Signature

BlockQuote([Block])

Bool

Primitive type

Signature

bool

BulletList

Concrete data type

Signature

BulletList([[Block]])

Signature

Caption(ShortCaption or None, [Block])

Signature

Cell(Attr, Alignment, RowSpan, ColSpan, [Block])

Signature

Citation(Text, [Inline], [Inline], CitationMode, Int, Int)

CitationMode

Abstract data type

Signature

CitationMode = AuthorInText()
             | SuppressAuthor()
             | NormalCitation()

Signature

Cite([Citation], [Inline])

Signature

Code(Attr, Text)

Signature

CodeBlock(Attr, Text)

Signature

ColSpan(Int)

Signature

ColSpec = (Alignment, ColWidth)

ColWidth

Abstract data type

Signature

ColWidth = ColWidth_(Double)
         | ColWidthDefault()

Signature

ColWidthDefault()

ColWidth_

Concrete data type

Signature

ColWidth_(Double)

Signature

Decimal()

DefaultDelim

Concrete data type

Signature

DefaultDelim()

DefaultStyle

Concrete data type

Signature

DefaultStyle()

DefinitionList

Concrete data type

Signature

DefinitionList([([Inline], [[Block]])])

Signature

DisplayMath()

Div

Concrete data type

Signature

Div(Attr, [Block])

Double

Primitive type

Signature

float

DoubleQuote

Concrete data type

Signature

DoubleQuote()

Emph

Concrete data type

Signature

Emph([Inline])

Signature

Example()

Figure

Concrete data type

Signature

Figure(Attr, Caption, [Block])

Signature

Format(Text)

Signature

Header(Int, Attr, [Inline])

Signature

HorizontalRule()

Image

Concrete data type

Signature

Image(Attr, [Inline], Target)

Inline

Abstract data type

Signature

Inline = Str(Text)
       | Emph([Inline])
       | Underline([Inline])
       | Strong([Inline])
       | Strikeout([Inline])
       | Superscript([Inline])
       | Subscript([Inline])
       | SmallCaps([Inline])
       | Quoted(QuoteType, [Inline])
       | Cite([Citation], [Inline])
       | Code(Attr, Text)
       | Space()
       | SoftBreak()
       | LineBreak()
       | Math(MathType, Text)
       | RawInline(Format, Text)
       | Link(Attr, [Inline], Target)
       | Image(Attr, [Inline], Target)
       | Note([Block])
       | Span(Attr, [Inline])

Signature

InlineMath()

Int

Primitive type

Signature

int

LineBlock

Concrete data type

Signature

LineBlock([[Inline]])

Signature

LineBreak()

Link

Concrete data type

Signature

Link(Attr, [Inline], Target)

Signature

ListAttributes = (Int, ListNumberStyle, ListNumberDelim)

ListNumberDelim

Abstract data type

Signature

ListNumberDelim = DefaultDelim()
                | Period()
                | OneParen()
                | TwoParens()

ListNumberStyle

Abstract data type

Signature

ListNumberStyle = DefaultStyle()
                | Example()
                | Decimal()
                | LowerRoman()
                | UpperRoman()
                | LowerAlpha()
                | UpperAlpha()

Signature

LowerAlpha()

LowerRoman

Concrete data type

Signature

LowerRoman()

Math

Concrete data type

Signature

Math(MathType, Text)

MathType

Abstract data type

Signature

MathType = DisplayMath()
         | InlineMath()

Signature

Meta({Text: MetaValue})

Signature

MetaBlocks([Block])

Signature

MetaBool(Bool)

Signature

MetaInlines([Inline])

Signature

MetaList([MetaValue])

Signature

MetaMap({Text: MetaValue})

Signature

MetaString(Text)

MetaValue

Abstract data type

Signature

MetaValue = MetaMap({Text: MetaValue})
          | MetaList([MetaValue])
          | MetaBool(Bool)
          | MetaString(Text)
          | MetaInlines([Inline])
          | MetaBlocks([Block])

Signature

NormalCitation()

Note

Concrete data type

Signature

Note([Block])

Signature

OneParen()

OrderedList

Concrete data type

Signature

OrderedList(ListAttributes, [[Block]])

Signature

Pandoc(Meta, [Block])

Signature

Para([Inline])

Signature

Period()

Plain

Concrete data type

Signature

Plain([Inline])

QuoteType

Abstract data type

Signature

QuoteType = SingleQuote()
          | DoubleQuote()

Signature

Quoted(QuoteType, [Inline])

Signature

RawBlock(Format, Text)

Signature

RawInline(Format, Text)

Signature

Row(Attr, [Cell])

Signature

RowHeadColumns(Int)

Signature

RowSpan(Int)

Signature

ShortCaption = [Inline]

Signature

SingleQuote()

SmallCaps

Concrete data type

Signature

SmallCaps([Inline])

Signature

SoftBreak()

Space

Concrete data type

Signature

Space()

Span

Concrete data type

Signature

Span(Attr, [Inline])

Signature

Str(Text)

Signature

Strikeout([Inline])

String

Primitive type

Signature

str

Strong

Concrete data type

Signature

Strong([Inline])

Signature

Subscript([Inline])

Signature

Superscript([Inline])

Signature

SuppressAuthor()

Table

Concrete data type

Signature

Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)

Signature

TableBody(Attr, RowHeadColumns, [Row], [Row])

Signature

TableFoot(Attr, [Row])

Signature

TableHead(Attr, [Row])

Signature

Target = (Text, Text)
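In the pandoc document model, the two components of Target are the URL and the title, in that order; the title is an empty string when there is none. The sketch below unpacks such a pair to build a Markdown-style link by hand; markdown_link is a hypothetical helper written for illustration, not a library function, and it performs no escaping.

```python
def markdown_link(text, target):
    """Format a (url, title) target as a Markdown inline link."""
    url, title = target
    if title:
        return f'[{text}]({url} "{title}")'
    return f"[{text}]({url})"

print(markdown_link("Pandoc", ("https://pandoc.org", "")))
# [Pandoc](https://pandoc.org)
print(markdown_link("Pandoc", ("https://pandoc.org", "Home page")))
# [Pandoc](https://pandoc.org "Home page")
```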

Text

Primitive type

Signature

str

TwoParens

Concrete data type

Signature

TwoParens()

Underline

Concrete data type

Signature

Underline([Inline])

Signature

UpperAlpha()

UpperRoman

Concrete data type

Signature

UpperRoman()

¹ Refer to Pandoc's heuristics for the gory details of this inference.
