API Reference

import pandoc
from pandoc.types import *

pandoc

read(source=None, file=None, format=None, options=None)

Read a source document.

The source document must be specified by either source or file. By default, the document format is inferred from the filename extension when possible [1]; otherwise, markdown is assumed. The input format can also be specified explicitly. Extra options can be passed to the pandoc command-line tool.
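The precedence described above (explicit format first, then the filename extension, then markdown) can be pictured with a small sketch. This is only an illustration: guess_format and the EXTENSIONS table are hypothetical names with a deliberately tiny mapping, not the library's code and not Pandoc's actual heuristics.

```python
import os

# Illustrative subset only (an assumption); Pandoc's real heuristics are richer.
EXTENSIONS = {".md": "markdown", ".html": "html", ".docx": "docx", ".odt": "odt"}

def guess_format(filename=None, format=None):
    """Return the explicit format, else one inferred from the extension, else markdown."""
    if format is not None:
        return format  # an explicit format always wins
    if filename is not None:
        ext = os.path.splitext(filename)[1].lower()
        if ext in EXTENSIONS:
            return EXTENSIONS[ext]
    return "markdown"  # default when nothing can be inferred

print(guess_format("doc.html"))                 # html
print(guess_format("notes.txt"))                # markdown
print(guess_format("doc.html", format="docx"))  # docx
```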

Arguments

  • source: the document content, as a string or as utf-8 encoded bytes.

  • file: the document, provided as a file or filename.

  • format: the document format (such as "markdown", "odt", "docx", "html", etc.)

    Refer to Pandoc's README for the list of supported input formats.

  • options: additional pandoc options (a list of strings).

    Refer to Pandoc's user guide for a complete list of options.

Returns

  • doc: the document, as a Pandoc object.

Usage

Read documents from strings:

>>> markdown = "Hello world!"
>>> pandoc.read(markdown)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> html = "<p>Hello world!</p>"
>>> pandoc.read(html, format="html")
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

Read documents from files:

>>> filename = "doc.html"
>>> with open(filename, "w", encoding="utf-8") as file:
...     _ = file.write(html)
>>> pandoc.read(file=filename) # html format inferred from filename
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> file = open(filename, encoding="utf-8")
>>> pandoc.read(file=file, format="html") # but here it must be explicit
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

Use extra pandoc options:

>>> pandoc.read(markdown, options=["-M", "id=hello"]) # add metadata
Pandoc(Meta({'id': MetaString('hello')}), [Para([Str('Hello'), Space(), Str('world!')])])

write(doc, file=None, format=None, options=None)

Write a pandoc document (or document fragment) to a file and return its contents.

Inline document fragments are automatically wrapped into a Plain block; block document fragments are automatically wrapped into a Pandoc element with no metadata.

By default, the document format is inferred from the filename extension when possible [1]; otherwise, markdown is assumed. The output format can also be specified explicitly. Extra options can be passed to the pandoc command-line tool.

Arguments

  • doc: a Pandoc object or a document fragment (Inline, [Inline], MetaInlines, Block, [Block] or MetaBlocks).

  • file: a file, filename or None.

  • format: the document format (such as "markdown", "odt", "docx", "html", etc.)

    Refer to Pandoc's README for the list of supported output formats.

  • options: additional pandoc options (a list of strings).

    Refer to Pandoc's user guide for a complete list of options.

Returns

  • output: the output document, as a string or as a byte sequence.

    Bytes are only used for binary output formats (docx, odt, pptx, etc.).
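Since the return value is either a string or a byte sequence, code that stores it after some post-processing has to pick the file mode accordingly. A minimal sketch, in which save_output is a hypothetical helper (not part of the library) and output stands for a value returned by pandoc.write:

```python
def save_output(output, filename):
    """Store text output as UTF-8 and binary output unchanged (hypothetical helper)."""
    if isinstance(output, bytes):
        # Binary formats: write the bytes as they are.
        with open(filename, "wb") as stream:
            stream.write(output)
    else:
        # Text formats: encode the string as UTF-8.
        with open(filename, "w", encoding="utf-8") as stream:
            stream.write(output)
```

When no post-processing is needed, passing file=... to pandoc.write directly is simpler.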

Usage

Write documents to markdown strings:

>>> doc = pandoc.read("Hello world!")
>>> doc
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> print(pandoc.write(doc))  # doctest: +NORMALIZE_WHITESPACE
Hello world!

Write document fragments to markdown strings:

>>> md = lambda elt: print(pandoc.write(elt))
>>> md(Str("Hello!")) # doctest: +NORMALIZE_WHITESPACE
Hello!
>>> md([Str('Hello'), Space(), Str('world!')]) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(Para([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ])
Hello world!
<BLANKLINE>
Hello world!
>>> md(MetaInlines([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(MetaBlocks([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ]))
Hello world!
<BLANKLINE>
Hello world!

Use alternate (text or binary) output formats:

>>> output = pandoc.write(doc, format="html") # html output
>>> type(output)
<class 'str'>
>>> print(output)
<p>Hello world!</p>
<BLANKLINE>
>>> output = pandoc.write(doc, format="odt")
>>> type(output)
<class 'bytes'>
>>> output # doctest: +ELLIPSIS
b'PK...'

Write documents to files:

>>> _ = pandoc.write(doc, file="doc.md")
>>> open("doc.md", encoding="utf-8").read()
'Hello world!\n'
>>> _ = pandoc.write(doc, file="doc.html")
>>> open("doc.html", encoding="utf-8").read()
'<p>Hello world!</p>\n'
>>> _ = pandoc.write(doc, file="doc.pdf")
>>> open("doc.pdf", "rb").read() # doctest: +ELLIPSIS
b'%PDF...'

Use extra pandoc options:

>>> output = pandoc.write(
...     doc, 
...     format="html", 
...     options=["--standalone", "-V", "lang=en"]
... )
>>> print(output) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<body>
<p>Hello world!</p>
</body>
</html>

iter(elt, path=False)

Iterate on document elements in document order.

Arguments

  • elt: a pandoc item (or more generally any Python object).

  • path: a boolean; defaults to False.

Returns

  • iterator: a depth-first tree iterator.

  • elt_path (when path==True): a list of (elt, index) pairs.

Usage

This iterator may be used as a general-purpose tree iterator:

>>> tree = [1, [2, [3]]]
>>> for elt in pandoc.iter(tree):
...     print(elt)
[1, [2, [3]]]
1
[2, [3]]
2
[3]
3

Non-iterable objects yield themselves:

>>> root = 1
>>> for elt in pandoc.iter(root):
...     print(elt)
1

But it is really meant to be used with pandoc objects:

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Meta({})
{}
[Para([Str('Hello'), Space(), Str('world!')])]
Para([Str('Hello'), Space(), Str('world!')])
[Str('Hello'), Space(), Str('world!')]
Str('Hello')
Hello
Space()
Str('world!')
world!

Two gotchas: characters in strings are not iterated (strings are considered "atomic"):

>>> root = "Hello world!"
>>> for elt in pandoc.iter(root):
...     print(elt)
Hello world!

and dicts yield their key-value pairs (and not only their keys):

>>> root = {"a": 1, "b": 2}
>>> for elt in pandoc.iter(root):
...     print(elt)
{'a': 1, 'b': 2}
('a', 1)
a
1
('b', 2)
b
2

Use path=True when you need to locate the element in the document. The path is a list of (parent, index) pairs that leads from the document root to the element: path[-1] holds the element's parent and the element's index within that parent, path[-2] holds the grandparent and the index of the parent within the grandparent, and so on up to the document root.

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> world = Str("world!")
>>> for elt, path in pandoc.iter(doc, path=True): # find the path to Str("world!")
...     if elt == world:
...         break
>>> for elt, index in path:
...     print(f"At index {index} in {elt}:")
... else:
...     print(world)
At index 1 in Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])]):
At index 0 in [Para([Str('Hello'), Space(), Str('world!')])]:
At index 0 in Para([Str('Hello'), Space(), Str('world!')]):
At index 2 in [Str('Hello'), Space(), Str('world!')]:
Str('world!')
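To make the traversal rules concrete, here is a pure-Python sketch that reproduces the behavior shown above on plain containers. iter_tree is a hypothetical stand-in written for this illustration, not the library's implementation; the replacement step only assumes that the parent supports item assignment, as lists do.

```python
def iter_tree(elt, path=False):
    """Depth-first iteration in document order (illustrative sketch only)."""
    def walk(node, trail):
        yield (node, trail) if path else node
        if isinstance(node, str):
            return  # strings are atomic: characters are not visited
        if isinstance(node, dict):
            children = node.items()  # dicts yield (key, value) pairs
        else:
            try:
                children = iter(node)
            except TypeError:
                return  # non-iterable objects only yield themselves
        for index, child in enumerate(children):
            yield from walk(child, trail + [(node, index)])
    return walk(elt, [])

tree = ["Hello", ["world!", [3]]]
for elt, trail in iter_tree(tree, path=True):
    if elt == 3:
        parent, index = trail[-1]  # direct parent and position within it
        parent[index] = 4          # replace the element in place
        break                      # stop iterating once the tree is modified
print(tree)  # ['Hello', ['world!', [4]]]
```

Stopping the loop right after the assignment avoids iterating over a container that has just been modified.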

See also

Refer to the Tree iteration section.

configure(auto=False, path=None, version=None, pandoc_types_version=None, read=False, reset=False)

Arguments

  • auto: a boolean; defaults to False; set to True to infer the configuration from the pandoc in your path.

  • path: the path to the pandoc executable, such as "/usr/bin/pandoc".

  • version: the pandoc command-line tool version, such as "2.14.2".

  • pandoc_types_version: the pandoc-types version, such as "1.22.1".

  • read: a boolean; defaults to False; set to True to return the configuration dictionary.

  • reset: a boolean; defaults to False; set to True to delete the current configuration.

Returns

  • configuration (if read==True): the configuration dictionary, with entries "auto", "path", "version" and "pandoc_types_version".

Usage

The configuration step is triggered when you import pandoc.types or call pandoc.read or pandoc.write. It then automatically infers the configuration from the pandoc executable found in the path (or fails).

>>> config = pandoc.configure(read=True)
>>> config # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': True, 
 'path': ..., 
 'version': '3.1.1', 
 'pandoc_types_version': '1.23'}

To avoid this automatic inference, call pandoc.configure(...) yourself beforehand. Alternatively, manually select your pandoc executable afterwards:

>>> pandoc.configure(reset=True)
>>> pandoc.configure(read=True) is None
True
>>> config["auto"] = False
>>> pandoc.configure(**config)
>>> pandoc.configure(read=True) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': False, 
 'path': ..., 
 'version': '3.1.1', 
 'pandoc_types_version': '1.23'}

See also

Refer to the Configuration section.

pandoc.types

AlignCenter

Concrete data type

Signature

AlignCenter()

AlignDefault

Concrete data type

Signature

AlignDefault()

AlignLeft

Concrete data type

Signature

AlignLeft()

AlignRight

Concrete data type

Signature

AlignRight()

Alignment

Abstract data type

Signature

Alignment = AlignLeft()
          | AlignRight()
          | AlignCenter()
          | AlignDefault()

See also

AlignCenter, AlignDefault, AlignLeft, AlignRight.

Attr

Typedef

Signature

Attr = (Text, [Text], [(Text, Text)])

See also

Text.
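In pandoc's document model, the three components of Attr are an identifier, a list of classes, and a list of key-value pairs. A small sketch with plain Python values; the identifier, class and attribute names are made up for the example:

```python
# (identifier, classes, key-value pairs); all values here are illustrative.
attr = ("introduction", ["unnumbered", "lead"], [("lang", "en")])

identifier, classes, key_values = attr
attributes = dict(key_values)  # convenient lookup by key

print(identifier)              # introduction
print("lead" in classes)       # True
print(attributes.get("lang"))  # en
```

Converting the pairs to a dict is only a convenience for lookups: it would drop duplicate keys, which the list of pairs allows.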

AuthorInText

Concrete data type

Signature

AuthorInText()

Block

Abstract data type

Signature

Block = Plain([Inline])
      | Para([Inline])
      | LineBlock([[Inline]])
      | CodeBlock(Attr, Text)
      | RawBlock(Format, Text)
      | BlockQuote([Block])
      | OrderedList(ListAttributes, [[Block]])
      | BulletList([[Block]])
      | DefinitionList([([Inline], [[Block]])])
      | Header(Int, Attr, [Inline])
      | HorizontalRule()
      | Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)
      | Figure(Attr, Caption, [Block])
      | Div(Attr, [Block])

See also

Attr, BlockQuote, BulletList, Caption, CodeBlock, ColSpec, DefinitionList, Div, Figure, Format, Header, HorizontalRule, Inline, Int, LineBlock, ListAttributes, OrderedList, Para, Plain, RawBlock, Table, TableBody, TableFoot, TableHead, Text.

BlockQuote

Concrete data type

Signature

BlockQuote([Block])

See also

Block.

Bool

Primitive type

Signature

bool

BulletList

Concrete data type

Signature

BulletList([[Block]])

See also

Block.

Caption

Concrete data type

Signature

Caption(ShortCaption or None, [Block])

See also

Block, ShortCaption.

Cell

Concrete data type

Signature

Cell(Attr, Alignment, RowSpan, ColSpan, [Block])

See also

Alignment, Attr, Block, ColSpan, RowSpan.

Citation

Concrete data type

Signature

Citation(Text, [Inline], [Inline], CitationMode, Int, Int)

See also

CitationMode, Inline, Int, Text.

CitationMode

Abstract data type

Signature

CitationMode = AuthorInText()
             | SuppressAuthor()
             | NormalCitation()

See also

AuthorInText, NormalCitation, SuppressAuthor.

Cite

Concrete data type

Signature

Cite([Citation], [Inline])

See also

Citation, Inline.

Code

Concrete data type

Signature

Code(Attr, Text)

See also

Attr, Text.

CodeBlock

Concrete data type

Signature

CodeBlock(Attr, Text)

See also

Attr, Text.

ColSpan

Concrete data type

Signature

ColSpan(Int)

See also

Int.

ColSpec

Typedef

Signature

ColSpec = (Alignment, ColWidth)

See also

Alignment, ColWidth.

ColWidth

Abstract data type

Signature

ColWidth = ColWidth_(Double)
         | ColWidthDefault()

See also

ColWidthDefault, ColWidth_, Double.

ColWidthDefault

Concrete data type

Signature

ColWidthDefault()

ColWidth_

Concrete data type

Signature

ColWidth_(Double)

See also

Double.

Decimal

Concrete data type

Signature

Decimal()

DefaultDelim

Concrete data type

Signature

DefaultDelim()

DefaultStyle

Concrete data type

Signature

DefaultStyle()

DefinitionList

Concrete data type

Signature

DefinitionList([([Inline], [[Block]])])

See also

Block, Inline.

DisplayMath

Concrete data type

Signature

DisplayMath()

Div

Concrete data type

Signature

Div(Attr, [Block])

See also

Attr, Block.

Double

Primitive type

Signature

float

DoubleQuote

Concrete data type

Signature

DoubleQuote()

Emph

Concrete data type

Signature

Emph([Inline])

See also

Inline.

Example

Concrete data type

Signature

Example()

Figure

Concrete data type

Signature

Figure(Attr, Caption, [Block])

See also

Attr, Block, Caption.

Format

Concrete data type

Signature

Format(Text)

See also

Text.

Header

Concrete data type

Signature

Header(Int, Attr, [Inline])

See also

Attr, Inline, Int.

HorizontalRule

Concrete data type

Signature

HorizontalRule()

Image

Concrete data type

Signature

Image(Attr, [Inline], Target)

See also

Attr, Inline, Target.

Inline

Abstract data type

Signature

Inline = Str(Text)
       | Emph([Inline])
       | Underline([Inline])
       | Strong([Inline])
       | Strikeout([Inline])
       | Superscript([Inline])
       | Subscript([Inline])
       | SmallCaps([Inline])
       | Quoted(QuoteType, [Inline])
       | Cite([Citation], [Inline])
       | Code(Attr, Text)
       | Space()
       | SoftBreak()
       | LineBreak()
       | Math(MathType, Text)
       | RawInline(Format, Text)
       | Link(Attr, [Inline], Target)
       | Image(Attr, [Inline], Target)
       | Note([Block])
       | Span(Attr, [Inline])

See also

Attr, Block, Citation, Cite, Code, Emph, Format, Image, LineBreak, Link, Math, MathType, Note, QuoteType, Quoted, RawInline, SmallCaps, SoftBreak, Space, Span, Str, Strikeout, Strong, Subscript, Superscript, Target, Text, Underline.

InlineMath

Concrete data type

Signature

InlineMath()

Int

Primitive type

Signature

int

LineBlock

Concrete data type

Signature

LineBlock([[Inline]])

See also

Inline.

LineBreak

Concrete data type

Signature

LineBreak()

Link

Concrete data type

Signature

Link(Attr, [Inline], Target)

See also

Attr, Inline, Target.

ListAttributes

Typedef

Signature

ListAttributes = (Int, ListNumberStyle, ListNumberDelim)

See also

Int, ListNumberDelim, ListNumberStyle.

ListNumberDelim

Abstract data type

Signature

ListNumberDelim = DefaultDelim()
                | Period()
                | OneParen()
                | TwoParens()

See also

DefaultDelim, OneParen, Period, TwoParens.

ListNumberStyle

Abstract data type

Signature

ListNumberStyle = DefaultStyle()
                | Example()
                | Decimal()
                | LowerRoman()
                | UpperRoman()
                | LowerAlpha()
                | UpperAlpha()

See also

Decimal, DefaultStyle, Example, LowerAlpha, LowerRoman, UpperAlpha, UpperRoman.

LowerAlpha

Concrete data type

Signature

LowerAlpha()

LowerRoman

Concrete data type

Signature

LowerRoman()

Math

Concrete data type

Signature

Math(MathType, Text)

See also

MathType, Text.

MathType

Abstract data type

Signature

MathType = DisplayMath()
         | InlineMath()

See also

DisplayMath, InlineMath.

Meta

Concrete data type

Signature

Meta({Text: MetaValue})

See also

MetaValue, Text.

MetaBlocks

Concrete data type

Signature

MetaBlocks([Block])

See also

Block.

MetaBool

Concrete data type

Signature

MetaBool(Bool)

See also

Bool.

MetaInlines

Concrete data type

Signature

MetaInlines([Inline])

See also

Inline.

MetaList

Concrete data type

Signature

MetaList([MetaValue])

See also

MetaValue.

MetaMap

Concrete data type

Signature

MetaMap({Text: MetaValue})

See also

MetaValue, Text.

MetaString

Concrete data type

Signature

MetaString(Text)

See also

Text.

MetaValue

Abstract data type

Signature

MetaValue = MetaMap({Text: MetaValue})
          | MetaList([MetaValue])
          | MetaBool(Bool)
          | MetaString(Text)
          | MetaInlines([Inline])
          | MetaBlocks([Block])

See also

Block, Bool, Inline, MetaBlocks, MetaBool, MetaInlines, MetaList, MetaMap, MetaString, Text.

NormalCitation

Concrete data type

Signature

NormalCitation()

Note

Concrete data type

Signature

Note([Block])

See also

Block.

OneParen

Concrete data type

Signature

OneParen()

OrderedList

Concrete data type

Signature

OrderedList(ListAttributes, [[Block]])

See also

Block, ListAttributes.

Pandoc

Concrete data type

Signature

Pandoc(Meta, [Block])

See also

Block, Meta.

Para

Concrete data type

Signature

Para([Inline])

See also

Inline.

Period

Concrete data type

Signature

Period()

Plain

Concrete data type

Signature

Plain([Inline])

See also

Inline.

QuoteType

Abstract data type

Signature

QuoteType = SingleQuote()
          | DoubleQuote()

See also

DoubleQuote, SingleQuote.

Quoted

Concrete data type

Signature

Quoted(QuoteType, [Inline])

See also

Inline, QuoteType.

RawBlock

Concrete data type

Signature

RawBlock(Format, Text)

See also

Format, Text.

RawInline

Concrete data type

Signature

RawInline(Format, Text)

See also

Format, Text.

Row

Concrete data type

Signature

Row(Attr, [Cell])

See also

Attr, Cell.

RowHeadColumns

Concrete data type

Signature

RowHeadColumns(Int)

See also

Int.

RowSpan

Concrete data type

Signature

RowSpan(Int)

See also

Int.

ShortCaption

Typedef

Signature

ShortCaption = [Inline]

See also

Inline.

SingleQuote

Concrete data type

Signature

SingleQuote()

SmallCaps

Concrete data type

Signature

SmallCaps([Inline])

See also

Inline.

SoftBreak

Concrete data type

Signature

SoftBreak()

Space

Concrete data type

Signature

Space()

Span

Concrete data type

Signature

Span(Attr, [Inline])

See also

Attr, Inline.

Str

Concrete data type

Signature

Str(Text)

See also

Text.

Strikeout

Concrete data type

Signature

Strikeout([Inline])

See also

Inline.

String

Primitive type

Signature

str

Strong

Concrete data type

Signature

Strong([Inline])

See also

Inline.

Subscript

Concrete data type

Signature

Subscript([Inline])

See also

Inline.

Superscript

Concrete data type

Signature

Superscript([Inline])

See also

Inline.

SuppressAuthor

Concrete data type

Signature

SuppressAuthor()

Table

Concrete data type

Signature

Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)

See also

Attr, Caption, ColSpec, TableBody, TableFoot, TableHead.

TableBody

Concrete data type

Signature

TableBody(Attr, RowHeadColumns, [Row], [Row])

See also

Attr, Row, RowHeadColumns.

TableFoot

Concrete data type

Signature

TableFoot(Attr, [Row])

See also

Attr, Row.

TableHead

Concrete data type

Signature

TableHead(Attr, [Row])

See also

Attr, Row.

Target

Typedef

Signature

Target = (Text, Text)

See also

Text.
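In pandoc's document model, a Target pairs a URL with a title, in that order. A short sketch with plain Python values; markdown_link is a hypothetical helper written for this example (it is not part of the library and does no escaping):

```python
def markdown_link(text, target):
    """Format a (URL, title) pair as a Markdown inline link (hypothetical helper)."""
    url, title = target
    if title:
        return f'[{text}]({url} "{title}")'
    return f"[{text}]({url})"  # an empty title is simply omitted

print(markdown_link("Pandoc", ("https://pandoc.org", "Pandoc website")))
print(markdown_link("Pandoc", ("https://pandoc.org", "")))
```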

Text

Primitive type

Signature

str

TwoParens

Concrete data type

Signature

TwoParens()

Underline

Concrete data type

Signature

Underline([Inline])

See also

Inline.

UpperAlpha

Concrete data type

Signature

UpperAlpha()

UpperRoman

Concrete data type

Signature

UpperRoman()

  1. Refer to Pandoc's heuristics for the gory details of this inference.