API Reference

import pandoc
from pandoc.types import *

pandoc

read(source=None, file=None, format=None, options=None)

Read a source document.

The source document must be specified by either source or file. By default, the document format is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed. The input format can also be specified explicitly. Extra options can be passed to the pandoc command-line tool.

Arguments

  • source: the document content, as a string or as utf-8 encoded bytes.

  • file: the document, provided as a file or filename.

  • format: the document format (such as "markdown", "odt", "docx", "html", etc.)

    Refer to Pandoc's README for the list of supported input formats.

  • options: additional pandoc options (a list of strings).

    Refer to Pandoc's user guide for a complete list of options.

Returns

  • doc: the document, as a Pandoc object.

Usage

Read documents from strings:

>>> markdown = "Hello world!"
>>> pandoc.read(markdown)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> html = "<p>Hello world!</p>"
>>> pandoc.read(html, format="html")
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

Read documents from files:

>>> filename = "doc.html"
>>> with open(filename, "w", encoding="utf-8") as file:
...     _ = file.write(html)
>>> pandoc.read(file=filename) # html format inferred from filename
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> file = open(filename, encoding="utf-8")
>>> pandoc.read(file=file, format="html") # but here it must be explicit
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
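
The format resolution described for read (explicit format first, then the filename extension, then markdown) can be sketched as follows. This is only an illustration under stated assumptions: guess_format and EXTENSION_TO_FORMAT are hypothetical names, the extension table is a small made-up subset, and the actual inference is delegated to Pandoc's own heuristics [1].

```python
from pathlib import Path

# Hypothetical, deliberately incomplete table; the real rules are Pandoc's.
EXTENSION_TO_FORMAT = {
    ".md": "markdown",
    ".html": "html",
    ".docx": "docx",
    ".odt": "odt",
    ".tex": "latex",
}

def guess_format(filename=None, format=None):
    """Resolve a format name: explicit value, then extension, then markdown."""
    if format is not None:
        return format  # an explicit format always wins
    if filename is not None:
        extension = Path(filename).suffix.lower()
        return EXTENSION_TO_FORMAT.get(extension, "markdown")
    return "markdown"  # no filename available (string or file object input)

print(guess_format("doc.html"))
print(guess_format("notes.unknown"))
print(guess_format("doc.html", format="docx"))
```

The last branch mirrors why an open file object needs an explicit format in the example above: without a filename, the sketch has nothing to inspect and falls back to markdown.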

Use extra pandoc options:

>>> pandoc.read(markdown, options=["-M", "id=hello"]) # add metadata
Pandoc(Meta({'id': MetaString('hello')}), [Para([Str('Hello'), Space(), Str('world!')])])

write(doc, file=None, format=None, options=None)

Write a pandoc document (or document fragment) to a file and return its contents.

Inline document fragments are automatically wrapped into a Plain block; block document fragments are automatically wrapped into a Pandoc element with no metadata.

By default, the document format is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed. The output format can also be specified explicitly. Extra options can be passed to the pandoc command-line tool.

Arguments

  • doc: a Pandoc object or a document fragment (Inline, [Inline], MetaInlines, Block, [Block] or MetaBlocks).

  • file: a file, filename or None.

  • format: the document format (such as "markdown", "odt", "docx", "html", etc.)

    Refer to Pandoc's README for the list of supported output formats.

  • options: additional pandoc options (a list of strings).

    Refer to Pandoc's user guide for a complete list of options.

Returns

  • output: the output document, as a string or as a byte sequence.

    Bytes are only used for binary output formats (docx, odt, pdf, etc.).

Usage

Write documents to markdown strings:

>>> doc = pandoc.read("Hello world!")
>>> doc
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> print(pandoc.write(doc))  # doctest: +NORMALIZE_WHITESPACE
Hello world!

Write document fragments to markdown strings:

>>> md = lambda elt: print(pandoc.write(elt))
>>> md(Str("Hello!")) # doctest: +NORMALIZE_WHITESPACE
Hello!
>>> md([Str('Hello'), Space(), Str('world!')]) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(Para([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ])
Hello world!
<BLANKLINE>
Hello world!
>>> md(MetaInlines([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(MetaBlocks([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ]))
Hello world!
<BLANKLINE>
Hello world!

Use alternate (text or binary) output formats:

>>> output = pandoc.write(doc, format="html") # html output
>>> type(output)
<class 'str'>
>>> print(output)
<p>Hello world!</p>
<BLANKLINE>
>>> output = pandoc.write(doc, format="odt")
>>> type(output)
<class 'bytes'>
>>> output # doctest: +ELLIPSIS
b'PK...'
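
Because the return value is a str for text formats and bytes for binary ones, code that stores it later has to pick the matching file mode. A minimal sketch, assuming a hypothetical helper named save_output that is not part of the library:

```python
import os
import tempfile

def save_output(output, filename):
    """Save a str in text mode (utf-8) and bytes in binary mode."""
    if isinstance(output, bytes):
        with open(filename, "wb") as file:
            file.write(output)
    else:
        with open(filename, "w", encoding="utf-8") as file:
            file.write(output)

# Stand-in values shaped like the outputs shown above.
directory = tempfile.mkdtemp()
html_path = os.path.join(directory, "doc.html")
odt_path = os.path.join(directory, "doc.odt")
save_output("<p>Hello world!</p>\n", html_path)
save_output(b"PK\x03\x04", odt_path)
```

When the destination is known up front, passing file= to write avoids the need for such a helper.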

Write documents to files:

>>> _ = pandoc.write(doc, file="doc.md")
>>> open("doc.md", encoding="utf-8").read()
'Hello world!\n'
>>> _ = pandoc.write(doc, file="doc.html")
>>> open("doc.html", encoding="utf-8").read()
'<p>Hello world!</p>\n'
>>> _ = pandoc.write(doc, file="doc.pdf")
>>> open("doc.pdf", "rb").read() # doctest: +ELLIPSIS
b'%PDF...'

Use extra pandoc options:

>>> output = pandoc.write(
...     doc, 
...     format="html", 
...     options=["--standalone", "-V", "lang=en"]
... )
>>> print(output) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<body>
<p>Hello world!</p>
</body>
</html>

iter(elt, path=False)

Iterate on document elements in document order.

Arguments

  • elt: a pandoc item (or more generally any Python object),

  • path: a boolean; defaults to False.

Returns

  • iterator: a depth-first tree iterator.

  • elt_path (when path==True): the iterator then yields (elt, elt_path) pairs, where elt_path is a list of (parent, index) pairs leading from the root to elt.

Usage

This iterator may be used as a general-purpose tree iterator:

>>> tree = [1, [2, [3]]]
>>> for elt in pandoc.iter(tree):
...     print(elt)
[1, [2, [3]]]
1
[2, [3]]
2
[3]
3

Non-iterable objects yield themselves:

>>> root = 1
>>> for elt in pandoc.iter(root):
...     print(elt)
1

But it is really meant to be used with pandoc objects:

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Meta({})
{}
[Para([Str('Hello'), Space(), Str('world!')])]
Para([Str('Hello'), Space(), Str('world!')])
[Str('Hello'), Space(), Str('world!')]
Str('Hello')
Hello
Space()
Str('world!')
world!

Two gotchas: characters in strings are not iterated (strings are considered "atomic"):

>>> root = "Hello world!"
>>> for elt in pandoc.iter(root):
...     print(elt)
Hello world!

and dicts yield their key-value pairs (and not only their keys):

>>> root = {"a": 1, "b": 2}
>>> for elt in pandoc.iter(root):
...     print(elt)
{'a': 1, 'b': 2}
('a', 1)
a
1
('b', 2)
b
2
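
The traversal rules shown so far (depth-first order, atomic strings, dicts yielding key-value pairs, non-iterable objects yielding themselves) can be reproduced on plain Python objects. The following is a minimal sketch of those rules, not the library's implementation; walk is a hypothetical name.

```python
def walk(elt):
    """Yield elt, then recursively everything it contains, depth-first."""
    yield elt
    if isinstance(elt, dict):
        children = elt.items()  # key-value pairs, not only the keys
    elif isinstance(elt, (str, bytes)):
        children = ()  # strings are treated as atomic
    else:
        try:
            children = iter(elt)
        except TypeError:
            children = ()  # non-iterable objects have no children
    for child in children:
        yield from walk(child)

for item in walk([1, [2, [3]]]):
    print(item)
```

Treating strings as atomic is also what keeps the recursion finite, since iterating a one-character string yields that same string again.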

Use path=True when you need to locate the element in the document. You can get the element parent and index within this parent as path[-1], the grand-parent and the index of the parent within the grand-parent as path[-2], etc. up to the document root.

>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> world = Str("world!")
>>> for elt, path in pandoc.iter(doc, path=True): # find the path to Str("world!")
...     if elt == world:
...         break
>>> for elt, index in path:
...     print(f"At index {index} in {elt}:")
... else:
...     print(world)
At index 1 in Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])]):
At index 0 in [Para([Str('Hello'), Space(), Str('world!')])]:
At index 0 in Para([Str('Hello'), Space(), Str('world!')]):
At index 2 in [Str('Hello'), Space(), Str('world!')]:
Str('world!')
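
Since path[-1] holds the parent and the index of the element within it, a path is enough to replace an element in place. The sketch below illustrates the idea on nested lists only; walk_with_path is a hypothetical stand-in written for this example, not the library's iterator.

```python
def walk_with_path(elt, path=None):
    """Yield (elt, path) pairs; path is a list of (parent, index) pairs."""
    path = [] if path is None else path
    yield elt, path
    if isinstance(elt, list):  # only lists are traversed, for brevity
        for index, child in enumerate(elt):
            yield from walk_with_path(child, path + [(elt, index)])

tree = ["a", ["b", "c"]]

# Collect the paths first, then mutate, so the tree is not modified
# while it is being traversed.
matches = [path for elt, path in walk_with_path(tree) if elt == "c"]
for path in matches:
    parent, index = path[-1]
    parent[index] = "C"

print(tree)
```

Collecting matches before mutating is a conservative choice; it keeps the traversal independent of the edits being made.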

See also

Refer to the Tree iteration section.

configure(auto=False, path=None, version=None, pandoc_types_version=None, read=False, reset=False)

Arguments

  • auto: a boolean; defaults to False; set to True to infer the configuration from the pandoc in your path.

  • path: the path to the pandoc executable, such as "/usr/bin/pandoc".

  • version: the pandoc command-line tool version, such as "2.14.2".

  • pandoc_types_version: the pandoc-types version, such as "1.22.1".

  • read: a boolean; defaults to False; set to True to return the configuration dictionary.

  • reset: a boolean; defaults to False; set to True to delete the current configuration.

Returns

  • configuration (if read==True): the configuration dictionary, with entries "auto", "path", "version" and "pandoc_types_version".

Usage

The configuration step is triggered when you import pandoc.types or call pandoc.read or pandoc.write; it then automatically infers the configuration from the pandoc executable found in the path (or fails).

>>> config = pandoc.configure(read=True)
>>> config # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': True, 
 'path': ..., 
 'version': '2.18', 
 'pandoc_types_version': '1.22.2'}

To avoid this, call pandoc.configure(...) yourself beforehand. Alternatively, manually select your pandoc executable afterwards:

>>> pandoc.configure(reset=True)
>>> pandoc.configure(read=True) is None
True
>>> config["auto"] = False
>>> pandoc.configure(**config)
>>> pandoc.configure(read=True) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': False, 
 'path': ..., 
 'version': '2.18', 
 'pandoc_types_version': '1.22.2'}    
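
The values that the path and version arguments describe can be gathered with the standard library alone. This is a hedged sketch: find_pandoc is a hypothetical helper, and it assumes that the first line of `pandoc --version` ends with the version number (for instance "pandoc 2.18"). It does not determine pandoc_types_version.

```python
import shutil
import subprocess

def find_pandoc():
    """Return {"path": ..., "version": ...} for the pandoc found in the
    path, or None if no executable or no version string is found."""
    path = shutil.which("pandoc")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    words = lines[0].split() if lines else []
    if not words:
        return None
    # Assumption: the version is the last word of the first line.
    return {"path": path, "version": words[-1]}

print(find_pandoc())
```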

See also

Refer to the Configuration section.

pandoc.types

AlignCenter

Concrete data type

Signature

<class 'pandoc.types.AlignCenter'>

AlignDefault

Concrete data type

Signature

<class 'pandoc.types.AlignDefault'>

AlignLeft

Concrete data type

Signature

<class 'pandoc.types.AlignLeft'>

AlignRight

Concrete data type

Signature

<class 'pandoc.types.AlignRight'>

Alignment

Abstract data type

Signature

<class 'pandoc.types.Alignment'>

See also

AlignCenter, AlignDefault, AlignLeft, AlignRight.

Attr

Typedef

Signature

<class 'pandoc.types.Attr'>

See also

String.

AuthorInText

Concrete data type

Signature

<class 'pandoc.types.AuthorInText'>

Block

Abstract data type

Signature

<class 'pandoc.types.Block'>

See also

Alignment, Attr, BlockQuote, BulletList, CodeBlock, DefinitionList, Div, Double, Format, Header, HorizontalRule, Inline, Int, LineBlock, ListAttributes, Null, OrderedList, Para, Plain, RawBlock, String, Table, TableCell.

BlockQuote

Concrete data type

Signature

<class 'pandoc.types.BlockQuote'>

See also

Block.

Bool

Primitive type

Signature

bool

BulletList

Concrete data type

Signature

<class 'pandoc.types.BulletList'>

See also

Block.

Citation

Concrete data type

Signature

<class 'pandoc.types.Citation'>

See also

CitationMode, Inline, Int, String.

CitationMode

Abstract data type

Signature

<class 'pandoc.types.CitationMode'>

See also

AuthorInText, NormalCitation, SuppressAuthor.

Cite

Concrete data type

Signature

<class 'pandoc.types.Cite'>

See also

Citation, Inline.

Code

Concrete data type

Signature

<class 'pandoc.types.Code'>

See also

Attr, String.

CodeBlock

Concrete data type

Signature

<class 'pandoc.types.CodeBlock'>

See also

Attr, String.

Decimal

Concrete data type

Signature

<class 'pandoc.types.Decimal'>

DefaultDelim

Concrete data type

Signature

<class 'pandoc.types.DefaultDelim'>

DefaultStyle

Concrete data type

Signature

<class 'pandoc.types.DefaultStyle'>

DefinitionList

Concrete data type

Signature

<class 'pandoc.types.DefinitionList'>

See also

Block, Inline.

DisplayMath

Concrete data type

Signature

<class 'pandoc.types.DisplayMath'>

Div

Concrete data type

Signature

<class 'pandoc.types.Div'>

See also

Attr, Block.

Double

Primitive type

Signature

float

DoubleQuote

Concrete data type

Signature

<class 'pandoc.types.DoubleQuote'>

Emph

Concrete data type

Signature

<class 'pandoc.types.Emph'>

See also

Inline.

Example

Concrete data type

Signature

<class 'pandoc.types.Example'>

Format

Concrete data type

Signature

<class 'pandoc.types.Format'>

See also

String.

Header

Concrete data type

Signature

<class 'pandoc.types.Header'>

See also

Attr, Inline, Int.

HorizontalRule

Concrete data type

Signature

<class 'pandoc.types.HorizontalRule'>

Image

Concrete data type

Signature

<class 'pandoc.types.Image'>

See also

Attr, Inline, Target.

Inline

Abstract data type

Signature

<class 'pandoc.types.Inline'>

See also

Attr, Block, Citation, Cite, Code, Emph, Format, Image, LineBreak, Link, Math, MathType, Note, QuoteType, Quoted, RawInline, SmallCaps, SoftBreak, Space, Span, Str, Strikeout, String, Strong, Subscript, Superscript, Target.

InlineMath

Concrete data type

Signature

<class 'pandoc.types.InlineMath'>

Int

Primitive type

Signature

int

LineBlock

Concrete data type

Signature

<class 'pandoc.types.LineBlock'>

See also

Inline.

LineBreak

Concrete data type

Signature

<class 'pandoc.types.LineBreak'>

Link

Concrete data type

Signature

<class 'pandoc.types.Link'>

See also

Attr, Inline, Target.

ListAttributes

Typedef

Signature

<class 'pandoc.types.ListAttributes'>

See also

Int, ListNumberDelim, ListNumberStyle.

ListNumberDelim

Abstract data type

Signature

<class 'pandoc.types.ListNumberDelim'>

See also

DefaultDelim, OneParen, Period, TwoParens.

ListNumberStyle

Abstract data type

Signature

<class 'pandoc.types.ListNumberStyle'>

See also

Decimal, DefaultStyle, Example, LowerAlpha, LowerRoman, UpperAlpha, UpperRoman.

LowerAlpha

Concrete data type

Signature

<class 'pandoc.types.LowerAlpha'>

LowerRoman

Concrete data type

Signature

<class 'pandoc.types.LowerRoman'>

Math

Concrete data type

Signature

<class 'pandoc.types.Math'>

See also

MathType, String.

MathType

Abstract data type

Signature

<class 'pandoc.types.MathType'>

See also

DisplayMath, InlineMath.

Meta

Concrete data type

Signature

<class 'pandoc.types.Meta'>

See also

MetaValue, String.

MetaBlocks

Concrete data type

Signature

<class 'pandoc.types.MetaBlocks'>

See also

Block.

MetaBool

Concrete data type

Signature

<class 'pandoc.types.MetaBool'>

See also

Bool.

MetaInlines

Concrete data type

Signature

<class 'pandoc.types.MetaInlines'>

See also

Inline.

MetaList

Concrete data type

Signature

<class 'pandoc.types.MetaList'>

See also

MetaValue.

MetaMap

Concrete data type

Signature

<class 'pandoc.types.MetaMap'>

See also

MetaValue, String.

MetaString

Concrete data type

Signature

<class 'pandoc.types.MetaString'>

See also

String.

MetaValue

Abstract data type

Signature

<class 'pandoc.types.MetaValue'>

See also

Block, Bool, Inline, MetaBlocks, MetaBool, MetaInlines, MetaList, MetaMap, MetaString, String.

NormalCitation

Concrete data type

Signature

<class 'pandoc.types.NormalCitation'>

Note

Concrete data type

Signature

<class 'pandoc.types.Note'>

See also

Block.

Null

Concrete data type

Signature

<class 'pandoc.types.Null'>

OneParen

Concrete data type

Signature

<class 'pandoc.types.OneParen'>

OrderedList

Concrete data type

Signature

<class 'pandoc.types.OrderedList'>

See also

Block, ListAttributes.

Pandoc

Concrete data type

Signature

<class 'pandoc.types.Pandoc'>

See also

Block, Meta.

Para

Concrete data type

Signature

<class 'pandoc.types.Para'>

See also

Inline.

Period

Concrete data type

Signature

<class 'pandoc.types.Period'>

Plain

Concrete data type

Signature

<class 'pandoc.types.Plain'>

See also

Inline.

QuoteType

Abstract data type

Signature

<class 'pandoc.types.QuoteType'>

See also

DoubleQuote, SingleQuote.

Quoted

Concrete data type

Signature

<class 'pandoc.types.Quoted'>

See also

Inline, QuoteType.

RawBlock

Concrete data type

Signature

<class 'pandoc.types.RawBlock'>

See also

Format, String.

RawInline

Concrete data type

Signature

<class 'pandoc.types.RawInline'>

See also

Format, String.

SingleQuote

Concrete data type

Signature

<class 'pandoc.types.SingleQuote'>

SmallCaps

Concrete data type

Signature

<class 'pandoc.types.SmallCaps'>

See also

Inline.

SoftBreak

Concrete data type

Signature

<class 'pandoc.types.SoftBreak'>

Space

Concrete data type

Signature

<class 'pandoc.types.Space'>

Span

Concrete data type

Signature

<class 'pandoc.types.Span'>

See also

Attr, Inline.

Str

Concrete data type

Signature

<class 'pandoc.types.Str'>

See also

String.

Strikeout

Concrete data type

Signature

<class 'pandoc.types.Strikeout'>

See also

Inline.

String

Primitive type

Signature

str

Strong

Concrete data type

Signature

<class 'pandoc.types.Strong'>

See also

Inline.

Subscript

Concrete data type

Signature

<class 'pandoc.types.Subscript'>

See also

Inline.

Superscript

Concrete data type

Signature

<class 'pandoc.types.Superscript'>

See also

Inline.

SuppressAuthor

Concrete data type

Signature

<class 'pandoc.types.SuppressAuthor'>

Table

Concrete data type

Signature

<class 'pandoc.types.Table'>

See also

Alignment, Double, Inline, TableCell.

TableCell

Typedef

Signature

<class 'pandoc.types.TableCell'>

See also

Block.

Target

Typedef

Signature

<class 'pandoc.types.Target'>

See also

String.

TwoParens

Concrete data type

Signature

<class 'pandoc.types.TwoParens'>

UpperAlpha

Concrete data type

Signature

<class 'pandoc.types.UpperAlpha'>

UpperRoman

Concrete data type

Signature

<class 'pandoc.types.UpperRoman'>

  1. Refer to Pandoc's heuristics for the gory details of this inference.