API Reference
import pandoc
from pandoc.types import *
pandoc
read(source=None, file=None, format=None, options=None)
Read a source document.
The source document must be specified by either source or file.
Implicitly, the document format is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed by default. The input format can also be specified explicitly.
Extra options can be passed to the pandoc command-line tool.
Arguments
- source: the document content, as a string or as utf-8 encoded bytes.
- file: the document, provided as a file or filename.
- format: the document format (such as "markdown", "odt", "docx", "html", etc.). Refer to Pandoc's README for the list of supported input formats.
- options: additional pandoc options (a list of strings). Refer to Pandoc's user guide for a complete list of options.
Returns
- doc: the document, as a Pandoc object.
Usage
Read documents from strings:
>>> markdown = "Hello world!"
>>> pandoc.read(markdown)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> html = "<p>Hello world!</p>"
>>> pandoc.read(html, format="html")
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Read documents from files:
>>> filename = "doc.html"
>>> with open(filename, "w", encoding="utf-8") as file:
...     _ = file.write(html)
>>> pandoc.read(file=filename) # html format inferred from filename
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> file = open(filename, encoding="utf-8")
>>> pandoc.read(file=file, format="html") # but here it must be explicit
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Use extra pandoc options:
>>> pandoc.read(markdown, options=["-M", "id=hello"]) # add metadata
Pandoc(Meta({'id': MetaString('hello')}), [Para([Str('Hello'), Space(), Str('world!')])])
write(doc, file=None, format=None, options=None)
Write a pandoc document (or document fragment) to a file and return its contents.
Inline document fragments are automatically wrapped into a Plain block; block document fragments are automatically wrapped into a Pandoc element with no metadata.
Implicitly, the document format is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed by default. The output format can also be specified explicitly. Extra options can be passed to the pandoc command-line tool.
Arguments
- doc: a Pandoc object or a document fragment (Inline, [Inline], MetaInlines, Block, [Block] or MetaBlocks).
- file: a file, filename or None.
- format: the document format (such as "markdown", "odt", "docx", "html", etc.). Refer to Pandoc's README for the list of supported output formats.
- options: additional pandoc options (a list of strings). Refer to Pandoc's user guide for a complete list of options.
Returns
- output: the output document, as a string or as a byte sequence. Bytes are only used for binary output formats (docx, pptx, etc.).
Usage
Write documents to markdown strings:
>>> doc = pandoc.read("Hello world!")
>>> doc
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> print(pandoc.write(doc)) # doctest: +NORMALIZE_WHITESPACE
Hello world!
Write document fragments to markdown strings:
>>> md = lambda elt: print(pandoc.write(elt))
>>> md(Str("Hello!")) # doctest: +NORMALIZE_WHITESPACE
Hello!
>>> md([Str('Hello'), Space(), Str('world!')]) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(Para([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md([ # doctest: +NORMALIZE_WHITESPACE
... Para([Str('Hello'), Space(), Str('world!')]),
... Para([Str('Hello'), Space(), Str('world!')])
... ])
Hello world!
<BLANKLINE>
Hello world!
>>> md(MetaInlines([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(MetaBlocks([ # doctest: +NORMALIZE_WHITESPACE
... Para([Str('Hello'), Space(), Str('world!')]),
... Para([Str('Hello'), Space(), Str('world!')])
... ]))
Hello world!
<BLANKLINE>
Hello world!
Use alternate (text or binary) output formats:
>>> output = pandoc.write(doc, format="html") # html output
>>> type(output)
<class 'str'>
>>> print(output)
<p>Hello world!</p>
<BLANKLINE>
>>> output = pandoc.write(doc, format="odt")
>>> type(output)
<class 'bytes'>
>>> output # doctest: +ELLIPSIS
b'PK...'
Write documents to files:
>>> _ = pandoc.write(doc, file="doc.md")
>>> open("doc.md", encoding="utf-8").read()
'Hello world!\n'
>>> _ = pandoc.write(doc, file="doc.html")
>>> open("doc.html", encoding="utf-8").read()
'<p>Hello world!</p>\n'
>>> _ = pandoc.write(doc, file="doc.pdf")
>>> open("doc.pdf", "rb").read() # doctest: +ELLIPSIS
b'%PDF...'
Use extra pandoc options:
>>> output = pandoc.write(
... doc,
... format="html",
... options=["--standalone", "-V", "lang=en"]
... )
>>> print(output) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<body>
<p>Hello world!</p>
</body>
</html>
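Since the returned output is either a string or a byte sequence, code that keeps the result in memory and stores it later has to handle both cases. The following is a minimal sketch of that dispatch; save_output is a hypothetical helper, and the literal values stand in for real pandoc output.

```python
import os
import tempfile

def save_output(output, filename):
    """Hypothetical helper: write str output as UTF-8 text, bytes output unchanged."""
    if isinstance(output, bytes):
        with open(filename, "wb") as stream:
            stream.write(output)
    else:
        with open(filename, "w", encoding="utf-8") as stream:
            stream.write(output)

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "doc.html")
    binary_path = os.path.join(folder, "doc.odt")
    save_output("<p>Hello world!</p>\n", text_path)  # stands in for html output
    save_output(b"PK\x03\x04", binary_path)           # stands in for odt output
    with open(text_path, encoding="utf-8") as stream:
        text_roundtrip = stream.read()
    with open(binary_path, "rb") as stream:
        binary_roundtrip = stream.read()

print(repr(text_roundtrip))
print(repr(binary_roundtrip))
```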
iter(elt, path=False)
Iterate on document elements in document order.
Arguments
- elt: a pandoc item (or more generally any Python object).
- path: a boolean; defaults to False.
Returns
- iterator: a depth-first tree iterator.
- elt_path (when path==True): a list of (elt, index) pairs.
Usage
This iterator may be used as a general-purpose tree iterator:
>>> tree = [1, [2, [3]]]
>>> for elt in pandoc.iter(tree):
...     print(elt)
[1, [2, [3]]]
1
[2, [3]]
2
[3]
3
Non-iterable objects yield themselves:
>>> root = 1
>>> for elt in pandoc.iter(root):
...     print(elt)
1
But it is really meant to be used with pandoc objects:
>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Meta({})
{}
[Para([Str('Hello'), Space(), Str('world!')])]
Para([Str('Hello'), Space(), Str('world!')])
[Str('Hello'), Space(), Str('world!')]
Str('Hello')
Hello
Space()
Str('world!')
world!
Two gotchas: characters in strings are not iterated (strings are considered "atomic")
>>> root = "Hello world!"
>>> for elt in pandoc.iter(root):
...     print(elt)
Hello world!
and dicts yield their key-value pairs (and not only their keys):
>>> root = {"a": 1, "b": 2}
>>> for elt in pandoc.iter(root):
...     print(elt)
{'a': 1, 'b': 2}
('a', 1)
a
1
('b', 2)
b
2
Use path=True when you need to locate the element in the document. You can get the element parent and index within this parent as path[-1], the grand-parent and the index of the parent within the grand-parent as path[-2], etc. up to the document root.
>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> world = Str("world!")
>>> for elt, path in pandoc.iter(doc, path=True): # find the path to Str("world!")
...     if elt == world:
...         break
>>> for elt, index in path:
...     print(f"At index {index} in {elt}:")
... else:
...     print(world)
At index 1 in Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])]):
At index 0 in [Para([Str('Hello'), Space(), Str('world!')])]:
At index 0 in Para([Str('Hello'), Space(), Str('world!')]):
At index 2 in [Str('Hello'), Space(), Str('world!')]:
Str('world!')
See also
Refer to the Tree iteration section.
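The traversal rules described above (document order, atomic strings, dicts yielding key-value pairs, paths made of (elt, index) pairs) can be reproduced on plain Python containers. The function iter_tree below is a hypothetical stand-in written for illustration; it is a sketch of the behavior shown in the examples, not the library's implementation.

```python
def iter_tree(elt, path=False):
    """Depth-first, document-order traversal (illustrative sketch)."""
    def walk(node, trail):
        # Yield the node itself first, then its children.
        yield (node, trail) if path else node
        if isinstance(node, (str, bytes)):
            return  # strings are atomic: characters are not visited
        if isinstance(node, dict):
            children = node.items()  # key-value pairs, not only keys
        else:
            try:
                children = iter(node)
            except TypeError:
                return  # non-iterable objects have no children
        for index, child in enumerate(children):
            # The child's path is the parent's path plus (parent, index).
            yield from walk(child, trail + [(node, index)])
    return walk(elt, [])

tree = [1, [2, [3]]]
print(list(iter_tree(tree)))
# [[1, [2, [3]]], 1, [2, [3]], 2, [3], 3]

for elt, trail in iter_tree(tree, path=True):
    if elt == 3:
        break
print([index for _, index in trail])
# [1, 1, 0]
```

Here trail[-1] holds the parent of the element found and its index within that parent, in the same way as path[-1] in the text above.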
configure(auto=False, path=None, version=None, pandoc_types_version=None, read=False, reset=False)
Arguments
- auto: a boolean; defaults to False; set to True to infer the configuration from the pandoc in your path.
- path: the path to the pandoc executable, such as "/usr/bin/pandoc".
- version: the pandoc command-line tool version, such as "2.14.2".
- pandoc_types_version: the pandoc-types version, such as "1.22.1".
- read: a boolean; defaults to False; set to True to return the configuration dictionary.
- reset: a boolean; defaults to False; set to True to delete the current configuration.
Returns
- configuration (if read==True): the configuration dictionary, with entries "auto", "path", "version" and "pandoc_types_version".
Usage
The configuration step is triggered when you import pandoc.types or call pandoc.read or pandoc.write; it automatically infers the configuration from the pandoc executable found in the path (or fails).
>>> config = pandoc.configure(read=True)
>>> config # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': True,
'path': ...,
'version': '3.2.1',
'pandoc_types_version': '1.23.1'}
To avoid this automatic configuration, call pandoc.configure(...) yourself beforehand.
Alternatively, manually select your pandoc executable afterwards:
>>> pandoc.configure(reset=True)
>>> pandoc.configure(read=True) is None
True
>>> config["auto"] = False
>>> pandoc.configure(**config)
>>> pandoc.configure(read=True) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': False,
'path': ...,
'version': '3.2.1',
'pandoc_types_version': '1.23.1'}
See also
Refer to the Configuration section.
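When configuring manually, the path and version values have to come from somewhere. The sketch below shows one possible way to gather them with the standard library only: shutil.which locates an executable named pandoc, and parse_pandoc_version (a hypothetical helper) reads the version from the first line of `pandoc --version` output. The sample text is an abbreviated stand-in, and "pandoc_types_version" is left out of this sketch.

```python
import shutil

def parse_pandoc_version(version_output):
    """Hypothetical helper: return the second word of the first output line."""
    first_line = version_output.splitlines()[0]
    return first_line.split()[1]

# Abbreviated sample of `pandoc --version` output (assumption of this sketch).
sample = "pandoc 3.2.1\nFeatures: +server +lua\n"

config = {
    "auto": False,
    "path": shutil.which("pandoc"),  # None when no pandoc is in the path
    "version": parse_pandoc_version(sample),
}
print(config["version"])  # 3.2.1
```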
pandoc.types
AlignCenter
Concrete data type
Signature
AlignCenter()
AlignDefault
Concrete data type
Signature
AlignDefault()
AlignLeft
Concrete data type
Signature
AlignLeft()
AlignRight
Concrete data type
Signature
AlignRight()
Alignment
Abstract data type
Signature
Alignment = AlignLeft()
| AlignRight()
| AlignCenter()
| AlignDefault()
See also
AlignCenter, AlignDefault, AlignLeft, AlignRight.
AuthorInText
Concrete data type
Signature
AuthorInText()
Block
Abstract data type
Signature
Block = Plain([Inline])
| Para([Inline])
| LineBlock([[Inline]])
| CodeBlock(Attr, Text)
| RawBlock(Format, Text)
| BlockQuote([Block])
| OrderedList(ListAttributes, [[Block]])
| BulletList([[Block]])
| DefinitionList([([Inline], [[Block]])])
| Header(Int, Attr, [Inline])
| HorizontalRule()
| Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)
| Figure(Attr, Caption, [Block])
| Div(Attr, [Block])
See also
Attr, BlockQuote, BulletList, Caption, CodeBlock, ColSpec, DefinitionList, Div, Figure, Format, Header, HorizontalRule, Inline, Int, LineBlock, ListAttributes, OrderedList, Para, Plain, RawBlock, Table, TableBody, TableFoot, TableHead, Text.
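The signature notation used in this section lists, for each abstract data type, the constructors that produce its values. One way to picture this in plain Python is an abstract base class with one subclass per constructor, so that isinstance tests select the cases of interest. The classes below are simplified stand-ins written for this sketch (inlines are plain strings here); they are not the classes exported by pandoc.types.

```python
class Block:
    """Stand-in for an abstract type: it only groups its constructors."""

class Para(Block):
    def __init__(self, inlines):
        self.inlines = inlines

class HorizontalRule(Block):
    """Stand-in for a constructor without arguments."""

class BlockQuote(Block):
    def __init__(self, blocks):
        self.blocks = blocks

def count_paragraphs(blocks):
    """Count Para elements, descending into block quotes."""
    total = 0
    for block in blocks:
        if isinstance(block, Para):
            total += 1
        elif isinstance(block, BlockQuote):
            total += count_paragraphs(block.blocks)
    return total

blocks = [Para(["Hello"]), HorizontalRule(), BlockQuote([Para(["world!"])])]
print(count_paragraphs(blocks))  # 2
```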
Bool
Primitive type
Signature
bool
Caption
Concrete data type
Signature
Caption(ShortCaption or None, [Block])
See also
Block, ShortCaption.
Cell
Concrete data type
Signature
Cell(Attr, Alignment, RowSpan, ColSpan, [Block])
See also
Alignment, Attr, Block, ColSpan, RowSpan.
Citation
Concrete data type
Signature
Citation(Text, [Inline], [Inline], CitationMode, Int, Int)
See also
CitationMode, Inline, Int, Text.
CitationMode
Abstract data type
Signature
CitationMode = AuthorInText()
| SuppressAuthor()
| NormalCitation()
See also
AuthorInText, NormalCitation, SuppressAuthor.
ColWidth
Abstract data type
Signature
ColWidth = ColWidth_(Double)
| ColWidthDefault()
See also
ColWidthDefault, ColWidth_, Double.
ColWidthDefault
Concrete data type
Signature
ColWidthDefault()
Decimal
Concrete data type
Signature
Decimal()
DefaultDelim
Concrete data type
Signature
DefaultDelim()
DefaultStyle
Concrete data type
Signature
DefaultStyle()
DefinitionList
Concrete data type
Signature
DefinitionList([([Inline], [[Block]])])
See also
Block, Inline.
DisplayMath
Concrete data type
Signature
DisplayMath()
Double
Primitive type
Signature
float
DoubleQuote
Concrete data type
Signature
DoubleQuote()
Example
Concrete data type
Signature
Example()
HorizontalRule
Concrete data type
Signature
HorizontalRule()
Inline
Abstract data type
Signature
Inline = Str(Text)
| Emph([Inline])
| Underline([Inline])
| Strong([Inline])
| Strikeout([Inline])
| Superscript([Inline])
| Subscript([Inline])
| SmallCaps([Inline])
| Quoted(QuoteType, [Inline])
| Cite([Citation], [Inline])
| Code(Attr, Text)
| Space()
| SoftBreak()
| LineBreak()
| Math(MathType, Text)
| RawInline(Format, Text)
| Link(Attr, [Inline], Target)
| Image(Attr, [Inline], Target)
| Note([Block])
| Span(Attr, [Inline])
See also
Attr, Block, Citation, Cite, Code, Emph, Format, Image, LineBreak, Link, Math, MathType, Note, QuoteType, Quoted, RawInline, SmallCaps, SoftBreak, Space, Span, Str, Strikeout, Strong, Subscript, Superscript, Target, Text, Underline.
InlineMath
Concrete data type
Signature
InlineMath()
Int
Primitive type
Signature
int
LineBreak
Concrete data type
Signature
LineBreak()
ListAttributes
Typedef
Signature
ListAttributes = (Int, ListNumberStyle, ListNumberDelim)
See also
Int, ListNumberDelim, ListNumberStyle.
ListNumberDelim
Abstract data type
Signature
ListNumberDelim = DefaultDelim()
| Period()
| OneParen()
| TwoParens()
See also
DefaultDelim, OneParen, Period, TwoParens.
ListNumberStyle
Abstract data type
Signature
ListNumberStyle = DefaultStyle()
| Example()
| Decimal()
| LowerRoman()
| UpperRoman()
| LowerAlpha()
| UpperAlpha()
See also
Decimal, DefaultStyle, Example, LowerAlpha, LowerRoman, UpperAlpha, UpperRoman.
LowerAlpha
Concrete data type
Signature
LowerAlpha()
LowerRoman
Concrete data type
Signature
LowerRoman()
MathType
Abstract data type
Signature
MathType = DisplayMath()
| InlineMath()
See also
DisplayMath, InlineMath.
MetaValue
Abstract data type
Signature
MetaValue = MetaMap({Text: MetaValue})
| MetaList([MetaValue])
| MetaBool(Bool)
| MetaString(Text)
| MetaInlines([Inline])
| MetaBlocks([Block])
See also
Block, Bool, Inline, MetaBlocks, MetaBool, MetaInlines, MetaList, MetaMap, MetaString, Text.
NormalCitation
Concrete data type
Signature
NormalCitation()
OneParen
Concrete data type
Signature
OneParen()
OrderedList
Concrete data type
Signature
OrderedList(ListAttributes, [[Block]])
See also
Block, ListAttributes.
Period
Concrete data type
Signature
Period()
QuoteType
Abstract data type
Signature
QuoteType = SingleQuote()
| DoubleQuote()
See also
DoubleQuote, SingleQuote.
SingleQuote
Concrete data type
Signature
SingleQuote()
SoftBreak
Concrete data type
Signature
SoftBreak()
Space
Concrete data type
Signature
Space()
String
Primitive type
Signature
str
SuppressAuthor
Concrete data type
Signature
SuppressAuthor()
Table
Concrete data type
Signature
Table(Attr, Caption, [ColSpec], TableHead, [TableBody], TableFoot)
See also
Attr, Caption, ColSpec, TableBody, TableFoot, TableHead.
TableBody
Concrete data type
Signature
TableBody(Attr, RowHeadColumns, [Row], [Row])
See also
Attr, Row, RowHeadColumns.
Text
Primitive type
Signature
str
TwoParens
Concrete data type
Signature
TwoParens()
UpperAlpha
Concrete data type
Signature
UpperAlpha()
UpperRoman
Concrete data type
Signature
UpperRoman()
[1] Refer to Pandoc's heuristics for the gory details of this inference.