API Reference
import pandoc
from pandoc.types import *
pandoc
read(source=None, file=None, format=None, options=None)
Read a source document.
The source document must be specified by either source or file.
Unless the input format is specified explicitly, it is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed.
Extra options can be passed to the pandoc command-line tool.
Arguments
- source: the document content, as a string or as utf-8 encoded bytes.
- file: the document, provided as a file or filename.
- format: the document format (such as "markdown", "odt", "docx", "html", etc.). Refer to Pandoc's README for the list of supported input formats.
- options: additional pandoc options (a list of strings). Refer to Pandoc's user guide for a complete list of options.
Returns
- doc: the document, as a Pandoc object.
Usage
Read documents from strings:
>>> markdown = "Hello world!"
>>> pandoc.read(markdown)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> html = "<p>Hello world!</p>"
>>> pandoc.read(html, format="html")
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Read documents from files:
>>> filename = "doc.html"
>>> with open(filename, "w", encoding="utf-8") as file:
...     _ = file.write(html)
>>> pandoc.read(file=filename) # html format inferred from filename
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> file = open(filename, encoding="utf-8")
>>> pandoc.read(file=file, format="html") # but here it must be explicit
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Use extra pandoc options:
>>> pandoc.read(markdown, options=["-M", "id=hello"]) # add metadata
Pandoc(Meta({'id': MetaString('hello')}), [Para([Str('Hello'), Space(), Str('world!')])])
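The format-selection rule described for read (an explicit format first, then the filename extension, then markdown) can be sketched in a few lines. This is only an illustration: guess_format and the EXTENSION_FORMATS table are hypothetical names, the table is deliberately incomplete, and pandoc's actual heuristics cover many more cases.

```python
from pathlib import Path

# Hypothetical, deliberately incomplete extension table (an assumption made
# for illustration; pandoc's own heuristics know many more extensions).
EXTENSION_FORMATS = {
    ".md": "markdown",
    ".html": "html",
    ".htm": "html",
    ".odt": "odt",
    ".docx": "docx",
    ".tex": "latex",
}


def guess_format(filename=None, format=None):
    """Return the explicit format, else one inferred from the extension, else markdown."""
    if format is not None:
        return format  # an explicit format always wins
    if filename is not None:
        extension = Path(filename).suffix.lower()
        if extension in EXTENSION_FORMATS:
            return EXTENSION_FORMATS[extension]
    return "markdown"  # default when nothing can be inferred


print(guess_format("doc.html"))                     # html
print(guess_format("notes.unknown"))                # markdown
print(guess_format("doc.html", format="markdown"))  # markdown
```

An open file object carries no extension for such a rule to use, which is consistent with the example above where the format must be given explicitly.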
write(doc, file=None, format=None, options=None)
Write a pandoc document (or document fragment) to a file and return its contents.
Inline document fragments are automatically wrapped into a Plain block; block document fragments are automatically wrapped into a Pandoc element with no metadata.
Unless the output format is specified explicitly, it is inferred from the filename extension when possible [1]; otherwise the markdown format is assumed. Extra options can be passed to the pandoc command-line tool.
Arguments
- doc: a Pandoc object or a document fragment (Inline, [Inline], MetaInlines, Block, [Block] or MetaBlocks).
- file: a file, filename or None.
- format: the document format (such as "markdown", "odt", "docx", "html", etc.). Refer to Pandoc's README for the list of supported output formats.
- options: additional pandoc options (a list of strings). Refer to Pandoc's user guide for a complete list of options.
Returns
- output: the output document, as a string or as a byte sequence. Bytes are only used for binary output formats (docx, odt, pptx, etc.).
Usage
Write documents to markdown strings:
>>> doc = pandoc.read("Hello world!")
>>> doc
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> print(pandoc.write(doc)) # doctest: +NORMALIZE_WHITESPACE
Hello world!
Write document fragments to markdown strings:
>>> md = lambda elt: print(pandoc.write(elt))
>>> md(Str("Hello!")) # doctest: +NORMALIZE_WHITESPACE
Hello!
>>> md([Str('Hello'), Space(), Str('world!')]) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(Para([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ])
Hello world!
<BLANKLINE>
Hello world!
>>> md(MetaInlines([Str('Hello'), Space(), Str('world!')])) # doctest: +NORMALIZE_WHITESPACE
Hello world!
>>> md(MetaBlocks([ # doctest: +NORMALIZE_WHITESPACE
...     Para([Str('Hello'), Space(), Str('world!')]),
...     Para([Str('Hello'), Space(), Str('world!')])
... ]))
Hello world!
<BLANKLINE>
Hello world!
Use alternate (text or binary) output formats:
>>> output = pandoc.write(doc, format="html") # html output
>>> type(output)
<class 'str'>
>>> print(output)
<p>Hello world!</p>
<BLANKLINE>
>>> output = pandoc.write(doc, format="odt")
>>> type(output)
<class 'bytes'>
>>> output # doctest: +ELLIPSIS
b'PK...'
Write documents to files:
>>> _ = pandoc.write(doc, file="doc.md")
>>> open("doc.md", encoding="utf-8").read()
'Hello world!\n'
>>> _ = pandoc.write(doc, file="doc.html")
>>> open("doc.html", encoding="utf-8").read()
'<p>Hello world!</p>\n'
>>> _ = pandoc.write(doc, file="doc.pdf")
>>> open("doc.pdf", "rb").read() # doctest: +ELLIPSIS
b'%PDF...'
Use extra pandoc options:
>>> output = pandoc.write(
...     doc,
...     format="html",
...     options=["--standalone", "-V", "lang=en"]
... )
>>> print(output) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
...
<body>
<p>Hello world!</p>
</body>
</html>
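Because write returns a string for text formats and bytes for binary ones, code that stores the returned value itself has to pick the matching file mode. A minimal sketch, assuming a hypothetical helper named save_output (it is not part of the pandoc API), could look like this:

```python
import os
import tempfile


def save_output(output, filename):
    """Write `output` to `filename`, choosing the file mode from its type."""
    if isinstance(output, bytes):
        # Binary formats come back as bytes.
        with open(filename, "wb") as file:
            file.write(output)
    else:
        # Text formats come back as str.
        with open(filename, "w", encoding="utf-8") as file:
            file.write(output)


# Stand-in values: in practice these would be results of pandoc.write.
with tempfile.TemporaryDirectory() as folder:
    save_output("<p>Hello world!</p>\n", os.path.join(folder, "doc.html"))
    save_output(b"PK\x03\x04", os.path.join(folder, "doc.odt"))
    print(sorted(os.listdir(folder)))
```

When the target is known in advance, passing file= to write directly, as in the examples above, avoids this step altogether.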
iter(elt, path=False)
Iterate over document elements in document order.
Arguments
- elt: a pandoc item (or more generally any Python object).
- path: a boolean; defaults to False.
Returns
- iterator: a depth-first tree iterator.
- elt_path (when path==True): a list of (elt, index) pairs.
Usage
This iterator may be used as a general-purpose tree iterator:
>>> tree = [1, [2, [3]]]
>>> for elt in pandoc.iter(tree):
...     print(elt)
[1, [2, [3]]]
1
[2, [3]]
2
[3]
3
Non-iterable objects yield themselves:
>>> root = 1
>>> for elt in pandoc.iter(root):
...     print(elt)
1
But it is really meant to be used with pandoc objects:
>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> for elt in pandoc.iter(doc):
...     print(elt)
Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
Meta({})
{}
[Para([Str('Hello'), Space(), Str('world!')])]
Para([Str('Hello'), Space(), Str('world!')])
[Str('Hello'), Space(), Str('world!')]
Str('Hello')
Hello
Space()
Str('world!')
world!
Two gotchas: characters in strings are not iterated (strings are considered "atomic"):
>>> root = "Hello world!"
>>> for elt in pandoc.iter(root):
...     print(elt)
Hello world!
and dicts yield their key-value pairs (and not only their keys):
>>> root = {"a": 1, "b": 2}
>>> for elt in pandoc.iter(root):
...     print(elt)
{'a': 1, 'b': 2}
('a', 1)
a
1
('b', 2)
b
2
Use path=True when you need to locate the element in the document.
You can get the element parent and index within this parent as path[-1], the grand-parent and the index of the parent within the grand-parent as path[-2], etc., up to the document root.
>>> doc = Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
>>> world = Str("world!")
>>> for elt, path in pandoc.iter(doc, path=True): # find the path to Str("world!")
...     if elt == world:
...         break
>>> for elt, index in path:
...     print(f"At index {index} in {elt}:")
... else:
...     print(world)
At index 1 in Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])]):
At index 0 in [Para([Str('Hello'), Space(), Str('world!')])]:
At index 0 in Para([Str('Hello'), Space(), Str('world!')]):
At index 2 in [Str('Hello'), Space(), Str('world!')]:
Str('world!')
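To make the traversal order and the role of path concrete, here is a simplified re-implementation for plain Python containers. It is a sketch under assumptions, not the library's code: walk is a hypothetical name, and it only mirrors the behaviour described above (depth-first order, atomic strings, dicts yielding key-value pairs, and a path made of (parent, index) pairs).

```python
def walk(elt, path=None):
    """Yield (elt, path) pairs depth-first, in document order."""
    path = [] if path is None else path
    yield elt, path
    if isinstance(elt, str):
        children = []            # strings are atomic
    elif isinstance(elt, dict):
        children = elt.items()   # key-value pairs, not only keys
    else:
        try:
            children = iter(elt)
        except TypeError:
            children = []        # non-iterable objects are leaves
    for index, child in enumerate(children):
        # Each child gets its own path: the parent's path plus (parent, index).
        yield from walk(child, path + [(elt, index)])


tree = [1, [2, [3]]]
print([elt for elt, _ in walk(tree)])
# [[1, [2, [3]]], 1, [2, [3]], 2, [3], 3]

# path[-1] holds the parent and the index within it, which is enough
# to replace an element in place.
for elt, path in walk(tree):
    if elt == 3:
        parent, index = path[-1]
        parent[index] = "three"
        break
print(tree)
# [1, [2, ['three']]]
```

The loop stops right after the replacement, so the tree is never modified while the traversal is still running.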
See also
Refer to the Tree iteration section.
configure(auto=False, path=None, version=None, pandoc_types_version=None, read=False, reset=False)
Arguments
- auto: a boolean; defaults to False; set to True to infer the configuration from the pandoc in your path.
- path: the path to the pandoc executable, such as "/usr/bin/pandoc".
- version: the pandoc command-line tool version, such as "2.14.2".
- pandoc_types_version: the pandoc-types version, such as "1.22.1".
- read: a boolean; defaults to False. Return the configuration dictionary.
- reset: a boolean; defaults to False. Delete the current configuration.
Returns
- configuration (if read==True): the configuration dictionary, with entries "auto", "path", "version" and "pandoc_types_version".
Usage
The configuration step is triggered when you import pandoc.types or call pandoc.read or pandoc.write. It automatically infers the configuration from the pandoc executable found in the path (or fails).
>>> config = pandoc.configure(read=True)
>>> config # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': True,
'path': ...,
'version': '2.18',
'pandoc_types_version': '1.22.2'}
To override this automatic step, call pandoc.configure(...) yourself beforehand.
Alternatively, select your pandoc executable manually afterwards:
>>> pandoc.configure(reset=True)
>>> pandoc.configure(read=True) is None
True
>>> config["auto"] = False
>>> pandoc.configure(**config)
>>> pandoc.configure(read=True) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'auto': False,
'path': ...,
'version': '2.18',
'pandoc_types_version': '1.22.2'}
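Inferring the configuration starts with finding a pandoc executable in the path. The lookup itself can be sketched with the standard library. Note that locate_executable is a hypothetical helper, not part of the pandoc API, and the library's actual detection logic may differ.

```python
import shutil


def locate_executable(name="pandoc"):
    """Return the full path of `name` as found in the PATH, or raise."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"no {name!r} executable found in the path")
    return path


try:
    print(locate_executable())
except RuntimeError as error:
    # Reached when pandoc is not installed or not in the PATH.
    print(error)
```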
See also
Refer to the Configuration section.
pandoc.types
AlignCenter
Concrete data type
Signature
<class 'pandoc.types.AlignCenter'>
AlignDefault
Concrete data type
Signature
<class 'pandoc.types.AlignDefault'>
AlignLeft
Concrete data type
Signature
<class 'pandoc.types.AlignLeft'>
AlignRight
Concrete data type
Signature
<class 'pandoc.types.AlignRight'>
Alignment
Abstract data type
Signature
<class 'pandoc.types.Alignment'>
See also
AlignCenter, AlignDefault, AlignLeft, AlignRight.
AuthorInText
Concrete data type
Signature
<class 'pandoc.types.AuthorInText'>
Block
Abstract data type
Signature
<class 'pandoc.types.Block'>
See also
Alignment, Attr, BlockQuote, BulletList, CodeBlock, DefinitionList, Div, Double, Format, Header, HorizontalRule, Inline, Int, LineBlock, ListAttributes, Null, OrderedList, Para, Plain, RawBlock, String, Table, TableCell.
Bool
Primitive type
Signature
bool
Citation
Concrete data type
Signature
<class 'pandoc.types.Citation'>
See also
CitationMode, Inline, Int, String.
CitationMode
Abstract data type
Signature
<class 'pandoc.types.CitationMode'>
See also
AuthorInText, NormalCitation, SuppressAuthor.
Decimal
Concrete data type
Signature
<class 'pandoc.types.Decimal'>
DefaultDelim
Concrete data type
Signature
<class 'pandoc.types.DefaultDelim'>
DefaultStyle
Concrete data type
Signature
<class 'pandoc.types.DefaultStyle'>
DefinitionList
Concrete data type
Signature
<class 'pandoc.types.DefinitionList'>
See also
DisplayMath
Concrete data type
Signature
<class 'pandoc.types.DisplayMath'>
Double
Primitive type
Signature
float
DoubleQuote
Concrete data type
Signature
<class 'pandoc.types.DoubleQuote'>
Example
Concrete data type
Signature
<class 'pandoc.types.Example'>
HorizontalRule
Concrete data type
Signature
<class 'pandoc.types.HorizontalRule'>
Inline
Abstract data type
Signature
<class 'pandoc.types.Inline'>
See also
Attr, Block, Citation, Cite, Code, Emph, Format, Image, LineBreak, Link, Math, MathType, Note, QuoteType, Quoted, RawInline, SmallCaps, SoftBreak, Space, Span, Str, Strikeout, String, Strong, Subscript, Superscript, Target.
InlineMath
Concrete data type
Signature
<class 'pandoc.types.InlineMath'>
Int
Primitive type
Signature
int
LineBreak
Concrete data type
Signature
<class 'pandoc.types.LineBreak'>
ListAttributes
Typedef
Signature
<class 'pandoc.types.ListAttributes'>
See also
ListNumberDelim
Abstract data type
Signature
<class 'pandoc.types.ListNumberDelim'>
See also
DefaultDelim, OneParen, Period, TwoParens.
ListNumberStyle
Abstract data type
Signature
<class 'pandoc.types.ListNumberStyle'>
See also
Decimal, DefaultStyle, Example, LowerAlpha, LowerRoman, UpperAlpha, UpperRoman.
LowerAlpha
Concrete data type
Signature
<class 'pandoc.types.LowerAlpha'>
LowerRoman
Concrete data type
Signature
<class 'pandoc.types.LowerRoman'>
MathType
Abstract data type
Signature
<class 'pandoc.types.MathType'>
See also
DisplayMath, InlineMath.
MetaValue
Abstract data type
Signature
<class 'pandoc.types.MetaValue'>
See also
Block, Bool, Inline, MetaBlocks, MetaBool, MetaInlines, MetaList, MetaMap, MetaString, String.
NormalCitation
Concrete data type
Signature
<class 'pandoc.types.NormalCitation'>
Null
Concrete data type
Signature
<class 'pandoc.types.Null'>
OneParen
Concrete data type
Signature
<class 'pandoc.types.OneParen'>
OrderedList
Concrete data type
Signature
<class 'pandoc.types.OrderedList'>
See also
Period
Concrete data type
Signature
<class 'pandoc.types.Period'>
QuoteType
Abstract data type
Signature
<class 'pandoc.types.QuoteType'>
See also
DoubleQuote, SingleQuote.
SingleQuote
Concrete data type
Signature
<class 'pandoc.types.SingleQuote'>
SoftBreak
Concrete data type
Signature
<class 'pandoc.types.SoftBreak'>
Space
Concrete data type
Signature
<class 'pandoc.types.Space'>
String
Primitive type
Signature
str
SuppressAuthor
Concrete data type
Signature
<class 'pandoc.types.SuppressAuthor'>
Table
Concrete data type
Signature
<class 'pandoc.types.Table'>
See also
TwoParens
Concrete data type
Signature
<class 'pandoc.types.TwoParens'>
UpperAlpha
Concrete data type
Signature
<class 'pandoc.types.UpperAlpha'>
UpperRoman
Concrete data type
Signature
<class 'pandoc.types.UpperRoman'>
[1] Refer to Pandoc's heuristics for the gory details of this inference.