utils¶

utils.ids¶

Functions for manipulating Anthology IDs.

AnthologyID `module-attribute` ¶

AnthologyID = str | AnthologyIDTuple

Any type that can be parsed into an Anthology ID.

AnthologyIDTuple `module-attribute` ¶

AnthologyIDTuple = tuple[str, Optional[str], Optional[str]]

A tuple representing an Anthology ID.

build_id ¶

build_id(collection_id, volume_id=None, paper_id=None)

Combines a collection ID, volume ID, and paper ID into a width-padded Anthology ID.

Parameters:

Name	Type	Description	Default
`collection_id`	`str`	A collection ID, e.g. "P18".	required
`volume_id`	`Optional[str]`	A volume ID, e.g. "1".	`None`
`paper_id`	`Optional[str]`	A paper ID, e.g. "42".	`None`

Returns:

Type	Description
`str`	The full Anthology ID.

Examples:

>>> build_id("P18", "1", "1")
P18-1001
>>> build_id("2022.acl", "long", "42")
2022.acl-long.42

Warning

Does not perform any kind of input validation.
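
The width padding can be illustrated with a minimal sketch. This is not the library's implementation: `sketch_build_id` is a hypothetical name, and the padding widths are assumptions inferred from the examples above and from the note under parse_id about two-digit volume IDs.

```python
from typing import Optional


def sketch_build_id(
    collection_id: str,
    volume_id: Optional[str] = None,
    paper_id: Optional[str] = None,
) -> str:
    """Hypothetical re-implementation, for illustration only."""
    if volume_id is None:
        return collection_id
    if collection_id[0].isdigit():
        # New-style IDs such as "2022.acl-long.42" are joined without padding.
        if paper_id is None:
            return f"{collection_id}-{volume_id}"
        return f"{collection_id}-{volume_id}.{paper_id}"
    # Old-style IDs: assume four digits after the hyphen, split either
    # 2+2 (two-digit volume) or 1+3 (one-digit volume).
    two_digit_volume = (
        collection_id.startswith("W")
        or collection_id == "C69"
        or (collection_id == "D19" and int(volume_id[0]) >= 5)
    )
    volume_width, paper_width = (2, 2) if two_digit_volume else (1, 3)
    anthology_id = f"{collection_id}-{int(volume_id):0{volume_width}d}"
    if paper_id is not None:
        anthology_id += f"{int(paper_id):0{paper_width}d}"
    return anthology_id
```

Like the documented function, the sketch does no validation: a non-numeric volume ID in an old-style collection raises a `ValueError` from `int()`.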

build_id_from_tuple ¶

build_id_from_tuple(anthology_id)

Like build_id(), but takes any AnthologyID type.

Parameters:

Name	Type	Description	Default
`anthology_id`	`AnthologyID`	The Anthology ID to convert into a string.	required

Returns:

Type	Description
`str`	The full Anthology ID.

Examples:

>>> build_id_from_tuple(("P18", "1", "1"))
P18-1001

infer_year ¶

infer_year(anthology_id)

Infer the year from an Anthology ID.

Parameters:

Name	Type	Description	Default
`anthology_id`	`AnthologyID`	An arbitrary Anthology ID.	required

Returns:

Type	Description
`str`	The year of the item represented by the Anthology ID, as a four-character string.
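
One way such an inference could work is sketched below. This is an illustration, not the library's code: `sketch_infer_year` is a hypothetical name, and the rule for two-digit years (values of 60 or above are read as 19xx, everything else as 20xx) is an assumption made for this sketch.

```python
def sketch_infer_year(anthology_id: str) -> str:
    """Hypothetical year inference from a string Anthology ID."""
    collection_id = anthology_id.split("-")[0]
    if collection_id[0].isdigit():
        # New-style collection IDs start with the four-digit year.
        return collection_id.split(".")[0]
    # Old-style collection IDs are a letter followed by a two-digit year.
    digits = collection_id[1:]
    century = "19" if int(digits) >= 60 else "20"  # assumed cutoff
    return century + digits
```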

parse_id ¶

parse_id(anthology_id)

Parses an Anthology ID into its constituent collection ID, volume ID, and paper ID parts.

Parameters:

Name	Type	Description	Default
`anthology_id`	`AnthologyID`	The Anthology ID to parse.	required

Returns:

Type	Description
`AnthologyIDTuple`	The parsed collection ID, volume ID, and paper ID.

Examples:

>>> parse_id("P18-1007")
('P18', '1', '7')
>>> parse_id("W18-6310")
('W18', '63', '10')
>>> parse_id("D19-1001")
('D19', '1', '1')
>>> parse_id("D19-5702")
('D19', '57', '2')
>>> parse_id("2022.acl-main.1")
('2022.acl', 'main', '1')

Also works with volumes:

>>> parse_id("P18-1")
('P18', '1', None)
>>> parse_id("W18-63")
('W18', '63', None)

And even with just collections:

>>> parse_id("P18")
('P18', None, None)

Warning

Does not perform any kind of input validation.

Note

For Anthology IDs prior to 2020, the volume ID is the first digit after the hyphen, except for the following situations, where it is the first two digits:

All collections starting with 'W'
The collection "C69"
All IDs in the collection "D19" where the first digit after the hyphen is >= 5
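
The rules in this note can be sketched as follows. The function name `sketch_parse_id` is hypothetical, the code is not the library's implementation, and the handling of leading zeros is an assumption chosen to reproduce the examples above.

```python
from typing import Optional, Tuple


def sketch_parse_id(
    anthology_id: str,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical parser covering the cases shown in the examples."""
    if "-" not in anthology_id:
        # Just a collection ID, e.g. "P18".
        return (anthology_id, None, None)
    collection_id, rest = anthology_id.split("-", 1)
    if collection_id[0].isdigit():
        # New-style ID: the volume and paper are separated by a dot.
        volume_id, _, paper_id = rest.partition(".")
        return (collection_id, volume_id, paper_id or None)
    # Old-style ID: decide whether the volume takes one or two digits.
    two_digit_volume = (
        collection_id.startswith("W")
        or collection_id == "C69"
        or (collection_id == "D19" and rest[0] >= "5")
    )
    width = 2 if two_digit_volume else 1
    volume_id = str(int(rest[:width]))  # drops zero padding
    paper_part = rest[width:]
    paper_id = str(int(paper_part)) if paper_part else None
    return (collection_id, volume_id, paper_id)
```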

utils.latex¶

BIBTEX_FIELD_NEEDS_ENCODING `module-attribute` ¶

BIBTEX_FIELD_NEEDS_ENCODING = {
    "journal",
    "address",
    "publisher",
    "note",
}

Any BibTeX field whose value should be LaTeX-encoded first.

BIBTEX_MONTHS `module-attribute` ¶

BIBTEX_MONTHS = {
    "january": "jan",
    "february": "feb",
    "march": "mar",
    "april": "apr",
    "may": "may",
    "june": "jun",
    "july": "jul",
    "august": "aug",
    "september": "sep",
    "october": "oct",
    "november": "nov",
    "december": "dec",
}

A mapping of month names to BibTeX macros.

SerializableAsBibTeX `module-attribute` ¶

SerializableAsBibTeX = (
    None | str | MarkupText | list[NameSpecification]
)

Any type that can be supplied to make_bibtex_entry.

bibtex_convert_month ¶

bibtex_convert_month(spec)

Converts a month string to BibTeX macros.

Parameters:

Name	Type	Description	Default
`spec`	`str`	A month specification, as stored in the metadata.	required

Returns:

Type	Description
`str`	A BibTeX macro corresponding to the month specification, if possible. If the string contains digits or is otherwise not parseable, it is returned unchanged with quotes around it.
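
A minimal sketch of the behavior described above, covering only single month names. `sketch_convert_month` is a hypothetical name; the library function may recognize more forms than this sketch does.

```python
BIBTEX_MONTHS = {
    "january": "jan",
    "february": "feb",
    "march": "mar",
    "april": "apr",
    "may": "may",
    "june": "jun",
    "july": "jul",
    "august": "aug",
    "september": "sep",
    "october": "oct",
    "november": "nov",
    "december": "dec",
}


def sketch_convert_month(spec: str) -> str:
    """Hypothetical converter: macro if recognized, quoted string otherwise."""
    key = spec.strip().lower()
    if key in BIBTEX_MONTHS:
        return BIBTEX_MONTHS[key]
    # Anything else (digits, unknown words) is returned quoted and unchanged.
    return f'"{spec}"'
```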

has_unbalanced_braces ¶

has_unbalanced_braces(string)

Checks if a string has unbalanced curly braces.
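
Such a check can be done with a single depth counter, as in the sketch below. `sketch_has_unbalanced_braces` is a hypothetical name, and the sketch ignores escaped braces such as `\{`; whether the library treats those specially is not stated here.

```python
def sketch_has_unbalanced_braces(string: str) -> bool:
    """Hypothetical brace check using a running nesting depth."""
    depth = 0
    for char in string:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return True
    # Any remaining depth means an opening brace was never closed.
    return depth != 0
```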

latex_convert_quotes ¶

latex_convert_quotes(text)

Parameters:

Name	Type	Description	Default
`text`	`str`	An arbitrary string.	required

Returns:

Type	Description
`str`	The input string with regular quotes converted into LaTeX quotes.

Examples:

>>> latex_convert_quotes('This "great" example')
"This ``great'' example"
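
The conversion in this example can be reproduced by alternating between opening and closing quotes. The sketch below assumes exactly that strategy; `sketch_convert_quotes` is a hypothetical name, and the library may decide between opening and closing quotes differently.

```python
def sketch_convert_quotes(text: str) -> str:
    """Hypothetical converter that alternates `` and '' for each '"'."""
    pieces = []
    opening = True
    for char in text:
        if char == '"':
            pieces.append("``" if opening else "''")
            opening = not opening
        else:
            pieces.append(char)
    return "".join(pieces)
```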

latex_encode ¶

latex_encode(text)

Parameters:

Name	Type	Description	Default
`text`	`Optional[str]`	A string that does not contain any LaTeX commands.	required

Returns:

Type	Description
`str`	The input string encoded for use in LaTeX/BibTeX.

make_bibtex_entry ¶

make_bibtex_entry(bibtype, bibkey, fields)

Turn a list of field/value pairs into a BibTeX entry.

Values will be LaTeX-formatted if necessary, and can also be empty, in which case they are automatically omitted.

Parameters:

Name	Type	Description	Default
`bibtype`	`str`	The BibTeX type for the entry.	required
`bibkey`	`str`	The BibTeX key for the entry.	required
`fields`	`list[tuple[str, SerializableAsBibTeX]]`	A list of tuples of the form (key, value) specifying the fields to include in the entry.	required

Returns:

Type	Description
`str`	A fully formatted BibTeX entry.

namespecs_to_bibtex ¶

namespecs_to_bibtex(namespecs)

Convert a list of NameSpecifications to a BibTeX-formatted entry.

Parameters:

Name	Type	Description	Default
`namespecs`	`list[NameSpecification]`	A list of names to be included in the BibTeX entry.	required

Returns:

Type	Description
`str`	A BibTeX-formatted string representing the given names.

utils.logging¶

Functions for logging.

SeverityTracker ¶

SeverityTracker(level=logging.NOTSET)

Bases: Handler

Tracks the highest log-level that was sent to the logger.

If this class is added as a log handler, it can be used to check if any errors or exceptions were logged.

Attributes:

Name	Type	Description
`highest`	`int`	The highest log-level that was sent to the logger.
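
The idea of a severity-tracking handler can be sketched with the standard `logging` module alone. `SketchSeverityTracker` and the logger name are hypothetical and stand in for the class documented here; the sketch only shows how a handler can record the highest level it has seen.

```python
import logging


class SketchSeverityTracker(logging.Handler):
    """Hypothetical handler that remembers the highest level it received."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level=level)
        self.highest = logging.NOTSET

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno > self.highest:
            self.highest = record.levelno


# Usage: attach the handler, log a few messages, then inspect `highest`.
logger = logging.getLogger("sketch.severity")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the sketch from writing to the root logger
tracker = SketchSeverityTracker()
logger.addHandler(tracker)

logger.info("loading data")
logger.error("something failed")
logger.warning("continuing anyway")

had_errors = tracker.highest >= logging.ERROR
```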

get_logger ¶

get_logger()

Returns:

Type	Description
`Logger`	A library-specific logger instance.

setup_rich_logging ¶

setup_rich_logging(**kwargs)

Set up a logger that uses rich markup and severity tracking.

This function is intended to be called in a script. It calls logging.basicConfig and is therefore not executed by default, as applications may wish to set up their loggers differently.

Parameters:

Name	Type	Description	Default
`**kwargs`	`object`	Any keyword argument will be forwarded to logging.basicConfig. If logging handlers are defined here, they will be preserved in addition to the handlers added by this function.	`{}`

Returns:

Type	Description
`SeverityTracker`	The severity tracker, so that it can be used to check the highest emitted log level.

utils.text¶

remove_extra_whitespace ¶

remove_extra_whitespace(text)

Parameters:

Name	Type	Description	Default
`text`	`str`	An arbitrary string.	required

Returns:

Type	Description
`str`	The input string with newlines removed and runs of consecutive whitespace replaced by a single whitespace character.
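
A minimal sketch of this normalization, read literally from the description above. `sketch_remove_extra_whitespace` is a hypothetical name, and details such as whether leading or trailing whitespace is also stripped are not covered.

```python
import re


def sketch_remove_extra_whitespace(text: str) -> str:
    """Hypothetical normalizer: drop newlines, then collapse whitespace runs."""
    without_newlines = text.replace("\n", "")
    return re.sub(r"\s+", " ", without_newlines)
```

This is mainly useful for text that was line-wrapped and indented in the source, where the indentation after each newline collapses into a single space.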

utils.xml¶

TAGS_WITH_MARKUP `module-attribute` ¶

TAGS_WITH_MARKUP = {
    "b",
    "i",
    "fixed-case",
    "title",
    "abstract",
    "booktitle",
    "shortbooktitle",
}

XML tags which contain MarkupText.

TAGS_WITH_UNORDERED_CHILDREN `module-attribute` ¶

TAGS_WITH_UNORDERED_CHILDREN = {
    "talk",
    "paper",
    "meta",
    "frontmatter",
    "event",
    "colocated",
    "author",
    "editor",
    "speaker",
    "variant",
}

XML tags whose child elements can logically appear in arbitrary order.

assert_equals ¶

assert_equals(elem, other)

Assert that two Anthology XML elements are logically equivalent.

Parameters:

Name	Type	Description	Default
`elem`	`_Element`	The first element to compare.	required
`other`	`_Element`	The second element to compare.	required

Raises:

Type	Description
`AssertionError`	If the two elements are not logically equivalent.

indent ¶

indent(elem, level=0, internal=False)

Enforce canonical indentation.

"Canonical indentation" is two spaces, with each tag on a new line, except that 'author', 'editor', 'title', and 'booktitle' tags are placed on a single line.

Parameters:

Name	Type	Description	Default
`elem`	`_Element`	The XML element to apply canonical indentation to.	required
`level`	`int`	Indentation level; used for recursive calls of this function.	`0`
`internal`	`bool`	If True, assume we are within a single-line element.	`False`

Note

Adapted from https://stackoverflow.com/a/33956544.

stringify_children ¶

stringify_children(node)

Parameters:

Name	Type	Description	Default
`node`	`_Element`	An XML element.	required

Returns:

Type	Description
`str`	The full content of the input node, including tags.

Used for nodes that can have mixed text and HTML elements (like <b> and <i>).
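
The following sketch shows the general idea for a node with mixed content. It uses the standard library's `xml.etree.ElementTree` rather than the `_Element` type named above, and `sketch_stringify_children` is a hypothetical name, so details of the output may differ from the library function.

```python
import xml.etree.ElementTree as ET


def sketch_stringify_children(node: ET.Element) -> str:
    """Hypothetical version: inner content of `node`, with child tags kept."""
    parts = [node.text or ""]
    for child in node:
        # ET.tostring() serializes the child element together with the
        # text that follows it (its "tail").
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


title = ET.fromstring("<title>A <b>bold</b> and <i>italic</i> move</title>")
inner = sketch_stringify_children(title)
```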

xml_escape_or_none ¶

xml_escape_or_none(t)

Like xml.sax.saxutils.escape, but accepts None.
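
A minimal sketch of such a wrapper, assuming that None is simply passed through. `sketch_xml_escape_or_none` is a hypothetical name.

```python
from typing import Optional
from xml.sax.saxutils import escape


def sketch_xml_escape_or_none(t: Optional[str]) -> Optional[str]:
    """Hypothetical wrapper: escape &, <, and > unless the input is None."""
    return None if t is None else escape(t)
```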

xsd_boolean ¶

xsd_boolean(value)

Converts an xsd:boolean value to a bool.
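
The xsd:boolean datatype allows the literals "true", "false", "1", and "0". A conversion along those lines is sketched below; `sketch_xsd_boolean` is a hypothetical name, and raising `ValueError` for any other input is an assumption of this sketch.

```python
def sketch_xsd_boolean(value: str) -> bool:
    """Hypothetical converter for the four xsd:boolean literals."""
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    # Assumed behavior for anything that is not a valid xsd:boolean literal.
    raise ValueError(f"Not a valid xsd:boolean value: {value!r}")
```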
