
utils

utils.ids

Functions for manipulating Anthology IDs.

AnthologyID module-attribute

AnthologyID = str | AnthologyIDTuple

Any type that can be parsed into an Anthology ID.

AnthologyIDTuple module-attribute

AnthologyIDTuple = tuple[str, Optional[str], Optional[str]]

A tuple representing an Anthology ID.

build_id

build_id(collection_id, volume_id=None, paper_id=None)

Combines a collection ID, volume ID, and paper ID into a width-padded Anthology ID.

Parameters:

Name Type Description Default
collection_id str

A collection ID, e.g. "P18".

required
volume_id Optional[str]

A volume ID, e.g. "1".

None
paper_id Optional[str]

A paper ID, e.g. "42".

None

Returns:

Type Description
str

The full Anthology ID.

Examples:

>>> build_id("P18", "1", "1")
'P18-1001'
>>> build_id("2022.acl", "long", "42")
'2022.acl-long.42'

Warning

Does not perform any kind of input validation.
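Below is a minimal sketch of the width padding implied by the examples above. It is not the library's code. It assumes that old-style IDs (a letter plus two digits, e.g. "P18") zero-pad the paper number so that the volume and paper digits together take four characters. It also assumes that new-style collection IDs contain a "." and join volume and paper with "." instead. `build_id_sketch` is a hypothetical name.

```python
from typing import Optional


def build_id_sketch(
    collection_id: str,
    volume_id: Optional[str] = None,
    paper_id: Optional[str] = None,
) -> str:
    """Hypothetical re-implementation of build_id(), for illustration only."""
    if volume_id is None:
        # Only a collection was given, e.g. "P18" or "2022.acl".
        return collection_id
    if "." in collection_id:
        # Assumed new-style ID: "<year>.<venue>-<volume>[.<paper>]".
        if paper_id is None:
            return f"{collection_id}-{volume_id}"
        return f"{collection_id}-{volume_id}.{paper_id}"
    # Assumed old-style ID: volume digits plus zero-padded paper digits = 4 chars.
    if paper_id is None:
        return f"{collection_id}-{volume_id}"
    width = 4 - len(volume_id)
    return f"{collection_id}-{volume_id}{int(paper_id):0{width}d}"
```

Under these assumptions, `build_id_sketch("W18", "63", "10")` gives `"W18-6310"`, because the two-digit volume leaves two digits for the paper.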

build_id_from_tuple

build_id_from_tuple(anthology_id)

Like build_id(), but takes any AnthologyID type.

Parameters:

Name Type Description Default
anthology_id AnthologyID

The Anthology ID to convert into a string.

required

Returns:

Type Description
str

The full Anthology ID.

Examples:

>>> build_id_from_tuple(("P18", "1", "1"))
'P18-1001'

infer_year

infer_year(anthology_id)

Infer the year from an Anthology ID.

Parameters:

Name Type Description Default
anthology_id AnthologyID

An arbitrary Anthology ID.

required

Returns:

Type Description
str

The year of the item represented by the Anthology ID, as a four-character string.
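One way the year could be inferred is sketched below. It is a guess, not the library's implementation. For new-style IDs it assumes the year is the part of the collection ID before the "." (e.g. "2022.acl"). For old-style IDs it assumes the two digits after the leading letter are a year, with a hypothetical cutoff that places 60 and above in the 1900s. `infer_year_sketch` is a hypothetical name and only accepts string IDs.

```python
def infer_year_sketch(anthology_id: str) -> str:
    """Hypothetical illustration of infer_year() for string IDs only."""
    # The collection ID is everything before the first hyphen in both ID styles.
    collection_id = anthology_id.split("-", 1)[0]
    if "." in collection_id:
        # Assumed new-style collection ID: "<year>.<venue>".
        return collection_id.split(".", 1)[0]
    # Assumed old-style collection ID: one letter followed by a two-digit year.
    digits = collection_id[1:3]
    # Hypothetical cutoff: 60-99 -> 1960-1999, 00-59 -> 2000-2059.
    century = "19" if int(digits) >= 60 else "20"
    return century + digits
```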

parse_id

parse_id(anthology_id)

Parses an Anthology ID into its constituent collection ID, volume ID, and paper ID parts.

Parameters:

Name Type Description Default
anthology_id AnthologyID

The Anthology ID to parse.

required

Returns:

Type Description
AnthologyIDTuple

The parsed collection ID, volume ID, and paper ID.

Examples:

>>> parse_id("P18-1007")
('P18', '1', '7')
>>> parse_id("W18-6310")
('W18', '63', '10')
>>> parse_id("D19-1001")
('D19', '1', '1')
>>> parse_id("D19-5702")
('D19', '57', '2')
>>> parse_id("2022.acl-main.1")
('2022.acl', 'main', '1')

Also works with volumes:

>>> parse_id("P18-1")
('P18', '1', None)
>>> parse_id("W18-63")
('W18', '63', None)

And even with just collections:

>>> parse_id("P18")
('P18', None, None)

Warning

Does not perform any kind of input validation.

Note

For Anthology IDs prior to 2020, the volume ID is the first digit after the hyphen, except for the following situations, where it is the first two digits:

  • All collections starting with 'W'
  • The collection "C69"
  • All volumes in collection "D19" whose first digit after the hyphen is >= 5
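The splitting rules in the Note can be sketched as follows. This is an illustration under assumptions, not the library's code. It treats a collection ID containing "." as new-style and strips zero padding from old-style paper numbers, as the examples suggest. It handles string IDs only. `parse_id_sketch` is a hypothetical name.

```python
from typing import Optional

AnthologyIDTuple = tuple[str, Optional[str], Optional[str]]


def parse_id_sketch(anthology_id: str) -> AnthologyIDTuple:
    """Hypothetical re-implementation of parse_id() for string IDs only."""
    collection_id, sep, rest = anthology_id.partition("-")
    if not sep:
        # No hyphen: the ID names a whole collection.
        return (collection_id, None, None)
    if "." in collection_id:
        # Assumed new-style ID, e.g. "2022.acl-main.1".
        volume_id, _, paper_id = rest.partition(".")
        return (collection_id, volume_id, paper_id or None)
    # Old-style ID: one-digit volume, except in the special cases from the Note.
    two_digit_volume = (
        collection_id.startswith("W")
        or collection_id == "C69"
        or (collection_id == "D19" and int(rest[0]) >= 5)
    )
    split = 2 if two_digit_volume else 1
    volume_id, paper_digits = rest[:split], rest[split:]
    # Strip zero padding from the paper number, e.g. "007" -> "7".
    paper_id = str(int(paper_digits)) if paper_digits else None
    return (collection_id, volume_id, paper_id)
```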

utils.latex

BIBTEX_FIELD_NEEDS_ENCODING module-attribute

BIBTEX_FIELD_NEEDS_ENCODING = {
    "journal",
    "address",
    "publisher",
    "note",
}

Any BibTeX field whose value should be LaTeX-encoded first.

BIBTEX_MONTHS module-attribute

BIBTEX_MONTHS = {
    "january": "jan",
    "february": "feb",
    "march": "mar",
    "april": "apr",
    "may": "may",
    "june": "jun",
    "july": "jul",
    "august": "aug",
    "september": "sep",
    "october": "oct",
    "november": "nov",
    "december": "dec",
}

A mapping of month names to BibTeX macros.

SerializableAsBibTeX module-attribute

SerializableAsBibTeX = (
    None | str | MarkupText | list[NameSpecification]
)

Any type that can be supplied as a field value to make_bibtex_entry.

bibtex_convert_month

bibtex_convert_month(spec)

Converts a month string to a BibTeX month macro.

Parameters:

Name Type Description Default
spec str

A month specification, as stored in the metadata.

required

Returns:

Type Description
str

A BibTeX macro corresponding to the month specification, if possible. If the string contains digits or is otherwise not parseable, it is returned unchanged with quotes around it.
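Here is a minimal sketch of the documented behaviour, covering only single, full month names. It is an illustration, not the library's implementation. Case-insensitive matching and trimming of surrounding whitespace are assumptions. Everything else is returned unchanged inside double quotes. `bibtex_convert_month_sketch` is a hypothetical name.

```python
_MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
# Same content as BIBTEX_MONTHS above: each macro is the first three letters.
BIBTEX_MONTHS = {name: name[:3] for name in _MONTH_NAMES}


def bibtex_convert_month_sketch(spec: str) -> str:
    """Hypothetical illustration of bibtex_convert_month()."""
    key = spec.strip().lower()
    if key in BIBTEX_MONTHS:
        # A recognised month becomes a bare macro, e.g. jun.
        return BIBTEX_MONTHS[key]
    # Anything not recognised here is kept verbatim but quoted.
    return f'"{spec}"'
```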

has_unbalanced_braces

has_unbalanced_braces(string)

Checks if a string has unbalanced curly braces.
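A common way to perform this check is a running depth counter, sketched below. This is a hypothetical illustration, not the library's code. It assumes that escaped braces such as `\{` get no special treatment.

```python
def has_unbalanced_braces_sketch(string: str) -> bool:
    """Hypothetical illustration of has_unbalanced_braces()."""
    depth = 0
    for char in string:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                # A closing brace with no matching opening brace.
                return True
    # Braces still open at the end are unbalanced too.
    return depth != 0
```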

latex_convert_quotes

latex_convert_quotes(text)

Parameters:

Name Type Description Default
text str

An arbitrary string.

required

Returns:

Type Description
str

The input string with regular quotes converted into LaTeX quotes.

Examples:

>>> latex_convert_quotes('This "great" example')
"This ``great'' example"
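The conversion can be approximated with one regular expression that pairs up straight double quotes. This is a sketch, not the library's implementation. The pairing rule (first quote opens, next quote closes, no pairing across newlines) is an assumption. `latex_convert_quotes_sketch` is a hypothetical name.

```python
import re


def latex_convert_quotes_sketch(text: str) -> str:
    """Hypothetical illustration of latex_convert_quotes()."""
    # Non-greedy match: each pair of straight quotes becomes `` ... ''.
    return re.sub(r'"(.*?)"', r"``\1''", text)
```

An unpaired quote is left as it is, because the pattern needs both an opening and a closing quote.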

latex_encode

latex_encode(text)

Parameters:

Name Type Description Default
text Optional[str]

A string that does not contain any LaTeX commands.

required

Returns:

Type Description
str

The input string encoded for use in LaTeX/BibTeX.
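To show what "encoded for use in LaTeX" means, the sketch below escapes the ten ASCII characters that are special in LaTeX. It is a hypothetical illustration only. The library may also convert non-ASCII characters, which this sketch leaves untouched. Mapping `None` to an empty string is an assumption based on the `Optional[str]` parameter and `str` return type.

```python
from typing import Optional

# Replacements for characters with special meaning in LaTeX.
_LATEX_SPECIAL = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_encode_sketch(text: Optional[str]) -> str:
    """Hypothetical illustration of latex_encode()."""
    if text is None:
        # Assumption: a missing value encodes to an empty string.
        return ""
    # Per-character lookup, so inserted backslashes are never escaped twice.
    return "".join(_LATEX_SPECIAL.get(char, char) for char in text)
```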

make_bibtex_entry

make_bibtex_entry(bibtype, bibkey, fields)

Turn a list of field/value pairs into a BibTeX entry.

Values are LaTeX-encoded where necessary. Values may also be empty, in which case the field is omitted from the entry.

Parameters:

Name Type Description Default
bibtype str

The BibTeX type for the entry.

required
bibkey str

The BibTeX key for the entry.

required
fields list[tuple[str, SerializableAsBibTeX]]

A list of tuples of the form (key, value) specifying the fields to include in the entry.

required

Returns:

Type Description
str

A fully formatted BibTeX entry.
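The overall shape of the output can be sketched as below. The four-space indentation and double-quoted values are assumptions about the layout. The sketch accepts only plain strings, skipping LaTeX encoding, `MarkupText`, and `NameSpecification` support, and it drops empty fields as described above. `make_bibtex_entry_sketch` is a hypothetical name.

```python
from typing import Optional


def make_bibtex_entry_sketch(
    bibtype: str, bibkey: str, fields: list[tuple[str, Optional[str]]]
) -> str:
    """Hypothetical illustration of make_bibtex_entry() for string values."""
    # Fields whose value is None or empty are dropped.
    lines = [f'    {key} = "{value}"' for key, value in fields if value]
    return f"@{bibtype}{{{bibkey},\n" + ",\n".join(lines) + "\n}"


print(make_bibtex_entry_sketch(
    "inproceedings",
    "smith-2022-example",
    [("title", "An Example"), ("note", None), ("year", "2022")],
))
```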

namespecs_to_bibtex

namespecs_to_bibtex(namespecs)

Convert a list of NameSpecifications to a BibTeX-formatted string.

Parameters:

Name Type Description Default
namespecs list[NameSpecification]

A list of names to be included in the BibTeX entry.

required

Returns:

Type Description
str

A BibTeX-formatted string representing the given names.
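BibTeX name lists use the "Last, First" form joined by " and ", as in the sketch below. `SimpleName` is a hypothetical stand-in for `NameSpecification` that holds only a first and last name. The exact spacing and line breaks the library emits are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleName:
    """Hypothetical stand-in for NameSpecification."""

    last: str
    first: Optional[str] = None


def namespecs_to_bibtex_sketch(names: list[SimpleName]) -> str:
    """Hypothetical illustration of namespecs_to_bibtex()."""
    # "Last, First" where a first name exists, otherwise just the last name.
    parts = [f"{n.last}, {n.first}" if n.first else n.last for n in names]
    # BibTeX separates multiple names with the keyword "and".
    return " and ".join(parts)
```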

utils.logging

Functions for logging.

SeverityTracker

SeverityTracker(level=logging.NOTSET)

Bases: Handler

Tracks the highest log level that was sent to the logger.

If this class is added as a log handler, it can be used to check if any errors or exceptions were logged.

Attributes:

Name Type Description
highest int

The highest log level that was sent to the logger.
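The idea behind such a handler can be sketched with the standard `logging.Handler` API. This is not the library class: `SeverityTrackerSketch` is hypothetical and only records levels, printing nothing. The logger name "severity-demo" is arbitrary.

```python
import logging


class SeverityTrackerSketch(logging.Handler):
    """Hypothetical handler that only records the highest level it has seen."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level=level)
        self.highest = logging.NOTSET

    def emit(self, record: logging.LogRecord) -> None:
        # Nothing is written out; we only remember the most severe level so far.
        self.highest = max(self.highest, record.levelno)


logger = logging.getLogger("severity-demo")
tracker = SeverityTrackerSketch()
logger.addHandler(tracker)
logger.warning("a warning")
logger.error("an error")
if tracker.highest >= logging.ERROR:
    print("errors were logged")
```

A script can check the tracker in this way at exit and return a non-zero status if anything at ERROR level or above was logged.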

get_logger

get_logger()

Returns:

Type Description
Logger

A library-specific logger instance.

setup_rich_logging

setup_rich_logging(**kwargs)

Set up a logger that uses rich markup and severity tracking.

This function is intended to be called from a script. Because it calls logging.basicConfig, it is not run automatically: applications may wish to set up their loggers differently.

Parameters:

Name Type Description Default
**kwargs object

Any keyword argument is forwarded to logging.basicConfig. If handlers are passed this way, they are kept alongside the handlers that this function adds.

{}

Returns:

Type Description
SeverityTracker

The severity tracker, so that it can be used to check the highest emitted log level.

utils.text

remove_extra_whitespace

remove_extra_whitespace(text)

Parameters:

Name Type Description Default
text str

An arbitrary string.

required

Returns:

Type Description
str

The input string with newlines removed and runs of consecutive whitespace replaced by a single whitespace character.
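A minimal sketch of this behaviour follows. It assumes that newlines are deleted outright (not turned into spaces) and that only runs of two or more whitespace characters are collapsed. `remove_extra_whitespace_sketch` is a hypothetical name.

```python
import re


def remove_extra_whitespace_sketch(text: str) -> str:
    """Hypothetical illustration of remove_extra_whitespace()."""
    # Assumption: newlines are deleted rather than replaced by a space.
    text = text.replace("\n", "")
    # Collapse every remaining run of two or more whitespace characters.
    return re.sub(r"\s{2,}", " ", text)
```

This fits metadata where a line break is followed by indentation. For example, `"A  long\n   title"` becomes `"A long title"`.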

utils.xml

TAGS_WITH_MARKUP module-attribute

TAGS_WITH_MARKUP = {
    "b",
    "i",
    "fixed-case",
    "title",
    "abstract",
    "booktitle",
    "shortbooktitle",
}

XML tags which contain MarkupText.

TAGS_WITH_UNORDERED_CHILDREN module-attribute

TAGS_WITH_UNORDERED_CHILDREN = {
    "talk",
    "paper",
    "meta",
    "frontmatter",
    "event",
    "colocated",
    "author",
    "editor",
    "speaker",
    "variant",
}

XML tags whose child elements can logically appear in arbitrary order.

assert_equals

assert_equals(elem, other)

Assert that two Anthology XML elements are logically equivalent.

Parameters:

Name Type Description Default
elem _Element

The first element to compare.

required
other _Element

The second element to compare.

required

Raises:

Type Description
AssertionError

If the two elements are not logically equivalent.
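One way to define "logically equivalent" is to reduce each element to a canonical tuple and compare the tuples. In this reduction, children of the tags in TAGS_WITH_UNORDERED_CHILDREN are sorted, and whitespace around text and tails is ignored. The sketch uses the standard library's ElementTree instead of lxml and is not the library's implementation. Ignoring surrounding whitespace is an assumption and may be too lenient for mixed-content markup.

```python
import xml.etree.ElementTree as ET

# Copied from TAGS_WITH_UNORDERED_CHILDREN above.
UNORDERED = {
    "talk", "paper", "meta", "frontmatter", "event",
    "colocated", "author", "editor", "speaker", "variant",
}


def _canonical(elem: ET.Element) -> tuple:
    children = [_canonical(child) for child in elem]
    if elem.tag in UNORDERED:
        # Child order carries no meaning here, so compare children sorted.
        children.sort()
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        (elem.tail or "").strip(),
        tuple(children),
    )


def assert_equals_sketch(elem: ET.Element, other: ET.Element) -> None:
    """Hypothetical illustration of assert_equals()."""
    if _canonical(elem) != _canonical(other):
        raise AssertionError(f"<{elem.tag}> and <{other.tag}> are not equivalent")
```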

indent

indent(elem, level=0, internal=False)

Enforce canonical indentation.

"Canonical indentation" is two spaces, with each tag on a new line, except that 'author', 'editor', 'title', and 'booktitle' tags are placed on a single line.

Parameters:

Name Type Description Default
elem _Element

The XML element to apply canonical indentation to.

required
level int

Indentation level; used for recursive calls of this function.

0
internal bool

If True, assume we are within a single-line element.

False

Note

Adapted from https://stackoverflow.com/a/33956544.
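The canonical layout can be sketched with a recursive pass over the standard library's ElementTree. This is a simplified illustration, not the library function. Instead of an `internal` flag, it does not descend into the single-line tags, and it leaves their contents untouched. `indent_sketch` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Tags kept on a single line, as listed above.
SINGLE_LINE = {"author", "editor", "title", "booktitle"}


def indent_sketch(elem: ET.Element, level: int = 0) -> None:
    """Hypothetical illustration of indent() with two-space indentation."""
    pad = "\n" + "  " * level
    if len(elem) and elem.tag not in SINGLE_LINE:
        if not (elem.text or "").strip():
            # Put the first child on its own line, one level deeper.
            elem.text = pad + "  "
        for child in elem:
            indent_sketch(child, level + 1)
        if not (elem[-1].tail or "").strip():
            # After the last child, the closing tag returns to this level.
            elem[-1].tail = pad
    if level and not (elem.tail or "").strip():
        # Each nested element is followed by a newline at its own level.
        elem.tail = pad
```

The standard library's `xml.etree.ElementTree.indent` (Python 3.9+) produces the plain two-space layout but has no single-line exception. This is one reason a custom function is needed here.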

stringify_children

stringify_children(node)

Parameters:

Name Type Description Default
node _Element

An XML element.

required

Returns:

Type Description
str

The full content of the input node, including tags.

Used for nodes that can have mixed text and HTML elements (like <b> and <i>).
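With the standard library's ElementTree, this can be sketched as the node's leading text followed by each serialized child. `ET.tostring` also writes out a child's tail text, so text between and after child elements is kept. This is a hypothetical illustration, not the library's code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def stringify_children_sketch(node: ET.Element) -> str:
    """Hypothetical illustration of stringify_children()."""
    # Text before the first child, re-escaped for XML.
    parts = [escape(node.text or "")]
    for child in node:
        # Serializes the child element together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)
```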

xml_escape_or_none

xml_escape_or_none(t)

Like xml.sax.saxutils.escape, but accepts None.
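A minimal sketch, assuming that `None` is passed through unchanged rather than converted to a string. `xml_escape_or_none_sketch` is a hypothetical name.

```python
from typing import Optional
from xml.sax.saxutils import escape


def xml_escape_or_none_sketch(t: Optional[str]) -> Optional[str]:
    """Hypothetical illustration of xml_escape_or_none()."""
    # Assumption: None is returned as-is instead of raising.
    return None if t is None else escape(t)
```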

xsd_boolean

xsd_boolean(value)

Converts an xsd:boolean value to a bool.
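XML Schema defines exactly four lexical forms for xsd:boolean: "true", "false", "1", and "0", with surrounding whitespace collapsed. A sketch of the conversion follows. Raising `ValueError` for any other input is an assumption about how invalid values might be handled. `xsd_boolean_sketch` is a hypothetical name.

```python
def xsd_boolean_sketch(value: str) -> bool:
    """Hypothetical illustration of xsd_boolean()."""
    value = value.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    # Assumption: anything outside the four lexical forms is rejected.
    raise ValueError(f"not a valid xsd:boolean: {value!r}")
```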