utils¶
utils.ids¶
Functions for manipulating Anthology IDs.
AnthologyID module-attribute ¶
AnthologyID = str | AnthologyIDTuple
Any type that can be parsed into an Anthology ID.
AnthologyIDTuple module-attribute ¶
AnthologyIDTuple = tuple[str, Optional[str], Optional[str]]
A tuple representing an Anthology ID.
build_id ¶
build_id(collection_id, volume_id=None, paper_id=None)
Transforms collection ID, volume ID, and paper ID to a width-padded Anthology ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_id | str | A collection ID, e.g. "P18". | required |
volume_id | Optional[str] | A volume ID, e.g. "1". | None |
paper_id | Optional[str] | A paper ID, e.g. "42". | None |
Returns:
Type | Description |
---|---|
str | The full Anthology ID. |
Examples:
>>> build_id("P18", "1", "1")
P18-1001
>>> build_id("2022.acl", "long", "42")
2022.acl-long.42
Warning
Does not perform any kind of input validation.
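The padding behaviour shown in the examples can be approximated as follows. This is a minimal sketch, not the library's implementation: the name `sketch_build_id` is hypothetical, and the rule that volume and paper digits together fill four characters in old-style IDs is an assumption inferred from the examples on this page.

```python
from typing import Optional


def sketch_build_id(
    collection_id: str,
    volume_id: Optional[str] = None,
    paper_id: Optional[str] = None,
) -> str:
    """Hypothetical illustration of the padding rules; performs no validation."""
    if volume_id is None:
        return collection_id
    if paper_id is None:
        return f"{collection_id}-{volume_id}"
    if collection_id[0].isdigit():
        # New-style ID such as "2022.acl": parts are joined without padding.
        return f"{collection_id}-{volume_id}.{paper_id}"
    # Old-style ID such as "P18" (assumption: volume + paper fill four digits).
    width = 4 - len(volume_id)
    return f"{collection_id}-{volume_id}{paper_id.zfill(width)}"


print(sketch_build_id("P18", "1", "1"))
print(sketch_build_id("2022.acl", "long", "42"))
```

Like the documented function, the sketch trusts its input, so a malformed collection ID produces a malformed result rather than an error.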
build_id_from_tuple ¶
build_id_from_tuple(anthology_id)
Like build_id(), but takes any AnthologyID type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
anthology_id | AnthologyID | The Anthology ID to convert into a string. | required |
Returns:
Type | Description |
---|---|
str | The full Anthology ID. |
Examples:
>>> build_id_from_tuple(("P18", "1", "1"))
P18-1001
infer_year ¶
infer_year(anthology_id)
Infer the year from an Anthology ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
anthology_id | AnthologyID | An arbitrary Anthology ID. | required |
Returns:
Type | Description |
---|---|
str | The year of the item represented by the Anthology ID, as a four-character string. |
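One way such an inference could work is sketched below. The function name `sketch_infer_year` is hypothetical, and the century pivot (two-digit years of 60 and above map to the 1900s) is an assumption made for illustration; the documentation above does not state which rule the library uses.

```python
def sketch_infer_year(anthology_id: str) -> str:
    """Hypothetical illustration: derive a four-character year from an ID string."""
    collection_id = anthology_id.split("-", 1)[0]
    if collection_id[0].isdigit():
        # New-style collection IDs such as "2022.acl" start with the year.
        return collection_id.split(".")[0]
    # Old-style collection IDs such as "P18": one letter plus a two-digit year.
    two_digits = collection_id[1:3]
    # Assumed pivot: "60" and above belong to the 20th century.
    century = "19" if int(two_digits) >= 60 else "20"
    return century + two_digits


print(sketch_infer_year("P18-1007"))
print(sketch_infer_year("2022.acl-main.1"))
```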
parse_id ¶
parse_id(anthology_id)
Parses an Anthology ID into its constituent collection ID, volume ID, and paper ID parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
anthology_id | AnthologyID | The Anthology ID to parse. | required |
Returns:
Type | Description |
---|---|
AnthologyIDTuple | The parsed collection ID, volume ID, and paper ID. |
Examples:
>>> parse_id("P18-1007")
('P18', '1', '7')
>>> parse_id("W18-6310")
('W18', '63', '10')
>>> parse_id("D19-1001")
('D19', '1', '1')
>>> parse_id("D19-5702")
('D19', '57', '2')
>>> parse_id("2022.acl-main.1")
('2022.acl', 'main', '1')
Also works with volumes:
>>> parse_id("P18-1")
('P18', '1', None)
>>> parse_id("W18-63")
('W18', '63', None)
And even with just collections:
>>> parse_id("P18")
('P18', None, None)
Warning
Does not perform any kind of input validation.
Note
For Anthology IDs prior to 2020, the volume ID is the first digit after the hyphen, except for the following situations, where it is the first two digits:
- All collections starting with 'W'
- The collection "C69"
- All IDs in the collection "D19" where the first digit after the hyphen is >= 5
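The volume-width rule in the note above can be sketched for string inputs as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: `sketch_parse_id` and `IDTuple` are hypothetical names, and treating an all-zero paper number as "0" is an assumption that the note does not cover.

```python
from typing import Optional

IDTuple = tuple[str, Optional[str], Optional[str]]


def sketch_parse_id(anthology_id: str) -> IDTuple:
    """Hypothetical illustration of the splitting rules; performs no validation."""
    if "-" not in anthology_id:
        return (anthology_id, None, None)
    collection_id, rest = anthology_id.split("-", 1)
    if collection_id[0].isdigit():
        # New-style ID: "<collection>-<volume>.<paper>"
        volume_id, _, paper_id = rest.partition(".")
        return (collection_id, volume_id, paper_id or None)
    # Old-style ID: decide whether the volume takes one or two digits.
    two_digit_volume = (
        collection_id.startswith("W")
        or collection_id == "C69"
        or (collection_id == "D19" and int(rest[0]) >= 5)
    )
    split = 2 if two_digit_volume else 1
    volume_id, paper_part = rest[:split], rest[split:]
    if not paper_part:
        return (collection_id, volume_id, None)
    # Strip the zero padding; an all-zero paper number becomes "0" (assumption).
    return (collection_id, volume_id, paper_part.lstrip("0") or "0")


print(sketch_parse_id("D19-5702"))
print(sketch_parse_id("W18-63"))
```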
utils.latex¶
BIBTEX_FIELD_NEEDS_ENCODING module-attribute ¶
BIBTEX_FIELD_NEEDS_ENCODING = {
"journal",
"address",
"publisher",
"note",
}
Any BibTeX field whose value should be LaTeX-encoded first.
BIBTEX_MONTHS module-attribute ¶
BIBTEX_MONTHS = {
"january": "jan",
"february": "feb",
"march": "mar",
"april": "apr",
"may": "may",
"june": "jun",
"july": "jul",
"august": "aug",
"september": "sep",
"october": "oct",
"november": "nov",
"december": "dec",
}
A mapping of month names to BibTeX macros.
SerializableAsBibTeX module-attribute ¶
SerializableAsBibTeX = (
None | str | MarkupText | list[NameSpecification]
)
Any type that can be supplied to make_bibtex_entry.
bibtex_convert_month ¶
bibtex_convert_month(spec)
Converts a month string to BibTeX macros.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spec | str | A month specification, as stored in the metadata. | required |
Returns:
Type | Description |
---|---|
str | A BibTeX macro corresponding to the month specification, if possible. If the string contains digits or is otherwise not parseable, it is returned unchanged with quotes around it. |
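The fallback behaviour described above can be illustrated with a deliberately simplified sketch. The names `MONTH_MACROS` and `sketch_convert_month` are hypothetical; the sketch only handles a single full month name and makes no attempt at ranges or abbreviations, which the real function may treat differently.

```python
# Rebuilds the same name-to-macro mapping as BIBTEX_MONTHS: each macro is
# the first three letters of the lower-cased month name.
MONTH_MACROS = {
    name: name[:3]
    for name in (
        "january february march april may june "
        "july august september october november december"
    ).split()
}


def sketch_convert_month(spec: str) -> str:
    """Hypothetical illustration: return a macro, or the quoted input as a fallback."""
    key = spec.strip().lower()
    if key in MONTH_MACROS:
        return MONTH_MACROS[key]
    # Anything else (digits, unknown words) is returned unchanged, in quotes.
    return f'"{spec}"'


print(sketch_convert_month("June"))
print(sketch_convert_month("3 June"))
```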
has_unbalanced_braces ¶
has_unbalanced_braces(string)
Checks if a string has unbalanced curly braces.
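A brace-balance check of this kind is commonly written as a depth counter. The sketch below is one such version under a stated assumption: it treats every `{` and `}` literally and does not special-case escaped braces such as `\{`, which the real function may or may not do. The name `sketch_has_unbalanced_braces` is hypothetical.

```python
def sketch_has_unbalanced_braces(string: str) -> bool:
    """Hypothetical illustration: True if curly braces do not pair up."""
    depth = 0
    for char in string:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return True
    # Any brace still open at the end is unbalanced.
    return depth != 0


print(sketch_has_unbalanced_braces("{BERT}: Pre-training"))
print(sketch_has_unbalanced_braces("{BERT: Pre-training"))
```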
latex_convert_quotes ¶
latex_convert_quotes(text)
latex_encode ¶
latex_encode(text)
make_bibtex_entry ¶
make_bibtex_entry(bibtype, bibkey, fields)
Turn a list of field/value pairs into a BibTeX entry.
Values will be LaTeX-formatted if necessary, and can also be empty, in which case they are automatically omitted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bibtype | str | The BibTeX type for the entry. | required |
bibkey | str | The BibTeX key for the entry. | required |
fields | list[tuple[str, SerializableAsBibTeX]] | A list of tuples of the form (key, value) specifying the fields to include in the entry. | required |
Returns:
Type | Description |
---|---|
str | A fully formatted BibTeX entry. |
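To make the "empty values are omitted" behaviour concrete, here is a reduced sketch that accepts only plain strings or None. The name `sketch_bibtex_entry` is hypothetical, the exact output layout (indentation, quoting style) is an assumption, and no LaTeX encoding is applied, unlike the documented function.

```python
from typing import Optional


def sketch_bibtex_entry(
    bibtype: str,
    bibkey: str,
    fields: list[tuple[str, Optional[str]]],
) -> str:
    """Hypothetical illustration: format (key, value) pairs, skipping empty values."""
    lines = [f"@{bibtype}{{{bibkey},"]
    for key, value in fields:
        if not value:
            # None and empty strings are silently dropped.
            continue
        lines.append(f'    {key} = "{value}",')
    lines.append("}")
    return "\n".join(lines)


entry = sketch_bibtex_entry(
    "inproceedings",
    "doe-2020-example",
    [("title", "An Example"), ("note", None), ("year", "2020")],
)
print(entry)
```

Passing fields as an ordered list of pairs, rather than a dict, keeps the field order in the output under the caller's control.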
namespecs_to_bibtex ¶
namespecs_to_bibtex(namespecs)
Convert a list of NameSpecifications to a BibTeX-formatted entry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
namespecs | list[NameSpecification] | A list of names to be included in the BibTeX entry. | required |
Returns:
Type | Description |
---|---|
str | A BibTeX-formatted string representing the given names. |
utils.logging¶
Functions for logging.
SeverityTracker ¶
SeverityTracker(level=logging.NOTSET)
setup_rich_logging ¶
setup_rich_logging(**kwargs)
Set up a logger that uses rich markup and severity tracking.
This function is intended to be called in a script. It calls logging.basicConfig and is therefore not executed by default, as applications may wish to set up their loggers differently.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | object | Any keyword argument will be forwarded to logging.basicConfig. If logging handlers are defined here, they will be preserved in addition to the handlers added by this function. | {} |
Returns:
Type | Description |
---|---|
SeverityTracker | The severity tracker, so that it can be used to check the highest emitted log level. |
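The idea of tracking the highest emitted log level can be sketched with a plain logging.Handler from the standard library. The class name `MaxLevelHandler`, its `highest` attribute, and the logger name are hypothetical and chosen for illustration; this page does not document the attributes of the actual SeverityTracker.

```python
import logging


class MaxLevelHandler(logging.Handler):
    """Hypothetical handler that remembers the most severe level it has seen."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level=level)
        self.highest = logging.NOTSET

    def emit(self, record: logging.LogRecord) -> None:
        # Record the level only; the message itself is not written anywhere.
        self.highest = max(self.highest, record.levelno)


logger = logging.getLogger("severity-demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo from touching the root logger
tracker = MaxLevelHandler()
logger.addHandler(tracker)

logger.info("loading data")
logger.warning("something looks off")
print(logging.getLevelName(tracker.highest))
```

A script could compare the tracked level against logging.ERROR at exit to decide whether to return a non-zero status.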
utils.text¶
utils.xml¶
TAGS_WITH_MARKUP module-attribute ¶
TAGS_WITH_MARKUP = {
"b",
"i",
"fixed-case",
"title",
"abstract",
"booktitle",
"shortbooktitle",
}
XML tags which contain MarkupText.
TAGS_WITH_UNORDERED_CHILDREN module-attribute ¶
TAGS_WITH_UNORDERED_CHILDREN = {
"talk",
"paper",
"meta",
"frontmatter",
"event",
"colocated",
"author",
"editor",
"speaker",
"variant",
}
XML tags whose child elements can logically appear in arbitrary order.
assert_equals ¶
assert_equals(elem, other)
Assert that two Anthology XML elements are logically equivalent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
elem | _Element | The first element to compare. | required |
other | _Element | The second element to compare. | required |
Raises:
Type | Description |
---|---|
AssertionError | If the two elements are not logically equivalent. |
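"Logically equivalent" here plausibly means that child order is ignored for the tags listed in TAGS_WITH_UNORDERED_CHILDREN. The sketch below illustrates that idea only: it uses the standard library's xml.etree.ElementTree instead of the lxml elements the real function takes, uses a small subset of the unordered tags, and ignores tail text, all of which are simplifying assumptions. The names `UNORDERED`, `canonical`, and `sketch_assert_equals` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Small illustrative subset of TAGS_WITH_UNORDERED_CHILDREN.
UNORDERED = {"paper", "meta", "author"}


def canonical(elem: ET.Element) -> tuple:
    """Reduce an element to a comparable tuple, sorting children where order is free."""
    children = [canonical(child) for child in elem]
    if elem.tag in UNORDERED:
        children.sort()
    text = (elem.text or "").strip()
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, tuple(children))


def sketch_assert_equals(elem: ET.Element, other: ET.Element) -> None:
    """Hypothetical illustration: raise AssertionError if the elements differ."""
    assert canonical(elem) == canonical(other), "elements are not equivalent"


a = ET.fromstring(
    "<paper><title>T</title><author><first>A</first><last>B</last></author></paper>"
)
b = ET.fromstring(
    "<paper><author><last>B</last><first>A</first></author><title>T</title></paper>"
)
sketch_assert_equals(a, b)  # passes: only the order of children differs
```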
indent ¶
indent(elem, level=0, internal=False)
Enforce canonical indentation.
"Canonical indentation" is two spaces, with each tag on a new line, except that 'author', 'editor', 'title', and 'booktitle' tags are placed on a single line.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
elem | _Element | The XML element to apply canonical indentation to. | required |
level | int | Indentation level; used for recursive calls of this function. | 0 |
internal | bool | If True, assume we are within a single-line element. | False |
Note
Adapted from https://stackoverflow.com/a/33956544.
stringify_children ¶
stringify_children(node)