Getting Started¶
To use this library, you need Python 3.10 or newer. For automatically fetching data from the main ACL Anthology repository, you will also need to have Git installed.
Installation¶
The library is available as a PyPI package and can therefore simply be installed via pip:
pip install acl-anthology-py
Alternatively, you can download releases from Github.
Instantiating the Anthology¶
From the official repository¶
The easiest way to instantiate the Anthology in Python is as follows:
from acl_anthology import Anthology
# Instantiate the Anthology from the official repository
anthology = Anthology.from_repo()
This will automatically fetch the latest metadata from the official ACL Anthology repository. If you are instantiating the Anthology for the first time, it might take a few seconds to complete, as it will download around ~120 MB worth of data. On subsequent instantiations, it will look for updates and only download missing/updated data.
From a folder on your machine¶
If you want to instantiate the Anthology from a local folder on your machine, do:
anthology = Anthology(datadir="/home/user/repos/acl-anthology/data")
This may be useful if you are working on your personal fork of the Anthology, or
a branch of the official repo. The argument to datadir
needs to point to a
data directory with the same structure as the data/
directory of the official
repo.
Examples¶
This section demonstrates how to use the anthology
object by way of examples.
Finding a paper by its ID¶
All metadata from the Anthology can be accessed through the
anthology
object. For example, to obtain
information about a specific paper, you can call
anthology.get()
with the paper's
Anthology ID:
>>> anthology.get("2022.acl-long.220")
Paper(
id='220',
bibkey='kitaev-etal-2022-learned',
title=MarkupText('Learned Incremental Representations for Parsing'),
authors=[
NameSpecification(name=Name(first='Nikita', last='Kitaev'), id=None, affiliation=None, variants=[]),
NameSpecification(name=Name(first='Thomas', last='Lu'), id=None, affiliation=None, variants=[]),
NameSpecification(name=Name(first='Dan', last='Klein'), id=None, affiliation=None, variants=[])
],
...
)
For more information on the provided metadata fields, see Types of Metadata.
Finding all papers by an author¶
To find a person by name, you can use anthology.find_people()
:
>>> results = anthology.find_people("Dan Klein")
Note that this will always return a list of
Person
objects, as names can be
ambiguous. For now let's assume there is only one, and get all their
publications:
>>> person = results[0]
>>> person.item_ids
{
('P18', '2', '75'),
('P17', '2', '52'),
('2023.acl', 'short', '65'),
...
}
>>> for paper in person.papers():
... print(paper.title)
...
Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing
Fine-Grained Entity Typing with High-Multiplicity Assignments
Modular Visual Question Answering via Code Generation
...
If you know the internal ID of the person (which is what appears in the URL
for their author page, e.g.,
https://aclanthology.org/people/d/dan-klein/),
you can find the corresponding Person
object directly:
>>> person = anthology.get_person("dan-klein")
If you want to look up a person based on the "author" or "editor" field of an
existing paper, you are working with a
NameSpecification
, which is a
name that may additionally contain information to help disambiguate it from
similar names. In this case, you can call
anthology.resolve()
, which will
always return a single, uniquely identified person:
>>> paper = anthology.get("2022.acl-long.220")
>>> paper.authors[-1]
NameSpecification(name=Name(first='Dan', last='Klein'), ...)
>>> person = anthology.resolve(paper.authors[-1])
Accessing Authors/Editors describes the intricacies of working with names and people in more detail.
Finding all papers from an event¶
Volumes that were presented at the same conference are grouped together under
Event
objects. For example, here is
ACL 2022 and all volumes that
belong to the conference or to colocated workshops:
>>> event = anthology.get_event("acl-2022")
>>> event
Event(
id='acl-2022',
is_explicit=True,
colocated_ids=<list of 34 AnthologyIDTuple objects>,
title=MarkupText('60th Annual Meeting of the Association for Computational Linguistics'),
location='Dublin, Ireland',
dates='May 22–27, 2022'
)
>>> for volume in event.volumes():
... print(volume.title)
...
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Findings of the Association for Computational Linguistics: ACL 2022
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
Proceedings of the 21st Workshop on Biomedical Language Processing
...
If you don't know the event ID(s), you can also get associated event IDs from a
paper or volume. Here, we find out that
2020.blackboxnlp-1
belongs to its "own" event (blackboxnlp-2020
), the generic "workshops in 2020"
event (ws-2020
), as well as the EMNLP 2020 event.
>>> volume = anthology.get("2020.blackboxnlp-1")
>>> volume.get_events()
[
Event(id='blackboxnlp-2020', colocated_ids=<list of 1 AnthologyIDTuple objects>, ...),
Event(id='ws-2020', colocated_ids=<list of 105 AnthologyIDTuple objects>, ...),
Event(id='emnlp-2020', colocated_ids=<list of 27 AnthologyIDTuple objects>, ...)
]
Getting the BibTeX entry for a paper¶
To generate the BibTeX entry for a paper, simply call Paper.to_bibtex()
:
>>> paper = anthology.get("2022.acl-long.220")
>>> print(paper.to_bibtex())
@inproceedings{kitaev-etal-2022-learned,
title = "Learned Incremental Representations for Parsing",
author = "Kitaev, Nikita and
Lu, Thomas and
Klein, Dan",
editor = "Muresan, Smaranda and
Nakov, Preslav and
Villavicencio, Aline",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.220/",
doi = "10.18653/v1/2022.acl-long.220",
pages = "3086--3095"
}
To also include the abstract in the BibTeX entry:
>>> print(paper.to_bibtex(with_abstract=True))
Searching for papers by keywords in title¶
There is no dedicated search index for paper titles, but you can iterate over papers and compare their titles manually. For example, the following code finds all papers containing the substring "semantic parsing" in their title, and prints their Anthology IDs and full titles:
>>> for paper in anthology.papers():
... if "semantic parsing" in str(paper.title).lower():
... print(paper.full_id, paper.title)
...
2007.tmi-papers.10 Learning bilingual semantic frames: shallow semantic parsing vs. semantic role projection
2020.acl-main.427 CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant
2020.acl-main.606 Semantic Parsing for English as a Second Language
2020.acl-main.608 Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing
2020.acl-main.742 Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing
2020.acl-main.746 Universal Decompositional Semantic Parsing
2020.acl-demos.29 Usnea: An Authorship Tool for Interactive Fiction using Retrieval Based Semantic Parsing
2020.alta-1.16 Transformer Semantic Parsing
2020.coling-main.226 Context Dependent Semantic Parsing: A Survey
...
Note how the comparison calls str()
on paper.title
to obtain the title as a
string. This is because paper titles can contain markup, and therefore need
to be explicitly converted to strings first if you want to perform string
operations on them.