Overview

What is this?

This is a Python library for accessing data from the ACL Anthology.

The ACL Anthology is a primary resource for research papers in computational linguistics and natural language processing. Metadata for all of its publications is stored in a public GitHub repository. This package makes all of the metadata you can find on the website easily accessible from within Python. If you are interested in contributing to the Anthology, you can also use this library to make changes to the metadata programmatically.

How to use

This package requires Python 3.10 or newer. Install via pip:

pip install acl-anthology-py

Instantiate the library, automatically fetching data files from the ACL Anthology repo (requires Git to be installed on your system):

from acl_anthology import Anthology
anthology = Anthology.from_repo()
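
The two prerequisites mentioned above (Python 3.10 or newer, and Git on your system) can be checked before instantiating the library. The following is a minimal sketch using only the standard library; the helper name `missing_prerequisites` is hypothetical and not part of this package:

```python
import shutil
import sys


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper, not part of acl_anthology. The parameters exist
    only so the check can be exercised with substitute values.
    """
    problems = []
    # Compare only (major, minor) against the documented minimum of 3.10.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("Git was not found on PATH")
    return problems


for problem in missing_prerequisites():
    print(problem)
```

If the loop prints nothing, neither check found a problem.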

Some brief examples

>>> paper = anthology.get("C92-1025")
>>> str(paper.title)
'Two-Level Morphology with Composition'
>>> [author.name for author in paper.authors]
[
    Name(first='Lauri', last='Karttunen'),
    Name(first='Ronald M.', last='Kaplan'),
    Name(first='Annie', last='Zaenen')
]
>>> anthology.find_people("Karttunen, Lauri")
[
    Person(
        id='lauri-karttunen', names=[Name(first='Lauri', last='Karttunen')],
        item_ids=<set of 30 AnthologyIDTuple objects>, comment=None
    )
]

Further information

Look at the Getting Started guide for further information, or the API documentation for detailed descriptions of the provided functions.