Accessing Authors/Editors¶
People are complicated.[1] Metadata for publications often only includes the "name" of each author given as a string, but names can be ambiguous (the same name can refer to different people), and conversely, the same person may have published under different names.
Therefore, when it comes to names and personal identities, this library distinguishes between the following three concepts:
Name objects represent a name. They are essentially strings with a little bit of metadata, but contain no information about the actual identity of a person behind the name.
NameSpecification objects represent authors/editors as specified on a publication. They are essentially names with optional extra information for disambiguation, such as the person's affiliation or their internal Anthology ID.
Person objects represent natural persons. They may have one or more names, but will always have one name that we consider to be the "canonical" one.
Tip
It is useful to remember that only a Person can have publications. If you have only a Name or a NameSpecification, you first need to resolve that to a Person before you can look up papers authored/edited by that person.
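The tip above can be sketched as a small helper. This is only an illustration under stated assumptions: papers_for_name is a hypothetical name that is not part of the library, anthology is assumed to be an already-loaded Anthology object, and the sketch relies only on find_people, Person.id, and Person.papers() as they are described on this page.

```python
def papers_for_name(anthology, name):
    """Collect papers for every person that `name` can refer to.

    Hypothetical helper, not part of the library. Because a name can be
    ambiguous, the result maps each matching person's ID to that person's
    papers instead of returning one flat list.
    """
    result = {}
    for person in anthology.find_people(name):
        # Only a Person has publications, so the lookup goes through the person.
        result[person.id] = list(person.papers())
    return result
```

Keeping the results grouped by person ID makes the ambiguity of the name visible to the caller instead of silently merging different people.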
Names¶
A person's name is always split up into first and last name components. While this, of course, doesn't fully reflect the complexities of how names work across different cultures, it is the minimum structure that we assume in order to, e.g., generate accurate bibliographic information.
The following ways to instantiate a Name are equivalent:
from acl_anthology.people import Name
Name("Yang", "Liu")
Name(last="Liu", first="Yang")
If a person only has a single name, the convention is to record this as the last name. In this case, the first name part must be explicitly given as None:
Name(None, "Mausam")
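To illustrate why the first/last split matters for bibliographic output, here is a minimal sketch of a "Last, First" formatter. The function last_first is a hypothetical helper written for this example, not a library method; it only assumes that a name object exposes the first and last attributes shown above, with first set to None for single-name persons.

```python
def last_first(name):
    """Format a name as 'Last, First'.

    Hypothetical helper, not library API. Assumes `name` has `.first` and
    `.last` attributes, where `.first` may be None for single names.
    """
    if name.first is None:
        # Single names are stored as the last name, so there is nothing to append.
        return name.last
    return f"{name.last}, {name.first}"
```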
Looking up names¶
To look up names, use anthology.find_people, which will return a list of persons that can be referred to by that name:
>>> anthology.find_people(Name("Yang", "Liu"))
[
Person(id='yang-liu-edinburgh', names=[Name(first='Yang', last='Liu')], item_ids=<set of 15 AnthologyIDTuple objects>, comment='Edinburgh'),
Person(id='yang-liu-blcu', names=[Name(first='Yang', last='Liu')], item_ids=<set of 1 AnthologyIDTuple objects>, comment='Beijing Language and Culture University'),
Person(id='yang-liu-hk', names=[Name(first='Yang', last='Liu')], item_ids=<set of 3 AnthologyIDTuple objects>, comment='The Chinese University of Hong Kong (Shenzhen)'),
... 12 more ...
]
For convenience, you can also call .find_people() with tuples or strings; the following are all equivalent:
anthology.find_people("Yang Liu")
anthology.find_people("Liu, Yang")
anthology.find_people(("Yang", "Liu"))
However, supplying a "{first} {last}" string only works as long as the split is unambiguous; you must use the "{last}, {first}" format otherwise:
anthology.find_people("Daniel A. McFarland") # raises ValueError
anthology.find_people("McFarland, Daniel A.") # works
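The ambiguity can be made concrete with a small string splitter. This is an illustrative sketch only: split_name is a hypothetical function, and the library's own parsing may differ in detail. It assumes that a comma always marks the "{last}, {first}" format and that a comma-free string is only accepted when it has exactly two whitespace-separated tokens.

```python
def split_name(text):
    """Split a name string into a (first, last) tuple.

    Illustrative sketch; not the library's actual parsing logic.
    """
    if "," in text:
        # "{last}, {first}": the comma marks the boundary explicitly.
        last, first = (part.strip() for part in text.split(",", 1))
        return first, last
    parts = text.split()
    if len(parts) == 2:
        # "{first} {last}" with exactly two tokens has only one possible split.
        return parts[0], parts[1]
    # Any other token count is rejected in this sketch, because there is
    # no reliable way to tell where the first name ends.
    raise ValueError(
        f"Cannot split {text!r} unambiguously; use '{{last}}, {{first}}'"
    )
```

With three tokens such as "Daniel A. McFarland", the sketch cannot know whether "A." belongs to the first or the last name, which is why it raises ValueError instead of guessing.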
Name specifications¶
Author or editor fields, e.g. on papers, always contain NameSpecification objects. A name specification is essentially a regular name with an optional ID (set when the name was manually disambiguated by us) and an optional affiliation. In the example below, you can see that author "Yang Liu" was assigned an explicit ID in the metadata:
>>> paper = anthology.get("2021.emnlp-main.151")
>>> paper.authors
[
NameSpecification(name=Name(first='Jialu', last='Wang'), id=None, affiliation=None, variants=[]),
NameSpecification(name=Name(first='Yang', last='Liu'), id='yang-liu-umich', affiliation=None, variants=[]),
NameSpecification(name=Name(first='Xin', last='Wang'), id=None, affiliation=None, variants=[])
]
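As a small sketch of how the id field might be used, the following helper lists the authors of a paper that carry an explicit ID. The name authors_with_explicit_id is hypothetical and not part of the library; the sketch assumes only the authors, name, and id attributes shown in the output above.

```python
def authors_with_explicit_id(paper):
    """Return (name, id) pairs for authors that were manually disambiguated.

    Hypothetical helper. Assumes `paper.authors` is a list of objects with
    `.name` and `.id` attributes, where `.id` is None if no ID was assigned.
    """
    return [(spec.name, spec.id) for spec in paper.authors if spec.id is not None]
```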
The "variants" field is not systematically used at the moment, but is intended for name variants written in a different script, such as:
>>> anthology.get("2021.ccl-1.1").authors
[
NameSpecification(name=Name(first='Hao', last='Wang'), id=None, affiliation=None,
variants=[Name(first='浩', last='汪')]),
NameSpecification(name=Name(first='Junhui', last='Li'), id=None, affiliation=None,
variants=[Name(first='军辉', last='李')]),
NameSpecification(name=Name(first='Zhengxian', last='Gong'), id=None, affiliation=None,
variants=[Name(first='正仙', last='贡')])
]
Looking up name specifications¶
In contrast to names, name specifications will always resolve to a single person. This is enforced by our metadata checks; if name specifications are ambiguous, they must be resolved before the data can appear in the ACL Anthology.
To look up name specifications, use anthology.resolve, which will return the person that is being referred to:
>>> paper = anthology.get("2021.emnlp-main.151")
>>> name_spec = paper.authors[1]
>>> name_spec
NameSpecification(name=Name(first='Yang', last='Liu'), id='yang-liu-umich', affiliation=None, variants=[])
>>> anthology.resolve(name_spec)
Person(
id='yang-liu-umich',
names=[Name(first='Yang', last='Liu')],
item_ids=<set of 4 AnthologyIDTuple objects>,
comment='Univ. of Michigan, UC Santa Cruz'
)
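Since every name specification resolves to exactly one person, a paper's author list can be turned into a list of persons one entry at a time. The helper below is a minimal sketch with a hypothetical name (resolve_authors); it assumes an already-loaded anthology object and uses only anthology.resolve and paper.authors as shown above.

```python
def resolve_authors(anthology, paper):
    """Map each author specification on `paper` to the person it refers to.

    Hypothetical helper. The result has the same length and order as
    `paper.authors`, since each specification resolves to a single person.
    """
    return [anthology.resolve(spec) for spec in paper.authors]
```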
Persons¶
A Person object represents a natural person. The documentation above showed how persons can be looked up via names or name specifications; they can also be retrieved directly by their ID:
anthology.get_person("yang-liu-umich")
A person will always have exactly one canonical name, which is the one that is used as the leading name on author pages:
>>> person = anthology.get_person("dan-mcfarland")
>>> person.canonical_name
Name(first='Dan', last='McFarland')
They may also have additional names:
>>> person.names
[
Name(first='Dan', last='McFarland'),
Name(first='Daniel', last='McFarland'),
Name(first='Daniel A.', last='McFarland')
]
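The relationship between canonical_name and names can be sketched with a helper that returns only the non-canonical names. The function alternative_names is hypothetical, not a library method; it assumes that names can be compared for equality and that the canonical name is one of the entries in names, as in the output above.

```python
def alternative_names(person):
    """Return the names of `person` other than the canonical one.

    Hypothetical helper. Preserves the order in which names are listed.
    """
    return [name for name in person.names if name != person.canonical_name]
```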
Looking up publications¶
You can get a set of all items associated with a person:
>>> person.item_ids
{('Q18', '1', '28'), ('2020.findings', 'emnlp', '158'), ('W11', '15', '16'), ...}
For convenience, you can also use Person.volumes() and Person.papers() to iterate over the set of volumes/papers that person is associated with.
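Because item IDs are plain tuples, they can be summarized without loading the items themselves. The sketch below counts items per collection; items_per_collection is a hypothetical helper, and it assumes that the first element of each ID tuple identifies the collection (e.g. 'W11' or '2020.findings' in the output above).

```python
from collections import Counter


def items_per_collection(item_ids):
    """Count items by the first element of each ID tuple.

    Hypothetical helper. Assumes the first tuple element is the collection
    ID; the remaining elements are ignored.
    """
    return Counter(collection_id for collection_id, *_ in item_ids)
```

For example, items_per_collection(person.item_ids) would give a rough picture of the venues a person has published in, under the assumption stated above.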
An Entity-Relationship diagram¶
erDiagram
Name {
str first
str last
}
NameSpecification {
Optional[str] id
Optional[str] affiliation
}
Person {
}
"Paper, Volume, etc." {
}
Person ||--|{ NameSpecification : identified-by
Person }|--|{ Name : has
NameSpecification }o--|| Name : contains
"Paper, Volume, etc." }o--|{ NameSpecification : refers-to
[1] Both in real life and in bibliographic metadata.