AT-LARGE ADVISORY COMMITTEE ADVICE TO GNSO WHOIS TASK FORCES

Date: 26 March 2004

I. WHOIS Task Force 1: Restricting Access to WHOIS Data for Marketing Purposes
II. WHOIS Task Force 2: Review of Data Collected and Displayed
III. WHOIS Task Force 3: Improving Accuracy of Collected Data

Note: Unless we specifically speak about registrars, our remarks apply
to registrar and to thick registry WHOIS systems alike.

I. Restricting Access to WHOIS Data for Marketing Purposes

Policy proposal

We recommend a simple two-tiered system.

Tier 1 -- public access. Users who access a future WHOIS-like system anonymously
get access to non-sensitive information concerning a domain name registration,
to be defined in detail by task force 2.

Tier 2 -- authenticated access. Users who want to access a more complete
data set (to be defined in detail by task force 2) need to reliably
identify themselves, and indicate the purpose for which they want to
access the data.

The identity of the data user and their stated purpose are recorded by
registrars and registries, and made available to registrants on request.
This information could be withheld for a certain amount of time if the
data user is (1) a law enforcement authority that is (2) accessing the
data for law enforcement purposes.
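
As an illustration only, the two tiers and the access record could be
sketched as follows. This is a minimal sketch under stated assumptions,
not a proposed implementation: the field names, the PUBLIC_FIELDS split
(which is for task force 2 to define), and the 90-day law enforcement
hold are all hypothetical, and identity verification itself is assumed
to have happened before authenticated_lookup is called.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical split; the real tier contents are left to task force 2.
PUBLIC_FIELDS = {"registrar", "name_servers", "status"}

# Hypothetical delay before law enforcement lookups are disclosed.
LAW_ENFORCEMENT_HOLD = timedelta(days=90)


@dataclass
class AccessEntry:
    requester: str           # identity of the data user, already verified
    purpose: str             # purpose stated by the data user
    when: datetime
    law_enforcement: bool = False


@dataclass
class TieredWhois:
    records: dict                             # domain -> dict of fields
    log: dict = field(default_factory=dict)   # domain -> list of AccessEntry

    def public_lookup(self, domain):
        """Tier 1: anonymous access, non-sensitive fields only."""
        record = self.records[domain]
        return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}

    def authenticated_lookup(self, domain, requester, purpose, now,
                             law_enforcement=False):
        """Tier 2: full record, but every access is recorded."""
        if not requester or not purpose:
            raise PermissionError("identity and purpose are required")
        entry = AccessEntry(requester, purpose, now, law_enforcement)
        self.log.setdefault(domain, []).append(entry)
        return dict(self.records[domain])

    def access_report(self, domain, now):
        """Entries shown to the registrant on request; law enforcement
        entries stay hidden until the hold period has passed."""
        return [
            e for e in self.log.get(domain, [])
            if not (e.law_enforcement and now - e.when < LAW_ENFORCEMENT_HOLD)
        ]
```

The point of the sketch is that the registrant, not the registrar, ends
up holding the information needed to enforce purpose limitations.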

Implementation remarks

We do not recommend any particular implementation of this proposal, but
note that "reliable identification" could be provided by commercially
available SSL certificates. In general, we would favor implementation
of our proposal in a dedicated protocol (such as IRIS) over implementation
through Web forms.

Rationale

The key aspect for deciding whether access to data gathered by registrars
can be given to a third party is the purpose for which this data is going
to be used. Obviously, registrars have no way to verify the purpose for
which WHOIS data is being accessed.

The best heuristic we know of is to hold data users accountable for their
activities, and to put enforcement of purpose limitations into the hands
of registrants. This can be achieved by reliably identifying data users
and making their identity, contact information, and purpose indication
available to registrants.

At the same time, a tiered system -- if implemented reasonably -- could
preserve the ability of data users to automatically access WHOIS data
in reasonable quantities. Registrars, on the other hand, would be enabled
to limit the amount of data any particular party can access in a given
interval of time.
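
Once data users are identified, a per-party limit of this kind is
straightforward to express. The following is a minimal sketch of a
sliding-window limiter; the class name, the limit, and the window length
are assumptions chosen for illustration, and timestamps are passed in as
plain seconds so the logic can be followed without a clock.

```python
from collections import defaultdict, deque


class QueryLimiter:
    """Allow at most max_queries per requester in any window_seconds span."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # requester -> timestamps of that requester's recent queries
        self._history = defaultdict(deque)

    def allow(self, requester, now):
        recent = self._history[requester]
        # Forget queries that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False
        recent.append(now)
        return True
```

Automated access in reasonable quantities still works under such a
limit; only bulk harvesting by a single identified party is slowed down.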

Identifying data users and their purposes would also enable registrars
to comply with legal obligations to make this kind of information available
to data subjects.

Discussion of other proposals

There have been suggestions that "automated access" could be
used as a heuristic to determine illegitimate access. In this scheme,
automated access is blocked by attempting to require human attention with
all queries. One set of implementations of these kinds of tests is known
as CAPTCHA.

There is evidence that automated access is also being used for legitimate
purposes; on the other hand, there is publicly available information on
how CAPTCHA-like tests are being circumvented in other contexts. The circumvention
here is based on a fundamental design problem of CAPTCHAs. <http://boingboing.net/2004_01_01_archive.html#107525288693964966>

One particularly popular CAPTCHA was broken in academic research more
than a year ago, but is still being used by registrars. <http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html>

Accessibility problems posed by CAPTCHA-like tests are not yet fully
understood; we note, though, that purely visual tests are insufficient
from an accessibility point of view. <http://www.w3.org/TR/turingtest/>

In conclusion, CAPTCHA tests address the wrong problem, and they address
it badly. We strongly recommend against going down this path.

II. Review of Data Collected and Displayed

Policy proposal

We recommend that the mandatory collection and display of personal information
about registrants be reduced as far as possible. What information is actually
required for placing a domain name registration should be a matter of
registrars' business models, and of applicable law, not of ICANN policy.

We consider the removal of the following data elements from registrars'
and registries' WHOIS services (in a tiered model, from *all* tiers) a
priority:

Registrant name, address, e-mail address, and phone number, unless
registrant has requested that this information be made available.
Administrative contact name, address, e-mail address, and phone number,
unless registrant (or admin-c) has requested that this information be
made available.
Billing contact. These data are traditionally not published by registrars,
but are included in many thick registries' public WHOIS services.

For the purposes of a tiered access system (see recommendations for task
force 1), we would recommend that the following information be included
in a public tier:

Registrar of record.
Name servers.
Status of domain name.
Contact data, if the data subject specifically requests that these
data be included in the public tier.

Implementation remarks

None.

Rationale

For personal registrations, the registrant, administrative contact, and
billing contact data sets are most likely to
concern sensitive information, such as the registrant's home address and
phone number.

We recognize that domain name registrations by online merchants often
raise fewer privacy concerns; it has been
argued that online merchants must make privacy information public in many
jurisdictions. We are confident that businesses will fulfil these
duties by asking registrars to make contact information about them
available publicly. Conversely, if bad actors decide not to make contact
information publicly available, that could actually make bad actors more
easily recognizable, and provide consumers with a "red flag."

Discussion of other proposals

At the WHOIS workshop in Rome, we heard several lawyers praise the usefulness
of registrant and other telephone numbers in WHOIS services. That way,
we were told, many cases could be settled by a single phone call. The
easier the contact, we were told, the merrier.

This argument is troubling: What we were hearing there is a request to
ICANN to enable lawyers to make off the record contact with other parties
to a dispute that may not have a lawyer readily available, and to make
this contact in a way which makes it hard for the registrant to get legal
counsel involved in early negotiations arising out of the dispute.

Telephone numbers of registrant and administrative contacts should be
*removed* from WHOIS services for precisely this reason: Forcing the non-registrant
party to a dispute to open up that dispute by on-the-record means (e-mail,
fax [not universally available], postal mail) ensures that registrants
have an opportunity to retain legal counsel in these disputes, and to
fully understand any claims made by the non-registrant party. It also
helps to avoid legal bluff and plain bullying.

To summarize, it may be true that availability of phone numbers enables
quick settlement. But availability of phone numbers also favors situations
in which these settlements are achieved by dubious means, to the detriment
of the registrant.

III. Improving Accuracy of Collected Data

Additional comments submitted by ALAC

Summary and recommendations

The At-Large Advisory Committee would like to express appreciation for
the difficult and time-consuming work that the Task Force has been
doing.

However, we stress that trying to get accurate information from people
who are not willing to provide it is a waste of time and effort. No
automated verification scheme is able to distinguish true data from
merely plausible data, and thus such schemes would only have the effect
of increasing the number of crimes such as identity theft and making
reliable identification of actual fraudsters even more difficult.

Generic TLDs are a global resource which should be impartially
accessible to registrants from all parts of the world. Verification
schemes usually do not cover all parts of the world with the same
effectiveness, and information which may seem implausible to an
American eye will often be true; so these schemes must not be used
to unfairly restrict access to gTLDs depending on the registrant's
country. Also, any communication with the registrant should happen in
the registrant's own language; and the registrant should not be asked
to bear the cost of verification activities, since they are not part
of the service the registrant is asking for, but rather of services
desired by some third-party data users.

The actual feasibility of a verification scheme that meets these
requirements, even after the data gathering activity made by the task
force, is still unproven. For these reasons, we recommend against
taking any action in this field at this stage.

We thus suggest that the focus of the work on Whois accuracy be
shifted from how to force unwilling people to provide their true
information to how to effectively allow registrants who want to
provide true information to do so. There are a number of practical
hurdles for any registrant trying to keep his/her data up to date, and
removing these hurdles would prove much more beneficial to the overall
accuracy of the Whois databases than chasing an impossible and
worrying dream of a global centralized control system over
registrants' identities.

Finally, we note that the Registrar Accreditation Agreement provisions
about data collection, display and accuracy requirements and their
enforcement are clearly illegal, and thus void, in a number of
jurisdictions.

Thus we recommend that ICANN suspend any enforcement of those
provisions until the RAA and the related policies are amended so as to
comply with existing laws; as clearly and repeatedly explained in
writing and in person by a number of relevant public authorities, any
other choice is likely to bring ICANN and involved registrars into
litigation with registrants and with the Privacy Authorities in
European and other countries.

A deeper analysis on the problem of Whois accuracy

We think that, to be able to solve a problem, you should first
investigate the reasons why it happens. In this case, you could
roughly divide the registrants whose data are inaccurate into four
categories:

1. Those who purposely provide inaccurate data for fraudulent
reasons.
2. Those who purposely provide inaccurate data to protect their
privacy.
3. Those who mistakenly provide inaccurate data.
4. Those who provide accurate data at registration, but then fail
to keep them up to date so that the information becomes inaccurate.

Until now, the general discussion on accuracy has been almost
completely focused on the first category and we think this is an
error. The purpose of the Whois system is not to provide bullet-proof
identification for those who register domains and operate services on
top of them, but rather to provide quick contact information for those
domain holders who want to be contacted. Turning the Whois system into
a certified directory of domain name owners would go beyond its
purpose and, as practice shows, is practically incompatible with its
spirit and architecture.

Also, at the present state of technology and of operational practices,
the costs of very secure authentication of world-wide registrants for
all domain name registrations would be high and would possibly destroy
the domain name market as we know it today. We think it might be more
cost-effective (and also more respectful of people's basic civil
rights) to pursue fraudulent registrants once they actually commit
a fraud, rather than to presume that all registrants will commit
fraud and so should be carefully screened in advance.

Finally, we point out that there is no verification system, other than
requiring a person to physically show up and exhibit a secure proof of
identity such as a passport or national ID document, that could
distinguish between true personal data and plausible, but fake,
personal data. If we go down the path of imposing stricter and stricter
checks on data as they are submitted by the registrant during the
registration process, then after spending lots of time and lots of
money on them, we might actually discover that no benefit has arisen in
terms of fraud prevention, but that the stricter checks have caused a
huge increase in crimes like identity theft, which by the way are made
easier by the very existence of the public and anonymously accessible
Whois system.

That said, we think that increased accuracy in the Whois database,
if limited to those registrants who actually agree to provide their
data, would be highly desirable. This is why we think that future
activities in the field of enhanced accuracy should not focus on the
first category of the above list, but rather on the other three.

We will not discuss here the issue of privacy protection, which is the
subject of another task force; we just stress that the overwhelming
majority of those who purposely provide inaccurate data do so for
privacy protection reasons, rather than with fraudulent intentions.
Simply allowing these people to disclose their data only to the
registrar, and not to the public, would avoid most cases of wilful
inaccuracy.

The third category is, in our experience, fairly small. Errors of this
kind are clerical and can easily be fixed if there is an actual need to
contact the owner. Once the registrant's desire to publish their data is
ascertained, some simple automated verifications could be made by the
registrar's system, to warn the registrant about possible errors.

However, creating an automatic verification algorithm for all
countries and scripts of the world might prove very difficult and
prone to errors for less common countries; the current practical
examples only come from TLDs and environments with geographically
limited registrants. On the other hand, systems which provide
automatic verification only for residents of some countries could be
acceptable only as long as they do not prevent residents of
unverifiable countries from registering domains, or make it
unreasonably harder for them to do so. This is why we think that the
output of these automated verification algorithms should only be used
as a warning to the registrant, but should not prevent the registrant
from submitting data that might seem incorrect, as they could well be
absolutely correct.
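
The "warn, but never block" principle can be sketched as follows. This
is an illustration under stated assumptions only: the two checks, the
field names, and the function names are hypothetical, and the checks are
deliberately loose so that they make no assumptions about any country's
address or phone number format.

```python
import re


def check_contact(contact):
    """Return a list of warnings; an empty list means nothing looked odd."""
    warnings = []
    email = contact.get("email", "")
    if email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        warnings.append("e-mail address has an unusual format")
    phone = contact.get("phone", "")
    if phone and not any(ch.isdigit() for ch in phone):
        warnings.append("phone number contains no digits")
    return warnings


def submit_contact(contact, store):
    """Always store the data; the warnings are advisory only."""
    warnings = check_contact(contact)
    store.append(contact)   # submission is never refused
    return warnings
```

The registrant sees the warnings and may correct the data, but data that
merely looks implausible is accepted exactly as submitted.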

We also note that requiring Roman-script information from registrants
in countries that do not use Roman characters would unduly
discriminate against them in access to gTLDs. All registrants should be
asked to provide their data only in their local language and script,
and just as an option they could be asked whether they want to provide
Romanized data as well. Requiring the ability to type in Roman script
to register domains in global generic TLDs is unacceptable.

Finally, we think that much could be done to improve the situation of
the fourth category: those registrants who would be happy to provide
accurate information, but who fail to keep it up to date. In fact,
experience shows that updating Whois data is a long and difficult
process for registrants. In many cases, the registrant has to send
faxes, make phone calls, and bear other costs while devoting a
significant amount of time; in other cases, the authentication
mechanism used by registries or registrars is based on the e-mail
address (or on a username/password pair which, if forgotten, will be
resent to the current e-mail address), so that a change in the e-mail
address of the registrant will make him/her unable to manage the
information, and will leave the domains orphaned. If you add to this
the fact that keeping personal data up to date in a public Whois
registry certainly cannot be a registrant's first worry when changing
address, phone number or e-mail address, you realize that
this is possibly the most common cause of inaccuracy in Whois databases.

Also, in many cases the registrant is only the last link in a long
chain of interactions that starts with a registry, then goes through
an ICANN-accredited registrar, a domain name reseller, a web hosting
company, or even an Internet-savvy friend who does the job for the
registrant. We think that this is an unavoidable consequence of the
average registrant turning from a skilled engineer in a small
Internet, as it was when Whois was designed, to a non-technical
average person in a mass Internet. It is very difficult to create the
awareness of the existence and purpose of the Whois database for
non-technical persons on a mass scale, and we think this is another
reason why we should never expect the Whois to be a terribly accurate
list of all registrants.

However, for this category the problem possibly lies in the lack of
simple online systems for the registrant to edit his/her data in the
database at no cost. Thus we think that one of the two following
solutions should be tried:

1. Requiring registries to directly deal with registrants' update
requests, by supplying them a virtual certificate or account at
registration, plus offline procedures to recover access if such
account is lost;
2. Changing the architecture of the Whois database from centralized
to distributed.

Since the first option would raise many concerns in terms of business
models, customer ownership, and cost recovery, the second could
possibly be more interesting. After all, the very reason for which the
DNS was created, replacing the old centralized hosts table, was
the impossibility of keeping this centralized table up to date. We
should simply apply the same principle and move the data to the edge
of the network, by embedding Whois servers into DNS server
implementations. Whois queries could then be sent directly to the
authoritative name servers for the domain, and the registry would be
used as a fall-back only if no reply is received. This way,
registrants would be able to keep their Whois information up to date
as easily as they keep their zone files up to date, and even if this
would not completely solve the problem, it would possibly cause a
dramatic increase in the number of Whois records that are actually
kept updated.
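
The lookup order described above could be sketched as follows. This is
only an illustration of the control flow, under stated assumptions: the
function name and the two callables query_server and query_registry are
hypothetical and stand in for whatever transport an actual design would
use; no real protocol or library API is implied.

```python
def distributed_whois(domain, name_servers, query_server, query_registry):
    """Ask each authoritative name server first; fall back to the registry.

    query_server(host, domain) and query_registry(domain) are hypothetical
    callables supplied by the caller. query_server returns the record text,
    or returns None (or raises OSError) when the host gives no reply.
    Returns a (record, source) pair.
    """
    for host in name_servers:
        try:
            reply = query_server(host, domain)
        except OSError:
            continue          # unreachable or timed out: try the next host
        if reply:
            return reply, host
    # No name server answered: use the registry as the fall-back.
    return query_registry(domain), "registry"
```

Registrants whose name servers carry no Whois data lose nothing under
this order, since their queries end up at the registry as they do today.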

We thus recommend a shift in the focus of accuracy-related
discussions, so as to deal with those types of inaccuracy that can and
should actually be solved, rather than dealing with world-wide
verification and law enforcement systems that are not practically
conceivable at the present social and political state of our planet,
and that would anyway have to be discussed at other political levels.
