Identity, Authentication and Authority for the Virtual Observatory

Introduction

Why should the VO burden itself with considerations of authority, identity and authentication? The astronomical parts of the current web do not seem to need these concepts; most web access is anonymous. What do we gain by going beyond anonymous access?

Authority is a given. Almost all data archives have some usage policy, such as a one-year proprietary period in which data are only available to the original observers. If every single datum attached to the VO were to be put into the public domain, then the authority issue would be trivial; but this is unlikely to be the case. Otherwise, authority is part of the science requirements. This is slightly masked by the fact that the WWW interfaces to archives usually disregard the access policy of the archive, or implement only the lowest level of access.

Identity, and its on-line representation, is necessary to support authority. Current on-line facilities that use anonymous access cannot grant special authority to individuals and hence cannot grant all the access implied in their policies. For example, a PI on a survey should be allowed access to data during the proprietary period but will not be able to obtain that access anonymously.

Identity also allows the VO to store results on behalf of users for later collection. It enables a user to start a large and complex query, to disconnect from the VO, and later to reconnect and ask for the results. "The user's results" is only a valid concept when the user can be identified. The idea that results are stored on the grid is built into the current AstroGrid architecture.

Authentication (i.e. the proving of identity) is necessary to support identity. Because the VO is distributed and is built on the public Internet, all parts of it may be addressed by potential users and by attackers. An identity assumed by a party may be false and identities must be tested before they are trusted. There is no place to put a firewall around the VO such that internal operations may trust identities without authentication.

Authority

Authority attached to resources

Authority means permission to use some resource, and resource means either a data-set or some computing facility for which there is competition. These are examples of authorities:

Permission to copy an entire data-set (e.g. a CCD image in a FITS file).
Permission to search a data-set and copy part of it (e.g. an object catalogue in a DB table).
Permission to alter or delete a data-set.
Permission to store private files of results in storage provided by the VO.
Permission to run an involved query that takes a day of compute time at a data-centre.
Permission to use a data-mining system exclusively for three hours.

In this list, the first three authorities are binary - a user either has them or he does not - but the last three imply limits on the amount of some resource that is granted to a user.
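
The distinction between binary and limited authorities can be sketched as a small data structure. This is an illustration only: the class name `Authority`, its fields and the example resource names are assumptions introduced here, not part of any agreed VO design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Authority:
    """One permission over one resource (hypothetical model)."""
    action: str                    # e.g. "copy", "search", "alter", "store", "reserve"
    resource: str                  # name of a data-set or computing facility
    limit: Optional[float] = None  # None means a binary authority
    unit: Optional[str] = None     # unit of the limit, e.g. "hours"

    def is_binary(self) -> bool:
        # A binary authority is either held or not; it carries no amount.
        return self.limit is None

    def permits(self, requested: float = 0.0) -> bool:
        # A binary authority permits any request; a limited one permits
        # only requests up to the granted amount.
        return self.is_binary() or requested <= self.limit


# Hypothetical examples: one binary authority and one limited authority.
copy_image = Authority("copy", "ccd-image.fits")
mining_slot = Authority("reserve", "data-mining-system", limit=3.0, unit="hours")
```

Here `mining_slot.permits(2.5)` is true while `mining_slot.permits(4.0)` is false, whereas `copy_image.permits()` is true regardless of any amount.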

Authority attached to individuals

Authorities apply to potential users of resources: they are associated with identified parties.

Most facilities have access policies granting rights to groups of persons. Individuals receive the authorities by affiliation. For example:

Members of a survey team can access observations from the survey during the proprietary period.
Researchers working for institutions in ESO member-states can access the ESO archives.
Staff at British institutions can apply to PATT for time on telescopes funded by the UK.

Some authorities over data-sets apply inherently to one individual. E.g.

The results of the data-mining run #2356547 on the Leicester facility belong to J. Random Astronomer of the University of Birmingham and are for her use only.

This means that each individual is affiliated to a group consisting only of himself (or herself, or, recognising that some individuals may be automata, itself).

These per-individual authorities may be extended by delegation. E.g.

JRA grants read and write access to result-set #2358547 to her research student.
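
Delegation of a per-individual authority might work roughly as follows. This is a sketch under assumptions: the identity strings, the result-set label, the names of the rights and the rule that only a holder of a "delegate" right may pass rights on are all invented for illustration.

```python
# Hypothetical access list: result-set id -> identity -> set of rights.
access = {}


def grant_owner(result_set: str, owner: str) -> None:
    # The owner of a result-set starts with every right over it.
    access.setdefault(result_set, {})[owner] = {"read", "write", "delegate"}


def rights_of(identity: str, result_set: str) -> set:
    # Unknown identities and unknown result-sets yield no rights at all.
    return access.get(result_set, {}).get(identity, set())


def delegate(result_set: str, grantor: str, grantee: str, rights: set) -> None:
    # Only a party holding the "delegate" right may extend access, and
    # only with rights that the party already holds.
    held = rights_of(grantor, result_set)
    if "delegate" not in held:
        raise PermissionError(f"{grantor} may not delegate {result_set}")
    if not rights <= held:
        raise PermissionError("cannot delegate rights that are not held")
    access[result_set].setdefault(grantee, set()).update(rights)


grant_owner("run-2356547", "jra")
delegate("run-2356547", "jra", "student", {"read", "write"})
```

After these calls the student can read and write the result-set but cannot pass access on to anyone else.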

Authorities tabulated

A central register of authorities could, in principle, be drawn up as a table of identities against resources. Since authorities can apply to individuals and to individual data-sets, this table is, by implication, a matrix of every registered user against every data-file in the VO: a horrendously large amount of data to sort and search. The different kinds of possible authority (e.g. read access, write access, etc.) add a third dimension to the matrix.

Since most authorities apply to groups of individuals (by affiliation) and to groups of data-sets, the hypothetical big table could be split into three smaller units:

individuals against affiliations;
affiliations against groups of resources;
resources against groups of resources.

These three parts are logically all sparse matrices. However, they don't have to be implemented that way. The individual/affiliation mapping can be done as a list of affiliation codes for each registered individual. The affiliation/resource-group mapping can be a list of affiliation codes for each group. The set of resources in each group is naturally a list. These variable-length lists may prove to be easier to handle than tables with fixed dimensions and most cells empty.
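
The three list-based mappings, and an authority check that walks them, could look something like this. It is a minimal sketch: every identity, affiliation code and resource name is hypothetical, and the implicit "public" affiliation given to everyone is an assumption made here to keep the example short.

```python
# Individual -> affiliation codes (kept by the body that registers users).
affiliations_of = {
    "jra": ["survey-team-x", "uk-staff"],
    "anon": [],
}

# Resource group -> affiliation codes allowed to read it (kept by the provider).
readers_of_group = {
    "survey-x-proprietary": ["survey-team-x"],
    "survey-x-public": ["survey-team-x", "uk-staff", "public"],
}

# Resource group -> resources it contains (kept by the provider).
members_of_group = {
    "survey-x-proprietary": ["field-001.fits", "field-002.fits"],
    "survey-x-public": ["catalogue-v1"],
}


def may_read(identity: str, resource: str) -> bool:
    """True if some affiliation of the identity may read a group holding the resource."""
    # Assumption: every party, known or not, holds the "public" affiliation.
    held = set(affiliations_of.get(identity, [])) | {"public"}
    for group, resources in members_of_group.items():
        if resource in resources and held & set(readers_of_group.get(group, [])):
            return True
    return False
```

With these tables, `may_read("jra", "field-001.fits")` succeeds through the survey-team affiliation, while an unaffiliated party can read only `catalogue-v1`.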

It may be convenient, or even necessary, to distribute the parts. For example, the body that registers users into the VO knows about the affiliations, but the providers of the data-services know the mapping from affiliation to groups of resources. In checking authority, the service provider needs to obtain the user's list of affiliations from the registering body.

If the affiliation/resource-group and resource-group/resource mappings are delegated to the service providers, there is no need to impose any standard on how they are implemented.

Identity

Requirements

Identity of an individual has its commonplace meaning. In the VO, the identity serves mainly to determine the list of affiliations described in the previous section, but identity is still vital.

The prime requirement of an on-line identity is that it be unique throughout the organization where it is used. In the case of the Virtual Observatory, this means that the identity needs to be unique amongst all astronomers who may use the VO. If the VO chooses to rent machine time on facilities of other grids (a possible scenario), then the identities of the users need to be unique in the union of the VO and the other grid. Ultimately, this suggests that identities should be globally unique.

Since email addresses are globally unique, they are an economical way of encoding identity. Some parts of the web already use email addresses this way.

The second requirement on an identity is that it be verifiable using any one of the authentication schemes chosen by the virtual organization. This means that the authentication method dictates the form in which the identity is encoded.

The third requirement on identity is that it change rarely if at all, and that the time between changes be long compared with the time for the system to adapt to a change. If this condition is not met, users will be deprived of their access rights for much of the time (like a world traveller whose mail never quite catches up with him). This makes the use of email addresses suspect for the VO, since email addresses are tied to institutions.

The fourth requirement on the identity is that its encoding, in a certificate or whatever, be small enough to attach to all service requests without overloading the data-transport mechanism. This may limit either the form of encoding or the form of the data transport.

The X509 standard for certificates of identity

One common way of authentication (described in more detail below) is to write the identity into a digitally-signed certificate as defined by the X509 standard.^[?] That means that identity is carried in a small (~300 bytes) document that is signed by a trusted third party and which cannot be altered without invalidating the signature.

The identity encoded in an X509 certificate is formally known as a "distinguished name" (DN).

This is an example of a certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 2844 (0xb1c)
        Signature Algorithm: md5WithRSAEncryption
        Issuer: C=US, O=Globus, CN=Globus Certification Authority
        Validity
            Not Before: Oct  4 16:06:11 2000 GMT
            Not After : Oct  4 16:06:11 2001 GMT
        Subject: O=Grid, O=Globus, OU=ast.cam.ac.uk, CN=Guy T Rixon
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:b8:05:8b:ed:fb:33:2d:f4:fb:1b:6f:fe:40:07:
                    34:51:a0:cf:6c:62:b8:11:25:d8:94:ee:00:c8:da:
                    24:07:1f:3c:55:09:cf:12:6b:03:cf:4a:25:97:f2:
                    02:20:1b:a1:38:82:56:e9:6b:7d:9a:2d:50:9d:ff:
                    b2:ce:46:90:3d:a1:bf:99:f3:7f:ee:7f:0b:31:09:
                    45:50:a9:e1:1c:08:1e:c4:9f:92:1d:78:fe:66:46:
                    3e:49:1d:5f:20:08:b1:c3:77:2f:6f:83:5f:b5:53:
                    e3:60:43:e3:1b:3b:3f:7e:e1:4e:be:1e:37:3e:1d:
                    b3:7d:72:e7:cf:60:ef:19:e7
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            Netscape Cert Type:
                0xC0

    Signature Algorithm: md5WithRSAEncryption
        49:f1:ff:37:49:c5:d9:a6:b1:40:ae:6a:b2:43:da:7e:a6:7d:
        a6:3b:99:28:5e:87:9c:a6:40:b2:13:8d:13:e5:d0:3f:2b:63:
        c1:e9:3a:80:54:64:b7:cb:9b:21:b6:97:1a:0a:aa:99:75:62:
        09:30:db:74:47:66:50:47:b6:b1:59:bc:24:8c:3d:6e:cb:b2:
        cb:a4:7d:d1:b0:a1:7a:fe:a9:78:a5:ac:4d:0a:81:69:37:34:
        46:e2:af:50:86:13:9b:dc:95:38:d0:3a:62:ba:4f:5a:01:35:
        d2:b5:66:7e:90:ff:09:44:7d:d6:ed:f3:dd:f2:43:23:59:30:
        53:09

-----BEGIN CERTIFICATE-----
MIICITCCAYqgAwIBAgICCxwwDQYJKoZIhvcNAQEEBQAwRzELMkGA1UEBh
MCVVMxDzANBgNVBAoTBkdsb2J1czEnMCUGA1UEAxMeR2xvYnVzIENlcnR
pZmljYXRpb24gQXV0aG9yaXR5MB4XDTAwMTAwNDE2MDYxMVoXDTAxMTAw
NDE2MDYxMVowTjENMAsGA1UEChMER3JpZDEPMA0GA1UEChMGR2xvYnVzM
RYwFAYDVQQLEw1hc3QuY2FtLmFjLnVrMRQwEgYDVQQDEwtHdXkgVCBSaX
hvbjCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAuAWL7fszLfT7G2/
+QAc0UaDPbGK4ESXYlO4AyNokBx88VQnPEmsDz0oll/ICIBuhOIJW6Wt9
mi1Qnf+yzkaQPaG/mfN/7n8LMQlFUKnhHAgexJ+SHXj+ZkY+SR1fIAixw
3cvb4NftVPjYEPjGzs/fuFOvh43Ph2zfXLnz2DvGecCAwEAAaMVMBMwEQ
YJYIZIAYb4QgEBBAQDAgDAMA0GCSqGSIb3DQEBBAUAA4GBAEnxzdJxdmm
sUCuarJD2n6mfaY7mSheh5ymQLITjRPl0D8rY8HpOoBUZLfLmyG2lxoKq
pl1Ygkw23RHZlBHtrFZvCSMPW7LssukfdGwoXr+qXilrE0KgWk3NEbir1
CGE5vclTjQOmK6T1oBNdK1Zn6Q/wlEfdbt893yQyNZMFMJ
-----END CERTIFICATE-----

The certificate information is given twice, first in human-readable form and then in condensed form for machine use. In the human-readable part, the certificated identity is in the "data" section and is followed by the digital signature^[?] of the certifying authority.
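
As an illustration of how the distinguished name is structured, the "Subject" line of the certificate above can be split into its components. This is a rough sketch: the function name `parse_dn` is invented here, and the parsing assumes components separated by ", " with no escaped commas, which real DNs do not guarantee.

```python
def parse_dn(subject: str) -> list:
    """Split an OpenSSL-style subject line into (attribute, value) pairs.

    A list of pairs is used rather than a dictionary because an
    attribute such as O may appear more than once, and order matters.
    """
    pairs = []
    for component in subject.split(", "):
        attribute, _, value = component.partition("=")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


subject = "O=Grid, O=Globus, OU=ast.cam.ac.uk, CN=Guy T Rixon"
dn = parse_dn(subject)

# The common name (CN) is the component that names the individual.
common_name = [value for attribute, value in dn if attribute == "CN"][0]
```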

If an organization uses a flexible format for its certificates, then it may be possible to encode that party's affiliations and delegated authorities into the certificate. The technique is a little like attaching visas to a passport. The X509 standard seems to provide the means for this, but the details need to be checked. This approach works well with the authorization scheme described above in that it delivers to the service provider the full list of affiliations needed to check access. However, there are problems in the implementation. Because the X509 certificate is digitally signed, it becomes invalid if altered to add new affiliations. Hence, each time an individual gets a new affiliation, they need a new certificate which has to go away to be signed by some responsible person. Furthermore, the number of parameters in the certificate needed to encode all the affiliations may be large, making the certificate unwieldy.

It is worth noting that the X509 certificate doesn't seem to specify a unique syntax for recording affiliations. The VO would have to define its own usage of the X509 facilities.

It is possible that in the plannable future of the VO most activity on the public and commercial WWW will move to a grid-like operation with identified users. Monetary transactions and the need to fund the WWW by micropayments make this evolution likely. In this case, every computer-using person on the planet will be using an on-line identity, and it would be sensible for the VO to be using the same scheme for identification. Currently, X509 certificates are as good a guess as any at the form that this will take.

Authentication

Simple schemes

All authentication systems work by challenging the party presenting the identity in question. The challenge is a question to which only the owner of the identity could know the answer.

The simplest challenge is a password. The combination of a username (i.e. a simple representation of identity) and a password check for authentication works well for transactions between one user and one service. The simple password-system works less well when each user works with many services: in this case, the user either has to remember many passwords or the services have to use the same password. Getting separate services to use the same password is difficult, since the password database has to be copied securely to each service. In a dynamic VO where new users are added often, the administrative load of synchronizing passwords could be excessive. It would be better if the data services didn't need to hold copies of the password database.

In principle, authentication can be centralised to a single service that holds the password database. This seems attractive for a grid system in which the VO itself can provide the authentication service. However, it is difficult to ensure that the party whose identity was proven at the authentication service is the same party pushing for access to the other services; "man in the middle" attacks are possible. Microsoft's Passport system, in which MSN handle authentication on behalf of on-line merchants, suffers from this problem and is considered seriously flawed on this account.

Authentication using public-key cryptography

A better scheme would delegate the password check to the challenged party, removing both the need for the data-service to hold the password database and the need for a third party. The X509 certificates allow this.

In the X509 scheme, authentication is done using public-key cryptography.^[?] The certificate presented by a party to establish identity contains that party's public key. The challenge is a phrase chosen at random by the challenger. The party challenged encrypts the challenge phrase with his/her/its private key and sends the cyphertext back to the challenger. The challenger decrypts the cyphertext with the public key from the certificate: if it matches the plain-text of the challenge phrase then the challenger knows that the challenged party has the private key matching the public key in the certificate. Since only the owner of the identity is supposed to have the private key, the authentication is then successful. Note that the challenger does not need any prior record of the challenged party to make this authentication process work.
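
The challenge and response can be illustrated with textbook RSA arithmetic. This is a toy sketch only: the key is built from tiny primes and is trivially breakable, the challenge is a random number rather than a phrase, and the function names are invented here. Real systems use large keys with proper padding from a vetted cryptographic library.

```python
import secrets

# Toy RSA key-pair built from tiny textbook primes. Far too small to be
# secure; chosen only so that the arithmetic is easy to follow.
p, q = 61, 53
n = p * q                          # public modulus (in the certificate)
e = 17                             # public exponent (in the certificate)
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (kept by the owner)


def make_challenge() -> int:
    # The challenger picks a random number to act as the challenge phrase.
    return 2 + secrets.randbelow(n - 2)


def respond(challenge: int, private_exponent: int) -> int:
    # The challenged party transforms the challenge with the private key.
    return pow(challenge, private_exponent, n)


def verify(challenge: int, response: int) -> bool:
    # The challenger undoes the transformation with the public key and
    # compares the result with the challenge that was sent.
    return pow(response, e, n) == challenge


challenge = make_challenge()
response = respond(challenge, d)
```

Only the holder of `d` can produce a response for which `verify` succeeds; the challenger needs nothing beyond the public values `n` and `e`.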

Clearly, an attacker could generate a fake certificate by making up a new public/private key-pair, and the challenge process would not distinguish this from the true certificate. To prevent this, the certificates must be digitally signed^[?] by a party that the challenger trusts. The signature has two effects: it prevents an attacker from making up a convincing fake certificate; and it makes the certificate invalid if its contents are changed in any way. The latter point counters an attack where the attacker copies a valid certificate, keeps the signature but changes the certificate to show a different identity.
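
The effect of the signature on a tampered certificate can be shown with the same kind of toy arithmetic. Again this is a sketch under loud assumptions: the authority's key is built from two small Mersenne primes and is not secure, the "certificate" is a single line of text, and the hash is simply reduced modulo the key rather than padded as real signature schemes require.

```python
import hashlib

# Toy key-pair for the certifying authority. Still far too small for real
# use; real certificates use full-size keys and standard padding.
P, Q = 2**61 - 1, 2**89 - 1              # two Mersenne primes
CA_N = P * Q                             # authority's public modulus
CA_E = 65537                             # authority's public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))  # known only to the authority


def digest(certificate_body: str) -> int:
    # Reduce the SHA-256 hash of the certificate text to a number below CA_N.
    raw = hashlib.sha256(certificate_body.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % CA_N


def sign(certificate_body: str) -> int:
    # The authority signs the digest with its private exponent.
    return pow(digest(certificate_body), CA_D, CA_N)


def signature_is_valid(certificate_body: str, signature: int) -> bool:
    # Anyone holding the authority's public key can check the signature.
    return pow(signature, CA_E, CA_N) == digest(certificate_body)


body = "Subject: O=Grid, O=Globus, OU=ast.cam.ac.uk, CN=Guy T Rixon"
signature = sign(body)

# An attacker keeps the signature but changes the identity (hypothetical name).
forged = body.replace("Guy T Rixon", "A N Other")
```

The original body verifies against the signature, but the forged body does not, because its digest no longer matches the value that the authority signed.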

The X509 system requires a transport mechanism by which the challenge phrase and the response to it can pass between the challenging and challenged parties.

The scheme described above authenticates a user of the service to the service's "gatekeeper". The gatekeeper may also have an identity and a certificate, and may be authenticated to the user. This mutual authentication would be important in e-commerce, but it seems to be of little use in e-science.

What Globus does

Globus ^[?] encodes identities for (human) users and (software) keepers of resources. It uses X509 certificates for this. The Globus toolkit will work with any X509 certificate for which the certifying authority is listed in a configuration file written by the person who installed Globus. That is, the set of certificates accepted by a site using Globus is determined by the person setting up that site. Most early Globus sites get certificates for both users and gatekeepers from the HQ of the Globus project.

In Globus, gatekeepers are trusted to protect their identities (i.e. keep their private keys private), but end-users are not. Users' private keys are kept off-line and are not used directly in the challenge process. Instead, a user is required to "log on to the grid" once per day. In this log-on process, the user enters a pass-phrase to temporarily release the private key and the local Globus software makes up a proxy certificate with a new public/private key-pair. The proxy is used for that day's work and the original private key is not used again that day. The proxy certificate has a lifetime of 24 hours, so it matters less if it is compromised.
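
The limited lifetime of the proxy can be modelled very simply. The class below is a hypothetical stand-in written for this note, not the Globus data structure; it only captures the idea that a proxy created at log-on stops being acceptable 24 hours later. The log-on date is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProxyCredential:
    """Hypothetical stand-in for a short-lived proxy certificate."""
    subject: str          # DN of the user for whom the proxy acts
    not_before: datetime  # moment of the daily log-on
    lifetime: timedelta = timedelta(hours=24)

    def is_valid(self, now: datetime) -> bool:
        # The proxy is usable only between log-on and expiry.
        return self.not_before <= now < self.not_before + self.lifetime


# Illustrative log-on time only.
logon = datetime(2001, 3, 1, 9, 0, tzinfo=timezone.utc)
proxy = ProxyCredential("O=Grid, O=Globus, OU=ast.cam.ac.uk, CN=Guy T Rixon", logon)
```

A stolen proxy is therefore useful to an attacker for at most one day, whereas a stolen long-term private key would be useful until the certificate itself expired.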

Globus programmes use mutual authentication through the Grid Security Infrastructure (GSI)^[?]. This means that Globus users have to be able to authenticate gatekeepers; this often causes problems in new installations. GSI presumably provides the transport mechanism for the authentication but I do not know how it works in detail.

Globus' authorization scheme is very crude, probably too crude to support a VO. At each node providing a grid service, there is a gatekeeper, and each gatekeeper has a file mapping DNs from the user certificates to UIDs on the local computer. This file is set up manually; there is no provision for synchronizing the map-files between nodes of one site let alone between sites. The scheme seems to be designed for compute grids where each service has dozens of privileged users, rather than a data grid where a service has thousands of users.
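
To make the map-file idea concrete, the sketch below reads a file in which each line holds a quoted DN followed by a local account name. That layout, the slash-separated DN form, the account names and the function name `parse_map_file` are assumptions made for illustration; the Globus documentation should be consulted for the real format.

```python
import shlex


def parse_map_file(text: str) -> dict:
    """Build a DN -> local account mapping from map-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, account = shlex.split(line)  # the quoted DN may contain spaces
        mapping[dn] = account
    return mapping


# Hypothetical map-file contents for one gatekeeper.
example = '''
# distinguished name                                      local account
"/O=Grid/O=Globus/OU=ast.cam.ac.uk/CN=Guy T Rixon"        gtr
"/O=Grid/O=Globus/OU=example.org/CN=J Random Astronomer"  jra
'''
users = parse_map_file(example)
```

A user whose DN is missing from `users` has no local account and so no access, which is why every new user means a manual edit at every node.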

Globus provides no accounting facilities whatsoever.

It is likely that the Globus project will move to provide better authorization facilities and at least some accounting. It would be convenient if they could be persuaded to adopt a scheme designed for the VO.

Tentative suggestions for the VO architecture

The VO should use the Globus GSI scheme for authentication in order to remain compatible with other grid projects. This means that the VO is working with distinguished names encoded in X509 certificates and has a decent chance of remaining interoperable with the commercial web as that moves to identified operation.

The VO must arrange that all its users' certificates are trusted by all its parts. Exactly how this is implemented in chains of CAs is an open question.

The VO should not encode academic affiliations in the certificates. The hierarchies used in the authentication chain are very unlikely to match the affiliations that matter for authorization of resource usage, and the latter affiliations are likely to change more often than the natural (one-year) life-time of the certificates. Changing VO-wide certificates is likely to be painful and slow. Hence, the certificates should carry only the unchanging identity.

The mapping from affiliations to authorities on a particular resource should be distributed so that each service provider retains control over their own service. Providers of read-only, public data can then ignore the whole issue.

Storage of results, and later access to them, is a service that may be provided by the VO directly. For this service, the VO should provide the mapping directly from identities to authorities, recognising that most authorities relate to individuals not to groups. VO users should have the ability to grant access to their data to others; the VO should provide software to make these adjustments.

The VO should provide a look-up service mapping DNs to lists of affiliations. The data-services would call this service to resolve a user's identity into affiliations. Tracking users' changes of affiliations as they move from project to project would then become the VO's responsibility. This implies that affiliations are encoded in some standard way.

The VO should handle the absence of a certificate (and hence of an identity for a user) as a special but common case that grants enough authority to read public data. This allows users to work casually and anonymously in the way that they do currently on the WWW. Since the potential number of anonymous users is high (all the amateur astronomers in the developed world), the VO should provide some kind of "flood control".
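
One possible form of "flood control" is a token bucket shared by all anonymous requests. The text above does not prescribe a method, so this technique, the capacity and refill rate, and the names `TokenBucket` and `admit` are all assumptions chosen for illustration. Time is passed in explicitly so that the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Top up the bucket for the time elapsed, then spend one token if possible.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One shared bucket for every request that arrives without a certificate.
anonymous_bucket = TokenBucket(capacity=3, rate=1.0)


def admit(dn, now: float) -> bool:
    # Identified users pass straight through; anonymous ones are throttled.
    if dn is not None:
        return True
    return anonymous_bucket.allow(now)
```

With these settings a burst of three anonymous requests is admitted and a fourth arriving at the same instant is refused, while identified users are unaffected.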

Since the use of some resources (e.g. time on compute-farms) may be charged against an account or against a pre-agreed quota, there must be some way of controlling this usage. The simplest and most robust way seems to be to delegate to the service providers. The VO should not try to regulate usage of charged resources centrally, but should rely on the service providers controlling access to these resources. The services should provide the VO with a way of finding out the current access "charges" (e.g. how much quota is left for the current accounting period) so that the VO can help the user choose the best service for a given job.
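
Helping the user to choose a service from the reported charges might be as simple as the sketch below. The service names, the quota figures and the rule of preferring the service with most quota remaining are assumptions for illustration; "best" could equally mean cheapest or fastest.

```python
from typing import Optional

# Hypothetical reports from service providers: quota remaining, in
# CPU-hours, for one user in the current accounting period.
remaining_quota = {
    "compute-farm-a": 2.0,
    "compute-farm-b": 30.0,
    "compute-farm-c": 12.0,
}


def choose_service(required_hours: float, quotas: dict) -> Optional[str]:
    """Pick the service with most quota left among those that can fit the job."""
    candidates = {name: left for name, left in quotas.items() if left >= required_hours}
    if not candidates:
        return None  # no provider can accept the job in this period
    return max(candidates, key=candidates.get)
```

The providers still enforce the quotas themselves; the VO uses the reported figures only to advise the user.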

Topics for investigation

Find out how GSI works. Identify any performance and security flaws.
Find out if there are viable alternatives to GSI. In particular, look for alternatives that may become commonplace in the commercial WWW.
Find out how the Globus project (and the global Grid community) plan to handle authorization and accounting in the future.
Determine how to arrange signing of certificates so that they work throughout the VO.
What use is mutual authentication? Does it slow things down? Can we turn it off if we use GSI?
What are Globus proxies really needed for?
Find the best structure for the identity/affiliation/authority tables.
List the affiliations to be encoded.
Define codes or some other format for recording affiliations.
Suggest possible implementations for the affiliation look-up service.
Suggest methods to apply "flood control" to anonymous users.

References

The Globus project: http://www.globus.org/
Grid Security Infrastructure: http://www.globus.org/security/
Background information on public-key cryptography: http://www.verisign.com/repository/crptintr.html
Explanation of digital signatures by David Youd: http://www.youdzone.com/signature.html
The X.509 standard: http://www.itu.int/ITU-T/asn1/database/itu-t/x/x509/1997/README.html