Solving the Problems of Finding Law on the Web: World Law and DIAL
Madeleine Davis and Jill Matthews
Australasian Legal Information Institute (AustLII)
Keywords: indexing, computerisation of law, information retrieval, internet
This is a Refereed Article published on 29 February 2000.
Citation: Greenleaf G et al, 'Solving the Problems of Finding Law on the Web: World Law and DIAL', 2000 (1) The Journal of Information, Law and Technology (JILT). <http://elj.warwick.ac.uk/jilt/
Despite its recent development, the Web already contains an astonishing variety of legal materials from dozens of countries. Significant collections of legislation are already available on the Web from over 50 countries. The full text of all legislation is available on the Web from almost all the jurisdictions of the USA, Canada and Australasia, from many Latin American countries and from some European countries (such as Norway and Germany), and extensive collections are available from many other European countries (such as the United Kingdom, France, Spain and Portugal). Substantial collections of legislation are available from many developing countries, including India, Turkey, Kazakhstan, South Africa, Vietnam, Zambia, China, Mexico and Israel.
There are also extensive collections of case law from about 20 countries, particularly from North America and Australasia and some European courts, but also courts from India, Korea, Brazil and other countries. The Parliaments of dozens of countries have Web pages, and these contain many significant resources concerning legislation and law reform. Law reform commissions and similar bodies are starting to make their reports and working papers available via the Web. There are specialist university and other centres which provide very large specialist collections of materials in areas such as constitutional law, trade law, the law of the sea and human rights.
Despite the abundance of valuable legal materials already on the Web, and the rapidity with which these materials are expanding, these materials are often very difficult to find, since they are scattered across thousands of Web sites located all around the world. As we shall see, the tools we have used for internet legal research until now do not serve us well enough.
The Internet is still dominated, in terms of both location of Web sites and location of users, by the developed world of North America, Europe and Australasia. The preponderance of English-language information on the Web is in part a reflection of this.
However, the availability of legal information on the World-Wide-Web is of considerable importance to developing countries, in Asia and elsewhere. Law libraries with reasonably comprehensive and up-to-date collections of legislation, case law and law reform reports are virtually unknown in the developing world, and the costs of maintaining them are prohibitive. Access to large online commercial services such as Lexis is also prohibitively expensive for most lawyers in developing countries. This is the principal reason that the Asian Development Bank has funded the development of Project DIAL (Development of the Internet for Asian Law) (one subject of this paper): to provide a means by which legislative and law reform personnel can obtain access to comparative legislative and law reform models. It is of equal importance for superior Courts in developing countries to have access to the decisions of similar Courts throughout the world. It is not only the legislation or case law of the developed countries that is of interest and importance: a legislator in Mongolia might find their best model for a Build-Operate-Transfer (BOT) law from a Kazakhstan Web site, and the Supreme Court of a small Pacific Islands country is likely to find the most important precedents for its decisions in the decisions (currently almost completely unobtainable) of similar Pacific Island states.
Legal information on the Internet is also important to developing countries for a quite different reason: the provision via the Internet of comprehensive and up-to-date information about a country's laws and legal system can be a striking demonstration of the transparency of a country's legal system. This is likely to be of considerable importance to potential foreign investors, both in symbolic terms and in facilitating efficient and low cost legal advice necessary for investment decisions to be made. A recent example is the proliferation of government agency Web sites in Mongolia which provide the legislation that they administer: it would be very difficult and expensive to obtain that legislation by any other means.
There are essentially only two types of tools which help users find legal materials on the internet, commonly called catalogs and search engines.
Catalogs are indexes in which individual Web sites are classified by hand according to various classificatory schemes. Because of the human intelligence and effort involved, they are also called 'intellectual' indexes. Usually, such indexes only provide the title, URL and perhaps a brief description of each site indexed. Yahoo! is a well known example of an internet-wide catalog or intellectual index of the Web (i.e. one which is not law-specific).
'Search engine', in the Web context, has become a shorthand way of referring to the combination of a 'Web robot', the data it gathers, and the text retrieval software used to search that data. A program (variously called a 'Web robot' or 'Web spider') traverses the Web, downloading every page it encounters, so that every word on every page can be converted into one very large word occurrence index ('concordance') which can be searched by the text retrieval software. Search engines can also be called 'automatic indexes'. When the search engine displays a URL as a result of a search, that URL is to the original site, not to a mirror on the remote site. Alta Vista, Northern Light, Hot Bot, Excite and Infoseek are well-known 'general purpose' search engines that search an index created by a Web spider, and where the Web spider goes to all types of subject matter. They have only existed since Alta Vista's creation in 1995. The principal advantage of this approach is that it is possible to search every word that has been indexed, not just the titles and brief summary of what is on the site. About 85% of internet users use search engines to locate information (Lawrence and Giles 1999).
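The idea of a word occurrence index ('concordance') can be illustrated with a minimal sketch. It is not the software used by any of the search engines named above: the page texts and URLs are invented, and the simple tokenisation rule is an assumption made for the example.

```python
import re
from collections import defaultdict

def build_concordance(pages):
    """Map each word to the set of URLs of the pages on which it occurs."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Assumed tokenisation: lower-case runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search_all_words(index, query):
    """Return the URLs of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented pages standing in for what a Web spider would download.
pages = {
    "http://example.org/act1": "Trade Marks Act: registration of trade marks",
    "http://example.org/case1": "Passing off and unfair competition in trade",
}
index = build_concordance(pages)
print(sorted(search_all_words(index, "trade marks")))
```

Because every word of every page is in the index, a search can reach text that a catalog entry (title, URL and brief description) would never expose.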
Combinations of catalogs and search engines on the one site are now becoming more common, and such combinations are often referred to as 'portals'. Some catalogs are now attempting the automated or semi-automated classification of new Web sites located by a Web spider (for example, Excite Australia). This approach is potentially promising but has not yet been shown to produce useful systems for legal research.
Despite the existence of these research aids, finding legal information on the internet is surprisingly difficult, partly because neither catalogs nor search engines used alone can provide a satisfactory solution. This is particularly so in relation to internet-wide (i.e. not law-specific) catalogs and search engines. These difficulties will now be summarised.
Good catalogs (intellectual indexes) for law are hard to find [5]. While there are many multi-country intellectual indexes to law on the internet, none are even remotely comprehensive, and many are US-oriented with only a slight international gloss. Many are updated only rarely if at all. Some very good indexes do exist for particular countries (e.g. Canada, the USA, Germany and Australia), and many exist for particular subject matter areas, but they are often difficult to find from the multi-country indexes. It is therefore difficult to find a good place to start. The coverage of legal materials in general-purpose internet indexes is no more helpful, as an inspection of the limited coverage of legal materials in an index such as Yahoo! (the largest general-purpose catalog) will show.
Catalogs are hard to maintain. As the quantity of legal material on the internet grows, the sites that contain significant legal information grow so numerous, and some are so large, that it is difficult to maintain catalogs at all, and particularly to maintain them with any depth of indexing of each site. The best that can be hoped for is that sites with significant legal materials are identified in the index, even though there is no detailed description of their content. For example, it soon becomes impossible to include in a catalog the content of each piece of legislation, each case, or each journal article included on a large site.
As a result, we can say that catalogs are inherently shallow - even when they are good at identifying important law sites, they cannot index very deeply into those sites.
The main problem of general purpose search engines for legal research is their lack of comprehensiveness, but there are numerous other problems as well.
General search engines are not comprehensive. There are very good internet-wide search engines, but they are nowhere near as comprehensive as people often assume. A July 1999 report in Nature by Lawrence and Giles (1999) shows that of eleven major search engines tested (including Alta Vista, Northern Light, Hot Bot, Excite and Infoseek) the search coverage they provided of the estimated 800M pages on the Web was 16% at best (Northern Light) and down to 5.6% (Excite), with some not named here even lower.
There are a number of reasons, other than the rapidly expanding size of the Web, for the lack of comprehensiveness of search engines based on automated Web spiders. They include the following:
To make matters worse, the percentage of the available Web pages that the search engines are indexing is declining, not increasing, due to the rapid expansion of the Web from an estimated 320 M pages in December 1997 to 800 M pages in February 1999. The decline in the best search engine was from about 33% to 16% of the estimated total size of the Web, a dramatic drop. It seems that the Web spiders of general purpose search engines simply cannot keep up with the expansion of the Web. The technical problems caused by such expansion may also be exacerbated by diminishing financial returns from the costs of trying to keep up. Lawrence and Giles suggest:
'Why do search engines index such a small fraction of the Web? There may be a point beyond which it is not economical for them to improve their coverage or timeliness. The engines may be limited by the scalability of their indexing and retrieval technology, or by network bandwidth. Larger indexes mean greater hardware and maintenance expenses, and more processing time for at least some queries on retrieval. Therefore, larger indexes cost more to create, and slow the response time on average. There are diminishing returns to indexing all of the Web, because most queries made to the search engines can be satisfied with a relatively small database. Search engines might generate more revenue by putting resources into other areas (for example, free e-mail).'
The combined coverage of the search engines tested by Lawrence and Giles was estimated at 42% of the total number of Web pages, a decline from 60% in their 1997 study. They agree that use of meta-search engines such as MetaCrawler will provide a more comprehensive search. However, it seems from their research that the best possible coverage would still only be about 42%, and even then this would depend on maximum coverage and sorting efficiency of the meta-search engine, so the real figure is likely to be somewhere between 16% and 42%.
There are other significant problems with general purpose search engines which stem from the ambitious nature of their task of indexing all information on the Web irrespective of its subject matter:
The conclusion we may draw from all of these problems is that it would be unrealistic to expect general purpose internet-wide search engines to provide a very effective method for a task as specialised and 'non-popular' as internet legal research.
Many significant law sites can't be searched. When you do find a site containing valuable legal information it will often not have a search engine at all, so searching at word level is not possible. Of the more than 30 internet sites around the world containing significant quantities of legislation, fewer than half have any search engine. It requires considerably greater technical ability to run a search engine than it does to simply put pages of legal material onto the internet where they can be browsed.
Using different search engines can be confusing. Even if a law site does have its own search engine, users who wish to find legal materials on different sites can easily be confused by the need to use different search engines with different search commands.
So the problems of finding legal materials world-wide are that it is both difficult to find which useful sites exist for a particular country or subject, and also difficult to find what is on such sites as are known. These research problems are very substantial even for the most expert 'internet savvy' lawyers and law librarians. They are much worse for inexperienced users.
The challenge is to find a new approach to legal research on the internet which will provide an internet-wide (which means world-wide) method of effectively providing access to legal materials available on the Internet, no matter where they are located.
The answer we propose is in part a technical solution, a limited area search engine for law, but to a large extent the success of the technical component will depend on an organisational element, the creation of a multi-national group of collaborators who are willing to make joint use of the technical tools we have developed in order to create and sustain a world-wide legal research facility.
Our approach to reducing the problems of internet legal research rests on these propositions:
The key to effective legal research on the internet may therefore be a tight integration of an intellectually created catalog and a search engine based on a Web spider, a symbiotic relationship in which each builds on the features provided by the other. Intellectual indexing and automated indexing can feed off each other.
The following diagram explains this relationship, from the perspective of the indexer and the user.
Interactions in the use of World Law / DIAL's limited area search engine
AustLII personnel have developed software and systems to implement this approach:
The approach we are taking requires the development and maintenance of a world-wide catalog created by intellectual indexing effort, which at least catalogs significant national law sites in all countries world-wide and the major subject-oriented resources. A free access law facility such as AustLII, even if it does obtain significant funding support for the task, is unlikely to be able to support more than a couple of legal indexing staff to undertake the task of adding content. The human languages in which such staff are conversant will necessarily be limited. While it is possible to provide a basic level of world-wide coverage with such resources, as can already be seen in the World Law / DIAL 'Countries' page, this will not be fully satisfactory unless regional, national or subject experts are also involved in the indexing process.
Consequently, the way in which we are now envisaging the World Law / DIAL project is that, while most of the organisation of the content indexing and much of the substantive indexing will be carried out by AustLII staff, wherever possible we will invite appropriate authorities from other countries, and other language and subject specialists, to participate in the cataloging process as 'indexing partners' for particular parts of the catalog.
There are a number of ways in which such partnerships can proceed, ranging from a partner periodically emailing lists of proposed URLs to be added to our pages to AustLII's indexers (who would add them to the relevant pages and send the Web spider); to our indexers checking a partner site regularly for any additional links which should be added to our pages; to a remotely located indexer being provided with password access to the editing interface to World Law such that they can add links and change sub-categories in 'their' parts of the index but not elsewhere. The editing interface of World Law is via the Web, so contributing editors can be located anywhere with Web access.
Wherever collaboration occurs, it will be acknowledged on those pages of the catalog where the collaboration is based, by inclusion under the 'Contributor' heading of the logo (or name) of the contributing organisation or person and a link to their Web site. For example, Australia's Department of Foreign Affairs and Trade contributes content to the creation of our 'Treaties and International Agreements' pages, which is acknowledged as shown below.
Some of the advantages of this approach for our indexing partners are that they will obtain additional users through referrals from the World Law pages, and if they wish they will also be able to place CGI search interfaces on their own pages which will search over the full texts indexed by our Web spider, but only the part of AustLII's index relevant to them.
We are only now starting this process of creating partnerships, so it is too early to report on its success. Our first international partnership is being established between Ralph Amissah's Lex Mercatoria site in Norway and World Law's International Trade pages. Other partnerships are being negotiated.
AustLII had been developing a catalog of Australian law sites since July 1995, with some international indexing. The index was maintained as hand-created HTML pages until 1997, at which point it was moved into an mSQL database, and it became possible to search for individual catalog entries. At this point the appearance and functionality were similar to Yahoo!'s approach. In early 1997 Web spider software was customised, and the internet indexing software rewritten so that it could be used to 'target' the Web spider.
The first opportunity for extensive testing of the targeted Web spider was in Project DIAL, a project for the Asian Development Bank which aims to increase the accessibility of legislation on the internet. The Web-spider search facility, DIAL Search, was released on the Web in August 1997 and allowed searches of legislation and legislation-related material from many countries. The World Law / DIAL facilities were demonstrated at the Australian Society of Indexers' Conference in 1997 (Greenleaf et al 1997). The more general version of the targeted Web spider was made available from AustLII's home page as 'World Law Search' in February 1998, when non-legislative material started to be added by the Web-spider. Two test 'Libraries' were added to allow searches to be restricted to legislative materials and to other internet law indexes - the precursor of the search limitation facilities which have now been added.
In June 1999 the Asian Development Bank entered into a three-year agreement for AustLII to develop and host the full-scale Project DIAL facility, as a Regional Technical Assistance (RETA) of the Bank. The value of the RETA is approximately A$1 M, with nearly half of those funds being used to assist and train users in the Bank's Developing Member Countries (DMCs).
During 1999 there has been a major redevelopment of the search facilities in World Law / DIAL, so that the scope of searches can be limited by the location in the catalog from where the user does the search. This enables users to limit their search scope (thus increasing precision) without having to understand any search commands in order to do so. It is the major single technical innovation in World Law / DIAL.
AustLII's World Law Search is one of the first and one of the few examples of what we call 'targeted Web spiders' and others have called 'Limited Area Search Engines'. There are a couple of examples of this technology being used in the field of law.
JURIST launched in January 1998 what it calls a 'Limited Area Search Engine (LASE)' to make searchable all home pages of law professors and course pages of law subjects in its index.
JURIST uses the Argos LASE which, according to its developers, was 'the first peer-reviewed, limited area search engine (LASE) on the World-Wide Web' at the time of its release in October 1996. Argos was developed to provide a more precise means of searching for scholarly literature on the ancient and medieval world. Its development was prompted by the same shortcomings with Internet-wide search engines as we have identified in the preceding discussion:
'At the time of this writing, a search for 'Plato' on the Internet search engine, Infoseek, returned 1,506 responses. Of the first ten of these, only five had anything to do with the Plato that lived in ancient Greece, and one of these was a popular piece on the lost city of Atlantis. ... Add to this broad range of responses the fact that Infoseek returns ten entries per page, making it necessary to examine one hundred and fifty one pages of entries, many of which are irrelevant to a scholarly search of 'Plato', and the result is a process that is frustrating and inefficient.' Argos LASE
They identify advantages of the alternative, 'targeted' approach:
'By limiting the range of the search engine, a LASE strips out many unwanted references... The result is a higher quality index built for a specific purpose and for a smaller audience. Furthermore, the quality of the index, its purpose and the level of specialization expected of its intended audience are variables that can be manipulated with LASE technology.' Argos LASE
They also note that, because of its limited scope, it is possible for Argos to update weekly the indexes to all the sites it covers, rather than the couple of months that (in their experience) are taken by Internet-wide search engines to update sites. They estimate that this means that 98% of all links found by their search engine work at any given time.
Another example is Legal Search New Zealand, released in December 1997 by Knowledge Basket, the New Zealand Internet content provider. It uses the Verity Web robot and Topic search engine, and makes searchable 25 New Zealand law sites at present. The advantages they claim for this approach are similar to those described above <http://www.knowledge-basket.co.nz/kete/nzsearch.html>.
World Law Index, the intellectual index or catalog aspect of World Law, has a conventional 'Yahoo-like' interface. The 'World' root page of the index is shown below, and all other sub-categories are located in a hierarchy under 'World'.
Approximately 4,000 law sites are indexed in the catalog at present, with most sites indexed under a number of sub-categories.
Some features of the catalog structure are:
Extract of categories from the 'Australia >> Subject Index' page
Every catalog page lists its hierarchical location in the catalog. Click on any point in the hierarchy to go back to that catalog page.
You can always get back to the start of the catalog by clicking on '>> World >>' (or on 'Australia' in the Australian part of the catalog). If you are in the 'World' part of the catalog and you want to get to a particular country page (e.g. Vietnam), click on '>> World >>' then '>> Countries >>' and then select 'Vietnam@'.
Users can see at any time what content has been added recently to World Law Index (and to World Law Search) by checking the 'New Additions' page from the World Law home page or from the [New] button on the button bars in the system.
Extract from the 'New Additions' page
Editing entries in the catalog also involves the editor deciding whether to send the Gromit Web spider to index every word on the site which has been 'targeted'. The harness program (Wallace) reads the list of instructions from the Web indexing software, and then sends off multiple instances of the Web spider program, each to download the content of a particular Web site. The harness program ensures that only one instance of the Web spider software is ever downloading from a particular site, to avoid saturating that site with spider requests and denying access to other users. The harness ensures that the Web spider is 'well behaved', causing minimum impact on the sites from which it downloads Web pages. We call this a targeted Web spider, as it is not designed to traverse the Web generally: its downloading is limited to the site specified in the original URL which 'targets' it.
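The 'one spider per site' rule enforced by the harness can be sketched as follows. This is an illustration only, not AustLII's Wallace program: the function names, the thread pool and the `spider` callable (which stands in for the Web spider) are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

def group_by_site(target_urls):
    """Group target URLs by host so that each site gets a single work queue."""
    queues = defaultdict(list)
    for url in target_urls:
        queues[urlsplit(url).netloc.lower()].append(url)
    return dict(queues)

def run_harness(target_urls, spider, max_spiders=4):
    """Call spider(url) for every target, never two at once on the same site.

    Each site's targets are handled one after another by a single worker,
    while different sites are processed in parallel.
    """
    def crawl_site(urls):
        return [spider(url) for url in urls]

    queues = group_by_site(target_urls)
    with ThreadPoolExecutor(max_workers=max_spiders) as pool:
        results = pool.map(crawl_site, queues.values())
        return dict(zip(queues.keys(), results))
```

Serialising the work for each host is the simplest way to guarantee that a remote site never sees more than one spider at a time; a production harness would presumably also add delays between requests.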
Targeting the Web spider to start indexing at the correct page is a complex task. From its start page, the spider indexes all other pages to which that page is directly or indirectly linked, provided they are equal to or below the start page in the server's file hierarchy. The start page must therefore be chosen so that the spider indexes all and only the desired pages. Some desired sets of data cannot be indexed because of the 'noise' they will bring with them. For others, it is impossible to find an appropriately located 'table of contents' page to use as the 'start page'. Other 'problems of targeting' have also been identified.
Catalog links (URLs) go out of date as pages are moved on remote sites, or the sites cease to exist. If the URL is also the 'target' for the Web spider to start its work, the spider will be unable to do so next time it returns to the site to update its download of pages. To assist the indexers to maintain the catalog we have a program called Comet which checks all URLs in the catalog every day and sends a report to the indexers indicating which links appear to be broken.
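The 'equal to or below the start page' rule can be expressed as a simple test on URLs. The sketch below is an assumption about how such a test might look, not the actual logic of the Gromit spider; the function name and example URLs are invented, and paths containing '..' are not normalised.

```python
from urllib.parse import urlsplit
import posixpath

def in_target_scope(start_url, candidate_url):
    """True if candidate is on the start page's host, at or below its directory."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    if (start.scheme, start.netloc.lower()) != (cand.scheme, cand.netloc.lower()):
        return False
    # Directory holding the start page, e.g. "/laws/index.html" -> "/laws/".
    if start.path.endswith("/"):
        base_dir = start.path
    else:
        base_dir = posixpath.dirname(start.path).rstrip("/") + "/"
    return (cand.path or "/").startswith(base_dir)

start = "http://x.example/laws/index.html"
print(in_target_scope(start, "http://x.example/laws/acts/a1.html"))  # below the start page
print(in_target_scope(start, "http://x.example/cases/c1.html"))      # elsewhere on the site
```

The sketch also shows why the choice of start page matters: a start page at the server root would put the whole site in scope, while a start page placed too deep would miss sibling directories.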
Another maintenance problem is that the web spider is sometimes sent to a URL which does exist, but for some reason cannot download the intended pages from that starting point. Conversely, sometimes an editor does not notice that by starting the web spider from a particular page it will download far more pages than are intended. To assist in addressing these problems, the program that controls the sending of Web spiders ('Wallace') sends a report to the indexers if the Web spider downloads only a couple of pages from a URL, or downloads more pages than a pre-set limit (currently 1,000).
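The two report conditions can be sketched as a single check. The upper limit of 1,000 pages comes from the text; the lower threshold of three pages is an assumed reading of 'only a couple of pages', and the function name and message wording are invented.

```python
def crawl_warning(url, pages_downloaded, min_pages=3, max_pages=1000):
    """Return a warning for the indexers, or None if the crawl looks normal.

    min_pages is an assumed value for 'only a couple of pages';
    max_pages reflects the pre-set limit of 1,000 mentioned in the text.
    """
    if pages_downloaded < min_pages:
        return (f"{url}: only {pages_downloaded} page(s) downloaded; "
                "check the start page")
    if pages_downloaded > max_pages:
        return (f"{url}: {pages_downloaded} pages exceeds the limit of "
                f"{max_pages}; the target may be too broad")
    return None

print(crawl_warning("http://x.example/laws/", 2))
print(crawl_warning("http://x.example/", 5400))
```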
The search facilities in World Law/DIAL search over two types of content: (i) the contents of the catalog; and (ii) the full text of all the remote pages indexed by the targeted Web spider.
There is now one interface to both the catalog and the search engine, with a search form located at the top of each catalog page (illustrated below). The very unusual feature of the search facilities in World Law / DIAL is that the scope of the search is (by default) limited by the location in the catalog from which the user carries out the search. In other words, if the search is from the 'World >> Legislation' page, it will search over all legislation from any country (but only legislation); if it is from the 'World >> Countries >> Germany' page it will search only over sites related to German law; and if it is from the 'World >> Countries >> Germany >> Legislation' page it will search only German legislation.
Search results are displayed with catalog pages ('categories') listed first, and then with the full texts of remote documents ('documents') listed. Both lists are sorted into likely order of relevance to the search query (relevance ranking).
About 10 GB of text from targeted sites has been indexed to date, providing searches over about 1M pages of legal information. This is a search space about 50% larger than AustLII's Australian databases (approximately 6 GB). To date, the countries from which the largest components come are the .us domain of the USA (over 2 GB - mainly State legislation), non-AustLII Australian sites (over 1.5 GB), the .edu and .org domains and Canada (each about 1 GB), followed by another 3 GB or so drawn from 57 other countries. The emphasis is on legislation, law reform reports and law journals to date (because of Project DIAL), but major components of case law, law school sites and the like are being added.
The scope of a search is limited by where you search from (the context). As in the example below, if a user is at the 'World >> Law Reform' page, the default search scope will be 'Law Reform'. A search from this point will search over all Web sites listed on the Law Reform page, and over those listed on any pages which are sub-categories of, or cross-references from, the Law Reform page.
To put it very simply, the way in which the search restriction mechanism works is that the search is first done over all pages retrieved by the Web spider, but before the results are displayed to the user they are 'filtered' by the URLs listed on the page indexed and its related pages, so that only relevant pages are displayed.
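This 'search everything, then filter' mechanism can be sketched as below. It is a simplified illustration under stated assumptions, not the SINO implementation: the catalog is represented as an invented dictionary, each page's 'related' list stands for its sub-categories and cross-references, and a result is treated as in scope when its URL begins with a URL listed in the catalog.

```python
def scope_urls(catalog, page):
    """Collect the URLs listed on a catalog page and on its related pages."""
    seen, urls, stack = set(), [], [page]
    while stack:
        name = stack.pop()
        if name in seen:          # guard against circular cross-references
            continue
        seen.add(name)
        entry = catalog[name]
        urls.extend(entry.get("sites", []))
        stack.extend(entry.get("related", []))
    return urls

def filter_by_scope(results, scope_prefixes):
    """Keep, in their original order, the results whose URL is in scope."""
    prefixes = tuple(scope_prefixes)
    return [r for r in results if r["url"].startswith(prefixes)]

# Invented catalog pages and search results.
catalog = {
    "Law Reform": {"sites": ["http://lrc.example/reports/"],
                   "related": ["Law Reform >> Canada"]},
    "Law Reform >> Canada": {"sites": ["http://ca.example/lawreform/"],
                             "related": ["Law Reform"]},
}
results = [
    {"url": "http://lrc.example/reports/r1.html"},
    {"url": "http://other.example/x.html"},
    {"url": "http://ca.example/lawreform/p2.html"},
]
in_scope = filter_by_scope(results, scope_urls(catalog, "Law Reform"))
```

Filtering after the search keeps a single index for all of World Law, at the cost of extra work per query, which is consistent with the observation that restricted searches are currently slower than 'All World Law' searches.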
To broaden or narrow your search scope, you go to a more appropriate page in the catalog. Context determines search scope.
To search over everything available, the user could go back to the 'World' page (by clicking on 'World') and search from there. Alternatively, no matter where you are in the catalog, you can search everything ('World') by selecting 'All World Law' (from the 'in' option) instead of the default option limiting the search scope, as shown above. At present, the 'All World Law' option is by far the fastest way to search. But the more restricted search scopes may give greater precision. Test for what works best until the restricted scope searches become faster.
World Law uses AustLII's SINO search engine, so all of the search facilities available for searching over AustLII's Australian databases can be used, but this is modified by new interface options. The following options are available:
The default option is 'any of these words'. If you want to do any other type of search, you must change this default.
Where it has been possible to send the World Law / DIAL Web spider to a site, the icon appears next to the listing in the catalog. If you click on the words 'search site' or the icon, then you are taken to a 'Search Site' page which automatically limits the scope of the search to the one site selected. This function allows World Law Search to be used to search specific sites which have no search engine of their own, or have a search engine which does not have the same features as the SINO search engine used for DIAL Search. It also overcomes the need for users to learn to use the features of multiple search engines.
In the following, two of the sites may be searched, but the other two may not:
When a site is selected, a 'Search selected site' page is presented and searches from that page are limited to the site selected:
Results are displayed as shown below. Items are ranked in order of likely relevance. Catalog categories are listed first ('World Law Categories'), with the search over the text of the catalog pages, not just the title of the page.
The following example is the results of a search for the phrase 'financial intelligence' over the whole of World Law.
The percentages given against each item found are based on the most relevant item being given a score of 100% (provided it contains all search terms - otherwise less), and all subsequent items being given a percentage ranking proportional to that, according to their likely relevance.
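The scaling described here can be sketched as follows. The raw scores are invented, and one detail is an assumption: the text says only that a top item lacking some search terms scores 'less' than 100%, so the sketch arbitrarily scales its ceiling by the fraction of search terms it contains. The actual SINO calculation is not described in this paper.

```python
def relevance_percentages(hits, query_terms):
    """Convert raw scores to percentages relative to the best-scoring hit.

    hits: list of (title, raw_score, matched_terms) tuples, in any order.
    Assumption: if the best hit lacks some query terms, its percentage is
    reduced in proportion to the fraction of terms it matches.
    """
    terms = set(query_terms)
    if not hits or not terms:
        return []
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    _, top_score, top_matched = ranked[0]
    if top_score <= 0:
        return [(title, 0) for title, _, _ in ranked]
    ceiling = 100.0 * len(set(top_matched) & terms) / len(terms)
    return [(title, round(ceiling * score / top_score))
            for title, score, _ in ranked]

terms = ["financial", "intelligence"]
hits = [
    ("A", 2.0, {"financial", "intelligence"}),
    ("B", 4.0, {"financial", "intelligence"}),
    ("C", 1.0, {"financial"}),
]
print(relevance_percentages(hits, terms))
```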
The search form containing the executed search is displayed at the top of each search results page, so any search can be modified easily in light of the results obtained.
The present method of displaying results relies upon the remote site attaching informative titles to its HTML pages, as it is these titles that are displayed. While most sites do achieve this to a reasonable degree (as can be seen from the above example), some sites fail to provide any title at all, so that only the URL of the site can be displayed by default. This is not informative enough. A number of alternatives are under consideration. One is to display the first 50 words or so of the document, similarly to what is done in Alta Vista. A second is to include the name of the site from which each document comes (as described in the catalog), or perhaps the name of the catalog page, after the display of the title. This would at least make it easier to recognise which countries and systems particular items are from. These choices have significant processing overheads, and we are concerned not to unduly slow the delivery of search results.
One of our main tactics in creating a sustainable World Law catalog is the use of 'stored search links' (or 'stored searches') in the catalog.
For example, on the Intellectual Property Subject Index page (below) there are various stored searches for different aspects of intellectual property. The hypertext links that appear on the page are each a search of World Law that a legal indexer has created in order to find a general set of documents which relate to the topics of the searches. For example, the link entitled 'World Law search: trade marks and related laws' is in fact a stored Boolean search for 'trade mark or trademark or unfair competition or passing off'. These examples are relatively simple searches, but searches of any complexity can be stored.
Intellectual Property Subject Index page
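A stored search of the kind just described is, in effect, an ordinary hypertext link whose target carries a prepared Boolean query. The sketch below shows one way such a link might be generated. The base address `http://example.org/search` and the parameter name `query` are hypothetical placeholders, not the actual World Law search address or its parameters.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical search address, used only for illustration.
SEARCH_BASE = "http://example.org/search"


def build_query(terms):
    """Join alternative words or phrases into one Boolean 'or' search."""
    return " or ".join(terms)


def stored_search_link(label, terms, base=SEARCH_BASE):
    """Return an HTML link that runs the stored search when followed."""
    # 'query' is an assumed parameter name.
    href = base + "?" + urlencode({"query": build_query(terms)})
    return '<a href="{}">{}</a>'.format(escape(href), escape(label))


link = stored_search_link(
    "World Law search: trade marks and related laws",
    ["trade mark", "trademark", "unfair competition", "passing off"],
)
print(link)
```

Because the link stores the query rather than a list of results, following it always runs the search against whatever has been indexed at that time.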
The significance of these 'stored searches' of DIAL Search in the DIAL Index is twofold:
One particularly effective use of stored searches in World Law Search is to enable users to find starting points for research concerning the laws of a named country. In the example below, the stored search is for 'fiji* or fidji* or fiyi* or fidschi* or figi*', so as to find entries in most common European languages. The titles of the first few results show the effectiveness of the multi-lingual search. The total of 1366 items shows how much is available even for a small country like Fiji.
Because the relevance ranking tends to favour short documents and documents that use a search term in a title, many of the internet law indexes that have a separate page for that country will appear near the top of the list, so the user can then quickly review existing internet law indexes for that country. Here, the first 15 items found are a mixture of the 'Fiji' pages in other internet law indexes (JurWeb, ICL and ILRG), documents about human rights compliance, and Asian Development Bank law and development project notices.
The Intellectual Property stored searches are an example of how the current 'World >> Subject Index' pages will be developed so that each subject category has a basic set of stored searches that will keep that part of the subject index reasonably current. Resources will be available for more intensive intellectual indexing of some subject index pages, but this will not always be possible, so the use of stored searches will allow a moderate-cost 'across the board' extension of the subject index.
In the longer term, work will be done on the use of legal thesauri to create large-scale sets of stored searches and to distribute them through the catalog. Good thesauri are difficult to find on the Web.
The resources available on the Web for legal research are biased in favour of English, but there is a very large quantity of non-English language materials if only they can be located. Apart from its inherent value, the availability on the Web of the laws of a person's own country is more likely than most other factors to encourage that person to undertake legal research using other countries' laws on the Web.
The key to the development of a genuinely world-wide free access catalog and search facility for law is the formation of 'indexing partnerships' with legal institutions with expertise in other key languages for materials on the Web (Chinese, French, Spanish, German, Russian). Development of the technical capacity to handle different character sets is also required.
As the range of non-English materials searchable in World Law increases, it is likely to become valuable to be able to limit searches to materials in a particular language. This will probably be implemented by the indexer indicating the language of non-English materials at the time of adding them to World Law Search, with an option for users to exclude or include materials in particular languages.
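The proposed language restriction has not yet been implemented, so the following is only a sketch of how it might work. The record layout, the `lang` field and the function name are hypothetical. It assumes, following the description above, that the indexer marks only non-English materials, so that an item with no language mark is treated as English.

```python
def filter_by_language(items, include=None, exclude=None):
    """Keep only items whose language passes the user's options.

    items:   list of dicts; a 'lang' key is present only where the
             indexer has marked the item as non-English.
    include: set of language codes to keep (None keeps all languages).
    exclude: set of language codes to drop (None drops nothing).
    """
    kept = []
    for item in items:
        # Assumption: unmarked items are in English.
        lang = item.get("lang", "en")
        if include is not None and lang not in include:
            continue
        if exclude is not None and lang in exclude:
            continue
        kept.append(item)
    return kept


items = [
    {"title": "Fiji Constitution"},
    {"title": "Code civil", "lang": "fr"},
    {"title": "Grundgesetz", "lang": "de"},
]
print([i["title"] for i in filter_by_language(items, include={"en", "fr"})])
print([i["title"] for i in filter_by_language(items, exclude={"fr"})])
```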
All pages in World Law have a [Translate] button that takes the user to Alta Vista's automated translation service, which is provided by Systran translation software. The button ensures that the Systran page already contains the correct URL for the DIAL page that the user was just viewing (in the example below, the World page). The user then only has to select the language into which the DIAL page is to be translated and press the 'Translate' button, and is then returned to the DIAL page translated into the language of choice.
The resulting translation seems adequate to convey the meaning of most of the items on the page.
The World page and search options translated automatically to French by Systran
The Alta Vista/Systran translation facility is at present limited to translations from English to French, Spanish, Portuguese, Italian, or German, and vice-versa. This translation facility is also only a prototype, and sometimes has inadequate processing power to translate very long pages. It is also not recommended for translating documents with complex grammar, or where accuracy is vital (such as legislation). However, for pages such as menus, or lists of search results, it is usually extremely helpful.
The translation facility has uses beyond translations of catalog pages:
The combined result of these translation features for a world index is revolutionary: instead of being an 'English only' facility, it is now effectively available in six of the most pervasive European languages.
Where embedded searches are included on the catalog pages, they are, wherever possible, being constructed as multi-lingual searches. For example, on the Mozambique page, the embedded search for 'Search World Law: Mozambique' is in fact a search for 'mozambi* or mocambiq* or mosambi*'. Similar multi-lingual searches by country names have been placed as stored searches on all country pages (more than 200) in World Law.
At present the languages used cover most common European languages, with translations constructed using the Eurodicautom service.
Translations of country names in a number of Asian languages will be added next to the embedded searches. As these searches are permanent, broad searches of all the documents indexed from time to time, an investment of time in the construction of such searches is probably more effective than equivalent time spent indexing sites in detail.
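The multi-lingual country name searches described above can be sketched as follows. This is an illustration only, not the SINO search engine: it assumes that '*' means 'any ending', and it assumes that accented letters are folded to their unaccented forms before matching, which the search engine used for World Law may or may not do. The function names are hypothetical.

```python
import re
import unicodedata


def fold(text):
    """Lower-case the text and strip accents (an assumed normalisation)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).lower()


def build_query(stems):
    """Turn word stems into a truncated 'or' search."""
    return " or ".join(stem + "*" for stem in stems)


def matches(query, text):
    """True if any word in the text begins with any stem in the query."""
    prefixes = [term.rstrip("*") for term in query.split(" or ")]
    words = re.findall(r"\w+", fold(text))
    return any(word.startswith(p) for word in words for p in prefixes)


query = build_query(["mozambi", "mocambiq", "mosambi"])
print(query)
for title in ["Laws of Mozambique", "República de Moçambique",
              "Gesetze von Mosambik", "Laws of Fiji"]:
    print(title, matches(query, title))
```

A handful of stems is enough to cover the spellings of a country's name in several languages, which is why one stored search per country page can be built once and left in place.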
Internet 'portal sites' have been described as designed to perform two functions (Hinton 1999, Chapter 3): (i) providing users with the range of tools they need to find the content they want; and (ii) obtaining large audiences so as to generate revenue, typically through advertising (and the user surveillance it usually involves). World Law aims to be a portal for legal information, but it is a perverse one, because it is based around the idea of free access to Web resources and does not depend on advertising or individual subscriptions to sustain its development and maintenance.
If World Law succeeds it will be because the technical infrastructure that we are creating, and the collaborative environment within which it is used, is capable of attracting the interest and cooperation of others around the world with a commitment to the provision of free access to legal information via the Internet, and an expertise in some part of the Internet's rapidly expanding legal content.
One aim of World Law is to build an audience for legal research resources on the Internet which goes beyond the normal users in law firms and law schools of the developed world, and to provide a facility which is also valued and used in the developing countries of the world as an affordable and genuinely international resource for legal development.
First DIAL training session for Mongolian teachers from the Legal Retraining Centre, Ulaan Baatar, held at the University of Melbourne on 9 July 1999.
The Asian Development Bank's funding of the training component of Project DIAL is a significant experiment in achieving this broader goal. DIAL involves in-country training of government lawyers (and others as resources permit) in seven Developing Member Countries (DMCs) of the Bank: Vietnam, Philippines, China, India, Pakistan, Indonesia and Mongolia. The DIAL training team will be drawn from eight countries: local trainers in each of these DMCs, a Regional Training Coordinator from the Philippines, and the DIAL lead consultants at AustLII. 'Train the trainer' courses are planned to begin in Mongolia, the Philippines and Vietnam during 1999, and in the other countries in 2000. In addition there will be on-line training materials that anyone can access, and a DIAL-user email list where anyone trained in DIAL use will be able to receive online support from the DIAL training team in any aspect of Internet legal research.
Greenleaf G (1998) Developing the Internet for Asian Law - Project DIAL (A feasibility study and prototype), Asian Development Bank, January 1998, 156 pp - at < http://www2.austlii.edu.au/~graham/DIAL_Report/ >.
Greenleaf G, King G, Mowbray A, Austin D and Matthews J (1997) 'Future-proofing a global internet index by a targeted Web spider and embedded searches' Australian Society of Indexers Annual Conference 'The Futureproof Indexer' 27-28 September 1997, Katoomba, Australia - at < http://www2.austlii.edu.au/~graham/Futureproof/indexers.html>.
Greenleaf G, Mowbray A and van Dijk P (1995) 'Representing and using legal knowledge in integrated decision support systems - DataLex WorkStations' Artificial Intelligence and Law, Kluwer, Vol 3, Nos 1-2, 1995, 97-124.
1. For more details see part ' 2.1. The potential 'world law library' on the Internet' - in Greenleaf 1998.
2. See < http://beta.austlii.edu.au/links/World/Countries/Mongolia/Legislation/index.html> for examples.
3. 'Universal Resource Locator' or internet address of a Web page.
4. For the detailed arguments see ' 2.2. Finding law on the net - Why is it so difficult?' - in Greenleaf 1998.
6. See < http://beta.austlii.edu.au/links/World/Other_Indexes_and_Search_Engines/> for many examples.
7. This part draws on part 2.4. 'Automated indexes and Internet-wide search engines' - at < http://www2.austlii.edu.au/~graham/DIAL_Report/Report-2.4.html> - in Greenleaf 1998.
9. In 1996 it was claimed, and not denied by Alta Vista, that Alta Vista only indexed about 10% of the pages of moderately large Web sites (600 to 6,000 pages in the example cited). Alta Vista now claims to index sites without any limit on pages. The claim by John Pike, webmaster of the Federation of American Scientists, and the reply by Alta Vista are available at < http://www5.zdnet.com/anchordesk/talkback/talkback_11638.html> and are discussed in 'The Alta Vista Size Controversy' on Search Engine Watch.
13. For example, on Alta Vista, a search for Vietnamese legal materials requires a search which is limited to materials which are located on a server in Vietnam (the 'domain:vn' delimiter) or contain 'Vietnam or Viet Nam' - and this is still somewhat hit or miss.
17. Geoffrey King (to 1998), Daniel Austin, Philip Chung and Andrew Mowbray have all worked on the software and systems side of this project.
18. Developed by Daniel Austin. Geoffrey King developed a previous version.
20. Developed by Daniel Austin.
21. Developed by Andrew Mowbray.
22. Rachel Romana of CD Asia and her colleagues.