shaney crawford DOT com

コンテンツ流通基盤技術論
Shaney Crawford
2003.05.19

Open Directory Project and Digital Libraries

オープン・ディレクトリー・プロジェクト と ディジタル図書館

Definition and Background

  • What is the Open Directory Project (ODP)?
  • How did it come into being?
  • How has it changed over the years?

定義と背景

  • ODPってなに?
  • 背景は?
  • 始まったころからどういうふうに変わった?

In the late 1990s, the internet grew so large that searching it became a daunting task. Two approaches dominated web search: robots (automated crawlers) and directories. Yahoo, then a young upstart internet company, created one of the first directories to receive wide use. This satisfied searchers for a while, but it soon became apparent that Yahoo's management was paying more attention to advertising revenue than to its web directory. The Yahoo staff could not keep pace with the growth of the internet, and link rot became a problem for the links that were already included.

The ODP was created in June 1998 in response to the shortcomings of Yahoo's directory. Rich Skrenta and Bob Truel came up with the idea of getting volunteers to index the internet. If they just had more manpower, they reasoned, the directory could keep up with the growth and decay cycle of the internet. Furthermore, the results of this human-powered directory would be made available to any person or company free of charge.

The ODP was originally called Gnuhoo, a name that combined the "Gnu" open source concept with Yahoo. "Gnu" aficionados eventually balked at the name because the directory's results were available free of charge but the actual source code of the directory-creation system was not. In addition, when Netscape bought the directory in 1998, the name was changed to "Open Directory Project" to avoid any legal implications of using a name so obviously similar to "Yahoo".

How to...

  • Choose a Category
  • Become an Editor
  • Learn the ontology

どうやって...

  • カテゴリーを選ぶ
  • エディターになる
  • スキーマを覚える

It is very easy to join the ODP team. First you have to choose a category to edit. A suitable category might be one that you feel you have expert knowledge in, one that you think could be more complete, or one that desperately needs an editor. At the bottom of each page of the directory, there is a list of the editors for that page. If there is no editor, you will see "Volunteer to edit this category" or "This category needs an editor".

Clicking on that link will start the application process. During the application process, you are asked to read the ODP guidelines and then fill out a form. The form includes a section in which you must list three sites that could be added to the category you want to edit. You have to list the address, give the site a title, and give a brief description of the site. The important thing here is to have read the guidelines so you don't make mistakes that are clearly noted as wrong in the guidelines.

If you are able to list three sites and describe them according to the guidelines, and you are applying for a category that clearly needs some help, it is likely that you will be accepted as an editor. If you fail to follow the guidelines, or if you apply for a high-level category, it is likely that you will be declined. The categories are organized hierarchically, so it is generally wise to pick a category that appears as the last branch of a tree (and even better if that category doesn't have many entries).
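The advice about picking a last-branch category with few entries can be sketched in a few lines. This is only an illustration: the nested-dictionary layout, the `leaf_categories` helper, and all entry counts are hypothetical, and the ODP itself offers no such function.

```python
def leaf_categories(tree, prefix=""):
    """Yield (path, entry_count) for every category that has no subcategories."""
    for name, node in tree.items():
        path = f"{prefix}: {name}" if prefix else name
        children = node.get("children", {})
        if children:
            # Not a last branch: descend into the subcategories.
            yield from leaf_categories(children, path)
        else:
            yield path, node.get("entries", 0)


# Hypothetical slice of the regional tree; the counts are invented.
tree = {
    "Japan": {"entries": 40, "children": {
        "Prefectures": {"entries": 5, "children": {
            "Ibaraki": {"entries": 12, "children": {
                "Tsuchiura": {"entries": 3},
                "Tsukuba": {"entries": 9},
            }},
        }},
    }},
}

# Smallest leaf categories first: the most promising ones to apply for.
candidates = sorted(leaf_categories(tree), key=lambda item: item[1])
```

Sorting by entry count puts small, neglected leaf categories at the front of the list, which matches the kind of category a first-time applicant is most likely to be accepted for.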

Once you have been accepted as an editor, you will probably have to spend some time learning the ontology (hierarchy and rules) before you start working on the directory. The basic work of a volunteer includes adding and deleting sites, and writing or rewriting titles and descriptions (metadata).

Special Characteristics

  • Preferred Terms
  • Fora
  • "Get Your Cat Checked"
  • Promotion System
  • Tools for Editors

特徴

  • シソーラス
  • フォーラム
  • カト・チェック
  • 進級制度
  • エディター・ツール

There are many resources to help editors make the right decisions. First of all, there is a list of preferred terms to help in writing the descriptions. This helps clarify words that can be used in category names, site titles, and descriptions.

There is also a very large world of fora in which to meet other editors and ask questions. It is generally recommended for new editors to read several threads in the fora before they start to make any major changes. Also, after a new editor has made a few changes, it is a good idea to go to the "Get Your Cat Checked" thread and have a senior editor look at your work.

Senior editors are people who have been editing for a while and who have assumed a greater level of responsibility in the hierarchy. As you can see from the above "Tsuchiura" category, you could progress from editing "Tsuchiura" to "Ibaraki" to "Prefectures" to "Japan", etc. Once you have added a number of sites and tidied up the original descriptions, you can apply to edit a different or higher category. This promotion system works well because it motivates the volunteers to try harder, and also gives a clear indication of the abilities of the senior editors. You can look up editors in the ODP and check how many "edits" they have completed and how many categories they are currently editing. This way, editors can get a good idea of whom to ask when they come upon a difficult choice.

Tools for Editors

Official Tools

  • General Tools (input, searching, output)
  • Bookmarks
  • Watchlist
  • Help Wanted / Greenbusters
  • Create Profile / Find Editor / Editor Feedback

Unofficial Tools

  • ODP Toolbox

エディター・ツール

公式ツール

  • 一般的なツール(入力、検索、出力)
  • ブックマーク
  • ウォッチリスト
  • 求人・グリーンバスター
  • プロフィール作成・エディター検索・エディターにメールを送る

非公式ツール

  • ODP ツールボックス

Once you become an editor, you are given a userid and password which let you into the innards of the ODP. The main editor interface is called the "dashboard". In addition to the fora, there are many tools provided, both official and unofficial, to help with the daunting process of organizing the vast Internet.

First, it's necessary to describe the basic work of being an editor.

After you have been assigned a category, the first thing you must do is go through the category and clean up any mistakes you find. Often there are spelling mistakes or inappropriate words in the descriptions. "Inappropriate words" are words that should not appear in that particular category, such as a place name in the regional section of the ODP. (The regional section is organized by location, so it is redundant to include place names.)
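The place-name rule lends itself to a simple automated check. The following is a minimal sketch, not an ODP tool: the `redundant_place_names` function is hypothetical, and it assumes category paths are written with colons, as in "Top: Regional: Asia: Japan". It also treats every segment after "Regional" as a place name, which would misfire on mixed segments such as "Arts and Entertainment".

```python
import re


def redundant_place_names(category_path, description):
    """Return path segments below "Regional" that also appear in the description."""
    segments = [s.strip() for s in category_path.split(":")]
    if "Regional" not in segments:
        return []  # the rule only applies to the regional section
    places = segments[segments.index("Regional") + 1:]
    found = []
    for place in places:
        # Whole-word, case-insensitive match of the place name.
        pattern = r"\b" + re.escape(place) + r"\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            found.append(place)
    return found


flagged = redundant_place_names(
    "Top: Regional: Asia: Japan: Ibaraki: Tsuchiura",
    "Official site of Tsuchiura city hall.",
)
```

An editor would still make the final call by hand; a check like this only points at descriptions that deserve a second look.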

Once the original sites and descriptions are in good shape, the process of adding sites to the category begins. There are two ways for this to happen. If you are lucky enough to be working in a category that has many submitted sites, you can start by looking in your "unreviewed" list and adding those sites. The descriptions of these submitted sites almost always need to be cleaned up because the people who submit the sites don't always read the guidelines. If there are no unreviewed sites, you have to search the internet for appropriate sites to add. This is generally the case for many regional categories, as most sites are submitted topically rather than regionally (except, perhaps, for real-estate agents).

Once you find a site that would make an appropriate addition to the category, you select the category from your dashboard and enter the site's address into the blank space provided. The ODP servers will automatically search their databases for any mention of that site. If the site has already been placed in another category, you have a choice of leaving it where it is, telling the person who added it to remove it from that category, and/or adding it to your own category. Regional editors will sometimes find that the sites they submit have already been placed in a topical category. It is generally agreed that most sites can be listed once in Regional and once in Topical (leaving out the multilingual "World" section for the moment), so the choice is up to the editor. In the case of Japan-related sites, it is generally a good idea to include them in the regional section even if they already appear in the topical section, as almost all Japan categories need some beefing up.
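The "once in Regional, once in Topical" convention can be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, paths are assumed to be colon-separated, and the "World" section is simply treated as a third branch of its own, since the convention above leaves it out.

```python
def branch_of(category_path):
    """Classify a category path as "Regional", "World", or "Topical"."""
    segments = [s.strip() for s in category_path.split(":")]
    if segments and segments[0] == "Top":
        segments = segments[1:]  # drop the root of the directory
    if segments and segments[0] in ("Regional", "World"):
        return segments[0]
    return "Topical"


def may_add(existing_paths, new_path):
    """Allow a new listing only if the site is not already listed in the same branch."""
    new_branch = branch_of(new_path)
    return all(branch_of(path) != new_branch for path in existing_paths)


# A site already listed topically may still be added regionally.
ok = may_add(["Top: Reference: Libraries: Public"],
             "Top: Regional: Asia: Japan")
```

The real decision remains with the editor; the check only encodes the general agreement described above.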

As you submit a site, you are prompted for the address, the title of the site, a description, and a note to other editors regarding your decisions. Certain notes are automatically generated, such as additions and deletions of sites, so the notes section is generally used for explaining the reasons why a certain action was taken. It often includes information about the reliability of a site or the rationale for placing it in one category over another. These notes are visible to all ODP editors who enter that site into the ODP framework.

Once you have finished entering the information, the site is automatically added to the ODP. This means that the site will show up when people search the ODP directly, but it does not mean that all search engines that use the ODP data have their data updated automatically. The ODP updates immediately, but third-party search engines do not update their ODP-connected databases every day. Some of the larger ones seem to take between a few weeks and a few months to pick up the new data.

Often you will find a site that is inappropriate for your category, but a search of the ODP shows that it is not included anywhere else. This is what the Bookmark system was created for. Bookmarks are given to every editor to do with as he or she pleases. In the regular ODP, there are rules and guidelines that must be followed strictly. However, the rules do not apply to the bookmarks section. For example, you are not allowed to list your own sites in the regular ODP, and you are certainly not allowed to place a "cool" mark beside them. In your own bookmarks, however, you can if you want (although it is unlikely that you would want to). Anything goes in the bookmarks, so this is an area in which editors can experiment. Many editors use it to build up categories before they put them into the official ODP. Or they use it to list the sites they are personally affiliated with, with appropriate notes, so that anytime an editor enters that address into the ODP, he or she can find out which editors are connected to that site. It is also used to collect new sites for a category that you might be thinking of applying for. When you apply for a new category, you have to list three appropriate sites (with proper titles and descriptions) in order to show your understanding of the category guidelines. Many people build a collection of sites in their bookmarks and make a link to the bookmarks on the application form rather than listing the individual sites on the form.

There is a system for counting the number of "adds" you make. Promotions within the system depend heavily on these statistics. For every original site that you add to the ODP, you get credit for an "add". For every site you edit, you get credit for an "edit". "Adds" and "edits" are good, and "unrevieweds" are bad. If you have too many "unrevieweds" in your category, you will not be allowed to apply for a new one. You can look up the statistics for any editor in the ODP through the "Find an Editor" function (discussed below).
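The bookkeeping described here could be modelled roughly as follows. This is a sketch, not the ODP's implementation: the `EditorStats` class is hypothetical, and the `max_unreviewed` threshold of 10 is an invented placeholder, since the actual limit is not stated.

```python
from dataclasses import dataclass


@dataclass
class EditorStats:
    """Per-editor counters, as shown on an editor's profile."""
    adds: int = 0
    edits: int = 0
    unreviewed: int = 0

    def record_add(self):
        self.adds += 1

    def record_edit(self):
        self.edits += 1


def can_apply_for_new_category(stats, max_unreviewed=10):
    """Too many unreviewed sites blocks a new application (threshold is assumed)."""
    return stats.unreviewed <= max_unreviewed


stats = EditorStats(unreviewed=3)
stats.record_add()
stats.record_edit()
stats.record_edit()
eligible = can_apply_for_new_category(stats)
```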

Another important part of being an editor is monitoring your categories for dead links. If you find that a site is regularly down, you can add it to your "watchlist" and set a reminder time on it. This removes the link temporarily from the ODP so that you can monitor its progress. If the problem is only temporary, the site can be returned to its category. If it is a permanent error, the site will have to be deleted.
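The watchlist cycle (remove temporarily, wait for the reminder, then restore or delete) can be sketched as a small state model. Everything here is an assumption made for illustration: the `WatchedCategory` class, the 14-day default reminder, and the `is_alive` callback, which stands in for however the editor checks that the site is back up.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class WatchedCategory:
    listed: set = field(default_factory=set)       # URLs visible in the category
    watchlist: dict = field(default_factory=dict)  # URL -> reminder date

    def watch(self, url, today, days=14):
        """Temporarily pull a flaky site from the category and set a reminder."""
        self.listed.discard(url)
        self.watchlist[url] = today + timedelta(days=days)

    def review(self, url, today, is_alive):
        """When the reminder is due, restore the site if it works, else delete it."""
        due = self.watchlist.get(url)
        if due is None:
            return "not watched"
        if today < due:
            return "waiting"
        del self.watchlist[url]
        if is_alive(url):
            self.listed.add(url)
            return "restored"
        return "deleted"


category = WatchedCategory(listed={"http://site.example/"})
category.watch("http://site.example/", today=date(2003, 5, 1))
early = category.review("http://site.example/", date(2003, 5, 5), lambda url: True)
later = category.review("http://site.example/", date(2003, 5, 15), lambda url: True)
```

Passing the liveness check in as a function keeps the sketch free of network calls; a real tool would replace the lambda with an actual HTTP request.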

As stated above, if you find a site that is inappropriate for your category, but you feel that it should be added to the ODP, you can store the site in your bookmarks until you decide what to do with it. This is a good "career move" because adding the site, even to your own bookmarks, gives you the credit for the original "add". Furthermore, you can give the site a title and a description, write a note about it, and forward it directly to the appropriate category. If you have editing privileges in that category, this action will automatically place the site in the ODP. If you don't have editing privileges in that category, the site will be placed in the "unreviewed" section of that category for the proper editor to deal with.
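The forwarding rule amounts to a two-way branch on the editor's privileges. Below is a minimal sketch with hypothetical names (`forward_site`, the `directory` dictionary). It assumes privileges are checked by exact category match, whereas the real system may also take the hierarchy into account.

```python
def forward_site(site, target_category, editor_categories, directory):
    """Send a site to another category.

    It is listed at once if the editor has privileges there,
    otherwise it joins that category's unreviewed queue.
    """
    queue = "listed" if target_category in editor_categories else "unreviewed"
    slots = directory.setdefault(target_category, {"listed": [], "unreviewed": []})
    slots[queue].append(site)
    return queue


directory = {}
my_categories = {"Top: Regional: Asia: Japan: Ibaraki: Tsuchiura"}
site = {"url": "http://site.example/", "title": "Example", "description": "Sample entry."}

own = forward_site(site, "Top: Regional: Asia: Japan: Ibaraki: Tsuchiura",
                   my_categories, directory)
other = forward_site(site, "Top: Reference: Libraries: Public",
                     my_categories, directory)
```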

There are as many kinds of editors as there are people. Some editors like to stick to their own little corner of the ODP and make a perfect nest. Others like to flit around and make additions to several different, often unrelated, sections. For those editors who feel they need a challenge, there are the "Help Wanted" and "Greenbusters" functions. The "Help Wanted" section lists all categories that are desperately in need of an editor. This is usually because the number of submissions is too great for the current number of editors. This happens often in "Personal Websites", "Music", and "Movies", along with a number of other topics and regions. Often it is difficult to imagine what other areas of the ODP you might like to edit, so this section serves to give direction to editors with a bit of extra time on their hands. Greenbusters is similar, but with a twist. If you apply for a "Help Wanted" category and are accepted, you get full editing privileges. The "Greenbusters" section also lists categories that are in need of attention, but it only allows you to edit the titles and descriptions of sites within a certain category, not to make the final decision on publishing them to the category. The point of this system is to let editors help each other out, to a certain point, but to leave the final decisions up to the main editor. (I have never applied for a Greenbusting category, so I am writing from my understanding of what I have read about this system rather than from experience, and I may have left out some details.)

The final official functions are the ability to create a profile for yourself and to find and contact other editors. These functions are not exactly related to the process of finding and fixing websites on the ODP, but rather provide a kind of social setting for the editors. Some editors add photos and biographical data to their profiles, while others just leave the default data (categories, number of adds, edits, and unrevieweds). By using the "Find Editor" function, you can locate editors who are higher up on the hierarchy and contact them through the "Editor Feedback" system, which is an email system that doesn't let the sender see the address of the editor he or she is contacting. These systems are all designed to protect the privacy of the editors to the extent that they desire. Information that editors wish to publish can be placed in their profile, and communication can take place with no personal information being revealed.

This ends the description of the official tools of the ODP. The average editor is satisfied with these general tools. However, advanced editors may take advantage of some "unofficial tools" that are provided on an "as-is" basis. These tools have been created by editors who felt the need for advanced tools. I have never used these tools myself, but a quick look at them shows the kinds of advanced options that are available. (See image below.)

Statistics

  • Sites
  • Categories
  • Editors
  • Who uses it?

統計

  • サイト
  • カテゴリー
  • エディター
  • 利用

The ODP data is used by individuals (via www.dmoz.org) and by search engines who download the raw data. Some search engines that use the raw data include Google, AOL, Netscape, Lycos, AlltheWeb, and AskJeeves. Around 500 other organizations also reportedly use the data.

Strong Points

  • "Humans Do It Better"
  • Sheer volume
  • Co-operation
  • Objectivity (?)
  • Self Monitoring

長所

  • 「人の手がいい」
  • 協同作業
  • 客観性 (?)
  • 自己検査

The success of the ODP is the success of its volunteers. There are many reasons why humans really can do it better. Humans can monitor the quality of sites by checking for relevance, appropriateness, and location of services. They can also reduce the number of gratuitous mirror sites, squatting pages, and obscene images that searchers are exposed to.

The huge number of volunteers also puts the ODP miles ahead of other directory-style search engines. It is unlikely that any commercial venture will be able to top the current volunteer staff of over 50,000 people. The level of co-operation that can be witnessed in this project is truly impressive. It may very well be humanity's largest co-operative effort to date.

The fact that the staff is made up of volunteers helps to ensure the objectivity of the directory. No one gets any material benefit from listing or not listing a site, so it can generally be considered objective. However, with 50,000 people to keep track of, it is impossible to know if everyone is actually being 100 percent objective. There are certain self-monitoring practices in place to keep people honest, but it is possible that a few editors decided to join the project to further their business interests, or quash the competition. It seems reasonable to suggest, however, that the majority of the volunteers have no ulterior motives other than to meet new people, participate in a huge co-operative effort, and get the satisfaction of a job well-done.

Criticisms

  • Infighting
  • Link rot
  • Not enough people (?!)
  • Not a strict ontology
  • Topical vs. Regional
  • Volunteer system
  • Methods for site submission
  • Now owned by a private company
  • Can the ODP keep up with the growth of the web?
  • English-centric

短所

  • 内輪もめ
  • リンク切れ
  • 人の手が足りない?!
  • スキーマが厳密でない
  • 話題(件名) 対 地域
  • ボランティア制度
  • サイトの提出は自動的でない
  • 民間会社のものになった
  • ウェブの成長に対応できる?
  • 英語(アメリカ)中心

While the ODP is certainly a worthy and worthwhile project, there are a certain number of criticisms that must also be considered.

As with any group of people, the ODP team has its share of infighting. People get upset because they are not chosen to be editors, or their request for a new category is denied, or their actions are called into question in a forum thread. There is a certain amount of animosity towards the ODP due to a number of vocal former editors.

While the ODP was created in order to stave off link rot, this pervasive phenomenon has not been completely eradicated. Even with such a huge number of editors, it is not possible to keep every single category completely clean of dead links. Perhaps it is fair to say that the ODP does as good a job as can be expected, when the overall fluidity of the internet is considered.

Believe it or not, 50,000 editors is not enough. A brief wade through the regional pages in particular will show just how many categories are begging for editors. If 50,000 isn't enough, how many is? (There is also reason to doubt that all 50,000 are actively editing.)

From the perspective of a librarian, the structure of the hierarchy is a bit of a nightmare. Editors try to keep the structure as simple and intuitive as possible, but it is just not possible to prevent problems like this from popping up here and there:

This example demonstrates a serious flaw in the design of the hierarchy. Rather than using either a topical or regional "main entry" system or finding a more elegant solution, the topical and regional structures exist in parallel. This can pose serious problems, as a single website might fit into at least 4 different locations:

Topical
Regional

World:Topical
World:Regional

For example, the website for the Tsukuba Public Library could be listed under:

Top: Reference: Libraries: Public
Top: Regional: Asia: Arts and Entertainment: Libraries

and

Top: World: Japanese: 各種資料: 文化施設: 図書館
Top: World: Japanese: 地域: アジア: 日本: 茨城: 市町村: つくば市: 地域社会・文化

(Notice the different categorization under topical and regional systems.)

There is a system of cross-referencing (called "@links") to prevent this kind of thing from happening, but it is not always successful. There should be a more elegant solution to this problem. One that comes to mind would involve attaching metadata to the sites which allow for both topical and regional entries.
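The metadata idea could look something like the sketch below: each site is stored once, with separate topic and region fields, and either view of the directory is generated from the same record. The field names, the slash-separated paths, the example URL, and the `browse` helper are all hypothetical; this is not how the ODP stores its data.

```python
# One record per site; both facets live on the same record.
SITES = [
    {
        "url": "http://library.example/",  # placeholder address
        "title": "Tsukuba Public Library",
        "topic": "Reference/Libraries/Public",
        "region": "Asia/Japan/Ibaraki/Tsukuba",
    },
]


def browse(sites, facet, prefix):
    """List titles whose topic or region falls at or under the given path."""
    return [
        site["title"]
        for site in sites
        if site[facet] == prefix or site[facet].startswith(prefix + "/")
    ]


by_topic = browse(SITES, "topic", "Reference/Libraries")
by_region = browse(SITES, "region", "Asia/Japan/Ibaraki")
```

Because the site exists only once, a corrected title or a dead link has to be fixed in one place, and the topical and regional views can no longer drift apart.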

The fact that the system is run by volunteers means that no demands can be placed on the staff. There is no way to speed up production or ensure quality control standards. The ODP has to work on a "take what we get" principle.

This "take what we get" concept also relates to the submission of sites. There are some categories that receive a huge influx of sites, so much so that editors can't keep up with the rate of submissions. But the average editor has a much more difficult problem -- that of coming up with sites to enter. Editors have to be skilled in data mining techniques in order to bring certain categories up to a respectable level. That means that sites that are not either submitted by the creator or "happened upon" by the ODP editor remain a part of the "invisible web". There must be a way to automate at least a part of this process, so editors can spend their time editing rather than mining.

One of the most serious criticisms of the ODP is that it is not entirely volunteer-based. It is owned by one of the largest media companies in the world: AOL (Time Warner). The ODP seems to have been able to maintain its autonomy since its purchase in 1998, but can that continue indefinitely? Furthermore, if the volunteers realized that the founders (Skrenta and Truel) sold the ODP to Netscape for a hefty profit, would they be as willing to donate their time?

While the above chart seems to show that the ODP is keeping pace with the growth of the web, this is probably not the case. In particular, the number of sites in languages other than English is increasing rapidly. The ODP is an American project owned by an American company with a very America-centric (i.e. English-centric) view of the world. That can be seen clearly in the structure of the ODP, which relegates non-English languages to a level in the hierarchy below English. In order to truly represent and encompass the internet's size and philosophy, this English-dominant thinking must be reconsidered.

Is there a future for the ODP?

ODPには未来があるのか?

The ODP used to be considered the "next big thing". It was touted as the answer to Yahoo's sloppiness and the seeming randomness of spidered results. Crawlers, and in particular the Google algorithm, have vastly improved the keyword search results of non-directory methods. Directories are not used as much as they used to be, and they are often considered only as a last resort, as a backup plan after a keyword search has failed. Google often gives valid answers within its first page of results, and it is very rare to come upon obscene content or blatantly incorrect hits within that first page.

If I were to predict the future of the ODP, I would say that on its present course, it is doomed to fall by the wayside. If, however, it improves the description system by following a fill-in-the-blank metadata option (region:____, topic:____) AND does something to enhance its multilingual function, I would say that it still has a fighting chance of being a useful tool in the second 5 years of its existence.

Connection between ODP and Digital Libraries

Digital libraries are concerned with the...

  • Selection
  • Acquisition
  • Organization
  • Access
  • Preservation

...of contents. Use the ODP concept to improve digital libraries

ODPとディジタル図書館

ディジタル図書館はコンテンツの...

  • 選択
  • 獲得
  • 組織化
  • アクセス
  • 保存

を気にするので、ODPの概念を利用して、ディジタル図書館を改良する

The ODP concept can be used to help with selection, acquisition, and organization of digital materials. While improvements are necessary, the submission/mining system of selection seems to have worked well in getting over 3 million sites into the directory. As these sites are only being linked, the actual acquisition of the materials is not necessary. And, with improvements to the hierarchy, organizing the vast public space that is the internet might just be possible with human hands.

References

参考文献・サイト

1. Belle, Jeff
ODP Shakes Up Search Directory Content Revenues
EContent February 2001
web.archive.org/web/20010310080728/http://www.ecmag.net/EC2001/belle2_01.html
Accessed through Wayback Machine April 29, 2003

2. DMOZ Monthly
Christmas 2002, Issue 12
(I'm not sure if this is in the public domain.)

3. Open Directory Project
dmoz.org

4. O'Neill, Edward T., et al.
Trends in the Evolution of the Public Web 1998-2002
D-Lib Magazine April 2003
dlib.org/dlib/april03/lavoie/04lavoie.html
Accessed May 14, 2003

5. Prenatt, David F.
Life After the Open Directory Project
Traffick June 1, 2000
www.traffick.com/story/06-2000-xodp.asp
Accessed May 14, 2003

Mr. Prenatt's response to my article: groups.yahoo.com/group/xodp/message/2092

6. Rumsey, Eric
Beyond ODP: The Hardin Meta Directory
Traffick July 5, 2000
www.traffick.com/story/07-2000-hardin.asp
Accessed May 14, 2003

7. Search Engine Watch
www.searchenginewatch.com

8. Sherman, Chris
Humans Do It Better: Inside the Open Directory Project
Online 2000
www.infotoday.com/online/OL2000/sherman7.html
Accessed April 29, 2003

9. Tennant, Roy
Digital Libraries in a Nutshell: The California Digital Library
http://escholarship.cdlib.org/rtennant/presentations/fvrl2001/digital_files/digital.ppt
Accessed April 29, 2003

Notes

メモ

I am an editor for

See also: