bigQuestions.com

The Basics
bigQuestions.com searches web sites and library catalogues for words that signify the "big" questions of philosophy, religion, and science. It stores these words and their contexts in a searchable database, which at present holds approximately 65,000 library entries and 45,000 web entries. Periodically a new word is added to the bigQuestions list, and from then on the software searches for the new word along with the old ones. The current list is as follows:

bigQuestions.com has two main windows, Real time and Search, which you reach by clicking the appropriate button at the bottom of this page.
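The basic operation described above, finding a big-question word in a page and keeping its surrounding context, can be sketched in Perl as a simple keyword-in-context extractor. This is a minimal illustration under stated assumptions, not the project's own code: the word list is a hypothetical subset, and the default 40-character context width and the crude plural handling are assumptions.

```perl
use strict;
use warnings;

# Hypothetical subset of the bigQuestions word list, for illustration only.
my @BIG_WORDS = qw(misfortune fortune fate hell heaven destiny wonder);

# One case-insensitive pattern. \b keeps "hell" from matching inside "shell";
# the optional trailing "s" is a simplifying assumption for plurals.
my $alt    = join '|', map { quotemeta } sort { length($b) <=> length($a) } @BIG_WORDS;
my $BIG_RE = qr/\b($alt)s?\b/i;

# Return a list of [word, context] pairs, where the context is the match plus
# up to $width characters on either side of it.
sub find_contexts {
    my ($text, $width) = @_;
    $width = 40 unless defined $width;
    my @hits;
    while ($text =~ /$BIG_RE/g) {
        # @- and @+ hold the start and end offsets of the current match.
        my $start = $-[0] > $width ? $-[0] - $width : 0;
        my $end   = $+[0] + $width;
        $end = length $text if $end > length $text;
        push @hits, [ lc $1, substr($text, $start, $end - $start) ];
    }
    return @hits;
}
```

Run on the Shestov sentence quoted below, this returns one pair for "fate" and one for "misfortune" (the latter matched through "misfortunes"), each with its stretch of surrounding text, which is the kind of word-plus-context record the database stores.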

Real time
bigQuestions is first of all a data mining project. In the real time window you can watch the activity of the software which searches--mines--the web for the "big" questions. It displays both the raw results of the current search and the progress of the code which is doing the search. In the code display, you can "step" through the lines of the software code and see the actual workings of the search. The data mining software is not always active, but when you click on the "Update" button in the real time window, you will get an actual run-through of the next iteration of the search. (For more information, see help in the real time window.)

Search
One of the basic interests of bigQuestions.com is how culture and language contextualize the big questions in ways and usages

Typically, you don't search for the big questions themselves but for something else that interests you. A search for "automobile" would find this entry from the National Library of Scotland, where it is associated with "hell":
National Library of Scotland;z3950.nls.uk:7290/voyager 
     020 ISBN: 0720704634 
     050 LC call number: GV1029.8 
     100 author: Carrick, Peter. 
     245 title: All hell and autocross - more hell and rallycross Peter Carrick 
     650 subject: Automobile racing.
If you click on the link to the National Library of Scotland, you will be able to search their catalogue for other related (or unrelated) listings. The numbers to the left of each line (020, 050, etc.) are standardized codes for the fields of the catalogue record.
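Because every line of a record begins with its field code, a listing like the one above is easy to process mechanically. The following Perl sketch groups the displayed lines by tag; it is a hypothetical helper, not part of bigQuestions.com, and it assumes the exact "TAG label: value" layout shown in the example.

```perl
use strict;
use warnings;

# Group displayed record lines ("TAG label: value") by three-digit tag.
# Repeatable tags (several 650 subjects, say) collect into an array ref.
# Lines without a leading tag, such as the library/host line, are skipped.
sub parse_display_record {
    my (@lines) = @_;
    my %fields;
    for my $line (@lines) {
        next unless $line =~ /^\s*(\d{3})\s+([^:]+):\s*(.*?)\s*$/;
        push @{ $fields{$1} }, { label => $2, value => $3 };
    }
    return \%fields;
}
```

With the Carrick record as input, `$rec->{'020'}[0]{value}` would be the ISBN and `$rec->{'650'}` would hold the subject headings.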

A search for "rain" returns this web hit, which makes traditional use both of the metaphor of rain and of the terms "fate" and "misfortune".

http://shestov.by.ru/pc/pc1_4.html
What are the gifts of fate worth if it can presently take them back and make misfortunes rain on our heads?

If you click on the link, the web site turns out to be devoted to Lev Shestov, a Russian existentialist who died in 1938.
(For more information, see help in the Search window.)

The Concept
As I say in The Basics, one of the central interests of bigQuestions.com is how culture and language contextualize the big questions. But the introduction of the 'real time' component speaks to wider concerns.

The Net and Architectural Space
Looked at from the experience of the I at its center, the Internet, like architecture, exists at the intersection of public and private space. The comparison with architecture is particularly apt when architectural space is at its most massive and, like the cultural space of the Internet, cannot be taken in all at once.

There are two aspects to this analogy--the intersection of public and private, and the contrast of scale between the individual and the whole. bigQuestions.com attempts to address these issues.

Scale comes into play in the data mining process, where we are aware of the massive amounts of data available on the web. At the present time, the database holds about 45,000 entries from web sites and 65,000 from library catalogues. The result would be far greater, potentially overwhelming, if I did not impose limits on the data mining searches.

Matters of scale also come into play in real time mode, where the visitor sees a contrast between the incoming raw data and the local particulars of the code that carries on the search. Moreover, there is a clear cross-over here between the matter of scale and the intersection of public and private space. I think of code as analogous to the interior space of the self, and of the raw data as an instance of the Internet's massive external complexity. The analogy of code to interior experience and space also has a pertinence that may not be immediately apparent, except perhaps to programmers, for whom code can have an aesthetic dimension that is experienced as shape and form and is therefore a manifestation of our inner lives.

The intersection of public and private experience also comes into play when visitors to the web site query the database and their personal interests intersect with the interests that have shaped the database. This intersection is extended when visitors choose to search the library catalogues or to visit a web site that comes up in response to a query. In each instance, these intersections of public and private also reflect the contrast in scale between the individual and a complex and massive whole.

Data mining

The Web
All of the data mining software is written in Perl. The scripts that search the Web use Perl's World-Wide Web library (LWP) to communicate with the web and the HTML::Parser module to parse incoming web pages. There are two sets of scripts: one does the actual data mining, and the other produces the real time data displayed in the browser. The two sets communicate to some extent, because the real time scripts have to know where the data mining scripts are in their search.
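How the real time scripts learn where the miner has got to is not described here. One simple arrangement, assumed purely for illustration, is a small state file that the data mining scripts rewrite after each step and the real time scripts read. The file format, key names and function names below are hypothetical.

```perl
use strict;
use warnings;

# Write "key=value" lines to a temporary file, then rename it over the real
# file, so a reader never sees a half-written state file. Keys and values are
# assumed to be single-line and keys are assumed not to contain "=".
sub write_checkpoint {
    my ($path, %state) = @_;
    my $tmp = "$path.tmp.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} "$_=$state{$_}\n" for sort keys %state;
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
}

# Read the state file back into a hash; returns an empty hash if the miner
# has not written one yet.
sub read_checkpoint {
    my ($path) = @_;
    my %state;
    open my $fh, '<', $path or return %state;
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /=/, $line, 2;
        $state{$k} = $v if defined $v;
    }
    close $fh;
    return %state;
}
```

The write-then-rename step is the design choice that matters: on the same filesystem, `rename` replaces the file in one step, so the real time side reads either the old position or the new one, never a mixture.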

Libraries
  Again, the scripts that access library catalogues are written in Perl. They depend on the Z39.50 information retrieval protocol. More than 1,000 libraries around the world are accessible through this protocol; I use a list of 977.
  Perl has access to the Z39.50 protocol through the Net::Z3950 module. I am a contributor to the Net::Z3950 project, and I use my own Net::Z3950::AsyncZ module to query large numbers of library databases asynchronously.
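A list of library catalogues is the kind of input a short Perl helper can load before any querying starts. The sketch below assumes, hypothetically, that targets are kept one per line in the host:port/database form seen in the National Library of Scotland entry; it does not reflect the actual format of my list of 977 libraries, and it does not call Net::Z3950 itself.

```perl
use strict;
use warnings;

# Parse one target written as "host:port/database" (for example
# z3950.nls.uk:7290/voyager). Returns a hash ref, or nothing if the line
# doesn't have that shape. Falling back to port 210, the registered Z39.50
# port, when no port is given is an assumption of this sketch.
sub parse_target {
    my ($spec) = @_;
    $spec =~ s/^\s+|\s+$//g;
    return unless $spec =~ m{^([A-Za-z0-9.-]+)(?::(\d+))?/(\S+)$};
    return +{ host => $1, port => defined $2 ? $2 : 210, database => $3 };
}

# Read a hypothetical one-target-per-line list, skipping blank lines,
# "#" comments and malformed entries.
sub read_targets {
    my (@lines) = @_;
    my @targets;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#.*)?$/;
        my $t = parse_target($line);
        push @targets, $t if $t;
    }
    return @targets;
}
```

Each resulting hash supplies the host, port and database name that a Z39.50 connection needs.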
  The numbers to the left of each line of the library listings (020, 050, etc.) are standardized MARC field codes for each field of the catalogue record. MARC stands for "MAchine-Readable Cataloging"; the format was designed by the Library of Congress in the late 1960s to allow libraries to convert their card catalogs to digital form.
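Behind the three-digit codes is a fixed record layout. MARC records are commonly exchanged in the ISO 2709 structure: a 24-character leader (positions 12 to 16 give the base address of the data), then a directory of 12-character entries (3-digit tag, 4-digit field length, 5-digit offset from the base address), then the fields, each ended by hex 1E, with subfields introduced by hex 1F. The Perl sketch below parses and builds such records. It is a simplified illustration (single-byte characters, no validation), not the code bigQuestions.com uses.

```perl
use strict;
use warnings;

# Parse one ISO 2709 record into [tag, value] pairs. Control fields (001-009)
# are returned as-is; data fields as "indicators $a... $b..." strings.
# Assumes single-byte characters, so string offsets equal byte offsets.
sub parse_marc {
    my ($rec) = @_;
    my $base = substr($rec, 12, 5) + 0;        # leader 12-16: base address of data
    my $dir  = substr($rec, 24, $base - 25);   # directory, minus its terminator
    my @fields;
    for (my $i = 0; $i + 12 <= length $dir; $i += 12) {
        my ($tag, $len, $start) = unpack 'A3 A4 A5', substr($dir, $i, 12);
        my $data = substr($rec, $base + $start, $len - 1);   # drop the \x1E
        if ($tag lt '010') {
            push @fields, [ $tag, $data ];
        } else {
            my ($ind, @subs) = split /\x1F/, $data;
            push @fields, [ $tag, join ' ', $ind, map { '$' . $_ } @subs ];
        }
    }
    return @fields;
}

# Build a record from [tag, data] pairs, where data-field strings already hold
# two indicators and \x1F-delimited subfields. Used here to make test input.
sub build_marc {
    my (@fields) = @_;
    my ($dir, $body) = ('', '');
    for my $f (@fields) {
        my ($tag, $data) = @$f;
        $data .= "\x1E";                                   # field terminator
        $dir  .= $tag . sprintf('%04d%05d', length $data, length $body);
        $body .= $data;
    }
    $dir .= "\x1E";                                        # end of directory
    my $base   = 24 + length $dir;
    my $total  = $base + length($body) + 1;                # +1 for \x1D
    my $leader = sprintf('%05d', $total) . 'nam a22' . sprintf('%05d', $base) . ' a 4500';
    return $leader . $dir . $body . "\x1D";
}
```

A round trip through `build_marc` and `parse_marc` with the Carrick fields yields pairs such as `['020', '   $a0720704634']`, where the leading characters are the two (blank) indicators.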

Databases
  bigQuestions.com doesn't use any of the standard databases; instead it uses a system created specifically for the project.
  New data is not inserted into the database as soon as it arrives at the server but, rather, periodically. The incoming raw data from both the web and the libraries is filtered through several sets of scripts, which reshape it for final inclusion in the database. Once new data has been added, the database is re-indexed.
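The internals of the custom database aren't described here, but the update cycle is: filtered entries are added in batches, and the database is then re-indexed. As a hedged illustration of that cycle only, here is a Perl sketch built around a simple in-memory inverted index (word to entry ids). The entry layout, the function names and the AND-style lookup are all assumptions, not a description of the real system.

```perl
use strict;
use warnings;

# Rebuild a word -> [entry ids] index from scratch. Entries are hashes with
# hypothetical keys 'id' (numeric) and 'text'. Each id is listed once per word.
sub build_index {
    my (@entries) = @_;
    my %index;
    for my $e (@entries) {
        my %seen;
        for my $word (map { lc } $e->{text} =~ /(\w+)/g) {
            push @{ $index{$word} }, $e->{id} unless $seen{$word}++;
        }
    }
    return \%index;
}

# Move pending entries into the database in one batch (dropping empty ones,
# a stand-in for the real filtering scripts), then re-index everything.
sub merge_batch {
    my ($db, $pending) = @_;
    push @{ $db->{entries} }, grep { defined $_->{text} && length $_->{text} } @$pending;
    @$pending = ();
    $db->{index} = build_index(@{ $db->{entries} });
    return scalar @{ $db->{entries} };
}

# Return the ids of entries containing every query word, in ascending order.
sub lookup {
    my ($index, @words) = @_;
    return () unless @words;
    my %count;
    for my $w (map { lc } @words) {
        $count{$_}++ for @{ $index->{$w} || [] };
    }
    return sort { $a <=> $b } grep { $count{$_} == @words } keys %count;
}
```

Rebuilding the whole index after each batch is the simplest choice that matches a periodic update; queries between batches see a consistent, if slightly stale, index.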

Browser Software
What you see in the browser is created with DHTML (Dynamic HTML), which combines the older HTML with JavaScript. DHTML is what makes possible, for instance, the code tracing in the real time window.

Browser Requirements
The design and layout of bigQuestions.com are optimized for resolutions of 1024 x 768 and greater. At lower resolutions the site reconfigures automatically, which results in smaller viewing areas and additional scrollbars.
Notes
  1. The Search features should work in just about any fifth-generation browser (IE 5.0 or later, Netscape 6.0 or later). When problems occur, they are in the real time features.
  2. bigQuestions.com is still under development and I hope to address any problems in subsequent versions.
 
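     # Collect the distinct "big question" words found in $origtext and join
     # them into an alternation pattern; return nothing if none are found.
     # Note: there are no \b anchors, so "hell" also matches inside "shell".
     sub big_question_pattern {
         my ($origtext) = @_;
         my @queries = $origtext =~ /
              amazement|amaze|astonishment|astonish|awe|wonder|mystery
             |misfortune|fortune|fate|wisdom|ignorance|miracle|marvel
             |luck|doom|heaven|hell|chance|destiny
                          /gxi
             or return;
         my %seen;    # drop duplicates, ignoring case, in first-seen order
         return join '|', grep { !$seen{lc $_}++ } @queries;
     }

     my %bigQuestions_com = (
         author    => 'Myron Turner',
         site      => 'room535.org',
         version   => '1.0',
         copyright => 2004,
     );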