Description of files:

The simulated data sets are in ASCII files named '*.sim'. Each of the two null distribution files contains 100,000 data sets, while each file with data simulated under one of the clustering models contains 10,000 data sets. Each data set is given on a separate row in the file. A data set is represented by the randomized number of cases in each of the 245 counties, listed in the same order as in the geographical coordinates file.

The geographical coordinates file (neast.geo) contains one row for each county and three columns: the county name, followed by the x and y coordinates. The population file (neast.pop) contains one row for each county and three columns: the county name, the census year (1990), and the population size.

The naming convention for the files with simulated data from the different clustering models is:

*rural*.sim = single hot-spot cluster in the rural area around Grand Isle.
*mixed*.sim = single hot-spot cluster in the mixed rural/urban area around Pittsburgh.
*urban*.sim = single hot-spot cluster in the urban area around New York City.
*two*.sim = two hot-spot clusters, in the rural and urban areas respectively.
*three*.sim = three hot-spot clusters, in the rural, mixed and urban areas respectively.
*ch2co*.sim = global chain clustering using twins and constant distance.
*ch2ex*.sim = global chain clustering using twins and exponential distance.
*ch3co*.sim = global chain clustering using triplets and constant distance.
*ch3ex*.sim = global chain clustering using triplets and exponential distance.

The second star in the file names above reflects the number of counties in the cluster for the hot-spot models, and the distance between twins/triplets for the global chain clustering models. For the latter, a distance of 0.005 is denoted by 'hh', a distance of 0.01 by '01', a distance of 0.02 by '02', and so on.
File names that begin with '6' contain data sets with 600 cases, while the remaining files contain data sets with 6000 cases.
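
The layout described above could be read along the following lines. This is only a minimal sketch in Python, not part of the data distribution. It assumes that columns are separated by whitespace, that the county name is everything before the last two fields in neast.geo, and that distance codes beyond '02' continue as two-digit hundredths. The function names (read_geo, iter_sim, total_cases, distance_from_code) are hypothetical helpers introduced here for illustration.

```python
import os
from typing import Iterator, List, Tuple

N_COUNTIES = 245  # counties per data set, as stated in the description


def read_geo(path: str) -> List[Tuple[str, float, float]]:
    """Read the coordinates file: county name, x, y on each row."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # Assumption: the name is everything before the last two fields.
            name = " ".join(parts[:-2])
            rows.append((name, float(parts[-2]), float(parts[-1])))
    return rows


def iter_sim(path: str, n_counties: int = N_COUNTIES) -> Iterator[List[int]]:
    """Yield one data set per row, as a list of case counts per county."""
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != n_counties:
                raise ValueError(
                    f"row {line_no}: expected {n_counties} counts, got {len(parts)}"
                )
            yield [int(p) for p in parts]


def total_cases(path: str) -> int:
    """Return 600 if the file name begins with '6', otherwise 6000."""
    return 600 if os.path.basename(path).startswith("6") else 6000


def distance_from_code(code: str) -> float:
    """Decode a chain-clustering distance code ('hh', '01', '02', ...)."""
    if code == "hh":
        return 0.005
    # Assumption: other codes are two-digit hundredths, e.g. '02' -> 0.02.
    return int(code) / 100.0
```

Because the counts in each row follow the county order of neast.geo, the i-th count of a data set yielded by iter_sim can be paired with the i-th row returned by read_geo. Comparing the sum of each row with total_cases for the same file is a simple consistency check on a downloaded file.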