Simulation experiments with SaTScan and rsatscan

In this vignette, I use the space-time permutation scan to show how the rsatscan package can be used to simplify the process of making data in R, running SaTScan on the generated data, and collecting the results, presumably leading to quicker and easier accumulation of results.

I begin by making data on a 10*10 grid of locations, over 30 days. Each day, each location has a 0.1 probability of having a single case.

set.seed(42)
mygeo = expand.grid(1:10,1:10)
daysbase = 30
locid = rep(1:100, times=daysbase)
basecas = rbinom(3000, 1, .1)
day = rep(1:30, each = 100)
mycas = data.frame(locid,basecas, day)

Here’s what the geo and case files look like. I’m using generic time, for convenience.

head(mygeo)

##   Var1 Var2
## 1    1    1
## 2    2    1
## 3    3    1
## 4    4    1
## 5    5    1
## 6    6    1

head(mycas)

##   locid basecas day
## 1     1       1   1
## 2     2       1   1
## 3     3       0   1
## 4     4       0   1
## 5     5       0   1
## 6     6       0   1

Now I can write the data into the OS; the row names in the mygeo data.frame object are the location IDs for SaTSCan, so I’m using the userownames option to use, rather than ignore, the row names from R in the geography file; in the case file, there is an explicit column with the same information included.

library("rsatscan")

## RSaTScan only does anything useful if you have SaTScan-- see http://www.satscan.org/ for free access.

td = tempdir()
write.geo(mygeo, location = td, file = "mygeo", userownames=TRUE)
write.cas(mycas, location = td, file = "mycas")

Now I’m ready to build the parameter file. This is adapted pretty closely from the NYCfever example in the rsatscan vignette.

invisible(ss.options(reset=TRUE))
ss.options(list(CaseFile="mycas.cas", PrecisionCaseTimes=4))
ss.options(list(StartDate="1", CoordinatesType=0, TimeAggregationUnits=4))
ss.options(list(EndDate="30", CoordinatesFile="mygeo.geo", AnalysisType=4, ModelType=2)) 
ss.options(list(UseDistanceFromCenterOption="y", MaxSpatialSizeInDistanceFromCenter=3)) 
ss.options(list(NonCompactnessPenalty=0, MaxTemporalSizeInterpretation=1, MaxTemporalSize=7))
ss.options(list(ProspectiveStartDate="30", ReportGiniClusters="n", LogRunToHistoryFile="n"))

Then I write the parameter file into the OS and run SaTScan using it. I’ll peek in the summary cluster table to see what we got.

write.ss.prm(td, "mybase")
mybase = satscan(td, "mybase")
mybase$col[3:10]

##    X Y RADIUS START_DATE END_DATE NUMBER_LOC TEST_STAT   P_VALUE
## 1  7 1      2         29       30          7  4.407435 0.3311258
## 2 10 9      1         26       30          4  4.203974 0.4569536
## 3 10 7      0         28       30          1  3.560880 0.7152318

As one would hope, there’s no evidence of a meaningful cluster.

Now, let’s add a day just like the others. I’ll stick it onto the end of the previous data, then write out a new case file.

newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 31)
newcas = rbind(mycas,newday)
write.cas(newcas, location = td, file = "mycas")

I don’t need to re-assign any parameter values that don’t change between runs. In this case, since I used the same name for the data file, I only need to change the end date of the surveillance period.

ss.options(list(EndDate="31"))
write.ss.prm(td, "day1")

day1 = satscan(td, "day1")
day1$col[3:10]

##    X Y RADIUS START_DATE END_DATE NUMBER_LOC TEST_STAT   P_VALUE
## 1 10 9      1         26       31          4  4.347783 0.4464286
## 2 10 7      0         28       31          1  3.011037 0.9285714
## 3  7 1      2         28       31          7  2.727801 0.9732143
## 4  6 5      0         29       31          1  2.697472 0.9732143

Again, no clusters, as we would expect.

But now let’s make a cluster appear. I create an additional time unit as before, but then select a location to get a heap of extra cases. Glue the new day to the end of the old case file, write it to the OS, change the end date, and re-run SaTScan.

newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 32)
newday$basecas[20] =5
newcas = rbind(mycas,newday)

write.cas(newcas, location = td, file = "mycas")

ss.options(list(EndDate="32"))
write.ss.prm(td, "day2")

day2 = satscan(td,"day2")
day2$col[3:10]

##    X Y RADIUS START_DATE END_DATE NUMBER_LOC TEST_STAT     P_VALUE
## 1 10 2      0         32       32          1  7.355883 0.007642534
## 2  7 2      1         28       32          5  3.515149 0.726000000
## 3 10 9      1         26       32          4  3.075375 0.863000000
## 4 10 7      0         28       32          1  2.704465 0.966000000

This demonstrates that I did detect what I inserted. I can also extract the wordier section of the report about this cluster.

summary(day2)

## Prospective Space-Time analysis 
## scanning for clusters with high rates 
## using the Space-Time Permutation model. 
##  
## Study period.......................: 1 to 32 
## Number of locations................: 100 
## Total number of cases..............: 330 
## _______________________________________________________________________________________________ 
##  
## 
## There were 4 clusters identified.
## There were 1 clusters with p < .05.

cat(day2$main[20:31],fill=1)

##  
## CLUSTERS DETECTED 
##  
## 1.Location IDs included.: 20 
##   Coordinates / radius..: (10,2) / 0 
##   Time frame............: 32 to 32 
##   Number of cases.......: 5 
##   Expected cases........: 0.47 
##   Observed / expected...: 10.71 
##   Test statistic........: 7.355883 
##   P-value...............: 0.0076 
##   Recurrence interval...: 131 units

Simulation experiments with SaTScan and rsatscan

Ken Kleinman

2015-02-17