I have a random forests model with which I am trying to predict species presence or absence.
This is my code:
library(randomForest)
library(raster)
#read in dataframe containing observations of species presence/absence & predictor variables
mydata <- read.csv('mydata.csv')
#fit random forests model (the arguments are spelled mtry and ntree; misspelled names are silently ignored)
fitmodelA <- randomForest(SPECIESA ~ var1 + var2 + var3 + var4 + var5 + var6 + var7 + var8 + var9 + var10, data=mydata, mtry=3, ntree=500, replace=TRUE, importance=TRUE, keep.forest=TRUE)
#predict to new data
predictmodelA <- predict(fitmodelA, newdata, type="prob")
# save as raster image
writeRaster(predictmodelA,"predictSPECIESA.tif")
Apparently I should get back a matrix that has the probability for both classes, i.e., in two columns. Do I understand correctly that it is also possible to add the "index" argument to predict only one class?
With my code the way it is, my output raster produced one layer with probabilities 0 to 1, but no other attributes – what class is this predicting? I am more interested that my map show predictions of presence rather than absence.
Probably a simple solution to this…? Thanks!
Best Answer
I don't understand your confusion at all. When you choose type="prob" the output is, naturally, the probability of each class, so you should be getting a matrix with two columns. In your two-class example you can simply choose either one with predictmodelA[,1] or predictmodelA[,2]. Since the standard R implementation of randomForest needs a factor response for classification, the order of these probability columns follows the order of the factor levels, which you can check with levels(mydata$SPECIESA). The columns are also named after those levels, so you can select the presence column by name.
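To make the column order concrete, here is a minimal sketch on simulated data. The data frame `toy`, its two predictors, and the level names "absent" and "present" are assumptions for illustration only; your own factor may be coded 0/1, in which case the column names will be "0" and "1".

```r
library(randomForest)

set.seed(42)

# Simulated training data (hypothetical): two predictors and a factor response
toy <- data.frame(var1 = runif(200), var2 = runif(200))
toy$SPECIESA <- factor(ifelse(toy$var1 > 0.5, "present", "absent"))

fit <- randomForest(SPECIESA ~ var1 + var2, data = toy, ntree = 100)

# Predict class probabilities for a few new observations
newdata <- data.frame(var1 = runif(5), var2 = runif(5))
prob <- predict(fit, newdata, type = "prob")

# The probability columns follow the factor levels and carry their names
print(levels(toy$SPECIESA))
print(colnames(prob))

# Selecting by name avoids having to remember which position is "presence"
presence_prob <- prob[, "present"]
print(presence_prob)
```

Selecting by name rather than by position keeps the code correct even if the factor levels are reordered later.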
By the way, I don't think your last bit of code will get you the raster file you want. You would first need to determine its coordinates, so you need to add them to your prediction vector, for example:
raster_species <- data.frame(species_prediction=predictmodelA, x=X_coordinate, y=Y_coordinate)
then explicitly tell R which variables are the coordinates (I'm guessing you are using the raster package; coordinates() and gridded() come from the sp package, which raster loads for you):
coordinates(raster_species)<-~x+y
Indicate that it is gridded
gridded(raster_species)=TRUE
and then simply label it a raster
raster_species <- raster(raster_species)
you can then plot it and save it
plot(raster_species)
writeRaster(raster_species, filename="raster_species.tif", format="GTiff", overwrite=TRUE)
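On the "index" argument mentioned in the question: that argument belongs to the raster package's own predict method, which applies a fitted model to a stack of predictor rasters and lets you pick a column of the probability matrix. The sketch below is one possible alternative to the data.frame route, assuming your predictors are available as raster layers whose names match the model variables. The toy rasters, the data frame `train`, and the level names are all hypothetical.

```r
library(randomForest)
library(raster)

set.seed(1)

# Simulated training data (hypothetical) with a two-level factor response
train <- data.frame(var1 = runif(200), var2 = runif(200))
train$SPECIESA <- factor(ifelse(train$var1 + rnorm(200, sd = 0.2) > 0.5,
                                "present", "absent"))
fit <- randomForest(SPECIESA ~ var1 + var2, data = train, ntree = 100)

# Toy predictor rasters; layer names must match the model's variable names
r1 <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
r1 <- setValues(r1, runif(ncell(r1)))
r2 <- setValues(r1, runif(ncell(r1)))
predictors <- stack(r1, r2)
names(predictors) <- c("var1", "var2")

# Find which probability column corresponds to presence
presence_col <- which(levels(train$SPECIESA) == "present")

# index selects that column, so the output is a single-layer presence map
presence_map <- predict(predictors, fit, type = "prob", index = presence_col)
print(presence_map)
```

The result already carries the grid geometry of the predictor stack, so it can be passed straight to plot() or writeRaster() without attaching coordinates by hand.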