Friday, April 27, 2012

Mapping other lives - part 2

(Continued from part 1)

"One cannot observe without a theory, and what seems the simplest of ornithological tasks - to go out of doors and look out for something worth recording - is in reality one of the hardest… It is a mistake to imagine that complete impartiality and freedom from preconceived ideas is the qualification for the perfect observer. The cow has a remarkably open mind, yet it has never been found to reach a high degree of civilisation." - Max Nicholson (1929)

Every data-gathering exercise should have clear aims, a clear vision and a clear method. Max Nicholson captured the essence of this principle in a chapter aptly titled 'How to Observe' in his 'The Study of Birds' (1929).

Observing a single individual organism

If one were to get periodic observations of an individual organism whose biology is unknown, over a period of say a month, one might see a picture of dots distributed in space. Here is a hypothetical map, the red spots being locations where our mystery organism was found and the blue line indicating a river.

If you knew that these dots were sampled over a few weeks and had some idea of the scale:
  • You could tell that your organism is reasonably mobile
    • you might be able to get some idea of how far it travels each day
  • If you knew the time at which those points were taken, you might be able to say something about its habits
    • perhaps it goes regularly to water
    • perhaps there is a place where it rests
    • perhaps it never crosses the river
All those are hypotheses, and everyone, including so-called non-scientists, comes up with them and unconsciously estimates how reasonable each idea might be without using any pompous terms for those actions. (Actually many other animals demonstrate "scientific reasoning".)
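
The "how far does it travel each day" idea can be sketched in a few lines of R. The data frame here is invented purely for illustration (the column names `time`, `x` and `y` are my own, and the coordinates are assumed to be in metres on a flat local grid). Straight lines between successive fixes only give a lower bound on the distance actually covered.

```r
# Hypothetical fixes for one animal: time of observation and position in metres
fixes <- data.frame(
  time = as.POSIXct(c("2012-04-01 06:00", "2012-04-01 12:00",
                      "2012-04-01 18:00", "2012-04-02 06:00",
                      "2012-04-02 18:00"), tz = "UTC"),
  x = c(0, 300, 300, 0, 400),
  y = c(0, 400, 0, 0, 300)
)
fixes <- fixes[order(fixes$time), ]

# Straight-line distance between each fix and the next one
step_m <- sqrt(diff(fixes$x)^2 + diff(fixes$y)^2)

# Credit each step to the day on which it ended, then total per day
step_day <- as.Date(fixes$time[-1], tz = "UTC")
daily_m <- tapply(step_m, step_day, sum)
print(daily_m)
```

With sparser fixes the daily totals shrink, which is one more reason to record the sampling interval along with the points.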

It is clear that our map is a rather crude representation of reality and that we might be jumping to conclusions. So let us continue recording the locations of our mystery organism.

Based on the additional data we might now be reasonable in assuming that our mystery organism moves along fairly definite routes. Because we still do not know much about the times at which they are present at various points we cannot tell much about their habits. Perhaps they follow a fixed daily routine. If it went up to the river edges in the mornings, one could hypothesize that the animal goes there to drink water. We still know so little about the terrain. Remember what I mentioned in the first part about maps simplifying things by hiding reality.

So what happens when our map reveals that there is a large rock in the centre? We now refine our idea of the animal and its movement. It seems to avoid moving over the rock; perhaps that would expose it to the view of predators.

Observing many individuals of a species

Individual organisms usually have a home range and will sometimes defend a territory. The home range is the region in which they live and obtain what they need. No organism will spend more energy than is worthwhile. No species, not even a bird, will fly 100 km just to feed on a berry. The net gain has to be positive. When too many individuals live in the same area there are conflicts, and if the distribution of resources is uniform enough they will space themselves out by defending individual territories and pushing others out to the edges of their own.

Let us take our mystery organism example again. Suppose we looked at a large area and mapped all the places where our mystery organism occurs; we might get something like this.
Now we might actually be able to strengthen our case for the animal preferring to be close to rivers. We may also be able to make a stronger case for its inability to cross the river.

What might we be able to say about the species as a whole? What is its distribution? Here are just two possibilities based on the earlier map.
A bounding polygon

Based on distance to river or habitat suitability

The regions that we marked are not places where the species occurs; rather, they specify regions within which the species is highly likely to occur. The second map takes into account something about the animal's life history, proximity to water, which the first map ignores.

The likelihood of occurrence of a species is not uniform. Indeed most modern maps no longer use a single flat colour and instead use shades. The situation is a lot more difficult to deal with when we look at the distribution of animals at the scale of a country the size of India.

Observing over time

There are also great changes in the distributions of organisms over time and seasons. There might have been a time when the distribution map of a tiger or elephant could well have shown all of India shaded. Today we have hemmed such species into pockets. So the habitat per se could well be good for a species but its likelihood of occurrence is altered by anthropogenic factors. There can also be seasonal changes.
Shrinking distribution of Sarus Crane (Sundar, KSG et al. 2000)
Pretty but not accurate

We decided from our example that distribution maps essentially indicate likelihood. How is that probability figure arrived at? Why are maps in black and white if probability can vary continuously from 0 to 1? The reality is that most maps that one sees are more artistry than science. This is rarely admitted but perhaps understandable given that most of these books are targeted at touring birders rather than those interested in longer-term aspects of ecology or conservation.

Here is a comparison of the distribution maps of a "fairly common" species (within its range, whatever that may be!) - now Parus cinereus (earlier included in Parus major as P. m. cinereus, P. m. stupae etc.). This is one of my favourites, simply because it should be a good target to test "citizen science" ideas. One can see a range of ideas on the distribution of the species, from fairly simple ones in 1969 to very complex ones in recent times. One would expect the more complex maps to be better justified. Kazmierczak and Van Perlo (2000) indicate spots for what they think of as "outliers". None of these published maps actually share the underlying point records, making it hard to make any judgement on accuracy.

Ali & Ripley (1969) - "Handbook". Volume 9

Grimmett, Inskipp & Inskipp (1998)
Kazmierczak & Van Perlo (2000)

Rasmussen & Anderton (2005)

The nature of spatio-temporal data

Data associated with locations and time can be of a range of types. In the previous part on history we noted the early geological mapping work of "Strata Smith". What he noted was that the pattern of strata in the soil showed continuity. If location A and location B had identical strata, then a point halfway between A and B was most likely to match the pattern as well. This is something that simply does not happen in animal distributions.

There can be good habitat with a high density of a particular animal and similar habitat nearby with only a few, but suitable habitats between them do not necessarily have an intermediate density. The mechanics of animal population movements are more complex. A population that is breeding at one location could result in younger animals dispersing out of it in search of good habitats in the vicinity. So it would be a bit like diffusion. Animal populations can also change over a fairly short time period. The original population being present is itself a part of biogeographic history. The whole idea of population density is a derived one and the only way to make that measure appear smooth over space is through the use of computational approaches. There is a whole slew of computational approaches for this - the commonest involving the use of triangulation, nearest neighbours, kernels or splines (the 3D version known as a thin-plate spline). Because the choice of method affects the result, it makes sense for raw data to be preserved for posterity with any work in this area.
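
To make the point about method choice concrete, here is a minimal sketch of one of the approaches named above, a kernel-weighted average, written in base R. The function name `kernel_smooth`, the Gaussian kernel and the four sample values are all my own assumptions for illustration. The same raw numbers give very different "densities" at the same spot depending on the bandwidth chosen.

```r
# Gaussian kernel-weighted average of sampled values at arbitrary query points.
# sx, sy, sv: sample coordinates and values; qx, qy: query coordinates;
# bw: kernel bandwidth in the same units as the coordinates.
kernel_smooth <- function(sx, sy, sv, qx, qy, bw) {
  sapply(seq_along(qx), function(i) {
    d2 <- (sx - qx[i])^2 + (sy - qy[i])^2
    w <- exp(-d2 / (2 * bw^2))
    sum(w * sv) / sum(w)
  })
}

# Invented densities at four sites along a line
sx <- c(0, 1, 2, 3)
sy <- c(0, 0, 0, 0)
sv <- c(10, 0, 0, 8)

# The same query point gives different answers under different bandwidths
narrow <- kernel_smooth(sx, sy, sv, qx = 1.5, qy = 0, bw = 0.3)
wide   <- kernel_smooth(sx, sy, sv, qx = 1.5, qy = 0, bw = 5)
print(c(narrow = narrow, wide = wide))
```

The narrow kernel reports almost nothing at the midpoint while the wide one reports nearly the overall mean. Neither is "the" density, which is why the raw points need to travel with the smoothed map.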

Rainfall is a feature that is best displayed as an interpolated surface and not as points
Weather and other such large-scale phenomena are good candidates for smoothing. Interpolation of weather data on a large scale has been found to be reasonable except in hilly areas where there are extremely local effects due to topography. Those working on systems like this would benefit from a reading of the methodology used in the creation of WorldClim. The India Biodiversity Portal has some rather confused maps in this regard. When a climatic layer is selected, it produces a scatter of points. One would have thought that there would be some continuous data layer (at least monthwise averages associated with those points, although this is already done by WorldClim). If this site is to be usable, the best option would be for the system to take raw data from the met stations and generate interpolated grids. These grids should be precalculated and one should be able to query the weather parameter for any day-month-year, month-year, or month (average over a year range). Interpolated layers based on minimum-bending thin-plate splines would be particularly nice to have, precomputed at a resolution appropriate to the density of stations. If such layer data were available, it would be possible for a researcher to examine relationships between species occurrence and climatic factors.

Because spatial data can vary over time, any system that works on them should allow for a time period to be specified. So a typical website would have sliders for start year (date) and end year (date) that allow one to make use of a filtered set of records. Biological data also shows cyclic annual patterns, and so data aggregation by month and week within year would be desirable.
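
What the sliders and the month-wise aggregation would do behind the scenes can be sketched as follows. The records, the column names and the helper `filter_years` are invented for illustration; a real system would of course read these from its database.

```r
# Hypothetical sighting records with observation dates
recs <- data.frame(
  date = as.Date(c("1998-01-15", "2003-12-02", "2005-01-20",
                   "2005-06-11", "2010-02-03", "2011-12-25")),
  long = c(77.5, 72.6, 77.2, 75.0, 79.5, 77.6),
  lat  = c(27.5, 23.0, 28.5, 26.5, 29.4, 13.0)
)

# Keep records whose year falls within [start_year, end_year]
filter_years <- function(d, start_year, end_year) {
  yr <- as.integer(format(d$date, "%Y"))
  d[yr >= start_year & yr <= end_year, ]
}

sel <- filter_years(recs, 2000, 2010)

# Pool the selected years and count records per calendar month (1 to 12)
month <- factor(as.integer(format(sel$date, "%m")), levels = 1:12)
by_month <- table(month)
print(by_month)
```

Keeping all twelve months as factor levels means that months without records show up as zeros rather than silently disappearing from the table.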

The mighty sparrow

A classic work by Edward Tufte warns us of the lie factor in graphics. The size of markers can lie about the density of sampling. A sparrow is a tiny bird, and using a large marker obscures how sparsely distributed the samples are in this 2012 website created ostensibly to study the House Sparrow in India.

One should be cautious about the size of markers used.
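
A bit of arithmetic shows how much ground a marker claims. The function name and all the numbers below are illustrative assumptions (a map 600 pixels wide showing roughly 3000 km), not measurements taken from the website in question.

```r
# Ground width covered by one marker, given the map's on-screen width in
# pixels and the real-world width that it represents
marker_ground_km <- function(marker_px, map_px, map_km) {
  marker_px * map_km / map_px
}

# Illustrative numbers only: a map 600 px wide showing about 3000 km
small <- marker_ground_km(marker_px = 4,  map_px = 600, map_km = 3000)
large <- marker_ground_km(marker_px = 30, map_px = 600, map_km = 3000)
print(c(small = small, large = large))  # 20 km versus 150 km
```

Under these assumptions a single 30-pixel marker visually "covers" a swathe 150 km wide for what may have been one sparrow seen from one balcony.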
The website does not make the data available (although it claims it will), nor does it indicate a hypothesis, method used or underlying assumptions. Given the data being gathered and the non-standard comments received, it is hard to imagine what the expected outcomes are. The site claims that the exercise has been organized by the BNHS. The BNHS is a rather ancient (if not archaic) organization that even at the time of founding decided to keep away the average Briton living in India. The opening page of the first Journal issue is telling:
...was founded on the 15th September 1882 by seven gentlemen interested in natural history, who proposed to meet monthly and exchange notes, exhibit interesting specimens, and otherwise encourage one another. The subscription was purposely made little more than nominal, ...
As a club largely for gun-wielding English officers, its Indian members were chiefly from the royalty belonging to princely states within India (who were allowed to hunt while ordinary folk rarely came in touch with the wildlife). After Independence, it remained largely a club of upper-class Mumbai residents. During its existence, there have been few major activities outside of its member-clique. Despite this elite membership, its journal has been subsidised by tax-payer funds (MoEF). In the years of its existence, numerous surveys have been conducted and along with its spin-off organization, SACON, many questionnaires have been posted to its members. The aims of these surveys, the data collected and the results have rarely been made available. Of course one cannot question private surveys circulated within a private club. The matter however becomes serious when public money is spent on them.


In the case of this sparrow website, the "about" link has little information to offer. The potential contributor is not provided any education, training or insight into the project, which apart from other considerations can lead to the collection of data of dubious quality. A sparrow is a small flocking bird and the distributions of flocks are patchy. If the project aims to find habitat associations, then one should have been asking for more specifics on the habitat where they were seen. Because this survey is also looking at historic data, it would require historic conditions to be captured. This is not an easy problem and at the very minimum the aims and methods followed could have been discussed. Improvements could have been suggested if it had been open to citizen comment. The project appears not to be a long-term one either and, in spite of being so badly designed, has actually been funded by the Ministry of Environment and Forests (the extent of which is not even revealed). And it is rather unfortunate that the BNHS does not even add its own museum records to the database! The spatial display merely hides the poor design, and even this display leaves much to be desired. Compare the quality of display in a system like eBird or even the Indian Rail monitoring website.

Google Maps is very often used for showing point records. While this might arguably be easy to program, it is something that should be avoided. In fact, if a system were merely required to store point records and show them on a map, a very straightforward way would be to install MediaWiki and transfer some of the basic templates needed; one can then have spot maps like the one here on White-tailed Iora. MediaWiki security settings can of course be altered. An extra benefit of using something like this is that the traceability of records and their modification are automatically handled.


Simpler citizen engagement projects for the BNHS to attempt


The BNHS could try small exercises in science before attempting something complex. In terms of data collation, there are numerous things that they can do on their own - things that any other organization of its kind would have taken up as a routine activity. Here are a few that the organization should have done a long time ago.
  • Digitize all the specimens held in the collections - make a big Excel File / consider using Google documents - spreadsheet - with species, collector, date, location, determiner, sex, other information, length, culmen, tarsus, wing, tail etc. - a line per specimen
  • Photograph all the specimens and labels carefully and make these available online as well
  • Digitize all the ringing records and ring recoveries - georeference locations - as above
  • Scan all the old archival literature and records, save them for posterity and make them available online - convert to PDF and upload to the Internet Archive

All the above activities can easily be done as "citizen science" - one would just have to let interested citizens into the premises and they could take up the tasks above. This would cost nothing, would earn the organization some credit and would help change it from a colonial club to a serious modern organization. Making such data available would be more helpful than conducting badly designed surveys.

Observing many species over time

Keeping track of the geographical distributions of all bird species over time should help track many other patterns. Do species co-occur? Are there species showing a mutually exclusive pattern? What is the species richness at various places?
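
Once records are reduced to a site-by-species table, these three questions need very little machinery. The matrix below is invented for illustration (sites and species names are placeholders); real data would also need some handling of unequal effort across sites.

```r
# Invented presence (1) / absence (0) records: rows are sites, columns species
pa <- matrix(c(1, 1, 0,
               1, 0, 1,
               0, 0, 1,
               1, 1, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("site", 1:4), c("spA", "spB", "spC")))

# Species richness: number of species recorded at each site
richness <- rowSums(pa)

# Co-occurrence: number of sites shared by each pair of species
cooc <- crossprod(pa)

# Pairs never found together are candidates for a mutually exclusive pattern
never_together <- which(cooc == 0 & upper.tri(cooc), arr.ind = TRUE)
print(richness)
print(cooc)
print(never_together)
```

A zero in the co-occurrence table is only a candidate for mutual exclusion; with few sites it can easily arise by chance or from nobody having looked.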

What about abundance? Can one get a relative measure using non-standardized and low-effort approaches? Does the reliability vary across species?

Naturally, there are many questions to be asked. Many attempts have been made to do such studies. SACON once sent out questionnaires to everyone asking them to count birds on the birthday of Salim Ali. Many of us sent in data but nothing useful came out of it and in fact it generated only scepticism about future survey participation.

Working with raw data

Just to show what an average computer user with access to the Internet can do, I consider about 260 georeferenced point occurrence records for Parus cinereus in the BirdSpot database - either sight reports published in journals, egroups or locations extracted from the labels of specimens in museums (museums in India refuse to share such data even upon request - and that is another story!). There are definite biases in reporting: more records are noted in winter, and southern India has a better published record of the species. Additionally there is no information on non-occurrence - no data points recording the absence of the species. Proving absence is a lot harder. The data is spread out over a century or more and the records are all taken together here.


The map here is made using basic high school geography techniques and does not use a GIS. And here is a little experiment on the way to see how markers can mislead the observer about the sampling intensity. I have Google Earth installed, and so using the button "Google Earth" and changing the marker size we get this.


It is also very easy to work with the open source statistical system R. If you have good Internet access, you can always add new analytical packages.

I have the point records for Parus cinereus in a comma-separated file with the format:

Long,Lat
72.6,23
77.5,27.5
77.58,12.98
77.2,28.5
75,26.5
...
79.47,29.4
Inside the R console you provide commands as follows to do a quick plot.

# Do this once to get the extra packages
# (maptools, which supplies wrld_simpl, is included because it is loaded below;
# maptools and rgdal have since been archived on CRAN and may need installing from the archive)
install.packages(c('maptools', 'raster', 'rgdal', 'dismo', 'rJava', 'fossil', 'vegan', 'ape'))

# load the library at the start of the session
library(maptools)
# use the world map outlines
data(wrld_simpl)
# plot the outlines (don't care about your country borders, the other animals don't either)
plot(wrld_simpl, xlim=c(60,100), ylim=c(0,40), axes=TRUE)
# read the point records (columns Long and Lat) and overlay them in red
parus<-read.csv("PATH/TO/DATA/points.csv", header=TRUE)
points(parus$Long, parus$Lat, col='red', pch=20, cex=0.75)

Now coming back to the Parus cinereus occurrence records. What can be extracted from such a set of points? The first and roughest approach to finding the boundary distribution of a species can be derived using what is called a convex hull. You can see that it can greatly overestimate the distribution range of a species.

With the data set above in R one can quickly plot it using the following snippet:
# we need to provide the data as a matrix to find the hull
hull<-chull(cbind(parus$Long, parus$Lat))
# add the initial point at the end to allow a closed polygon
hull <- c(hull, hull[1])
lines(cbind(parus$Long[hull], parus$Lat[hull]))

Minimum bounding polygon / convex hull for Parus cinereus
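
One way of putting a number on how much the hull overestimates is to compute its area and compare it with alternatives. This is a small sketch in base R using the shoelace formula; the function name `hull_area` and the test points are my own, and for longitude/latitude the result is in square degrees, so it is only a rough comparative figure.

```r
# Area of the convex hull of a set of points, via the shoelace formula.
# Coordinates are treated as planar.
hull_area <- function(x, y) {
  h <- chull(x, y)               # indices of the hull vertices, in order
  hx <- x[h]
  hy <- y[h]
  nx <- c(hx[-1], hx[1])         # the next vertex, wrapping around
  ny <- c(hy[-1], hy[1])
  abs(sum(hx * ny - nx * hy)) / 2
}

# Invented points: the corners of a 4 x 3 rectangle plus interior points
x <- c(0, 4, 4, 0, 1, 2, 3)
y <- c(0, 0, 3, 3, 1, 2, 1)
print(hull_area(x, y))  # 12
```

Dropping a single outlying record and recomputing the area is a quick way to see how much of the "range" hangs on one point.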

The following diagram of a "minimum convex bounding polygon" (Sundar et al. 2000) is not one. It is one of many possible bounding polygons but is not convex. And although it looks like a much tighter boundary, it is not unique. Indeed one could push in many other edges until they touch points. It is not based on an algorithmic approach and is therefore not "repeatable", although it is possible that this region is more accurate and guided by the intuition of the author(s).

Sarus distribution - spot records and a "manually chosen" bounding polygon

Another technique to look at animal distributions is through the use of what is called a "minimum spanning tree" (MST). This allows one to think of how the populations of organisms could have movements between them. Note that this is simplistic and does not take into account rivers or other hurdles that an organism may not be able to cross. It could however help identify disjunct distributions through automated procedures. A long connecting edge in the MST that exceeds a certain distance threshold (dependent on the mobility of the organism) could be cut out to identify population clusters. A computationally more complex approach but one worth trying is a Steiner tree - which would introduce new nodes (=potentially new areas to conserve) that help identify a better network of connections between populations or protected areas. These algorithms require a distance matrix and are affected by zeros in it. A quick fix therefore involves the removal of duplicate records. So here we go with R again:

# remove duplicated points
dups<-duplicated(cbind(parus$Long, parus$Lat))
parusuniq<-cbind(parus$Long[!dups], parus$Lat[!dups])
# calculate distance matrix (here Euclidean)
dm<-dist(parusuniq)
# identify mst
library(ape)
tree<-mst(dm)
# use library fossil to draw the mst edges over the existing plot
library(fossil)
mstlines(tree, parusuniq)
Which yields this spot map with an overlaid minimum spanning tree.

A minimum spanning tree is a way of indicating potential gene-flow between populations
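
The idea of cutting long MST edges to find population clusters can be tried without any extra packages, because single-linkage clustering cut at a given height yields the same groups as removing every MST edge longer than that height. The sketch below swaps in that route (base R `hclust` and `cutree`) and also replaces the Euclidean distance with a great-circle one. The six points, the 500 km threshold and the helper name `haversine_km` are assumptions for illustration.

```r
# Great-circle (haversine) distance in km between points given in degrees
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Invented records: three in the north, three in the south
lon <- c(77.2, 77.5, 75.0, 77.58, 76.6, 78.1)
lat <- c(28.5, 27.5, 26.5, 12.98, 12.3, 11.7)

# Full pairwise distance matrix
n <- length(lon)
dm <- outer(seq_len(n), seq_len(n),
            function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))

# Single-linkage clusters cut at 500 km: the same groups that remain when
# every minimum spanning tree edge longer than 500 km is removed
threshold_km <- 500
cluster <- cutree(hclust(as.dist(dm), method = "single"), h = threshold_km)
print(cluster)
```

The threshold is the biological assumption here: it ought to reflect how far the organism can plausibly disperse, and the clusters should be reported together with the value used.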

One can also see spatial density using smoothed plots. Note that this merely gives the density of records available, but could be more representative if we have some indicator of sampling intensity incorporated into this.


smoothScatter(parus, xlim=c(60,100), ylim=c(0,40), axes=TRUE)
plot(wrld_simpl, xlim=c(60,100), ylim=c(0,40), axes=TRUE, add=TRUE)
A simple density scatter plot. The kernel smoothing procedure bleeds into the sea but that can be masked.
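
If some indicator of sampling intensity were available, the simplest correction would be to divide the records in each grid cell by the effort spent there. The sketch below assumes a hypothetical table of visits per 1-degree cell; both data frames and their column names are invented for illustration.

```r
# Invented records: one row per sighting, plus a table of how many survey
# visits each 1-degree grid cell received (the "sampling intensity")
sightings <- data.frame(long = c(77.2, 77.6, 77.9, 77.1, 72.3, 72.8),
                        lat  = c(12.1, 12.5, 12.9, 12.4, 23.2, 23.7))
effort <- data.frame(cell = c("77_12", "72_23"), visits = c(40, 4))

# Label each sighting with the south-west corner of its 1-degree cell
sightings$cell <- paste(floor(sightings$long), floor(sightings$lat), sep = "_")

# Raw record counts per cell, then records per visit
counts <- as.data.frame(table(cell = sightings$cell), responseName = "records")
dens <- merge(counts, effort, by = "cell")
dens$per_visit <- dens$records / dens$visits
print(dens)
```

In this made-up case the cell with twice as many raw records has the lower rate per visit, which is exactly the kind of reversal that a plot of raw record density hides.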


The methods above do not take into account the habitat of the species. There are a number of approaches to checking associations of species with their habitat and coming up with propositions on what they like or what limits their distribution. This requires additional data for climate, vegetation and abiotic measurements across the region of interest. These are usually available as what are called raster files. One well known set that is used widely in ecology is called BioClim. The datafiles are large and split into tiles. About 4 tiles need to be merged together to cover the Indian region. You need to know a bit about how to convert band-interleaved data files into grid files (you can do this easily and also explore this with the free GIS - DIVA-GIS). Note that the environmental data layers may not be entirely suitable for examining distribution. Unfortunately such environmental data layers are not found at sufficiently high resolution, especially for the Indian region. Additionally, environmental conditions change and historic data would ideally need to be compared with historic conditions. Satellite and weather data for the Indian region are not easily available even to researchers. The government charges for some of this, unlike the situation in many other parts of the world. The BioClim dataset is one that is free but its limitations should be understood, and projects like the India Biodiversity Portal need to improve the quality of data layers of this kind.


The mechanical parts of running the analysis are a lot easier than grasping the theory, its benefits and pitfalls. Once you have it all in place, you can use the R package dismo to continue our exploration of mapping the possible distribution of Parus cinereus. As mentioned earlier, distribution maps have traditionally been shown in black-and-white when in fact they should be in shades-of-grey due to the probabilistic nature of the data. To give an idea of it, I have used the dismo package to generate a "bioclim" model and done a prediction for just a part of India (due to memory constraints). Now I am sure most will agree that making such tools easier to use via a website would make sense. Note here that exploratory data analysis is an important aspect of science and one that should be shared with citizens in so-called "citizen science" projects. Showing cooked results is what scientists do when they submit papers to journals. Citizen science requires the recipe to be available.

# once you have downloaded the bioclim data for your region
# (one file per bioclimatic variable, bio1 to bio19, each listed once)
layers<-paste0('PATH/TO/FILEbio', 1:19, '_asia.grd')
# make a stack of the bioclim layers
# note that they all have to be of the same region
library(raster)
# dismo provides bioclim()
library(dismo)
predictors<-stack(layers)
# build a bioclimatic model from the Long/Lat point records
bc<-bioclim(predictors,parus)
# predict for southern India alone
ext<-extent(71,83,8,17)
pb <- predict(predictors, bc, ext=ext)
plot(pb) # plot the probabilistic distribution
library(maptools)
# add the map boundary
plot(wrld_simpl,add=TRUE, border='dark grey')
# mark the actual point records
points(parus)

A probabilistic distribution map predicted using bioclimatic variables
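
For those wondering what a "bioclim" style model does with the layers, the following is a rough base R sketch in the spirit of the envelope approach, not the actual code of dismo. Each variable at a candidate site is ranked against its values at known occurrence sites, and the most limiting variable decides the score. The function name `envelope_score` and the training values are invented for illustration.

```r
# Envelope score for one candidate site, given presence-site values.
# train: matrix of environmental values at known occurrence sites
#        (rows = sites, columns = variables)
# site:  numeric vector of the same variables at the candidate location
envelope_score <- function(train, site) {
  per_var <- sapply(seq_len(ncol(train)), function(j) {
    p <- mean(train[, j] <= site[j])   # percentile rank among occurrences
    2 * min(p, 1 - p)                  # 1 at the median, falling towards 0 in the tails
  })
  min(per_var)                         # the most limiting variable decides
}

# Invented training data: annual mean temperature (C) and rainfall (mm)
train <- cbind(temp = c(20, 22, 24, 26, 28, 30, 32, 34),
               rain = c(600, 700, 800, 900, 1000, 1100, 1200, 1300))

print(envelope_score(train, c(27, 950)))   # near the middle of both ranges
print(envelope_score(train, c(27, 2000)))  # far too wet
print(envelope_score(train, c(10, 950)))   # far too cold
```

Note that the score says nothing about where the species actually is; it only says how typical a site is of the places where somebody happened to record it.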
There are numerous other techniques, and a particularly elegant one is based on the Mahalanobis distance. Unfortunately the R dismo implementation appears to be poorly done and even when unique points are provided the algorithm runs into trouble as it finds the covariance matrix near singular. An iterative power/deflation/NIPALS method would have been much faster and just a few principal components should have been computed. The method of course is easy to understand and implement on your own. Needless to say, the whole technique, which is so ubiquitous in industrial statistics across the world, seems to be hardly known within Indian academia. An irony given that the method itself was born in India, when Nelson Annandale, the director of the ZSI, came across a problem and suggested it to P C Mahalanobis, the famous statistician, who came up with an elegant solution.
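
As an illustration of the "just a few principal components" idea, here is a minimal sketch that computes the squared Mahalanobis distance in the space of the first k components. It uses base R's `prcomp` (an SVD) rather than the iterative power/NIPALS route mentioned above, and the function name `mahal_pc` and the simulated data are my own assumptions.

```r
# Squared Mahalanobis distance of new points from a training cloud, computed
# in the space of the first k principal components of the training data
mahal_pc <- function(train, newdata, k) {
  pc <- prcomp(train, center = TRUE, scale. = FALSE)
  centred <- sweep(as.matrix(newdata), 2, pc$center, "-")
  scores <- centred %*% pc$rotation[, seq_len(k), drop = FALSE]
  # The components are uncorrelated, so the distance is a sum of squared
  # scores, each divided by the variance of its component
  rowSums(sweep(scores^2, 2, pc$sdev[seq_len(k)]^2, "/"))
}

set.seed(1)
# Invented environmental values at 50 occurrence sites, with the third
# variable almost an exact copy of the first (near-singular covariance)
v1 <- rnorm(50, 25, 3)
v2 <- rnorm(50, 900, 200)
train <- cbind(v1 = v1, v2 = v2, v3 = v1 + rnorm(50, 0, 1e-6))
new <- rbind(typical = c(25, 900, 25), odd = c(40, 100, 40))

d2 <- mahal_pc(train, new, k = 2)
print(d2)
```

Dropping the third component sidesteps the near-singular direction altogether. With all components retained and a well-conditioned covariance matrix, the result agrees with the `mahalanobis()` function in base R.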


The R package "dismo" for distribution modeling is something that all biodiversity website designers should examine - that is of course assuming that they have already received training in the basics of ecology, particularly those topics dealt with under the umbrella of macroecology. This should be compulsory reading even for software developers involved in any such project. From an end-user perspective such functionality has to be present on any biodiversity website/portal.

Summary

Numerous techniques exist for examining organism distributions over large scales using large amounts of data. The data is hard to collect and this cannot be done within the limited world of academia where faculty strive to keep their careers and students fly by in their quest for careers. With failing achievements within that system the only hope then is outside of it - and it is little wonder that "citizen science" has become a buzzword. It however has many connotations to it - in particular - the use of citizens as instruments to gather data for professional scientists to advance their own careers is not one that can be seen positively. I personally suspect that low HDI countries will not be able to work effectively in cooperative projects. One has to be in a situation where self-preservation is not important in order to do good science, and the extreme skew in the availability of resources to citizens in India does not do any good. Many projects claim to be equitable but the language used often belies the claim. (I have written about this in the past with regard to the Cornell eBird project. The response to it can also be found, and the eBird website represents a system that should be considered as the minimum standard for anyone to achieve in their own projects.) Attitudes of superiority of compiler over contributor can be detected in the design of user-interfaces, the features and policies used. Such attitudes have no place if citizens are to be employed. Non-inclusive "collaboration" is doomed to fail in the long run and inclusiveness cannot coexist with "exclusive membership" in club-like/clique-oriented organizations such as the BNHS. Public money spent on private clubs has to be carefully monitored.

The idea of taking so much space here is to point out that there is so much to the life of an organism that can be told by putting a collar on an animal or using an electronic gizmo to spy on it. Instruments have a way of generating a large amount of data. Humans can also be instruments for gathering such information. The gathering of data is itself fraught with risks and every detail counts. Gathering and looking at large amounts of data may require the use of computers - but - reasoning will always have a place and that has to be shared freely. Science requires clear reasoning even if it has to be labelled as "citizen science". If traditional science restricted reasoning within universities, then citizen science would necessarily have to make that reasoning public. As citizens we cannot afford to be "open-minded cows" and we certainly should not be treating anyone claiming to be a scientist as a holy one.

Postscript

(May 2, 2012)
In examining the history of informed consent in medical practice one comes across a guiding principle 

“Every human being of adult years and sound mind has a right to determine what shall be done with his own body; and a surgeon who performs an operation without his patient’s consent commits an assault for which he is liable in damages.” - Schloendorff

In essence I think "scientists" working with "citizens" (the two classes being an entirely arbitrary division by misguided professionals who have forgotten that science is an essentially egalitarian enterprise) should think of a similar contract in large scale data gathering exercises going under the umbrella of "citizen-science":

“Every human being of adult years and sound mind has a right to determine what shall be done with his own knowledge contribution; and a person who performs an operation on that knowledge without the consent of the contributor commits an assault for which he is liable in damages.”

(July 3, 2012)
Here is a BTO report on sparrow declines which gives the kind of approaches one expects a typical study to use and the kind of results to examine. An average citizen like me can expect scientists to aspire to the level set in this study.

21 December 2014: Someone recently wrote to me asking about the use of Voronoi tessellations. If you are mapping a species at a very high resolution, this can be useful to examine home-ranges from point locations. This would be especially useful for territorial species. Let us say you had the locations of all singing Magpie Robins; you could generate a rough division of their home-ranges/territories, even without having to establish the boundaries, by assuming that they are divided mid-way between neighbouring birds. If that assumption works you could use the library deldir to plot dividing lines. As an example:

library(deldir)
x <- 1000*runif(10)  # your longitudes can go here
y <- 1000*runif(10)  # your latitudes can go here
vt <- deldir(x, y)
# draw the points first so that add=TRUE has a plot to draw on
plot(x, y)
plot(vt, wlines="tess", lty="solid", add=TRUE)

References
  • Ali, S and S D Ripley (1968-) Handbook of the Birds of India and Pakistan. Edition 2. Volumes 1-10.
  • Brown, J.H. and M.V. Lomolino. 1998. Biogeography (2nd ed). Sinauer
  • Brown, J.H. 1995. Macroecology. University of Chicago Press, Chicago.
  • Richard Grimmett, Carol Inskipp and Tim Inskipp (1998) Birds of the Indian Subcontinent. Oxford University Press.
  • Kazmierczak, K and Van Perlo, B. (2000) A Field Guide to the Birds of India, Sri Lanka, Pakistan, Nepal, Bhutan, Bangladesh and the Maldives. OM Book Service
  • Rasmussen PC and JC Anderton (2005) Birds of South Asia. The Ripley Guide. Volume 1 and 2. Smithsonian Institution and Lynx Edicions.
  • Sundar, KSG; Kaur, J; Choudhury, BC (2000). "Distribution, demography and conservation status of the Indian Sarus Crane (Grus antigone antigone) in India". J. Bombay Nat. Hist. Soc. 97 (3): 319–339.
  • R package "dismo" (along with an introduction to distribution modeling)
  • An older presentation made to the BNHS (on the need to trash elitism and clique-ish behaviour) 
  • A biodiversity mapping project from Australia
  • PlotKML package - http://gsif.isric.org/doku.php?id=wiki:tutorial_plotkml

Friday, April 20, 2012

Mapping other lives - part 1

1805 caricature of Napoleon and Pitt slicing up the world (source)
Maps attract us. They make us feel confident of our place. The history of maps is a history of mastery over the land (and the sea) and it was often just that attitude that drove the creation of many of the early maps. Maps also simplify the world around us, letting us pick specific aspects and ignore the noise (and perhaps music too) of reality. A tremendous amount has been written on the history of maps - the evolution of cartographic knowledge, new tools and new ways to depict information. This note, however, is not about maps of humans or their constructions; it deals instead with our attempts to map other life-forms, particularly birds. It is nevertheless well worth having a historical perspective.

Maps have been the tools of conquerors. One of the biggest cartographic enterprises in the world began at St. Thomas Mount in Madras (on 10 April 1802), followed by a march to Bangalore close on the heels of Tipu Sultan's defeat. The Great Trigonometrical Survey made its way around the country and put a figure on the height of Mt. Everest. Once the view of the world has been simplified, the way is paved for the construction of roads, communication lines, the mass movement of peoples and the extraction of resources - all tools of the master.
Punch caricature (1892) of Cecil Rhodes connecting the Cape and Cairo by telegraph

Given our colonial origins, it is not surprising that maps were hidden away from the general populace, and in fact even the current government upholds these old laws and tries to keep maps away from citizens. We are informed that making them accessible would allow "enemies" to bomb critical locations or otherwise do damage! Whether information is value-neutral is always debatable, but it seems that information for all or information for none is more equitable, and therefore more democratic, than information accessible to a select few tyrants, terrorists or government officials.

That maps are simplifications of reality makes them intrinsically political. Drawing boundaries to simplify administration makes them even more so. But when did maps come to be employed in biology? The colonial period coincided with (in fact, actually drove) Victorian natural history - the collection of biological specimens from across the world. They were pickled, skinned, sketched and described, and most of them made their way to a curator at the British Museum or a wealthy private collector who could place material collected over years, across seasons and locations, in one place and see if there were patterns to them. And patterns they did find. P L Sclater split the world into six zoological regions, Alfred Russel Wallace found short but strange gaps, Alfred Wegener discovered a jigsaw puzzle, and strange long-distance similarities in fossils gave rise to the idea of a dynamic earth surface with Darwin's idea of evolution playing upon it. But these were large-scale patterns, both in terms of the areas of the regions involved and the periods of time involved. The earliest geological map of Britain was produced by the little-known William Smith. Producing it involved sampling the ground at various places and looking at the angle and layers of earth, a habit which earned him the nickname of "Strata Smith".

In India too, these Victorians began to look at the fauna, flora and geology of small regions. William Henry Sykes, for instance, concentrated on the Deccan region ("Dukkun"). He collected numerous animals in the course of his "statistical" surveys and sent them off to Britain where names were given to them. Sykes' Crested Lark, a bird found only in peninsular India, bears his name. The Victorian statistical movement was a data gathering enterprise that grew alongside natural history and the collection of specimens. Supporting these were "Learned societies" and museums, where information was documented, specimens were deposited and ideas discussed. The learned societies were mostly in Britain, and enlightened Victorians living in India found this annoying because of the time it took for letters to travel. This led to the creation of local versions like the Asiatic Society of Bengal and the Madras Literary Society, along with museums and journals.
Wallace's map of the jays (1895)

There are few maps in any of the early faunal works covering the Indian region. Even the second edition of The Fauna of British India (or "new fauna") has none, except perhaps one informing the reader of the region covered. Even a work like Henry Seebohm's The geographical distribution of the charadriidae (1888) or Geographical distribution of British birds (1893) is devoid of maps. Perhaps illustrations were expensive to print and priority went to pictures of animals. Alfred Russel Wallace, however, needed to include a map to illustrate his thesis in his 1895 Island life or the phenomena and cause of insular faunas and floras.

Charles Minard packed information on troop size, weather, dates, routes and river crossings in one tiny map in 1869
The French, or at least Charles Minard, are credited with one of the most information-packed graphics of the time (1869), so I thought the French naturalists might have been more map savvy. A quick search (with help from Google Translate for guessing keywords), however, failed to find any pre-1900 faunistic works with maps (at least none were found on the Internet Archive). This is despite the fact that Cuvier and Buffon had come up with theories to explain fossils and the distributions of animals. German works like Studien zur Zoogeographie (1898) also seem not to carry maps. Even Gloger, who in 1833 came up with the idea that birds from moister zones were darker, does not include a map in his massive tome - Das Abändern der Vögel durch Einfluss des Klimas. The Germans later created a whole vocabulary related to biogeography and introduced terms like Formenkreis, Rassenkreis etc., which were (re)interpreted by Ernst Mayr. It seems that even when it came to keeping notes, many did not find it important to note geographical locations carefully. Even Darwin failed to maintain careful locality information for the specimens that he collected during his Galapagos visit and had to make use of the notes of others, including Captain FitzRoy, to resolve some of the species-island relationships.

After 1900 one begins to see maps, and map publishers like Bartholomew even brought out a volume on zoogeography in 1911.
Hume's observation on rainfall and bird distributions (1878)

Looking specifically at birds, it would seem like maps were either too expensive to print for each species or perhaps they were not considered informative enough. Hume in his notes in the Stray Feathers often noted the smaller size of specimens from southern India or Ceylon but he does not seem to have been aware of Bergmann's work. The first edition of the Fauna of British India series on birds includes notes on distribution but rarely attempts to find biogeographic delineations. The second edition is more interesting because Stuart Baker introduced the concept of subspecies. Now one would have expected him to carefully describe the boundaries of the subspecies, but this is not done. Little wonder that many of his subspecies are not disjunct, have considerable overlap or are examples of clinal variation. At that time, there was no way to decipher the age of evolutionary divergence and the difference between a subspecies and full-species was always a point of argument. It still is, but the phylogenetic species concept can at least be supported by an objective analysis of sequences - although that method itself has many a slip. Anyway, Baker did not include maps in his work. Salim Ali notes ("Handbook." Volume 1. Introduction. xxiv) about Baker's New Fauna:

"Whistler and Ticehurst in particular joined issue with Baker on a number of his statements and dicta. Many of their objections derived from the fact that large tracts of the country had as yet not been sufficiently explored ornithologically and there remained considerable gaps in our knowledge of the geographical distribution of many 'resident' birds-knowledge which is crucial for a proper application of the subspecies concept.

It has been said that Hugh Whistler (along with Claude Ticehurst) made maps (unavailable to most Indian researchers) that were apparently used by P C Rasmussen for her work. His maps and notes even helped her rediscover the Forest Owlet.

Salim Ali (and Dillon Ripley) brought in maps in their Handbook of the Birds of India and Pakistan, begun in 1968. It is clear that he (they) was influenced by Whistler's comments on Baker's subspecies. He notes that (Introduction: xxix) "The distribution maps have been constructed or adapted from a number of different sources the chief of which are Atlas of European Birds by K H Voous (Nelson), Waterfowl of the World by Jean Delacour (Country Life), and Birds of the Soviet Union by G P Dementiev, H. Gladkov et al." A survey of the maps in the Handbook shows that they are used for indicating breeding / winter ranges, subspecies and sister species. Salim Ali was of course influenced by Ernst Mayr, whose ideas on speciation, particularly allopatric speciation, led Salim Ali to make bold claims on the mutually exclusive distributions of sister species.

Map of Common Kestrel - from Dementiev & Gladkov 1951
The maps in the Birds of the USSR include both actual point records indicated by numbers and outlines that define distribution limits based on the imagination of the authors. They indicate breeding and migrant zones, and the need for further work is indicated in some cases through the use of question marks. The use of point records (indicated by numbers) overlaid on "hypothetical regions" was adopted in the Handbook. Ali was influenced by Hora's Satpura hypothesis - that the affinities of the Western Ghats with the Malay Peninsula arose through dispersal via the central Indian ranges. He was also aware of Gloger's rule and Bergmann's Law. Beyond these he does not attempt to find any patterns. He never actually made careful lists of specimens with dates and locations. Salim Ali conducted bird surveys across the country and specimens were deposited in the collections of the BNHS. Like Darwin, he seems to have been careless with recording location information. Lozupone et al. (2004) state (emphasis mine):
Some ornithologists were meticulous about documenting where a specimen was taken, whereas others were remarkably cavalier about such things. Sálim Ali, India's greatest ornithologist, cared not a whit about providing exact information on a locality, whereas some lesser-known fieldworkers went to great lengths to geo-reference each important place where work was conducted. Needless to say, we spent more time on the localities with less documentation, conducting the sleuthwork to determine just where they were.
Ali includes a chapter on J B S Haldane in his autobiography. Although he held Haldane in great esteem, he does not seem to have heeded Haldane's 1959 advice, possibly because he was fundamentally against Haldane's thesis that supported ahimsa. Elsewhere Ali (1980) actually blamed this religious sentiment for the backwardness of Indian ornithology. Haldane's address on the non-violent scientific study of birds is however a remarkable bit of advice on bird research.

The method of collating data from observers across the country that Haldane suggested in 1959 was not original; he was merely promoting an idea that had taken shape in Britain almost 20 years earlier. Back in Britain, the idea that many people could collaborate to collect data was begun and proven by a man named Tom Harrisson. Tom Harrisson (more on him can be found starting here) was more interested in anthropology and ethnology and started a movement to study human behaviour through what was called Mass Observation. Harrisson started a Political and Economic Planning organization in 1931, in which he was aided by Max Nicholson and Julian Huxley (who was a close associate of Haldane). Nicholson also went on to start the Oxford Bird Census - they decided to get people to report on the Grey Heron (just Heron there) and then the Great Crested Grebe. The Oxford Bird Census became so popular that they could actually have a salaried director - W B Alexander. Now even if Salim Ali chose not to listen to Haldane (due to his leftist/ahimsa stand) he would still have had to hear about this venture from W. B. Alexander's brother, Horace Alexander (also a Gandhian!), who was in India, a birder and a friend of Salim Ali. Anyway, the upshot of these bird surveys was that a new organization, the British Trust for Ornithology, was created in 1932. The result of these studies on birds like the heron is spectacular. Today the BTO can produce amazing historic analyses and systematic data-based statements on clutch size - see the links here, here etc. The fact that neither Salim Ali nor indeed the BNHS or any other Indian organization took this step in spite of such advice is extremely unfortunate. The BNHS has never spent time documenting any policy (perhaps it has no system of evolving internal policy at all) of making information freely available, which makes them extremely suspect when it comes to collaboration with citizens.
It is also sad that nobody really examines NGOs like the BNHS carefully enough, nor are they amenable to suggestion in spite of being largely funded by tax-payers.

When I began to learn about the birds around me, it was clear that there was considerable location specificity and that the broad picture given in books simply does not hold locally. People who talked to each other at a local scale had a far better picture of what was more likely and what was not (remember also that this was before the Internet, even before telephones were easily accessible), and that perspective was never easily available unless you knew the key people. It was in the belief that a better picture would emerge once such data were collated that I wrote up a small bit of software called BirdSpot in 2003 - it is still available for download although not updated. In 2003, Google Earth did not exist and Internet access was still quite expensive. Since then there have been major advances; Google Earth really made geospatial info and georeferencing a breeze. What used to be hard to get became commonplace. Since that time, I have been contacted by a large number of software developers (>50) and several people working with biodiversity related organizations who have sought to make similar systems over the web. Although several of them have made use of the data, no mature system has really emerged to make sense of bird data or indeed that of any other taxon. A number of ventures have begun across the country with tremendous amounts of publicity and funding. Projects often spend on software but not on qualified curatorial staff. Another project at NCL Pune got numerous museums and collections to submit label data and created a database. Large funds were apparently used, but a few years later the website vanished, leaving data submitters in the lurch. Numerous other so-called citizen-science projects and biodiversity databases have sprung up with simple PHP frontends to gather entries into a database (a recent exercise being Citizen Sparrow).
Most of them finally display the data as points on Google Maps with little callouts that show additional metadata. There is a need for a debate on how systems should be designed, and what kinds of legal and philosophical stands they need to take.

The BNHS published a large book on what are termed "important bird areas", with little spots on maps indicating the locations of such sites. These spots, the introduction to the book claimed, were generated by GIS experts aided by high-end software. The contributors then decided on their favourite birding areas and provided a list of what they had pre-determined as being threatened. From a layman's perspective such an effort is inappropriate. A minimally qualified worker would have first created a database of all the bird occurrence data along with information on habitat specificity, life-history traits and the locations where the species occur, along with abundance estimates. They would then come up with a set of justifiable rules to produce a rational network of important bird areas based on the bird species involved, their current and potential distributions, as well as the distributions of already protected areas.

It seems clear that Indian scientists are not living up to expectations, and if they really need to do citizen science, they should perhaps also be open to suggestions from the (lowly) "citizens". An average netizen today knows that geospatial information can be collated without the need for programming - one can set up a personal Google Map layer, share it and collate notes. Here is a local venture to collate sparrow records, and it did not harm any tax-payers. As a bottom-up venture, it did not go through the level of planning one would expect for large projects. The policy design requirements need to be stringent for large projects which are funded by tax-payer money, and these should be well documented, particularly given that the organizations behind their execution claim expertise.



(In the next part I try to specify a layperson's [mine] expectations of biodiversity mapping projects)

References
  • Lozupone, Patsy, Bruce M. Beehler, and S. Dillon Ripley. (2004). Ornithological Gazetteer of the Indian Subcontinent. Center for Applied Biodiversity Science, Conservation International, Washington, DC, USA.
  • Ali, Salim (1980). "Indian Ornithology: The Current Trends". Bull. Brit. Orn. Club 100 (1): 80–83.
  • Wright, Alex (2010) Managing Scientific Inquiry in a Laboratory the Size of the Web. NY Times. December 27, 2010

Postscript 
31 May 2012: An interesting source on this topic is by Jane R. Camerini (1993). Evolution, Biogeography, and Maps: An Early History of Wallace's Line. Isis 84(4):700-727. Some of the earliest biological maps are by Eberhardt A. W. von Zimmermann, Specimen zoologiae geographicae, quadrupedum domicilia et migrationes sistens (Leiden/Batavia: Theodorum Haak, 1777) as well as in Geographische Geschichte des Menschen und der Allgemein Verbreiteten Vierfussigen Thiere (Leipzig: Weygandschen, 1778-1783). Other examples included here are de Candolle and Jean B. A. P. M. de Lamarck, Flore francaise . . . , 3rd ed. (Paris: Desray, 1805); Alexander von Humboldt and Aime Bonpland, Essai sur la geographie des plantes (Paris: Levrault, Schoell, 1805); and Joachim F. Schouw, Pflanzengeographischer Atlas zur Erlauterung von Schouws Grundzugen einer allgemeinen Pflanzengeographie (Berlin: Reimer, 1823).

Thursday, April 12, 2012

How to lay an egg

The captivating lapwings
Some years ago I learnt from a friend who rears chickens that a freshly laid egg can be put away in the fridge and, when given to a broody hen a few days later, that egg would still hatch. After some study, it was clear that this was very sound empirical knowledge, but I had not thought enough about its evolutionary significance. Most birds can only lay an egg a day, so if a bird laid four eggs in a clutch, they would hatch at one-day intervals if they all took the same time to develop. The earliest nesting birds that caught my attention were Yellow-wattled Lapwings. Just a short walk away, and in plain view of the balcony of the quarters where I lived as a teenager, I could spend my summer observing them at their nest and with young birds. It was something, a great privilege, that I took for granted then. One person who actually boasted of such a privilege was J B S Haldane, who wrote to Julian Huxley about how he could study the species right from the comfort of his verandah.[1]
Nesting lapwings mobbing a kite, an artistic impression
What I did know about lapwings from those teenage birdwatching years was that all the chicks hatched at around the same time. How did this happen? I now know that one possible part of the answer is that the birds actually begin to incubate the clutch only after the clutch is complete. Synchronous hatching, it turns out, is particularly useful if all the chicks need to move away from the nest after hatching - as is the case in nidifugous birds (ducks, fowl, lapwings and others). As long as an egg is kept below what is called "physiological zero", the metabolic process of development is not switched on. Physiological zero is about 26° C for chickens, but there appears to be no such estimate for waders (plovers and lapwings included) - and Jayakar and Spurway (1968) [Spurway was then Haldane's wife] note in their study that the soil temperature during the breeding season was about 60° C while air temperatures were about 30-40° C. Further, they noted (emphasis mine):
"All 4 eggs have hatched in 5 clutches ; the interval between the hatching of the first chick and the removal of the last egg shell was, for pair 1, 23 hours 10 min. (b); the interval between the day of first finding a chick (often more than 1) and the nest emptying was, for four nests (2, 7, 11 and 13') 1 day, and for one (5) 2 days. However, 3 days intervened between the first and the last hatch of the 3 surviving eggs of clutch 6'. These intervals are less than the 4 days between the 1st and 4th lay, confirming b that some synchronisation of hatching exists. We have also confirmed that this synchronisation is not achieved by delaying incubation, as it seems to be by species in colder habitats."[2]
It seems that this whole idea of egg hatching was of considerable interest to the Haldane gang for I hear that they also noted how a certain wasp (or is it a leaf cutter bee?) that builds a nest in a tube with multiple cells laid outwards hatched in reverse order - that is the outermost and last laid egg leaves first. (pers. comm. K. Chandrasekhara, original source still sought).
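Returning to the lapwings, the arithmetic behind the laying-interval argument can be sketched in a few lines of R. The numbers below are placeholders chosen for illustration (one egg a day, a 28-day incubation period that is the same for every egg) and are not taken from Jayakar and Spurway. The two cases are the extremes: development starting the day each egg is laid, and development starting only when the clutch is complete. The lapwing observations quoted above fall somewhere in between and, according to the authors, are not explained by delayed incubation.

```r
# Assumed values for illustration only (not taken from the studies cited here)
lay_day <- c(0, 1, 2, 3)   # one egg laid per day, four-egg clutch
incubation_days <- 28      # placeholder incubation period, same for every egg

# Case 1: each egg starts developing on the day it is laid
hatch_immediate <- lay_day + incubation_days

# Case 2: development starts only once the clutch is complete
hatch_delayed <- rep(max(lay_day), length(lay_day)) + incubation_days

# Spread between the first and the last hatch, in days
spread_immediate <- diff(range(hatch_immediate))
spread_delayed <- diff(range(hatch_delayed))
print(c(immediate = spread_immediate, delayed = spread_delayed))
```

Whatever incubation period is plugged in, the spread in the first case equals the gap between the first and last lay, and in the second case it is zero.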

It turns out that in some cases it is better to have eggs hatch early - cuckoos and possibly other brood-parasites pre-incubate their eggs before laying them in the nest of a host, giving their own egg a head start.[3]

There appear to be a number of works on this subject but few really comprehensive summaries of the issues involved. Several authors have tried to find the costs and benefits of having young hatch simultaneously. In turtles, it is suggested that hatching in one big bunch satiates predators and dilutes the individual risk of predation. Others note that the nutrient quality in eggs improves from the first to the last laid - so that the last laid eggs develop faster. That of course is teleological wording that reminds one of Haldane's famous quip: ‘Teleology is like a mistress to a biologist: he cannot live without her but he's unwilling to be seen with her in public.’ But in looking at the whole issue I came upon an even more serious philosophical question in Murray (2001), although the author suggests a comprehensive mathematical model for examining synchrony in hatching! So much for an egg ...



References
  1. Krishna R. Dronamraju (1987) On Some Aspects of the Life and Work of John Burdon Sanderson Haldane, F.R.S., in India. Notes and Records of the Royal Society of London 41(2):211-237. 
  2. Jayakar, SD; Spurway, H (1968). "The Yellow-wattled Lapwing, Vanellus malabaricus (Boddaert), a tropical dry-season nester. III. Two further seasons' breeding". J. Bombay Nat. Hist. Soc. 65 (2): 369–383.
  3. Birkhead TR; N Hemmings; CN Spottiswoode; O Mikulica; C Moskát; M Bán and  K. Schulze-Hagen (2011) Internal incubation and early hatching in brood parasitic birds. Proc. R. Soc. B 278(1708):1019-1024.
  4. Murray, BG, Jr. (2001) Are ecological and evolutionary theories scientific? Biol. Rev. 76:255–289. 
  5. http://www.stanford.edu/group/stanfordbirds/text/essays/Brood_Reduction.html