Dr. Graham Pluck
Minds, Brains, & Internationalism

Travel, Internationalism, and Academia Blog


Reinforcement learning, rats, and a holiday in Cambodia

As this blog is for my website on ‘Minds, Brains, and Internationalism’, I’m painfully aware of the foolishness of having started it during a global pandemic, when international travel was severely limited. But things have just eased up enough for my first holiday abroad in the new normal - a trip to Cambodia. And this turned out to be the perfect way to combine the study of minds, brains, and travel. The story, derived from my recent visit to an NGO in Siem Reap, involves rats, TNT, tuberculosis, and reinforcement learning.

Cambodia is probably the most dangerous place in the world for unexploded mines and other military ordnance. Due to the multiple regional conflicts in the second half of the 20th century, the borders and some internal regions were very heavily seeded with land mines, often without any record being kept of their locations. Millions of bombs were also dropped, many of which remain unexploded. This has rendered much land unusable, and has maimed or killed tens of thousands of people. Decades later, Cambodia is said to have the highest rate of limb amputation in the world, due mainly to land mines. Of course, it’s the rural poor who suffer most from this. This image shows Cambodian children playing near a mine field.

Unlike planting landmines, clearing them safely is slow and expensive work. A couple of years ago I watched mine-clearance teams working on the Falkland Islands. The region had been mined during the brief occupation by Argentina in 1982. Since then, mine-clearance teams from Zimbabwe have been removing the mines by traditional methods, such as walking across the land and gently prodding the ground with a stick. Nerve-racking work without doubt, and slow, but effective. This is a photo I took on the Falkland Islands of the wreckage of an Argentine fighter jet, still on the ground decades after the war ended, as are many of the landmines.

Pebble Island the Falklands 2017.jpg

In Cambodia, and several other countries, a novel method is used. Rats are conditioned to detect the smell of TNT and to indicate its location to their handlers. The rats are not heavy enough to detonate the mines, so they can walk over them safely. The rats that are trained are giant African pouched rats. Nocturnal, and with poor vision, they have very well-developed olfactory recognition systems. Importantly, they can smell TNT from a distance. This is a picture of me with one of the giant mine-sniffing rats in Siem Reap, Cambodia.

Rat Graham3.jpg

Training each rat takes about nine months. The first stage is to socialize the young pups to be friendly with humans. When training proper begins, they are conditioned to associate the sound of a clicker with food, which is basically classical (or Pavlovian) conditioning. With repeated pairings, the click sound itself comes to be experienced as rewarding. In a later training phase, the trainee rats are placed near a ball containing TNT explosive. When they touch it, the click is given, followed by the food reward. This is instrumental (or operant) conditioning. With repetition, the rat learns to seek out the smell of TNT. The clicker matters because, in a minefield, the handler cannot walk over to the rat: the click delivers the reward immediately, at the moment the rat indicates a mine. The handler can then bring the rat back to the safe area, give it a treat, and mark the location of the mine so that it can be safely detonated. The training procedure has been described in detail by Poling et al. (2010).
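The first, Pavlovian phase of this training can be sketched with the Rescorla-Wagner model, a standard textbook account of classical conditioning. To be clear, this is not taken from Poling et al. (2010). It is only a minimal illustration, and the learning rate and number of trials are assumed values chosen to show the shape of the learning curve.

```python
def rescorla_wagner(n_trials, learning_rate=0.2, reward=1.0):
    """Return the click's associative strength after each click-food pairing.

    learning_rate and reward are illustrative assumptions, not values
    measured in real rats.
    """
    strength = 0.0  # the click starts out meaningless to the rat
    history = []
    for _ in range(n_trials):
        # Learning is driven by surprise: the gap between the food
        # actually received and what the click already predicts.
        prediction_error = reward - strength
        strength += learning_rate * prediction_error
        history.append(strength)
    return history


curve = rescorla_wagner(n_trials=20)
for trial in (1, 5, 10, 20):
    print(f"after pairing {trial:2d}: click value = {curve[trial - 1]:.3f}")
```

In this sketch the value of the click rises quickly at first and then levels off as the click comes to fully predict the food. That is the point at which the click can stand in for the food as a reward.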

The trainee rats must pass a strict test before they are deployed to actual mined areas, and when they graduate and start work, they are known as HeroRATs. They are so effective because they can detect mines that have few or no metal parts (so avoiding false negatives) and they don’t detect random pieces of metal, as human operators with metal detectors do (so avoiding false positives). In medical-statistics terms, they have both good sensitivity and good specificity. Rats can therefore clear land much faster than people with metal detectors can.
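For readers unfamiliar with those two terms, here is a small sketch of how they are calculated. The counts are invented purely for illustration and are not real field data: I have assumed a hypothetical test field with 50 buried mines and 950 mine-free spots.

```python
def sensitivity(true_positives, false_negatives):
    """Share of real mines that were flagged (true-positive rate)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of mine-free spots correctly left unflagged (true-negative rate)."""
    return true_negatives / (true_negatives + false_positives)


# Invented counts for a hypothetical test field, NOT real data:
# 50 buried mines and 950 mine-free spots.
rat = {"tp": 48, "fn": 2, "tn": 931, "fp": 19}

print(f"sensitivity: {sensitivity(rat['tp'], rat['fn']):.2f}")
print(f"specificity: {specificity(rat['tn'], rat['fp']):.2f}")
```

A missed mine (a false negative) lowers sensitivity, while stopping to dig up a harmless scrap of metal (a false positive) lowers specificity. A good detector needs both numbers to be high.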

APOPO is the NGO that trains the HeroRATs and deploys them. In Cambodia alone, the rats have so far located 1,521 anti-personnel mines, 10 anti-tank mines, and 643 cluster bombs, as well as other types of unexploded munitions. APOPO operates in several countries in Asia and Africa, clearing land of explosives. This photo shows some deactivated mines at the APOPO visitor centre in Siem Reap.

Mines 2.jpg

APOPO also trains HeroRATs for medical use: they can detect the smell of tuberculosis (TB). They do this by sniffing sputum samples. A TB-trained HeroRAT can screen hundreds of samples per day. The training is very similar to that for TNT detection, involving clicker training and operant conditioning. For trained rats, the sensitivity (true-positive rate) is high, as is the specificity (true-negative rate). In fact, they are much better than microbiologists using microscopes, the standard method of TB detection in developing countries. The training procedure for rats to detect TB is described by Poling et al. (2011).

It’s cute and very practical, but does this really matter, other than for the mine clearing and TB screening? Behaviorism is a now-discredited branch of psychology, right? Well, no. Behaviorism never really went away. Its methods are still widely used in neuroscience, and experimental psychology still works by altering stimuli and observing the effect on behavior. See Brown and Gillard (2015) for a discussion of why behaviorism is still alive and well. More importantly, reinforcement learning has made a big comeback in the past few years, in the form of machine learning. Neural networks are no longer so limited in their ability to simulate intelligence, and now when people discuss artificial intelligence, they often actually mean machine learning. A big part of that is reinforcement learning.

As an example, a recent article has sparked much debate by suggesting that ‘reward is enough’ to explain complex human cognition, such as intelligence, and also enough to produce artificial general intelligence in computers (Silver et al., 2021). And my own small contribution is to point out that one of the greatest changes to how people live their lives in the past century has been the uptake of smartphones, and that interactions with them are likely conditioned through action-reinforcing social media ‘likes’ etc. (Pluck & Barrera Falconi, 2021). So, reinforcement learning really is quite an important concept.
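To make the link between clicker training and machine learning a little more concrete, here is a toy sketch of one of the simplest reinforcement-learning setups, a two-armed bandit with an epsilon-greedy rule. It is my own illustration rather than anything from the papers cited here, and the task, the action names, and all parameter values are assumptions.

```python
import random


def train_rat(n_trials=200, learning_rate=0.1, explore=0.1, seed=0):
    """Learn by trial and error which ball is worth touching.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical task: touching the TNT ball earns a click (reward 1),
    # touching the plain ball earns nothing (reward 0).
    rewards = {"tnt_ball": 1.0, "plain_ball": 0.0}
    values = {action: 0.0 for action in rewards}  # no preference at first

    for _ in range(n_trials):
        if rng.random() < explore:
            action = rng.choice(list(values))  # occasionally try anything
        else:
            best = max(values.values())
            # Pick the best-valued action, breaking ties at random.
            action = rng.choice([a for a, v in values.items() if v == best])
        # Nudge the estimate toward the reward that was just received.
        values[action] += learning_rate * (rewards[action] - values[action])
    return values


learned = train_rat()
print(learned)
```

The simulated learner is never told which ball contains TNT. It simply repeats whatever has been rewarded, which is the same logic the handlers rely on when training real rats.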

Back to my travel story: I attended the visitor center for APOPO in Siem Reap, Cambodia. It’s a great place to visit to learn about how HeroRATs are trained and used, and an educational experience on the importance of reinforcement learning. Siem Reap is also the city nearest to the Angkor Wat complex, so the center can be visited as part of a trip to that archaeological attraction. To find out more about APOPO’s work with conditioned rats, or how to visit one of their centers, see https://www.apopo.org/en


     Brown, F. J., & Gillard, D. (2015). The ‘strange death’ of radical behaviourism. The Psychologist, 28(1), 24–27. https://www.bps.org.uk/psychologist/strange-death-radical-behaviourism

     Pluck, G., & Barrera Falconi, P. E. (2021). Sensitivity to financial rewards and impression management links to smartphone use and dependence. Cognition, Brain, Behavior. An Interdisciplinary Journal, 25(2), 107–128. https://doi.org/10.24193/cbb.2021.25.06

     Poling, A., Weetjens, B. J., Cox, C., Beyene, N. W., & Sully, A. (2010). Using giant African pouched rats (Cricetomys gambianus) to detect landmines. The Psychological Record, 60(4), 715–728. https://doi.org/10.1007/BF03395741

     Poling, A., Weetjens, B., Cox, C., Beyene, N., Durgin, A., & Mahoney, A. (2011). Tuberculosis detection by giant African pouched rats. The Behavior Analyst, 34(1), 47–54. https://doi.org/10.1007/BF03392234

     Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299, 103535. https://doi.org/10.1016/j.artint.2021.103535
