Fallout in positive reinforcement training: a post for trainers

When you start using positive reinforcement to train a dog, you are often operating under the notion that it’s almost fallout-free, that there are few negatives associated with it. Indeed, I’ve only ever used positive reinforcement to train my pup Heston and it wouldn’t have even crossed my tiny mind way back when I started that there were things that could go wrong. I’d happily sing the wonders of positive reinforcement training with my friends. And as I learned more, I read more. Academic textbooks, papers, studies. Back to undergraduate psychology textbooks and Applied Learning theory. I can’t tell you how many courses I’ve attended, how many DVDs I’ve watched, how many books I’ve read. Few people talk about the challenges or difficulties of positive reinforcement training, leading to a view that it is somehow easier than using aversives.

That isn’t quite true, however.

So often, positive methods of reinforcement are seen as a panacea for all behaviours as well as being the ethical choice. I personally operate under the notion that you almost can’t go wrong when you’re a ‘cookie pusher’. I hate this term, but I know it’s how many of the French dog trainers see me, as they don’t understand what I do. It’s not about cookies, though, is it? I have a wide repertoire of reinforcers (including toys and smells, functional behaviours and other conditioned environmental reinforcers – I really try to keep that repertoire of reinforcers as big a basket as I can) and I’m still at the beginning of a marvellous learning journey, where I have the privilege of being able to practise these methods in the shelter where I am a member of the board of trustees. I’m always learning about how to use reinforcers and different cues, and it’s so much more than being a cookie pusher.

I think that’s why positive reinforcement is seen as the ‘easy’ option. It has been reduced by critics who don’t understand it to a simplistic explanation of what R+ trainers do.

Because, too, they imply that we don’t work in ‘all’ the quadrants, we’re kind of ‘quarter’ the experts. That also contributes to the notion of how ‘simple’ R+ is. We must be idiots if we only use a quarter of the available ‘tools’.

There is also a cheery enthusiasm associated with positive reinforcement. A glossing over of the negatives or difficulties that is sometimes coupled with a righteous indignation about ethics.

And I try to be open-minded. Really I do.

That said, I’m hyper-critical of aversive methods. I know I am. And I recommend nothing other than the most minimally invasive training or Premack methods with a “do-this and have this” methodology that is as minimally aversive as I can make it. As my last post no doubt made clear, I’ll happily tell you about the fallout of aversives. They are etched in my mind for every single time I think, “Wouldn’t it be quicker just to…. ??”

But one thing I think is necessary for those of us who use the least aversive method available is that we deal with all the potential effects of the methods we are using. Some of these effects may be ones we hadn’t thought about, and I certainly feel that some of the effects of R+ are not discussed often – or at least not as often as they should be.

If we gloss over these effects or don’t pay them enough mind, we run the risk of passing them on to clients who are ill-prepared for things that might go wrong or the potential ‘fallout’ when you use reinforcers. I know it’s a line I often use, that you can do little harm if you get R+ wrong, but that’s not entirely true and it’s rose-tinted thinking at best. Ironically, where I think this is most true is with ‘cross-over’ adult dogs who have been used to aversives.

So many great presenters and great teachers of trainers gloss over some potential undesirable consequences of positive reinforcement, especially with owners. By not being mindful that positive reinforcement can have unintended effects too, we’re damning dogs and owners because we’re not being honest enough – in the same way as ‘balanced’ trainers who are not honest about the potential fallout from aversives! Certainly, having had hundreds of hours of training, I have found that the ‘fallout’ of positive reinforcement is rarely mentioned, yet in practice, it’s my view that we need to be aware of it, especially when we promote R+ as a fail-proof method when it is not. And we need to share these potential ‘risks’ with clients, who may fail if we do not.

Not sharing the difficulties of R+ training is a massive blind spot that I see across the industry.

First off, we need to stop being humble about how easy positive reinforcement is, and share the benefits and the consequences in the same way we wish our aversive-loving colleagues would do. R+ is not the easy option.

We spend so long as positive trainers being humble. We pick up on things like that great statement from Dr Ian Dunbar about “to use punishment effectively, you need a thorough understanding of canine behaviour, a thorough understanding of learning theory and impeccable timing… and if you have those things, you don’t need to use punishment” (apologies for misquoting, but you get my drift) and we say things like “I’m not a good enough trainer to use punishment” in mock humility. Honestly, there’s something a little wrong with this approach. It suggests R+ is easy. Sure, there is much less that can go wrong, but there still are things that can go wrong, and the view that R+ is free from side-effects and that it is better for beginners or owners belittles what great trainers do and underestimates what our clients will find difficult about it.

Of course we are good enough trainers not to use punishment. We know that too. That’s why I said ‘mock humility.’ I got a dog to stop being so aroused around bikes and joggers last week. Do you think that I truly think I’m too crappy a trainer to have done it without a shock collar or choke collar? Yet I often hear trainers use this as an ‘excuse’ for why they use the least intrusive methods of behaviour mod. “Oh, I use R+ because I’m not good enough to use other methods.”

We need to be honest about the fact it is hard work – sometimes harder and usually more time-consuming than punishment is. Who knows? One high jolt of electricity might have put that dog off bikes and joggers for life. But the potential fallout is too serious to take the chance of getting it wrong.

That’s why I spent hours working with this dog.

I’d have been disingenuous to promote this method to the owner without saying the method may be hard, may be time-consuming and may be frustrating. It’s that lack of honesty that sometimes sends our clients from us back to balanced trainers for a quicker and more immediate intervention. Sure, it adds another ten minutes to an already-overscientific explanation to owners. I am a fan of putting all the behaviour mod programmes out there for owners and saying, “these are the possible – and likely – consequences of X, Y and Z…”, but I don’t always make it explicit that there are consequences of games, toys and food too. I need to change that.

That self-effacing mock humility some positive trainers use covers up a kind of smugness that, in fact, we are good enough trainers that we have never found it necessary to use those ‘most intrusive, most aversive’ methods. And that smugness can be hard work for owners as well. Our sometimes saintly ethics can be a real aversive to owners.

I am proud to have found a workaround to those typical horrors such as kneeing dogs in the chest, using electric collars for recall, jerking on their lead to get a heel, or using bark collars.

I also know it takes a lot more time and it can be difficult to stick to. That’s something owners and novices need to know.

We need to prepare novices and owners for the frustrations of positive reinforcement. Those frustrations are often ours, not the dog’s. I’d be dishonest if I didn’t say that sometimes I let out a big “For Fuck’s Sake, Heston, that hare’s over a bastarding kilometre away. Chill your fucking beans.”

Aversives are our species-wide preference and I am not a saint when my dog is going mental about a hare in a field and I’m trying to finish the walk so I can get on with my actual job. Frustration is a big part of my own autoplay behaviours and preferences. Our own frustrations are something we need to share with our clients, as well. They need to expect to feel frustrated at times and to know that using aversives is our natural autoplay as a species. It is so easy to turn to them the first time our own lack of skill with reinforcement lets us down.

As Balsam and Bondy (1983) point out, there can be symmetrical undesirable consequences to positive reinforcement too. I’m not sold on ‘symmetrical’, more ‘parallel’ or ‘similar’. If you ask me, they are not symmetrical because the consequences or risks are not of the same intensity. They don’t do the same damage and they can be easily worked around or anticipated to avoid.

Most of these unexpected effects are rarely shared with owners, or rarely discussed, as if the only undesirable consequence of training with food is that a dog may gain some weight unless you’re careful. In fact, I’d hazard a guess that most trainers know the fallout of positives from their own experience, but there’s little literature out there that actually makes it clear. Ironically, if there were, the “balanced” trainers would probably be hot on pointing out the ‘drawbacks’ of positives, rather than resorting to ‘cookie pusher’ insults, saying our dogs will get fat, or that we aren’t using ‘the full quadrant’ as if we’re somehow deficient and inadequate.

It behooves those who use positive methods of reinforcement to understand each of those parallel consequences, to be prepared for them, to know when and how to use R+ properly instead of using it as a remedy that anyone can use – R+ is not “Training for Dummies”, safe in the hands of non-experts.

What are the unintended effects of R+, then?

Where an undesirable consequence of aversive methods is anger and aggression, there are times when R+ trainers will have faced frustration, even anger and aggression, most notably when withholding a food or toy reinforcer or when reinforcers aren’t coming quickly enough.

In my experience, this is most likely to happen with ‘crossover’ dogs who are suddenly faced with the joys of reinforcers for the first time in their life.

This is what I call the ‘put the fucking lotion in the basket’ response, after the scene in The Silence of the Lambs where the serial killer Jame Gumb has repeatedly requested a behaviour from his would-be victim (in a scene that is coincidentally a perfect example of an escape/avoidance routine which hasn’t worked – the frustration of which is the handler’s, not the subject’s… another side-effect of behaviour mod we should all be conscious of – handler frustration!).

Of all the dogs I’ve worked with, I’ve had anger and frustration a couple of times. For me, I think it’s related to control, and has both times been with dogs who had issues with handling and coercion. This is why I think it’s more a problem for cross-over dogs. The moment they get the illusion of control (I do this… she gives me this…) there can be real issues regarding who has access to the reinforcers. “My treats. Why is that silly woman holding the bag? I could quite easily snatch it for myself.”

For dogs who have never known anything but R+, this isn’t an issue. Certainly for my own dogs, I’ve never seen anger or real frustration that they have had a reinforcer withheld either accidentally or on purpose, and I’ve never had a “just give me the fucking cookie” moment – but they have a clear understanding that their behaviour is operating on my giving the reinforcer.

I also understand that if I don’t get the behaviour, I’m asking too much.

Ken Ramirez on video not getting a walrus to open his mouth is a good example.

He asks for a simpler behaviour and the walrus obliges.

Then he goes back up and escalates through behaviours, asks again and gets a flawless open mouth.

Good trainers know this. Increase the rate of reinforcement, lower the complexity of the task. Get some reliable behaviour and then try the unreliable behaviour again.

Poor R+ trainers get stuck in the loop of asking for behaviours and coming unglued when they don’t get them. This can lead to frustration and anger from both handler and animal.

For me, this frustration and aggression will come early in the process where a dog has not grasped the notion of control-within-control, where the task has been too difficult or where reinforcers have not been rapid enough. With crossover dogs, then, go easy and have a really, really high rate of reinforcement for simple behaviours and keep sessions really short. Be mindful that this is going to raise eyebrows with owners and critics alike. There’s this assumption that you are always going to need to do the same level of reinforcement that frightens the unfamiliar. Working with one dog at the beginning, I burned through a pouch of chicken in about three minutes. You can see why that might raise eyebrows among people who don’t understand R+. But it’s a good way to avoid anger or control problems with dogs who have only known punishment. Watch Ken Ramirez in action and you’ll realise short bursts, the right level of Goldilocks’ challenge (not too easy, not too hard) and rapid reinforcement avoid these emotions on the whole. See what I mean about how we shouldn’t suggest R+ is the easy option?

One factor I also noticed in the two dogs who were aggressive with reinforcers was that they had little relationship with me. If you think that frustration could be an issue with an unfamiliar dog you are training (often with big, male, unruly, mannerless dogs who have been ruled with an iron fist previously) build up a conditioned emotional response to you first and then start doing other things afterwards. Channel your inner Ken Ramirez: reward frequently, back up if it’s too challenging, keep it in small bursts. You may also find that having a two or three hour neutral period with no food or toys with new dogs helps you get to know each other better. I do a couple of walks with no reinforcers from me.

When you work with animals with a history of aggression and punishment, don’t be surprised if you get a moment where the dog looks at you and you find yourself wondering if they are weighing up whether to mug you with menaces or do your silly little game. I had a moment with Lidy when I’d just started reinforcement training (basics like ‘sit’ and ‘four paws on the floor’) where she turned around and looked at me. I swear she looked like she was weighing up the pros and cons of stealing my treat pouch and running off into the forest in a blaze of glory. By using reinforcers frequently – copiously, even – for stupidly small behaviours, every dog I’ve worked with has come to realise that they can quite literally have their cake and eat it in return for our silly little games. But there can be moments where you are praying to the Gods of Dogs that an unruly, powerful dog who has a well-defined history of using threat to get what they want doesn’t decide that mugging with menaces is a better option. It is a behaviour they have honed already. Cooperation has never been a concept they have understood. Lidy and Hagrid were like that at the beginning. For dogs who have always been coerced, cooperation is a taught skill. I think this is where some less polished R+ training will fail.

Make it easy and build in ‘no mugging’ training as you do with puppies fairly early on and you avoid the accidental P- of positive reinforcement. Susan Garrett’s “It’s Yer Choice” is still viable with unruly dogs who have had a lifetime of aversives. Not something I would do with a dog with a history of using aggression to get what they want though, not until we’ve got a working relationship in order to avoid those ‘put the fucking lotion in the basket’ moments.

Over-arousal is another side-effect for some dogs around reinforcers: shall I tell you about the day I started using food with one dog and it was obviously too highly arousing as I got humped every time there was a lull in action? You have to be mindful that for some dogs, whatever you are using as a reinforcer is just too much of a distraction and it interferes with the learning. Working with a hungry Hagrid on his bite inhibition is like running the gauntlet. I absolutely have to back up to some fairly undesirable treats that are the size of my fist and taste like flour. I also do it after he’s been fed and when he’s had a good meal.

These issues come back to poor impulse control and frustration tolerance. But you have to be mindful that you may have to put those things to one side for a while with some adult dogs who come to you with a history. For me, that’s one reason people might stop using R+ with a dog who has not been used to reinforcers coming from humans.

So unexpected emotional issues can be one side-effect of R+ training that you need to prepare yourself for. Always have a contingency plan and always know that whenever you start working with food or toys with a dog who has never had R+ training, you may need to address a few emotional issues first. That’s not to say you can’t use them or will never be able to, just that you might need a workaround whilst you find the right level of reinforcer (reinforcing but not distracting) and a natural rhythm with the dog (frequent enough not to be frustrating). Be mindful of what you do with dogs with resource guarding (RG) issues too.

A second issue relates to proximity. If aversive methods lead to animals who don’t want to be near you, R+ can lead to animals who want to be near you all the time. You might not think this an issue until you are working with a dog who is hyperattached or who suffers separation anxiety. But it’s true of a range of ‘normal’ dogs too.

Once, my little cocker spaniel heeled in perfect position for 5km. She didn’t drop a step. Why? I had a pig’s ear in my pocket from something unconnected. That promise of a pig’s ear meant that she was never more than about 50cm from me. The pig’s ear had become a stimulus, not a reinforcer, and whilst that might seem like a dream dog, that’s exactly the kind of mindless automaton R+ training is sometimes criticised for making. R+ means dogs want to be near you and want your attention. Live in a multi-dog home and YOU can become a highly-valued resource to guard… or a source of wars. I’m not suggesting R+ is responsible for velcro dogs or inappropriate attention seeking, but if aversives send dogs from you, reinforcers can bring dogs in. If you think this is an issue, remote training devices like remote treat dispensers can help, as well as the strategies specialists in separation anxiety use to occupy a dog when teaching them to cope with absence or distance.

I read tonight the criticism that R+ dogs can be constantly waiting for training, and in a horrible ‘R+ gone wrong’ world, you can see how you could create a monster. That is easy to get around by building in ‘release’ cues and encouraging interaction with the environment. I start all my sessions with “Ready?” which is my cue that we are working. But I guess there are a few trainers out there whose dogs are constantly hanging around in the hopes of a little learning. Those reinforcers and that learning can be addictive. R+ plays on the same reward pathways as other addictive behaviours, and then the moniker of ‘cookie pushing’ is not far from the mark. This is another accidental by-product of R+ training that you might see in novice dogs or in novice practitioners.

Another drawback comes in the form of increased behaviour. If aversives diminish behaviours, reinforcers can not only increase the target behaviour, but other behaviours too. Dogs who are clicker-savvy offer lots of behaviour. It can be a bit “how’s this? What about now? I’m going to try this… now this? What do you think of this?” To me, this is not a problem. I don’t mind my dogs doing more. I don’t mind them offering behaviours either. I’ve been doing leg weaves and stationing between my legs with Lidy and she is fairly delighted with her behaviour so she does it often. You don’t want to get caught out by an excited mali-mutt doing leg weaves when you don’t expect it.

Get the behaviour-offering mixed-up and cue-less and you can also make yourself a problem. I had this with Hagrid. He is a mali x GSD who has arousal issues: 40kg of jumping, mouthy, hard-mouthed dog. I like him to walk in front of me where I can see him (he has a thing about coming behind and herding, so I don’t ever let him walk behind me so we can avoid the ankle and calf nipping) but Hagrid and I had a dysfunctional relationship for a while. I’m going to call it reverse heeling. I wanted him not to heel. He would move in to heel position, so I would throw him a biscuit to get him to move away and in front. He increased the moving in as I increased the biscuit throwing. There’s that ‘dog gets closer to handler/reinforcer’ side-effect too. Only now the cue was him moving in and the behaviour was me throwing the biscuit… and we had a horrible circle. I got out of it by withholding the treat throwing for a millisecond at first, then building up his ‘heel’ and teaching it like a proper heel on the 300-peck method (he walks to heel for one pace, I reward… for two, I reward… and so on, up to three hundred paces per reward) and gradually spacing out the reinforcers to 6-minute intervals… so he now has a perfect heel and it looks like I taught him rather than the other way around.

This is a cautionary tale. Rewarding offered behaviour which has not been cued can lead to a rapid increase in offering behaviours and dogs who seem to be “testing” you. What you need is a dog who understands ‘no cue, no reinforcer’ so they don’t keep offering and offering. A dog who doesn’t understand that behaviour must be cued is a nightmare to teach hand-targeting to, as they are constantly butting your hand to see if it works. So clients need to have a modicum of understanding that the dog can’t just go around ‘behaving’ and being reinforced – it must be cued.

At this point, I’m reminded of when Heston was a young pup and he would bring me toys constantly during my lessons. I became a dab hand at pulling a tug whilst teaching A level English Literature. It got manageable when I realised I had to ask him to play and ignored his attempts otherwise, but how many of us explain to clients that they must initiate the behaviour, not the dog? That makes uncued, dog-initiated behaviour another important potential fallout to be mindful of.
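For anyone who likes to see the numbers, here is a rough sketch in Python of the 300-peck progression I mentioned with Hagrid: one pace then a reward, two paces then a reward, and so on up to three hundred. The function names are made up purely for illustration, and the rule of going back to one pace after a failed attempt is my assumption about how the method is usually taught, not something I have described above.

```python
def next_target(current, succeeded, ceiling=300):
    """Return the number of paces to ask for before the next reward.

    On success the target goes up by one pace, capped at `ceiling`.
    On failure it drops back to one pace (an assumed reset rule).
    """
    if succeeded:
        return min(current + 1, ceiling)
    return 1


def total_paces(ceiling=300):
    """Total paces walked in one error-free run from 1 up to `ceiling`."""
    return sum(range(1, ceiling + 1))


if __name__ == "__main__":
    # An error-free run to the ceiling: one reward per target, 300 in all.
    print(total_paces(300))  # 45150
```

Even with no mistakes at all, that is 45,150 paces of heelwork behind 300 rewards, which gives a feel for how much walking sits behind a ‘perfect heel’.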

The final parallel side-effect of R+ training is in generalisation and specificity. This tends to be the one drawback most trainers are aware of and talk about – how clients must practise behaviours everywhere beyond the classroom, otherwise the dog risks never generalising the behaviour. The need to generalise is well-accepted as a potential sticking point for R+ training, so I won’t labour the point. Likewise with the need to fade out reinforcers: just as punishment and aversives may stop being effective when their application comes to an end, so behaviours that have been subject to reinforcement may also fade if you don’t practise and reinforce them from time to time. They stop being habits without at least occasional reinforcement.

When we are mindful that reinforcers may have emotional fallout, that they may cause an animal to decrease distance to the source of reinforcement, that reinforcement – like punishment – can get in the way of learning, that it can lead to a lot of increased behaviours as well as offered behaviours, that animals may fail to generalise unless we teach them to, and that unless we keep practising from time to time, behaviour may fade or become extinguished, we are better prepared to help our clients navigate these issues. Although the fallout of aversives is enough to keep me from using them, the fallout of reinforcers just makes me a little more careful in how I use them myself and in how I explain them to clients.

In response to Balsam and Bondy, I think it is fair to say that there is not perhaps a symmetry but an equal and opposite reaction. Punishments and aversives create negative emotions. On the whole reinforcement creates positive emotions (which is why I could only find you two examples of dogs at the beginning of R+ who had some negative fallout). Punishment increases distance; reinforcement decreases distance. Punishment decreases behaviours offered; reinforcement increases behaviours offered. Both can have transient effects if the consequences are discontinued, and both may face issues with generalising and specificity.

To finish, whilst I am mindful there will be some “balanced” trainers who will seize on the notion of flaws in the great panacea of R+, it is timely to remind people that, in general, organisms seek out reinforcement and avoid punishers/aversives. For that reason, reinforcement is what an organism chooses. The ethics of that should not be overlooked. Although I may be very much in control of the rates of reinforcement, the schedules of reinforcement, what the reinforcement is… reinforcement is how an organism chooses for themselves. Thus, it is the only method for those who want a partnership with their animal, who want to work with their animal. Although there are occasionally unintended effects of positive reinforcement, they are engineering and management issues rather than ethical ones. Most are tied up with getting the reinforcement just right. If you get the reinforcement rules, rate and schedule right, those accidental effects cease altogether. That cannot be said of aversive training. The “side-effects” of reinforcement can be eradicated by becoming better at R+ training – they are beginners’ errors (whether the animal or the trainer is a beginner!). Eradicating the fallout of aversives is a much more complex procedure and not always possible.

It is vital that we talk about these unintended effects that positive reinforcement can have. This way, we can avoid them altogether. Maybe with a more critical eye to balance out our enthusiasm, we can ensure our clients don’t make these errors and are therefore less likely to default to aversives. We can also ensure novice R+ practitioners get the best out of their experience, meaning they are more likely to use it again in the future. R+ is not the easy option. The fallout is far less frequent and much less dramatic than aversive training, but if we don’t think about those parallel consequences, we do our clients and their dogs a disservice.