Associative Learning






 

Associative learning can be said to have taken place when there is a change in an animal’s behaviour as a result of one event being paired with another (Pearce, 2000). As we have seen in Part I, the methods elaborated by behaviourists were based on classical and operant conditioning, which are considered the simplest forms of learned behaviour. Although there are a number of types of learning that neither classical nor operant conditioning theory can explain, the experimental procedures elaborated by behaviourists still play an important role in studying animal intelligence. In particular, these methods help to test the limits of animals’ learning abilities and to evaluate the balance between flexibility and conservatism in their behaviour in the real world, where one event will readily predict another.

Experimental techniques based on the first class of associative learning, i.e. classical (Pavlovian) conditioning, allow us to investigate how animals learn that one stimulus signals another and, conversely, that one stimulus indicates that another will not occur. To learn effectively, organisms should discriminate between similar stimuli in one situation and generalise across them in another. These relatively simple capacities could underlie cognitive skills such as concept formation and rule extraction.

When, for example, a trainer asks Alex, a grey parrot, how many things he sees, how many red things a set contains, how many wooden ones and so on (see Part IX for details), the researcher is testing the bird’s cognitive skills. However, “simple” properties of associative learning, such as stimulus discrimination and generalisation, are also involved in this test. Of course, these properties are not so simple in reality. Wasserman and Miller (1997) titled their review “What’s elementary about associative learning?”. In this review they treat associative learning as the foundation for our understanding of other forms of behaviour and cognition. In the present chapter we will analyse the laws of associative learning in order to assess the formation of associations as an integral part of intellectual activity.

6.2.1. Classical conditioning

 

A procedure for studying learning based on pairing an unconditional stimulus (such as food or pain) with a conditional (previously neutral) stimulus is associated with the name of the Russian physiologist Pavlov and is therefore also called Pavlovian conditioning (see Chapter 2). The main idea of classical conditioning is that an organism learns new behaviours by establishing an association between an unconditional stimulus, which elicits an involuntary response, and another stimulus it encounters in its life. Every animal that possesses a nervous system has a number of innate stimulus-response associations.

The unconditional stimulus (UCS) is a stimulus that triggers a simple inborn reflex. This reflex may involve, say, taste receptors, sensory neurons, networks of interneurons in the brain, and motor neurons running to the salivary glands. For example, hungry vertebrates, including humans, produce saliva when presented with food and do many other things automatically, simply because many connections are “wired” into their nervous system at birth. As described in Part I, while studying salivary reflexes, Pavlov noticed that his laboratory dogs had formed an association between a previously neutral stimulus and an unconditioned (unlearned) response. They then learned to respond to a substitute stimulus, the conditioned stimulus (CS).

The acquisition of the response includes several stages (see Cartwright, 2002). At the first stage, the animal is presented with the to-be-conditioned stimulus (for example, the ring of a bell), called a neutral stimulus (NS). This is a control procedure used to ensure that the NS does not cause the unconditioned response (salivation) when it is presented alone. The dog may show some reflex responses, such as an orienting response or barking, but should not salivate when it hears the bell.

At the second stage, the animal is presented with the unconditional stimulus (UCS), say, meat, and exhibits an unconditional response (UCR), in this case the salivation reflex. At the next stage the dog is presented with the NS (the bell) and the UCS (meat) together on a number of occasions. Each time, the animal automatically produces the UCR (here, salivation). After a number of pairings of the NS and the UCS, at the final stage the animal is presented with the NS alone, without the food. If the dog now salivates when only the bell is presented, it has been conditioned to associate the NS with the receipt of food. Hence the bell (previously the NS) has become a conditional (learned) stimulus (CS), and the salivation response to it (previously the UCR) is now called the conditional (learned) response (CR).
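The stages above can be sketched as a toy simulation. The sketch rests on assumptions that the text does not make. The bell’s associative strength is treated as a number between 0 and 1. On every bell-plus-meat pairing this number grows by a fixed fraction of the remaining gap. On the test trial, the bell alone elicits salivation once the strength passes a threshold. The names `LEARNING_RATE`, `CR_THRESHOLD` and `MAX_STRENGTH`, and their values, are hypothetical.

```python
# Toy model of the acquisition procedure (illustrative assumptions only):
# the bell's associative strength V closes a fixed fraction of the gap to
# its maximum on every NS + UCS pairing; this rule is not Pavlov's own.

LEARNING_RATE = 0.3   # hypothetical fraction of the gap closed per pairing
CR_THRESHOLD = 0.5    # hypothetical strength at which the bell alone elicits salivation
MAX_STRENGTH = 1.0    # hypothetical maximum strength the meat can support


def salivates_to_bell(strength: float) -> bool:
    """Final stage: the bell is presented alone, without food."""
    return strength >= CR_THRESHOLD


def run_acquisition(n_pairings: int) -> list[float]:
    """Return the bell's associative strength after each NS + UCS pairing."""
    strength = 0.0  # first stage: the NS alone does not produce salivation
    history = []
    for _ in range(n_pairings):
        # bell (NS) and meat (UCS) together; the UCR occurs automatically
        strength += LEARNING_RATE * (MAX_STRENGTH - strength)
        history.append(strength)
    return history


if __name__ == "__main__":
    print("before any pairing, CR on test:", salivates_to_bell(0.0))
    for trial, v in enumerate(run_acquisition(5), start=1):
        print(f"pairing {trial}: V = {v:.3f}, CR on test = {salivates_to_bell(v)}")
```

With these hypothetical values, the bell alone does not yet elicit salivation after one pairing but does from the second pairing on. Changing the rate or the threshold changes how many pairings are needed.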

In this example a “positive” UCS (food) was used, but it is just as easy to form an association between the NS and a “negative” stimulus. For example, a dog will withdraw (or simply lift) a paw on hearing a bell (or, say, seeing a flash of light) if in previous trials this NS was combined with a mild electric shock. Thus the previously neutral stimulus, such as the ring of the bell, becomes a conditional stimulus (CS): the dog has associated it with pain.

These laboratory experiments reflect many situations that can be observed in real life. It is known, for example, that babies quickly learn to associate a doctor’s white coat with injections and other unpleasant procedures.

Any stimulus can serve as a CS, provided that the stimulus itself does not provoke too strong a reaction in the organism. A very hungry dog, which would normally wince at and fear an electric shock, is able to associate a mild shock with a food reward if the experimenter repeatedly combines these two stimuli. The dog would then, paradoxically, respond to the mild shock by wagging its tail, licking and salivating. What happens to the hungry dog if the shock is made progressively stronger and the painful stimulus is still combined with food? In Pavlov’s experiments this resulted in the development of experimental neurosis, which, in turn, led to stomach ulcers or cancer in the experimental dogs.

Pavlov’s research also revealed that the main factors influencing the strength of a CR are the intensity of the UCS and the order and timing of the NS and UCS. The stronger the UCS, the stronger the CR, and vice versa. Stimulus intensity matters for the CS as well: if the bell (the CS) was rung quietly, the dog would not salivate as much as if the bell was rung loudly.

The order and timing of the NS and UCS, that is, the temporal contiguity of their presentation, are also important. It turned out that learning was most effective when the NS was presented half a second before the UCS and remained until the UCS appeared. This format of presentation of the NS and UCS is known as forward conditioning.
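The timing rule can be expressed as a small check on a single trial. This is a minimal sketch. It assumes each trial is described by the NS onset, the NS offset and the UCS onset, all in seconds. The `Trial` class, the function names and the 0.1 s tolerance are hypothetical. The check encodes only the two conditions stated above: the NS leads by about half a second, and it is still present when the UCS appears. It does not predict how strong the CR will be.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Timing of one NS-UCS presentation, in seconds from the start of the trial."""
    ns_onset: float
    ns_offset: float
    ucs_onset: float


def is_forward(trial: Trial) -> bool:
    """The NS starts before the UCS and is still present when the UCS appears."""
    return trial.ns_onset < trial.ucs_onset <= trial.ns_offset


def matches_pavlov_optimum(trial: Trial, lead: float = 0.5,
                           tolerance: float = 0.1) -> bool:
    """Forward arrangement with the NS leading the UCS by about `lead` seconds.

    `tolerance` is a hypothetical allowance, not a value from the text.
    """
    ns_lead = trial.ucs_onset - trial.ns_onset
    return is_forward(trial) and abs(ns_lead - lead) <= tolerance


if __name__ == "__main__":
    trials = {
        "bell 0.5 s before meat, still ringing": Trial(0.0, 1.0, 0.5),
        "bell stops before meat arrives": Trial(0.0, 0.3, 0.5),
        "meat before bell": Trial(1.0, 2.0, 0.5),
        "bell 2 s before meat": Trial(0.0, 3.0, 2.0),
    }
    for label, trial in trials.items():
        print(f"{label}: forward={is_forward(trial)}, "
              f"optimal={matches_pavlov_optimum(trial)}")
```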

By combining conditional stimuli, it is possible to develop different variants of associations between stimuli. It was this idea that led Pavlov to the hypothesis that the complex behaviour of animals and humans is governed by conditional reflexes.

Combining conditional stimuli gives rise to so-called stimulus-stimulus learning (S-S learning). In this protocol, a sequence of stimuli precedes the UCS. For example, a tone might be followed by a light, which is followed by food. Although the initial element of the sequence is rather distant from the UCS, it nevertheless elicits a CR; this CR may, however, be weaker than the response to the element closer to the UCS. For example, rats were trained with the sequence light-tone-food (Holland and Ross, 1981). The normal conditioned response to a light that signals food is rearing or magazine approach; for a tone it is either head jerking or magazine approach. After a number of sessions of serial conditioning, it was found that during the light there was little magazine activity but a considerable amount of head jerking. The authors suggested that during serial conditioning the presence of the first element causes animals to anticipate the second one and to respond as if it were actually present.

Another set of examples demonstrating that associations can develop between two stimuli comes from sensory preconditioning. In an experiment by Brogden (1939), a dog received a tone followed by a light on a number of trials. The dog did not display any visible sign of learning. The tone was then paired with an electric shock and, finally, the light was found to elicit a substantial fear CR when it was presented for testing. This phenomenon was later studied by many researchers (Seidel, 1959; Rizley and Rescorla, 1972). Parks (1968) found that when the two neutral stimuli were paired too frequently in the first stage of the experiment, subjects hardly acquired S-S associations. He suggested that an orienting reaction serves as a mediator during sensory preconditioning: when two stimuli are paired frequently, the orienting reaction is extinguished and attention to the stimuli is lost. We will return later to the role of attention in learning processes.

The best-known method showing that associations are formed between two stimuli is second-order and higher-order conditioning. In the laboratory, second-order conditioning was observed when a conditional stimulus was paired with a second, still neutral, stimulus. For example, the first CS (a bell) is paired, in close temporal contiguity, with a light, but without the original UCS (food) being presented at the same time. After several such combinations of the first-order and second-order stimuli, the dog salivates when the second-order stimulus is presented alone; the light has therefore become a second-order CS, and salivation to it a second-order CR.

Pavlov (1927) was the first to report this effect. Later, many more results were obtained and more complex forms of conditioning were studied, such as pairing a second-order CS (CS2) with one that has already been paired with a UCS (CS1). It turned out that similarity between the stimuli (for example, lights of different colours, as opposed to a light and a tone, which belong to different modalities) is important in determining the outcome of these forms of conditioning (see the detailed analysis in Pearce, 2000).

Conditioning, as a form of associative learning, is found in the behaviour of a huge variety of species, from planarians to primates.

Although the methods used for conditioning were very clear, the results obtained were not as simple as had been hoped. Many researchers noted that in their experiments conditional reactions were accompanied by “superfluous” behavioural acts: a dog may look at the source of a sound, turn its head, bark and whine, chew, lick itself, blink, etc. This raises the problem of deciding which of these acts should be taken into account.

It can be difficult to sort out behavioural acts when an animal reacts to some stimuli with hidden behavioural patterns. For example, a male and a female pigeon that were to be paired were housed in neighbouring chambers separated by a drop door. Once a day a light was switched on and the door was raised, allowing the male to begin courtship. After a number of trials, the males began to direct their courtship displays at the light signal, behaving as if it were a real female (McFarland, 1985).

Some characteristics of conditioning are hidden from experimenters. These include conditional reactions to interoceptive (internal) stimuli as well as conditional reactions to time. An organism is able to associate a UCS with its own internal changes, for example, in blood pressure or in blood glucose level.

Students of animal and human behaviour should therefore be familiar with the specific characteristics and rules of classical conditioning, as well as with the difficulties of interpreting the results obtained. It is an arduous task to avoid all the associations that could arise during any laboratory experiment with animals, such as associations with the time of day when the experimenter arrives, with details of the equipment and so on. But these possibilities should at least be taken into account, because hidden conditional connections could affect other results.

Classical conditioning cannot explain many situations in which associations are formed contrary to the theory’s expectations. To explain them, other forms of learning must be considered.

 

6.2.2. Trial-and-error learning and instrumental conditioning

 

Pavlov's dogs were restrained, and the response being conditioned (salivation) was innate. But the principles of conditioning can also be used to train animals to perform tasks that are not innate. This procedure was introduced by the American psychologist Thorndike (1911). As described in 2.1, in Thorndike’s experiments an animal was placed in a setting where it was able to move about and to engage in different activities. Thorndike did not restrain a hungry animal in a stall; instead, he allowed animals to make trials and errors in order to get out of a puzzle box. The method employed by Thorndike to study animal intelligence is now referred to as either instrumental or operant conditioning. In informal descriptions it is also called trial-and-error learning, because the animal is free to try various responses before finding the one that is rewarded. We will see from the further analysis that instrumental and operant conditioning differ in some details and that operant conditioning is not truly based on “trial-and-error” learning.

In the real world, animals and humans quickly remember sequences of movements or actions that once led them to a pleasant event, even though the actions themselves are meaningless. The environment “selects” successful actions and thus shapes certain behavioural traits. For example, in the book by Bloom et al. (1985) there is a photo of a large dog, “Joy”, barely keeping his balance on a fire hydrant. Nobody knows why Joy does this with every fire hydrant he meets: he jumps onto each one and keeps his balance on it for some time. This is reminiscent of irrational rituals in human routine, such as a schoolboy who knocks three times on his desk, or a manager who makes an important decision only after a yellow bus has passed the window. Once we have considered the characteristics of instrumental conditioning, we will be able to explain some behaviours observed in experimental and natural situations.

Thorndike was the first psychologist to propose that animals learn on the basis of the outcomes of their actions. He formulated his theory of instrumental conditioning at about the same time as Pavlov was conducting his experiments on classical conditioning. When placed repeatedly in Thorndike’s puzzle boxes, animals took progressively less time to escape. From these observations, Thorndike claimed that the animal gradually formed a connection between a situation and the response that led to freedom: certain stimuli and responses become connected with, or dissociated from, each other. To Thorndike, the central questions of learning theory were:

1. What happens when the same situation or stimulus acts repeatedly upon an organism - does the mere frequency of an experience cause useful modifications?

2. What happens when the same connection occurs repeatedly in a mind?

3. What effect do rewards and punishments have on connections, and how do they exert this effect?

Trying to answer these questions, Thorndike formulated the principles of Instrumental Conditioning Theory. This theory represents the original “S-R” framework of behaviourism in that it states that learning involves forming connections between stimuli and responses. According to Thorndike, these are neuronal connections within the brain and learning is the process of “stamping in” and “stamping out” of these stimulus-response connections. Behaviour is due to the association of stimuli with responses that is generated through those connections.

Thorndike’s theory states that learning is governed by three main laws: the law of effect, the law of recency and the law of exercise.

The law of effect states that whatever happens as an effect, consequence, accompaniment or close sequel of a situation-response works back upon the connection to strengthen or weaken it. Thus, if a response was followed by a reinforcing stimulus, the connection was strengthened; if, however, it was followed by a punishing stimulus, the connection was weakened.

The law of recency states that the most recent response is likely to govern the recurrence of the response.

The law of exercise states that all things being equal, the more often a situation connects with or evokes or leads to or is followed by a certain response, the stronger becomes the tendency for it to do so in the future.

It has two parts: the law of use (the strength of a connection increases when the connection is used) and the law of disuse (the strength of a connection diminishes when the connection is not used). Thus connections become strengthened with practice.
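The laws of effect and exercise are stated qualitatively, but a numeric reading makes their interplay concrete. This is a minimal sketch. It assumes a single S-R connection whose strength is a non-negative number. The step sizes and the decay factor are hypothetical. The law of recency is not modelled.

```python
# Hypothetical numeric reading of Thorndike's laws for one S-R connection.
# All constants are illustrative assumptions, not values from Thorndike.
USE_STEP = 0.05       # law of use: small gain each time the connection is exercised
EFFECT_STEP = 0.2     # law of effect: extra gain (or loss) from the consequence
DISUSE_DECAY = 0.9    # law of disuse: fraction of strength kept per idle period

SATISFYING, NEUTRAL, ANNOYING = 1, 0, -1


def exercise(strength: float, consequence: int) -> float:
    """The situation evokes the response once; apply the laws of use and effect."""
    strength += USE_STEP                   # law of use (exercise)
    strength += EFFECT_STEP * consequence  # law of effect
    return max(strength, 0.0)              # strength cannot fall below zero


def rest(strength: float) -> float:
    """A period in which the connection is not used (law of disuse)."""
    return strength * DISUSE_DECAY


if __name__ == "__main__":
    s = 1.0
    for _ in range(3):
        s = exercise(s, SATISFYING)
    print(f"after three rewarded uses: {s:.2f}")
    for _ in range(3):
        s = rest(s)
    print(f"after three idle periods:  {s:.2f}")
```

Here a punished use still counts as use, but the negative effect term outweighs the small use term, so the connection weakens overall.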

Consider a concrete example: a cat escaping from a puzzle box, described in terms of S-R associations. After many trials and errors, the cat learns to associate the situation inside the box, say, the loop (S), with the response of pulling it (R), which leads to escape from the box. This S-R connection is established because the S-R pairing occurred on many occasions (the law of exercise) and resulted in a positive consequence (the law of effect).

Thorndike came very close to formulating Hebb’s law (see Part I) when he discovered the law of effect: the probability that a stimulus will cause a given response is proportional to the satisfaction that the response has produced in the past.
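The proportionality rule can be turned into a small simulation of a cat in a puzzle box. The sketch makes several illustrative assumptions. The cat has a hypothetical four-response repertoire in which only pulling the loop opens the door. Every response starts with equal satisfaction. Each successful escape adds a fixed amount of satisfaction. Unsuccessful responses are simply left unchanged rather than weakened. None of this is Thorndike’s own procedure.

```python
import random

# Hypothetical repertoire of a cat in a puzzle box; only "pull loop" opens the door.
REPERTOIRE = ("pull loop", "scratch bars", "bite bars", "meow")
CORRECT = "pull loop"
SATISFACTION_GAIN = 1.0  # hypothetical satisfaction added by escape and food


def choose(satisfaction: dict[str, float], rng: random.Random) -> str:
    """Pick a response with probability proportional to past satisfaction."""
    responses = list(satisfaction)
    weights = [satisfaction[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


def escape_probability(satisfaction: dict[str, float]) -> float:
    """Chance that the next response is the correct one."""
    return satisfaction[CORRECT] / sum(satisfaction.values())


def run(n_trials: int, seed: int = 0) -> tuple[list[int], dict[str, float]]:
    """Place the cat in the box n_trials times; count responses until escape."""
    rng = random.Random(seed)
    satisfaction = {r: 1.0 for r in REPERTOIRE}  # no preference at first
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            if choose(satisfaction, rng) == CORRECT:
                satisfaction[CORRECT] += SATISFACTION_GAIN  # law of effect
                break
        attempts_per_trial.append(attempts)
    return attempts_per_trial, satisfaction


if __name__ == "__main__":
    attempts, satisfaction = run(20)
    print("responses needed per trial:", attempts)
    print(f"P(pull loop) now: {escape_probability(satisfaction):.2f}")
```

After k escapes, the loop’s share of total satisfaction is (k + 1)/(k + 4). The expected number of responses needed therefore falls from four on the first trial towards one. This mirrors the progressively shorter escape times Thorndike observed, although any single simulated run is noisy.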

Through many series of experiments, Thorndike came to the conclusion that there are no sufficient reasons for ascribing any power over and above that of repetition and reward to any “higher powers”, “forms of thought” or “transcendent systems”, and that all learning involves the formation of connections, which are strengthened according to the law of effect. Thus, intelligence is attributed to the ability to form connections. In Thorndike’s view, humans, as the most evolved animals, form more connections than others, and learning processes can be investigated only on the basis of observable behaviour.

In this respect, instrumental conditioning theory is similar to classical conditioning theory. The key difference between them is that classical conditioning states that learning involves associations built on unconditioned reflex behaviours. Instrumental and operant conditioning, by contrast, state that learning involves associations between the performance of specific behaviours and the consequences of these actions.

 

6.2.3. Operant conditioning

 

Operant conditioning is a form of learning in which voluntary behaviour becomes more or less likely to be repeated, depending on its consequences. It is also known as Skinnerian conditioning. As described in 2.2, in the 1930s Skinner developed and modified Thorndike’s instrumental conditioning theory and formulated operant conditioning theory (Skinner, 1938). Skinner’s name is associated with the Skinner box, in which animals must press a lever in order to obtain a reward. Nevertheless, as we have already seen in Chapter 2, an animal does not have to be placed in a box to be involved in operant conditioning. The key terms of Skinner’s theory are reinforcement and schedule of reinforcement. A skilled experimenter can train animals to do many things, for example, train pigeons to play ping-pong or a toy piano. This is done by rewarding even the slightest movement in the desired direction and not encouraging, or even punishing, other actions; in other words, by selecting among the animal’s actions. These behaviours make no sense to the animals in themselves, but they lead to success in laboratory experiments, that is, to food rewards. This way of training is based on operant conditioning. Let us consider this approach to studying animal learning in more detail.
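Rewarding ever closer approximations to a target behaviour is usually called shaping. The toy sketch below treats the behaviour as a single number, such as how far a pigeon moves its head in the desired direction. Each emitted response varies randomly around the animal’s typical response. Only responses that meet the current criterion are reinforced, and a reinforced response pulls the typical response towards itself. After every few reinforcements, the criterion is raised to the current typical response. The function name, parameters and numbers are all hypothetical. Unreinforced responses are simply ignored here rather than punished.

```python
import random


def shape(target: float, start: float = 0.0, spread: float = 1.0,
          rate: float = 0.2, reinforcers_per_step: int = 5,
          max_trials: int = 5000, seed: int = 0) -> tuple[float, int]:
    """Shape a one-dimensional behaviour towards `target` (toy model).

    Returns the final typical response and the number of trials used.
    """
    rng = random.Random(seed)
    mean = start           # the animal's typical response
    criterion = start      # smallest response the trainer currently rewards
    reinforced = 0
    for trial in range(1, max_trials + 1):
        response = rng.gauss(mean, spread)   # responses vary around the typical one
        if response >= criterion:            # a movement "in the desired direction"
            mean += rate * (response - mean) # reinforced responses become typical
            reinforced += 1
            if reinforced % reinforcers_per_step == 0:
                criterion = min(mean, target)  # now ask for a closer approximation
        if mean >= target:
            return mean, trial
    return mean, max_trials


if __name__ == "__main__":
    final, trials = shape(target=5.0)
    print(f"typical response reached {final:.2f} after {trials} trials")
```

Because the criterion never rises above the current typical response, about half of the responses keep being reinforced at every step. This is one plausible reason why small, gradual steps work better than demanding the final behaviour at once.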

The law of reinforcement: an association between behaviour and a consequence. Thorndike’s law of effect was the conceptual starting point for Skinner's work on operant conditioning. Skinner changed the name of instrumental conditioning to operant conditioning because the new term is more descriptive (i.e., in this learning, one is “operating” on, and is influenced by, the environment). But the main idea is the same: operant (instrumental) behaviour is spontaneous, as it is not a reaction of the organism to a particular stimulus. In fact, there is no CS at the starting point of these experiments. With time, some aspects of the situation may start to act as conditional stimuli that let the subject know that its actions are relevant and that reinforcement is coming. Skinner also renamed a positive consequence “reinforcement” and a negative consequence “punishment”. The apparatus invented by Skinner, known as the Skinner box (or “operant chamber”), with a cumulative recording device, is still actively used for studying learning processes in animals and humans, now upgraded with computers.

To make this experimental technique clearer, let us focus once more on the differences between classical, operant and instrumental conditioning.

In the classical conditioning protocol, the animal is relatively passive and may be restrained in a stall. Interestingly, the Skinner box is often used for studying classical conditioning; in this case the operant chamber is transformed into a conditioning chamber. For example, a hungry pigeon is placed in a conditioning chamber (Fig. II-1). At intervals of about 1 minute a response key is illuminated for about 5 seconds, and the offset of this stimulus is followed by the delivery of food to a hopper. At first the subjects may be unresponsive to the key, but after a few trials they will peck it rapidly whenever it is illuminated. This is not instrumental conditioning, as the pigeon does not have to peck the key to obtain food. Instead, it is an example of Pavlovian conditioning: the mere pairing of the illuminated key with food is sufficient to engender a CR of key-pecking. The important point is that in this case behaviour is not governed by its consequences.

If we compare instrumental and operant conditioning, we find that in both, behaviour is affected by its consequences, but in operant conditioning the basic process is not “trial-and-error” learning. Instead, operant conditioning forms an association between the behaviour and its consequence. As already noted in Part I, it is also called response-stimulus or R-S conditioning, because it forms an association between the animal's response (behaviour) and the stimulus that follows (consequence).

Unlike classical conditioning, in which a new response is formed on the basis of an association with a previously neutral stimulus (NS), operant conditioning does not create a new response to a neutral stimulus. Instead, it increases or decreases a response that is already being exhibited. Operant conditioning is based on the law of reinforcement, which states that the probability of a given response being emitted is increased if the consequences of performing it are pleasant.

Manifestations of the law of reinforcement clearly support Thorndike’s law of effect. Let us consider the simplest case of operant conditioning. When a rat is first placed in a Skinner box, it begins to explore the new space, wandering around and sniffing at everything. During this exploration the rat may accidentally press the lever (an operant). This accidental occurrence enables the researcher to manipulate the consequences of the rat’s lever-pressing behaviour by making them either pleasant or unpleasant. If the researcher positively reinforces lever pressing (pleasant consequences), the rat presses the lever more often. If punishment (an unpleasant consequence) is used, lever pressing decreases and then stops.

As noted above, unlike Thorndike, who used “trials” in instrumental conditioning, Skinner did not use trials in his operant conditioning procedure. Instead, the researcher has to wait for the animal to emit responses that are already present in its behavioural repertoire.

Both instrumental and operant conditioning essentially differ from classical (Pavlovian) conditioning.

Where classical conditioning illustrates S --> R learning, operant conditioning is often viewed as R --> S learning, since it is the consequence following the response that influences whether the response is likely or unlikely to occur again. It is through operant conditioning that voluntary responses are learned.

The 3-term model of operant conditioning (S --> R --> S) incorporates the concept that responses cannot occur without an environmental event (e.g., an antecedent stimulus) preceding them. While the antecedent stimulus in operant conditioning does not elicit or cause the response, as it does in classical conditioning, it can influence it. When the antecedent does influence the likelihood of a response occurring, it is technically called a discriminative stimulus.

Reinforcement and punishment. Reinforcement is the key element in Skinner's S-R theory. A reinforcer is anything that strengthens the desired response. Reinforcers may be positive or negative: a positive reinforcer reinforces when it is presented; a negative reinforcer reinforces when it is withdrawn. Anything that increases a behavioural pattern, that is, makes it occur more frequently, makes it stronger, or makes it more likely to occur, is a reinforcer. Anything that decreases the behavioural pattern, that is, makes it occur less frequently, makes it weaker, or makes it less likely to occur, is a punisher. For example, in the Skinner box a rat may learn to press the lever either to gain food (positive reinforcement) or to avoid an electric shock (negative reinforcement). Conversely, lever pressing will decrease if it causes food to be taken away (negative punishment) or produces an electric shock (positive punishment).

This terminology contains an element of hair-splitting. Positive reinforcers are similar to rewards; however, the definition of positive reinforcement is more precise than that of reward. Specifically, we can say that positive reinforcement has occurred when three conditions have been met:

(1) A consequence is presented dependent on behaviour.

(2) The behaviour becomes more likely to occur.

(3) The behaviour becomes more likely to occur because and only because the consequence is presented dependent on the behaviour.

Negative reinforcement is often confused with punishment, but the two are different. Reinforcement always strengthens behaviour; that is what “reinforced” means. Punishment is used to suppress behaviour. It consists of removing a positive reinforcer or presenting a negative one. It often seems to operate by conditioning negative reinforcers. Negative reinforcement strengthens behaviour because a negative condition is stopped or avoided as a consequence of the behaviour. Punishment, on the other hand, weakens behaviour because a negative condition is introduced or experienced as a consequence of the behaviour.

Here is an example of how a negative reinforcement works. A rat is placed in a cage and immediately receives a mild electrical shock on its feet. The shock is a negative condition for the rat. The rat presses a bar and the shock stops. The rat receives another shock, presses the bar again, and again the shock stops. The rat's behaviour of pressing the bar is strengthened by the consequence of the stopping of the shock.

There are four major techniques in operant conditioning. They result from combining the two major purposes of operant conditioning (increasing or decreasing the probability that a specific behaviour will occur in the future), the type of stimulus used (positive/pleasant or negative/aversive), and the action taken (adding or removing the stimulus).
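The four combinations can be written as a lookup table. In this scheme, “positive” and “negative” say whether a stimulus is added or removed. “Reinforcement” and “punishment” say whether the behaviour becomes more or less likely. The table and function names are illustrative. The lever-pressing examples in the usage section follow the rat experiments described above.

```python
# The four operant techniques as a lookup table:
# (type of stimulus, action taken) -> (name, intended effect on the behaviour).
TECHNIQUES = {
    ("pleasant", "add"): ("positive reinforcement", "increase"),
    ("aversive", "remove"): ("negative reinforcement", "increase"),
    ("aversive", "add"): ("positive punishment", "decrease"),
    ("pleasant", "remove"): ("negative punishment", "decrease"),
}


def classify(stimulus: str, action: str) -> tuple[str, str]:
    """Name the technique and its intended effect on the behaviour."""
    try:
        return TECHNIQUES[(stimulus, action)]
    except KeyError:
        raise ValueError(f"unknown combination: {stimulus!r}, {action!r}") from None


if __name__ == "__main__":
    # Lever-pressing consequences for a rat in a Skinner box.
    examples = {
        "press -> food delivered": ("pleasant", "add"),
        "press -> shock switched off": ("aversive", "remove"),
        "press -> shock delivered": ("aversive", "add"),
        "press -> food taken away": ("pleasant", "remove"),
    }
    for event, key in examples.items():
        name, effect = classify(*key)
        print(f"{event}: {name} (pressing should {effect})")
```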

Schedules of consequences. One of Skinner’s most important contributions to learning theory is his concept of schedules of consequences.

Pavlov had found that conditioning is most effective when the NS is presented about half a second before the UCS and remains until the UCS appears (forward conditioning). Skinner showed that in operant conditioning there are two main factors influencing the strength of a conditional response (CR):

1. Ratio (proportion) of reinforcement;

2. The time interval (delay) between response and reinforcement.

Schedules of reinforcement fall into two basic categories: continuous reinforcement and intermittent reinforcement.

Continuous reinforcement simply means that the behaviour is followed by a consequence each time it occurs. Intermittent schedules are based either on the passage of time (interval schedules) or on the number of correct responses emitted (ratio schedules). The consequence can be delivered after the same amount of time or the same number of correct responses each time (fixed). Alternatively, it can be delivered after an amount of time or a number of correct responses that varies around a particular value (variable). This results in four classes of intermittent schedules. Experiments revealed that each type of schedule has its own characteristic effect on conditioning.

1. Fixed interval: the first correct response after a definite amount of time has passed is reinforced (i.e. a consequence is delivered). For example, an animal is given a food pellet for the first lever press made after two minutes have elapsed since the previous reinforcement. In the context of positive reinforcement, this schedule produces a scalloping effect during learning (a dramatic drop in responding immediately after reinforcement). The animal often responds only during the last few seconds of the interval. This schedule is not very effective, probably because the animal is able to learn the amount of time that will elapse before the response is reinforced.

2. Variable interval: after each reinforcement, a new time period (shorter or longer) is set, with the periods averaging a specific value over all trials. For example, the animal is given food for the first lever press after six minutes, then after three minutes, and then after seven minutes. This gives an average interval of just over five minutes between reinforcements ((6 + 3 + 7)/3 ≈ 5.3 minutes). This schedule is effective, probably because the animal cannot learn the precise amount of time that will elapse before the response is reinforced, so it has to carry on emitting the lever-pressing response.

3. Fixed ratio: Reinforcement is given after a specified number of correct responses. This schedule is best for learning a new behaviour. Behaviour is relatively stable between reinforcements, with a slight delay after reinforcement is given.

4. Variable ratio: Reinforcement is given after a number of correct responses that varies around a set average. After each reinforcement, the number of correct responses required for the next reinforcement changes. This schedule is best for maintaining behaviour.
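The four intermittent schedules differ only in the rule that decides which response is reinforced, so they can be sketched as two small classes. This is a simplified illustration. A ratio schedule counts responses. An interval schedule checks the time elapsed since the last reinforcement. A variable schedule here cycles through a fixed list of values, for example the 6, 3 and 7 minute intervals used above, whereas a real experiment would draw them at random. The class and function names are hypothetical. The code models only the schedule, not the animal’s response rate or the scalloping effect.

```python
import itertools
from statistics import mean


class RatioSchedule:
    """Reinforce after a number of responses; the requirement cycles through `requirements`.

    A single value gives a fixed-ratio schedule; several values give a
    variable-ratio schedule whose average requirement is their mean.
    """

    def __init__(self, requirements: list[int]):
        self._requirements = itertools.cycle(requirements)
        self._needed = next(self._requirements)
        self._count = 0

    def respond(self, time: float) -> bool:
        self._count += 1
        if self._count >= self._needed:
            self._count = 0
            self._needed = next(self._requirements)
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response after an interval has elapsed.

    Intervals are counted from the last reinforcement (or the session start)
    and cycle through `intervals`: one value is fixed, several are variable.
    """

    def __init__(self, intervals: list[float]):
        self._intervals = itertools.cycle(intervals)
        self._available_at = next(self._intervals)

    def respond(self, time: float) -> bool:
        if time >= self._available_at:
            self._available_at = time + next(self._intervals)
            return True
        return False


def reinforced_times(schedule, response_times):
    """Feed responses to a schedule in order; return the times that were reinforced."""
    hits = []
    for t in response_times:
        if schedule.respond(t):
            hits.append(t)
    return hits


if __name__ == "__main__":
    presses = list(range(1, 21))  # one lever press per minute for 20 minutes
    schedules = {
        "continuous (FR 1)": RatioSchedule([1]),
        "fixed ratio 3": RatioSchedule([3]),
        "variable ratio (2, 5, 3)": RatioSchedule([2, 5, 3]),
        "fixed interval 2 min": IntervalSchedule([2]),
        "variable interval (6, 3, 7 min)": IntervalSchedule([6, 3, 7]),
    }
    for name, schedule in schedules.items():
        print(f"{name}: reinforced at minutes {reinforced_times(schedule, presses)}")
    print(f"mean VI interval: {mean([6, 3, 7]):.2f} min")
```

A fixed schedule is just a variable one with a single value, and continuous reinforcement is a fixed ratio of 1. That is why two classes are enough to cover all the cases above.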

In summary, the number of responses per time period increases as the schedule of reinforcement is changed from fixed interval to variable interval and from fixed ratio to variable ratio.

Conditioned reinforcement and clicker training. Skinner distinguished two types of reinforcement: (1) primary reinforcement and (2) conditioned (or secondary) reinforcement. A primary reinforcer is one that is biologically pre-established, such as food, water or sex. A conditioned (or secondary) reinforcer is a previously neutral stimulus that, if paired with a primary reinforcer, acquires the same reinforcing properties as the primary reinforcer. For instance, for a hungry rat placed in a Skinner box, the sound of the food dispenser may serve as a secondary reinforcer.

For practical animal trainers it is important to know that a secondary reinforcer may serve as a “bridge” between the desired behaviour and the receipt of a reward. For example, dogs can be trained using special words (such as “good dog”) or a clicker (a small hand-held device that emits a clicking sound). The latter gave its name to a new training technique: clicker training. Let us clarify this with several examples.

If you want to train, say, a bear to ride a bicycle, you use a primary reinforcer, i.e. food. Each time the bear exhibits the desired behaviour, for example sitting on the bike, it receives a bit of food. But it is very important to reward the pupil at the very moment it performs the desired action. If you are a little late with the reinforcement and the bear receives the food just as it puts out a paw to get off the bike, it may “conclude” that you are rewarding it for leaving the bike. You have missed the moment. It is not easy for a trainer to get the food pellet into the bear’s mouth at exactly the right moment. If you use a secondary reinforcer such as the clicker, you do not need to reach the animal’s mouth at all. What you need is to “negotiate” with the bear during earlier trials that food will follow whenever it hears the clicking noise. The sound of the clicker then becomes equivalent to the receipt of the primary reinforcer, the food pellet, and therefore serves as a reward for the desired behaviour. It is much easier to click at the right moment than to aim a bit of food at the bear.

Using such a simple method as the secondary reinforcer, practical animal trainers can solve many problems. They are now able to “stop the moment” when shaping the desired patterns: for example, they can reward dolphins with a whistle at the top of a high jump and thus train them to jump higher and higher. In general, a fixed association between the secondary and the primary reinforcer allows you to tell your subject which behaviour you are interested in.

In fact, this kind of agreement has been used spontaneously for ages in many domains. Even money may be considered a secondary reinforcer for human beings, since it stands for the things that can be purchased with it. The analogy becomes almost literal if we set economics aside and concentrate on psychology, recalling experiments on primates based on token reinforcers. These are typically small plastic discs that are earned by performing some response and, once earned, can be exchanged for food, drink or even the chance to escape unpleasant events such as an encounter with a rat. The point is that the secondary reinforcer works as a conventional sign. In his early experiments with chimpanzees, Wolfe (1936) developed something like a “token language”: plastic discs of different colours had different values (such as one or two bananas) and gave access to different kinds of material goods (food, drink) as well as to non-material needs (the chance to play with a trainer or to escape the encounter with the rat).

Returning to clicker reinforcement, it is worth noting that trainers regard the clicker as more than simply a conditioned reinforcer, a substitute for food or “an event marker”. It is also a bridging stimulus, meaning “Food is coming, but later” (Pryor, 1975).

The elegant technique now called “clicker training” is an application of behaviour analysis that was initially invented and developed in the 1960s by Keller Breland, Marian Breland Bailey and Bob Bailey on the basis of Skinner’s theory (see Chapter 20). Karen Pryor developed the method further and in 1985 published a book on this new concept of training, “Don't Shoot the Dog: The New Art of Teaching and Training”, which is concerned with changing animal behaviour without coercion.

The bridge between operant conditioning and thinking. In fact, it was Skinner who laid the first stone of this bridge. Having routinely made pigeons perform different tasks in order to receive food, he wondered what would happen to the same pigeons if food delivery suddenly became arbitrary, i.e. bore no relationship to anything the birds might do. The results were surprising, and Skinner published them in a famous short article entitled “‘Superstition’ in the Pigeon” (Skinner, 1948b). When food delivery became arbitrary, the birds began to do strange things such as putting their heads into a corner of the cage or tucking their heads under their wings. Since these animals had already been through many behaviourist experiments, it is perhaps not too far-fetched to imagine that they were trying to find some kind of behaviour that would produce food. That is, they were putting forward “hypotheses” about how to influence food delivery.

Many practical trainers exploit animals’ capacity to produce “hypotheses”. For example, a mare had learned that clicks meant carrots and that she could make clicks happen. She had also become aware that the operant (the behaviour being reinforced) had something to do with her ears, but what? To make clicks happen, she started making different “proposals” to her trainer, trying different ear positions and movements until one of them was rewarded.

The next example shows that such behaviour goes beyond trial and error, since the animals have to derive the rule of a task in order to get a reward. When dolphins jump high out of the water they receive herring, and this is something like an agreement between the animals and their trainer that high jumps are rewarded. One day the dolphin jumps high, but nothing happens. It jumps again and again, but cannot make the click happen. In frustration, the dolphin slaps the water with its tail and immediately hears the click. Naturally, it repeats the rewarded behaviour but, again, nothing happens. After some unsuccessful attempts, the animal switches to another pattern, say, rising to its full height above the water. This brings success, but again only once. When its usual behavioural repertoire is exhausted, the dolphin starts to invent new actions and develop novel, “creative” patterns. It has learnt a rule: to hear the click, do only things that have never been done before (Pryor, 1969).
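The trainer’s side of this experiment reduces to a single rule, even though the dolphin had to infer it. The Python sketch below is a hypothetical model rather than Pryor’s actual protocol: behaviours are plain strings, and, as a simplification, a behaviour counts as “new” if it has never been reinforced before.

```python
def novelty_trainer(already_reinforced):
    """Return a click function that fires only for never-reinforced behaviours.

    Hypothetical helper modelling the rule the dolphin had to discover.
    """
    seen = set(already_reinforced)

    def click(behaviour: str) -> bool:
        if behaviour in seen:
            return False  # repeating an old trick earns nothing
        seen.add(behaviour)
        return True  # first appearance of a behaviour is clicked

    return click


if __name__ == "__main__":
    click = novelty_trainer({"high jump"})  # the high jump had been rewarded before
    attempts = ["high jump", "high jump", "tail slap", "tail slap", "stand above water"]
    print([click(a) for a in attempts])  # prints [False, False, True, False, True]
```

Replaying the episode this way shows why each new pattern “brings success, but only once”: the set of reinforced behaviours grows with every click, so only a genuinely novel action can pay off.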

Such a capacity to “derive a rule” perhaps exceeds the bounds of associative learning and requires cognitive skills. Similar experiments on many species have revealed a great deal of ingenuity. For example, in Skinner’s experiments, pigeons that had exhausted their usual repertoire of actions and were desperate to be rewarded resorted to such unusual behaviour as “dancing” on their wings, spreading them out and stretching them under their legs.

How to train all creatures effectively. Using the principles of operant conditioning, Skinner and his followers achieved great success in training a wide variety of species, including humans. Even invertebrates were included: a crayfish learned to pull a rope and thus ring a bell, and a bivalve mollusc clapped its shell at the trainer’s command.

All these results were obtained by the method of secondary reinforcement. To train the trainers themselves, a special “training game” has been devised that allows them to make mistakes and learn from them without confusing some poor animal or unsuspecting person. In a group, one person is selected as the Animal and goes out of hearing range. The others choose a Trainer and a behavioural pattern to be shaped. The behaviour must be something physically easy to do that everyone can see, such as turning in a circle, pouring or drinking water, turning on a light switch, picking up an object, opening or closing a door or window, or making a mark on a blackboard. The Trainer uses a clicker, a handclap or some other noise as a secondary reinforcer. Each time the Animal hears the sound, he or she must return to the Trainer and get an imaginary treat.

One of the funniest stories came from Skinner himself. He once attended a lecture by a famous psychologist who condemned Skinner’s “inhuman” methods but did not know him personally. Skinner sat in the front row and listened with great enthusiasm, thus forcing the lecturer to focus on such a keen “student”. He then began to look bored whenever the lecturer spoke about love, but came alive and nodded approval at every aggressive or irritable gesture, however slight. According to Skinner, by the end of the talk the lecturer was shaking his fists like Hitler.

Pryor successfully applied her method to shaping the behaviour of all members of her family and strongly recommends using clicker training in day-to-day life. This may seem fantastic, but armed with this method you can get a teenager to clean a room, an infant to stop yelling or a grandmother to stop grumbling. Parents are learning to shape appropriate behaviour instead of accidentally reinforcing inappropriate behaviour: to reinforce silence, not noise; play, not tantrums.

It really is possible to reach an understanding with both humans and animals, as if you were in a silent dialogue with them. I tested this method myself in an “extreme” (and funny) situation, when I had to get a tit back into its cage quickly in the empty flat of friends who had asked me to feed the bird in their absence. I was there with a friend who worked on animal morphology rather than behaviour and therefore considered the situation hopeless. She nevertheless agreed to help me and to follow my instructions. Our dealings with the tit were divided into several stages. First, we prevented the bird from perching anywhere except in the part of the flat where the cage stood. If the bird tried to perch anywhere else, it was immediately scared away. Once it had learnt that this was the only place to perch, its visits to that zone became more frequent. At the second stage the bird was permitted to perch only near the cage, then only on the cage, and finally only on the cage door. After that it was easy to get the bird into the cage. The whole procedure took no more than half an hour, and that day Skinner’s method of shaping behaviour triumphed over all others.
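The tit episode is a case of shaping by successive approximation: the criterion for an acceptable perch is tightened stage by stage. The Python sketch below encodes the four stages from the story; the distances in metres and the rule “advance after three consecutive acceptable perches” are invented for illustration, since the account gives no such numbers, and the `Shaper` class is a hypothetical name.

```python
from dataclasses import dataclass

# Hypothetical criteria: the maximum allowed distance (metres) between the
# bird's perch and the cage door at each shaping stage of the story.
STAGES = [
    ("zone of the flat with the cage", 3.0),
    ("vicinity of the cage", 1.0),
    ("on the cage", 0.3),
    ("on the cage door", 0.05),
]


@dataclass
class Shaper:
    successes_to_advance: int = 3  # assumption: consecutive successes before tightening
    stage: int = 0
    streak: int = 0

    def perch(self, distance: float) -> str:
        """Respond to the bird perching `distance` metres from the cage door."""
        _, limit = STAGES[self.stage]
        if distance > limit:
            self.streak = 0  # outside the current criterion: scare the bird off
            return "scare away"
        self.streak += 1
        if self.streak >= self.successes_to_advance and self.stage < len(STAGES) - 1:
            self.stage += 1  # criterion met often enough: tighten it
            self.streak = 0
        return "allow"


if __name__ == "__main__":
    tit = Shaper()
    for d in [5.0, 2.5, 2.0, 1.5, 0.8, 0.9, 0.7, 0.2, 0.25, 0.1, 0.04]:
        action = tit.perch(d)
        print(f"{d:5.2f} m -> {action:10s} stage: {STAGES[tit.stage][0]}")
```

The design choice worth noting is that the criterion moves only after the current one has been met reliably; tightening it too early would mean scaring the bird away almost every time, which gives it no information about where it should perch.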

There are some principles of effective training based on Skinner’s theory as well as on experience gained from the practice of clicker training. Some of them have already been mentioned, namely using reinforcement rather than punishment, using interval schedules and choosing the optimal moment for giving a sign of approval. Let us consider these and some others as protocols for practice.

Optimal timing. If something goes wrong with the learning process, the trainer should first check the timing of reinforcement. As already noted, the secondary reinforcer is the message that tells the subject which part of its behaviour is desirable. If this message comes late, even by a moment, the subject associates it with a different behaviour. To add one more example to those considered above: if you are training a dog to sit but praise it only when it is already getting up, the dog will “conclude” that its owner is thanking it for standing up. If this happens even twice, learning comes to a standstill. Pryor (1985) argues that praising students for unfinished tasks and, in general, giving compliments and gifts for incomplete behaviour does not reinforce appropriate behaviour. We seem to be encouraging attempts to do something good, but in reality whatever is being shaped is at best momentary behaviour and, most likely, behaviour aimed at begging for reinforcement.
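Why a late signal goes astray can be shown with a toy model. The Python sketch below rests on a deliberately crude assumption (the signal is credited to whatever the subject is doing at the instant it arrives); the timeline and the `behaviour_at` function are hypothetical and serve only to replay the dog example.

```python
from bisect import bisect_right


def behaviour_at(timeline, t):
    """Return the behaviour in progress at time t (seconds).

    `timeline` is a list of (start_time, behaviour) pairs sorted by start time.
    Simplifying assumption: the signal is credited to whatever the subject is
    doing at the moment the signal arrives.
    """
    starts = [start for start, _ in timeline]
    i = bisect_right(starts, t) - 1  # index of the last behaviour started by t
    return timeline[i][1] if i >= 0 else None


if __name__ == "__main__":
    # Hypothetical session: the dog sits at 2.0 s and gets up again at 3.5 s.
    dog = [(0.0, "standing"), (2.0, "sitting"), (3.5, "standing up")]
    print(behaviour_at(dog, 2.3))  # a prompt signal: prints sitting
    print(behaviour_at(dog, 3.8))  # a signal 1.8 s late: prints standing up
```

Real animals integrate over a short window rather than a single instant, so this is only a caricature, but it captures why the window for a correct signal is as short as the behaviour itself.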

Nature of reinforcement. As discussed in detail above, reinforcement can be either positive or negative. For many animals (and children), a sharp, loud word of prohibition is often a primary (unconditioned) negative stimulus. But if a subject does not react to it, some other negative stimulus to which the subject is sensitive should be applied. For example, in his book “King Solomon’s Ring”, Lorenz (1952) recommended shooting pebbles at a dog to break it of the unpleasant habit of running away from its owner.

Reinforcement size. When working with animals, it is better to use relatively small food items so as to keep their interest going. For example, bears have learnt many tricks for raisins. Nevertheless, like humans, animals often refuse to do their job if the reward is too small. Thus, killer whales in an aquarium show refused to jump high for small herring; only the large fish that the trainers then offered restored their motivation to jump. Pryor (1985) also described the “jackpot”: an unusually large reward given for an “insight” and, in some situations, even for nothing at all. This pedagogical trick sometimes helps to sustain a competitive spirit in animals and humans and can lead to astonishing results in learning.

Interval schedules. A common mistake is made by trainers who, having once started to shape a behavioural pattern, intend to reinforce it at the same unvarying frequency for the entire life of the trainee. In fact, there is no need to reinforce a thoroughly learnt behaviour every time in order to keep it reliable. Quite the contrary: it is necessary to stop rewarding every response and to start rewarding at random.

For example, if a dolphin receives a fish for every jump, it may jump more and more perfunctorily, just to get it over with. But if a dolphin that has been taught to jump for fish is rewarded for the first jump, then for the third, and then at random, it will most likely jump higher, trying its luck. This in turn makes it possible to reward the highest jumps selectively and thus improve the desired behaviour by means of a variable schedule of reinforcement. If rewards stop altogether, the behaviour will begin to extinguish, but an occasional reward may be enough to sustain it, and lengthening the intervals between rewards may even strengthen it.
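One simple way to “reward the highest jumps selectively” is to reward a jump only if it beats most of the dolphin’s recent jumps, so that the bar rises by itself as performance improves. The Python sketch below uses the median of the last few jumps as the criterion; the window size, the heights and the `HighJumpCriterion` name are hypothetical, and the rule is an illustration of selective reinforcement rather than a procedure described by Pryor.

```python
from collections import deque
from statistics import median


class HighJumpCriterion:
    """Reward a jump only if it is higher than the median of recent jumps.

    Hypothetical rule: the last `window` jumps form a moving criterion,
    so the bar rises automatically as the dolphin's jumps improve.
    """

    def __init__(self, window: int = 5) -> None:
        self.recent = deque(maxlen=window)  # oldest jumps drop out automatically

    def judge(self, height: float) -> bool:
        # With no history yet, any jump is rewarded.
        reward = not self.recent or height > median(self.recent)
        self.recent.append(height)
        return reward


if __name__ == "__main__":
    crit = HighJumpCriterion(window=5)
    heights = [2.0, 1.8, 2.2, 2.1, 2.5, 2.0, 2.6]  # metres, invented for illustration
    print([crit.judge(h) for h in heights])  # prints [True, False, True, True, True, False, True]
```

Because only above-median jumps are rewarded, roughly half the jumps go unrewarded, which also makes the reward pattern unpredictable from the dolphin’s point of view, in keeping with the variable schedule recommended above.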

Pryor suggests that the efficacy of variable schedules underlies all gambling, and she even counts deep human attachments among these phenomena. It is enough for a rude, selfish person occasionally to treat his or her partner to “good behaviour” for the partner to long for the return of these marvellous moments, all the more passionately the more rarely they happen.

In addition to variable schedules, a fixed regime of reinforcement sometimes gives good results. The subject knows that it has to work for a definite time or perform a definite number of responses in order to get a reward. For example, if a dolphin is rewarded for every sixth jump, it is possible to obtain stable series of six jumps. These conditions (a fish for every sixth jump, a salary every Friday) have a weak point: both animals and humans tend to do the minimum of work, just enough to avoid being fired.

With the help of variable and fixed schedules it is possible to shape extremely long chains of behaviour. For example, a chicken will peck a button up to a hundred times to obtain a single grain. In experiments with token reinforcers, chimpanzees had to press a key 125 times to receive a single token, and when they had collected 50 tokens they were allowed to push them all into a slot to receive food. The animals thus performed a sequence of more than 6000 responses (Kelleher, 1958). There is a joke among psychologists that schooling is one of the longest unrewarded schedules in human life.
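The figure of more than 6000 responses follows directly from the two ratios reported for Kelleher’s (1958) chimpanzees: 125 presses per token times 50 tokens per exchange. The Python sketch below steps through the chain one key press at a time; the function name and the assumption that all 50 tokens are exchanged together for a single food delivery are part of the illustration, not details taken from the original study.

```python
def token_chain_responses(presses_per_token: int, tokens_per_exchange: int) -> int:
    """Count key presses until the first food delivery in a token chain.

    Assumed structure: every `presses_per_token`-th press earns one token
    (a fixed ratio), and food is delivered only when `tokens_per_exchange`
    tokens have been collected and exchanged together.
    """
    presses = 0
    tokens = 0
    while tokens < tokens_per_exchange:
        presses += 1
        if presses % presses_per_token == 0:  # ratio completed: one token earned
            tokens += 1
    return presses


if __name__ == "__main__":
    print(token_chain_responses(125, 50))  # prints 6250
```

The loop is deliberately press-by-press rather than a single multiplication, so that other ratio values can be tried in the same way; the chicken’s hundred pecks per grain, for instance, corresponds to `token_chain_responses(100, 1)`.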

 

