Parallel Models of Serial Behaviour: Lashley Revisited
George Houghton & Tom Hartley
Dept. of Psychology
University College London
London, WC1 6BT
U.K.
g.houghton@psychol.ucl.ac.uk
Copyright (c) George Houghton & Tom Hartley 1996
PSYCHE, 2(25), February 1995
http://psyche.cs.monash.edu.au/v2/psyche-2-25-houghton.html
KEYWORDS: serial order, response competition, neural networks, competitive
queuing, response chaining, spelling, verbal short-term memory, syntax.
ABSTRACT: In 1951, Lashley highlighted the importance of serial order for
the brain and behavioural sciences. He considered the response chaining
account untenable and proposed an alternative employing parallel response
activation and "schemata for action". Subsequently, much has been
learned about sequential behaviour, particularly in the linguistic domain.
We argue that these developments support Lashley's picture, and recent computational
models compatible with it are described. The models are developed in a series
of steps, beginning with the basic problem of parallel response competition
and its possible resolution into serial action. At each stage, important
limitations of the previous models are identified and simple additions proposed
to overcome them, including the provision of learning mechanisms. Each type
of model is compared with relevant data, and the importance of error data
is emphasized. Taken together, the models constitute a unified approach
to serial order which has achieved considerable explanatory success across
disparate domains.
1. Introduction: The Problem of Serial Order
1.1 In a well-known article published in 1951, "The Problem of Serial
Order in Behavior", Karl Lashley proposed that the problem of how behavioural
sequences are produced should be of central concern to the neuropsychologist
and physiologist. He pointed out that sequential organization is central
to much of animal and human behaviour, from locomotion, through reaching
and grasping to language and the control of logical reasoning. This organization
could not be attributed to moment by moment responding to a serially ordered
environment, but rather depends upon internal organizing principles by which
the animal controls its own behaviour. Surveying the then available ideas,
Lashley concluded that neither the neurosciences nor psychology had much
insight to offer into the problem. The "only strictly physiological
theory" to have been explicitly formulated was associative chaining
theory, in which it is postulated that each element in a series of actions
provides the excitation of the next (various examples of such theories,
which remain current, are discussed below). From consideration of a variety
of qualitative data, particularly regarding speech errors, Lashley came
to the conclusion that such theories are untenable. He postulated instead
that the production of serial behaviour involves the parallel activation
of a set of actions, which together comprise some "chunk", so
that responses are internally activated before being externally generated.
This activation, in itself, does not contain the serial ordering of the
actions. Superimposed on this activation is some kind of independent ordering
system, a "schema for action", which selects which response, of
those activated, to produce at which time. Unfortunately, Lashley was able
to progress no further, writing that,
[I]ndications ... that elements of the [sequence] are ... partially
activated before the order is imposed upon them in expression suggest that
some scanning mechanism must be at play in regulating their temporal sequence.
The real problem, however, is the nature of the selective mechanism by which
the particular acts are picked out in this scanning process and to this
problem I have no answer.
1.2 In the years following Lashley's article, the "cognitive revolution"
got underway, born of the union of theoretical linguistics (Chomsky, 1957)
and the computer metaphor for the mind developed in artificial intelligence.
In these subjects, the problem of serial order is solved technologically.
Theoretical linguistics, which only concerns itself with the internal representation
of serial order ("competence") and not with its execution ("performance"),
avails itself of such formal objects as ordered sets, strings etc. as primitives,
from which to build descriptions of grammars and other abstract objects.
In artificial intelligence and computer science, analogous objects plus
recursive serial processing are provided by computer programming languages.
In such a context, serial order per se will not appear to be any kind of
problem at all. Thus, although these devices have never been defended or
tested on empirical grounds, their availability and computational power
suffice to obscure the fact that cognitive science has no (neuro-) psychologically
grounded theory of serial order. In neurophysiology and psychology, the
problem has been largely ignored or workers have fallen back on the very
position that Lashley attacked, associative chaining theory. More recently,
however, there are signs that the problem of serial order is once again
being taken seriously in its own right. In cognitive science, the rise
in the importance of neuropsychological and physiological data, coupled
with the widespread use of neural network models, has led to the rethinking
of many basic issues, and to the questioning of the uncritical use of classical
symbol processing architectures. Any abandonment of the symbolic primitives
mentioned above immediately throws the serial order problem to the fore
(Houghton, 1990; Shallice, Glasspool & Houghton, in press). Additional
impetus for the development of more biologically plausible models of serial
order is provided by the increased interest in the problem in the neurosciences
(Aldridge, Berridge, Herman, & Zimmer, 1993; Berridge & Whishaw,
1992; Colombo, Eickhoff & Gross, 1993; Kermadi, Jurquet, Arzi &
Joseph, 1993; Paulesu, Frith & Frackowiak, 1993), and in animal learning
studies (Fountain, Henne & Hulse, 1984; Terrace, 1991).
1.3 The ubiquity of serial order in behaviour, from something as simple
as an eyeblink to the performance of a piano sonata, leads us to raise the
issue of what range of behaviours a given model of serial behaviour should
be expected to apply to, as it is quite possible that numerous different
solutions to the serial order problem have evolved. All animals exhibit
some degree of endogenous temporal structure in their behaviour (e.g., the
different quadrupedal gait patterns of walking, trotting and galloping).
In some cases this sequential organization can be quite complex, e.g., in
grooming sequences (Berridge & Whishaw, 1992) or bird song (Konishi,
1985), and can exhibit a significant degree of learning (Marler, 1991). Is it
reasonable to search for a global set of principles applicable to all such
cases, or are local, task-specific solutions more likely? We believe that
at present levels of understanding, it would be premature to attempt to
answer this question decisively. We suggest that theoretical investigations
should therefore be concentrated on specific, well-studied, classes of behaviours
in particular animals (though potential insights from related studies should
not be ignored). Empirically successful theories developed in this way can
then be compared for the presence or absence of common principles. Accordingly,
the models described in the current paper are all concerned with the voluntary
production of sequences of actions by human subjects, where the same actions
may occur in many different orders. Although at various points we discuss
action sequencing in general, the models are largely motivated by data from
studies of linguistic behaviour in speaking, spelling and typing. Even within
this limited domain, the emphasis is on analyses of recall error data (from
both normal and impaired subjects, and from short- and long-term memory),
including the distribution of errors as a function of such variables as
serial position and the familiarity of the target sequence. Error data from
such sources were central to Lashley's (1951) original argument for the
need for a new model of serial behaviour, and have continued to play a central
role in constraining theories (MacNeilage, 1964; MacKay, 1970, 1972; Dell,
1986; Henson, Norris, Page & Baddeley, in press).
1.4 In the current terminology of the memory literature, such data would
appear to mainly involve "explicit memory" tasks, in which remembered
information is consciously used to guide behaviour. Subjects in serial recall
tasks certainly know they are intended to reproduce the target list, and
effortful concentration of attention is required to correctly repeat lists
more than a few items long. Speaking and writing in a particular language
require correct, explicit, retrieval of the phonological and orthographic
forms of words, and people know whether or not they possess such knowledge
(though we certainly would not claim that they know what form this knowledge
has or how it is they use it to guide their behaviour). Recently there has
been increasing interest in the possibility of "implicit" sequence
learning, whereby subjects show evidence in serial reaction time (SRT) tasks
of learning properties of repeated sequences that they cannot explicitly
recall or state (Nissen & Bullemer, 1987; see Curran, this volume, for
a neurophysiologically-oriented review of this literature). What kind of
knowledge this form of learning results in (or how it is put to use) is
not clear from available SRT data (Jackson, Jackson, Harrison, Henderson
& Kennard, in press). We therefore restrict our attention to models
of explicit serial recall, by which we mean those cases in which the subject
voluntarily uses what has been learned to attempt to produce a target sequence
of actions.
1.5 Below, we review a number of relatively recent computational models
of serial order which are compatible with Lashley's insights (and which
we refer to generally as "competitive queueing" models). Three
related classes of model are built up in a series of steps, beginning with
the basic problem of parallel response competition and its possible resolution
into serial action. At each stage, important limitations of the previous
models are identified and simple additions proposed to overcome them, including
the provision of learning mechanisms. Each type of model is compared with
relevant data, and the importance of constraints from error data is emphasized.
The final type of model discussed incorporates "schemata for action".
1.6 By way of contrast, we first consider the characteristics of the associative
chaining approach to serial order, which Lashley considered "doomed
to failure". Rosenbaum (1991; p. 80) states that "It is difficult
to introduce response-chaining theory without appearing to treat it as a
straw man." Our reason for discussing it here is that, in one guise
or another, it seems to be always with us (e.g., Lewandowsky & Murdock,
1989). Indeed, recent neural network models of serial order appear to have
breathed new life into the "straw man", as they generally depend,
at least in part, on the formation of associations between successive actions
(Amit, Sagi, & Usher, 1990; Ans, Coiton, Gilhodes, & Velay, 1994;
Bairaktaris, 1992; Dehaene, Changeux, & Nadal, 1987; Jordan, 1986).
2. Serial Order and Associative Chaining
2.1 Some of the earliest psychological accounts of serial order postulated
that action sequences were represented as chains made up of unidirectional
S-R links. The appeal of this type of account is its simplicity; it requires
nothing more than a representation of the items themselves and the links
between them. Retrieval of a sequence is achieved by tracing a path through
the links. One of the central difficulties Lashley (1951) identified with
such a model is the handling of sequences containing repeated items. In
such sequences, a stimulus action is associated with more than one response,
and chaining models provide no mechanism for choosing between different
associative links. This is illustrated in Figure 1a, which shows the associative
links necessary to represent the sequence "E, V, E, R, Y". "E"
must be linked to both "V" and "R", so that it is not clear which item
follows the first instance of "E". This problem arises again when
the second instance of "E" is realized, and there is the potential
for endless looping through the first part of the sequence, without ever
reaching the final item. The problem is exacerbated when one considers a
single associative structure containing representations of a large number
of sequences, for example sequences of speech sounds making up the familiar
words in a mental lexicon. If the same elements are to be used to stand
for the /t/, /a/, and /k/ in tack, cat and act, then the chains will be linked
together to form a network in which the underlying serial structure of any
single word is obscured by links between them.
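
The ambiguity described above can be made concrete with a small sketch. The Python below is illustrative only and is not part of any published chaining model: the function names build_chain and ambiguous_items are hypothetical, letters stand in for elementary actions, and the strings "tak", "kat" and "akt" are rough stand-ins for the phoneme sequences of tack, cat and act. It stores sequences as links between item types and reports which items have more than one successor.

```python
from collections import defaultdict

def build_chain(*sequences):
    """Store sequences as type-to-type links: item -> set of possible successors."""
    links = defaultdict(set)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            links[current].add(following)
    return dict(links)

def ambiguous_items(links):
    """Items with more than one outgoing link, i.e. where a pure chain cannot choose."""
    return sorted(item for item, successors in links.items() if len(successors) > 1)

# A single sequence with a repeated item: "E" is linked to both "V" and "R".
print(ambiguous_items(build_chain("EVERY")))

# Several words sharing the same elements (assumed phoneme-like spellings).
print(ambiguous_items(build_chain("tak", "kat", "akt")))
```

In this sketch the repeated "E" in "EVERY" is the only ambiguous item, whereas storing the three words together leaves most of the shared elements with competing successors, which is the interference the text describes.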
Figure 1
Simple forms of Associative Chaining (AC): (a) AC has problems with sequences
containing repeats. The attempt to represent "every" leads to uncertainty
as to which link from "E" should be followed; (b) Wickelgren's Context-Specific
Coding overcomes these difficulties, but only at the expense of using entirely
different tokens to represent instances of the same type.
2.2 To overcome this difficulty, Wickelgren (1969) suggested a form of
context-sensitive chaining. Elementary actions were represented by different
tokens depending on their immediate context. For example the sequence "E,
V, E, R, Y" would be represented as the set of tokens $Ev, eVe, vEr,
eRy, rY$, where $ is an end marker, and lower case letters represent local
context. Representations based on Wickelgren's idea have proved popular
with some neural network modellers (Rumelhart & McClelland, 1986; Seidenberg
& McClelland, 1989). Because the approach uses tokens (e.g., standing
for particular instances of each "E" in the example) rather than
types (standing for the category of actions designated "E"), a
chain involving a repeated action can be represented without linking one
stimulus to more than one response. In Figure 1b, it is clear that by following
the stimulus-response chain from left to right, the target sequence "E,
V, E, R, Y" can be generated. However, the token based form of representation
is immediately unappealing, because it fails to capture any relationship
between different instances of the same item in a sequence. In the above
example, the two "E"s are as different from one another as they
are from the other letters in the sequence. In fact there is no reason why
the action represented by $Ev should resemble the action represented by vEr
in any way. The scheme thus allows for different orderings of the 'same'
actions within the same associative structure, but only by suggesting that
the same actions are in fact quite different.
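
As an illustration of the coding scheme just described, the following Python sketch generates context-sensitive tokens for a sequence. It is a hedged reconstruction for exposition, not Wickelgren's own formulation: the names wickel_tokens and token_links are hypothetical, and the upper/lower case convention simply mirrors the notation used in the text.

```python
def wickel_tokens(sequence, end_marker="$"):
    """Code each item with its immediate left and right context (lower case)."""
    padded = end_marker + sequence + end_marker
    return [
        padded[i - 1].lower() + padded[i].upper() + padded[i + 1].lower()
        for i in range(1, len(padded) - 1)
    ]

def token_links(tokens):
    """Chain the tokens: each token maps to the set of tokens that follow it."""
    links = {}
    for current, following in zip(tokens, tokens[1:]):
        links.setdefault(current, set()).add(following)
    return links

tokens = wickel_tokens("EVERY")
print(tokens)  # ['$Ev', 'eVe', 'vEr', 'eRy', 'rY$']

# Every token has exactly one successor, so the chain is unambiguous ...
print(all(len(successors) == 1 for successors in token_links(tokens).values()))
# ... but the two instances of "E" are now unrelated symbols.
print(tokens[0], tokens[2])
```

The design choice that removes the ambiguity is the same one that draws the criticism in the text: nothing in the representation records that $Ev and vEr are instances of the same action.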
2.3 Context-sensitive coding deals, in a similar fashion, with the problem
of the interference between sequences in the same associative structure.
In this case, associative chains representing the spoken words cat, tack,
and act do not interact. This is because allophonic variants of each speech
sound have quite different representations in Wickelgren's scheme. For example,
the /a/ in "cat" is represented by a token designated kat, whereas
the /a/ in "tack" is represented by a completely different token,
tak. These units have been termed 'wickelphones' (Rumelhart & McClelland,
1986). The use of different tokens for variants of the same phoneme can
be used to provide a weak account of coarticulation simply by assuming that
each wickelphone is associated with a different articulatory realization.
However, Wickelgren's account fails to provide any explanation for the similarity
between the same phoneme occurring in different contexts (e.g., the /a/
sounds in cat and tack from the above example).
2.4 In addition to the unsatisfactory use of a token-based representation,
Wickelgren's solution to the problems chaining models face with the representation
of repeated items is incomplete. If an action is repeated with identical
local context (e.g., in /kankan/), the same wickelphone must be used twice,
creating the kind of looping chain shown in Figure 1a. In addition it fails
to solve the problem of representing multiple sequences in the same associative
structure. For example, all sequences beginning /ka.../ would begin with
the same wickelphone $ka. To generate the sequence "catalyst"
it would be necessary to choose between associative chains radiating from
the same starting point ("cat", "camera", "cancel"
etc.).
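
Both residual problems can be checked mechanically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, "kankan" stands in for the phoneme string /kankan/, and ordinary spellings are used in place of phonemic transcriptions, so the shared initial token prints as $Ca rather than $ka.

```python
from collections import Counter

def wickel_tokens(sequence, end_marker="$"):
    """Context-sensitive tokens: left context, item, right context."""
    padded = end_marker + sequence + end_marker
    return [
        padded[i - 1].lower() + padded[i].upper() + padded[i + 1].lower()
        for i in range(1, len(padded) - 1)
    ]

def repeated_tokens(sequence):
    """Tokens that occur more than once, and so reintroduce a looping chain."""
    counts = Counter(wickel_tokens(sequence))
    return sorted(token for token, n in counts.items() if n > 1)

# Repetition with identical local context: the same token is needed twice.
print(repeated_tokens("kankan"))

# Different sequences with the same onset all start from one token.
words = ("cat", "camera", "cancel", "catalyst")
print({wickel_tokens(word)[0] for word in words})
```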
2.5 Some recent neural network models of serial order, in particular Jordan
(1986) and related work, overcome some of these problems, while retaining
a dynamics dependent on chaining, in that the current output of the network
is cued by a learned relationship to some record of its previous responses
(or previous internal states, Elman, 1990). These models, made possible
by the development of learning algorithms for non-linear mappings, include
a static, sequence-specific "plan" input, which helps the networks
store different orders of the same items. The use of a history of past outputs,
rather than just the last one, as the cue to the next action helps to overcome
repetition problems. However, the models require many exposures to sequences
to learn them, precluding their use in modeling single trial learning and
short-term memory (see below). They also seem to us unlikely to be prone
to the kinds of serial order errors discussed below. In addition, the use
of chaining still leads to interference between different orders of the
same items, constraining learning capacity.
2.6 The attraction of associative chaining lies in its use of well-defined
associative learning rules and its avoidance of biologically implausible,
computer-based primitives such as "serial buffers" etc. Unfortunately,
chaining does not provide a satisfactory basis for the understanding of
many aspects of serial learning and recall, whether from long- or short-term
memory. There would appear therefore to be room for a theory of serial order
which possesses the attractive features of associative chaining (simple
learning rules, no intrinsically ordered buffers) while avoiding its limitations.
3. Model 1: Serial Order And Response Competition
"There are indications that, prior to the internal or overt
enunciation of the sentence, an aggregate of word units is partially activated
or readied" (Lashley, 1951, p. 119).
3.1 It is evident that most associative theories of serial order begin with
some prior model for forming an association between two items. The prior
model may use simple Hebbian S-R associations, vector convolution (Lewandowsky
& Murdock, 1989), Hopfield learning (Amit, Sagi & Usher, 1990),
backpropagation (Jordan, 1986), or some other method. When faced with the
problem of extending the basic associative learning model to serial learning,
the extension that involves the least additional machinery is chaining.
This appears to make it the default option for theorists already wedded
to one or another basic model (possibly explaining the remarkable tenacity
of the idea). However, this may not be the most profitable theoretical strategy.
All such models basically treat the generation of serial behaviour as little
more than iteration over the process of associative recall. The representation
of a sequence in memory is thus treated in isolation from any other component
of response generation mechanisms. As an alternative approach, we might
start with the simpler (and possibly evolutionarily prior) problem of the
resolution of response competition in complex environments (i.e., response
scheduling under environmental stimulation rather than from memory).
3.2 To make this point clear, suppose an organism is capable of producing
two responses, r1 and r2 (e.g., eating and drinking), and these responses
are called forth (either innately or due to learning) by stimulus configurations
s1 and s2 respectively. Suppose now that s1 and s2 occur simultaneously,
activating r1 and r2 in parallel. If the two responses are both valuable
but are not such that they can be generated simultaneously, say due to effector
limitations, then the organism is faced with the problem either of choosing
one response over the other or of *ordering the two responses so that one
occurs after the other*, i.e., generating serially ordered behaviour. Either
solution requires that one of the activated response tendencies be allowed
to control behaviour while the other is somehow "held in abeyance"
until the chosen response is completed. For the serial case, the previously
withheld response can only be released if the first, "dominant",
response is not repeated. This "all or nothing" aspect of animal
behaviour has been frequently noted. For instance Hinde (1970, p. 396, cited
in Neumann, 1987, p. 377) states: "Undoubtedly the commonest consequence
of the simultaneous action of factors for two or more types of behavior
is the suppression of all but one of them".
3.3 Response competition due to parallel perceptual processing is commonly
observed in experimental situations in which subjects must respond to a
target object in the presence of to-be-ignored distractors. The distractors
lead to increased error rates and delayed reaction times (Stroop, 1935;
Eriksen & Eriksen, 1974). That the effect is indeed due, in part, to
response competition is shown in a study by Eriksen, Coles, Morris and O'Hara
(1985) using a two-choice reaction time task (see also Coles, Gratton, Bashore,
Eriksen & Donchin, 1985). In this study, subjects had to respond to
a central letter in the presence of flanking distractor letters, e.g., the
"S" in H S H. The distractor letter (H) could appear in the target
position on other trials (e.g., S H S), and had a different associated response.
Eriksen et al. found that, even when subjects made a correct response to
the target letter, the incompatible distractor letter frequently gave rise
to its associated response to the point that significant electromyographic
activity was detectable in the muscles controlling the relevant effector
(the hand, in this case). Reaction times were significantly slower on those
trials in which such activity was detected compared to those in which it
was not. Subjects in these experiments were required to make only one response
on each trial, so the competing response was never released (except in error).
It would be revealing to change the design to permit sequential responding,
and to measure the degree of activation of upcoming responses.
3.4 Such a scenario is likely to face any organism capable of a degree of
parallel processing in its perceptual systems, while limited to largely
serial action due to its effector structure (Allport, 1987; Neumann, 1987;
Houghton & Tipper, 1994, 1995), and similar findings are reported for
predatory animals faced with more than one prey object (see Ingle, 1972,
for an example involving frogs). Thus even simple organisms may be equipped
with mechanisms for the selection and serial ordering (scheduling) of responses
activated in parallel. This raises the possibility that serial behaviour
may be generated from memory by internally activating a set of responses
in parallel in such a way that the general "response scheduling"
mechanism leads to them being produced serially.
3.5 To develop this idea more concretely, we need first to consider how
the response scheduling mechanism might operate. Intuitively, response tendencies
can exist with different "strengths", that is, the inclination
to perform a particular action can be more or less "pressing".
In the simplest case then, given two (or more) competing evoked tendencies,
say to take a drink from a glass of beer or take a drag on a cigarette,
the stronger will be performed first. The performing of the most pressing
action leads to a temporary lessening of its strength ("drive reduction"),
leading to the competing action becoming the strongest and hence being produced.
This is roughly the kind of mechanism envisaged by Shallice and colleagues
(Shallice, 1972; Norman & Shallice, 1986; Cooper, Shallice, & Farringdon,
1994) to be involved in the automatic production of routine actions, and
given the name "contention scheduling". In this theory, individual
response types are represented by response schemata which can be more or
less active due to a combination of perceptual inputs (triggering stimuli)
and internal motivational inputs (Shallice, 1972). Activated schemata compete
by lateral inhibition to become the most active, so that, "No more
than one action system may be strongly activated (i.e., become dominant)
at any given time." (Shallice, 1972, p. 387). The use of lateral inhibition
as the mechanism of conflict resolution essentially means that the initially
most active schema (strongest response tendency) will be the one which gains
control of effector systems. Note that Shallice's proposals were not developed
in the first instance as a theory of serial order but "as a solution
to the potential cybernetic problem that an organism has many goals which
it needs to achieve at any one time and has only a limited number of effector
units available." (Shallice, 1972).
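
To illustrate how lateral inhibition lets the initially strongest tendency become dominant, here is a minimal Python sketch. It is not Shallice's formulation: the update rule, the parameter names rate and beta, and the clipping of activations to the range 0 to 1 are all assumptions chosen for simplicity.

```python
def lateral_inhibition(activations, rate=0.1, beta=1.0, steps=50):
    """Iterate a simple winner-take-all competition between response schemata.

    Each node excites itself and is inhibited by the summed activity of
    all other nodes; activations are clipped to the range [0, 1].
    """
    a = list(activations)
    for _ in range(steps):
        total = sum(a)
        a = [
            min(1.0, max(0.0, x + rate * (x - beta * (total - x))))
            for x in a
        ]
    return a

# Two responses evoked in parallel, e.g. eating (0.6) and drinking (0.5).
print(lateral_inhibition([0.6, 0.5]))
```

With these assumed settings the small initial advantage is amplified until the stronger response is fully active and the weaker one is driven to zero. Note that the losing response is erased rather than merely postponed.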
3.6 In the original formulation of this model, Shallice did not discuss
how a schema, once dominant, might become de-activated. In this form the
model runs the risk of endlessly repeating its dominant response. In addition,
the model postulated direct connections from activated response schemata
to effector structures, apparently leading to the need for one schema to
completely suppress all others in the response competition (to prevent them
from sending interfering input to the effectors). As well as being liable
to perseverate, this tendency to obliterate competing responses could further
cause the model to "forget" other contextually relevant responses.
Thus the scenario envisaged above, whereby an animal resolves parallel response
competition by sequencing actions could not be easily realized by the architecture.
However, these limitations can be overcome by some fairly simple additions.
An example is shown in Figure 2.
Figure 2
A mechanism for the resolution of parallel response competition into serial
action ("contention scheduling"). Responses activated at L1 compete
for control of output at L2. Selected responses inhibit themselves.
3.7 In this model, envisaged as a neural network, one layer of nodes
(L1) corresponds to Shallice's response schemata. The activation of a response
is a continuous variable in some range, represented by the activation value
of a given node. Activating inputs (from whatever source) arrive at response
nodes in parallel, and more than one node may be simultaneously active.
Instead of proposing that conflict resolution must take place at L1, this
function is devolved to another layer of units (L2 - the "competitive
filter", Houghton, 1990). In the simplest case L1 nodes can activate
L2 nodes in a one-to-one fashion. The response activation in L1 is thus
copied to L2. It is proposed that the severe lateral inhibitory (competitive)
interactions envisaged by Shallice take place at L2, so that the initially
most active node suppresses the rest. This scheme means that response selection
can take place at L2, without the need to completely suppress the representation
of other potentially important responses, which can remain activated at
L1. The need to suppress the currently dominant response after completion
(to prevent perseveration) suggests the use of some form of inhibitory feedback
to L1. In principle this could be quite a complex process, depending on
the internal complexity of the response. In the simplest case however, we
can imagine that, once selected for output at L2, the activation of the
response at L1 is no longer needed. Thus a simple one-to-one inhibitory
feedback loop from L2 to L1 will cause the selected response to inhibit
itself (Figure 2). Once this is done, the remaining activated responses
at L1 can compete to be produced next. It is easy to see that if a set of
responses are activated in parallel at L1, but with a "gradient"
of activations over them representing response strength, then this mechanism,
though entirely parallel in itself, will sequentially select responses in
the order dictated by their degree of activation. In other words, serial
order can be an emergent property of a parallel mechanism dedicated to resolving
response competition.
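
The select-and-inhibit cycle just described can be sketched in a few lines of Python. This is a deliberately stripped-down illustration, not the network of Houghton (1990): the function name competitive_queue is hypothetical, the L2 competition is reduced to picking the maximum, self-inhibition is modelled by setting the winner's L1 activation to zero, and all input activations are assumed to be positive.

```python
def competitive_queue(activations):
    """Turn a parallel activation pattern into a serial order of responses.

    activations: dict mapping response name -> positive L1 activation.
    """
    l1 = dict(activations)      # L1: response nodes, all active in parallel
    produced = []
    while l1:
        # L2 (competitive filter): receives a copy of L1; the most active node wins.
        winner = max(l1, key=l1.get)
        if l1[winner] <= 0.0:   # nothing left above resting level
            break
        produced.append(winner)
        # One-to-one inhibitory feedback from L2 to L1: the winner inhibits itself,
        # while the other responses stay active at L1 for the next cycle.
        l1[winner] = 0.0
    return produced

gradient = {"p": 0.3, "a": 0.5, "t": 0.9, "r": 0.7}
print(competitive_queue(gradient))  # ['t', 'r', 'a', 'p']
```

Nothing in the loop stores an order explicitly. The output order is read off from the activation gradient alone, which is the sense in which serial order is emergent here.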
3.8 If a mechanism with the basic characteristics described above is in
place to enable an organism to order its actions in terms of some simple
internal measure of response strength, then the organism could produce serially
ordered behaviour from memory simply by being able to activate all the responses
in some sequence in parallel, but with an activation gradient over the responses,
such that the sooner the response is to be produced the more active it is.
Specific models with this basic character have been proposed for a number
of serially ordered behaviours (Estes, 1972; Grossberg, 1978; MacKay, 1987;
Rumelhart & Norman, 1982), though with by far the greatest emphasis
on one or another form of linguistic behaviour. A good example is provided
by the Rumelhart and Norman (R&N) model of typing. In this model, R&N
were particularly interested in the form of typing errors and how they could
constrain models. Error data (e.g., MacNeilage, 1964; MacKay, 1970, 1972;
Norman, 1981; Reason, 1984) have played an especially important role in
the development of the serial order models described below. For instance,
many typing errors are transposition errors of the form "trap"
-> "tarp". Transposition errors are common in many serial order
tasks (e.g., immediate verbal recall tasks) and are highly problematic for
most conventional models of serial order. Consider what has to happen to
produce the error "trap" -> "tarp". First, at the
point at which the "r" should be produced two things occur: the
"r" is not produced, but the "a", which should occur
later, is produced instead. This provides evidence that upcoming responses
are already active before the point at which they are to be produced. Second,
at the point at which the "a" should be produced two further things
happen: the "a" is not produced in its correct position (i.e.,
it is not repeated), and the "r" which was omitted at position
2 is now produced. These events provide evidence that the "a"
response produced at position 2 has been suppressed, preventing its occurrence
at its appropriate position. If this were not so, this would produce an
error such as "taap" or "taarp", forms which are rarely,
if ever, found. The idea of suppression is further supported by the appearance
of the "r" in position 3, indicating that, not having been produced
at position 2, it has remained active.
3.9 The Rumelhart & Norman model is outlined in Figure 3. The model
is hierarchical, in that specific sequences are represented as "chunks",
i.e., sets of individual responses bound together by connections to a higher-level
node which spans the chunk. The chunks in the model correspond to words
(or parts of words). When a word is to be produced, nodes representing the
letters in the word are equally activated in parallel by word-to-letter
connections. Letter nodes in a chunk have lateral inhibitory connections
between them such that each one is inhibited by those nodes representing
letters which are to precede it (a scheme suggested by Estes, 1972). The
first letter therefore receives no inhibition and later letters receive
progressively more. Thus the net excitation (excitation from the word node
minus inhibition from letter nodes) received by a letter node decreases
the later in the word it is to be produced. This induces an activation gradient
over the letter nodes. In combination with a select-and-inhibit mechanism
of the type described above (Figure 2), this parallel activation generates
serial output.
Figure 3
Schematic diagram of the Rumelhart and Norman (1982) Typing model.
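The scheme in 3.9 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not Rumelhart and Norman's implementation: the function names, the linear inhibition of 0.1 per preceding letter, and the treatment of suppression as outright removal are all choices made here for clarity, and the word is assumed to contain no repeated letters.

```python
def rn_net_input(word, excitation=1.0, inhibition=0.1):
    """Net input to each letter node: equal excitation from the word node
    minus a fixed amount of inhibition from every letter that precedes it.
    Assumes no letter occurs twice in the word (hypothetical simplification)."""
    return {ch: excitation - inhibition * i for i, ch in enumerate(word)}


def select_and_inhibit(activations):
    """Output the most active letter, suppress it, and repeat until done."""
    acts = dict(activations)
    output = []
    while acts:
        winner = max(acts, key=acts.get)
        output.append(winner)
        del acts[winner]  # post-output suppression, modelled here as removal
    return "".join(output)


gradient = rn_net_input("trap")
print(gradient)                      # activation falls with target position
print(select_and_inhibit(gradient))  # serial output from a parallel gradient
```

The point of the sketch is only that a static, parallel activation gradient plus a select-and-inhibit loop is sufficient to yield ordered output.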
3.10 The R&N model produces errors by the addition of noise to letter
node activations. Transposition errors occur when the wrong letter node
becomes the most active at the wrong time. Given the activation gradient,
this is most likely to be the letter to be produced at the following position
- thus most transpositions involve adjacent letters, as is found in the
human data. After the wrong response has been produced it is automatically
inhibited, preventing it being repeated at its correct position. The omitted
response remains active however, allowing it to win the output competition
at the next position. The model thus produces these errors quite naturally,
in an intuitively satisfying way. R&N also model other a priori puzzling
error types, such as "doubling shift" errors of the form "screen"
-> "scrren" (discussed in more detail below). In addition,
the parallel response activation in the model is independently motivated
by its use in modeling co-articulation effects in typing, in which the hand
configuration adopted by the typist while making one response is affected
by the location (on the keyboard) of upcoming key presses.
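The error account in 3.10 can be illustrated with a small simulation. The sketch below is ours and makes several assumptions that are not taken from the R&N paper: Gaussian noise is added afresh at each output step, the gradient is linear, and suppression is modelled as removal from the competition. Because a suppressed letter cannot be selected again, every output is a reordering of the target, so forms such as "taap" cannot arise.

```python
import random
from collections import Counter


def noisy_recall(word, noise_sd, rng, step=0.1):
    """Recall `word` from a noisy activation gradient (no repeated letters)."""
    # Earlier letters start out more active (assumed linear gradient).
    acts = {ch: 1.0 - step * i for i, ch in enumerate(word)}
    out = []
    for _ in word:
        noisy = {ch: a + rng.gauss(0.0, noise_sd) for ch, a in acts.items()}
        winner = max(noisy, key=noisy.get)
        out.append(winner)
        del acts[winner]  # suppressed: cannot be produced again
    return "".join(out)


rng = random.Random(1)  # fixed seed so a run can be repeated
tally = Counter(noisy_recall("trap", 0.08, rng) for _ in range(1000))
print(tally.most_common(5))
```

With a gradient of this kind one would expect the commonest errors to be exchanges of adjacent letters, since the intended successor is the closest competitor at each step; the exact proportions depend on the assumed noise level.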
3.11 The R&N model illustrates the explanatory value of a serial order
model based on the resolution of parallel response competition. What, though,
is its status as a model of memory for serial order? This question involves
the internal representations (connection patterns, in this case) by which
it generates the necessary activation gradient over the response set. This
has two components. One component, the word-to-letter connections, provides
equal excitation to all letters in the word. As this is the only activation
letter nodes receive, this input specifies item information, i.e., what
letters are in the word. The other component, the lateral inhibitory connections,
specifies order information (the activation gradient). This latter component
is problematic, for reasons similar to the problems with chaining models
discussed above. To be plausible, the model must represent the spelling
of all words (or, at least, plausible subword chunks) with the same set
of letter nodes. Many words obviously contain the same letters in different
orders (e.g., trap, part, rapt). The lateral inhibitory connection pattern
needed to specify one of these orders is obviously different from that required
for the others. If all the patterns are simultaneously present in memory
then they will clearly interfere with each other. Indeed, since the word
nodes for the above three words all activate the same letters to the same
degree, the lateral inhibitory pattern due to the representation of all
three words in memory will lead to exactly the same letter node activation
pattern whichever word node is active. Other problems arise, for instance,
in words containing repeats, such as "prop", in which the "p"
is first and last, and hence must be simultaneously the least and most inhibited
letter. In these cases, R&N have to parse the sequence by making divisions
on the occurrence of (non-immediate) repeats. Although such a chunking scheme
has some attractions (see Keele & Jennings, 1992, for a similar proposal),
it can lead to parsings of words which are not the most intuitively appealing,
e.g., trot -> (tro)(t), leaning -> (leani)(ng), disastrous -> (disa)(strou)(s),
nonetheless -> (no)(neth)(el)(ess).
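The anagram problem described in 3.11 can be made concrete. In the following sketch (an illustration only; the data structures, function names and the inhibition constant are assumptions made here) the lateral inhibitory patterns for trap, part and rapt are superimposed on a single set of letter nodes. Since all three word nodes excite the same letters equally, the net input to each letter is identical whichever word is intended.

```python
def lateral_inhibition(words):
    """Superimpose, for every stored word, one unit of inhibition from each
    letter onto every letter that follows it in that word."""
    w = {}
    for word in words:
        for i, src in enumerate(word):
            for dst in word[i + 1:]:
                w[(src, dst)] = w.get((src, dst), 0) + 1
    return w


def net_input(active_word, w, excitation=1.0, k=0.1):
    """Equal excitation from the word node minus summed lateral inhibition."""
    letters = set(active_word)
    return {b: excitation - k * sum(w.get((a, b), 0) for a in letters)
            for b in sorted(letters)}


memory = lateral_inhibition(["trap", "part", "rapt"])
for word in ("trap", "part", "rapt"):
    print(word, net_input(word, memory))  # the same pattern each time
```

When only one of the words is stored, the same functions yield the correct gradient for that word; the interference arises purely from storing the orders together.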
3.12 Rosenbaum, in his book on motor control (1991; p. 285), says of the
Rumelhart and Norman model that "It represents an important advance
in the modeling of human motor control and should serve as a useful starting
point for future research". Below we develop models which show similar
behavioural characteristics, but which do so on the basis of learned representations
of serial order that do not have the problems discussed above.
4. Model 2: Response Competition Under Internal Modulation
"My principal thesis...will be that the input is never into a quiescent
or static system, but always into a system which is already actively excited
and organized. In the intact organism, behavior is the result of interaction
of this background of excitation with input from any designated stimulus.
Only when we can state the general characteristics of this background of
excitation, can we understand the effects of a given input" (Lashley,
1951, p. 112).
4.1 Recent work on serial order involving the modulation of parallel response
competition has developed learning algorithms for these models which produce
memory representations that do not suffer from the difficulties facing the
Rumelhart and Norman model (Burgess & Hitch, 1992; Houghton, 1990).
Following Houghton (1990), we will henceforth refer to models based on such
principles as "Competitive Queueing" (CQ) models. This name reflects
the idea that the activated responses in such models are "queueing"
for "service" (i.e., output), but without forming an ordered line,
such as might form at a ticket office. A competitive queue is more analogous
to the situation at a crowded bar with only one bartender. Customers are
still served one at a time (serial order), but no ordered structure exists.
Service depends instead on success in the competition to attract the bar
staff's attention.
4.1 In recent CQ work, the storage of sequence information in connections
(excitatory or inhibitory) between response elements is avoided. Instead,
it is contained in connections to the response elements from nodes at a
higher level. In the Rumelhart & Norman model, such connections (from
the word to the letter level) all have the same strength and thus contain
only item information (what letters are in the word). It is possible, however,
for these connections to contain order information if their strengths are
allowed to vary. For instance, a word node might have a stronger connection
to a letter the earlier it appears in the word. Activation of the word node
would then activate all the letters in the word but to a degree dependent
on the letters' target positions. Response selection mechanisms of the type
discussed above would lead to serial output in a similar manner to that
achieved by the Rumelhart and Norman model. Note that in this scheme there
is no anagram problem, as there are no sequence-specific lateral connections.
Figure 4 shows the representation of the words rat, art and tar using such
a scheme. Activation of one order of the three letters is not affected by
the ability to activate any other order.
Figure 4
A hierarchical coding of response gradient. Anagrams RAT, ART and TAR can
be encoded purely by word-to-letter connections, avoiding any cross-talk.
All connections shown are excitatory, darker connections indicate stronger
weights.
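The coding scheme of Figure 4 can be sketched as follows. The weight values (a linear fall of 0.1 per position) and all names are assumptions introduced for illustration. Each word node carries its own graded weights onto a shared set of letter nodes, so the three anagrams are recalled without cross-talk.

```python
def graded_weights(word, top=1.0, step=0.1):
    """Word-to-letter weights: stronger the earlier the letter appears.
    Assumes no repeated letters."""
    return {ch: top - step * i for i, ch in enumerate(word)}


# One weight vector per word node; the letter nodes R, A, T are shared.
lexicon = {w: graded_weights(w) for w in ("rat", "art", "tar")}


def recall(word_node):
    """Activate one word node, then select and inhibit until nothing is left."""
    acts = dict(lexicon[word_node])
    out = []
    while acts:
        winner = max(acts, key=acts.get)
        out.append(winner)
        del acts[winner]  # post-output suppression
    return "".join(out)


for w in lexicon:
    print(w, "->", recall(w))
```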
4.2 This scheme still faces problems, however. On the one hand, the suppression
of responses after production makes difficult the storage of a sequence
such as prop, as the letter p has to be produced twice. The arrangement
shown in Figure 4 would simply produce pro. Conversely, there is also the
problem of the undesired reactivation of previously executed responses.
Though suppressed, they will continue to receive activation from the word
node while it is still on. If suppression decays (as it surely must), then
this top-down activation could be sufficient to reactivate items produced
early in the sequence (which receive the strongest inputs). This problem
would become worse the longer the sequence to be stored.
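The reactivation problem can be demonstrated with a toy example. The parameters below (suppression of 1.0 that halves at each step, and a linear weight gradient) are arbitrary assumptions chosen to make the effect visible; they are not taken from any published model.

```python
def recall_with_decay(weights, n_steps, suppress=1.0, decay=0.5):
    """Static top-down input plus post-output suppression that decays."""
    inhibition = {ch: 0.0 for ch in weights}
    out = []
    for _ in range(n_steps):
        net = {ch: weights[ch] - inhibition[ch] for ch in weights}
        winner = max(net, key=net.get)
        out.append(winner)
        # Older suppression fades; the item just produced is freshly suppressed.
        inhibition = {ch: v * decay for ch, v in inhibition.items()}
        inhibition[winner] = suppress
    return "".join(out)


weights = {ch: 1.0 - 0.1 * i for i, ch in enumerate("abcdef")}
print(recall_with_decay(weights, 6))             # early items intrude again
print(recall_with_decay(weights, 6, decay=1.0))  # no decay: order preserved
```

With these values the first item recovers enough activation to win the competition again at the fourth step, because its top-down input is the strongest in the sequence.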
4.3 Recent CQ models incorporate an important development aimed at solving
both of the above problems (Houghton, 1990). The basic idea is to use a
dynamical representation of serial position at the level above the items
to be sequenced. We may refer to this level in the general case as the "sequence"
or "control" level. In the Rumelhart and Norman model, no positional
information (i.e., information as to where one is in the sequence) is available
from a word node, as this comes on at the beginning of the sequence and
then stays on without varying. Sampling of the state of a word node tells
one whether the sequence that node stands for is being produced, but not
what point in its execution has been reached. Positional information can
be incorporated into these models by abandoning the assumption that activation
at the sequence ("word") level should be (i) static, and (ii)
unidimensional. That is to say, one permits a sequence to be controlled
by the activation of a set of "sequence nodes" whose state of
activation varies in some regular way during learning and execution of a
sequence (Burgess & Hitch, 1992). Formally, the activation of the sequence
level, rather than being a scalar constant, becomes a time-varying vector,
which we may refer to for convenience as the "control signal"
(Figure 5). This vector can be used to implicitly encode positional information.
During learning, different states of the signal become associated with different
response states, according to simple learning rules (Houghton, 1990). This
adds an endogenously dynamic element to these models, so that, in Lashley's
words, the input is "never into a quiescent or static system".
Figure 5
Recall in Competitive Queuing (CQ) models employing a time-varying "Control
Signal" (CS). The shaded vertical bars in the graphs represent the
degree of activation of nodes representing the letters P, R, O during recall
of the sequence prop. At each time (t = 1-4), the most active response is
selected and subsequently inhibited. The shaded horizontal bar represents
the control signal, conceived of as a time-varying vector. The shading of
the bar represents the "degree of activation" (magnitude) of components
of the control signal vector. Darker shading represents more activation.
Different responses are associated with different states of the control
signal, and repeated items may be associated with more than one state. This
is shown by the arrowed lines from the CS to the letters. The crucial factor
in generating the characteristic CQ activation gradient (responses being
more active the sooner they are to be produced) is that the state of the
control signal changes smoothly and monotonically, being more similar to
itself at closer positions in time.
4.4 With this mechanism, a sequence such as PROP can be stored because
the two occurrences of the letter P are associated with different states
of the control signal. This is illustrated in Figure 5. Here the control
signal is represented schematically by the shaded horizontal bar, the degree
of shading representing degree of activation, and the change in shading
representing changing activation. The arrows from the control signal to
the letters represent associations between particular components of the
signal and those letters. The P node is activated twice, the first occurrence
activated by the "start state" of the signal, and the second by
the signal moving towards its end state. The same mechanism solves the problem
of the undesired reactivation of responses. If the control signal changes
quickly enough, or a response node discriminates sufficiently between successive
states of the signal, then a suppressed response will not be reactivated
because the evolving state of the control signal will soon cease to be strongly
associated with a response once its target "position" has passed.
For instance, in Figure 5, the end state of the control signal, which reactivates
the P, is not strongly associated with the R, which remains suppressed.
If the target word were, say, PROD, then the P would not be associated with
the end state and would not be reactivated.
4.5 An important constraint on the form of the control signal is that it
should be temporally correlated (more similar to itself at nearer points
in time) in order to produce the characteristic CQ activation gradient,
whereby responses are more active the sooner they are to be produced. If
the CS has this property, then any given state will partially activate responses
associated with similar states, and these responses will be ones that occurred
at similar times or positions. This is indicated in Figure 5 where the shaded
bar changes gradually.
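The temporal correlation constraint can be checked numerically for one simple candidate signal. In this sketch the signal is assumed to consist of two components, one decaying geometrically from the start of the sequence and one rising geometrically towards its end; the decay constant 0.6 and the use of cosine similarity are our choices for illustration.

```python
import math


def control_state(t, length, d=0.6):
    """Assumed two-component signal: falling 'start', rising 'end'."""
    return (d ** t, d ** (length - 1 - t))


def cosine(u, v):
    """Cosine similarity between two 2-D states."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


length = 6
first = control_state(0, length)
for t in range(length):
    print(t, round(cosine(first, control_state(t, length)), 3))
```

Similarity to the initial state falls steadily with distance in time, which is the property needed for the current state to activate upcoming responses in proportion to their proximity.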
4.6 How complex does the control signal need to be? In the absence of additional
specific constraints, it seems desirable to investigate the properties of
the simplest that are likely to work. Work by Houghton and colleagues (Houghton,
1990; Houghton et al., 1994) has shown that a control signal generated by
two nodes, one starting with high activation and then falling (a "start"
node), and the other starting with low activation and then increasing (an
"end" node), can encode sequences of lengths up to around seven
or eight items, including ones with repeated items. Houghton et al. (1994)
use this form of control signal in a model of lexical spelling. The signal
is smoothly correlated in time, generating an activation gradient similar
to that of the Rumelhart and Norman model, but without the use of sequence
specific lateral connections. These models show that it is not necessary
to use a "discrete" positional representation, i.e., one in which
specific nodes represent specific positions, though such a representation
is compatible with the general approach. Burgess and Hitch (1992) use a
more powerful "distributed" representation of position (referred
to as the context) in which individual nodes represent more than one position,
and each position is represented by more than one node. This model is especially
effective for single-trial serial order learning, but the complexity of
the control signal raises the question of its origin.
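A two-node control signal of the kind described in 4.6 can be put together with one-pass associative learning and competitive output in a short sketch. It should be read as an illustration under stated assumptions rather than as the model of Houghton (1990): the geometric form of the start and end nodes, the normalisation of each signal state before it is added to the weights, and the suppression parameters are all choices made here.

```python
import math


def control_signal(t, length, d=0.6):
    """Start node falls and end node rises over the sequence (assumed form)."""
    return (d ** t, d ** (length - 1 - t))


def learn(word):
    """Single pass: add the normalised signal state to the weights of the
    letter active at each position. A repeated letter accumulates two states."""
    weights = {}
    for t, ch in enumerate(word):
        s, e = control_signal(t, len(word))
        norm = math.hypot(s, e)
        ws, we = weights.get(ch, (0.0, 0.0))
        weights[ch] = (ws + s / norm, we + e / norm)
    return weights


def recall(weights, length, suppress=2.0, decay=0.5):
    """Replay the signal; at each step the most active letter wins and is
    then suppressed, with suppression decaying over later steps."""
    inhibition = {ch: 0.0 for ch in weights}
    out = []
    for t in range(length):
        s, e = control_signal(t, length)
        net = {ch: w[0] * s + w[1] * e - inhibition[ch]
               for ch, w in weights.items()}
        winner = max(net, key=net.get)
        out.append(winner)
        inhibition = {ch: v * decay for ch, v in inhibition.items()}
        inhibition[winner] = suppress
    return "".join(out)


for word in ("prop", "prod", "trap", "part"):
    print(word, "->", recall(learn(word), len(word)))
```

In this sketch the P of prop is reselected at the final position because its weights include the end state of the signal, whereas the P of prod is not, since it is associated only with the start state.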
4.7 The idea that serial behaviour might depend on the existence of such
internal dynamics relates to similar ideas in models of time perception,
in which the endogenous activation (the internal "clock") is typically
generated by oscillators (Church & Broadbent, 1990; Treisman, Cook,
Naish, & McCrone, 1994). Similarly, motor sequencing in many species
has been found to depend on the generation of repeating patterns of activity
by groups of neurons known as "central pattern generators" (Pearson,
1993). The neurons in the pattern generators are distinct from the motoneurons
controlling individual responses. Recent work by Burgess, Hitch and colleagues
has suggested that the internal signals required for learning and recall
in their short-term memory model might be composed of oscillators entrainable
to the rhythmic characteristics of the input sequence (Hitch, Burgess, Towse
& Culpin, 1995; Hitch, Burgess, Shapiro, Culpin & Malloch, 1995).
4.8 Such mechanisms have been applied in a number of domains including
speech production (Houghton, 1990; Hartley & Houghton, in press), auditory-verbal
short-term memory (Burgess & Hitch, 1992; Burgess, 1995; Glasspool,
1995; Hitch, Burgess, Towse & Culpin, 1995; Houghton, Hartley &
Glasspool, in press), speech errors in immediate nonword recall (Hartley
& Houghton, in press), and serial order errors in spelling (Houghton,
Glasspool & Shallice, 1994; Shallice, Glasspool & Houghton, in press).
Full review of this work is beyond the scope of the current paper. Instead
we will concentrate on two issues covered in this work which provide important
sources of constraint on serial order models. The first involves the problem
of single-trial serial learning, the second the importance of error data
in recall.
Serial Order and Short-Term Memory
4.9 Experimental studies of short-term memory have frequently employed serial
recall tasks, where subjects are required to reproduce, in correct order,
an unfamiliar sequence of familiar items, e.g., digits. If the lists are
not too long, the most common type of error in such studies is the misordering
of the items in the list. It is thus the novelty of the sequence, rather
than its content, which seems to test short-term memory. If a well-known
sequence, such as the days of the week, is presented then such errors will
be much less likely. The crucial factor in such studies is that the sequence
information must be encoded on-line, in a single trial. This is especially
important in serial recall of nonwords (e.g., Treiman & Danis, 1988),
in which not only the order of the items must be learned on-line, but also
the order of the phonological units making up the nonwords. Although in
the past nonword recall has frequently been cited as a prime example of
experimental tasks which are ecologically bizarre, recent discoveries have
shown nonword repetition ability to be causally related to capacity for
long term phonological learning, an important component of vocabulary acquisition
and language learning generally (see Gathercole & Baddeley, 1989, 1993;
Gathercole & Martin, in press; Service, 1992; Papagno & Vallar,
1995; Baddeley, Gathercole, Bishop & Papagno, submitted).
4.10 Such data show that the ability to rapidly encode serial order is of
considerable importance for human development (at least in speech, though
we suspect the same may apply to the imitation of movement more generally).
In addition to its ecological significance, rapid learning offers a powerful
constraint on theories of serial learning. For instance, the recurrent sequential
networks developed by Jordan (1986) and Elman (1990) require the use of
iterative, supervised learning procedures, and hence are unable to effect
single-trial, unsupervised, learning. CQ models generally learn using unsupervised
"Hebbian" learning, i.e., weights between nodes at the sequence
and item levels are adjusted as a function of their co-activation. Such
learning can take place on-line. Thus associative learning models for short
term memory can be developed which do not rely on inter-item chaining.
4.11 A number of models of STM with this character have been developed (Burgess
& Hitch, 1992; Burgess, 1995; Glasspool, 1995; Grossberg, 1978; Hartley
& Houghton, in press; Henson et al., in press). Although differing in
various respects, all these models learn rapidly without forming inter-item
links, and achieve sequencing through parallel response activation and competition
for output. They are thus all prone to error types such as immediate transpositions
and others which, while commonplace in human data, cause serious problems
for non-queueing models (Henson et al., in press). Other characteristics
of serial recall such as bowed serial position curves (primacy and recency
effects) have also been shown to emerge from such models (Burgess, 1995).
4.12 These models have been specifically applied to auditory-verbal STM,
and the question arises whether they might apply to the rapid learning of
other kinds of action sequence. It is important to note that verbal recall
may have a number of idiosyncratic properties related to the nature of the
to-be-retained materials. For instance, verbal recall shows word length
effects (Baddeley, Thomson & Buchanan, 1975; Cowan, Day, Saults, Keller,
Johnson & Flores, 1992), effects of phonetic confusability (Conrad &
Hull, 1964; Baddeley, 1968), and effects of the lexical status (word/nonword)
of the list items (Hulme, Maughan & Brown, 1991; Treiman & Danis,
1988). However, in the models mentioned above, such effects are typically
due to factors other than the competitive queueing dynamics underlying serial
ordering. As such they can be treated separately from the basic issue of
ordering, and the production of movement sequences could be studied for
the presence of such basic features as the preponderance of order errors
and the serial position curve. A study by Wilberg (1990) provides some evidence
that memory for other kinds of action sequences may indeed depend on similar
principles to auditory-verbal memory. Although Wilberg's study used free-recall
of movement lists (as opposed to serial recall), he found strong evidence
for typical order effects in free recall, including primacy and recency
effects. Wilberg concludes that his results "suggest that memory for
movement and memory for words are not substantially different." Further
studies of this type, particularly using serial recall and involving detailed
analysis of errors, would be highly instructive for the issue of the general
foundations of motor sequencing.
Errors in Serial Recall: The Problem of Repetition
4.13 As has been repeatedly emphasized, the analysis of error data has been
especially important in motivating the type of model discussed above (MacKay,
1970, 1972, 1987). We further illustrate this point with a particularly
puzzling error which Lashley noted: misplaced repetitions.
4.14 The use of response inhibition in CQ models is necessary for them to
function properly and is central to the account they provide of transposition
data. However, it leads to an obvious problem: How can an action be immediately
repeated, e.g., typing the letter "p" in the word "supper"?
If the action is suppressed after being produced, then, in a CQ context,
the next most active response will be generated. Thus "supper"
would be typed "super". This problem can be overcome if one postulates
that response repetition is a special "mode", which is entered
into only occasionally, i.e., the default assumption in behaviour is that
successive actions will be different from each other, and that perseveration
must be avoided. A similar assumption appears to be built into the movement
of attention, leading to an "inhibition of return" (IOR) effect
(Posner & Cohen, 1984), whereby attention is slower in returning to
a recently attended location than in moving to a new one.
4.15 This basic assumption is clearly part of the competitive queueing architecture.
The hypothesized "repetition mode" acts in some way to prevent
the usual response inhibition from taking place, allowing a given response
to be repeated while the mode is active. Repeated letters can be produced
if the mode is invoked at the appropriate point in production, and only
remains briefly active. Essentially this proposal is made by Rumelhart &
Norman (1982) in their typing model. Any letter to be doubled in a given
word is associated with a "doubling schema", which ideally becomes
active when the letter wins the output competition. This temporarily disables
the usual inhibitory feedback, allowing the letter to be repeated (in the
absence of inhibition, it remains the most active response).
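The logic of the doubling schema can be sketched as a small modification of the select-and-inhibit loop. The representation used here is an assumption made for illustration: the word is stored as its letters without the repeat, together with a tag naming the letter to be doubled, and the schema simply skips the inhibitory step once.

```python
def recall_with_doubling(letters, doubled=None):
    """letters: target order with the repeat removed (e.g. 'super');
    doubled: the letter tagged with the doubling schema (e.g. 'p')."""
    acts = {ch: 1.0 - 0.1 * i for i, ch in enumerate(letters)}
    out = []
    schema_armed = doubled is not None
    while acts:
        winner = max(acts, key=acts.get)
        out.append(winner)
        if schema_armed and winner == doubled:
            # The schema blocks inhibitory feedback once, so the same
            # letter is still the most active on the next cycle.
            schema_armed = False
            continue
        del acts[winner]  # normal post-output suppression
    return "".join(out)


print(recall_with_doubling("super", "p"))
print(recall_with_doubling("super"))
```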
4.16 This may appear a rather ad hoc solution to a problem that one would
prefer not to have in the first place. However, the doubling schema idea
has empirical consequences. Errors occur in the R&N model due to noise
in activation levels. The doubling schema, like letter nodes generally,
is itself subject to noise, and it will occasionally become active slightly
before or after the appropriate point. This leads to the wrong letter, generally
a flanker of the target letter, being doubled, e.g., "supper"
-> "suuper". It turns out that such errors are commonplace
in typing (readers will undoubtedly find them in their own typing), and
they generally involve letters adjacent to the target letter. Similar problems
are occasionally found in handwriting, but are less common (possibly due
to the generally slower pace of handwriting). However, they have been found
to occur in subjects with an acquired neurological disorder known as graphemic
buffer disorder (Caramazza & Miceli, 1990), who show specific and comparable
impairments in the spelling of both words and nonwords. Houghton, Glasspool
& Shallice (1994) model lexical spelling with a learning CQ model based
on that of Houghton (1990), but incorporating Rumelhart & Norman's doubling
schema to allow the model to learn words such as "supper". The
graphemic buffer disorder is modeled by the addition of debilitating amounts
of noise to the letter nodes activated when a word is to be spelt. Addition
of noise to the doubling schema leads to spelling errors involving misplaced
double letters.
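The consequence of mistiming the doubling schema can be shown with a variant of the same loop in which the schema fires at a particular output step rather than on a particular letter. The `offset` parameter is a hypothetical stand-in for the effect of noise on the schema's time of activation; it is set by hand here rather than sampled.

```python
def recall_with_schema_timing(letters, doubled, offset=0):
    """letters: target order with the repeat removed (e.g. 'super').
    The schema fires `offset` steps away from the step at which `doubled`
    should win, and blocks inhibition of whatever letter has just won."""
    acts = {ch: 1.0 - 0.1 * i for i, ch in enumerate(letters)}
    fire_at = letters.index(doubled) + offset
    out, step, fired = [], 0, False
    while acts:
        winner = max(acts, key=acts.get)
        out.append(winner)
        if step == fire_at and not fired:
            fired = True      # inhibition skipped once: winner repeats
        else:
            del acts[winner]  # normal post-output suppression
        step += 1
    return "".join(out)


for offset in (-1, 0, 1):
    print(offset, recall_with_schema_timing("super", "p", offset))
```

A schema that fires one step early or late doubles a flanking letter instead of the target, giving the misplaced doublings described above.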
4.17 Thus the implicit postulate of CQ models that behaviour has a built-in
tendency not to repeat itself leads to the requirement for a specific
behavioural "mode" when repetition is required - in this mode
the mechanisms which normally keep behaviour "moving forward"
are suppressed. When sequences with repeats are learned, the point at which
this mode must be entered has to be encoded. Errors in retrieval of this
point during recall can lead to the wrong action being repeated. An additional
prediction derivable from this idea concerns leaving the repetition mode.
Clearly, if the mode is not turned off, then the dominant action will continue
to be repeated. It has been found in handwriting and typing that letters
which should be doubled are sometimes tripled (Ellis, 1979), indicating
that the repetition mode has not been turned off sufficiently quickly. Ellis,
Young and Flude (1987) report (handwritten) spelling errors made by an acquired
dysgraphic patient, the majority of whose addition errors involved producing
too many copies of doubled letters, e.g., ladder -> laddder, chilly ->
chilllly. Venneri, Cubelli, and Caffarra (1994) report a similar case of
the handwriting of an Italian dysgraphic patient who only produced letter
perseverations in words containing a doubled letter. In most cases the perseveration
consisted of tripling a doubled letter, though repetitions up to 6 letters
long are reported. Few perseverations of single letters occurred, and all
but one were found in words which contained a doubled letter elsewhere in
the word (e.g., parallelo -> parallello). Such data support the idea
that doubling involves a "repetition" mode, which results in perseveration
if not terminated.
5. Serial Order and the Origins of Grammar
"This is the essential problem of serial order: the existence
of generalized schemata of action which determine the sequence of specific
acts, acts which in themselves or in their associations seem to have no
temporal valence" (Lashley, 1951, p. 122).
5.1 In all the models and data considered so far, the items being sequenced
are individual responses which are directly activated by input from the
"sequence level", be it a steady state input such as in the Rumelhart
and Norman model, or a time-varying one. If more than one sequence is stored,
then each has its own dedicated sequence node(s) activating the appropriate
responses (Figure 4). But, as Lashley emphasized, many individual action
sequences appear to be exemplars of a more general "schema for action".
This brings us to the issue of grammar (or syntax), the representation of
generalized sequential patterns, whose individual components can vary. Thus
for instance, in English the word order of many simple noun phrases (NP)
may be specified by the phrase structure rule: NP -> det adj noun, where
det = determiner, adj = adjective, -> = "is realized as",
and the left-to-right order of the symbols following the arrow represents
serial order. The crucial difference between such a representation and anything
considered above is that the ordered items (det etc.) are variables, rather
than specific responses such as individual words. These variables range
over particular classes of word (or "lexical item"); for instance
determiner may be realized as "the", "a", etc., adjective
as "big", "small", etc., noun as "girl", "boy"
etc. The rule given above can be used to generate or describe numerous sequences
of words by instantiating each variable by a word from the appropriate class,
e.g., the small boy, the big meeting, a dainty biscuit etc.
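The difference between a rule over variables and a stored sequence of specific words can be made explicit in code. The tiny lexicon below is an assumption introduced for the example; the rule itself is the one given in the text.

```python
from itertools import product

RULE = ("det", "adj", "noun")  # NP -> det adj noun
LEXICON = {
    "det": ["the", "a"],
    "adj": ["big", "small"],
    "noun": ["girl", "boy"],
}


def expand(rule, lexicon):
    """Instantiate each variable, in order, with every word of its class."""
    return [" ".join(words) for words in product(*(lexicon[c] for c in rule))]


for phrase in expand(RULE, LEXICON):
    print(phrase)
```

One three-symbol rule and six words yield eight distinct phrases, and every addition to a word class multiplies the number of sequences the same rule can describe.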
5.2 It seems impossible to account for the productivity of language use
without recourse to some form of "schema of order" which is not
defined (solely) in terms of specific words. Nonetheless, it has been argued
(e.g., Ellis, submitted) that knowledge of language cannot be properly captured
solely in terms of such abstract schemata either, and that native speakers
routinely employ numerous preconstructed phrases (idioms etc.) in which
specific words are specified, for instance, "How are you?", "A
stitch in time saves nine", "You could have knocked me down with
a feather". Other expressions contain mixtures of words and grammatical
variables. For instance, Ellis (submitted) gives the example "NP be-tense
sorry to keep-tense you waiting", where the italicized items are variables.
The schema can be realized as "I'm sorry to have kept you waiting",
"Mr. Brown was sorry to keep you waiting", etc. Our impression
is that "idiomatic" command of a language may depend almost as
much on knowledge of such formulae and the conditions of their use as it
does on the kind of abstract, generative, knowledge studied in linguistics.
5.3 Such examples suggest a "productivity" continuum, with clichés,
proverbs and other formulae at one end, and fully "creative" language
use (e.g., poetry) at the other. Most ordinary language use appears to fall
somewhere in between, suggesting a cost-benefit trade-off. The benefit of
encoding specific words is presumably that retrieval of a prespecified word
or phrase is a simpler (faster) operation than choosing a grammatical schema
and then filling it out through lexical selection. The benefit of using
variables is that the same schema can be used in different situations, with
the variables instantiated appropriately. The alternative of only storing
fully instantiated word sequences would impose heavy memory costs (each
sequence being stored separately), and lead to a loss of adaptability. Thus
one can argue that "knowledge of language" trades off speed/flexibility
benefits against space/time costs. This perspective requires that language
learning involves the retention in memory of verbatim sequences of words,
for the acquisition of idioms and formulae, and suggests that the further
development of abstract schemata is also based on this learning. Long term
retention of formulae requires the short term retention of examples of them,
and hence acquisition of "grammar" may depend on the integrity
of verbal short-term memory (as argued for instance by Speidel, 1993; Baddeley
et al., submitted; Ellis, submitted). This reinforces the point made earlier
of the importance of the development of models for the rapid learning of
serial order.
5.4 Is there any evidence to indicate that the kinds of sequencing principles
we have discussed so far can be profitably extended to domains involving
grammatical (variable-based) ordering? Evidence from lexical level speech
errors indicates that they can (Cutler, 1982; Dell, 1986). For instance,
just as letter transpositions occur in typing, whole word transpositions
occur in speech. Garrett (1976) gives examples such as: "...
but a beach on the bikini is all right" ("beach" and "bikini"
exchanged); and "It waits to pay" ("pay" and "wait"
exchanged, with morphological normalization). A priori possible errors such
as "It waits pay to", or "It tos wait pay" are never
found. Similar conclusions may be drawn from these cases as were drawn from
the discussion of exchange errors in typing, viz. words are active before
being produced, appear to be inhibited following production (not reappearing
at the appropriate position), and remain active if they have been omitted.
The crucial additional factor in these examples is that what exchanges with
what is constrained by the grammatical form of the intended utterance. In
the first of the above examples, two nouns are exchanged, in the second
two verbs. The result of this is that the intended grammatical form of the
utterance is maintained even though the order of words is altered. This
is particularly clear in the second example, "It pays to wait"
-> "It waits to pay", where the misplaced verbs have adopted
the appropriate morphological forms.
5.5 The addition of these grammatical constraints appears to have the consequence
that concurrently active words do not all compete on an equal footing at
any given position. Rather the competition is largely confined to those
words which grammatically match the current target word (MacKay, 1987).
At a "noun" position only nouns compete, at a "verb"
position only verbs, and so on. This indicates that the grammatical variables
"noun", "verb" etc., are entities over which serial
order is defined. They may be thought of loosely as being associated with
"slots" in an utterance, with the slots being activated in sequence.
As each slot becomes active, it selects a lexical item from those currently
active to fill it. The lexical item must be of the appropriate type for
the slot, but if more than one such item is active then they compete to
occupy the slot, possibly on the basis of their activation level.
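The slot-filling idea in 5.5 can be illustrated with a minimal sketch. It is not taken from any of the cited models: the function name, the activation values, and the treatment of post-output suppression as simple removal are all assumptions made for illustration. Slots are visited in sequence, and each slot picks the most active remaining word of the matching class.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str]  # (word class, word form)


def produce(slots: List[str], activations: Dict[Word, float]) -> List[str]:
    """Fill each slot, in order, with the most active unused word of its class."""
    active = dict(activations)  # copy, so the caller's plan is not modified
    output: List[str] = []
    for slot in slots:
        # Only words whose class matches the current slot may compete.
        candidates = [w for w in active if w[0] == slot]
        if not candidates:
            continue  # nothing of the right class is active; leave a gap
        winner = max(candidates, key=lambda w: active[w])
        output.append(winner[1])
        # Post-output suppression, modelled crudely as removal (assumption).
        del active[winner]
    return output


SLOTS = ["det", "noun", "prep", "det", "noun"]

# Hypothetical activation values: earlier words are planned to be more active.
INTENDED = {
    ("det", "a"): 1.0,
    ("noun", "bikini"): 0.9,
    ("prep", "on"): 0.8,
    ("det", "the"): 0.7,
    ("noun", "beach"): 0.6,
}

# A slip: the later noun is momentarily more active than the intended one.
SLIP = dict(INTENDED)
SLIP[("noun", "beach")] = 0.95

print(" ".join(produce(SLOTS, INTENDED)))  # a bikini on the beach
print(" ".join(produce(SLOTS, SLIP)))      # a beach on the bikini
```

In the slip case the omitted noun remains active and fills the next noun slot, so the two nouns exchange while the determiners and the preposition stay in place, as in Garrett's example.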
5.6 Compare this situation with the typical verbal STM task, in which there
is no grammatical structure in the stimulus lists and all stimulus items
tend to be of the same sort, letter names, digits, nouns or whatever. No
position in the list therefore has any distinguishing features associated
with it. In this case, CQ models predict that the output competition will
be solely based on item activation, and that items will be more active the
sooner they are to be produced (Burgess, 1995). The strongest competitor
to a given item at a given position will therefore be its intended successor,
and this is thus the item most likely to be involved in a transposition
error (Henson et al., in press). If we add in constraints of a grammatical
nature, then the strongest competitor to, say, a given noun will be the
noun planned to appear in the next noun slot. In a sense, grammatically
constrained errors of the type illustrated still involve "immediate"
neighbours, if neighbours are defined to be of the same grammatical class.
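The prediction in 5.6 can be explored with a small simulation. This is a sketch under strong simplifying assumptions, not an implementation of any cited model: the activation gradient is static and linear, noise is Gaussian, suppression is modelled as removal, and all parameter values are arbitrary.

```python
import random
from collections import Counter


def recall_list(n_items, rng, step=0.1, noise_sd=0.05):
    """Recall items 0..n_items-1 by repeated noisy competition."""
    # Items due sooner are more active (assumed linear gradient).
    base = {i: 1.0 - step * i for i in range(n_items)}
    output = []
    while base:
        noisy = {i: a + rng.gauss(0.0, noise_sd) for i, a in base.items()}
        winner = max(noisy, key=noisy.get)  # most active item is produced
        output.append(winner)
        del base[winner]  # suppression after output, modelled as removal
    return output


def displacement_counts(n_trials=2000, n_items=6, seed=0):
    """Count how far misplaced items end up from their intended position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        for position, item in enumerate(recall_list(n_items, rng)):
            if item != position:
                counts[abs(item - position)] += 1
    return counts


if __name__ == "__main__":
    for distance, n in sorted(displacement_counts().items()):
        print(distance, n)
```

With these settings, misplacements by one position should greatly outnumber those by two or more, because the nearest competitor to an item is its intended successor. Adding a class constraint of the kind described in 5.5 would redefine "nearest" as the next item of the same class.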
5.7 Models of the type discussed in this paper, involving parallel retrieval,
response competition, post-output suppression etc., have been proposed in
the domain of syntax in utterance production (e.g., Dell & Reich, 1981;
MacKay, 1987). For instance, MacKay's diagrammatic model postulates the existence
of grammar nodes which, when activated, equally activate a set of word class
nodes in parallel, e.g., a particular "noun phrase" node might
activate ("prime" in MacKay's terminology <1>)
nodes det, adj, and noun. Like Rumelhart and Norman, MacKay uses the Estes
(1972) solution to generate an activation gradient over these word class
nodes - so the det node inhibits adj and noun, and adj inhibits noun. Firing
of word class nodes follows the "most primed wins" principle,
with nodes being inhibited after firing. Word class nodes are connected
to nodes representing all words in the class, so that, for instance, the
adj node is connected to all adjectives. Its firing alone therefore does
not pick out any particular adjective to produce. This is achieved by semantic
input. When a noun phrase having the target structure (e.g., the black dog)
is to be produced, it is postulated that, in parallel with the activation
of the word class nodes, semantic nodes (representing the meaning of the
noun phrase) directly activate words which express that meaning. This typically
leads to more than one word being active (or "primed"), without,
in itself, specifying their order. Utterance production proceeds by the
combination of the semantic activation of specific lexical items (content)
and nonspecific input from the sequential firing of the lexical category
nodes (structure).
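The division of labour described in 5.7 can be sketched as follows. This is a loose illustration rather than MacKay's model: the lexicon, the priming values, the inhibition strength, and the function names are hypothetical, the priming/activation distinction is ignored, and each word class is assumed to occur only once in the phrase.

```python
LEXICON = {
    "det": ["the", "a"],
    "adj": ["black", "small"],
    "noun": ["dog", "cat"],
}


def class_gradient(classes, prime=1.0, inhibition=0.2):
    """Estes-style gradient: each class node inhibits all nodes after it."""
    return {c: prime - inhibition * i for i, c in enumerate(classes)}


def produce_phrase(classes, semantic_priming):
    """Combine sequential class firing (structure) with semantic input (content)."""
    class_act = class_gradient(classes)
    word_act = dict(semantic_priming)
    output = []
    while class_act:
        fired = max(class_act, key=class_act.get)  # "most primed wins"
        del class_act[fired]                       # inhibited after firing
        # The fired class node sends the same input to every word in its class,
        # so only the semantic priming distinguishes the candidates.
        boosted = {w: word_act.get(w, 0.0) + 1.0 for w in LEXICON[fired]}
        winner = max(boosted, key=boosted.get)
        output.append(winner)
        word_act[winner] = 0.0                     # word suppressed after output
    return output


# Semantic input primes the words in parallel, with no order information.
meaning = {"dog": 0.5, "black": 0.5, "the": 0.5}
print(produce_phrase(["det", "adj", "noun"], meaning))  # ['the', 'black', 'dog']
```

The semantic input alone leaves the three words equally primed; order emerges only from the sequential firing of the class nodes.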
5.8 Natural language syntax shows considerable complexity, and is clearly
not exhausted by the specification of word order. Development of schema
based sequencing within the present framework might therefore benefit from
considering somewhat simpler examples. In our own work (Hartley & Houghton,
in press), we have looked at another example of "grammatical"
constraint in language production, the order of phonemes in syllables. Many
linguistic studies have shown that syllable structure is universally constrained
according to a number of principles, including the "sonority principle"
(Selkirk, 1984), and the "resolvability principle" (Greenberg,
1978; Hjelmslev, 1936; see Houghton, Hartley, & Glasspool, in press,
for discussion). In addition, particular languages show idiosyncratic constraints.
For instance, German allows the syllable initial consonant cluster /shl/,
whereas English does not, even though it permits both phonemes to occur
in those positions otherwise (cf. shrink, sleep). Taken together, such constraints
have the effect that only a small proportion of the sequences definable
over the phonemes of a language actually occurs. For instance, Houghton
et al. (in press) estimate that of the a priori possible English initial
consonant clusters (including singletons), only about 0.43% actually occur
(this estimate is based on figures from Greenberg, 1975, and excludes clusters
containing a repeated phoneme).
5.9 Phonological speech errors involving phonemes, such as "barn door"
-> "darn boor" (Baars, Motley, & MacKay, 1975), conform
strongly to these syllabic constraints. The universal constraints are effectively
never violated in phonemic speech errors, and the syllabic structure of
planned syllables tends to be maintained, even though individual phonemes
may be substituted (Dell, 1986; Ellis, 1980; Treiman & Danis, 1988).
In the area of spontaneous speech production, studies of such error data
have led to the development of models such as those of Shattuck-Hufnagel
(1979), and Dell (1986, 1988). Although these models differ in many important
respects, they share the central features that (i) syllables are a fundamental
unit of speech planning, and (ii) the structure and content of syllables
are separately represented. The content of a syllable can be represented
as a set of phonemes. The structure may be represented, as suggested above
for grammatical structure, by a set of slots, each of which can only be
filled by a subset of phonemes. In a typical syllable, the initial and final
slots will be for consonants and the middle slot(s) for vowels. According
to the sonority principle, the consonant slots nearer to the vowel are occupied
by more sonorous consonants, such as liquids and nasals.
5.10 Hartley & Houghton (in press) develop a model of short-term memory
for nonwords based on this idea (combined with the general principles of
competitive queueing). Nonword recall was chosen for a number of reasons.
All the words a speaker knows were effectively nonwords on first hearing
them, and repetition and rehearsal require single-trial phonological learning.
As noted, recent studies of phonological short-term memory have shown the
importance of such abilities for language acquisition (see Baddeley et al.,
submitted, for review). Work by Gathercole and colleagues has led them to
the conclusion that nonword recall is a more sensitive test of phonological
STM than recall of word lists. For instance Gathercole and Baddeley (1993;
p. 48) state that, "[P]erformance on immediate memory tasks can reflect
the contribution of long-term memory knowledge as well as short-term memory
processes...We therefore expect to gain a more sensitive measure of phonological
memory skills by using memory items for which there are no long-term lexical
representations, because subjects will be less able to use lexical knowledge
to supplement phonological short-term memory." In addition, work by
Treiman and colleagues (Treiman & Danis, 1988; Treiman, in press) has
shown that phonological errors in the recall of nonwords are much more frequent
than for words, but are constrained in the same way by principles of syllable
structure. Hartley and Houghton (in press) propose that the capacity for
single-trial phonological encoding exploits existing knowledge of such structure.
5.11 In the Hartley and Houghton model (Figure 6), incoming verbal stimuli
are parsed into syllables. When a new syllable is to be learned an onset/rhyme
node pair is activated. As each phoneme in the syllable arrives it activates
a different slot in a generalized syllable "template", using long-term
associative knowledge. Connections from the activated onset/rhyme nodes
to the phoneme and template nodes are strengthened by a Hebbian weight change
rule. The connections to the phoneme nodes learn what phonemes occur in
the syllable (phonemic content), while the connections to the syllable template
learn which positions are used (syllabic structure). Figure 6 shows the
representation of the syllable /rat/. Recall of the syllable involves both
recovery of its constituent phonemes and the serial reactivation of the
syllable slots activated by those phonemes during learning. Phoneme nodes
therefore receive input from both onset/rhyme nodes and the syllable template.
As each syllable "slot" becomes active, phonemes compete for selection
for output. However, the competition is biased strongly in favour of those
phonemes which "fit" the currently active syllable slot. If the
model is recalling a series of syllables, the strongest competitors for
a given slot will be phonemes from upcoming syllables (already active, due
to general CQ principles) which occur in the same respective position. Thus
errors tend to involve movement of a phoneme from one syllable to the same
position in another. The output of the model is tested in detail against
data from short term memory experiments (Treiman & Danis, 1988; Treiman,
in press), and the nonword repetition of both children (Gathercole et al.,
1991) and neurologically impaired subjects (Bisiacchi et al., 1989).
Figure 6
Phonological representation of a syllable (/rat/) in the Hartley and Houghton
(1995) model. Not all connections or nodes are shown. Strings of input phonemes
are divided up into syllables (syllable group). Syllables are represented
in terms of their phonemic content (phoneme group) and the "slots"
they use in a generalised syllable template. Syllable group nodes are composed
of pairs of onset and rhyme nodes. The solid lines represent temporary weights,
formed during rapid learning (short-term memory). The dashed lines are permanent
connections (long-term memory). Syllable structure and content are separately
represented, but interact during recall. Key: On = onset, Ry = rhyme,
C = consonant, V = vowel, SB = syllable boundary.
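The learning and recall scheme of 5.11 can be illustrated with a heavily reduced sketch. It is not the Hartley and Houghton implementation: the three-slot template, the vowel set, the unit activations, the fit bias, and the assumption that the onset node drives the onset slot and the rhyme node the remaining slots are all simplifications introduced here. Repeated phonemes within a syllable are not handled.

```python
from collections import defaultdict

VOWELS = set("aeiou")
TEMPLATE = ["C_onset", "V", "C_coda"]  # drastically reduced template (assumption)


def slot_for(phoneme, seen_vowel):
    """Stand-in for long-term knowledge: the slot an incoming phoneme activates."""
    if phoneme in VOWELS:
        return "V"
    return "C_coda" if seen_vowel else "C_onset"


def fits(phoneme, slot):
    """Vowels fit the vowel slot; consonants fit the consonant slots."""
    return (phoneme in VOWELS) == (slot == "V")


def learn(syllable, rate=1.0):
    """One-trial Hebbian learning of content and structure weights."""
    content = defaultdict(float)    # (node, phoneme) -> weight
    structure = defaultdict(float)  # (node, slot) -> weight
    seen_vowel = False
    for ph in syllable:
        slot = slot_for(ph, seen_vowel)
        seen_vowel = seen_vowel or ph in VOWELS
        node = "onset" if slot == "C_onset" else "rhyme"
        # Hebbian rule: weight change = rate * pre * post, both active at 1.0.
        content[(node, ph)] += rate * 1.0 * 1.0
        structure[(node, slot)] += rate * 1.0 * 1.0
    return content, structure


def recall_syllable(content, structure, fit_bias=1.0):
    """Cycle through the template; phonemes compete for each used slot."""
    phonemes = {ph for (_, ph) in content}
    suppressed = set()
    out = []
    for slot in TEMPLATE:
        node = "onset" if slot == "C_onset" else "rhyme"
        if structure.get((node, slot), 0.0) == 0.0:
            continue  # this slot was not used when the syllable was learned
        scores = {
            ph: content.get((node, ph), 0.0) + (fit_bias if fits(ph, slot) else 0.0)
            for ph in phonemes
            if ph not in suppressed
        }
        winner = max(scores, key=scores.get)
        out.append(winner)
        suppressed.add(winner)  # inhibition after output
    return "".join(out)


print(recall_syllable(*learn("rat")))  # rat
```

Each phoneme's input is the sum of a content term and a structural fit term, so a phoneme that belongs to the syllable but does not fit the current slot loses the competition to one that does.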
5.12 In this model, the serial order of phonemes during recall is governed
by the cyclical activity of the syllable template. As in the CQ models discussed
above, this template is formally a time-varying vector, and acts as a kind
of control signal. However, the template does not lead to the activation
of specific responses, as in Houghton (1990). Instead, its states are associated
with whole classes of responses (phonemes). Which of the set of possible
phonemes is to be produced is specified by a separate "content"
input. This factoring of serial order information into a separate system
means that the endogenous dynamical signals used in such CQ models as Houghton
(1990) and Houghton et al. (1994) do not have to be repeatedly represented
for every sequence learnt. This represents a considerable simplification,
and is only possible in cases where the set of sequences to be learnt conforms
to some underlying pattern which can be abstracted; in other words, where
there is a grammar.
5.13 Whether such principles can be extended to sequencing outside the linguistic
domain depends on whether other forms of action sequencing are susceptible
to grammatical analysis, i.e., whether particular sequences can be seen
as instantiations of an underlying schema defined, at least in part, in
terms of variables. Error data from slips of action in normal subjects (Norman,
1981; Reason, 1984), and acquired disorders of action planning (e.g., Lhermitte,
1983; Luria, 1973; Schwartz, Reed, Montgomery, Palmer, & Mayer, 1991)
support the idea that the kinds of sequencing principles we have described
can be applied to action sequences generally. For instance, transposition
errors in routine actions are commonly found in patients with frontal apraxia.
According to Schwartz et al. (1991), such patients are frequently recorded
to put on their shoes before their socks, or to put toothpaste onto a tooth
brush after having brushed their teeth. Cooper et al. (1994) describe a
"hybrid" symbolic-connectionist model of routine action (based
on Norman & Shallice, 1986) aimed at understanding such action slips.
Routine actions are controlled by schemata activated in parallel, which
compete for control of action on the basis of their activation values. Serial
behaviour emerges from the model due to schemata being inhibited once their
corresponding goals have been achieved. Elements of schemata contain variables
which need to be given specific values during execution, for instance "arguments"
representing the object on which an action is to be performed. Argument
selection is based on activation levels of (representations of) objects,
and how well they fit a "feature specification" associated with
the schema. This is similar to the mechanism of phoneme selection by the
syllable template (or "schema") in the Hartley and Houghton (in
press) verbal recall model.
5.14 In addressing the issue of grammar last we do not intend to suggest
that schema based sequencing is particularly exceptional or rare, though
we do believe that the human capacity for it is greatly developed compared
with other animals. Indeed, it is nongrammatical sequencing, as found in
the single-trial learning of lists of items all belonging to the same class,
that may be the comparatively unusual behaviour. The reason these models
are addressed last is simply that they are the most complex, and the way
in which they explain particular data presupposes that, independently of
the operation of the schemata for order, groups of competing responses are
being activated in parallel. Why should that be? The explanation we provide
is that this parallelism represents one aspect of a more "primitive"
form of sequencing. The development of schema based sequencing has not supplanted
these basic mechanisms, rather it operates in conjunction with them. This
has benefits. First of all, one should be aware that English is a somewhat
unusual language in that its word order is highly constrained ("schematised").
Many other languages show much "freer" word order, by which it
is meant that constituents in a sentence with a given meaning are not bound
to appear in a single particular order (Givon, 1979). Of course, in any
actually spoken utterance of such a language, all constituents do appear
in some definite order. What determines this order, if it is not completely
specified by grammatical schemata, and how does this ordering interact with
grammatical ordering? One possibility is that all influences on order, grammatical
or otherwise, act on the same competitive queueing output system. The resultant
order reflects the relative strengths of these influences, with more strongly
activated items appearing earlier (Prentice, 1966; MacWhinney, 1977; Sridhar,
1989). Another benefit of this interactive view of sequencing is that if
control by grammatical schemata becomes disorganised or weakened (as may
be the case for instance in agrammatism, Saffran, Schwartz, & Marin,
1980; or the frontal apraxia discussed above), the simpler, competition
based, mechanisms still ensure that serial behaviour is possible, as long
as concrete responses are activated. By contrast, production system models
based on symbolic action grammars plus serial recursive processes (e.g.
Houghton & Isard, 1987) are completely dependent on the grammar for
the specification of the order of actions. If this mechanism breaks down,
no behaviour can be produced.
6. Conclusions
"I have devoted so much time to ... the problem of syntax
... because the problems raised by the organization of language seem to
me to be characteristic of almost all other cerebral activity...Not only
speech, but all skilled acts seem to involve the same problems of serial
ordering, even down to the temporal coordination [of] such a movement as
reaching and grasping. Analysis of the nervous mechanisms underlying order
in the more primitive acts may contribute ultimately to the solution even
of the physiology of logic" (Lashley, 1951, p. 122).
6.1 We began this paper with Lashley's rejection of associative chaining
as a basis for a neuropsychological theory of serial order, and his tentative
suggestions for an alternative based on parallel response activation, and
"schemata for action". Since Lashley wrote his article, a great
deal has been learned about sequential behaviour, particularly in the linguistic
domain. Various aspects of the behavioural data are briefly reviewed above,
and we believe they support Lashley's view that models based on associative
chaining are "doomed to failure". However, alternative, neurally
plausible accounts of serial order compatible with a broad range of behavioural
data have been few and scattered. The models described in this paper have
attempted to develop a particular line of research compatible with Lashley's
insights (e.g., Dell, 1986, 1988; Grossberg, 1978; Houghton, 1990; Mackay,
1987; Rumelhart & Norman, 1982; Shallice, 1972). The central interest
of the current paper lies in its attempt to integrate these various strands
into a coherent theory of serial order, applicable over a wide range of
cases. In doing this we have built up a series of models, beginning with
the basic problem of parallel response competition and its possible resolution
into serial action. We then considered how a simple mechanism capable of
resolving competition might be exploited to generate serial behaviour from
memory rather than environmental stimulation. Models capable of this were
discussed and their basic features adumbrated. It was claimed that such
models provided unique insight into certain common error patterns. For instance,
the models have a "default" mode of operation whereby actions
are not repeated. Behaviour thus has a built-in tendency to spontaneously
"move on". This makes repetition, which might sometimes be necessary,
a problem, and it was proposed that repetitive behaviour represents a specific
mode, which must be engaged and disengaged. This mode might function by
temporarily disabling the inhibitory feedback used in the normal model.
This proposal has interesting empirical consequences, some of which have
found support.
6.2 Following on, we identified certain limitations of these basic models
and discussed ways in which they can be overcome, without abandoning the
dynamical features of the models which make them so attractive. It was suggested
that hierarchical models of serial order need not have dynamic properties
only at the "terminal" or output level, but that control or chunk
nodes at higher levels could change their pattern of activity in regular
ways during learning and recall, in a manner reminiscent of the neural central
pattern generators or endogenous "clocks" found in many species
(Pearson, 1993; Treisman, 1994). This considerably increases the power of
these models, and permits an expanded learning capacity. This has been exploited
in the extension of the models to the domain of short-term memory. Finally,
we came to what Lashley referred to as "schemata for action" and
discussed how these schemata could be integrated with the models developed
so far. This has permitted the application of these ideas to complex phenomena
in language and other forms of action. Yet, even at this stage, the models
retain the stamp of their particular origin, in that parallel response competition
and the means of its resolution remain central explanatory mechanisms.
6.3 We conclude then that Lashley's original insights into serial order
in human behaviour, largely based on everyday observations, remain valid.
Perhaps most importantly, we believe that the work reviewed above provides
concrete support for Lashley's conviction, expressed in the final quotation
above, that similar ordering principles operate in many superficially different
domains.
Acknowledgements
The authors are grateful to Steve Keele, Don MacKay, Rik Henson, and Steve
Jackson for valuable comments on a previous version of this paper. We would
also like to thank our colleagues Dave Glasspool, Tim Shallice, Steve Tipper,
Neil Burgess and Gordon Brown for numerous discussions of matters raised
in the paper.
Notes
<1> MacKay's models distinguish between "priming"
and "activation". However, this (and other) complications will
be left aside for simplicity's sake.
References
Aldridge, J.W., Berridge, K.C., Herman, M., & Zimmer, L., (1993). Neuronal
coding of serial order: Syntax of grooming in the neostriatum. Psychological
Science, 4, 391-395.
Allport, A., (1987). Selection for action: Some behavioral and neurophysiological
considerations of attention and action. In H. Heuer & A.F. Sanders,
(Eds.). Perspectives on perception and action. Hillsdale, NJ: Erlbaum.
Amit, D.J., Sagi, D., & Usher, M., (1990). Architecture of attractor
neural networks performing cognitive fast scanning. Network, 14,
189-216.
Ans, B., Coiton, Y., Gilhodes, J-C., & Velay, J-L., (1994). A neural
network model for temporal sequence learning and motor programming. Neural
Networks, 7(9), 1461-1476.
Baars, B.J., Motley, M.T., & Mackay, D.G., (1975). Output editing for
lexical status in artificially elicited slips of the tongue. Journal
of Verbal Learning and Verbal Behavior, 14, 382-391.
Baddeley, A.D., (1968). How does acoustic similarity influence short-term
memory? Quarterly Journal of Experimental Psychology, 18, 362-365.
Baddeley, A.D., Gathercole, S., Bishop, D., & Papagno, C., (submitted).
The phonological loop as a language learning device. Psychological Review.
Baddeley, A.D, Thomson, N., & Buchanan, M., (1975). Word length and
the structure of short-term memory. Journal of Verbal Learning and Verbal
Behavior, 14, 575-589.
Bairaktaris, D., (1992). A speech-based connectionist model of human short-term
memory. Proceedings of the 14th Annual Conference of the Cognitive Science
Society. Hillsdale, NJ: Erlbaum.
Berridge, K.C., & Whishaw, I.Q., (1992). Cortex, striatum and cerebellum:
Control of serial order in a grooming sequence. Experimental Brain Research,
90, 275-290.
Bisiacchi, P.S., Cipolotti, L., & Denes, G., (1989). Impairments in
processing meaningless verbal material in several modalities: The relationship
between short-term memory and phonological skills. Quarterly Journal
of Experimental Psychology, 41A, 292-320.
Burgess, N., (1995). A solvable connectionist model of immediate recall
of ordered lists. In G. Tesauro, D.S. Touretzky, & T.K. Leen (Eds.).
Advances in Neural Information Processing Systems, 7, Cambridge,
Mass.: MIT Press.
Burgess, N., & Hitch, G., (1992). Towards a network model of the articulatory
loop. Journal of Memory and Language, 31, 429-460.
Chomsky, N., (1957). Syntactic Structures. The Hague: Mouton.
Church, R.M., & Broadbent, H., (1990). Alternative representations of
time, number, and rate. Cognition, 37, 55-81.
Coles, M.G.H., Gratton, G., Bashore, T.R., Eriksen, C.W., & Donchin,
E., (1985). A psychophysiological investigation of the continuous flow model
of human information processing. Journal of Experimental Psychology:
Human Perception and Performance, 11, 529-553.
Colombo, M., Eickhoff, A.E., & Gross, C.G., (1993). The effects of inferior
temporal and dorsolateral frontal lesions on serial-order behavior and visual
imagery in monkeys. Cognitive Brain Research, 1, 211-217.
Cooper, R., Shallice, T., & Farringdon, J., (1994). Symbolic and continuous
processes in the automatic selection of actions. Technical Report No.
UCL-PSY-ADREM-TR11, Dept. of Psychology, University College London.
Conrad, R., & Hull, A.J., (1964). Information, acoustic confusion and
memory span. British Journal of Psychology, 55, 429-432.
Cowan, N., Day, L., Saults, J.S., Keller, T.A., Johnson, T. & Flores,
L. (1992). The role of verbal output time in the effects of word length
on immediate memory. Journal of Memory and Language, 31, 1-17.
Cutler, A., (Ed). (1982). Slips of the Tongue. The Hague: Mouton.
Dehaene, S., Changeux, J-P., & Nadal, J-P., (1987). Neural networks
that learn temporal sequences by selection. Proceedings of the National
Academy of Sciences, USA, 84, 2727-2731.
Dell, G.S., (1986). A spreading activation theory of retrieval in sentence
production. Psychological Review, 93, 283-321.
Dell, G.S., (1988). The retrieval of phonological forms in production: Tests
of predictions from a connectionist model. Journal of Memory and Language,
25, 124-142.
Dell, G.S., & Reich, P., (1981). Stages in sentence production: an analysis
of speech error data. Journal of Verbal Learning and Verbal Behavior,
20, 611-629.
Ellis, A.W., (1979). Slips of the pen. Visible Language, 13, 265-282.
Ellis, A.W., (1980). Errors in speech and short term memory: The effects
of phonemic similarity and syllable position. Journal of Verbal Learning
and Verbal Behavior, 19, 624-634.
Ellis, A.W., Young, A.W., & Flude, B.M., (1987). "Afferent dysgraphia"
and the role of feedback in the motor control of handwriting. Cognitive
Neuropsychology, 4, 465-486.
Ellis, N.C., (submitted). Sequencing in second language acquisition: Phonological
memory, chunking and points of order. Studies in Second Language Acquisition.
Elman, J.L., (1990). Finding structure in time. Cognitive Science, 14,
179-211.
Eriksen, C.W., Coles, M.G.H., Morris, C.L.R., & O'Hara, W.P., (1985).
An electromyographic examination of response competition. Bulletin of
the Psychonomic Society, 23, 165-168.
Eriksen, B.A., & Eriksen, C.W., (1974). Effects of noise letters on
the identification of a target letter in a nonsearch task. Perception
and Psychophysics, 16, 143-149.
Estes, W.K., (1972). An associative basis for coding and organisation in
memory. In A.W. Melton & E. Martin (Eds.), Coding processes in human
memory. Washington, DC; Winston.
Fountain, S.B., Henne, D.R., & Hulse, S.H., (1984). Phrasing cues and
hierarchical organization in serial pattern learning by rats. Journal
of Experimental Psychology: Animal Behaviour Processes, 10, 30-45.
Garrett, M.F., (1976). Syntactic processes in sentence production. In R.J.
Wales & E. Walker (Eds.), New Approaches to Language Mechanisms.
Amsterdam: North Holland.
Gathercole, S.E., & Baddeley, A., (1989). Evaluation of the role of
phonological STM in the development of vocabulary in children: A longitudinal
study. Journal of Memory and Language, 28, 200-213.
Gathercole, S.E., & Baddeley, A., (1993). Working Memory and Language.
Hove: Erlbaum.
Gathercole, S.E., Willis, C.S., Emslie, H., & Baddeley, A., (1991).
The influences of number of syllables and wordlikeness on children's repetition
of nonwords. Applied Psycholinguistics, 12, 349-367.
Givon, T., (1979). On Understanding Grammar. London: Academic Press.
Glasspool, D.W., (1995). Competitive queueing and the articulatory loop:
An extended network model. In J. Levy, D. Bairaktaris, J. Bullinaria, &
D. Cairns (Eds.), Connectionist Models of Memory and Language. London:
UCL Press.
Glasspool, D.W., Houghton, G., & Shallice, T., (1995). Interactions
between knowledge sources in a dual-route connectionist model of spelling.
In L.S. Smith & P.J.B. Hancock (Eds.), Neural Computation and Psychology.
London: Springer-Verlag.
Greenberg, J.H., (1978). Some generalizations concerning initial and final
consonant clusters. In J.H. Greenberg (Ed.), Universals of Human Language,
vol. 2: Phonology. Stanford, CA.: Stanford University Press.
Grossberg, S., (1978). Behavioral contrast in short term memory: Serial
binary memory models or parallel continuous memory models? Journal of
Mathematical Psychology, 17, 199-219.
Hartley, T., & Houghton, G., (1995). A linguistically constrained model
of short-term memory for nonwords. Journal of Memory and Language.
Henson, R., Norris, D., Page, M., & Baddeley, A., (In Press). Unchained
memory: error patterns rule out chaining models of immediate, serial recall.
Quarterly Journal of Experimental Psychology.
Hinde, R.A., (1970). Animal behavior: A synthesis of ethology and comparative
psychology. New York: McGraw-Hill.
Hitch, G., Burgess, N., Towse, J., & Culpin, V., (in press). Temporal
grouping effects and working memory: the role of the phonological loop.
Quarterly Journal of Experimental Psychology.
Hitch, G., Burgess, N., Shapiro, J., Culpin, V., & Malloch, M., (1995).
Evidence for a timing signal in verbal short-term memory. Paper presented
at the meeting of the Experimental Psychology Society, University of Birmingham,
UK.
Houghton, G., (1990). The problem of serial order: A neural network model
of sequence learning and recall. In R. Dale, C. Mellish & M. Zock (Eds.),
Current research in natural language generation. London: Academic
Press.
Houghton, G., (1994). Inhibitory control of neurodynamics: Opponent mechanisms
in sequencing and selective attention. In M. Oaksford & G.D.A. Brown
(Eds.), Neurodynamics and psychology. London: Academic Press.
Houghton, G., Glasspool, D.W., & Shallice, T., (1994). Spelling and
serial recall: Insights from a competitive queueing model. In G.D.A. Brown
& N.C. Ellis (Eds.), Handbook of spelling: Theory, process and intervention.
Chichester: Wiley.
Houghton, G., Hartley, T., & Glasspool, D.W., (In Press). The representation
of words and nonwords in short-term memory: Serial order and syllable structure.
To appear in S.E. Gathercole, (Ed.), Models of Short-Term Memory.
Erlbaum.
Houghton, G., & Isard, S., (1987). Why to speak, what to say, and how
to say it: Modelling language production in discourse. In P.E. Morris (Ed.),
Modelling Cognition. Chichester: Wiley.
Houghton, G., & Tipper, S.P., (1994). A model of inhibitory mechanisms
in selective attention. In D. Dagenbach & T. Carr (Eds.), Inhibitory
Mechanisms in Attention, Memory, and Language. San Diego: Academic Press.
Houghton, G., & Tipper, S.P., (In Press). Inhibitory mechanisms of neural
and cognitive control: Applications to selective attention and sequential
action. Brain and Cognition.
Hulme, C., Maughan, S., & Brown, G.D.A., (1991). Memory for familiar
and unfamiliar words: evidence for a long-term memory contribution to short-term
span. Journal of Memory and Language, 30, 685-701.
Ingle, D., (1972). Selective choice between double prey objects by frogs.
Brain Behavior Evolution, 7, 127-144.
Jackson, G.M., Jackson, S.R., Harrison, J., Henderson, L., & Kennard,
C., (in press). Serial reaction time learning and Parkinson's disease: Evidence
for a procedural learning deficit.
Jensen, A.R, & Rohwer, W.D., (1965). What is learned in serial learning?
Journal of verbal learning and verbal behavior, 4, 62-72.
Jordan, M.I., (1986). Serial order: A parallel distributed approach. ICS
report 8604, Institute for Cognitive Science, University of California,
San Diego.
Keele, S.W., & Jennings, P.J., (1992). Attention in the representation
of sequence: Experiment and theory. Human Movement Studies, 11, 125-138.
Kermadi, I., Jurquet, Y., Arzi, M., & Joseph, J.P., (1993). Neural activity
in the caudate nucleus of monkeys during spatial sequencing. Experimental
Brain Research, 94, 352-356.
Kesner, R.P, & Novak, J.M., (1982). Serial position curve in rats: Role
of the dorsal hippocampus. Science, 218, 173-175.
Konishi, M., (1985). Birdsong: From behavior to neuron. Annual Review
of Neuroscience, 8, 125-170.
Lashley, K.S., (1951). The problem of serial order in behavior. In L.A.
Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.
Lhermitte, F., (1983). Utilisation behaviour and its relation to lesions
of the frontal lobes. Brain, 106, 237-255.
Luria, A.R., (1973). The Working Brain. London: Penguin.
Mackay, D.G., (1970). Spoonerisms: the structure of errors in the serial
order of speech. Neuropsychologia, 8, 323-350.
Mackay, D.G., (1972). The structure of words and syllables: evidence from
errors in speech. Cognitive Psychology, 3, 210-227.
Mackay, D.G., (1987). The organization of perception and action.
New York: Springer-Verlag.
MacNeilage, P.F., (1964). Typing errors as clues to serial ordering mechanisms
in language behaviour. Language and Speech, 7, 144-159.
MacWhinney, B., (1977). Starting points. Language, 53, 152-168.
Marler, P., (1991). The instinct to learn. In S. Carey & R. Gelman (Eds.)
The epigenesis of mind: Essays on biology and cognition. NJ: Erlbaum. Reprinted
in M.H. Johnson (1993) (Ed.), Brain development and cognition: A reader.
Oxford: Blackwell.
Neumann, O., (1987). Beyond capacity: A functional view of attention. In
H. Heuer & A.F. Sanders, (Eds.), Perspectives on perception and action.
Hillsdale, NJ: Erlbaum.
Nissen, M.J., & Bullemer, P.T., (1987). Attentional requirements of
learning: Evidence from performance measures. Cognitive Psychology, 19,
1-32.
Norman, D., (1980). Categorization of action slips. Psychological Review,
88, 1-15.
Norman, D., & Shallice, T., (1986). Attention to action: Willed and
automatic control of behavior. In R. Davidson, G. Schwartz, & D. Shapiro,
(Eds.), Consciousness and Self Regulation, Vol. 4, New York: Plenum.
Paulesu, E., Frith, C.D., & Frackowiak, R.S.J., (1993). The neural correlates
of the verbal component of working memory. Nature, 362, 342-344.
Pearson, K.G., (1993). Common principles of motor control in vertebrates
and invertebrates. Annual Review of Neuroscience, 16, 265-297.
Posner, M.I., & Cohen, Y.A., (1984). Components of visual orienting.
In H. Bouma & D.G. Bouwhuis (Eds.), Attention and Performance X.
Hillsdale, NJ: Erlbaum.
Prentice, J.L., (1966). Response strength of single words as an influence
in sentence behavior. Journal of Verbal Learning and Verbal Behavior,
5, 429-433.
Reason, J.T., (1984). Lapses of attention. In W. Parasuraman, R. Davies,
& J. Beatty, (Eds.), Varieties of Attention. Orlando: Academic
Press.
Rosenbaum, D.A., (1991). Human Motor Control. San Diego: Academic
Press.
Rumelhart, D.E., & McClelland, J.L., (1986). On learning the past tenses
of English verbs. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel
Distributed Processing, Vol. 2: Psychological and Biological Models.
Cambridge, Mass.: MIT Press.
Rumelhart, D.E., & Norman, D., (1982). Simulating a skilled typist:
A study of skilled cognitive-motor performance. Cognitive Science, 6,
1-36.
Saffran, E.M., Schwartz, M.F., & Marin, O.S.M., (1980). The word order
problem in agrammatism II: Production. Brain and Language, 10, 263-280.
Schwartz, M.F., Reed, E.S., Montgomery, M., Palmer, C., & Mayer, N.H.,
(1991). The quantitative description of action disorganisation after brain
damage: A case study. Cognitive Neuropsychology, 8, 381-414.
Seidenberg, M., & McClelland, J.L., (1989). A distributed, developmental
model of word recognition and naming. Psychological Review, 96, 523-568.
Selkirk, E., (1984). On the major class features and syllable theory. In
M. Aronoff, & R.T. Oehrle, (Eds.), Language sound structure: Studies
in phonology presented to Morris Halle by his teacher and students.
Cambridge, Mass.: MIT Press.
Shallice, T., (1972). Dual functions of consciousness. Psychological
Review, 79, 383-393.
Shallice, T., Glasspool, D., & Houghton, G., (in press). Can neuropsychological
evidence inform connectionist modelling? Analyses from spelling. Language
and Cognitive Processes.
Shattuck-Hufnagel, S., (1979). Speech errors as evidence for a serial-ordering
mechanism in sentence production. In W.E. Cooper & E.C.T. Walker (Eds.),
Sentence processing: Psycholinguistic studies presented to Merrill Garrett.
Hillsdale, NJ: Erlbaum.
Speidel, G.E., (1993). Phonological short-term memory and individual differences
in learning to speak: A bilingual case study. First Language, 13,
69-91.
Sridhar, S.N., (1989). Cognitive structures in language production: A crosslinguistic
study. In B. MacWhinney & E. Bates (Eds.), The Crosslinguistic Study
of Sentence Processing. Cambridge: Cambridge University Press.
Stroop, J.R., (1935). Studies of interference in serial verbal reactions.
Journal of Experimental Psychology, 18, 643-662.
Terrace, H.S., (1991). Chunking during serial learning by a pigeon: I. Basic
evidence. Journal of Experimental Psychology: Animal Behavior Processes,
17, 81-93.
Treiman, R., (in press). Errors in short-term memory for speech: A developmental
study.
Treiman, R., & Danis, C., (1988). Short-term memory errors for spoken
syllables are affected by the linguistic structure of the syllables. Journal
of Experimental Psychology: Learning, Memory and Cognition, 14, 145-152.
Treisman, M., Cook, N., Naish, P.L.N., & McCrone, J.K., (1994). The
internal clock: electroencephalographic evidence for oscillatory processes
underlying time perception. Quarterly Journal of Experimental Psychology,
47A, 241-289.
Venneri, A., Cubelli, R., & Caffarra, P., (1994). Perseverative dysgraphia:
A selective disorder in writing double letters. Neuropsychologia, 32,
923-931.
Wickelgren, W.A., (1969). Context sensitive coding, associative memory, and
serial order in (speech) behavior. Psychological Review, 76, 1-15.
Wilberg, R.B., (1990). The retention and free recall of multiple movements.
Human Movement Science, 9, 437-479.