Associative Memory
Core Idea
Associative memory is the structural pattern of content-addressable storage and retrieval: stored items are accessed not by a separate address or index but by their own content, or by content associated with them, so that presenting a partial, noisy, or merely related cue retrieves the full or linked item. The defining commitment is that the key and the value inhabit the same representational space, and proximity in that space drives recall — the exact opposite of address-based lookup, where the key is an arbitrary handle bearing no relation to what it points to. [1] This commitment was made mathematically precise by Hopfield (1982), who showed that a network of symmetrically coupled binary units settles into stored patterns as the stable fixed points (attractors) of an energy function, so that any state within a pattern's basin of attraction converges to that pattern. [1] The same structural idea had been anticipated in hardware decades earlier: Slade and McMahon (1956) described a "cryotron catalog memory" that located a word by matching on its content rather than by consulting an address line, the first physical instance of content addressing. [2]
What makes the prime more than a database trick is that the representation is the index. There is no separate lookup table mapping arbitrary keys to storage locations; the geometry of the stored items themselves determines what a cue retrieves. A fragment of a melody recalls the whole song; a smell recalls a childhood scene; a half-remembered face resolves into a name. In each case the cue is not a pointer to the memory — it is a piece of the memory, and recall is the system's relaxation from the partial cue toward the nearest complete stored state.
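For readers who want the relaxation picture in symbols, the standard textbook form of the Hopfield model can be sketched as follows. The notation is an assumption introduced here rather than taken from the text above: $s_i \in \{-1, +1\}$ are the unit states, $w_{ij}$ the symmetric couplings, $\xi^{\mu}$ the $P$ stored patterns over $N$ units, and thresholds are set to zero for simplicity.

```latex
E(\mathbf{s}) = -\tfrac{1}{2} \sum_{i \neq j} w_{ij}\, s_i s_j ,
\qquad
w_{ij} = w_{ji} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^{\mu} \xi_j^{\mu} ,
\qquad
s_i \leftarrow \operatorname{sgn}\Big( \sum_{j \neq i} w_{ij}\, s_j \Big)
```

With symmetric couplings and no self-coupling, an asynchronous update of this form cannot increase $E$, which is the sense in which recall is a downhill relaxation from the cue toward a stored minimum.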
Structural Signature
Associative memory encodes a structural pattern: partial or related cue → similarity-driven convergence → recovery of the full or linked stored item. It separates two regimes of access — retrieval by arbitrary handle (address-based, exact match, brittle) and retrieval by content (similarity-based, partial cue tolerated, graceful) — and commits to the second. [3] Marr (1971) gave this an explicit anatomical reading, proposing that the hippocampal archicortex implements exactly such a recall mechanism, in which a simple input event can later be reinstated in full from a fragment of itself. [4]
Recurring features:
- Retrieval by content rather than by arbitrary address
- Key and value sharing one representational space
- A partial or noisy cue converging to the full stored item
- Proximity in representation space driving recall
- Graceful, similarity-graded recovery rather than all-or-nothing lookup
- Pattern completion from a fragment via attractor dynamics
- Storage and indexing collapsed into a single geometry
The structural insight is robust across substrates: a hippocampal circuit reinstating an episode from a single salient detail, a Hopfield network relaxing into the nearest stored attractor, a content-addressable memory chip flagging every cell whose contents match a search word, and a vector database returning the nearest neighbors of a query embedding all exhibit the identical logic — present part of the content, recover the whole. [3] Kohonen (1977) systematized this as a general theory of "associative memory" spanning correlation-matrix models, linear associators, and neural implementations, showing that the same matrix-algebraic recall rule underlies devices that look superficially unrelated. [3]
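The matrix-algebraic recall rule mentioned above can be illustrated with a minimal correlation-matrix (linear associator) memory. This is a sketch under stated assumptions, not Kohonen's own formulation: the function names `store` and `recall` and the toy key and value vectors are hypothetical, and the keys are chosen orthonormal so that recall from a clean key is exact.

```python
# Minimal correlation-matrix (linear associator) memory in pure Python.
# The key/value vectors below are hypothetical illustration data.

def store(pairs):
    """Build M as the sum over pairs of the outer product value * key^T."""
    n_out, n_in = len(pairs[0][1]), len(pairs[0][0])
    m = [[0.0] * n_in for _ in range(n_out)]
    for key, value in pairs:
        for i in range(n_out):
            for j in range(n_in):
                m[i][j] += value[i] * key[j]
    return m

def recall(m, cue):
    """Recall is a single matrix-vector product: M applied to the cue."""
    return [sum(w * c for w, c in zip(row, cue)) for row in m]

pairs = [
    ([1.0, 0.0, 0.0], [2.0, 5.0]),  # key A -> value A
    ([0.0, 1.0, 0.0], [7.0, 1.0]),  # key B -> value B
]
M = store(pairs)

exact = recall(M, [1.0, 0.0, 0.0])    # cue equals key A
blended = recall(M, [0.8, 0.6, 0.0])  # unit-length cue lying between A and B
print(exact)
print(blended)
```

A cue equal to a stored key returns its value, while a cue between two keys returns a similarity-weighted mix of both values. If the keys were correlated rather than orthonormal, the same product would show crosstalk between stored items, the linear form of interference.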
What It Is Not
Associative memory is not a claim that all memory is content-addressable. Many real systems are hybrids: a human brain plainly does content-addressable recall, yet it also exhibits sequence and temporal structure that pure content addressing does not capture; a computer pairs address-based RAM with content-addressable caches and lookup tables. The prime names a specific access discipline, not a theory of memory in general.
Nor does the prime assert that retrieval is always correct. Because recovery is driven by similarity, an associative system will confidently return a stored item that merely resembles the cue, producing false recall, confabulation, or the retrieval of a "spurious" attractor that was never deliberately stored. The structure guarantees convergence to a nearby stored state, not convergence to the intended one. Graceful degradation and graceful error are two faces of the same mechanism.
Associative memory is also not the same as mere "search" or "lookup" in the everyday sense. Searching a sorted list by binary search, or resolving a hash key, are address-like operations: the query is a handle, and an exact or computed match is required. Associative retrieval tolerates — indeed expects — a query that is incomplete, corrupted, or only related to the target, and it returns the best match by proximity rather than by equality.
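The contrast between equality-based lookup and proximity-based retrieval can be made concrete with a small sketch. It uses Python's standard `difflib.get_close_matches` as a stand-in for similarity-driven recall; the stored phrases, the function names, and the 0.6 cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical store of complete items.
STORED = ["let it be", "here comes the sun", "come together"]

def lookup_exact(cue):
    """Address-like access: succeeds only on an exact match."""
    return cue if cue in STORED else None

def lookup_by_similarity(cue, cutoff=0.6):
    """Content-like access: return the stored item most similar to the cue."""
    matches = difflib.get_close_matches(cue, STORED, n=1, cutoff=cutoff)
    return matches[0] if matches else None

corrupted = "here coms the sun"  # one letter missing
print(lookup_exact(corrupted))
print(lookup_by_similarity(corrupted))
```

The exact lookup rejects the corrupted cue outright, while the similarity lookup returns the complete stored phrase. String edit similarity is only one possible notion of proximity; the structural point is that the cue is compared against stored content rather than used as a handle.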
Finally, the prime says nothing about the value or quality of what is stored. A content-addressable system will faithfully recall a trauma, a prejudice, or a bad habit from a partial cue exactly as readily as a useful skill. The structure describes how items are reached, not whether reaching them is desirable.
Broad Use
Neuroscience: Hippocampal and neocortical networks recall a whole episode or memory from a fragment of it; the CA3 region's dense recurrent collaterals are widely modeled as an autoassociative network performing pattern completion, and attractor models formalize how a partial cue is "cleaned up" toward a stored memory. [5]
Cognitive psychology: Priming, free association, and cued recall all reflect content-addressed structure — one concept activates related ones through learned linkages, and a retrieval cue that shares features with an encoded item makes that item disproportionately accessible. Tulving and Thomson (1973) made this precise with the encoding-specificity principle: a cue is effective to the extent that it overlaps in content with what was encoded. [6]
Computer architecture (non-obvious): Content-addressable memory (CAM) hardware in routers, network switches, and CPU caches matches on data content rather than on an address line, returning in a single cycle the location of any cell whose contents equal the search word — the canonical engineered instance of the prime. [7]
Machine learning: Vector databases and embedding-retrieval systems fetch items by nearest-neighbor similarity in a learned representational space; the attention mechanism of modern transformers is itself a soft, differentiable associative read, where a query vector retrieves a similarity-weighted blend of stored values. [8]
Information retrieval: Similarity search returns documents related to a query rather than ones bearing a matching identifier, the entire premise of vector and semantic search as opposed to exact-key database lookup. [9]
Clarity
Naming associative memory sharply distinguishes two fundamentally different access disciplines — by address (an arbitrary handle, requiring an exact match) versus by content (similarity, tolerating a partial cue). This single distinction explains a great deal of otherwise puzzling system behavior: why some systems degrade gracefully under partial or corrupted input while others fail outright on a single missing or wrong key; why human memory can be triggered by a smell or a snatch of music but a filing system cannot; why a hash table is unforgiving of a typo while a search engine is not. [3] The clarity is diagnostic: once you ask of any storage system "is this addressed by arbitrary handle or by content?", you can predict its failure modes, its tolerance to noise, and the kind of cue it will respond to.
The distinction also reframes a class of design questions. Instead of asking "how do we make this lookup more robust to errors?", the content-addressable view asks "should the key be the content, so that proximity does the work?" That reframing is what turns a brittle exact-match index into a similarity index that answers fuzzy queries.
Manages Complexity
Associative memory removes the need for a separate indexing scheme: the representation is the index. This collapses storage and retrieval into a single geometry, so that reasoning about recall reduces to reasoning about proximity in the representational space. [1] One no longer has to maintain, update, and reconcile an external map from keys to locations — a notorious source of staleness, corruption, and coordination cost in address-based systems. Instead, adding an item simply places it in the space; retrieving it means moving toward it from a nearby cue.
This consolidation is what lets the same conceptual machinery span scales. At the neural level, learning is the sculpting of an energy landscape whose minima are the stored memories; at the engineering level, indexing is the construction of a space in which nearest-neighbor queries are cheap. In both cases the complexity of "where is it stored and how do I find it?" dissolves into the single question "what is near this cue?" The system designer trades the bookkeeping burden of an explicit index for the geometric burden of arranging items so that the right things are close together.
Abstract Reasoning
Once a system is recognized as a content-addressable memory, one can immediately infer a cluster of robustness properties without inspecting its internals. It will tolerate partial cues and noise, because any sufficient fragment lies in the basin of attraction of the stored item and converges to it; it will support graceful, similarity-graded recall rather than all-or-nothing lookup; and it will have a bounded capacity beyond which stored patterns interfere and recall degrades — a property quantified for the Hopfield model by Amit, Gutfreund, and Sompolinsky (1985), who derived the critical storage ratio above which the retrieval states are destroyed. [10] These inferences hold whether the substrate is neurons, CAM hardware, or embedding vectors, because they follow from the structure of content addressing rather than from any particular implementation.
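As a rough worked illustration of the capacity bound, assume the standard Hopfield setting of $N$ binary units storing $P$ random, uncorrelated patterns with Hebbian weights, and write $\alpha = P/N$ for the storage load. These symbols and the choice $N = 1000$ are introduced here for illustration; the value $\alpha_c \approx 0.14$ is the critical ratio reported in [10].

```latex
\alpha = \frac{P}{N}, \qquad \alpha_c \approx 0.14
\quad\Longrightarrow\quad
P_{\max} \approx \alpha_c N ,
\qquad
N = 1000 \;\Rightarrow\; P_{\max} \approx 140 .
```

On this estimate a thousand-unit network of this kind reliably holds on the order of 140 random patterns, far fewer than the $2^{1000}$ states it can represent, which is what "bounded capacity" means in practice.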
The same recognition licenses counterfactual reasoning across domains. If biological recall tolerates a noisy cue, can an engineered index be built to do the same by embedding items in a similarity space? If a Hopfield network suffers spurious attractors when overloaded, should we expect a brain or a vector store to confabulate when packed past capacity? These transfers are not loose metaphors; they are predictions licensed by a shared structural commitment.
Knowledge Transfer
The attractor-dynamics account of biological recall transfers directly to Hopfield networks and to the modern associative-memory and attention layers of deep learning, where a query retrieves a similarity-weighted combination of stored values. In the reverse direction, the content-addressable-memory hardware insight — match on content, in parallel, in a single step — transfers to the design of similarity-search indexes in vector databases, which approximate parallel content matching over millions of items.
What travels is the structural commitment itself: key and value in one space, proximity drives recall. A neuroscientist who understands hippocampal pattern completion can read a transformer's attention head as a soft associative read; a hardware engineer who understands CAM can recognize a vector index as its scaled, approximate descendant; a cognitive psychologist who understands encoding specificity can predict which cues will fail in any of these systems. The transfer is grounded in shared structure, not surface resemblance, which is why insights move between domains that share almost no vocabulary.
Examples
Formal/abstract
Hopfield network (attractor recall): Consider a network of N symmetrically connected binary neurons whose weights are set, by a Hebbian outer-product rule, to store a handful of target patterns. Each stored pattern becomes a local minimum of an energy function, and the surrounding states form its basin of attraction. Present the network with a corrupted version of one stored pattern (say, with a third of its bits flipped) and let it update: the state slides downhill in energy until it settles into the nearest stored minimum, recovering the clean pattern. No address was ever specified; the corrupted pattern is the query, and convergence is the retrieval. Push too many patterns into the same network, however, and the minima begin to merge, spurious minima appear, and recall fails. Mapped back: This is the prime in its purest formal dress: partial content as cue, similarity-driven convergence, recovery of the whole, and a capacity ceiling beyond which interference destroys recall. The basin of attraction is the geometric form of "tolerance to a partial cue," and the spurious minimum is the formal shadow of confabulation.
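The recall procedure described above can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not a reference implementation: units take values in {-1, +1}, the two 8-unit patterns are hypothetical and chosen to be orthogonal, updates are asynchronous in a fixed order, and only two bits are flipped because the network is so small.

```python
def train(patterns):
    """Hebbian outer-product rule with no self-coupling (zero diagonal)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def energy(w, s):
    """E = -1/2 * sum over i, j of w[i][j] * s[i] * s[j]."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, cue, max_sweeps=10):
    """Asynchronous updates in index order until no unit changes."""
    s = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))  # local field
            new = s[i] if h == 0 else (1 if h > 0 else -1)  # keep state on a tie
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Two hypothetical, mutually orthogonal stored patterns.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train([p1, p2])

cue = list(p1)
cue[0], cue[1] = -cue[0], -cue[1]  # corrupt two bits of p1

restored = recall(W, cue)
print(restored == p1)                     # True
print(energy(W, cue), energy(W, restored))
```

The corrupted pattern is the only query the network receives, and the restored state has lower energy than the cue. Storing many more random patterns in the same eight units would be expected to break this behavior, in line with the capacity limit noted above.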
Encoding-specificity in human recall: A laboratory experiment presents subjects with word pairs (e.g., ground–COLD) and later tests recall of the capitalized targets. Subjects given the original context word ground as a cue recall COLD far better than subjects given a strong free-associate of the target (e.g., hot), even though hot is "closer" to COLD in general semantic terms. The effective cue is the one that overlaps with what was actually encoded, not the one that is intuitively most related. Mapped back: Recall here is content-addressed — the cue retrieves by sharing representational content with the stored trace — but it lays bare that the relevant "space" is the encoding space, not an objective semantic space. The structural commitment (proximity drives recall) holds exactly; what counts as proximity is fixed at storage time, which is why the wrong-but-related cue fails.
Applied/industry
Content-addressable memory in a network router: A router must decide, for every incoming packet, which output port matches the packet's destination address, consulting a forwarding table with hundreds of thousands of entries — at line rate. Rather than search the table sequentially or via a tree, the router stores entries in ternary CAM hardware and presents the destination address as a search word. Every CAM cell compares its stored content against the search word in parallel, and the matching entry asserts its location in a single clock cycle. The address being looked up is matched against the content of the table, not used as an index into it. Mapped back: This is the engineered limit case of the prime — the cue (search word) is matched against stored content in one parallel step, exactly the "the representation is the index" collapse. The brain achieves the same parallelism through recurrent dynamics over milliseconds; the CAM achieves it through dedicated comparison circuitry in nanoseconds, but the structural logic — recover the stored item by matching on content — is identical.
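The ternary match described above can be mimicked in software to show the logic, though not the parallelism. In this sketch each hypothetical table row stores a value and a mask, and a row matches when the search word agrees with the value on every unmasked bit. The prefixes, port names, and the use of prefix-length ordering as a stand-in for a hardware priority encoder are all assumptions made for illustration.

```python
import ipaddress

def make_row(cidr, port):
    """One ternary row: (value, mask, prefix length, output port)."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.netmask), net.prefixlen, port)

# Hypothetical forwarding table, ordered so the longest prefix has priority.
TABLE = sorted(
    [
        make_row("0.0.0.0/0", "default"),
        make_row("10.0.0.0/8", "port1"),
        make_row("10.1.0.0/16", "port2"),
    ],
    key=lambda row: row[2],
    reverse=True,
)

def tcam_lookup(destination):
    """Compare the search word against every row, then pick the first hit."""
    word = int(ipaddress.ip_address(destination))
    # Hardware evaluates all rows in one cycle; this list stands in for that.
    hits = [(word & mask) == value for value, mask, _, _ in TABLE]
    for hit, row in zip(hits, TABLE):
        if hit:
            return row[3]
    return None

print(tcam_lookup("10.1.2.3"))
print(tcam_lookup("10.9.9.9"))
print(tcam_lookup("192.168.1.1"))
```

The destination address is never used as an index into the table; it is compared against what each row contains. The software loop costs time proportional to the table size, which is the cost the dedicated comparison circuitry removes.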
Vector database for semantic search: A customer-support product embeds every past ticket as a high-dimensional vector using a language model, and stores the vectors in an approximate-nearest-neighbor index. When a new ticket arrives, it is embedded into the same space, and the system retrieves the handful of past tickets whose embeddings lie closest to the new one — surfacing relevant resolutions even when the new ticket shares no keywords with the old ones. A typo, a paraphrase, or a different language still lands near the right cluster. Mapped back: The new ticket is a partial, noisy cue; nearest-neighbor retrieval is the similarity-driven convergence; the surfaced past tickets are the recovered associated items. It is content-addressable memory at industrial scale, with the same graceful degradation (a garbled query still returns something relevant) and the same failure mode (an out-of-distribution query confidently returns the nearest — but wrong — cluster) that the formal Hopfield picture predicts.
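A brute-force version of this retrieval step might look like the following. It is a sketch under strong simplifying assumptions: the three-dimensional vectors are hand-written stand-ins for language-model embeddings, the ticket texts are hypothetical, and the scan is exact rather than an approximate-nearest-neighbor index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical past tickets with made-up embeddings.
TICKETS = {
    "password reset email never arrives": [0.9, 0.1, 0.0],
    "invoice shows wrong amount": [0.0, 0.9, 0.2],
    "app crashes on startup": [0.1, 0.0, 0.95],
}

def nearest(query_vec, k=1):
    """Return the k stored tickets whose embeddings are closest to the query."""
    ranked = sorted(
        TICKETS,
        key=lambda text: cosine(query_vec, TICKETS[text]),
        reverse=True,
    )
    return ranked[:k]

# Stand-in for the embedding of a new ticket that shares no keywords.
query = [0.8, 0.2, 0.1]
print(nearest(query))
```

The query shares no identifier with any stored ticket; only its position in the space determines what comes back. A production system would replace the full scan with an approximate index, trading a little recall for speed.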
Structural Tensions
T1: Graceful degradation and false recall are the same mechanism. The property that makes content addressing attractive — a partial or noisy cue still converges to a stored item — is precisely the property that makes it confabulate. A system that "cleans up" any nearby input toward the nearest stored pattern cannot, by construction, refuse to answer when the cue belongs to no stored item; it will return the nearest match with the same confidence it shows for a correct one. Robustness to noise and susceptibility to false memory are not separable design choices but two readings of one dynamic, which is why human memory, Hopfield networks, and vector stores all exhibit confident error.
T2: Capacity trades against fidelity. Packing more patterns into a fixed content-addressable substrate makes the stored items crowd one another in representation space, merging basins, spawning spurious attractors, and eventually destroying recall altogether past a critical load. An address-based store can hold as many items as it has addresses without the items interfering; a content-addressable store pays for its similarity-based access with a hard ceiling on how many distinguishable items it can reliably recover. More memory does not simply mean more storage; past a point it means worse recall of everything.
T3: The space that enables recall also fixes what "related" means. Because retrieval is driven by proximity, the geometry of the representation silently determines which cues will succeed and which associations are even possible. Two items that a designer considers related but that the embedding places far apart will never cue each other; two items that happen to be close will be linked whether or not that link is wanted. The system's notion of relevance is baked into the space at storage time and is largely invisible until a cue fails or an unwanted association fires.
T4: Content addressing resists deletion and editing. When the representation is the index, there is no clean address to overwrite or remove. Forgetting a specific item without disturbing its neighbors is hard, because the item is entangled in the same weights, attractor landscape, or vector neighborhood that supports everything near it. This is why targeted unlearning is difficult in neural systems and in trained models alike: the structure that makes recall robust also makes selective erasure structurally awkward, in sharp contrast to address-based stores where a single record can be deleted in isolation.
T5: Parallel content matching is powerful but expensive at the substrate. The single-step, match-everything-at-once behavior that gives CAM hardware and recurrent neural circuits their speed comes at a steep cost in the substrate — CAM cells require far more transistors, power, and silicon area than ordinary addressed memory, and biological recurrent networks devote enormous connectivity to the same end. The prime's elegance at the structural level (no separate index) is purchased by a heavy implementation tax wherever true parallel content matching is demanded, which is why approximate methods dominate at scale.
T6: The cue must be informative enough, yet the system cannot tell you when it isn't. Content-addressable recall depends on the cue carrying enough of the target's content to land in the right basin; too sparse or too ambiguous a cue lands in the wrong basin or in a spurious one. But the same mechanism that converges silently on a fragment provides no native signal that the fragment was insufficient — it returns an answer either way. Calibrating "how much cue is enough" and detecting when a query is genuinely out-of-store both require machinery outside the associative recall itself, which the pure prime does not supply.
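One common form of the outside machinery mentioned in T6 is a similarity threshold wrapped around the bare read. The sketch below is illustrative only: the two-item store, the function names, and the 0.9 threshold are hypothetical, and choosing that threshold well is exactly the calibration problem the tension describes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical store with two items.
STORE = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

def bare_recall(cue):
    """Pure associative read: always returns the nearest stored item."""
    return max(STORE, key=lambda name: cosine(cue, STORE[name]))

def guarded_recall(cue, min_similarity=0.9):
    """Same read, plus an external check that can decline to answer."""
    best = bare_recall(cue)
    score = cosine(cue, STORE[best])
    return (best, score) if score >= min_similarity else (None, score)

far_cue = [0.6, -0.8]    # resembles neither stored item well
near_cue = [0.95, 0.1]   # a slightly noisy "cat"

print(bare_recall(far_cue))      # still answers
print(guarded_recall(far_cue))   # declines
print(guarded_recall(near_cue))  # answers
```

The bare read returns an item for the distant cue just as readily as for the close one; only the added score check distinguishes them. Nothing inside the associative read supplies that check, which is the point of the tension.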
Structural–Framed Character
Associative Memory sits at the structural end of the structural–framed spectrum: it names content-addressable storage and retrieval, where stored items are reached not by a separate address but by their own content or by content linked to them, so a partial or noisy cue recovers the whole. The defining commitment is that key and value inhabit the same representational space, and proximity there drives recall.
This is a purely formal architecture: it carries no verdict that one retrieval scheme is better than another, and no single field's vocabulary rides along when it is named. It can be specified without any reference to human practice, applying just as cleanly to a Hopfield network settling into an attractor and to a hippocampus completing a pattern from a fragment of a remembered scene. Invoking it recognizes a retrieval structure already operating in the system rather than importing an external frame. On every diagnostic, it reads structural.
Substrate Independence
Associative Memory is a highly substrate-independent prime — composite 4 / 5 on the substrate-independence scale. Its core is a clean structural commitment — content-addressable retrieval, where key and value share a representational space — and that signature is genuinely substrate-agnostic. It transfers across biological memory (the hippocampus), cognitive priming, and computation (content-addressable hardware, vector databases, Hopfield networks and ML), and the CAM-hardware match-on-content example is unusually concrete and non-metaphorical. What keeps it at 4 is that its true span is essentially neuro-cognitive-computational; it does not reach physical, social, or formal substrates in any meaningful way.
- Composite substrate independence — 4 / 5
- Domain breadth — 3 / 5
- Structural abstraction — 4 / 5
- Transfer evidence — 4 / 5
Relationships to Other Primes
Parents (2) — more general patterns this builds on
- Associative Memory is a kind of Search and Retrieval. Associative memory is a specialization of search and retrieval in which storage and lookup are content-addressable: stored items are accessed through cues sharing their representational space, so a partial or noisy cue retrieves the full or linked item. It inherits the general search-and-retrieval commitment of locating relevant items from a larger store under speed and accuracy constraints, and specializes by collapsing the key-versus-address distinction: similarity in the representation space drives recall, with energy-function attractors making nearby cues converge onto stored patterns.
- Associative Memory presupposes Network. Associative memory stores and retrieves items by content rather than by separate address, with proximity in representational space driving recall. Hopfield made this precise as a network of symmetrically coupled units settling into stored patterns as fixed-point attractors of an energy function. This presupposes network: a set of entities with pairwise connections studied at the level of connection pattern, where structure carries enough explanatory power to predict flows, reachability, and dynamics. Without the coupling pattern among units supplying the attractor landscape, content-addressed retrieval has no mechanism.
Path to root: Associative Memory → Network
Neighborhood in Abstraction Space
Associative Memory sits in a moderately populated region (47th percentile for distinctiveness): it has near-neighbors but no dense thicket of synonyms.
Family — Perception, Memory & Pattern (13 primes)
Nearest neighbors
- Priming — 0.80
- Analogy — 0.80
- Interpretation — 0.79
- Memory Palace (Method of Loci) — 0.79
- Simile — 0.79
Computed from structural-signature embeddings · 2026-05-29
Not to Be Confused With
Associative memory must be distinguished, first and most importantly, from Pattern Recognition, a close conceptual neighbor. Pattern recognition is the operation of classifying a stimulus into a known category: its structural shape is input → label, mapping a possibly novel instance onto one of a fixed set of categories or types. A vision system that reports "this image contains a cat," a spam filter that tags a message "spam," and a clinician who reads an X-ray as "pneumonia" are all recognizing patterns: the output is a category assignment, and the categories typically number far fewer than the inputs that map onto them. Associative memory, by contrast, has the shape cue → stored content: it does not assign a category but retrieves a specific stored item or association from a partial or related cue. The output space of associative memory is the set of stored items themselves, not a compressed set of labels. The two can look superficially alike (both take an input and return something, both tolerate noisy inputs), but their structural commitments diverge. Pattern recognition compresses many inputs to few categories and discards instance-specific detail; associative memory preserves and returns the full stored instance, including the very detail that recognition would throw away. A system that tells you what kind of thing a cue is, is recognizing; a system that hands you the particular thing a cue points to (the whole song from a few notes, the full episode from one detail), is recalling associatively. The relationship is complementary rather than competitive: a recognizer often uses an associative store as a subroutine (retrieve the nearest exemplars, then label), and an associative store's basins of attraction can be read as categories, which is exactly why the two are so easily conflated.
Associative memory is also not Priming, with which it is intimately related but structurally distinct. Priming is a transient facilitation effect — exposure to a stimulus temporarily increases the accessibility of related stimuli, so that having just seen doctor you recognize nurse faster. Priming is an observable phenomenon, measured in milliseconds of facilitation; it is a symptom of an underlying architecture. Associative memory is that underlying architecture: the content-addressable storage-and-retrieval substrate whose learned linkages between items are what priming exploits and reveals. To say "priming occurs" is to make an empirical claim about behavior; to say "the system is an associative memory" is to make a structural claim about how it is organized. One is the running of the machine; the other is the machine. Confusing them leads to the error of treating a measured facilitation as if it were itself an explanatory mechanism, when in fact priming is downstream evidence that the memory is content-addressed.
Finally, associative memory overlaps substantially with Pattern Completion — the prime whose processing first surfaced associative memory as a candidate — but the two carve the territory differently. Pattern completion names the broad operation of filling in the incomplete: inferring the hidden parts of a partially occluded object, predicting the next item in a sequence, reconstructing a degraded signal, completing a partial inference. It is defined by its goal — make a whole from a part — and is agnostic about the mechanism that achieves it; completion can be done by interpolation, by generative inference, by rule, or by associative recall. Associative memory, by contrast, names a specific substrate: content-addressable storage on which one major form of completion runs. When pattern completion is achieved by presenting a fragment to a content-addressable store and letting it converge to the stored whole, the two coincide — and this is the common case in attractor models of the hippocampus, which is why the concepts are routinely discussed together. But completion can occur with no stored item to recall (a generative model completing a never-before-seen scene), and associative recall can occur where the goal is not completion of a partial object but retrieval of an associated one (a name from a face). The clean separation is this: pattern completion is an operation defined by its output (the whole from the part); associative memory is a storage discipline defined by its access mechanism (by content, not by address). Completion is what is being done; associative memory is how, in the cases where it is done by content-addressed recall.
Solution Archetypes
No catalogued solution archetypes reference this prime yet.
Notes
Associative memory spans a narrow but unusually concrete band of substrates: neural, cognitive, and computational. Unlike many cross-domain primes whose transfer is partly metaphorical, the transfer here is literal — content-addressable memory hardware and Hopfield networks instantiate the same retrieval mathematics that models of hippocampal recall invoke. This is why its substrate-independence is rated solidly but not maximally: the structure is genuinely substrate-agnostic within the neuro-cognitive-computational triangle, yet it does not reach into physical, social, or formal substrates in any meaningful way.
A recurring confusion is to treat "associative" as a loose synonym for "anything memory-related" or "anything to do with learned associations." The prime is more specific: it picks out the content-addressable access discipline, the commitment that the key and the value share a representational space. A system can store learned associations and still not be associative in this sense — for instance, if those associations are stored in an address-indexed table consulted by exact key. The diagnostic question is always about the access mechanism, not about whether associations happen to be present.
The relationship to modern deep learning deserves a note. The attention mechanism at the heart of transformer models is best understood as a soft, differentiable associative read: a query vector is compared by similarity (dot product) against a set of key vectors, and the result is a similarity-weighted average of the corresponding value vectors. This is content addressing made continuous and learnable, and recognizing it as such connects a vast contemporary literature to the much older lineage of CAM hardware, Hopfield networks, and hippocampal models. Several recent "modern Hopfield network" results make the connection explicit, showing that certain attention layers are equivalent to continuous-state associative memories with exponential storage capacity. [8]
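The soft associative read described above can be written out directly. This is a minimal single-query sketch, not a transformer layer: the toy keys and values are hypothetical, there are no learned projections, and the `beta` parameter (an inverse temperature, playing the role of the usual attention scaling factor) is an assumption added to show how the read sharpens.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_read(query, keys, values, beta=1.0):
    """Return the similarity-weighted average of values, plus the weights."""
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return blended, weights

# Hypothetical stored keys and their associated values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
query = [1.0, 0.2]  # mostly resembles the first key

soft, w_soft = attention_read(query, keys, values, beta=1.0)
sharp, w_sharp = attention_read(query, keys, values, beta=20.0)
print(w_soft)
print(w_sharp)
```

At low `beta` the read returns a graded blend of both values; at high `beta` nearly all weight falls on the most similar key, so the read approaches retrieval of a single stored item.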
A final caution concerns capacity and interference. Because content-addressable stores degrade by crowding rather than by simple overflow, their failure is gradual and treacherous: recall quality declines and false retrievals creep in well before any hard limit is reached. Designers accustomed to address-based stores, where capacity is a clean wall, routinely underestimate how early and how silently an associative store begins to confabulate as it fills.
References
[1] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. Formalizes content-addressable memory: stored patterns are the stable fixed points (attractors) of an energy function, so any state in a pattern's basin of attraction converges to it, collapsing storage and retrieval into one proximity geometry.
[2] Slade, A. E., & McMahon, H. O. (1956). A cryotron catalog memory system. Proceedings of the Eastern Joint Computer Conference (AIEE-IRE), 115–120. First physical content-addressable store: a superconducting "catalog memory" that locates a word by matching on its content rather than consulting an address line.
[3] Kohonen, T. (1977). Associative Memory: A System-Theoretical Approach (Communication and Cybernetics, Vol. 17). Springer-Verlag. First monograph on distributed associative (content-addressable) memories; systematizes correlation-matrix models, linear associators, and neural implementations under one matrix-algebraic recall rule and separates address-based from content-based access.
[4] Marr, D. (1971). Simple memory: a theory for archicortex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 262(841), 23–81. Proposes the hippocampal archicortex as a recall mechanism that reinstates a full event from a fragment of itself, an explicit anatomical reading of content-addressable recall.
[5] Treves, A., & Rolls, E. T. (1994). Computational Analysis of the Role of the Hippocampus in Memory. Hippocampus, 4(3), 374–391. Modern quantitative formulation of hippocampal pattern completion, analyzing capacity, basins of attraction, and interference properties of autoassociative networks implementing CA3 completion.
[6] Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80(5), 352–373. States the encoding-specificity principle: a retrieval cue is effective to the extent that it overlaps in content with what was encoded at storage.
[7] Pagiamtzis, K., & Sheikholeslami, A. (2006). Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. IEEE Journal of Solid-State Circuits, 41(3), 712–727. Surveys CAM hardware in routers, switches, and caches that compares the search word against stored content in parallel and returns the matching location in a single clock cycle.
[8] Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Gruber, L., Holzleitner, M., Adler, T., Kreil, D., Kopp, M. K., Klambauer, G., Brandstetter, J., & Hochreiter, S. (2020). Hopfield Networks is All You Need. arXiv:2008.02217 (ICLR 2021). Shows the transformer attention mechanism is equivalent to a continuous-state (modern Hopfield) associative read: a query retrieves a similarity-weighted blend of stored values in a single update.
[9] Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620. Introduces the vector-space model in which documents and queries are points in a representational space and retrieval is by similarity rather than exact-key match, the premise of vector and semantic search.
[10] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14), 1530–1533. Derives the critical storage ratio (α_c ≈ 0.14) for the Hopfield model above which the patterns crowd, interfere, and the retrieval states are destroyed.