Some Afterthoughts on Hopfield Networks



1 Introduction
In his 1982 paper [12], John Hopfield introduced a very influential associative memory model which has since come to be known as the discrete-time Hopfield (or symmetric) network. In particular, Hopfield nets have favorable convergence properties compared with general asymmetric networks. Part of the appeal of Hopfield nets also stems from their natural hardware implementations, e.g., Ising spin glasses [3], optical computers [7], etc. Hopfield nets are well suited for applications that require the capability to remove noise from large binary patterns. Besides associative memory, the proposed uses of Hopfield networks include, e.g., the fast approximate solution of combinatorial optimization problems [13,31]. Although the practical applicability of Hopfield nets seems to be limited because of their low storage capacity, this fundamental model inspired other important neural network architectures such as BAM, Boltzmann machines, etc. [23]. Thus the theoretical analysis of Hopfield nets is also worthwhile for understanding the computational capabilities of the corresponding models.
We will first briefly specify the model of a finite discrete recurrent neural network. The network consists of $n$ simple computational units or neurons, indexed as $1, \ldots, n$, which are connected into a generally cyclic oriented graph or architecture in which each edge $(i,j)$ leading from neuron $i$ to neuron $j$ is labelled with an integer weight $w(i,j) = w_{ji}$. The absence of a connection within the architecture corresponds to a zero weight between the respective neurons. Special attention will be paid to Hopfield (symmetric) networks, whose architecture is an undirected graph with symmetric weights $w(i,j) = w(j,i)$ for every $i, j$.

We will mostly consider the synchronous computational dynamics of the network, working in fully parallel mode, which determines the evolution of the network state $y^{(t)} = (y_1^{(t)}, \ldots, y_n^{(t)}) \in \{0,1\}^n$ for all discrete time instants $t = 0, 1, \ldots$ as follows. At the beginning of the computation, the network is placed in an initial state $y^{(0)}$ which may include an external input. At discrete time $t \geq 0$, each neuron $j = 1, \ldots, n$ collects its binary inputs from the states (outputs) $y_i^{(t)} \in \{0,1\}$ of incident neurons $i$. Then its integer excitation

$$\xi_j^{(t)} = \sum_{i=0}^{n} w_{ji} y_i^{(t)}, \qquad j = 1, \ldots, n,$$

is computed as the respective weighted sum of inputs, including an integer bias $w_{j0}$ which can be viewed as the weight of the formal constant unit input $y_0^{(t)} = 1$. At the next instant $t+1$, an activation function $\sigma$ is applied to $\xi_j^{(t)}$ for all neurons $j = 1, \ldots, n$ in order to determine the new network state $y^{(t+1)}$ as follows:

$$y_j^{(t+1)} = \sigma\left(\xi_j^{(t)}\right), \qquad j = 1, \ldots, n, \qquad (1)$$

where a binary-state neural network employs the hard limiter (or threshold) activation function

$$\sigma(\xi) = \begin{cases} 1 & \text{for } \xi \geq 0, \\ 0 & \text{for } \xi < 0. \end{cases} \qquad (2)$$

Alternative computational dynamics are also possible in Hopfield nets. For example, under sequential mode only one neuron updates its state according to (1) at each time instant while the remaining neurons do not change their outputs.
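To make the dynamics (1)-(2) concrete, the following Python sketch iterates the fully parallel updates of a small binary-state network; the weight matrix, biases, and initial state are hypothetical illustrations, not taken from any construction in this paper.

    import numpy as np

    def parallel_step(w, b, y):
        """One fully parallel update (1): each neuron applies the hard
        limiter (2) to its integer excitation xi_j = sum_i w_ji y_i + w_j0."""
        xi = w @ y + b                     # excitations of all n neurons at once
        return (xi >= 0).astype(int)

    # A hypothetical 3-neuron network with integer weights and biases.
    w = np.array([[0, 2, -1],
                  [2, 0, 1],
                  [-1, 1, 0]])
    b = np.array([-1, -2, 0])
    y = np.array([1, 0, 0])                # initial state y(0)

    for t in range(16):
        y_new = parallel_step(w, b, y)
        if np.array_equal(y_new, y):       # a stable state was reached
            break
        y = y_new                          # a symmetric net in parallel mode
    print(y)                               # may instead end in a 2-cycle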
Later, in Section 5, we will deal with finite analog-state discrete-time recurrent neural networks which, instead of the threshold activation function (2), employ e.g. the saturated-linear sigmoid activation function

$$\sigma(\xi) = \begin{cases} 1 & \text{for } \xi \geq 1, \\ \xi & \text{for } 0 < \xi < 1, \\ 0 & \text{for } \xi \leq 0. \end{cases} \qquad (3)$$

Hence the states of analog neurons are real numbers within the interval $[0,1]$, and similarly the weights (including biases) are allowed to be reals.
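For the analog-state model, a minimal sketch of the update with the saturated-linear function (3) follows; the weights in the usage example are again hypothetical.

    def sat_lin(xi):
        """Saturated-linear sigmoid (3): 1 for xi >= 1, xi for 0 < xi < 1,
        and 0 for xi <= 0, i.e. the excitation clamped into [0, 1]."""
        return min(1.0, max(0.0, xi))

    def analog_parallel_step(w, b, y):
        # The same fully parallel dynamics (1), now with real states in [0, 1].
        n = len(y)
        return [sat_lin(sum(w[j][i] * y[i] for i in range(n)) + b[j])
                for j in range(n)]

    print(analog_parallel_step([[0, 0.5], [0.5, 0]], [0.25, -0.25], [1.0, 0.5]))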
The fundamental property of a symmetric network is that a bounded Liapunov, or 'energy', function can be defined on its state space which properly decreases along any nonconstant computation path (a productive computation). Namely, for a sequential computation of a Hopfield net (for simplicity, with zero feedbacks $w_{jj} = 0$ and biases $w_{j0} = 0$, and non-zero excitations $\xi_j^{(t)} \neq 0$, $j = 1, \ldots, n$), an energy associated with state $y^{(t)}$ at time $t \geq 0$ can be defined as follows:

$$E\left(y^{(t)}\right) = E^{(t)} = -\frac{1}{2} \sum_{j=1}^{n} \sum_{i=1}^{n} w_{ji} y_i^{(t)} y_j^{(t)}, \qquad (4)$$

for which Hopfield showed that $E^{(t)} \leq E^{(t-1)} - 1$ for every $t \geq 1$ of a productive computation [12]. Moreover, the energy function (4) is bounded, i.e. $|E^{(t)}| \leq W$ where

$$W = \frac{1}{2} \sum_{j=1}^{n} \sum_{i=1}^{n} |w_{ji}| \qquad (5)$$

is called the weight of the network. Hence, the computation must converge to a stable state within time $O(W)$. An analogous result can be shown for parallel updates, where a cycle of length at most two different states may appear [22].
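The Lyapunov property can be checked empirically. The sketch below runs a sequential computation on a random symmetric network with zero diagonal and zero biases, as assumed in (4), and asserts that every productive update with a nonzero excitation lowers the energy by at least 1 and that $|E|$ never exceeds the weight $W$ of (5); the random instance is, of course, only an illustration.

    import numpy as np

    def energy(w, y):
        """Energy (4): E(y) = -1/2 sum_j sum_i w_ji y_i y_j."""
        return -0.5 * (y @ w @ y)

    rng = np.random.default_rng(0)
    n = 8
    a = rng.integers(-5, 6, size=(n, n))
    w = a + a.T                            # symmetric integer weights
    np.fill_diagonal(w, 0)                 # zero feedbacks w_jj = 0
    y = rng.integers(0, 2, size=n)

    W = 0.5 * np.abs(w).sum()              # the network weight W of (5)
    e = energy(w, y)
    for step in range(200):
        j = step % n                       # sequential mode: one neuron at a time
        xi = int(w[j] @ y)                 # its integer excitation
        y_new = y.copy()
        y_new[j] = 1 if xi >= 0 else 0
        e_new = energy(w, y_new)
        assert e_new <= e                  # the energy never increases
        if y_new[j] != y[j] and xi != 0:
            assert e_new <= e - 1          # a productive update drops E by >= 1
        y, e = y_new, e_new
    assert abs(e) <= W                     # |E| <= W, hence convergence in O(W)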
The present paper discusses four relatively independent issues regarding the computational properties of Hopfield networks and presents the corresponding new results and observations concerning the computational equivalence of asymmetric and Hopfield networks, convergence time analysis, the polynomial-time approximate solution of the minimum energy problem, and the Turing universality of analog Hopfield nets. Unfortunately, the proofs here are sketched or omitted due to the lack of space; they can be found in the respective draft version [29].

2 A Size-Optimal Simulation of Asymmetric Networks
The computational power of Hopfield nets is properly less than that of asymmetric networks because of their different asymptotic behavior: Hopfield nets cannot enter a limit cycle of a given length as asymmetric networks can. However, it is known [20] that this is the only feature that cannot be reproduced, in the sense that any converging fully parallel computation by a network of $n$ discrete-time binary neurons, with in general asymmetric interconnections, can be simulated by a Hopfield net of quadratic size $O(n^2)$. More precisely, there exists a subset of neurons in the respective Hopfield net whose states correspond to the original convergent asymmetric computation in the course of the simulation, possibly with some constant time overhead per each original update. The idea behind this simulation is that each directed edge is implemented by a small symmetric subnetwork which receives energy support from a symmetric clock subnetwork (a binary counter) [11] in order to propagate a signal in the right direction. This result may also be interpreted within the context of infinite families of neural networks which, each for one input length, can be exploited for universal computations (similarly as circuit families). Thus the infinite sequences of discrete symmetric networks with a polynomial number of neurons in terms of the input length are computationally equivalent to (nonuniform) polynomially space bounded Turing machines, i.e. they compute the complexity class PSPACE/poly, or P/poly when polynomial weights are considered [20].
In the following theorem the construction from [20] is improved by reducing the number of neurons in the simulating symmetric network to the linear size $6n + 2$, which is asymptotically optimal. This is achieved by simulating the neurons (instead of the edges), whose states are updated by means of the clock technique. A similar idea was used for an analogous continuous-time simulation [28].
This result can be interpreted in the sense that convergent asymmetric networks are computationally equivalent with symmetric ones to a greater degree, namely even when the network size is taken into account.

Theorem 1. Any converging fully parallel computation of an asymmetric network of $n$ binary-state neurons can be simulated by a Hopfield net with $6n + 2$ units, with a constant time overhead per each original update.
Proof. (Sketch) Observe first that any converging computation by an asymmetric network of $n$ binary neurons must terminate within $t^* \leq 2^n$ steps, since there are only $2^n$ different network states. A basic technique used in our proof is the exploitation of an $(n+1)$-bit symmetric clock subnetwork (a binary counter) which, using $3n + 1$ units, produces a well-controlled sequence of $2^n$ oscillations $(0111)^{2^n}$ before it converges. This sequence of clock pulses generated by the least significant counter unit $c_0$ is used to drive the rest of the network. The construction of the $(n+1)$-bit binary counter is omitted.
We only assume that the corresponding weights are accommodated so that the clock is not influenced by the simulating subnetwork. In addition, a neuron $\bar{c}_0$ is added which computes the negation of the output of $c_0$.
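Since the counter construction is omitted here as well, the following generator is purely a behavioral stand-in (not the actual subnetwork of $3n + 1$ threshold units); it merely reproduces the pulse train that the proof attributes to the unit $c_0$.

    def clock_pulses(n):
        """Yields the oscillation pattern (0111) repeated 2**n times -- the
        well-controlled pulse train of the least significant counter unit c_0.
        This is a stand-in for, not an implementation of, the clock network."""
        for _ in range(2 ** n):
            for bit in (0, 1, 1, 1):
                yield bit

    print(list(clock_pulses(2)))           # 2**2 = 4 oscillations, 16 bits in all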
Then, for each neuron $j$ of the asymmetric network, three units $p_j, q_j, r_j$ are introduced in the Hopfield net so that $p_j$ represents the new (current) state $y_j^{(t)}$ of $j$ at time $t \geq 1$, while $q_j$ stores the old state $y_j^{(t-1)}$ of $j$ from the preceding time instant $t-1$, and $r_j$ is an auxiliary neuron realizing the update of the old state. The corresponding symmetric subnetwork simulating one neuron $j$ is depicted in Figure 1, where the parameter $W$ is the network weight (5). Here the symmetric connections between neurons are labelled with the corresponding weights, and the biases are indicated by edges drawn without an originating unit. In the sequel, the symmetric weights in the Hopfield net will be denoted by $w'$ whereas $w$ denotes the original asymmetric weights. The total number of units simulating the asymmetric network is $3n + 1$ (including $\bar{c}_0$) which, together with the clock size $3n + 1$, gives the desired $6n + 2$ neurons of the Hopfield net.

At the beginning of the simulation, the units $q_j$ are active exactly for those neurons $j$ that are initially active in the original network, i.e. with $y_j^{(0)} = 1$. Then an asymmetric network update at time $t \geq 1$ is simulated by a cycle of four steps in the Hopfield net as follows. In the first step, unit $c_0$ fires and remains active until its state is changed by the clock, since its large positive bias makes it independent of all the neurons $p_j$. Also the unit $\bar{c}_0$ fires because it computes the negation of $c_0$, which was initially passive. At the same time, each neuron $p_j$ computes the new state $y_j^{(t)}$ from the old ones $y_i^{(t-1)}$ which are stored in the corresponding units $q_i$. Thus each neuron $p_j$ is connected with the units $q_i$ via the original weights $w'(q_i, p_j) = w(i,j)$, and also its bias $w'(0, p_j) = w(0,j)$ is preserved. So far, unit $q_j$ keeps the old state $y_j^{(t-1)}$ due to its feedback. In the second step, the new state $y_j^{(t)}$ is copied from $p_j$ to $r_j$, and the active neuron $c_0$ makes each neuron $p_j$ passive by means of a large negative weight which exceeds the positive influence from the units $q_i$ ($i = 1, \ldots, n$) including its bias $w'(0, p_j)$, according to (5). Similarly, the active neuron $\bar{c}_0$ erases the old state $y_j^{(t-1)}$ from each neuron $q_j$ by making it passive with the help of a large negative weight which exceeds its feedback and the positive influence from the units $p_i$ ($i = 1, \ldots, n$). Finally, also neuron $\bar{c}_0$ becomes passive since $c_0$ was active. In the third step, the current state $y_j^{(t)}$ is copied from $r_j$ to $q_j$, since all the remaining incident neurons $p_i$ and $\bar{c}_0$ are and remain passive due to $c_0$ being active. Thereafter also unit $r_j$ becomes passive. In the fourth step, $c_0$ becomes passive and the state $y_j^{(t)}$, from now on regarded as old, is stored in $q_j$. Thus the Hopfield net finds itself at the starting point of the next asymmetric network update simulation at time $t+1$, which proceeds in the same way. Hence the whole simulation is achieved within $4t^*$ discrete-time steps. □
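The four-step cycle can be followed at the state level without reproducing the weights of Figure 1. The sketch below is a plain re-timing of the proof: $p$ computes the new state from $q$, $r$ latches $p$, and $q$ then takes over $r$; in the real construction these phases are enforced by the clock pulses and the large weights of order $W$. The example network is hypothetical.

    import numpy as np

    def hard(xi):
        return (xi >= 0).astype(int)

    def simulate_by_phases(w, b, y0, t_max):
        """Each macro-step mirrors one four-step cycle: step 1 computes p from
        q via the original weights, step 2 copies p to r (while c_0 erases p
        and q), and steps 3-4 move the value from r into q as the 'old' state."""
        q = y0.copy()                      # q_j stores the old state y_j(t-1)
        trace = [q.copy()]
        for _ in range(t_max):
            p = hard(w @ q + b)            # step 1: p_j = y_j(t)
            r = p.copy()                   # step 2: r_j latches y_j(t)
            q = r.copy()                   # steps 3-4: q_j := y_j(t)
            trace.append(q.copy())
            if np.array_equal(trace[-1], trace[-2]):
                break                      # the simulated computation converged
        return trace

    w = np.array([[0, 1, 0], [-2, 0, 1], [1, 1, 0]])   # asymmetric weights
    b = np.array([0, -1, -2])
    print(simulate_by_phases(w, b, np.array([0, 1, 1]), 20))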

3 Convergence Time Analysis
In this section the convergence time of Hopfield networks, i.e. the number of discrete updates until the network converges, will be analyzed. We will consider only worst-case bounds; an average-case analysis can be found in [17]. Obviously, there are exactly $2^n$ different states in a network with $n$ binary neurons, which yields a trivial $2^n$ upper bound on the convergence time of symmetric networks of size $n$. On the other hand, the symmetric clock network [11] which is used in the proof of Theorem 1 represents an explicit example of a Hopfield net whose convergence time is exponential with respect to $n$. Namely, this gives an $\Omega(2^{n/3})$ lower bound on the convergence time of Hopfield nets, since the respective $(k+1)$-bit binary counter requires $n = 3k + 1$ neurons. However, the above-mentioned bounds do not take the weight size into account. The corresponding upper bound $O(W)$ is derived from the energy function (see Section 1), which can even be made more accurate by using a slightly different energy function [8]. This yields a polynomial upper bound on the convergence time of Hopfield nets with polynomial weights. Similar arguments can be used for fully parallel updates.
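As an empirical companion to these bounds, the sketch below measures the convergence time of a random Hopfield net under round-robin sequential updates; note that it counts all single-neuron updates, while the $O(W)$ bound counts only the productive ones, so the total is $O(nW)$. The instance is again hypothetical.

    import numpy as np

    def convergence_time(w, y):
        """Counts single-neuron updates under a round-robin sequential
        schedule until one full pass changes no state."""
        n, t = len(y), 0
        while True:
            changed = False
            for j in range(n):
                y_new = 1 if w[j] @ y >= 0 else 0
                if y_new != y[j]:
                    y[j], changed = y_new, True
                t += 1
            if not changed:
                return t

    rng = np.random.default_rng(1)
    n = 10
    a = rng.integers(-3, 4, size=(n, n))
    w = a + a.T
    np.fill_diagonal(w, 0)                 # zero feedbacks, as in Section 1
    W = 0.5 * np.abs(w).sum()
    y = rng.integers(0, 2, size=n)
    print(convergence_time(w, y), "updates for a net of weight W =", W)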
In the following theorem these results will be translated into convergence time bounds with respect to the length of the bit representation of a Hopfield net. Namely, for a symmetric network which is described within $M$ bits, the convergence-time lower and upper bounds $2^{\Omega(M^{1/3})}$ and $2^{O(M^{1/2})}$, respectively, will be observed. It is an open problem whether these upper or lower bounds can be improved. This is an important issue since the convergence-time results for binary-state networks could then be compared with those for analog-state (or even continuous-time) networks, in which the precision of the real weight parameters (i.e. the representation length) plays an important role. For example, there exists an analog-state symmetric network with an encoding size of $M$ bits that converges after $2^{\Omega(g(M))}$ continuous-time units, where $g(M)$ is an arbitrary continuous function such that $g(M) = o(M)$, $g(M) = \Omega(M^{2/3})$, and $M/g(M)$ is increasing [28].
From the result presented here it follows that the computation of this analog symmetric network terminates later than that of any discrete Hopfield net of the same representation size. This approach also appears to be more rigorous, since we express the convergence time with respect to the full descriptional complexity of the Hopfield net instead of the number of neurons, which captures its computational resources only partially.

Theorem 2. There exists a Hopfield network with an encoding size of $M$ bits that converges after $2^{\Omega(M^{1/3})}$ updates, and any computation of a symmetric network with a binary representation of $M$ bits terminates within $2^{O(M^{1/2})}$ discrete computational steps.

Proof. (Sketch) For the underlying lower bound the clock network from the proof of Theorem 1 can again be exploited. For the upper bound, consider a Hopfield network with an $M$-bit representation that converges after $T(M)$ updates. A major part of this $M$-bit representation consists of $m$ binary encodings of weights $w_1, \ldots, w_m$ of the corresponding lengths $M_1, \ldots, M_m$ where $\sum_{r=1}^{m} M_r = \Theta(M)$. Clearly, there must be at least $T(M)$ different energy levels corresponding to the states visited during the computation. Thus the underlying weights must produce at least $S \geq T(M)$ different sums $\sum_{r \in A} w_r$ for $A \subseteq \{1, \ldots, m\}$, where $w_r$ for $r \in A$ agrees with $w_{ji}$ for $y_i = y_j = 1$ in (4). So it is sufficient to upper bound the number of different sums over $m$ weights whose binary representations form a $\Theta(M)$-bit string altogether. This yields $T(M) \leq 2^{O(M^{1/2})}$. □
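The counting step of this proof can be illustrated directly: few long weights admit only few subsets, while many short weights admit only a small range of sums, and balancing the two effects yields the $2^{O(M^{1/2})}$ cap. The toy code below, with hypothetical weight lists, counts distinct subset sums.

    def distinct_subset_sums(weights):
        """Counts the different sums sum_{r in A} w_r over all subsets A --
        the quantity S bounding the number T(M) of distinct energy levels."""
        sums = {0}
        for w in weights:
            sums |= {s + w for s in sums}
        return len(sums)

    # Ten equal 3-bit weights (about 30 bits in all) give only 11 sums,
    # whereas 2**10 sums already require weights of growing bit lengths.
    print(distinct_subset_sums([5] * 10))
    print(distinct_subset_sums([2 ** r for r in range(10)]))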
4 Approximating the Minimum Energy Problem

Another important issue in Hopfield nets is the MIN ENERGY (or GROUND STATE) problem of finding a network state with minimal energy (4) for a given symmetric neural network. Remember that in (4) it is assumed, for reasons of simplicity, that $w_{jj} = 0$ and $w_{j0} = 0$ for $j = 1, \ldots, n$. In addition, without loss of generality [21], we will work throughout this section with the frequently used bipolar states $-1, 1$ of neurons instead of the binary states $0, 1$ introduced in (2), where 0 is now replaced by $-1$. This problem appears to be of special interest since many hard combinatorial optimization problems have been heuristically solved by minimizing the energy in Hopfield nets [1,13]. This issue is also important in statistical physics, which originally inspired the Hopfield net models, e.g. Ising spin glasses [3].

Unfortunately, the decision version of the MIN ENERGY problem, i.e. whether there exists a network state having an energy less than a prescribed value, is NP-complete. This can be observed from the above-mentioned reductions of hard optimization problems to MIN ENERGY. For an explicit NP-completeness proof see e.g. [32], where a reduction from SAT is exploited. On the other hand, there is a polynomial-time MIN ENERGY algorithm for special cases of Hopfield nets whose architectures are planar lattices [6] or planar graphs [3].
Perhaps the most direct and frequently used reduction to MIN ENERGY is from the MAX CUT problem (see e.g. [4]) which, given an undirected graph $G = (V, E)$ with an integer edge evaluation $c: E \to \mathbb{Z}$, is the issue of finding a cut $V_1 \subseteq V$ which maximizes the cut size

$$c(V_1) = \sum_{\{i,j\} \in E,\; i \in V_1,\; j \notin V_1} c(\{i,j\}) \;-\; \sum_{\{i,j\} \in E,\; c(\{i,j\}) < 0} c(\{i,j\}). \qquad (6)$$

In fact, this is a generalized version of MAX CUT that allows the negative edge evaluations necessary for the opposite reduction from MIN ENERGY to MAX CUT. Recently, a new randomized approximation algorithm with a high performance guarantee $\alpha = 0.87856$ for this MAX CUT formulation has been proposed [10] and later derandomized [19], which we will exploit for approximating the MIN ENERGY problem. Namely, we will observe that MIN ENERGY can be approximated in polynomial time within an absolute error less than $0.243W$, where $W$ is the network weight (5). For $W = O(n^2)$, which is satisfied e.g. by Hopfield nets with $n$ neurons and constant weights, this result matches the lower bound $\Omega(n^{2-\varepsilon})$: an absolute error of this order cannot be guaranteed by any approximate polynomial-time MIN ENERGY algorithm for any $\varepsilon > 0$ [4], unless P = NP. In addition, an approximate polynomial-time MIN ENERGY algorithm with absolute error $O(n/\log n)$ is known in the special case of Hopfield nets whose architectures are two-level grids [5].
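The reduction can be checked mechanically. Assuming bipolar states and the evaluation $c(\{i,j\}) = -w(i,j)$, the sketch below verifies the relation $E(y) = W - 2c(V_1)$ (used in the proof of Theorem 3 below) exhaustively on a small random symmetric network; the instance is illustrative only.

    import numpy as np
    from itertools import product

    def energy(w, y):
        """Energy (4) over bipolar states y in {-1, +1}^n."""
        return -0.5 * (y @ w @ y)

    def cut_size(w, side):
        """Generalized cut size (6) with c({i,j}) = -w(i,j): the evaluations
        of the crossing edges minus the sum of all negative edge evaluations."""
        n, c = len(w), 0.0
        for i in range(n):
            for j in range(i + 1, n):
                e = -w[i, j]
                if side[i] != side[j]:
                    c += e                 # {i,j} crosses the cut V1
                if e < 0:
                    c -= e                 # subtract every negative evaluation
        return c

    rng = np.random.default_rng(2)
    n = 6
    a = rng.integers(-4, 5, size=(n, n))
    w = a + a.T
    np.fill_diagonal(w, 0)
    W = 0.5 * np.abs(w).sum()              # the network weight W of (5)

    for side in product([False, True], repeat=n):       # every cut V1
        y = np.array([1 if s else -1 for s in side])
        assert energy(w, y) == W - 2 * cut_size(w, side)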
Theorem 3. The MIN ENERGY problem for Hopfield nets can be approximated in polynomial time within an absolute error less than $0.243W$ where $W$ is the network weight (5).

Proof. (Sketch) We will first recall the well-known simple reduction between the MIN ENERGY and MAX CUT problems. For a Hopfield network with architecture $G$ and weights $w(i,j)$ we can easily define the corresponding instance $G = (V, E); c$ of MAX CUT with the edge evaluation $c(\{i,j\}) = -w(i,j)$ for $\{i,j\} \in E$. It can easily be shown that any cut $V_1 \subseteq V$ of $G$ corresponds to a Hopfield net state $y \in \{-1,1\}^n$, where $y_i = 1$ if $i \in V_1$ and $y_i = -1$ for $i \in V \setminus V_1$, such that the respective cut size $c(V_1)$ is related to the underlying energy by $E(y) = W - 2c(V_1)$. This implies that the minimum energy state corresponds to the maximum cut. Now the approximate polynomial-time algorithm from [10] can be employed to solve the instance $G = (V,E); c$ of the MAX CUT problem, which provides a cut $V_1$ whose size $c(V_1) \geq \alpha c^*$ is guaranteed to be at least $\alpha = 0.87856$ times the maximum cut size $c^*$. Let cut $V_1$ correspond to the Hopfield network state $y$, which implies $c(V_1) = \frac{1}{2}(W - E(y))$. Hence, we get the guarantee $W - E(y) \geq \alpha(W - E^*)$ where $E^*$ is the minimum energy corresponding to the maximum cut $c^*$, which leads to $E(y) - E^* \leq (1 - \alpha)(W - E^*)$. Since $|E^*| \leq W$, we obtain the desired guarantee for the absolute error $E(y) - E^* \leq (1 - \alpha) \cdot 2W < 0.243W$. □

5 Turing Universality of Finite Analog Hopfield Nets

In this section we will deal with the computational power of finite analog-state discrete-time recurrent neural networks. For asymmetric analog networks, the computational power is known to increase with the Kolmogorov complexity of the real weights [2]. With integer weights such networks are equivalent to finite automata [14,15,30], while with rational weights arbitrary Turing machines can be simulated [15,25]. With arbitrary real weights the networks can even have 'super-Turing' computational capabilities, e.g. their polynomial-time computations correspond to the complexity class P/poly, and all languages can be recognized within exponential time [24]. On the other hand, any amount of analog noise reduces the computational power of this model to that of finite automata [18].
For finite symmetric networks, only the computational power of binary-state Hopfield nets is fully characterized. Namely, they recognize the so-called Hopfield languages [26], which establish a proper subclass of the regular languages; hence they are less powerful than finite automata. Hopfield languages can also be faithfully recognized by analog symmetric neural networks [18,27], and this provides a lower bound on their computational power. A natural question arises whether finite analog Hopfield nets are Turing universal, i.e. whether a Turing machine simulation can be achieved with rational weights similarly as in the asymmetric case [15,25]. The main problem is that under fully parallel updates any analog Hopfield net with rational weights converges to a limit cycle of length at most two [16]. Thus the only possibility of simulating Turing machines is to exploit a sequence of rational network states converging to this limit cycle, which seems to be tricky if possible at all. A more reasonable approach is to supply an external clock that produces an infinite sequence of binary pulses providing the symmetric network with an energy support, e.g. for simulating an asymmetric analog network similarly as in Theorem 1. In this way the computational power of analog Hopfield nets with an external clock is proved to be the same as that of asymmetric analog networks. Especially for rational weights, this implies that they are Turing universal. The following theorem also completely characterizes the infinite binary sequences produced by the external clock which prevent the Hopfield network from converging.

Theorem 4. Let $N$ be an analog-state recurrent neural network with real asymmetric weights and $n$ neurons working in a fully parallel mode. Then there exists an analog Hopfield net $N'$, with the same maximum Kolmogorov complexity of a real weight as that in $N$ and with $5n + 8$ units, such that $N'$ simulates any computation of $N$ for any binary sequence which is generated by an additional external input of $N'$ satisfying the following property. Namely, this sequence must contain an infinite number of substrings of the form $b x \bar{b} \in \{0,1\}^3$ where $b \neq \bar{b}$. In addition, this property is necessary to prevent $N'$ from converging.
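For a concrete reading of this condition, the toy checker below (the function name is ours) counts such substrings in a finite prefix: the clock train $(0111)^{2^n}$ of Section 2 contains them in abundance, whereas a plain alternating sequence contains none, which is consistent with convergence to a limit cycle of length two.

    def witnesses(bits):
        """Counts length-3 substrings b x b' with b != b' (first and third
        bits differ) -- the pattern that must occur infinitely often in the
        external input to keep N' from converging."""
        return sum(1 for i in range(len(bits) - 2) if bits[i] != bits[i + 2])

    print(witnesses([0, 1, 1, 1] * 4))     # the clock pattern: many witnesses
    print(witnesses([0, 1] * 8))           # alternating sequence: none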