Unsupervised Learning in Neuromemristive Systems

Cory Merkel and Dhireesha Kudithipudi
Department of Computer Engineering
Rochester Institute of Technology
Rochester, New York 14623-5603
Email: {cem1103,dxkeec}@rit.edu

Abstract—Neuromemristive systems (NMSs) currently represent the most promising platform to achieve energy-efficient neuro-inspired computation. However, since the research field is less than a decade old, there are still countless algorithms and design paradigms to be explored within these systems. One particular domain that remains to be fully investigated within NMSs is unsupervised learning. In this work, we explore the design of an NMS for unsupervised clustering, which is a critical element of several machine learning algorithms. Using a simple memristor crossbar architecture and learning rule, we are able to achieve performance which is on par with MATLAB's k-means clustering.

I. INTRODUCTION

The present research explores multiple aspects of unsupervised learning in a class of neurologically-inspired computer architectures referred to as neuromemristive systems (NMSs). Although they are closely related, NMSs differ from neuromorphic systems, pioneered by Mead in the late 1980s [1], in two respects. First, they are designed using a mixture of CMOS and memristor technologies, which affords levels of connectivity and plasticity that are not achievable in neuromorphic systems. The second distinction, which is more subtle but equally important, is that NMSs focus on abstraction, rather than mimicking, of neurological processes. This abstraction generally improves the efficiency of the resulting hardware implementations.

NMS research and development took off rapidly in the late 2000s, coinciding with a growing interest in two-terminal memristors, which will be described briefly in Section II for unfamiliar readers. The utility of these systems has been demonstrated in various application domains, especially image processing/analysis. See [2] for a review. Learning in these systems has primarily been supervised. Although there are some examples of unsupervised learning in spike-based systems [3], it is relatively unexplored in non-spiking NMSs. One example where unsupervised learning in non-spiking networks has been demonstrated is in [4], where principal component analysis (PCA) is used to reduce the dimensionality of images. However, the authors do not discuss any circuit for implementing the PCA algorithm in hardware.

In this research, we propose a non-spiking NMS design for unsupervised clustering. The NMS is tested on samples from the MNIST database of handwritten digits. To the best of our knowledge, this is the first work that explores general aspects of clustering in these systems. Given its ability to reduce data dimensionality and aid in classification, we believe that clustering is a key primitive for future NMSs. Furthermore, we believe this work will advance the state of unsupervised learning in NMSs and help others do the same.

II. BRIEF DESCRIPTION OF MEMRISTORS

Technically, a memristor can be defined as any two-terminal device with a state-dependent Ohm's law [5]. More concretely, a memristor is a thin film (I) sandwiched between a top (M) and bottom (M) electrode. The stack is referred to as a metal-insulator-metal (MIM) structure because the film material is nominally insulating. That is, in its stoichiometric crystalline form it will have a large band gap and not enough free carriers to conduct. The film is made conductive by introducing defects in the crystalline structure, either through fabrication, applying an electric field, or both. Defects may be interstitial metallic ions which are oxidized at one electrode and then drift to the other, where they are reduced. Defects may also be vacancies, such as oxygen vacancies in a TiO2 film. In addition, defects may be changes in polarization, such as those in ferroelectric films, or even just changes in crystallinity, as in phase change memory. In some films, the defect profile can be gradually adjusted by applying electric fields for short durations, yielding incremental modifications to the film's overall conductance. In other films, only two conductance states can be reached. Moreover, there is usually a minimum amount of energy required to effect a change in the film's defect profile. This often translates to a threshold voltage which must be applied across the film to change its conductance. Given the constant evolution of memristor technology, it makes little sense to design an NMS around any specific memristor device parameters. Instead, we assume devices will have these general characteristics: (1) a large minimum resistance value (e.g. in kΩ), (2) a large OFF/ON resistance ratio (at least 10^3), (3) high endurance (the ability to switch many times before failing), (4) high retention (non-volatility), and (5) incremental conductance states that can be reached by applying bipolar voltage pulses above a particular threshold voltage. All of these properties have been demonstrated in various devices. See [6] for a review.

III. CLUSTERING ALGORITHM DESIGN

Clustering algorithms uncover structure in a set of m unlabeled input vectors {u^(p)} by identifying M groups, or clusters, of vectors that are similar in some way. In one common approach, each cluster is represented by its centroid, so the clustering algorithm is reduced to finding each of the M centroids. This can be achieved through a simple competitive learning algorithm: Initialize M vectors w_i by assigning them to randomly-chosen input vectors. These will be referred to as weight vectors. Then, for each input vector, move the
Algorithm 1 Proposed clustering algorithm.
 1: Map inputs to hypercube vertices.
 2: Initialize weight vectors to random input vectors.
 3: for epoch = 1 : N_train do
 4:   for p = 1 : m do
 5:     d*_{i,p} = w_i · u^(p),  ∀ i = 1, 2, ..., M
 6:     x_i = 1 if d*_{i,p} = max_k(d*_{k,p}), 0 otherwise,  ∀ i = 1, 2, ..., M
 7:     ∆w_{i,j} = α x_i u_j^(p),  ∀ i = 1, 2, ..., M, ∀ j = 1, 2, ..., N
 8:   end for
 9: end for

Fig. 1. Block diagram of proposed NMS for unsupervised clustering.

closest weight vector a little closer. After several iterations, the algorithm should converge with the weight vectors lying at (or close to) the centroids. Of course, there are several parameters which must be defined, including a distance metric for measuring closeness. The most obvious choice is the ℓ2-norm. However, computing this is expensive in terms of hardware because it requires units for calculating squares and square roots. In addition, as we will discuss later, it is easy to use a high-density memristor circuit called a crossbar to compute dot products between input and weight vectors. Therefore, it is preferred to use a dot product as a distance metric. For example, if all of the vectors are normalized (‖u^(p)‖ = ‖w_i‖ = 1), then w_{i*} · u^(p) > w_i · u^(p) ∀ w_i ≠ w_{i*}, where w_{i*} is the closest weight vector to u^(p). However, the constraint that ‖u^(p)‖ = ‖w_i‖ = 1 creates a large overhead, because every input vector has to be normalized and every weight vector has to be re-normalized each time it is updated.

Fig. 2. Crossbar and summing amplifier circuit for computing the distance between the input and a weight vector.
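The claim that a dot product can stand in for the ℓ2-norm when vectors are normalized can be checked numerically. The following sketch is our own illustration (not part of the proposed design): it generates random unit vectors and confirms that the weight vector with the largest dot product is also the one with the smallest ℓ2 distance, using the identity ‖w − u‖² = 2 − 2(w · u) for unit-norm w and u.

```python
import numpy as np

# Illustrative check: for unit-norm vectors, ranking by dot product is
# equivalent to ranking by l2 distance, since ||w - u||^2 = 2 - 2 (w . u).
rng = np.random.default_rng(0)

u = rng.standard_normal(8)
u /= np.linalg.norm(u)                         # normalize the input vector

W = rng.standard_normal((5, 8))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalize each weight vector

dots = W @ u
dists = np.linalg.norm(W - u, axis=1)

# Identity: squared l2 distance is an affine function of the dot product.
assert np.allclose(dists**2, 2.0 - 2.0 * dots)
# Consequence: the largest dot product marks the closest weight vector.
assert np.argmax(dots) == np.argmin(dists)
```

The same check makes the overhead argument concrete: the equivalence holds only while the norms stay fixed at 1, which is exactly the constraint the next paragraphs remove.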
We propose the following solution: Map each input vector to the vertex of a hypercube centered about the origin: u^(p) ∈ {−1, 1}^N, where N is the dimensionality of the input space. Now, w_i · u^(p) will yield a scalar value d*_{i,p} between −N and +N. Moreover, this scalar value can be linearly transformed to a distance d_{i,p}, which is the ℓ1-norm, or Manhattan distance, between the weight vector and the input:

d_{i,p} ≡ N − d*_{i,p} = Σ_{j=1}^{N} |w_{i,j} − u_j^(p)|.   (1)

Using this distance metric, we don't ever need to re-normalize the weight vectors. Furthermore, mapping input vectors to hypercube vertices can usually be accomplished by thresholding. For example, grayscale images can be mapped by assigning −1 to pixel values from 0 to 127 and +1 to pixel values from 128 to 255. Algorithm 1 summarizes the algorithm. The first two lines are initialization steps. Within the double for loop, x_i is 1 when i corresponds to the index of the closest weight vector (called the winner) and 0 otherwise, and the components of the winner are moved closer to the current input vector using a Hebbian update rule. The pre-factor α, which is called the learning rate, determines how far the weight vectors move each time they win. Notice that this algorithm is completely unsupervised, so there are no labeled input vectors.

IV. NMS HARDWARE DESIGN

The unsupervised clustering algorithm discussed in Section III can be implemented efficiently in an NMS by representing weight vectors as memristor conductances. A block diagram of the proposed design is shown in Figure 1. The inputs, which are represented as positive and negative currents, are fed through M crossbar circuits. Together with a non-inverting summing amplifier (represented as a circle), each crossbar computes the distance between the current input and the weight vector represented by its memristors' conductances.

The configuration of the crossbar and summing amplifier is shown in Figure 2. Memristors in the top row inhibit, or contribute a negative component to the output, while memristors in the bottom row excite, or contribute a positive component to the output. Therefore, each crossbar column represents one component of one weight vector w_i, which can be positive or negative. If we assume that the op amp has a high open-loop gain and the wire resistances are small, then

v_{d*_{i,p}} = Σ_{j=1}^{N} i_{u_j}^(p) R [(G_1 − G_2)/(G_2 + G_1)]_{i,j},   (2)

where G_1 and G_2 are the top and bottom memristors in each column, respectively. The output of the circuit is a voltage representation of the distance between the current input and the weight vector represented by the crossbar.

Fig. 3. 10 cluster centroids found in a set of 1000 MNIST images using the proposed NMS.

The weight vectors are modified by connecting them to write voltages v_{w_{i,j}} using
a training enable signal train_en. The write voltages are determined by the value of ∆w_{i,j} in line 7 of Algorithm 1. Specifically, if ∆w_{i,j} is negative, then v_{w_{i,j}} will be a negative voltage below the memristor's write threshold, and if ∆w_{i,j} is positive, then v_{w_{i,j}} will be a positive voltage above the memristor's write threshold. Otherwise, the write voltage is zero.

So far, we have only discussed the memristor crossbar and distance calculation parts of Figure 1 (line 5 in Algorithm 1). The winner-takes-all circuit (line 6 in Algorithm 1) can be implemented in a number of ways. In this work, we used the current-mode design described in [7]. Finally, the weight update (line 7 in Algorithm 1) can be computed using simple combinational logic circuits.

Fig. 4. Cost function versus epoch while clustering MNIST images using the proposed NMS.
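Before turning to the MNIST results, it is useful to see Algorithm 1 end to end in software. The sketch below is our own minimal illustration, in the spirit of (but not identical to) the behavioral model used in the next section: the toy data, random seed, and hyperparameter values are assumptions, and the weights are plain floating-point values rather than the differential conductance pairs of Eq. (2). It also checks the identity in Eq. (1) for weights lying on hypercube vertices.

```python
import numpy as np

rng = np.random.default_rng(1)

def to_hypercube(x, threshold=127.5):
    """Line 1 of Algorithm 1: map raw inputs (e.g. grayscale pixels in
    [0, 255]) to hypercube vertices in {-1, +1}."""
    return np.where(x > threshold, 1.0, -1.0)

def cluster(U, M, epochs, alpha):
    """Competitive clustering per Algorithm 1.

    U: (m, N) array whose rows are hypercube-vertex input vectors.
    Returns the (M, N) weight matrix and the per-epoch cost J (Eq. (3)).
    """
    m, N = U.shape
    # Line 2: initialize weight vectors to randomly chosen inputs.
    W = U[rng.choice(m, size=M, replace=False)].copy()
    costs = []
    for _ in range(epochs):
        J = 0.0
        for p in range(m):
            d_star = W @ U[p]            # line 5: dot-product similarities
            winner = np.argmax(d_star)   # line 6: winner-takes-all
            W[winner] += alpha * U[p]    # line 7: Hebbian update, winner only
            J += N - d_star[winner]      # Eq. (1): l1 distance to the winner
        costs.append(J)
    return W, costs

# Toy data set (an assumption for illustration): three random prototype
# vertices, each repeated 20 times with one coordinate flipped per copy.
N = 16
prototypes = np.where(rng.random((3, N)) > 0.5, 1.0, -1.0)
samples = []
for proto in prototypes:
    for _ in range(20):
        s = proto.copy()
        s[rng.integers(N)] *= -1.0       # corrupt one component
        samples.append(s)
U = np.array(samples)

# Eq. (1) sanity check: for weights on hypercube vertices,
# N - w . u equals the Manhattan distance between w and u.
assert np.allclose(N - prototypes @ U.T,
                   np.abs(prototypes[:, None, :] - U[None, :, :]).sum(-1))

W, costs = cluster(U, M=3, epochs=20, alpha=0.005)
assert costs[-1] < costs[0]  # cost falls as the centroids settle (toy data)
```

Note that line 7 is a pure Hebbian update, so the winning weight vectors grow in magnitude rather than converging to the centroids themselves; their directions (sign patterns) are what encode the centroids, which is why the cost in Eq. (3) keeps decreasing with training.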
V. CLUSTERING MNIST IMAGES

One exciting application of the proposed hardware is automatically identifying clusters in sets of images. We took 1000 images (m = 1000) from the MNIST handwritten digit dataset and clustered them using a behavioral model of the NMS described in the last section. Each image was originally 20×20 grayscale pixels (N = 400). They were mapped to hypercube vertices using the thresholding approach discussed earlier. In addition, we used 10 clusters (M = 10), 500 training epochs (N_train = 500), and α = 0.005. The results are shown in Figure 3. Here, we have plotted the weight vectors representing the centroid of each cluster. Figure 4 shows the cost versus the training epoch, where the cost is defined as

J = Σ_{p=1}^{m} min_i d_{i,p}.   (3)

We see that the cost function for the proposed NMS approaches that of MATLAB's built-in k-means clustering after 500 epochs.

VI. CONCLUSIONS

The goal of this work was to explore both algorithmic and hardware design aspects of unsupervised learning in NMSs. To that end, we proposed a clustering algorithm that maps inputs to vertices of a hypercube, and then iteratively finds clusters' centroids using a Hebbian learning rule. We argue (although we haven't proven) that the proposed algorithm can be implemented more efficiently in an NMS than algorithms that use either the ℓ2-norm or cosine similarity as a distance function. The algorithm was implemented in a custom NMS design that leverages crossbar circuits to compute the distance between inputs and weight vectors. To test our design, we clustered 1000 MNIST images and found the results to be consistent with MATLAB's k-means clustering implementation.

REFERENCES

[1] C. Mead, "Neuromorphic electronic systems," Proceedings of the IEEE, vol. 78, no. 10, pp. 1629–1636, 1990.
[2] D. Kudithipudi, C. Merkel, M. Soltiz, G. S. Rose, and R. Pino, "Design of neuromorphic architectures with memristors," in Network Science and Cybersecurity, R. Pino, Ed. Springer, 2014, pp. 93–103.
[3] S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, and H.-S. P. Wong, "A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation," Advanced Materials, vol. 25, no. 12, pp. 1774–1779, Mar. 2013.
[4] S. Choi, P. Sheridan, and W. D. Lu, "Data clustering using memristor networks," Scientific Reports, vol. 5, p. 10492, Jan. 2015.
[5] L. Chua, "Resistance switching memories are memristors," Applied Physics A, vol. 102, no. 4, pp. 765–783, Jan. 2011.
[6] D. Kuzum, S. Yu, and H.-S. P. Wong, "Synaptic electronics: materials, devices and applications," Nanotechnology, vol. 24, no. 38, p. 382001, Sep. 2013.
[7] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, "Winner-take-all networks of O(N) complexity," in Advances in Neural Information Processing Systems, 1988, pp. 703–711.