Reservoir Computing in Artificial Spin Ice

Artificial spin ice (ASI) are systems of coupled nanomagnets arranged on a 2D lattice. ASIs are promising computing substrates due to the rich variety of emergent behavior, accompanied by considerable control and flexibility. Computational models may exploit the small-scale dynamics of the individual elements, or large-scale emergent behavior of the resulting metamaterial. We investigate the computational capabilities of “pinwheel” ASI, whose emergent ferromagnetic patterns can be observed at different scales. Within a reservoir computing framework, we examine how key system parameters affect performance using well-established reservoir quality metrics. As reservoir output, we consider system state at different granularities, ranging from individual magnets to the collective state of multiple magnets. Our results show that pinwheel ASI exhibits excellent computing capacity, including evidence of fading memory. Interestingly, a wide range of output granularities result in good performance, offering new insights into the scalability and robustness of reservoirs based on self-organized collective behavior. The apparent flexibility in output granularity show that ASIs have computational properties at different abstraction levels, from the small-scale dynamics of simple elements, to the large-scale spatial patterns of the metamaterial.


Introduction
In ASI, each nanomagnet behaves as a macrospin, analogous to the atomic spins in bulk materials. Collectively, the macrospins form a magnetic metamaterial, whose emergent properties can be controlled directly by the placement, orientation and shape of the nanomagnets.
Furthermore, established nanofabrication methods make ASIs readily available for real-world exploration. Unlike atomic spins, the mesoscopic size of the nanomagnets enables direct observation of the macrospin states through magnetic microscopy. Micromagnetic simulations are feasible for smaller systems (Leliaert et al., 2018;Jensen et al., 2018), while large-scale behavior can be captured by mesoscopic models (Jensen et al., 2020).
As systems of coupled spins, ASIs are natural substrates for neuromorphic computing. Like biological computing systems, the coupled nanomagnets form large spatial networks of nonlinear nodes, where computation is closely linked to memory. Computation in neuromorphic systems is inherently parallel, the result of interactions between large numbers of simple elements.
An alternative view is ASI as a metamaterial: when observed at larger scales, magnetic patterns emerge as a result of the underlying macrospin interactions. The metamaterial view is a natural fit for material computation (Stepney et al., 2018). Compared to bulk materials, metamaterials offer considerable control and flexibility, and opens for the design of exotic substrates with unusual physical behavior. Furthermore, computation with large-scale emergent phenomena offers an inherent robustness, as small differences in the underlying state are washed out in the aggregate view.
Here, we explore the computation arising from these alternate ASI views. By observing the system at different scales (adjusting the amount of "squinting"), it is possible to move gradually between the two views: at the smallest scale we have the network of spins, while at larger scales we approach the metamaterial. How does the scale of observation affect computation? This question has practical implications for computing devices based on ASI, where the readout of magnetic state necessitates sensor circuitry with an associated cost. Note that the same is true for all physical computing devices: readout of state has a cost which scales with a growing number of outputs.
Specifically, we investigate the computational properties of "pinwheel" ASI within a Reservoir Computing (RC) framework. Using established RC metrics, we study how key system parameters affect performance. We consider different output granularities to define the reservoir nodes, ranging from single magnets to the aggregate of multiple magnets.

Background Artifical Spin Ice
ASIs have received considerable interest over the last decade, primarily as a model system for the study of fundamental physics. The name "artificial spin ice" stems from the use of engineered systems to mimic the arrangement of molecules in water ice. Established nanofabrication techniques coupled with the ability to directly observe macrospin states, has enabled the study of a wide range of physical phenomena in ASI (Skjaervø et al., 2020). In ASI systems, each nanomagnet behaves as a binary mesoscopic spin. The small size ensures a uniform internal magnetization (a single-domain state), while an elongated shape constrains the orientation of the magnetization to lie along the long axis (a binary state).
The artificial spins are coupled via the magnetic dipoledipole interaction: each magnet is subject to the stray magnetic field of neighboring magnets.
The particular arrangement and orientation of the magnets is referred to as the geometry, which effectively defines the nature of the magnet-magnet interactions. Fig. 1a depicts square ASI, which consists of horizontal and vertical magnets arranged on two square lattices. The sub-lattice with vertical magnets is placed at an offset from the sub-lattice with horizontal magnets, as indicated by the different colors. Pinwheel ASI is shown in Fig. 1c, and is obtained by rotating each magnet in square ASI by 45 • about its center. Some geometries result in antiferromagnetic ordering, where domains of zero net magnetization are energetically favorable. Fig. 1b shows the collective magnetization in square ASI, with the emergence of antiferromagnetic domains (white regions). In antiferromagnetic systems, only the boundaries of the domains have an observable magnetization at larger scales. Pinwheel ASI, on the other hand, exhibits ferromagnetic behavior, i.e., the magnets form domains with coherent magnetization of non-zero magnitude. Fig. 1d shows emergent ferromagnetic patterns as found in pinwheel ASI. Ferromagnetic domains are also clearly visible at large scales, making pinwheel ASI ideal for our study.
There are a myriad of ways to tune the behavior of ASIs. For example, the lattice spacing (distance between magnet centers) determines the size of the anti-or ferromagnetic domains: a smaller spacing results in larger domains. Small changes to the geometry can result in fundamentally different behavior. Novel geometries provide a seemingly endless playground for exploration of self-organization and emergence in-materio. In addition, there are several ways to tune behavior externally, without altering the system, e.g., through an external magnetic field or temperature.

Reservoir Computing
Reservoir Computing (RC) is a methodology which allows a dynamical system to be exploited for computation (Jaeger, 2001;Maass et al., 2002). The key component is the dynamical system, which is referred to as the reservoir. An input signal perturbs the reservoir, which, as a result of its inherent properties, produces a complex dynamic response. The reservoir functions as a nonlinear kernel with memory, maintaining a rich repertoire of nonlinear input transformations. Subsequently, a linear readout layer is trained to produce some desired function as a weighted sum of reservoir states. Crucially, the readout layer is the only trained part of the system, i.e., both the input layer and the reservoir remains unchanged.
State of the art performance has been obtained using RC methods for a variety of tasks, both with classical neural reservoirs (Lukoševičius and Jaeger, 2009) as well as a range of physical reservoirs (Tanaka et al., 2019). A variety of magnetic reservoirs have been proposed, such as magnetic tunnel junctions (Furuta et al., 2018), spin torque oscillators (Torrejon et al., 2017) magnetic skyrmions (Prychynenko et al., 2018), magnetic thin-films (Nakane et al., 2018) and dipole coupled nanomagnets (Nomura et al., 2019). The latter two examples bear some resemblance to ASIs, as magnetic metamaterials consisting of dipole coupled nanomagnets.
Good reservoirs are nonlinear, high dimensional dynamical systems with rich dynamics. Interactions between nodes in the reservoir facilitates the formation of nonlinear memory, i.e., where the state of a node is a nonlinear function of current and previous inputs. Crucially, the reservoir should have the echo-state property which, informally, means the reservoir gradually forgets over time.
ASIs are promising reservoirs since they exhibit many of the above-mentioned properties. Magnetic switching is inherently nonlinear, hence a large number of magnets is a high-dimensional nonlinear system. Magnetic dipole-dipole interactions enable the flow of information between nodes, with the potential for memory formation. Reservoir state can be observed directly as the state of individual spins, or through emergent patterns at coarser granularities.
As dynamical systems, ASIs exhibit a large number of attractors, due to the highly degenerate energy landscape. Earlier work has shown that different attractors can be reached by encoding input as a global external magnetic field (Jensen et al., 2018). Consequently, the system state forms a spatial representation of input history, i.e., exactly the kind of behavior sought in a reservoir.

Reservoir quality
A range of methods have been proposed to evaluate reservoir quality, ranging from benchmark tasks such as speech recognition and signal classification, to more generic measures such as memory capacity (Jaeger, 2002) and information processing capacity (Dambre et al., 2012).
In this work, we employ two generic measures related to signal classification, namely the kernel-quality and generalization-capability (Legenstein and Maass, 2005).
Kernel-quality is a measure of how well the reservoir is able to separate temporal input patterns. It is estimated by perturbing the reservoir with m different input signals. At the end of each signal, the reservoir states are recorded as the columns of an n × m matrix M K where n is the number of reservoir nodes. Computing the rank K of this matrix gives a measure of kernel-quality (higher is better). If the kernel rank K = m, then it is guaranteed that any assignment of target outputs can be implemented by a linear readout. If K < m, the kernel rank can still be viewed as a measure of computational power, since it is a measure of the number of "degrees of freedom" the readout has available.
Kernel-quality is insufficient alone as a measure of reservoir quality. A complementary property is the reservoir's ability to generalize to new unseen input signals.
Generalization-capability is measured the same way as kernel-quality, except the n×m matrix M G is now the reservoir states after seeing m similar input signals. We wish for the generalization rank G of this matrix to be low, meaning the reservoir states are similar and should generalize well.
A good reservoir maximizes K while minimizing G, hence a combined measure of computing capacity Q can be obtained by simply taking the difference: Q = K − G (higher is better). Q is a measure of the usable nodes in the reservoir, i.e., nodes with both good kernel-quality and generalization-capability.
Theoretically, the information processing capacity is bounded by the number of reservoir nodes n (Dambre et al., 2012). Since the rank of a matrix is bounded by its smallest dimension, one should choose m ≥ n to avoid saturation of the measures before the theoretical limit. Thus K, G ∈ [1, n] and Q ∈ [1 − n, n − 1], where Q = n − 1 indicates the best possible performance, while Q ≤ 0 indicates a reservoir with no usable computing capacity.
When comparing reservoirs with different numbers of nodes, the normalized rank r provides a measure of computational power per node: r = R/n where R is the matrix rank. We denote the normalized versions of K, G and Q as k, g and q, respectively. Thus, k, g ∈ [1/n, 1] and −1 < q < 1 with q = 1 − 1/n indicating the best possible performance while q ≤ 0 represents a reservoir with no usable capacity (Haynes et al., 2015).

Magnetic model
For our computational study, we use the flatspin ASI simulator, which enables fast simulations of dynamics in coupled spin systems (Jensen et al., 2020). In flatspin, magnets are modeled as point dipoles with binary state. Each dipole is affected by neighboring magnets through magnetic dipoledipole interactions, as well as a global external field.
Dynamics in flatspin are deterministic, modeled as a series of single spin flips. A spin may flip (switch state) if the total magnetic field acting on it is sufficiently strong, i.e., exceeds its intrinsic coercive field, and is directed in the opposite direction of its magnetization.
The global parameter α scales the strength of the dipoledipole interactions. A large value of α denotes a high degree of coupling between the spins. An increase in α is equivalent to reducing the lattice spacing between all magnets.

Input encoding
As input we consider temporal binary patterns, i.e., the input is a function u(t) ∈ {0, 1} for discrete time t = 0..T . For each input bit we cycle the external field at a fixed field strength H at an angle determined by the input bit: φ 0 for 0 and φ 1 = φ 0 + 90 • for 1. The 90 • offset ensures both 0 and 1 will perturb the system with the same amount of force (due to the pinwheel geometry). To break symmetry, we set φ 0 = 7 • , which causes each input bit to affect magnets in one sub-lattice slightly more than the other. We use a small angle to still allow switching to occur in both sub-lattices.

Output granularity
As reservoir output we record the magnetization of the ASI. The number of reservoir nodes n depends on the granularity of observation (the level of "squinting"). Fig. 2a illustrates nodes containing different numbers of magnets: a single spin, four magnets and twelve magnets. Each group of magnets results in two reservoir nodes, one for each vector component of the collective magnetization. At the finest granularity, we resolve the binary state of individual spins, i.e., the number of reservoir nodes n equals the number of spins N .
To define coarse-grained nodes, we superimpose a regular S × S grid onto the ASI, as shown in Fig. 2b. The magnetization of the spins within each grid cell is then summed to produce an aggregate output vector, as shown in Fig. 2c. For an S × S grid, we obtain n = 2S 2 reservoir nodes, as each grid cell results in two nodes.
The grid won't necessarily align with the underlying ASI geometry, thus the number of magnets within each grid cell may vary. This can be seen in Fig. 2b, where some cells contain four magnets while others contain five.
A decrease in the number of grid cells results in an increase in state resolution, as nodes can take more possible values. Hence, a coarse-grained view offers more computational power per node, at the cost of fewer nodes.
When multiple magnets are aggregated, the reservoir state is effectively degenerate: there will be multiple spin configurations which produce the same vector sum.
Due to manufacturing imperfections there will always be variation in the coercive fields of the magnets. Hence, we apply a disorder of 5% to the coercive fields h (i) k of each magnet i, i.e., the coercive fields are sampled from a normal distribution with mean h k and standard deviation 0.05h k . We define an ASI sample as a set of N coercive fields {h (i) k }. We start with an initially polarized ASI, such that the total magnetization is saturated towards the right (as illustrated in Fig. 1c). Next, the input signal is applied through the external field. For kernel-quality we use m = 220 random binary input signals, each 100 bits in length. For generalizationcapability, we use m = 220 random binary input signals where the first 40 bits are random and the remaining 60 bits are equal across the signals. Hence the generalization rank, at the end of the input signal, is a measure of how sensitive the system is to inputs older than 60 time steps.
In the following, we vary the strength of the external field H and the coupling strength α. For each experiment we generate 30 ASI samples, and take the average rank.
Full visibility First, we consider reservoirs with full visibility of the 220 magnets as output (n = 220 reservoir nodes). For each ASI sample, we sweep the coupling strength α and the strength of the external field H, and measure the corresponding K, G and Q. We sweep 16 values of α in the range 3e−5 to 3e−3, which roughly corresponds to lattice spacings from 1000 nm to 215 nm. For each α value, we sweep 16 values of H in the range 66 mT to 81 mT.
Output granularity Next, we investigate how the output granularity affects performance. Grids of size 1 × 1 to 10 × 10 are superimposed onto the ASI, resulting in n = 2 to n = 200 reservoir nodes. For each n we calculate the corresponding K, G and Q. When comparing performance across different number of nodes n, we use the normalized rank measures k, g and q. We maintain the same number of input patterns m = 220, i.e., independently of n.   Figure 4: Average kernel rank K, generalization rank G and computing capacity Q, along the ridge lines of Fig. 3, i.e., for each α value, the highest value of K, G and Q is plotted.

Results
Full visibility Fig. 3 shows the results of the parameter sweep of H and α, as heatmaps of the average kernel rank K, generalization rank G and computing capacity Q = K − G. Each cell in the heatmap is the average of the 30 different ASI samples. All measures exhibit a ridge line in the H-α plane, which drops quickly for low α values. The ridge shows an apparent linear relationship between H and α, in terms of computational performance.
As can be seen in Fig. 3a, kernel rank K is generally high along the ridge. In Fig. 3b, a similar but thinner ridge is apparent for the generalization rank G. The K and G ridges are in the same location of the H-α plane. In Fig. 3c, the ridge line of their difference Q is shifted slightly to the right. Fig. 4 plots K, G and Q along the ridge lines in Fig. 3, as a function of α, i.e., for each α value, the highest value of K, G and Q is plotted. A general trend is a decline in both K and G as α is increased. K nearly saturates for 1e−3 < α < 2e−3 with ranks as high as 215 on average (recall that the maximum rank is 220). The Q ridge, on the other hand, is fairly flat as a function of α, with an apparent maximum for α = 1.02e−3 and H = 78 mT. However, we note that the standard deviation of K (and hence Q) is significantly higher for large values of α. Fig. 5 shows how G evolves over time, i.e., measured after being perturbed with each of the 100 input bits, for α = 1.02e−3 and H = 78 mT. Recall that after the first random 40 bits, the input signals are identical for the remaining 60 bits. As can be seen in Fig. 5, the average rank drops quickly at t = 40, after which there is a somewhat gradual decline. Inspecting the trajectories of G for the individual ASI samples reveals that there are variations in the behavior: for some samples, the rank drops quickly, while others exhibit a more gradual decline.
Output granularity Fig. 6 shows similar heatmaps of K, G and Q, but using a coarse-grained output with a 5 × 5 grid (n = 50 nodes). Compared to the heatmaps using full visibility of all magnets (Fig. 3), both K and G exhibit wider ridge lines. As a consequence, the Q ridge is shifted diagonally towards higher H and α values. Saturation of K is still obtained for large regions of parameter space (here the maximum rank is 50).
Comparing the heatmaps from all granularities (not shown), a general trend is that, as output becomes more coarse-grained, the Q ridge line moves diagonally in the heatmaps towards higher H and α. Fig. 7 shows the average normalized rank measures k, g and q = k − g, as the number of reservoir nodes n are increased (resulting from the increasing number of grid cells). For completeness, the plots also include results with full visibility at n = 220. Fig. 7a shows the measures for α = 1.02e−3 and H = 78 mT, i.e., the parameters with the best performance from the full visibility experiment. As can be seen, a decrease in n results in poor generalization g and hence a decrease in performance per node q. There's an apparent peak of q ≈ 0.6 for n = 162. Fig. 7b shows the same plot for an increased field strength of H = 79 mT. With a stronger field, much better generalization is obtained. In this case, we observe an increase in q as n is decreased, with an apparent peak at q ≈ 0.8 for n = 50.

Full visibility
Our investigation of the H-α parameter space demonstrates salient features of ASI reservoirs. For most values of α, there exists a corresponding critical field strength H, which is neither too weak (resulting in little activity), nor too strong (causing all magnets to switch).
Clearly, spin interactions play a crucial role in the formation of a complex dynamic response, since low α values result in poor kernel-quality. Intuitively, in an uncoupled system, the state of the spins will only be affected by the current input. Memory formation requires sufficient flow of information between the spins. If the coupling is too weak, the current state will be completely overwritten by new input, and all history of previous inputs is lost.
The saturation of kernel rank, shown in Fig. 3a, demonstrates excellent input separation for large regions of parameter space. In these cases, the ASI states contain sufficient information to discriminate between all the input signals. However, the reservoirs with the highest kernel-quality suffer from poor generalization-capability. High generalization rank is evidence of chaotic dynamics, where the initial difference in states persists for a long time.
The measure of computing capacity Q (Fig. 3c) exhibited a ridge line which, compared to the ridge lines of K and G ( Fig. 3a and Fig. 3b), is shifted slightly towards stronger fields (larger H). A stronger external field will consequently result in more spin flips per input, thus overwriting more of the system state. Indeed, chaotic reservoirs may still be used successfully, as long as the input is sufficiently strong to drive its dynamics out of the chaotic regime (Ozturk and Principe, 2005).
The decrease in kernel-quality as function of coupling strength α is expected, since the size of ferromagnetic domains increase with α, and hence there is less variation in the spin states. Smaller domains result in more diverse spatial patterns, and consequently a richer repertoire of input transformations. However, the parameter regions with the highest kernel-quality have very poor generalization-capability, resulting in poor overall performance. As a result, the capacity measure Q predicts no significant difference in performance between loosely and highly coupled systems. Still, the higher variance observed for large α values is evidence that the particular ASI sample, i.e., the set of coercive fields, plays a more important role for highly coupled ASIs compared to the loosely coupled systems.
The observed gradual decrease in generalization rank over time is clear evidence of fading memory, where past input is gradually forgotten over time. The plot in Fig. 5 is remarkably similar to the time-wise separation observed in neural microcircuits (Maass et al., 2004). Although there are variations in the behavior, the results indicate that ASI reservoirs can indeed exhibit the echo-state property.

Output granularity
Our results revealed that a change of output granularity affects the performance landscape in the H-α parameter space. Parameters which perform well with full visibility of all spins perform poorly with a coarse-grained view.
As the number of spins per node is increased, the areas in the parameter space with good performance move towards stronger fields and higher coupling (compare Figs. 3c and 6c). Interestingly, a small increase in field strength seems to be sufficient to improve performance under a coarse-grained view (Figs. 7a and 7b).
Under a coarse-grained view, one might expect that the more strongly coupled systems would have a benefit, since larger magnetic domains would still be visible without significant information loss. However, we find no evidence of this in our results. In fact, the strongly coupled systems appear to perform worse, regardless of output granularity. Since the computational capacity is bounded by the number of reservoir nodes (Dambre et al., 2012), it would seem like full visibility of all spins is always beneficial. However, in any physical reservoir there will be a cost associated with measurement of state, placing practical limits on the number of output nodes. Additionally, a coarse-grained view brings some additional benefits, which we discuss below.
The normalized rank measures (Figs. 7a and 7b) revealed that, for a system consisting of binary elements, the computing capacity per node q can be increased by combining multiple elements into one node. The increase can be attributed primarily to an increase in the degrees of freedom per node, i.e., as a node can take more possible values. This was confirmed by thresholding the aggregate values, effectively making the grid cells "super-spins", which resulted in a fairly flat q across the different granularities (not shown).
For a given number of reservoir nodes, it should be possible to maximize performance by scaling up the underlying system, while maintaining a fixed-size coarse-grained view.
Another potential benefit of a coarse-grained view is robustness: the output will be less sensitive to small differences in the underlying spin state. If a spin inadvertently flips, e.g., due to noise, its immediate effect will be small under a coarse-grained view. With full visibility, however, the readout may be more sensitive to a single spin flip.
The results show that, at least for pinwheel ASI, there is a great degree of freedom in choosing the output granularity. We may observe the system at a range of different scales, and still obtain good performance.

Conclusion
ASIs are promising computing substrates due to the wide variety of emergent behavior, which can be directly controlled by the system geometry. We have shown how the inherent properties of pinwheel ASI result in complex spatiotemporal patterns that can be readily exploited for computation. Our experiments demonstrate excellent computing capacity in terms of well-established reservoir quality measures. We find clear evidence of fading memory, suggesting the presence of the crucial echo-state property.
An exciting finding is that good performance can also be obtained with a coarse-grained metamaterial view of the system. Although the size of our magnetic system was fixed, our results indicate that ASI reservoirs are scalable, both in terms of the number of nodes as well as the computing capacity per node. The apparent flexibility in output granularity show that ASIs have computational properties at different abstraction levels, from the small-scale dynamics of simple spins, to the large-scale spatial patterns of the metamaterial.
The fact that meaningful computations can be obtained with a very coarse-grained view of the substrate, shows that physical ASI reservoirs are not only possible, but also practical.