AI Archives | Weebit: A Quantum Leap In Data Storage

Pushing the Boundaries of Memory: What’s New with Weebit and AI
https://www.weebit-nano.com/pushing-the-boundaries-of-memorywhats-new-with-weebit-and-ai/
Wed, 01 Oct 2025 07:53:36 +0000

The post Pushing the Boundaries of Memory: What’s New with Weebit and AI appeared first on Weebit.


Memory Made Smarter: Weebit Nano’s Role in the AI Hardware Revolution

Artificial intelligence is transforming nearly every industry, from autonomous driving to healthcare to connected devices. But as AI models grow more complex, the biggest barrier to progress is no longer the raw compute; it’s the movement of data. Every time information travels between the memory and processor, precious speed and power are lost.

Memory plays a key role in overcoming this barrier. Our advanced Resistive RAM (ReRAM / RRAM) technology is not only a more efficient embedded non-volatile memory (NVM) than flash; it is also the foundation for new computing paradigms that can dramatically accelerate AI.

 

Smarter Memory for AI SoCs

Next-generation AI systems-on-chips (SoCs) are typically built on 22nm and smaller technologies. Unlike embedded flash, which cannot scale below 28nm, ReRAM continues to scale to advanced nodes. This allows the memory to be placed closer to the processor, a critical advantage for these systems.

By storing neural network weights directly on-chip, embedded ReRAM eliminates the need for external memories, reducing cost, power, size, and security risks. For today’s AI accelerators and microcontrollers, embedding ReRAM closer to processing units is a critical step forward.

Near-memory computing (NMC) minimizes data movement by placing memory directly alongside logic. This enables faster access to weights and parameters, cutting latency and improving energy efficiency for AI inference, particularly in edge devices that must process data locally.

In automotive and aerospace, ReRAM’s robust reliability, including AEC-Q100 qualification and radiation tolerance, ensures that AI systems can perform consistently even in the most demanding environments.

Above: We’ve already demonstrated the advantages of ReRAM for near-memory computing

 

The next leap is towards computing inside the memory itself, called in-memory computing (IMC). ReRAM crossbars can perform matrix-vector multiplications (the core operation of neural networks) directly within the memory array.

By reducing the constant back-and-forth between memory and the processing unit, IMC promises significant speed-ups and lower power consumption for AI workloads. Weebit ReRAM is ideal for these architectures, with cost-efficiency, ultra-low power consumption, scaling advantages, analog behavior and ease of fabrication in the back end of the line (BEOL).
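As a rough illustration of how a crossbar performs this multiplication, here is a minimal idealized model in Python. This is an assumption-laden sketch: real arrays must contend with IR drop, device variability, and read noise, and the conductance values below are purely illustrative.

```python
import numpy as np

# Idealized model of a ReRAM crossbar performing a matrix-vector multiply.
# Each cell's conductance G[i][j] encodes a network weight; input activations
# are applied as word-line voltages, and by Ohm's and Kirchhoff's laws each
# bit-line current is the dot product of a weight column with the inputs.

def crossbar_mvm(G, V):
    """Bit-line currents I = G^T * V for conductance matrix G (rows x cols)."""
    G = np.asarray(G, dtype=float)
    V = np.asarray(V, dtype=float)
    return G.T @ V  # summation of cell currents along each bit line

G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])   # conductances in siemens (illustrative)
V = np.array([0.2, 0.1])       # word-line voltages in volts
I = crossbar_mvm(G, V)         # bit-line currents in amps
```

The key point is that the whole multiply-accumulate happens in one analog read step inside the array, rather than as many sequential fetch-and-multiply operations.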

Looking even further ahead, Weebit ReRAM is naturally suited for neuromorphic computing, which mimics how the human brain processes information. According to Yole Intelligence, the neuromorphic computing market is expected to grow to $412 million by 2029 and $5.4 billion by 2034. The research firm expects that analog IMC solutions including those with ReRAM will ramp up starting in 2027.

The Weebit ReRAM cell functions similarly to a synapse in the brain, making it a promising solution. ReRAM devices can act as artificial synapses, with analog conductance levels representing synaptic weights. This opens the door to energy-efficient, brain-like chips capable of real-time learning and adaptation.

 

Above: The evolution of NVM as an enabler for AI

 

Collaborations Drive Innovation

Our partnership with CEA-Leti and our collaborations with research institutes around the globe, including ongoing neuromorphic studies, position Weebit technology as a building block for future brain-inspired processors.

Weebit is also now a member of the EDGE AI FOUNDATION, bringing our low-power, high-performance ReRAM to a dynamic community focused on uniting industry leaders and researchers to drive innovation, solve global challenges, and democratize edge AI technologies. We will be actively contributing towards this mission.

Above: A brief introduction to the EDGE AI FOUNDATION

 

In addition, we are collaborating with industry leaders on the development of ultra-low-power neuromorphic processing solutions under the NeMo Consortium, a three-year development program funded by the Israeli Innovation Authority. NeMo brings together research groups from major industry R&D teams and leading academic researchers across Israel. Its goal is to develop a technology infrastructure enabling neuromorphic processing capabilities for various edge products, such as medical and security applications, with power consumption three orders of magnitude lower than the state of the art. The system will include dedicated hardware components, advanced AI software modules using spiking neural networks, and algorithms integrated with various sensors to enable ultra-low-power AI applications.

We are also working alongside a large group of companies under the NeAIxt project, which aims to solidify Europe’s position in edge AI and eNVM technology. The group is focused on enhancing AI enablers, evolving embedded NVM for edge applications, and demonstrating AI capabilities at both chip and system levels. The project will integrate advances in NVM technologies with cutting-edge MCU design to enable efficient in-memory computing. NeAIxt will address the entire edge AI value chain, from academia to industry, and from design to end-user applications, building on Europe’s strong technological foundation.

These are just a few of the areas where Weebit is pushing innovation in AI. You can read some of the latest papers in our Resources section.

 

The Road Ahead

From today’s embedded AI chips to tomorrow’s neuromorphic systems, Weebit is working to ensure that memory is no longer a bottleneck, but a driver of innovation. By making memory smarter, we are helping shape a new era of computing where intelligence is faster, more efficient, and available everywhere.

 

 

Enabling ‘Few-Shot Learning’ AI with ReRAM
https://www.weebit-nano.com/enabling-few-shot-learningai-with-reram/
Thu, 19 Jun 2025 08:24:25 +0000

The post Enabling ‘Few-Shot Learning’ AI with ReRAM appeared first on Weebit.


AI training happens in the cloud because it’s compute-intensive and highly parallel. It requires massive datasets, specialized hardware, and weeks of runtime. Inference, by contrast, is the deployment phase — smaller, faster, and often done at the edge, in real time. The cloud handles the heavy lifting; the edge delivers the result. Now, recent advances in resistive memory technology are making edge AI inferencing more energy-efficient, secure, and responsive.

At the 2025 IEEE Symposium on VLSI Technology and Circuits, researchers from CEA-Leti, Weebit Nano, and the Université Paris-Saclay presented a breakthrough in “on-chip customized learning” — demonstrating how a ReRAM-based platform can support few-shot learning using just five training updates.

Few-shot learning (FSL) is an approach where AI models learn new tasks with only a handful of examples. It is very useful for edge applications, where devices must adapt to specific users or environments and can’t rely on large, labeled datasets.

The team didn’t just train a model — they showed that a memory-embedded chip could adapt in real-time, at the edge, without requiring cloud access, long training cycles, or power-hungry hardware. The core enabler is a combination of Model-Agnostic Meta-Learning (MAML) and multi-level Resistive RAM (ReRAM or RRAM).

MAML provides a clever workaround that can enable learning in power-constrained edge devices. Instead of training from scratch, it trains a model to learn. During an off-chip phase, the system builds a general-purpose model by exposing it to many tasks. This “learned initialization” is then deployed to edge devices, where it can quickly adapt to new tasks with minimal effort.

This means:

  • No need for the cloud – minimizing bandwidth and latency
  • Minimal data required – minimizing compute requirements at the edge
  • Massive time and energy savings
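To make the “train a model to learn” idea concrete, here is a toy numpy sketch of the few-update adaptation phase. This is an illustration only, not the paper’s code: the linear task, learning rate, and the stand-in for a meta-learned initialization are all assumptions.

```python
import numpy as np

# Toy sketch of MAML-style few-shot adaptation: a deployed initialization w0
# is fine-tuned on a new task with only a few gradient updates, mirroring the
# five-update regime described above.

def adapt(w0, X, y, lr=0.1, steps=5):
    """Few-shot inner loop: gradient descent on squared error from init w0."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))             # a handful of examples for the new task
w_true = np.array([1.0, -2.0, 0.5])      # the new task's ground-truth weights
y = X @ w_true
w0 = np.zeros(3)                         # stand-in for a meta-learned init
w = adapt(w0, X, y)                      # adapted in just five updates
```

In real MAML the initialization itself is trained (off-chip, across many tasks) so that this inner loop converges quickly; only the cheap inner loop runs on the edge device.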

Executing this on edge hardware requires memory technology that can keep up — and that’s where ReRAM comes in.

Because ReRAM is a non-volatile memory that supports analog programming, it is ideal for low-power and in-memory compute architectures. ReRAM can store information as varying conductance states, which can then represent the weights (numerical values that represent the strength or importance of connections between neurons or nodes in a model) in neural networks.

However, ReRAM also comes with challenges — notably variability and some limits on write endurance. Few-shot learning helps overcome both.

 

Reducing Write Cycles with MAML

In terms of endurance, the key is in leveraging MAML, which enabled the research team to reduce the number of required write operations by orders of magnitude. Instead of millions of updates, they showed that just five updates — each consisting of a handful of conductance tweaks — were enough to adapt to a new task.

For the experiments, the team used a chip fabricated in 130nm CMOS with multi-level Weebit ReRAM integrated in the back end of line (BEOL). The network architecture had four fixed convolutional layers and two trainable fully-connected (FC) layers. Weights in the FC layers were encoded using pairs of ReRAM cells, storing each weight as the difference in conductance between them.

Training was carried out using a “computer-in-the-loop” setup, where the system calculated gradients and issued write commands directly to the ReRAM crossbars. In a full deployment, this would be managed by a co-integrated ASIC.

 

The learning task? Character recognition from the Omniglot dataset, a popular benchmark in FSL. The chip was pre-loaded with the MAML-trained parameters and fine-tuned on-device to recognize new characters using only five gradient updates.

The result:

  • Starting at 20% accuracy (random guess)
  • Reaching over 97% accuracy after five updates
  • Energy use of less than 10 μJ for a 2kbit array

For an optical character recognition (OCR) application using AI with a 2kbit array, energy consumption of less than 10 μJ represents excellent energy efficiency compared to typical industry benchmarks. This level of power consumption places such a system in the ultra-low-power category suitable for edge AI applications and battery-powered devices.

 

Programming Strategies to Mitigate Drift

In ReRAM, conductance levels can drift over time, and adjacent states may overlap, introducing noise. To tackle this, the team tested multiple programming strategies:

  • Single-shot Set: Simple, fast, but inaccurate
  • Iterative Set: More precise, but slower
  • Iterative Reset: Useful for low conductance states
  • Hybrid strategy: A blend of the iterative approaches, offering the best balance

The hybrid strategy proved most effective, reducing variability and improving long-term retention. After a 12-hour bake at 150°C (equivalent to 10 years at 75°C), the system still maintained over 90% of its accuracy.

This is critical for commercial deployment, where temperature fluctuations and data longevity are real-world concerns.

 

Looking Ahead

This research points to a compelling future for AI at the edge:

  • Learn locally: Devices can customize their behavior to individual users
  • Stay secure: No data needs to be sent to the cloud
  • Save time and energy: Minimal training and in-memory compute keep power low
  • Scale affordably: Meta-training can be centralized and shared across devices

And because the platform uses ReRAM, the entire system benefits from ultra-low standby power and reduced silicon area.

This work is more than a proof of concept; it’s a signpost. As more AI applications move to the edge, we’ll need memory technologies that support not just inference, but real learning. ReRAM is emerging as one of the few candidates that can deliver on that vision, especially when paired with smart algorithms like MAML.

View the presentation, “On Chip Customized Learning on Resistive Memory Technology for Secure Edge AI” from the 2025 IEEE Symposium on VLSI Technology and Circuits here.

 

Relaxation-Aware Programming in ReRAM: Evaluating and Optimizing Write Termination
https://www.weebit-nano.com/relaxation-aware-programming-in-reramevaluating-and-optimizing-write-termination/
Wed, 28 May 2025 12:39:59 +0000

The post Relaxation-Aware Programming in ReRAM: Evaluating and Optimizing Write Termination appeared first on Weebit.


Resistive RAM (ReRAM or RRAM) is the strongest candidate for next-generation non-volatile memory (NVM), combining fast switching speeds with low power consumption. New techniques for managing a memory phenomenon called ‘relaxation’ are making ReRAM more predictable — and easier to specify for real-world applications.

What is the relaxation problem in memory? Short-term conductance drift – known as ‘relaxation’ – presents a challenge for memory stability, especially in neuromorphic computing and multi-bit storage.

At the 2025 International Memory Workshop (IMW), a team from CEA-Leti, CEA-List and Weebit presented a poster session, “Relaxation-Aware Programming in RRAM: Evaluating and Optimizing Write Termination.” The team reported that Write Termination (WT), a widely used energy-saving technique, can make these relaxation effects worse.

So what can be done? Our team proposed a solution: a modest programming voltage overdrive that curbs drift without sacrificing the efficiency advantages of the WT technique.

 

Energy Savings Versus Stability

Write Termination improves programming efficiency by halting the SET (write) operation once the target current is reached, instead of using a fixed-duration pulse. This reduces both energy use and access times, supporting better endurance across ReRAM arrays.

It’s desirable, but problematic in practice.

Tests on a 128kb ReRAM macro showed that unmodified WT increases conductance drift by about 50% compared to constant-duration programming.

In these tests, temperature amplified the effect: at 125°C, the memory window narrowed by 76% under WT, compared to a fixed SET pulse. Even at room temperature, degradation reached 31%.

Such drift risks destabilizing systems that depend on tight resistance margins, including neuromorphic processors and multi-level cell (MLC) storage schemes, where minor shifts can translate into computation errors or data loss.

The experiments used a testchip fabricated on 130nm CMOS, integrating the ReRAM array with a RISC-V subsystem for fine-grained programming control and data capture.

Conductance relaxation was tracked from microseconds to over 10,000 seconds post-programming. A high-speed embedded SRAM buffered short-term readouts, allowing detailed monitoring from 1µs to 1 second, while longer-term behavior was captured with staggered reads.

This statistically robust setup enabled precise analysis of both early and late-stage relaxation dynamics.

To measure stability, the researchers used a metric called the three-sigma memory window (MW₃σ). It looks at how tightly the memory cells hold their high and low resistance states, while ignoring extreme outliers.

When this window gets narrower, the difference between a “0” and a “1” becomes harder to detect — making it easier for errors to creep in during reads.

By focusing on MW₃σ, the team wasn’t just looking at averages — they were measuring how reliably the memory performs under real-world conditions, where even small variations can cause problems.
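Under the assumption that MW₃σ is taken as the gap between the 3-sigma tails of the two state distributions (the paper’s exact definition may differ), it can be sketched like this:

```python
import numpy as np

# Sketch of a three-sigma memory window (MW_3sigma) computation: the gap
# between the weakest-programmed low-resistance-state (high-conductance)
# cells and the leakiest high-resistance-state (low-conductance) cells.
# Distributions below are synthetic, for illustration only.

def mw_3sigma(g_lrs, g_hrs):
    """Worst-case separation between the two state distributions."""
    lo = np.mean(g_lrs) - 3 * np.std(g_lrs)   # 3-sigma tail of the LRS
    hi = np.mean(g_hrs) + 3 * np.std(g_hrs)   # 3-sigma tail of the HRS
    return lo - hi                             # negative => states overlap

rng = np.random.default_rng(1)
g_lrs = rng.normal(100e-6, 5e-6, 10_000)   # LRS conductances (S), synthetic
g_hrs = rng.normal(10e-6, 3e-6, 10_000)    # HRS conductances (S), synthetic
window = mw_3sigma(g_lrs, g_hrs)           # positive: reliable read margin
```

Relaxation narrows this window from both sides, which is why tracking MW₃σ over time, rather than the mean separation, exposes the reliability problem.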

 

Addressing Relaxation with Voltage Overdrive

Voltage overdrive is the practice of applying a slightly higher voltage than the minimum required to trigger a specific operation in a memory cell — in this case, the SET operation in ReRAM.

Write Termination cuts the SET pulse short as soon as the target current is reached. That saves energy, but it also means some memory cells are just barely SET. They’re fragile — sitting near the edge of their intended resistance range. That’s where relaxation drift kicks in: over time, conductance slips back toward its original state.

So, the team asked a logical question:

“What if we give the cell just a bit more voltage — enough to push it more firmly into its new state, but not so much that we burn energy or damage endurance?”

Instead of discarding WT, the team increased the SET voltage by 0.2 Arbitrary Units (AU) above the minimum requirement.
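The combined scheme can be sketched as a simple control loop. The cell model below is a toy stand-in and every constant except the 0.2 AU overdrive figure is an illustrative assumption:

```python
# Sketch of SET programming with write termination (WT) plus a small voltage
# overdrive: apply short pulses at v_min + overdrive and stop as soon as the
# read-back current reaches the target, so energy scales with pulses used.

def set_with_wt(cell_current, v_min, i_target, overdrive=0.2, max_pulses=50):
    """Return the number of pulses applied before termination.
    cell_current(v, n) gives the cell's read current after n pulses at v."""
    v = v_min + overdrive          # push cells firmly past the threshold
    for n in range(1, max_pulses + 1):
        if cell_current(v, n) >= i_target:
            return n               # terminate early; cell is solidly SET
    return max_pulses              # fallback: behave like a fixed-length pulse

def toy_cell(v, n):
    """Toy device: current grows with pulse count, faster at higher voltage."""
    return 1e-6 * n * v

pulses = set_with_wt(toy_cell, v_min=1.0, i_target=5e-6)
```

The point of the overdrive is visible in the loop: termination still happens as soon as the target is hit, but each cell lands further inside its intended conductance range, so it has less room to relax back out of it.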

Key results:

  • Relaxation dropped to levels comparable to constant-duration programming
  • Memory windows remained stable at both room and elevated temperatures
  • WT’s energy efficiency was mostly preserved, with only a ~20% increase in energy compared to unmodified WT

Modeling predicted that without overdrive, 50% of the array would show significant drift within a day. With overdrive, the same drift level would take more than 10 years, a timescale sufficient for most embedded and computing applications.

 

Balancing Energy and Stability

The modest voltage increases restored conductance stability without negating WT’s energy and speed benefits. Although the overdrive added some energy overhead, overall consumption remained lower than that of fixed-duration programming.

This adjustment offers a practical balance between robustness and efficiency, critical for commercial deployment.

 

As ReRAM moves toward wider adoption as a prime candidate for neuromorphic and multi-bit storage applications, conductance drift will become a defining challenge.

The results presented at IMW 2025 show that simple device-level optimizations like voltage overdrive can deliver major gains without requiring disruptive architectural changes.

Check out more details of the research here.

 

ReRAM-Powered Edge AI: A Game-Changer for Energy Efficiency, Cost, and Security
https://www.weebit-nano.com/reram-powered-edge-aia-game-changer-for-energy-efficiency-cost-and-security/
Thu, 27 Mar 2025 10:58:07 +0000

The post ReRAM-Powered Edge AI: A Game-Changer for Energy Efficiency, Cost, and Security appeared first on Weebit.


 

In AI inference, trained models apply their knowledge to make predictions and decisions. To achieve lower latency and better security, the world is transitioning steadily towards performing AI inference at the edge – without sending data back and forth to the cloud – for a wide range of applications.

Because edge devices are often small, battery-powered, and resource-constrained, it’s important that the computing resources enabling this process and the associated memories are ultra-low-power and low-cost. This is a challenge for AI workloads, which are known to be power-hungry.

The industry has been making progress towards lower power computation largely by moving to more advanced process nodes. This enables more performance with greater energy efficiency in smaller silicon area. However, non-volatile memories (NVMs) haven’t been able to scale to advanced nodes along with logic. Today we see advanced chips in process nodes of 3nm. At the same time, embedded flash memory is unable to scale below 28nm. This means that NVM and AI engines are often manufactured at very different process nodes and can’t be integrated on the same silicon die.

This is one of many reasons why the industry is exploring new memory technologies like Weebit ReRAM (RRAM).

 

The need for a single-die solution

Neural network coefficients (often referred to as NN weights), which are used for computations by the inference engine, must be stored in an NVM so that the coefficients are available for compute workloads when the system is powered on. Because it’s not possible to integrate flash and an AI engine on one die below 28nm, it is standard practice to implement a two-die solution: one die at a small process node for computing, and another at a larger process node for storing the coefficients. These two dies are then integrated either in a single package or in two separate packages.

Either way, such a two-die solution is more expensive and has a bigger footprint. Copying the coefficients from external flash to on-chip SRAM in the AI chip is also power-hungry and adds latency. In addition, moving the coefficients from one chip to the other creates a security risk, as it is easy to eavesdrop on this communication.
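The cost of that copy step can be sketched with back-of-the-envelope numbers. Every figure below is an illustrative assumption, not measured data:

```python
# Rough model of boot-time weight copy from external flash over a serial link,
# versus reading weights in place from on-die NVM. All constants are assumed.

MODEL_BYTES = 4 * 1024 * 1024        # 4 MB of NN weights (assumed model size)
SPI_BYTES_PER_S = 50e6 / 8           # ~50 Mbit/s effective link rate (assumed)
PJ_PER_BYTE_OFFCHIP = 1000           # off-chip transfer energy/byte (assumed)
PJ_PER_BYTE_ONCHIP = 10              # on-die NVM read energy/byte (assumed)

boot_delay_s = MODEL_BYTES / SPI_BYTES_PER_S
copy_energy_j = MODEL_BYTES * PJ_PER_BYTE_OFFCHIP * 1e-12
onchip_energy_j = MODEL_BYTES * PJ_PER_BYTE_ONCHIP * 1e-12
```

Under these assumptions the copy alone takes on the order of two-thirds of a second and costs about 100x the energy of reading the same weights in place, before counting the SRAM that must hold them afterwards.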

The ideal solution for edge AI computing from power, latency, cost and security perspectives is a single die that hosts both memory and compute.

 

A scalable, single-chip solution with ReRAM

Embedded ReRAM is the logical alternative to flash for edge AI. ReRAM is significantly more energy efficient than flash, and it provides better endurance and faster program time. Since it is scalable to advanced processes, ReRAM enables a true one-chip solution, with NVM and computing integrated on the same die.

ReRAM-enabled SoCs are less expensive to manufacture because they only require two additional masks in the manufacturing flow, while flash requires 10 or more. Embedding ReRAM into an AI SoC would eliminate the need for off-chip flash devices and replace most of the large on-chip SRAM used to temporarily store the NN weights. Since the technology is non-volatile, the system can boot much faster, as there is no need to wait to load the AI model and firmware from external NVM, and the security risk is removed. ReRAM is also much denser than SRAM, so more memory can be integrated on-chip to support larger neural networks for the same die size and cost, while enabling more advanced AI algorithms.

New Demo: ReRAM for ultra-low-power edge AI

A new demonstration showcases the advantages of Weebit ReRAM-powered edge AI computing. Developed through a collaboration between Weebit and Embedded AI Systems Pte. Ltd. (EMASS), a subsidiary of Nanoveu, the gesture recognition demo shows Weebit ReRAM working with EMASS’s energy-efficient AI SoC, the EMASS ECS-DOT. The demo emphasizes the ultra-low-power consumption of ReRAM and its ability to enable instant wake-up AI operations. In the real world, such a system could be used to detect driver activity for advanced driver safety systems, or it could be used for safety/surveillance, robotics, and many other applications.

ECS-DOT is an edge AI chip manufactured in a 22nm process that delivers significant energy efficiency and cost advantages, with best-in-class AI capacity. In the demo, ECS-DOT loads the neural network weights from the Weebit ReRAM where they are stored. As noted earlier, this is a powerful feature of ReRAM – it can replace the large on-chip SRAM used to store the NN weights, as well as the CPU firmware.

Weebit ReRAM isn’t yet integrated into the ECS-DOT SoC, so the proof-of-concept demo shows a two-chip solution with the 22nm Weebit demo chip communicating with the EMASS chip over an SPI bus. In an end solution, the ReRAM would be integrated on-chip, eliminating latency, cost and security risks, and demonstrating even lower power consumption. Such integration can enhance system performance and also ensure scalability and sustainability, paving the way for smarter, more autonomous edge devices.

Above: ultra-low-power ReRAM-based gesture recognition system with Weebit ReRAM and EMASS AI SoC

 

EMASS recently made a strategic pivot away from MRAM technology and is embracing ReRAM. The company says that ReRAM is better able to support next-generation systems in IoT, automotive, and consumer electronics.

 

Looking Ahead

Research is now underway to bring memory and compute resources even closer together through analog in-memory compute. In this paradigm, compute resources and memory reside in the same location, so there is no need to ever move the coefficients. Such a solution using ReRAM will be orders of magnitude more power-efficient than today’s neural network simulations on traditional processors.

You can see our new demo video here:

 

Innovative Memory Architectures for AI
https://www.weebit-nano.com/innovative-memory-architectures-for-ai/
Tue, 09 Jul 2024 06:45:07 +0000

The post Innovative Memory Architectures for AI appeared first on Weebit.


 

One of the biggest trends in the industry today is the shift towards AI computing at the edge. For many years, the expectation was that huge cloud datacenters would perform all the AI tasks, while edge devices would only collect the raw data and send it to the cloud, potentially receiving directives back after the analysis was done.

 

More recently, however, it has become increasingly evident that this can’t work. While the learning task is a strong fit for the cloud, performing inference in the cloud is far less optimal.

With the promise of lower latency, lower power and better security, we are seeing AI inference in a growing number of edge applications, from IoT and smart home devices all the way up to critical applications like automotive, medical, and aerospace and defense.

 

Since edge devices are often small, battery-powered, and resource-constrained, edge AI computing resources must be low-power, high-performance, and low-cost. This is a challenge considering power-hungry AI workloads, which must rely on the storage of large amounts of data in memory and the ability to quickly access it. Some models have millions of parameters (e.g., weights and biases), which must be continually read from memory for processing. This creates a fundamental challenge in terms of power consumption and latency in computing hardware.
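The scale of that challenge is easy to see with rough arithmetic. The model size, quantization, and frame rate below are illustrative assumptions:

```python
# Rough arithmetic for why weight traffic dominates in edge inference:
# a model whose parameters are all read once per inference, run continuously.

PARAMS = 5_000_000           # parameter count (assumed mid-size edge model)
BYTES_PER_PARAM = 1          # int8-quantized weights (assumed)
INFERENCES_PER_S = 30        # e.g. per-frame vision inference (assumed)

bytes_per_s = PARAMS * BYTES_PER_PARAM * INFERENCES_PER_S   # weight traffic
```

Under these assumptions the memory system must sustain 150 MB/s of weight reads alone; doing that over an off-chip interface costs far more energy and latency than reading the same weights from an adjacent on-die array.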

 

Data movement is a key contributor to power consumption. Within chips, significant power is consumed while accessing the memory arrays in which the data is stored and while transferring the data over the on-chip interconnect. The memory access and speed of the interconnect also contribute to latency, which limits the speed of the AI computation. Speed and power both get significantly worse when the data needs to be moved between two separate chips.

To keep edge computing resources low-power and low-latency, hardware must be designed so that memory is as close as possible to the computing resources.

 

The continuous move to smaller process geometries has helped to keep power consumption to a minimum and has also reduced latency for AI tasks. But while computing resources continually scale to more advanced nodes, Flash memory hasn’t been able to keep pace. Because of this, it isn’t possible to integrate Flash and an AI inference engine in a single SoC at 28nm and below for edge AI.

 

Today it’s standard practice to implement a two-die solution, where one die at an advanced process node is used for computing, and another die at a more mature process node is used for memory. The two dies are then integrated in a single package or in two separate packages. A two-die solution is detrimental to AI performance because memory resides far away from compute, creating high levels of power consumption, latency and total system costs.

 

The ideal solution is a single die that hosts memory and compute, and embedded ReRAM (or RRAM) is the logical NVM to use. Embedding ReRAM into an AI SoC would replace off-chip flash devices, and it can also be used to replace the large on-chip SRAM to store the AI weights and CPU firmware. Because ReRAM is non-volatile, there is no need to wait at boot time to load the AI model from external NVM.

 

Such ReRAM-based chips use less power and have lower latency than two-chip solutions. And, with on-die integration, the path to memory can be made much wider, since the memory interface is no longer limited by the number of pins on an external memory device. The result is faster access time, faster inference, and the potential for true real-time AI computing at the edge.

 

ReRAM is also much denser than SRAM, which makes it less expensive per bit, so more memory can be integrated on-chip to support larger neural networks for the same die size and cost. While on-chip SRAM will still be used for data storage, the array will be smaller and the total solution more cost-effective.

 

Finally, ReRAM-enabled chipsets are cheaper to manufacture, since they only require the fabrication of one die. This makes edge computing more affordable and consequently more accessible for a large array of applications.

Above: how the various memory technologies compare

 

You can see the slides on this topic that I presented at the Design Automation Conference (DAC) 2024 here.

A Complete No-Brainer: ReRAM for Neuromorphic Computing
https://www.weebit-nano.com/a-complete-no-brainerreram-for-neuromorphic-computing/
Wed, 05 Jun 2024 07:05:54 +0000

The post A Complete No-Brainer: ReRAM for Neuromorphic Computing appeared first on Weebit.


In the last 60 years, technology has evolved at such an exponentially fast rate that we are now regularly conversing with AI-based chatbots, and that same OpenAI technology has been put into a humanoid robot. It’s truly amazing to see this rapid development.


Above: OpenAI technology in a humanoid robot


The continued advancement of AI faces numerous challenges. One of these is computing architecture. Since it was first described in 1945, the von Neumann architecture has been the foundation of most computing. In this architecture, instructions and data are stored together in memory and travel to the CPU over a shared bus. This has enabled many decades of continuous technological advancement.

However, such an architecture creates bottlenecks in terms of bandwidth, latency, power consumption, and security, to name a few. For continued AI development, we can't just make brute-force adjustments to this architecture. What's needed is an evolution to a new computing paradigm that bypasses the bottlenecks inherent in the traditional von Neumann architecture and more precisely mimics the system it is trying to imitate: the human brain.

To achieve this, memory must be closer to the compute engine for better efficiency and power consumption. Even better, computation should be done directly within the memory itself. This paradigm change requires new technology, and ReRAM (or RRAM) is among the most promising candidates for future in-memory computing architectures.


Roadmap for ReRAM in AI

Given its long list of advantages, ReRAM can be used in a broad range of applications, from mixed-signal and power management to IoT, automotive, industrial, and many other areas. For AI-related applications, the most relevant advantages of ReRAM include its cost efficiency, ultra-low power consumption, scaling capability, small footprint, and fit within a long-term roadmap to advanced neuromorphic computing. We generally see ReRAM rolling out in AI applications in different ways over time.

The shortest-term opportunity for ReRAM is as an embedded memory (10-100 Mb) for edge AI applications. The idea is to bring the NVM closer to the compute engine, thereby massively reducing power consumption. This opportunity can be realized today by using ReRAM for synaptic weight storage, replacing external flash and eliminating some of the local SRAM or DRAM. My colleague Gideon Intrater will present on this topic on Monday, June 24th at the Design Automation Conference 2024. If you are planning to attend, don't miss his presentation in the session, ‘Cherished Memories – Exploring the Power of Innovative Memory Architectures for AI applications.’

In the mid-term, ReRAM is a great candidate for in-memory computing where analog behavior is required. In this methodology, ReRAM is used for both computation and weight storage – at first in binary operation (one of two values per cell) and later moving to multi-level operation (multiple values per cell). An example of in-memory computing was proposed in 2022 using arrays based on Weebit ReRAM as Content Addressable Memories. This work, done in collaboration with the Department of Electrical Engineering at the Indian Institute of Technology Delhi, is highlighted in the article, ‘In-Memory Computing for AI Similarity Search using Weebit ReRAM,’ by Amir Regev.

My colleague Amir Regev also recently wrote an article, ‘Towards Processing In-Memory,’ which explains more about the idea of in-memory computing with Weebit ReRAM, based on work done with the Department of Electrical Engineering at the Technion Israel Institute of Technology and CEA-Leti.

Above: A roadmap for ReRAM in AI – short-term, mid-term and long-term


In the longer term, neuromorphic computing comes into play. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Likewise, ReRAM arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

Areas of particular interest include Bayesian architectures and meta learning. Bayesian neural networks hold great potential for the development of AI, particularly where decision-making under uncertainty is critical. These networks actually quantify uncertainty, so such methods can help AI models avoid overconfidence in their predictions, potentially leading to more reliable, safer AI systems. The characteristics of ReRAM make it an ideal solution for these networks.

The aim of meta learning is to create models that can generalize well to new tasks by leveraging prior experience. As they ‘learn to learn,’ they continuously update their beliefs based on new data without needing to re-train from scratch, making them more adaptable and flexible than today’s methods. The idea is to develop a standalone system capable of learning, adapting and acting locally at the edge. A model would be trained on a server and then optimized parameters would be saved on the chip at the edge. The edge system would then be able to learn new tasks by itself – like humans and other animals.

Compared to current machine learning, where models are trained for specific tasks with a fixed algorithm on a huge dataset, this concept has numerous advantages, including:

  • Data is stored locally on the chip rather than in the cloud, providing greater security, much faster reaction times and lower power consumption
  • Computation is done in-situ, so there is no need to transfer data from memory to a separate computation unit
  • The system could adapt to very different real-world situations, since it would imitate the human ability to learn

A recent joint paper from Politecnico di Milano, Weebit and CEA-Leti proposed a bio-inspired neural network capable of learning using Weebit ReRAM. The focus is on building a bio-inspired system that requires hardware with plasticity, in other words the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. You can read about this work in an article by Alessandro Bricalli, ‘AI Reinforcement Learning with Weebit ReRAM.’

This is the future of ReRAM in AI, and I can’t wait!


Overcoming hurdles

Like all memory technologies, ReRAM has both pros and cons for neuromorphic applications. The ‘pros’ include its non-volatility, ability to scale to smaller nodes, low power consumption and capability for multi-level operation.

The ‘cons’ are largely due to phenomena such as the limited precision of conductance programming. ReRAM technologies are also subject to some resistance drift while cycling. Other phenomena, such as relaxation (linked to both time and temperature), can impact resistance values over time.

As we look towards using ReRAM for neuromorphic computing, we won’t let such resistance variability hold us back. There are not only ways to mitigate such factors, but also ways in which these ‘cons’ can be taken advantage of in certain neuromorphic bio-inspired circuits.


Mitigating resistance variability

One of the main ways we can mitigate resistance variability is by using Program and Verify (P&V) algorithms. The idea is quite simple: whenever a cell fails to satisfy a given criterion, we reprogram it and then re-verify its resistance state. Such methods allow us to fine-tune resistance levels within a given range, attaining more than just the two levels of the low-resistance state (LRS) and high-resistance state (HRS).

We can do this in multiple ways. One way is to use a gradual method, in which we repeat the same operation over and over until a cell satisfies the condition imposed (or the maximum number of allowed repetitions has been completed). This method can be incremental, in which case the programming control parameter increases at each repetition, or cumulative, in which case the parameter is kept constant each time.

There are numerous knobs we can control, including the programming direction and level of the control parameter. The total number of P&V cycles, as well as what happens before the verify itself, can vary depending on the goal we want to achieve – whether it’s improving retention, resilience or endurance, or achieving other goals.

The Ielmini Group at the Politecnico di Milano has proposed numerous state-of-the-art algorithms which can help with further tuning. One of these is called ISPVA, in which the gate voltage of the transistor is kept constant, therefore fixing the compliance current, while the top electrode voltage is increased until the desired conductance is attained. Conversely, in the IGVVA approach, the top electrode voltage is kept constant (high enough to grant a successful set operation), while the gate voltage is increased to gradually increase the compliance current.
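To make the loop concrete, here is a toy sketch of an ISPVA-style program-and-verify flow in Python. The cell model, voltage range, step size, tolerance and conductance values are all illustrative assumptions for this sketch, not Weebit parameters or figures from the published algorithms:

```python
import random

class ToyCell:
    """Toy ReRAM cell model (illustrative only): conductance rises noisily
    with each set pulse, more for stronger pulses."""
    def __init__(self):
        self.g = 10e-6  # initial conductance in siemens (hypothetical)

    def apply_set_pulse(self, v_top_electrode):
        # Hypothetical device response with +/-20% cycle-to-cycle variability
        self.g += v_top_electrode * 5e-6 * (1 + random.uniform(-0.2, 0.2))

    def read_conductance(self):
        return self.g

def ispva_program(cell, target_g, tolerance=0.10, v_start=1.0, v_step=0.05, v_max=2.5):
    """ISPVA sketch: the gate voltage (and thus the compliance current) is held
    constant, while the top-electrode voltage is increased at each repetition
    until the verify step finds the conductance inside the target band."""
    v_te = v_start
    while v_te <= v_max:
        cell.apply_set_pulse(v_top_electrode=v_te)   # program
        g = cell.read_conductance()                  # verify
        if abs(g - target_g) / target_g <= tolerance:
            return True                              # target band reached
        v_te += v_step                               # incremental: raise the voltage and retry
    return False                                     # flag the cell after the maximum ramp
```

An IGVVA-style loop would have the same shape but swap the knobs: the top-electrode voltage stays fixed while the gate voltage is ramped to gradually increase the compliance current.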

Variability of the programmed levels is a key parameter in in-memory computing and in hardware implementations of deep neural networks. Therefore, it's important to use algorithms that not only achieve the right level of electrical conductance but also ensure this conductance is consistent across multiple attempts. There are many other P&V algorithms we can employ, for example to reach a more stable conductive filament, reduce post-programming fluctuations, or achieve other goals.

It’s important to note that P&V algorithms are not the only tools available to mitigate ReRAM variability. For instance, pulse shape can play an important role in reducing variability and therefore improving neural network accuracy. Some industry work has shown that, compared to regular square pulses, triangular pulses reduce the number of oxygen vacancies after the set operation, thereby improving conductive filament stability. Triangular pulses have also been shown to improve the resistance state after the reset operation.

Above: Triangular pulse shape reduces the Vo after set operation, therefore improving conductive filament stability (Y. Feng et al., EDL 2021)


Taking advantage of ReRAM’s ‘cons’ for neuromorphic computing

In a neural network, we would like synapses to have a linear and symmetric response, a large number of analog states, a high on/off ratio, high endurance and no variability. ReRAM has intrinsic variabilities, and we can at least partly mitigate such non-idealities. For neural networks, we can also use them to our advantage!

One example is a Bayesian neural network, where device variability is actually key to the implementation: the natural differences from one device to another are crucial to how it works. For instance, the differences in how a memory cell conducts electricity with each use can actually help by providing randomness, which is useful for generating random numbers or for AI algorithms that need randomness, like Bayesian reasoning.

In Bayesian methods, you don’t just get one answer from a given input; instead, you get a distribution of possible answers. The natural variation in ReRAM can be used to create this distribution. This variation is like having physical random numbers that can help perform calculations directly within the memory. This makes it possible to do complex multiplications right where the data is stored. In addition, Bayesian neural networks are resilient to device-to-device variability and system aging.
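As a rough illustration of this idea, the sketch below treats each stored weight as a distribution rather than a point value, with the spread standing in for read-to-read conductance variation. The Gaussian noise model and all numbers are assumptions for illustration, not measured device behavior:

```python
import math
import random

def noisy_read(mean_g, sigma_g):
    """Each read of a cell returns a slightly different conductance; the
    device variation is modeled here as Gaussian noise (a toy stand-in)."""
    return random.gauss(mean_g, sigma_g)

def bayesian_forward(x, weight_means, weight_sigmas, n_samples=200):
    """Repeat the same dot-product many times: the physical randomness of
    the reads turns a single answer into a distribution of answers, which
    is exactly what Bayesian methods need."""
    outputs = []
    for _ in range(n_samples):
        w = [noisy_read(m, s) for m, s in zip(weight_means, weight_sigmas)]
        outputs.append(sum(xi * wi for xi, wi in zip(x, w)))
    mean = sum(outputs) / n_samples
    std = math.sqrt(sum((o - mean) ** 2 for o in outputs) / n_samples)
    return mean, std  # a prediction plus an uncertainty estimate
```

The width of the returned distribution is the model's confidence: a small spread means a reliable prediction, a large spread flags uncertainty.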


Summary

ReRAM is a good match for neuromorphic applications due to its cost-efficiency, ultra-low power consumption, scaling advantage at 28nm and below, small footprint to store very large arrays, analog behavior and ease of fabrication in the back end of the line. The conductance of ReRAM can also be easily modulated by controlling a few electrical parameters.

We can mitigate the ‘cons’ of ReRAM to make it shine in edge AI and in-memory computing applications in the short- and mid-term, respectively. In the long term, the similarity of ReRAM cells to synapses in the brain makes ReRAM a great fit for neuromorphic computing. As we look to new applications such as Bayesian neural networks, the ‘cons’ of ReRAM can not only be mitigated, but can even provide advantages.

I recently presented a tutorial at the International Memory Workshop in Seoul, during which I discussed the requirements of new neuromorphic circuits, why ReRAM is an ideal fit for such applications, existing challenges and possible solutions to improve ReRAM-based neural networks.
Please click here to view the presentation.


Towards Processing In-Memory https://www.weebit-nano.com/towards-processing-in-memory/ Thu, 14 Dec 2023 13:01:29 +0000 https://www.weebit-nano.com/?p=14398 One of the most exciting things about the future of computing is the ability to process data inside of the memory. This is especially true since the industry has reached the end of Moore’s Law, and scientists and engineers are focused on finding efficient new architectures to overcome the limitations of modern computing systems. Recent […]

The post Towards Processing In-Memory appeared first on Weebit.


One of the most exciting prospects for the future of computing is the ability to process data inside the memory. This is especially true now that the industry has reached the end of Moore’s Law, and scientists and engineers are focused on finding efficient new architectures to overcome the limitations of modern computing systems. Recent advancements in areas such as generative AI are adding even greater pressure to find such solutions.

Most modern computing systems are based on the von Neumann computing architecture. A bottleneck arises in such systems due to the separation of the processing unit and the memory. In the traditional von Neumann architecture, 95% of the energy is consumed by the need to transfer data back and forth between the processing unit and the memory. In systems that need fast response, low latency and high bandwidth, designers are moving the memory closer to the CPU so that data doesn’t need to travel as far. Even better is to do the processing within the memory so the data doesn’t need to travel at all. When computing in memory, logic operations are performed directly in the memory without costly data transfers between the memory and a separate processing unit. Such an architecture promises energy efficiency and the potential to overcome the von Neumann bottleneck.

Computing in-memory can be realized using non-volatile devices, with resistive random access memory (ReRAM or RRAM) as an outstanding candidate due to its various advantages in power consumption, speed, durability, and compatibility for 3D integration.

Above: the evolution of compute towards processing in memory


There are various approaches to processing in memory with ReRAM.

One approach for ReRAM-based computing is stateful logic. In this technique, memory cells are used to perform the logic operations without moving any data outside the memory array. The logical states of inputs and outputs are represented as the resistance states of the memristor devices, with logical ’0’ as a High Resistance State (HRS) and logical ’1’ as a Low Resistance State (LRS).

While promising, stateful logic techniques have yet to be demonstrated for large-scale crossbar array implementation. In addition, stateful logic is incompatible with CMOS logic and is limited by a device’s endurance.

Another approach is non-stateful logic. A non-stateful operation does not rely on maintaining or remembering the state of previous operations or data. The in-memory logic processes data independently of any historical context, performing computations and making decisions quickly for applications such as real-time data processing.

In non-stateful logic, different electrical variables represent the inputs and outputs. For example, the inputs are voltages, and the output is the resistance state of the memristor. Non-stateful logic combines the advantages of computing in-memory with CMOS compatibility. Memristive non-stateful logic techniques can be integrated into a 1T1R memory array, in a similar way to commercial ReRAM products, using a technology like Weebit ReRAM, which is built in a 1T1R configuration where every memory cell has a transistor and a memristive device.

In a new paper by engineers and scientists from Weebit, CEA-Leti and The Technion, “Experimental Demonstration of Non-Stateful In-Memory Logic with 1T1R OxRAM Valence Change Mechanism Memristors,” Weebit ReRAM devices were used to demonstrate two non-stateful logic PIM techniques: Boolean logic with 1T1R and Scouting logic.

The team experimentally demonstrated various logical functions (such as AND, OR and XOR) of the two techniques using Weebit ReRAM to explore their possibilities for various applications. The experiments showed successful operations of both logic types, and correct functionality of the Weebit ReRAM in all cases.

The 1T1R logic technique exhibited notable advantages due to its simple design, employing only a single memristor. Scouting logic demonstrated significant potential, as it employs a low voltage and no switching during logical operations, promising reduced power consumption and a prolonged device lifespan.
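The principle behind scouting logic can be pictured with a toy numeric model: the cells keep their stored states, a small read voltage is applied to the selected cells together, and the summed current is compared against reference thresholds. The conductance values and thresholds below are illustrative assumptions, not figures from the paper:

```python
V_READ = 0.2                   # low read voltage: no cell switches state
G_LRS, G_HRS = 100e-6, 1e-6    # hypothetical conductances for '1' and '0'

def scouting_read(bits):
    """Activate the selected cells together and sum their read currents."""
    return sum(V_READ * (G_LRS if b else G_HRS) for b in bits)

def scouting_or(a, b):
    # Any stored '1' pulls the summed current above half an LRS read
    return scouting_read([a, b]) > 0.5 * V_READ * G_LRS

def scouting_and(a, b):
    # Only two stored '1's push the current above ~1.5x an LRS read
    return scouting_read([a, b]) > 1.5 * V_READ * G_LRS

def scouting_xor(a, b):
    # XOR: the current falls between the two references (exactly one LRS cell)
    i = scouting_read([a, b])
    return 0.5 * V_READ * G_LRS < i < 1.5 * V_READ * G_LRS
```

Because the operation is only a read against thresholds, no set or reset pulse is applied, which is why this scheme promises lower power and longer device lifespan.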

Above: Figure 6 from the paper showing the connection of two cells in parallel in an (a) 1T1R standard array and a (b) pseudo-crossbar array


Through additional research and development, the opportunities of this technology will be further explored, ultimately leading to greater efficiency in time and energy. Read the entire paper (with an IEEE subscription) here.


AI Reinforcement Learningwith Weebit ReRAM https://www.weebit-nano.com/ai-reinforcement-learningwith-weebit-reram/ Mon, 05 Jun 2023 07:00:20 +0000 https://www.weebit-nano.com/?p=13764 A paper from Weebit and our partners at CEA-Leti and the Nano-Electronic Device Lab (NEDL) at Politecnico di Milano was recently published in the prestigious journal Nature Communications. It details how bio-inspired systems can learn using ReRAM (RRAM) technology in a way that is much closer to how our own brains learn to solve problems […]

The post AI Reinforcement Learning<br>with Weebit ReRAM appeared first on Weebit.


A paper from Weebit and our partners at CEA-Leti and the Nano-Electronic Device Lab (NEDL) at Politecnico di Milano was recently published in the prestigious journal Nature Communications. It details how bio-inspired systems can learn using ReRAM (RRAM) technology in a way that is much closer to how our own brains learn to solve problems compared to traditional deep learning techniques.

The teams demonstrated this by implementing a bio-inspired neural network using ReRAM arrays in conjunction with an FPGA system and testing whether the network could learn from its experiences and adapt to its environment. The experiments showed that our in-memory hardware not only does this better than conventional deep learning techniques, but also has the potential to achieve a significant boost in speed and power savings.

Learning by experience

Humans and other animals continuously interact with each other and the surrounding environment to refine their behavior towards the best possible reward. Through a continuous stream of trial-and-error events, we are constantly evolving, learning, improving the efficiency of routine tasks and increasing our resilience in daily life.

The acquisition of experience-based knowledge is an interdisciplinary subject of biology, computer science and neuroscience known as “reinforcement learning,” and it is at the heart of a major objective of the AI community: to build machines that can learn by experience. The goal is machines that can infer concepts and make autonomous decisions in the context of constantly evolving situations.

In reinforcement learning, an agent (the neural network) interacts with its environment and receives feedback based on that interaction in the form of penalties or rewards. Through this feedback, it learns from its experiences and constructs a set of rules that will enable it to reach the best possible outcomes.

In developing such resilient bio-inspired systems, what’s needed is hardware with plasticity, i.e., the ability to adjust its state based on specific inputs and rules, as in the case of biological synapses. The lack of such commercial hardware is one of the current main limitations in implementing systems capable of learning from experience in an efficient way.

NVMs for in-memory computing

Researchers are now looking at non-volatile memories (NVMs) like ReRAM to enable hardware plasticity for neuromorphic computing. ReRAM is particularly well-suited for use in hardware capable of plastic adaptation, as its conductance can be easily modulated by controlling a few electrical parameters. We’ve talked about this previously in several papers and a recent demonstration.

When voltage pulses are applied, the conductance of ReRAM can be increased or decreased by set and reset processes. This is how ReRAM stores information. In the brain, synapses provide the connections between neurons, and they can change their strength and connectivity over time in response to patterns of neural activity. Because of this similarity, ReRAM (RRAM) arrays can be used to create artificial synapses in a neural network which change their strength and connectivity over time in response to patterns of input. This allows them to learn and adapt to new information, just like biological synapses.

In addition to their ability to mimic the plasticity of biological synapses, memristors like ReRAM have several other advantages for these systems. ReRAM is small, low-power, and can be fabricated using standard semiconductor manufacturing techniques in the backend-of-the-line (BEOL), making it easy to integrate into electronic systems.

Power and bandwidth

Deep learning is extremely computationally intensive, involving large numbers of computations which can be very power-hungry, particularly when training large models on large datasets. A great deal of power is also consumed through the high number of iterative optimizations needed to adjust the weights of the network.

Deep learning models also require a lot of memory to store the weights and activations of the neurons in the network, and since they rely on traditional computing architectures, they are impacted by communication delays between the processing unit and the memory elements. This can be a bottleneck that not only slows down computations but also consumes a lot of power.

In the brain, there are no such bottlenecks. Processing and storage are inextricably intertwined, leading to fast and efficient learning. This is where in-memory computing with ReRAM can make a huge difference for neural networks. With ReRAM, fast computation can be done in-situ, with computing and storage in the same place.

The maze runner

While memristor-based networks are not always as accurate as standard deep learning approaches, they are very well-suited to implementing systems capable of adapting to changing situations. In our joint paper with CEA-Leti and NEDL, we propose a bio-inspired recurrent neural network (RNN) that uses arrays of ReRAM devices as synaptic elements and achieves plasticity as well as state-of-the-art accuracy.

To test our proposed architecture for reinforcement learning tasks, we studied the autonomous exploration of continually evolving environments, including a two-dimensional dynamic maze showing environmental changes over time. The maze was experimentally implemented using a microcontroller and a field-programmable gate array (FPGA), which ran the main program, enabled the learning rules and kept track of the position of the agent. Weebit’s ReRAM devices were used to store information and adjust the strength of connections between neurons, and also to map the internal state of each neuron.

Above: a Scanning Electron Microscope image of the SiOx RRAM devices and
sample photo of the packaged RRAM arrays used in this work


Our experiments followed the same procedure used in the case of the Morris Water Maze in biology: the agent has a limited time to explore the environment under successive trials, and once a trial starts, the sequence of firing neurons maps the movement of the agent.

Above: Representation of high-level reinforcement learning for autonomous
navigation considering eight main directions of movement


The maze exploration is configured as successive random walks which progressively develop a model of the environment. Here is how it generally progressed:

  • At the beginning, the network cannot find the solution and spends the maximum amount of time available in the maze.
  • As the network progressively maps the configuration of its environment, it becomes a master of the problem trial after trial, and it finally finds the optimum path towards the objective.
  • Once the solution is found, the network decreases the computing time with each successive attempt at solving the same maze configuration, because it remembers the solution.
  • Next, the maze changes shape and a different escape path must be found. As it attempts to find the solution, the network receives a penalty in unexpected positions. After an exploration period, it successfully gets to the target again.
  • Finally, the system comes back to the original configuration and the network easily retrieves the first solution – faster than before. This is thanks to the residual memory of the internal states and to the intrinsic recurrent structure.
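The trial-and-error procedure above can be sketched with plain tabular Q-learning, where the Q-table plays the role of the persistent synaptic weights. Everything here is an illustrative assumption: the maze layout, rewards and learning parameters are invented, and the actual system uses ReRAM-modulated neuron thresholds on hardware rather than a software Q-table:

```python
import random
from collections import defaultdict

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # the agent's possible directions

def run_trial(q, goal, walls, size=4, eps=0.2, alpha=0.5, gamma=0.9, max_steps=50):
    """One trial: the agent explores until it reaches the goal or time runs
    out, earning a reward at the target and penalties at unexpected obstacles."""
    pos, steps = (0, 0), 0
    while pos != goal and steps < max_steps:
        if random.random() < eps:                    # exploration: random walk
            a = random.randrange(4)
        else:                                        # exploitation: learned policy
            a = max(range(4), key=lambda i: q[pos][i])
        nxt = (pos[0] + MOVES[a][0], pos[1] + MOVES[a][1])
        blocked = not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls
        if blocked:
            reward, nxt = -1.0, pos                  # penalty in an unexpected position
        else:
            reward = 1.0 if nxt == goal else -0.01   # reward only at the objective
        # "Synaptic" update: strengthen or weaken the chosen connection
        q[pos][a] += alpha * (reward + gamma * max(q[nxt]) - q[pos][a])
        pos, steps = nxt, steps + 1
    return steps

q = defaultdict(lambda: [0.0] * 4)  # persists across maze changes, like the stored weights
maze1 = {(1, 0), (1, 1), (1, 2)}    # a barrier row with a gap at (1, 3)
history = [run_trial(q, goal=(3, 3), walls=maze1) for _ in range(60)]
```

Training on a second wall configuration and then returning to `maze1` mirrors the re-learning effect described above: the residual values learned for the first configuration let the agent recover the original path in fewer trials.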

Above: (left) the system re-learns quickly when presented with “maze 1” the second time; (right) ReRAM resistance can be easily modulated by using different programming currents, enabling some memory of the original maze configuration due to gradual adaptation of the internal voltage of the neurons


You can see a short video here showing the experimental setup and the hardware demonstration of the exploration of the dynamic environment via reinforcement learning.

In our paper, we go into much more detail on the experiments, including testing the hardware on complex cases such as Mars rover navigation to investigate the scalability and reconfigurability properties of the system.

Saving space with fewer neurons

One of the key features that makes our implementation so effective is that it uses an optimized design based on only eight CMOS neurons, representing the eight possible directions of movement inside the maze. CMOS neurons are generally integrated in the front-end-of-line (FEOL) and require a large amount of circuitry, so an increase in the number of neurons is associated with an increase in area and cost.

In our system, the ReRAM, acting as the threshold modulator, is the only thing that changes for each explored position in the maze, while the remaining hardware of the neurons remains the same. For this reason, the size of the network can be increased with very small costs in terms of circuit area by increasing the amount of ReRAM – which is dense and easily integrated in the back-end-of-line (BEOL).

Our bio-inspired approach shows far better management of computing resources compared to standard solutions. In fact, to carry out an exploration at a certain average accuracy (99%), our solution turns out to be 10 times less expensive, as it requires 10 times fewer synaptic elements (the number of computing elements is directly proportional to area and power consumption).

Above: Thanks to the reinforcement learning, the energy consumed by
each neuron drastically decreases as more and more trials are allowed


Key Takeaways

Deep learning techniques using standard von Neumann processors can enable accurate autonomous navigation but require a great deal of power and a long time to make training algorithms effective. This is because the environmental information is often sparse, noisy and delayed, while training procedures are supervised and require direct association between inputs and targets during backpropagation. This means that complex convolutional neural network models are needed to numerically find the best combination of parameters for the deep reinforcement computation.

Our proposed solution overcomes the standard approaches used for autonomous navigation using ReRAM based synapses and algorithms inspired by the human brain. The framework highlights the benefits of the ReRAM-based in-situ computation including high efficiency, resilience, low power consumption and accuracy.

Since biological organisms draw their capability from the inherent parallelism, stochasticity, and resilience of neuronal and synaptic computation, introducing bio-inspired dynamics into neural networks would improve the robustness and reliability of artificial intelligence systems.

Read the entire paper here: A self-adaptive hardware with resistive switching synapses for experience-based neurocomputing.


In-Memory Computing for AI Similarity Search using Weebit ReRAM https://www.weebit-nano.com/in-memory-computing-for-ai-similarity-search-using-weebit-reram-embedded-rram/ Thu, 22 Dec 2022 08:37:09 +0000 https://www.weebit-nano.com/?p=12736 We recently collaborated with our friends at IIT-Delhi, led by Prof. Manan Suri, on a research project demonstrating an efficient ReRAM based in-memory computing (IMC) capability for a similarity search application. The demonstration was done on 28nm ReRAM technology developed by Weebit in collaboration with CEA-Leti. A paper based on this work, “Fully-Binarized, Parallel, RRAM-based […]

The post In-Memory Computing for AI <br>Similarity Search using Weebit ReRAM appeared first on Weebit.

We recently collaborated with our friends at IIT-Delhi, led by Prof. Manan Suri, on a research project demonstrating an efficient ReRAM based in-memory computing (IMC) capability for a similarity search application. The demonstration was done on 28nm ReRAM technology developed by Weebit in collaboration with CEA-Leti. A paper based on this work, “Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search,” was published in IEEE Transactions on Circuits and Systems II: Express Briefs.


A bit of background: CAMs in AI/ML search applications

Associative memories, also called Content Addressable Memories (CAMs), are an important component of intelligent systems. CAMs perform fast search operations by accepting a query, searching over multiple data points stored in memory to find one or more matches based on a distance metric, and then returning the locations of the matches. This information can potentially be used for applications such as nearest-neighbor searches for classification or unsupervised labeling. Ternary Content-Addressable Memory (TCAM) is a type of CAM that incorporates a ‘don’t care’ condition to assist searches for partial matches, and is therefore the most commonly used type of CAM.
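Functionally, a TCAM lookup can be sketched in a few lines. In hardware, every stored row is compared against the query in parallel; the sequential loop below exists only to show the matching rule, with the ‘don’t care’ state represented as `None` (an assumption of this sketch, not a hardware encoding):

```python
DONT_CARE = None  # the ternary 'X' state: matches both 0 and 1

def tcam_search(table, query):
    """Return the addresses of all stored words that match the query.
    A stored bit matches if it equals the query bit or is a don't-care."""
    def matches(word):
        return all(bit is DONT_CARE or bit == q for bit, q in zip(word, query))
    return [addr for addr, word in enumerate(table) if matches(word)]
```

Returning addresses rather than data is what makes a CAM "content addressable": the query is the content, and the answer is where that content lives.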

TCAMs offer a powerful in-memory computing paradigm for efficient parallel-search and pattern-matching applications. With the emergence of big data and AI/ML, TCAMs have become a promising candidate for a variety of edge and enterprise data-intensive applications. In the research project, we proposed a scheme that demonstrates the use of TCAMs for performing hyperspectral imagery (HSI) pixel matching in the context of remote-sensing applications. TCAMs can also be used to enable applications such as biometrics (facial/iris/fingerprint recognition) and to assist in string matching for large scale database searches.

Traditionally, CAMs/TCAMs are designed using standard memory technologies such as SRAM or DRAM. However, these volatile memory-based circuits have performance limitations in terms of search energy/bit (a metric commonly used for evaluating the performance of CAM circuits), and CAMs based on SRAMs are limited in scale due to relatively large cell areas.


ReRAM can overcome performance limitations

CAM performance limitations can be addressed by using an emerging NVM (Non-Volatile Memory) technology like ReRAM instead of volatile memory technologies. Because ReRAM can help reduce power consumption and cell size, it can be used to build compact and efficient TCAMs. Such NVM devices also reduce circuit complexity and provide the opportunity to exploit low-area analog in-memory computing, leading to increased design flexibility.

In the recent paper, the joint IIT-Delhi/Weebit team presented a hardware realization of CAM using Weebit ReRAM arrays. In particular, the researchers proposed an end-to-end engine to realize IMSS (In-Memory Similarity Search) in hardware by using ReRAM devices and binarizing data and queries through a custom pre-processing pipeline. The learning capability of the proposed ReRAM-based in-memory computing engine was demonstrated on a hyperspectral imagery pixel classification task using the Salinas dataset, achieving an accuracy of 91%.
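The overall flow can be pictured with a short sketch: binarize a pixel spectrum, then find the closest stored pattern. The per-band thresholding and Hamming-distance metric below are simplifications chosen for illustration; the paper's custom pre-processing pipeline and circuit-level implementation differ:

```python
def binarize(spectrum, thresholds):
    """Per-band thresholding: a stand-in for the paper's pre-processing pipeline."""
    return [1 if v >= t else 0 for v, t in zip(spectrum, thresholds)]

def nearest_match(stored_patterns, query_bits):
    """In-memory similarity search: return the address of the stored pattern
    closest to the query (in the hardware, all rows are evaluated in parallel)."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(stored_patterns)),
               key=lambda i: hamming(stored_patterns[i], query_bits))
```

Working on bit patterns instead of real-valued spectra is what makes the search cheap: comparing bits needs only mismatch counting, which maps naturally onto parallel CAM rows.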

Above: Figure showing energy efficient classification of agricultural land from hyperspectral imagery using proposed In-Memory Computing Technique.


The team experimentally validated the system on fabricated ReRAM arrays, with full-system validation performed through SPICE simulations using an open-source SkyWater 130nm CMOS process design kit (PDK). We were able to significantly reduce the computations required and improve the speed of computations, leading to benefits in terms of both energy and latency. By projecting estimations to advanced nodes (28nm), we demonstrated energy savings of ~1.5x for a fixed workload compared to the current state-of-the-art technology.

You can access the full paper here.


