Interest in processing-in-memory technology continues to grow, as evidenced by an Israeli start-up emerging from stealth mode to unveil a PIM-based data analytics architecture.
Tel Aviv-based NeuroBlade has begun shipping its data accelerator designed to reduce data movement and resulting bottlenecks through integration of the processing function inside memory, according to CEO Elad Sity.
Like many accelerators, NeuroBlade’s has a specific purpose: accelerating data analytics. Other accelerators are focused on improving storage or artificial intelligence workloads.
Founded in 2017, the company has grown to more than 100 employees and recently secured $83 million in venture funding, bringing total invested capital to $110 million. The investment was led by Corner Ventures, with contributions from Intel Capital, among others.
In an interview, Sity said NeuroBlade’s customers and partners are now integrating its data analytics accelerator into their systems.
Getting data to where it's needed is critical as volumes grow exponentially and AI workloads become increasingly diverse. Current architectures can't scale to meet future data analytics needs due to constant shuffling of data between storage, memory and central processing, said Sity. The result is poor application performance and slow response times.
NeuroBlade used PIM to develop a new architectural building block specifically aimed at accelerating workloads that help speed decision-making algorithms.
NeuroBlade initially focused on how and why CPUs and GPUs were unable to cope with data-intensive workloads, despite the addition of more memory alongside the CPU and new cache hierarchies.
Sity said PIM advances data analytics by reducing data movement, whether in an AI workload or general-purpose computing. Selecting the appropriate logic and the specific operations handled in memory are critical steps when building computational memory. “That’s dictated by the use case,” Sity said, and each use case requires unique software.
Analytics software from vendors such as SAP can leverage NeuroBlade’s approach, said Sity. Still, potential users large and small want to avoid the hassle of programming memory. Hence, NeuroBlade narrowed the use cases for its accelerator and instead developed a platform that could be easily integrated. The logical choice was data analytics, with its huge databases operating across enterprise data centers.
Easier integration was achieved by leveraging PCI Express (PCIe) to connect to and accelerate CPUs used in data-intensive applications. While high-bandwidth memory is synonymous with high-performance computing, NeuroBlade’s XRAM computational memory is DRAM-based, with embedded processing logic and other processing elements integrated near the memory banks to provide massively parallel execution at high bandwidth.
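The payoff of filtering next to the memory banks can be seen with simple arithmetic. The sketch below is purely illustrative (NeuroBlade has not published its programming model, and the row size and selectivity figures are invented assumptions): it models how much data crosses the PCIe link when a predicate is evaluated on the CPU versus inside computational memory.

```python
# Illustrative model of why in-memory filtering cuts data movement.
# All figures are hypothetical assumptions, not NeuroBlade specs.

ROW_BYTES = 64          # assumed row size
NUM_ROWS = 1_000_000    # rows resident in computational memory
SELECTIVITY = 0.01      # assumed fraction of rows matching the predicate

# Conventional path: ship the whole table across PCIe, filter on the CPU.
bytes_moved_cpu = NUM_ROWS * ROW_BYTES

# PIM path: evaluate the predicate next to the memory banks and
# return only the matching rows over the link.
bytes_moved_pim = int(NUM_ROWS * SELECTIVITY) * ROW_BYTES

print(f"CPU filter: {bytes_moved_cpu / 1e6:.1f} MB over PCIe")
print(f"PIM filter: {bytes_moved_pim / 1e6:.1f} MB over PCIe")
# At 1% selectivity, link traffic drops by roughly 100x.
```

The absolute numbers matter less than the ratio: for selective queries, the traffic saved scales directly with how few rows survive the predicate.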
Sity said NeuroBlade had considered other memories such as 3D XPoint, but concluded they weren’t ready. And while flash is dense, it’s not fast enough. “DRAM was definitely the obvious choice,” he said.
Billing itself as a data analytics rather than a memory vendor, NeuroBlade relies on well-established PCIe and DRAM technology, easing adoption and integration. Adoption is further aided by offering the accelerator as a data appliance that comes with a software development kit (SDK). “We don’t aim to sell you the XRAM,” Sity said. “The aim is not for someone to just buy extra memories and connectivity to the CPU.”
The PIM approach isn’t novel; the barrier to adoption has been complexity. Samsung recently ramped its efforts to enable PIM adoption with its High Bandwidth Memory PIM by designing the processing and memory architecture to existing industry standards. That approach enabled drop-in replacement for commodity DRAM. Samsung also provides an SDK.
HBM-PIM differs from a traditional von Neumann architecture by bringing processing power directly to where the data is stored, placing a DRAM-optimized AI engine inside each memory bank. That design enables parallel processing while minimizing data movement.
Micron Technology’s Automata processor, announced back in 2013 but no longer in development, exploited the inherent bit parallelism of traditional SDRAM. The approach was touted as a fundamentally new processor architecture aimed at accelerating search and analysis of complex and unstructured data streams. Automata’s design was based on an adaptation of a memory-array architecture, consisting of tens of thousands to millions of interconnected processing elements.
NeuroBlade is positioning its approach as complementing general-purpose and GPU-enabled platforms from companies such as Nvidia along with AI start-ups addressing computing-intensive workloads.