lib Performance Using Direct Virtual Hardware
lib 0sim: Preparing System Software for a World with Terabyte-scale Memories
lib A Compiler Infrastructure for Accelerator Generators
lib A Coordinated Tiling and Batching Framework for Efficient GEMM on GPUs
lib A Hypervisor for Shared-Memory FPGA Platforms
lib A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications
lib A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs
lib A Round-Efficient Distributed Betweenness Centrality Algorithm
lib A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms
lib Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale
lib Adaptive Sparse Matrix-Matrix Multiplication on the GPU
lib AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation
lib An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
lib Analytical characterization and design space exploration for optimization of CNNs
lib Atomicity Checking in Linear Time using Vector Clocks
lib AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming
lib AvA: Accelerated Virtualization of Accelerators
lib Benchmarking, analysis, and optimization of serverless function snapshots
lib Beyond Data and Model Parallelism for Deep Neural Networks
lib BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
lib Bridging the Gap between Deep Learning and Sparse Matrix Format Selection
lib BYOC: A "Bring Your Own Core" Framework for Heterogeneous-ISA Research
lib C11Tester: A Fuzzer for C/C++ Atomics
lib Cache-Tries: Concurrent Lock-Free Hash Tries with Constant-Time Operations
lib Challenging Sequential Bitstream Processing via Principled Bitwise Speculation
lib Checking Linearizability Using Hitting Families
lib Chronos: Efficient Speculative Parallelism for Accelerators
lib Clobber-NVM: Log Less, Re-execute More
lib COIN Attacks: On Insecurity of Enclave Untrusted Interfaces in SGX
lib Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
lib Communication-avoiding parallel minimum cuts and connected components
lib Compiler-Driven FPGA Virtualization with SYNERGY
lib Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances
lib Corrected trees for reliable group communication
lib Corundum: Statically-Enforced Persistent Memory Safety
lib Cross-Failure Bug Detection in Persistent Memory Programs
lib CubicleOS: A Library OS with Software Componentisation for Practical Isolation
lib CUDAAdvisor: LLVM-based runtime profiling for modern GPUs
lib CutQC: Using Small Quantum Computers for Large Quantum Circuit Evaluations
lib CVR: efficient vectorization of SpMV on x86 processors
lib DeLICM: scalar dependence removal at zero memory cost
lib Effective Concurrency Testing for Distributed Systems
lib Effective simulation and debugging for a high-level hardware language using software compilers
lib Efficient parallel determinacy race detection for two-dimensional dags
lib Efficient Race Detection with Futures
lib Egalito: Layout-Agnostic Binary Recompilation
lib Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake
lib FaasCache: Keeping Serverless Computing Alive With Greedy-Dual Caching
lib Fast Local Page-Tables for Virtualized NUMA Servers with vMitosis
lib Fast, Flexible and Comprehensive Bug Detection for Persistent Memory Programs
lib Featherlight On-the-fly False-sharing Detection
lib Fine-Grained GPU Sharing Primitives for Deep Learning Applications
lib FirePerf: FPGA-Accelerated Full-System Hardware/Software Performance Profiling and Co-Design
lib FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators
lib Forget Failure: Exploiting SRAM Data Remanence for Low-overhead Intermittent Computation
lib Game of Threads: Enabling Asynchronous Poisoning Attacks
lib GEMMbench: a framework for reproducible and collaborative benchmarking of matrix multiplication
lib Harnessing Epoch-based Reclamation for Efficient Range Queries
lib Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol
lib HerQules: Securing Programs via Hardware-Enforced Message Queues
lib High performance stencil code generation with Lift
lib HIPPOCRATES: Healing Persistent Memory Bugs Without Doing Any Harm
lib HMC: Model Checking for Hardware Memory Models
lib HSM: A Hybrid Slowdown Model for Multitasking GPUs
lib In-Fat Pointer: Hardware-Assisted Tagged-Pointer Spatial Memory Safety Defense with Subobject Bound Granularity Protection
lib Incremental CFG Patching for Binary Rewriting
lib Incremental Flattening for Nested Data Parallelism
lib Integrating a large-scale testing campaign in the CK framework
lib Integrating algorithmic parameters into benchmarking and design space exploration in dense 3D scene understanding
lib Interval-Based Memory Reclamation
lib Jaaru: Efficiently Model Checking Persistent Memory Programs
lib Jamais Vu: Thwarting Microarchitectural Replay Attacks
lib Judging a Type by its Pointer: Optimizing Virtual Function Calls on GPUs
lib Juggler: A Dependency-Aware Task Based Execution Framework for GPUs
lib Language-Parametric Compiler Validation with Application to LLVM
lib Leveraging Hardware TM in Haskell
lib Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference
lib LifeStream: A High-performance Stream Processing Engine for Periodic Streams
lib Lift: A Functional Data-Parallel IR for High-Performance GPU Code Generation
lib Lightweight detection of cache conflicts
lib Lightweight Hardware Transactional Memory Profiling
lib Making Pull-Based Graph Processing Performant
lib May-happen-in-parallel analysis with static vector clocks
lib MILEPOST GCC: machine learning based research compiler
lib Milepost GCC: Machine Learning Enabled Self-tuning Compiler
lib Mitosis: Transparently Self-Replicating Page-Tables for Large-Memory Machines
lib MOD: Minimally Ordered Durable Datastructures for Persistent Memory
lib Multi-objective autotuning of MobileNets across the full software/hardware stack
lib nAdroid: statically detecting ordering violations in Android applications
lib Nightcore: Efficient and Scalable Serverless Computing for Latency-Sensitive, Interactive Microservices
lib Noise-Aware Dynamical System Compilation for Analog Devices with Legno
lib Noisy Variational Quantum Algorithm Simulation via Knowledge Compilation for Repeated Inference
lib Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX
lib Optimal DNN primitive selection with partitioned boolean quadratic programming
lib Optimistic Loop Optimization
lib Optimizing Deep Learning Workloads on ARM GPU with TVM
lib Optimizing DNN Computation with Relaxed Graph Substitutions
lib Optimizing N-dimensional, Winograd-based convolution for manycore CPUs
lib Optimizing Word2Vec Performance on Multicore Systems
lib Orbital Edge Computing: Nanosatellite Constellations as a New Class of Computer System
lib PacketMill: Toward per-core 100-Gbps Networking
lib PAM: Parallel Augmented Maps
lib Peacenik: Architecture Support for Not Failing under Fail-Stop Memory Consistency
lib Perspective: A Sensible Approach to Speculative Automatic Parallelization
lib PIBE: Practical Kernel Control-flow Hardening with Profile-guided Indirect Branch Elimination
lib PMFuzz: Test Case Generation for Persistent Memory Programs
lib Poker: permutation-based SIMD execution of intensive tree search by path encoding
lib Privacy-Preserving Bandits
lib Proactive Work Stealing for Futures
lib Pronto: Easy and Fast Persistence for Volatile Data Structures
lib PTEMagnet: Fine-grained Physical Memory Reservation for Faster Page Walks in Public Clouds
lib QRAFT: Reverse Your Quantum Circuit and Know the Correct Program Output
lib QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology
lib Quantifying the Design-Space Tradeoffs in Autonomous Drones
lib Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation
lib Real-Time Image Recognition Using Collaborative IoT Devices
lib RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference
lib Register Optimizations for Stencils on GPUs
lib Reliable Timekeeping for Intermittent Computing
lib Rethinking Software Runtimes for Disaggregated Memory
lib SAC: A Co-Design Cache Algorithm for Emerging SMR-based High-Density Disks
lib Safecracker: Leaking Secrets through Compressed Caches
lib Scalable FSM Parallelization via Path Fusion and Higher-Order Speculation
lib Semantics-Aware Scheduling Policies for Synchronization Determinism
lib SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU
lib SGXElide: enabling enclave code secrecy via self-modification
lib SherLock: Unsupervised Synchronization-Operation Inference
lib SIMD intrinsics on managed language runtimes
lib Sinan: ML-Based & QoS-Aware Resource Management for Cloud Microservices
lib Software Mitigation of Crosstalk on Noisy Intermediate-Scale Quantum Computers
lib Software Prefetching for Indirect Memory Accesses
lib Speculative Interference Attacks: Breaking Invisible Speculation Schemes
lib Streamline: A Fast, Flushless Cache Covert-Channel Attack by Enabling Asynchronous Collusion
lib Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures
lib Switches for HIRE: Resource Scheduling for Data Center In-Network Computing
lib swSpTRSV: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures
lib Synthesizing an instruction selection rule library from semantic specifications
lib The Guardian Council: Parallel Programmable Hardware Security
lib Time-sensitive Intermittent Computing Meets Legacy Software
lib Understanding the Downstream Instability of Word Embeddings
lib Vectorization for Digital Signal Processors via Equality Saturation
lib VEGEN: A Vectorizer Generator for SIMD and Beyond
lib VerifiedFT: a verified, high-performance precise dynamic race detector
lib VSync: Push-Button Verification and Optimization for Synchronization Primitives on Weak Memory Models
lib Who's Debugging the Debuggers? Exposing Debug Information Bugs in Optimized Binaries
lib Why GPUs are Slow at Executing NFAs and How to Make them Faster