OSDI '21 Technical Sessions | USENIX It then feeds those invariants and the desired safety properties to an SMT solver to check if the conjunction of the invariants and the safety properties is inductive. OSDI'21 accepted 31 papers and 26 papers participated in the AE, a significant increase in the participate ratio: 84%, compared to OSDI'20 (70%) and SOSP'19 (61%). She has a PhD in computer science from MIT. Penglai also reduces the latency of secure memory initialization by three orders of magnitude and gains 3.6x speedup for real-world applications (e.g., MapReduce). Mothy received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS. Based on the observation that invariants are often concise in practice, DistAI starts with small invariant formulas and enumerates all strongest possible invariants that hold for all samples. Papers must be in PDF format and must be submitted via the submission form. Authors may upload supplementary material in files separate from their submissions. Commonly used log archival and compression tools like Gzip provide high compression ratio, yet searching archived logs is a slow and painful process as it first requires decompressing the logs. Password We also propose two file system techniques for ZNS+-aware LFS. We compare Marius against two state-of-the-art industrial systems on a diverse array of benchmarks. Proceedings Front Matter Petuum Awarded OSDI 2021 Best Paper for Goodput-Optimized Deep Learning Research Petuum CASL research and engineering team's Pollux technical paper on adaptive scheduling for optimized. If you submit a paper to either of those venues, you may not also submit it to OSDI 21. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. Her specialties include network routing protocols and network security. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. Forgot your password? See the USENIX Conference Submissions Policy for details. His work has included the Barrelfish multikernel research OS, as well as work on distributed stream processors, and using formal specifications to describe the hardware/software interfaces of modern computer systems. Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced. Submissions violating the detailed formatting and anonymization rules will not be considered for review. Most existing schedulers expect users to specify the number of resources for each job, often leading to inefficient resource use. Mothy joined the Computer Science Department ETH Zurich in January 2007 and was named Fellow of the ACM in 2013 for contributions to operating systems and networking research. Instead, we propose addressing the root cause of the heuristics problem by allowing software to explicitly specify to the device if submitted requests are latency-sensitive. By monitoring the status of each job during training, Pollux models how their goodput (a novel metric we introduce that combines system throughput with statistical efficiency) would change by adding or removing resources. A significant obstacle to using SC for practical applications is the memory overhead of the underlying cryptography. Session Chairs: Deniz Altinbken, Google, and Rashmi Vinayak, Carnegie Mellon University, Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan. We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into NUMA-aware counterparts. For instance, FAST 21 and NSDI 21 have author-notification dates after the OSDI 21 abstract-registration deadline. Prior or concurrent publication in non-peer-reviewed contexts, like arXiv.org, technical reports, talks, and social media posts, is permitted. The novel aspect of the nanoPU is the design of a fast path between the network and applications---bypassing the cache and memory hierarchy, and placing arriving messages directly into the CPU register file. This yielded 6% fewer TLB miss stalls, and 26% reduction in memory wasted due to fragmentation. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem. Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on the top of existing concurrent PM indexes, and steers accesses to hot items to this layer. Furthermore, by combining SanRazor with an existing sanitizer reduction tool ASAP, we show synergistic effect by reducing the runtime cost to only 7.0% with a reasonable tradeoff of security. We have implemented a prototype of our design based on Penglai, an open-sourced enclave system for RISC-V. Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, and Roxana Geambasu, Columbia University; Mathias Lcuyer, Microsoft Research. Authors are also encouraged to contact the program co-chairs, osdi21chairs@usenix.org, if needed to relate their OSDI submissions to relevant submissions of their own that are simultaneously under review or awaiting publication at other venues. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads.
USENIX ATC '21 - HotCRP.com USENIX Security '21 has three submission deadlines. However, existing enclave designs fail to meet the requirements of scalability demanded by new scenarios like serverless computing, mainly due to the limitations in their secure memory protection mechanisms, including static allocation, restricted capacity and high-cost initialization. Using this property, MAGE calculates the memory access pattern ahead of time and uses it to produce a memory management plan. Authors are required to register abstracts by 3:00 p.m. PST on December 3, 2020, and to submit full papers by 3:00 p.m. PST on December 10, 2020. We build Polyjuice based on our learning framework and evaluate it against several existing algorithms. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. The file system performance of the proposed ZNS+ storage system was 1.33--2.91 times better than that of the normal ZNS-based storage system.
Oort Accepted to Appear at OSDI'2021 | Mosharaf Chowdhury HotCRP.com signin Sign in using your HotCRP.com account. After three years working on web-based collaboration systems at a startup in North Carolina, he joined Sprint's Advanced Technology Lab in Burlingame, California, in 1998, working on cloud computing and network monitoring. Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). Ankit Bhardwaj and Chinmay Kulkarni, University of Utah; Reto Achermann, University of British Columbia; Irina Calciu, VMware Research; Sanidhya Kashyap, EPFL; Ryan Stutsman, University of Utah; Amy Tai and Gerd Zellweger, VMware Research. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. We also verified a simple NFS server using GoJournals specs, which confirms that they are helpful for application verification: a significant part of the proof doesnt have to consider concurrency and crashes. Authors may use this for content that may be of interest to some readers but is peripheral to the main technical contributions of the paper. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. For conference information, . Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. We develop rigorous theoretical foundations to simplify equivalence examination and correction for partially equivalent transformations, and design an efficient search algorithm to quickly discover highly optimized programs by combining fully and partially equivalent optimizations at the tensor, operator, and graph levels. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. Sat, Aug 7, 2021 3 min read researches review. We develop MAGE, an execution engine for SC that efficiently runs SC computations that do not fit in memory. We prove that DistAI is guaranteed to find the -free inductive invariant that proves the desired safety properties in finite time, if one exists. This approach misses possible optimization opportunities as transformations that only preserve equivalence on subsets of the output tensors are excluded. Professor Veloso is the Past President of AAAI (the Association for the Advancement of Artificial Intelligence), and the co-founder, Trustee, and Past President of RoboCup. Academic and industrial participants present research and experience papers that cover the full range of theory and practice of computer . The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. We propose Marius, a system for efficient training of graph embeddings that leverages partition caching and buffer-aware data orderings to minimize disk access and interleaves data movement with computation to maximize utilization.
. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. We implement and evaluate a suite of applications, including MICA, Raft and Set Algebra for document retrieval; and we demonstrate that the nanoPU can be used as a high performance, programmable alternative for one-sided RDMA operations. Swapnil Gandhi and Anand Padmanabha Iyer, Microsoft Research. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. Our approach effectively eliminates high communication and partitioning overheads, and couples it with a new pipelined push-pull parallelism based execution strategy for fast model training. Kernel code requires manual memory management and type-unsafe code and must efficiently handle complex, asynchronous events. We present selective profiling, a technique that locates data locality problems with low-enough overhead that is suitable for production use. The co-chairs may then share that paper with the workshops organizers and discuss it with them.
USENIX Security '21 Summer Accepted Papers | USENIX We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. Important Dates Abstract registrations due: Thursday, December 3, 2020, 3:00 pm PST Complete paper submissions due: Thursday, December 10, 2020, 3:00pm PST Author Response Period NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. Our approach outperforms existing file systems on a block SSD by a wide margin 6.2 on average for metadata-intensive benchmarks. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot.
News Baris Kasikci's Home Page - Electrical Engineering and Computer USENIX NSDI, 2021 Acceptance Rate: 15.99% Fluid: Resource-Aware Hyperparameter Tuning Engine P. Yu*, J. Liu*, M. Chowdhury (*Equal contribution) MLSys, 2021 Acceptance Rate: 23.53% NetLock: Fast, Centralized Lock Management Using Programmable Switches Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, X. Jin ACM SIGCOMM, 2020 Acceptance Rate: 21.6% 64 papers accepted out of 341 submitted. We present case studies and end-to-end applications that show how Storm lets developers specify diverse policies while centralizing the trusted code to under 1% of the application, and statically enforces security with modest type annotation overhead, and no run-time cost. Please identify yourself as a presenter and include your mailing address in your email. Metadata from voice calls, such as the knowledge of who is communicating with whom, contains rich information about peoples lives. Responses should be limited to clarifying the submitted work. These are hard deadlines, and no extensions will be given. Last year, 70% of accepted OSDI papers participated in the . Fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. The 20th ACM Workshop on Hot Topics in Networks (HotNets 2021) will bring together researchers in computer networks and systems to engage in a lively debate on the theory and practice of computer networking. Thanks to selective profiling, DMons profiling overhead is 1.36% on average, making it feasible for production use. How can we design systems that will be reliable despite misbehaving participants? They collectively make the backup fresh, columnar, and fault-tolerant, even facing millions of concurrent transactions per second. When further combined with a simple caching strategy, our evaluation shows that P3 is able to outperform existing state-of-the-art distributed GNN frameworks by up to 7. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. Moreover, to handle dynamic workloads, Nap adopts a fast NAL switch mechanism. We built a functional NFSv3 server, called GoNFS, to use GoJournal. Based on this observation, P3 proposes a new approach for distributed GNN training. Timothy Roscoe is a Full Professor in the Systems Group of the Computer Science Department at ETH Zurich, where he works on operating systems, networks, and distributed systems, and is currently head of department. For any further information, please contact the PC chairs: pc-chairs-2022@eurosys.org. Conference site 49 papers accepted out of 251 submitted. Concretely, Dorylus is 1.22 faster and 4.83 cheaper than GPU servers for massive sparse graphs. She has been recognized with many industry honors including induction into the National Academy of Engineering, the Inventor Hall of Fame, The Internet Hall of Fame, Washington State Academy of Science, and lifetime achievement awards from USENIX and SIGCOMM. Upon these two primitives, our system can scale to thousands of concurrent enclaves with high resource utilization and eliminate the high-cost initialization of secure memory using fork-style enclave creation without weakening the security guarantees. Professor Veloso has been recognized with a multiple honors, including being a Fellow of the ACM, IEEE, AAAS, and AAAI. However, memory allocation decisions also impact overall application performance via data placement, offering opportunities to improve fleetwide productivity by completing more units of application work using fewer hardware resources. Only two types of supplementary material are permitted: source code described in the paper and formal proofs sketched in the paper.
SOSP Conference - Home - ACM Digital Library Kyuhwa Han, Sungkyunkwan University and Samsung Electronics; Hyunho Gwak and Dongkun Shin, Sungkyunkwan University; Jooyoung Hwang, Samsung Electronics. Submitted papers must be no longer than 12 single-spaced 8.5 x 11 pages, including figures and tables, plus as many pages as needed for references, using 10-point type on 12-point (single-spaced) leading, two-column format, Times Roman or a similar font, within a text block 7 wide x 9 deep.